collections.rst 37 KB
Newer Older
1 2 3 4 5 6 7 8 9 10 11

:mod:`collections` --- High-performance container datatypes
===========================================================

.. module:: collections
   :synopsis: High-performance datatypes
.. moduleauthor:: Raymond Hettinger <python@rcn.com>
.. sectionauthor:: Raymond Hettinger <python@rcn.com>

.. versionadded:: 2.4

12 13 14 15 16 17
.. testsetup:: *

   from collections import *
   import itertools
   __name__ = '<doctest>'

18
This module implements high-performance container datatypes.  Currently,
19
there are four datatypes, :class:`Counter`, :class:`deque`, :class:`OrderedDict` and
20 21 22 23 24 25 26 27
:class:`defaultdict`, and one datatype factory function, :func:`namedtuple`.

The specialized containers provided in this module provide alternatives
to Python's general purpose built-in containers, :class:`dict`,
:class:`list`, :class:`set`, and :class:`tuple`.

.. versionchanged:: 2.4
   Added :class:`deque`.
28 29 30 31 32

.. versionchanged:: 2.5
   Added :class:`defaultdict`.

.. versionchanged:: 2.6
33
   Added :func:`namedtuple` and added abstract base classes.
34

35
.. versionchanged:: 2.7
36
   Added :class:`Counter` and :class:`OrderedDict`.
Raymond Hettinger's avatar
Raymond Hettinger committed
37 38

In addition to containers, the collections module provides some ABCs
39
(abstract base classes) that can be used to test whether a class
40
provides a particular interface, for example, whether it is hashable or
41
a mapping.
Raymond Hettinger's avatar
Raymond Hettinger committed
42 43 44 45 46 47 48


ABCs - abstract base classes
----------------------------

The collections module offers the following ABCs:

Georg Brandl's avatar
Georg Brandl committed
49 50 51 52 53 54
=========================  =====================  ======================  ====================================================
ABC                        Inherits               Abstract Methods        Mixin Methods
=========================  =====================  ======================  ====================================================
:class:`Container`                                ``__contains__``
:class:`Hashable`                                 ``__hash__``
:class:`Iterable`                                 ``__iter__``
Ezio Melotti's avatar
Ezio Melotti committed
55
:class:`Iterator`          :class:`Iterable`      ``next``                ``__iter__``
56
:class:`Sized`                                    ``__len__``
Georg Brandl's avatar
Georg Brandl committed
57
:class:`Callable`                                 ``__call__``
58

Georg Brandl's avatar
Georg Brandl committed
59
:class:`Sequence`          :class:`Sized`,        ``__getitem__``         ``__contains__``. ``__iter__``, ``__reversed__``.
60
                           :class:`Iterable`,                             ``index``, and ``count``
61 62
                           :class:`Container`

63
:class:`MutableSequence`   :class:`Sequence`      ``__setitem__``         Inherited Sequence methods and
Georg Brandl's avatar
Georg Brandl committed
64
                                                  ``__delitem__``,        ``append``, ``reverse``, ``extend``, ``pop``,
65
                                                  and ``insert``          ``remove``, and ``__iadd__``
66

67 68 69
:class:`Set`               :class:`Sized`,                                ``__le__``, ``__lt__``, ``__eq__``, ``__ne__``,
                           :class:`Iterable`,                             ``__gt__``, ``__ge__``, ``__and__``, ``__or__``
                           :class:`Container`                             ``__sub__``, ``__xor__``, and ``isdisjoint``
70

Georg Brandl's avatar
Georg Brandl committed
71 72 73
:class:`MutableSet`        :class:`Set`           ``add`` and             Inherited Set methods and
                                                  ``discard``             ``clear``, ``pop``, ``remove``, ``__ior__``,
                                                                          ``__iand__``, ``__ixor__``, and ``__isub__``
74

75 76 77 78 79 80 81
:class:`Mapping`           :class:`Sized`,        ``__getitem__``         ``__contains__``, ``keys``, ``items``, ``values``,
                           :class:`Iterable`,                             ``get``, ``__eq__``, and ``__ne__``
                           :class:`Container`

:class:`MutableMapping`    :class:`Mapping`       ``__setitem__`` and     Inherited Mapping methods and
                                                  ``__delitem__``         ``pop``, ``popitem``, ``clear``, ``update``,
                                                                          and ``setdefault``
82 83


Georg Brandl's avatar
Georg Brandl committed
84 85 86 87 88 89 90
:class:`MappingView`       :class:`Sized`                                 ``__len__``
:class:`KeysView`          :class:`MappingView`,                          ``__contains__``,
                           :class:`Set`                                   ``__iter__``
:class:`ItemsView`         :class:`MappingView`,                          ``__contains__``,
                           :class:`Set`                                   ``__iter__``
:class:`ValuesView`        :class:`MappingView`                           ``__contains__``, ``__iter__``
=========================  =====================  ======================  ====================================================
Raymond Hettinger's avatar
Raymond Hettinger committed
91 92 93 94 95 96

These ABCs allow us to ask classes or instances if they provide
particular functionality, for example::

    size = None
    if isinstance(myvar, collections.Sized):
97
        size = len(myvar)
Raymond Hettinger's avatar
Raymond Hettinger committed
98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126

Several of the ABCs are also useful as mixins that make it easier to develop
classes supporting container APIs.  For example, to write a class supporting
the full :class:`Set` API, it only necessary to supply the three underlying
abstract methods: :meth:`__contains__`, :meth:`__iter__`, and :meth:`__len__`.
The ABC supplies the remaining methods such as :meth:`__and__` and
:meth:`isdisjoint` ::

    class ListBasedSet(collections.Set):
         ''' Alternate set implementation favoring space over speed
             and not requiring the set elements to be hashable. '''
         def __init__(self, iterable):
             self.elements = lst = []
             for value in iterable:
                 if value not in lst:
                     lst.append(value)
         def __iter__(self):
             return iter(self.elements)
         def __contains__(self, value):
             return value in self.elements
         def __len__(self):
             return len(self.elements)

    s1 = ListBasedSet('abcdef')
    s2 = ListBasedSet('defghi')
    overlap = s1 & s2            # The __and__() method is supported automatically

Notes on using :class:`Set` and :class:`MutableSet` as a mixin:

127
(1)
Raymond Hettinger's avatar
Raymond Hettinger committed
128
   Since some set operations create new sets, the default mixin methods need
129 130
   a way to create new instances from an iterable. The class constructor is
   assumed to have a signature in the form ``ClassName(iterable)``.
Raymond Hettinger's avatar
Raymond Hettinger committed
131
   That assumption is factored-out to an internal classmethod called
Raymond Hettinger's avatar
Raymond Hettinger committed
132 133
   :meth:`_from_iterable` which calls ``cls(iterable)`` to produce a new set.
   If the :class:`Set` mixin is being used in a class with a different
134 135
   constructor signature, you will need to override :meth:`from_iterable`
   with a classmethod that can construct new instances from
Raymond Hettinger's avatar
Raymond Hettinger committed
136 137 138 139 140 141 142 143 144 145 146 147 148 149
   an iterable argument.

(2)
   To override the comparisons (presumably for speed, as the
   semantics are fixed), redefine :meth:`__le__` and
   then the other operations will automatically follow suit.

(3)
   The :class:`Set` mixin provides a :meth:`_hash` method to compute a hash value
   for the set; however, :meth:`__hash__` is not defined because not all sets
   are hashable or immutable.  To add set hashabilty using mixins,
   inherit from both :meth:`Set` and :meth:`Hashable`, then define
   ``__hash__ = Set._hash``.

150 151 152 153 154 155
.. seealso::

   * `OrderedSet recipe <http://code.activestate.com/recipes/576694/>`_ for an
     example built on :class:`MutableSet`.

   * For more about ABCs, see the :mod:`abc` module and :pep:`3119`.
Raymond Hettinger's avatar
Raymond Hettinger committed
156 157


158 159 160 161 162 163
:class:`Counter` objects
------------------------

A counter tool is provided to support convenient and rapid tallies.
For example::

164
    >>> # Tally occurrences of words in a list
165
    >>> cnt = Counter()
166
    >>> for word in ['red', 'blue', 'red', 'green', 'blue', 'blue']:
167 168
    ...     cnt[word] += 1
    >>> cnt
169
    Counter({'blue': 3, 'red': 2, 'green': 1})
170

171
    >>> # Find the ten most common words in Hamlet
172 173
    >>> import re
    >>> words = re.findall('\w+', open('hamlet.txt').read().lower())
Raymond Hettinger's avatar
Raymond Hettinger committed
174
    >>> Counter(words).most_common(10)
175 176 177
    [('the', 1143), ('and', 966), ('to', 762), ('of', 669), ('i', 631),
     ('you', 554),  ('a', 546), ('my', 514), ('hamlet', 471), ('in', 451)]

178
.. class:: Counter([iterable-or-mapping])
179

180
   A :class:`Counter` is a :class:`dict` subclass for counting hashable objects.
181 182 183 184
   It is an unordered collection where elements are stored as dictionary keys
   and their counts are stored as dictionary values.  Counts are allowed to be
   any integer value including zero or negative counts.  The :class:`Counter`
   class is similar to bags or multisets in other languages.
185

186
   Elements are counted from an *iterable* or initialized from another
187
   *mapping* (or counter):
188

189 190 191 192
        >>> c = Counter()                           # a new, empty counter
        >>> c = Counter('gallahad')                 # a new counter from an iterable
        >>> c = Counter({'red': 4, 'blue': 2})      # a new counter from a mapping
        >>> c = Counter(cats=4, dogs=8)             # a new counter from keyword args
193

194
   Counter objects have a dictionary interface except that they return a zero
195
   count for missing items instead of raising a :exc:`KeyError`:
196

Raymond Hettinger's avatar
Raymond Hettinger committed
197
        >>> c = Counter(['eggs', 'ham'])
Raymond Hettinger's avatar
Raymond Hettinger committed
198
        >>> c['bacon']                              # count of a missing element is zero
199 200
        0

Raymond Hettinger's avatar
Raymond Hettinger committed
201 202
   Setting a count to zero does not remove an element from a counter.
   Use ``del`` to remove it entirely:
203

Raymond Hettinger's avatar
Raymond Hettinger committed
204 205
        >>> c['sausage'] = 0                        # counter entry with a zero count
        >>> del c['sausage']                        # del actually removes the entry
206 207 208 209

   .. versionadded:: 2.7


210
   Counter objects support three methods beyond those available for all
211 212 213 214
   dictionaries:

   .. method:: elements()

215 216 217
      Return an iterator over elements repeating each as many times as its
      count.  Elements are returned in arbitrary order.  If an element's count
      is less than one, :meth:`elements` will ignore it.
218

Raymond Hettinger's avatar
Raymond Hettinger committed
219
            >>> c = Counter(a=4, b=2, c=0, d=-2)
220 221 222 223 224
            >>> list(c.elements())
            ['a', 'a', 'a', 'a', 'b', 'b']

   .. method:: most_common([n])

225
      Return a list of the *n* most common elements and their counts from the
Raymond Hettinger's avatar
Raymond Hettinger committed
226
      most common to the least.  If *n* is not specified, :func:`most_common`
227
      returns *all* elements in the counter.  Elements with equal counts are
228
      ordered arbitrarily:
229 230 231 232

            >>> Counter('abracadabra').most_common(3)
            [('a', 5), ('r', 2), ('b', 2)]

233 234 235 236 237 238 239 240 241 242 243
   .. method:: subtract([iterable-or-mapping])

      Elements are subtracted from an *iterable* or from another *mapping*
      (or counter).  Like :meth:`dict.update` but subtracts counts instead
      of replacing them.  Both inputs and outputs may be zero or negative.

            >>> c = Counter(a=4, b=2, c=0, d=-2)
            >>> d = Counter(a=1, b=2, c=3, d=4)
            >>> c.subtract(d)
            Counter({'a': 3, 'b': 0, 'c': -3, 'd': -6})

244 245
   The usual dictionary methods are available for :class:`Counter` objects
   except for two which work differently for counters.
246 247 248

   .. method:: fromkeys(iterable)

249
      This class method is not implemented for :class:`Counter` objects.
250

251
   .. method:: update([iterable-or-mapping])
252

253 254 255 256
      Elements are counted from an *iterable* or added-in from another
      *mapping* (or counter).  Like :meth:`dict.update` but adds counts
      instead of replacing them.  Also, the *iterable* is expected to be a
      sequence of elements, not a sequence of ``(key, value)`` pairs.
257

258
Common patterns for working with :class:`Counter` objects::
259

260 261 262 263 264 265 266 267 268
    sum(c.values())                 # total of all counts
    c.clear()                       # reset all counts
    list(c)                         # list unique elements
    set(c)                          # convert to a set
    dict(c)                         # convert to a regular dictionary
    c.items()                       # convert to a list of (elem, cnt) pairs
    Counter(dict(list_of_pairs))    # convert from a list of (elem, cnt) pairs
    c.most_common()[:-n:-1]         # n least common elements
    c += Counter()                  # remove zero and negative counts
269

270 271 272 273 274 275
Several mathematical operations are provided for combining :class:`Counter`
objects to produce multisets (counters that have counts greater than zero).
Addition and subtraction combine counters by adding or subtracting the counts
of corresponding elements.  Intersection and union return the minimum and
maximum of corresponding counts.  Each operation can accept inputs with signed
counts, but the output will exclude results with counts of zero or less.
276

277 278
    >>> c = Counter(a=3, b=1)
    >>> d = Counter(a=1, b=2)
279
    >>> c + d                       # add two counters together:  c[x] + d[x]
280
    Counter({'a': 4, 'b': 3})
281
    >>> c - d                       # subtract (keeping only positive counts)
282
    Counter({'a': 2})
283
    >>> c & d                       # intersection:  min(c[x], d[x])
284
    Counter({'a': 1, 'b': 1})
285
    >>> c | d                       # union:  max(c[x], d[x])
286 287
    Counter({'a': 3, 'b': 2})

288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314
.. note::

   Counters were primarily designed to work with positive integers to represent
   running counts; however, care was taken to not unnecessarily preclude use
   cases needing other types or negative values.  To help with those use cases,
   this section documents the minimum range and type restrictions.

   * The :class:`Counter` class itself is a dictionary subclass with no
     restrictions on its keys and values.  The values are intended to be numbers
     representing counts, but you *could* store anything in the value field.

   * The :meth:`most_common` method requires only that the values be orderable.

   * For in-place operations such as ``c[key] += 1``, the value type need only
     support addition and subtraction.  So fractions, floats, and decimals would
     work and negative values are supported.  The same is also true for
     :meth:`update` and :meth:`subtract` which allow negative and zero values
     for both inputs and outputs.

   * The multiset methods are designed only for use cases with positive values.
     The inputs may be negative or zero, but only outputs with positive values
     are created.  There are no type restrictions, but the value type needs to
     support support addition, subtraction, and comparison.

   * The :meth:`elements` method requires integer counts.  It ignores zero and
     negative counts.

315
.. seealso::
316

Raymond Hettinger's avatar
Raymond Hettinger committed
317 318 319 320
    * `Counter class <http://code.activestate.com/recipes/576611/>`_
      adapted for Python 2.5 and an early `Bag recipe
      <http://code.activestate.com/recipes/259174/>`_ for Python 2.4.

321 322
    * `Bag class <http://www.gnu.org/software/smalltalk/manual-base/html_node/Bag.html>`_
      in Smalltalk.
323

324
    * Wikipedia entry for `Multisets <http://en.wikipedia.org/wiki/Multiset>`_\.
325

326
    * `C++ multisets <http://www.demo2s.com/Tutorial/Cpp/0380__set-multiset/Catalog0380__set-multiset.htm>`_
Raymond Hettinger's avatar
Raymond Hettinger committed
327
      tutorial with examples.
328

Raymond Hettinger's avatar
Raymond Hettinger committed
329
    * For mathematical operations on multisets and their use cases, see
330 331 332
      *Knuth, Donald. The Art of Computer Programming Volume II,
      Section 4.6.3, Exercise 19*\.

333
    * To enumerate all distinct multisets of a given size over a given set of
334
      elements, see :func:`itertools.combinations_with_replacement`.
335

Raymond Hettinger's avatar
Raymond Hettinger committed
336
          map(Counter, combinations_with_replacement('ABC', 2)) --> AA AB AC BB BC CC
337

338

339 340 341
:class:`deque` objects
----------------------

342
.. class:: deque([iterable[, maxlen]])
343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358

   Returns a new deque object initialized left-to-right (using :meth:`append`) with
   data from *iterable*.  If *iterable* is not specified, the new deque is empty.

   Deques are a generalization of stacks and queues (the name is pronounced "deck"
   and is short for "double-ended queue").  Deques support thread-safe, memory
   efficient appends and pops from either side of the deque with approximately the
   same O(1) performance in either direction.

   Though :class:`list` objects support similar operations, they are optimized for
   fast fixed-length operations and incur O(n) memory movement costs for
   ``pop(0)`` and ``insert(0, v)`` operations which change both the size and
   position of the underlying data representation.

   .. versionadded:: 2.4

359
   If *maxlen* is not specified or is *None*, deques may grow to an
360 361 362 363 364 365 366 367
   arbitrary length.  Otherwise, the deque is bounded to the specified maximum
   length.  Once a bounded length deque is full, when new items are added, a
   corresponding number of items are discarded from the opposite end.  Bounded
   length deques provide functionality similar to the ``tail`` filter in
   Unix. They are also useful for tracking transactions and other pools of data
   where only the most recent activity is of interest.

   .. versionchanged:: 2.6
Georg Brandl's avatar
Georg Brandl committed
368
      Added *maxlen* parameter.
369

370
   Deque objects support the following methods:
371 372


373
   .. method:: append(x)
374

375
      Add *x* to the right side of the deque.
376 377


378
   .. method:: appendleft(x)
379

380
      Add *x* to the left side of the deque.
381 382


383
   .. method:: clear()
384

385
      Remove all elements from the deque leaving it with length 0.
386 387


388 389 390 391 392 393
   .. method:: count(x)

      Count the number of deque elements equal to *x*.

      .. versionadded:: 2.7

394
   .. method:: extend(iterable)
395

396 397
      Extend the right side of the deque by appending elements from the iterable
      argument.
398 399


400
   .. method:: extendleft(iterable)
401

402 403 404
      Extend the left side of the deque by appending elements from *iterable*.
      Note, the series of left appends results in reversing the order of
      elements in the iterable argument.
405 406


407
   .. method:: pop()
408

409 410
      Remove and return an element from the right side of the deque. If no
      elements are present, raises an :exc:`IndexError`.
411 412


413
   .. method:: popleft()
414

415 416
      Remove and return an element from the left side of the deque. If no
      elements are present, raises an :exc:`IndexError`.
417 418


419
   .. method:: remove(value)
420

421 422 423 424
      Removed the first occurrence of *value*.  If not found, raises a
      :exc:`ValueError`.

      .. versionadded:: 2.5
425

426 427 428 429 430
   .. method:: reverse()

      Reverse the elements of the deque in-place and then return ``None``.

      .. versionadded:: 2.7
431

432
   .. method:: rotate(n)
433

434 435 436
      Rotate the deque *n* steps to the right.  If *n* is negative, rotate to
      the left.  Rotating one step to the right is equivalent to:
      ``d.appendleft(d.pop())``.
437 438


439 440 441 442 443 444 445 446 447
   Deque objects also provide one read-only attribute:

   .. attribute:: maxlen

      Maximum size of a deque or *None* if unbounded.

      .. versionadded:: 2.7


448 449
In addition to the above, deques support iteration, pickling, ``len(d)``,
``reversed(d)``, ``copy.copy(d)``, ``copy.deepcopy(d)``, membership testing with
450 451 452
the :keyword:`in` operator, and subscript references such as ``d[-1]``.  Indexed
access is O(1) at both ends but slows to O(n) in the middle.  For fast random
access, use lists instead.
453

454 455 456
Example:

.. doctest::
457 458 459 460

   >>> from collections import deque
   >>> d = deque('ghi')                 # make a new deque with three items
   >>> for elem in d:                   # iterate over the deque's elements
461
   ...     print elem.upper()
462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509
   G
   H
   I

   >>> d.append('j')                    # add a new entry to the right side
   >>> d.appendleft('f')                # add a new entry to the left side
   >>> d                                # show the representation of the deque
   deque(['f', 'g', 'h', 'i', 'j'])

   >>> d.pop()                          # return and remove the rightmost item
   'j'
   >>> d.popleft()                      # return and remove the leftmost item
   'f'
   >>> list(d)                          # list the contents of the deque
   ['g', 'h', 'i']
   >>> d[0]                             # peek at leftmost item
   'g'
   >>> d[-1]                            # peek at rightmost item
   'i'

   >>> list(reversed(d))                # list the contents of a deque in reverse
   ['i', 'h', 'g']
   >>> 'h' in d                         # search the deque
   True
   >>> d.extend('jkl')                  # add multiple elements at once
   >>> d
   deque(['g', 'h', 'i', 'j', 'k', 'l'])
   >>> d.rotate(1)                      # right rotation
   >>> d
   deque(['l', 'g', 'h', 'i', 'j', 'k'])
   >>> d.rotate(-1)                     # left rotation
   >>> d
   deque(['g', 'h', 'i', 'j', 'k', 'l'])

   >>> deque(reversed(d))               # make a new deque in reverse order
   deque(['l', 'k', 'j', 'i', 'h', 'g'])
   >>> d.clear()                        # empty the deque
   >>> d.pop()                          # cannot pop from an empty deque
   Traceback (most recent call last):
     File "<pyshell#6>", line 1, in -toplevel-
       d.pop()
   IndexError: pop from an empty deque

   >>> d.extendleft('abc')              # extendleft() reverses the input order
   >>> d
   deque(['c', 'b', 'a'])


510 511
:class:`deque` Recipes
^^^^^^^^^^^^^^^^^^^^^^
512 513 514

This section shows various approaches to working with deques.

515 516 517 518 519 520 521 522 523 524 525 526 527 528
Bounded length deques provide functionality similar to the ``tail`` filter
in Unix::

   def tail(filename, n=10):
       'Return the last n lines of a file'
       return deque(open(filename), n)

Another approach to using deques is to maintain a sequence of recently
added elements by appending to the right and popping to the left::

    def moving_average(iterable, n=3):
        # moving_average([40, 30, 50, 46, 39, 44]) --> 40.0 42.0 45.0 43.0
        # http://en.wikipedia.org/wiki/Moving_average
        it = iter(iterable)
529 530
        d = deque(itertools.islice(it, n-1))
        d.appendleft(0)
531 532 533 534
        s = sum(d)
        for elem in it:
            s += elem - d.popleft()
            d.append(elem)
535
            yield s / float(n)
536

537
The :meth:`rotate` method provides a way to implement :class:`deque` slicing and
538
deletion.  For example, a pure Python implementation of ``del d[n]`` relies on
539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560
the :meth:`rotate` method to position elements to be popped::

   def delete_nth(d, n):
       d.rotate(-n)
       d.popleft()
       d.rotate(n)

To implement :class:`deque` slicing, use a similar approach applying
:meth:`rotate` to bring a target element to the left side of the deque. Remove
old entries with :meth:`popleft`, add new entries with :meth:`extend`, and then
reverse the rotation.
With minor variations on that approach, it is easy to implement Forth style
stack manipulations such as ``dup``, ``drop``, ``swap``, ``over``, ``pick``,
``rot``, and ``roll``.


:class:`defaultdict` objects
----------------------------

.. class:: defaultdict([default_factory[, ...]])

   Returns a new dictionary-like object.  :class:`defaultdict` is a subclass of the
Georg Brandl's avatar
Georg Brandl committed
561
   built-in :class:`dict` class.  It overrides one method and adds one writable
562 563 564 565 566 567 568 569 570 571
   instance variable.  The remaining functionality is the same as for the
   :class:`dict` class and is not documented here.

   The first argument provides the initial value for the :attr:`default_factory`
   attribute; it defaults to ``None``. All remaining arguments are treated the same
   as if they were passed to the :class:`dict` constructor, including keyword
   arguments.

   .. versionadded:: 2.5

572 573 574
   :class:`defaultdict` objects support the following method in addition to the
   standard :class:`dict` operations:

575

576
   .. method:: defaultdict.__missing__(key)
577

Skip Montanaro's avatar
Skip Montanaro committed
578
      If the :attr:`default_factory` attribute is ``None``, this raises a
579
      :exc:`KeyError` exception with the *key* as argument.
580

581 582 583
      If :attr:`default_factory` is not ``None``, it is called without arguments
      to provide a default value for the given *key*, this value is inserted in
      the dictionary for the *key*, and returned.
584

585 586
      If calling :attr:`default_factory` raises an exception this exception is
      propagated unchanged.
587

588 589 590
      This method is called by the :meth:`__getitem__` method of the
      :class:`dict` class when the requested key is not found; whatever it
      returns or raises is then returned or raised by :meth:`__getitem__`.
591 592


593
   :class:`defaultdict` objects support the following instance variable:
594 595


596
   .. attribute:: defaultdict.default_factory
597

598 599 600
      This attribute is used by the :meth:`__missing__` method; it is
      initialized from the first argument to the constructor, if present, or to
      ``None``, if absent.
601 602 603 604 605 606


:class:`defaultdict` Examples
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Using :class:`list` as the :attr:`default_factory`, it is easy to group a
607
sequence of key-value pairs into a dictionary of lists:
608 609 610 611 612 613 614 615 616 617 618 619 620 621 622

   >>> s = [('yellow', 1), ('blue', 2), ('yellow', 3), ('blue', 4), ('red', 1)]
   >>> d = defaultdict(list)
   >>> for k, v in s:
   ...     d[k].append(v)
   ...
   >>> d.items()
   [('blue', [2, 4]), ('red', [1]), ('yellow', [1, 3])]

When each key is encountered for the first time, it is not already in the
mapping; so an entry is automatically created using the :attr:`default_factory`
function which returns an empty :class:`list`.  The :meth:`list.append`
operation then attaches the value to the new list.  When keys are encountered
again, the look-up proceeds normally (returning the list for that key) and the
:meth:`list.append` operation adds another value to the list. This technique is
623
simpler and faster than an equivalent technique using :meth:`dict.setdefault`:
624 625 626 627 628 629 630 631 632 633

   >>> d = {}
   >>> for k, v in s:
   ...     d.setdefault(k, []).append(v)
   ...
   >>> d.items()
   [('blue', [2, 4]), ('red', [1]), ('yellow', [1, 3])]

Setting the :attr:`default_factory` to :class:`int` makes the
:class:`defaultdict` useful for counting (like a bag or multiset in other
634
languages):
635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650

   >>> s = 'mississippi'
   >>> d = defaultdict(int)
   >>> for k in s:
   ...     d[k] += 1
   ...
   >>> d.items()
   [('i', 4), ('p', 2), ('s', 4), ('m', 1)]

When a letter is first encountered, it is missing from the mapping, so the
:attr:`default_factory` function calls :func:`int` to supply a default count of
zero.  The increment operation then builds up the count for each letter.

The function :func:`int` which always returns zero is just a special case of
constant functions.  A faster and more flexible way to create constant functions
is to use :func:`itertools.repeat` which can supply any constant value (not just
651
zero):
652 653 654 655 656 657 658 659 660

   >>> def constant_factory(value):
   ...     return itertools.repeat(value).next
   >>> d = defaultdict(constant_factory('<missing>'))
   >>> d.update(name='John', action='ran')
   >>> '%(name)s %(action)s to %(object)s' % d
   'John ran to <missing>'

Setting the :attr:`default_factory` to :class:`set` makes the
661
:class:`defaultdict` useful for building a dictionary of sets:
662 663 664 665 666 667 668 669 670 671

   >>> s = [('red', 1), ('blue', 2), ('red', 3), ('blue', 4), ('red', 1), ('blue', 4)]
   >>> d = defaultdict(set)
   >>> for k, v in s:
   ...     d[k].add(v)
   ...
   >>> d.items()
   [('blue', set([2, 4])), ('red', set([1, 3]))]


672
:func:`namedtuple` Factory Function for Tuples with Named Fields
673
----------------------------------------------------------------
674

675 676 677
Named tuples assign meaning to each position in a tuple and allow for more readable,
self-documenting code.  They can be used wherever regular tuples are used, and
they add the ability to access fields by name instead of position index.
678

679
.. function:: namedtuple(typename, field_names, [verbose], [rename])
680 681

   Returns a new tuple subclass named *typename*.  The new subclass is used to
682
   create tuple-like objects that have fields accessible by attribute lookup as
683
   well as being indexable and iterable.  Instances of the subclass also have a
684
   helpful docstring (with typename and field_names) and a helpful :meth:`__repr__`
685 686
   method which lists the tuple contents in a ``name=value`` format.

687 688
   The *field_names* are a single string with each fieldname separated by whitespace
   and/or commas, for example ``'x y'`` or ``'x, y'``.  Alternatively, *field_names*
689
   can be a sequence of strings such as ``['x', 'y']``.
690 691

   Any valid Python identifier may be used for a fieldname except for names
692 693
   starting with an underscore.  Valid identifiers consist of letters, digits,
   and underscores but do not start with a digit or underscore and cannot be
694 695
   a :mod:`keyword` such as *class*, *for*, *return*, *global*, *pass*, *print*,
   or *raise*.
696

697 698
   If *rename* is true, invalid fieldnames are automatically replaced
   with positional names.  For example, ``['abc', 'def', 'ghi', 'abc']`` is
699
   converted to ``['abc', '_1', 'ghi', '_3']``, eliminating the keyword
700 701
   ``def`` and the duplicate fieldname ``abc``.

702
   If *verbose* is true, the class definition is printed just before being built.
703

704
   Named tuple instances do not have per-instance dictionaries, so they are
Raymond Hettinger's avatar
Raymond Hettinger committed
705
   lightweight and require no more memory than regular tuples.
706

707 708
   .. versionadded:: 2.6

709 710 711
   .. versionchanged:: 2.7
      added support for *rename*.

712 713 714 715
Example:

.. doctest::
   :options: +NORMALIZE_WHITESPACE
716

717
   >>> Point = namedtuple('Point', 'x y', verbose=True)
718 719
   class Point(tuple):
           'Point(x, y)'
720
   <BLANKLINE>
721
           __slots__ = ()
722
   <BLANKLINE>
723
           _fields = ('x', 'y')
724
   <BLANKLINE>
725
           def __new__(_cls, x, y):
726
               'Create a new instance of Point(x, y)'
727
               return _tuple.__new__(_cls, (x, y))
728
   <BLANKLINE>
729
           @classmethod
730
           def _make(cls, iterable, new=tuple.__new__, len=len):
731
               'Make a new Point object from a sequence or iterable'
732
               result = new(cls, iterable)
733 734 735
               if len(result) != 2:
                   raise TypeError('Expected 2 arguments, got %d' % len(result))
               return result
736
   <BLANKLINE>
737
           def __repr__(self):
738
               'Return a nicely formatted representation string'
739
               return 'Point(x=%r, y=%r)' % self
740
   <BLANKLINE>
741 742 743
           def _asdict(self):
               'Return a new OrderedDict which maps field names to their values'
               return OrderedDict(zip(self._fields, self))
744
   <BLANKLINE>
745
           def _replace(_self, **kwds):
746
               'Return a new Point object replacing specified fields with new values'
747
               result = _self._make(map(kwds.pop, ('x', 'y'), _self))
748 749 750
               if kwds:
                   raise ValueError('Got unexpected field names: %r' % kwds.keys())
               return result
751 752
   <BLANKLINE>
           def __getnewargs__(self):
753
               'Return self as a plain tuple.   Used by copy and pickle.'
754
               return tuple(self)
755
   <BLANKLINE>
756 757
           x = _property(_itemgetter(0), doc='Alias for field number 0')
           y = _property(_itemgetter(1), doc='Alias for field number 1')
758 759

   >>> p = Point(11, y=22)     # instantiate with positional or keyword arguments
760
   >>> p[0] + p[1]             # indexable like the plain tuple (11, 22)
761 762 763 764
   33
   >>> x, y = p                # unpack like a regular tuple
   >>> x, y
   (11, 22)
765
   >>> p.x + p.y               # fields also accessible by name
766 767 768 769 770 771 772
   33
   >>> p                       # readable __repr__ with a name=value style
   Point(x=11, y=22)

Named tuples are especially useful for assigning field names to result tuples returned
by the :mod:`csv` or :mod:`sqlite3` modules::

773
   EmployeeRecord = namedtuple('EmployeeRecord', 'name, age, title, department, paygrade')
774

775
   import csv
776
   for emp in map(EmployeeRecord._make, csv.reader(open("employees.csv", "rb"))):
777 778
       print emp.name, emp.title

779 780 781 782
   import sqlite3
   conn = sqlite3.connect('/companydata')
   cursor = conn.cursor()
   cursor.execute('SELECT name, age, title, department, paygrade FROM employees')
783
   for emp in map(EmployeeRecord._make, cursor.fetchall()):
784 785
       print emp.name, emp.title

786
In addition to the methods inherited from tuples, named tuples support
787 788
three additional methods and one attribute.  To prevent conflicts with
field names, the method and attribute names start with an underscore.
789

790
.. method:: somenamedtuple._make(iterable)
791

792
   Class method that makes a new instance from an existing sequence or iterable.
793

794
   .. doctest::
795

796 797 798
      >>> t = [11, 22]
      >>> Point._make(t)
      Point(x=11, y=22)
799

800
.. method:: somenamedtuple._asdict()
801

802 803
   Return a new :class:`OrderedDict` which maps field names to their corresponding
   values::
804

805
      >>> p._asdict()
806 807
      OrderedDict([('x', 11), ('y', 22)])

Raymond Hettinger's avatar
Raymond Hettinger committed
808
   .. versionchanged:: 2.7
809
      Returns an :class:`OrderedDict` instead of a regular :class:`dict`.
810

811
.. method:: somenamedtuple._replace(kwargs)
812

813
   Return a new instance of the named tuple replacing specified fields with new
814
   values::
815

816
      >>> p = Point(x=11, y=22)
817
      >>> p._replace(x=33)
818 819
      Point(x=33, y=22)

820
      >>> for partnum, record in inventory.items():
821
      ...     inventory[partnum] = record._replace(price=newprices[partnum], timestamp=time.now())
822

823
.. attribute:: somenamedtuple._fields
824

Raymond Hettinger's avatar
Raymond Hettinger committed
825
   Tuple of strings listing the field names.  Useful for introspection
826
   and for creating new named tuple types from existing named tuples.
Raymond Hettinger's avatar
Raymond Hettinger committed
827

828
   .. doctest::
829

830
      >>> p._fields            # view the field names
831 832
      ('x', 'y')

833
      >>> Color = namedtuple('Color', 'red green blue')
834
      >>> Pixel = namedtuple('Pixel', Point._fields + Color._fields)
835
      >>> Pixel(11, 22, 128, 255, 0)
Raymond Hettinger's avatar
Raymond Hettinger committed
836
      Pixel(x=11, y=22, red=128, green=255, blue=0)
837

Raymond Hettinger's avatar
Raymond Hettinger committed
838
To retrieve a field whose name is stored in a string, use the :func:`getattr`
839
function:
Raymond Hettinger's avatar
Raymond Hettinger committed
840 841 842 843

    >>> getattr(p, 'x')
    11

844 845
To convert a dictionary to a named tuple, use the double-star-operator
(as described in :ref:`tut-unpacking-arguments`):
846 847 848 849 850

   >>> d = {'x': 11, 'y': 22}
   >>> Point(**d)
   Point(x=11, y=22)

851
Since a named tuple is a regular Python class, it is easy to add or change
852
functionality with a subclass.  Here is how to add a calculated field and
853
a fixed-width print format:
854

855
    >>> class Point(namedtuple('Point', 'x y')):
856
    ...     __slots__ = ()
857 858 859 860
    ...     @property
    ...     def hypot(self):
    ...         return (self.x ** 2 + self.y ** 2) ** 0.5
    ...     def __str__(self):
861
    ...         return 'Point: x=%6.3f  y=%6.3f  hypot=%6.3f' % (self.x, self.y, self.hypot)
862

863
    >>> for p in Point(3, 4), Point(14, 5/7.):
864
    ...     print p
865 866
    Point: x= 3.000  y= 4.000  hypot= 5.000
    Point: x=14.000  y= 0.714  hypot=14.018
867

Georg Brandl's avatar
Georg Brandl committed
868
The subclass shown above sets ``__slots__`` to an empty tuple.  This helps
Raymond Hettinger's avatar
Raymond Hettinger committed
869
keep memory requirements low by preventing the creation of instance dictionaries.
870

871
Subclassing is not useful for adding new, stored fields.  Instead, simply
872
create a new named tuple type from the :attr:`_fields` attribute:
873

874
    >>> Point3D = namedtuple('Point3D', Point._fields + ('z',))
875

Raymond Hettinger's avatar
Raymond Hettinger committed
876
Default values can be implemented by using :meth:`_replace` to
877
customize a prototype instance:
878 879

    >>> Account = namedtuple('Account', 'owner balance transaction_count')
880 881
    >>> default_account = Account('<owner name>', 0.0, 0)
    >>> johns_account = default_account._replace(owner='John')
882

883 884 885 886 887 888 889 890 891
Enumerated constants can be implemented with named tuples, but it is simpler
and more efficient to use a simple class declaration:

    >>> Status = namedtuple('Status', 'open pending closed')._make(range(3))
    >>> Status.open, Status.pending, Status.closed
    (0, 1, 2)
    >>> class Status:
    ...     open, pending, closed = range(3)

892
.. seealso::
893

894 895
   `Named tuple recipe <http://code.activestate.com/recipes/500261/>`_
   adapted for Python 2.4.
896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914


:class:`OrderedDict` objects
----------------------------

Ordered dictionaries are just like regular dictionaries but they remember the
order that items were inserted.  When iterating over an ordered dictionary,
the items are returned in the order their keys were first added.

.. class:: OrderedDict([items])

   Return an instance of a dict subclass, supporting the usual :class:`dict`
   methods.  An *OrderedDict* is a dict that remembers the order that keys
   were first inserted. If a new entry overwrites an existing entry, the
   original insertion position is left unchanged.  Deleting an entry and
   reinserting it will move it to the end.

   .. versionadded:: 2.7

915 916 917 918 919
.. method:: OrderedDict.popitem(last=True)

   The :meth:`popitem` method for ordered dictionaries returns and removes
   a (key, value) pair.  The pairs are returned in LIFO order if *last* is
   true or FIFO order if false.
920

921 922 923
In addition to the usual mapping methods, ordered dictionaries also support
reverse iteration using :func:`reversed`.

924 925 926 927 928 929
Equality tests between :class:`OrderedDict` objects are order-sensitive
and are implemented as ``list(od1.items())==list(od2.items())``.
Equality tests between :class:`OrderedDict` objects and other
:class:`Mapping` objects are order-insensitive like regular dictionaries.
This allows :class:`OrderedDict` objects to be substituted anywhere a
regular dictionary is used.
930

931 932 933 934
The :class:`OrderedDict` constructor and :meth:`update` method both accept
keyword arguments, but their order is lost because Python's function call
semantics pass-in keyword arguments using a regular unordered dictionary.

935 936 937 938
.. seealso::

   `Equivalent OrderedDict recipe <http://code.activestate.com/recipes/576693/>`_
   that runs on Python 2.4 or later.
939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960

Since an ordered dictionary remembers its insertion order, it can be used
in conjuction with sorting to make a sorted dictionary::

    >>> # regular unsorted dictionary
    >>> d = {'banana': 3, 'apple':4, 'pear': 1, 'orange': 2}

    >>> # dictionary sorted by key
    >>> OrderedDict(sorted(d.items(), key=lambda t: t[0]))
    OrderedDict([('apple', 4), ('banana', 3), ('orange', 2), ('pear', 1)])

    >>> # dictionary sorted by value
    >>> OrderedDict(sorted(d.items(), key=lambda t: t[1]))
    OrderedDict([('pear', 1), ('orange', 2), ('banana', 3), ('apple', 4)])

    >>> # dictionary sorted by length of the key string
    >>> OrderedDict(sorted(d.items(), key=lambda t: len(t[0])))
    OrderedDict([('pear', 1), ('apple', 4), ('orange', 2), ('banana', 3)])

The new sorted dictionaries maintain their sort order when entries
are deleted.  But when new keys are added, the keys are appended
to the end and the sort is not maintained.