1. 16 Agu, 2001 1 kayıt (commit)
  2. 10 Agu, 2001 1 kayıt (commit)
  3. 02 Agu, 2001 1 kayıt (commit)
  4. 26 Haz, 2001 1 kayıt (commit)
    • Barry Warsaw's avatar
      dict_update(): Generalize this method so {}.update() accepts any · 66a0d1d9
      Barry Warsaw yazdı
      "mapping" object, specifically one that supports PyMapping_Keys() and
      PyObject_GetItem().  This allows you to say e.g. {}.update(UserDict())
      
      We keep the special case for concrete dict objects, although that
      seems moderately questionable.  OTOH, the code exists and works, so
      why change that?
      
      .update()'s docstring already claims that D.update(E) implies calling
      E.keys() so it's appropriate not to transform AttributeErrors in
      PyMapping_Keys() to TypeErrors.
      
      Patch eyeballed by Tim.
      66a0d1d9
  5. 16 Haz, 2001 2 kayıt (commit)
  6. 04 Haz, 2001 1 kayıt (commit)
  7. 03 Haz, 2001 2 kayıt (commit)
  8. 02 Haz, 2001 4 kayıt (commit)
  9. 27 May, 2001 1 kayıt (commit)
    • Tim Peters's avatar
      Implement an old idea of Christian Tismer's: use polynomial division · 15d4929a
      Tim Peters yazdı
      instead of multiplication to generate the probe sequence.  The idea is
      recorded in Python-Dev for Dec 2000, but that version is prone to rare
      infinite loops.
      
      The value is in getting *all* the bits of the hash code to participate;
      and, e.g., this speeds up querying every key in a dict with keys
       [i << 16 for i in range(20000)] by a factor of 500.  Should be equally
      valuable in any bad case where the high-order hash bits were getting
      ignored.
      
      Also wrote up some of the motivations behind Python's ever-more-subtle
      hash table strategy.
      15d4929a
  10. 24 May, 2001 2 kayıt (commit)
  11. 23 May, 2001 1 kayıt (commit)
    • Tim Peters's avatar
      Jack Jansen hit a bug in the new dict code, reported on python-dev. · 0c6010be
      Tim Peters yazdı
      dictresize() was too aggressive about never ever resizing small dicts.
      If a small dict is entirely full, it needs to rebuild it despite that
      it won't actually resize it, in order to purge old dummy entries thus
      creating at least one virgin slot (lookdict assumes at least one such
      exists).
      
      Also took the opportunity to add some high-level comments to dictresize.
      0c6010be
  12. 22 May, 2001 2 kayıt (commit)
    • Fred Drake's avatar
      Remove unused variable. · 0c23231f
      Fred Drake yazdı
      0c23231f
    • Tim Peters's avatar
      SF patch #425242: Patch which "inlines" small dictionaries. · dea48ec5
      Tim Peters yazdı
      The idea is Marc-Andre Lemburg's, the implementation is Tim's.
      Add a new ma_smalltable member to dictobjects, an embedded vector of
      MINSIZE (8) dictentry structs.  Short course is that this lets us avoid
      additional malloc(s) for dicts with no more than 5 entries.
      
      The changes are widespread but mostly small.
      
      Long course:  WRT speed, all scalar operations (getitem, setitem, delitem)
      on non-empty dicts benefit from no longer needing NULL-pointer checks
      (ma_table is never NULL anymore).  Bulk operations (copy, update, resize,
      clearing slots during dealloc) benefit in some cases from now looping
      on the ma_fill count rather than on ma_size, but that was an unexpected
      benefit:  the original reason to loop on ma_fill was to let bulk
      operations on empty dicts end quickly (since the NULL-pointer checks
      went away, empty dicts aren't special-cased any more).
      
      Special considerations:
      
      For dicts that remain empty, this change is a lose on two counts:
      the dict object contains 8 new dictentry slots now that weren't
      needed before, and dict object creation also spends time memset'ing
      these doomed-to-be-unsused slots to NULLs.
      
      For dicts with one or two entries that never get larger than 2, it's
      a mix:  a malloc()/free() pair is no longer needed, and the 2-entry case
      gets to use 8 slots (instead of 4) thus decreasing the chance of
      collision.  Against that, dict object creation spends time memset'ing
      4 slots that aren't strictly needed in this case.
      
      For dicts with 3 through 5 entries that never get larger than 5, it's a
      pure win:  the dict is created with all the space they need, and they
      never need to resize.  Before they suffered two malloc()/free() calls,
      plus 1 dict resize, to get enough space.  In addition, the 8-slot
      table they ended with consumed more memory overall, because of the
      hidden overhead due to the additional malloc.
      
      For dicts with 6 or more entries, the ma_smalltable member is wasted
      space, but then these are large(r) dicts so 8 slots more or less doesn't
      make much difference.  They still benefit all the time from removing
      ubiquitous dynamic null-pointer checks, and get a small benefit (but
      relatively smaller the larger the dict) from not having to do two
      mallocs, two frees, and a resize on the way *to* getting their sixth
      entry.
      
      All in all it appears a small but definite general win, with larger
      benefits in specific cases.  It's especially nice that it allowed to
      get rid of several branches, gotos and labels, and overall made the
      code smaller.
      dea48ec5
  13. 19 May, 2001 1 kayıt (commit)
    • Tim Peters's avatar
      Bugfix candidate. · 91a364df
      Tim Peters yazdı
      Two exceedingly unlikely errors in dictresize():
      1. The loop for finding the new size had an off-by-one error at the
         end (could over-index the polys[] vector).
      2. The polys[] vector ended with a 0, apparently intended as a sentinel
         value but never used as such; i.e., it was never checked, so 0 could
         have been used *as* a polynomial.
      Neither bug could trigger unless a dict grew to 2**30 slots; since that
      would consume at least 12GB of memory just to hold the dict pointers,
      I'm betting it's not the cause of the bug Fred's tracking down <wink>.
      91a364df
  14. 17 May, 2001 1 kayıt (commit)
    • Tim Peters's avatar
      Speed dictresize by collapsing its two passes into one; the reason given · 1928314e
      Tim Peters yazdı
      in the comments for using two passes was bogus, as the only object that
      can get decref'ed due to the copy is the dummy key, and decref'ing dummy
      can't have side effects (for one thing, dummy is immortal!  for another,
      it's a string object, not a potentially dangerous user-defined object).
      1928314e
  15. 13 May, 2001 2 kayıt (commit)
    • Tim Peters's avatar
      Aggressive reordering of dict comparisons. In case of collision, it stands · 342c65e1
      Tim Peters yazdı
      to reason that me_key is much more likely to match the key we're looking
      for than to match dummy, and if the key is absent me_key is much more
      likely to be NULL than dummy:  most dicts don't even have a dummy entry.
      Running instrumented dict code over the test suite and some apps confirmed
      that matching dummy was 200-300x less frequent than matching key in
      practice.  So this reorders the tests to try the common case first.
      It can lose if a large dict with many collisions is mostly deleted, not
      resized, and then frequently searched, but that's hardly a case we
      should be favoring.
      342c65e1
    • Tim Peters's avatar
      Get rid of the superstitious "~" in dict hashing's "i = (~hash) & mask". · 2f228e75
      Tim Peters yazdı
      The comment following used to say:
      	/* We use ~hash instead of hash, as degenerate hash functions, such
      	   as for ints <sigh>, can have lots of leading zeros. It's not
      	   really a performance risk, but better safe than sorry.
      	   12-Dec-00 tim:  so ~hash produces lots of leading ones instead --
      	   what's the gain? */
      That is, there was never a good reason for doing it.  And to the contrary,
      as explained on Python-Dev last December, it tended to make the *sum*
      (i + incr) & mask (which is the first table index examined in case of
      collison) the same "too often" across distinct hashes.
      
      Changing to the simpler "i = hash & mask" reduced the number of string-dict
      collisions (== # number of times we go around the lookup for-loop) from about
      6 million to 5 million during a full run of the test suite (these are
      approximate because the test suite does some random stuff from run to run).
      The number of collisions in non-string dicts also decreased, but not as
      dramatically.
      
      Note that this may, for a given dict, change the order (wrt previous
      releases) of entries exposed by .keys(), .values() and .items().  A number
      of std tests suffered bogus failures as a result.  For dicts keyed by
      small ints, or (less so) by characters, the order is much more likely to be
      in increasing order of key now; e.g.,
      
      >>> d = {}
      >>> for i in range(10):
      ...    d[i] = i
      ...
      >>> d
      {0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7, 8: 8, 9: 9}
      >>>
      
      Unfortunately. people may latch on to that in small examples and draw a
      bogus conclusion.
      
      test_support.py
          Moved test_extcall's sortdict() into test_support, made it stronger,
          and imported sortdict into other std tests that needed it.
      test_unicode.py
          Excluced cp875 from the "roundtrip over range(128)" test, because
          cp875 doesn't have a well-defined inverse for unicode("?", "cp875").
          See Python-Dev for excruciating details.
      Cookie.py
          Chaged various output functions to sort dicts before building
          strings from them.
      test_extcall
          Fiddled the expected-result file.  This remains sensitive to native
          dict ordering, because, e.g., if there are multiple errors in a
          keyword-arg dict (and test_extcall sets up many cases like that), the
          specific error Python complains about first depends on native dict
          ordering.
      2f228e75
  16. 10 May, 2001 3 kayıt (commit)
    • Tim Peters's avatar
      Restore dicts' tp_compare slot, and change dict_richcompare to say it · 4fa58bfa
      Tim Peters yazdı
      doesn't know how to do LE, LT, GE, GT.  dict_richcompare can't do the
      latter any faster than dict_compare can.  More importantly, for
      cmp(dict1, dict2), Python *first* tries rich compares with EQ, LT, and
      GT one at a time, even if the tp_compare slot is defined, and
      dict_richcompare called dict_compare for the latter two because
      it couldn't do them itself.  The result was a lot of wasted calls to
      dict_compare.  Now dict_richcompare gives up at once the times Python
      calls it with LT and GT from try_rich_to_3way_compare(), and dict_compare
      is called only once (when Python gets around to trying the tp_compare
      slot).
      Continued mystery:  despite that this cut the number of calls to
      dict_compare approximately in half in test_mutants.py, the latter still
      runs amazingly slowly.  Running under the debugger doesn't show excessive
      activity in the dict comparison code anymore, so I'm guessing the culprit
      is somewhere else -- but where?  Perhaps in the element (key/value)
      comparison code?  We clearly spend a lot of time figuring out how to
      compare things.
      4fa58bfa
    • Tim Peters's avatar
      Repair typo in comment. · 3918fb25
      Tim Peters yazdı
      3918fb25
    • Tim Peters's avatar
      SF bug #422121 Insecurities in dict comparison. · 95bf9390
      Tim Peters yazdı
      Fixed a half dozen ways in which general dict comparison could crash
      Python (even cause Win98SE to reboot) in the presence of kay and/or
      value comparison routines that mutate the dict during dict comparison.
      Bugfix candidate.
      95bf9390
  17. 08 May, 2001 1 kayıt (commit)
  18. 02 May, 2001 1 kayıt (commit)
  19. 01 May, 2001 1 kayıt (commit)
  20. 23 Nis, 2001 1 kayıt (commit)
    • Guido van Rossum's avatar
      Mondo changes to the iterator stuff, without changing how Python code · 213c7a6a
      Guido van Rossum yazdı
      sees it (test_iter.py is unchanged).
      
      - Added a tp_iternext slot, which calls the iterator's next() method;
        this is much faster for built-in iterators over built-in types
        such as lists and dicts, speeding up pybench's ForLoop with about
        25% compared to Python 2.1.  (Now there's a good argument for
        iterators. ;-)
      
      - Renamed the built-in sequence iterator SeqIter, affecting the C API
        functions for it.  (This frees up the PyIter prefix for generic
        iterator operations.)
      
      - Added PyIter_Check(obj), which checks that obj's type has a
        tp_iternext slot and that the proper feature flag is set.
      
      - Added PyIter_Next(obj) which calls the tp_iternext slot.  It has a
        somewhat complex return condition due to the need for speed: when it
        returns NULL, it may not have set an exception condition, meaning
        the iterator is exhausted; when the exception StopIteration is set
        (or a derived exception class), it means the same thing; any other
        exception means some other error occurred.
      213c7a6a
  21. 20 Nis, 2001 3 kayıt (commit)
    • Guido van Rossum's avatar
      Iterators phase 1. This comprises: · 59d1d2b4
      Guido van Rossum yazdı
      new slot tp_iter in type object, plus new flag Py_TPFLAGS_HAVE_ITER
      new C API PyObject_GetIter(), calls tp_iter
      new builtin iter(), with two forms: iter(obj), and iter(function, sentinel)
      new internal object types iterobject and calliterobject
      new exception StopIteration
      new opcodes for "for" loops, GET_ITER and FOR_ITER (also supported by dis.py)
      new magic number for .pyc files
      new special method for instances: __iter__() returns an iterator
      iteration over dictionaries: "for x in dict" iterates over the keys
      iteration over files: "for x in file" iterates over lines
      
      TODO:
      
      documentation
      test suite
      decide whether to use a different way to spell iter(function, sentinal)
      decide whether "for key in dict" is a good idea
      use iterators in map/filter/reduce, min/max, and elsewhere (in/not in?)
      speed tuning (make next() a slot tp_next???)
      59d1d2b4
    • Guido van Rossum's avatar
    • Guido van Rossum's avatar
      Implement, test and document "key in dict" and "key not in dict". · 0dbb4fba
      Guido van Rossum yazdı
      I know some people don't like this -- if it's really controversial,
      I'll take it out again.  (If it's only Alex Martelli who doesn't like
      it, that doesn't count as "real controversial" though. :-)
      
      That's why this is a separate checkin from the iterators stuff I'm
      about to check in next.
      0dbb4fba
  22. 16 Nis, 2001 1 kayıt (commit)
    • Guido van Rossum's avatar
      Tim pointed out a remaining vulnerability in popitem(): the · e04eaec5
      Guido van Rossum yazdı
      PyTuple_New() could *conceivably* clear the dict, so move the test for
      an empty dict after the tuple allocation.  It means that we waste time
      allocating and deallocating a 2-tuple when the dict is empty, but who
      cares.  It also means that when the dict is empty *and* there's no
      memory to allocate a 2-tuple, we raise MemoryError, not KeyError --
      but that may actually a good idea: if there's no room for a lousy
      2-tuple, what are the chances that there's room for a KeyError
      instance?
      e04eaec5
  23. 15 Nis, 2001 1 kayıt (commit)
    • Guido van Rossum's avatar
      Tentative fix for a problem that Tim discovered at the last moment, · a4dd0112
      Guido van Rossum yazdı
      and reported to python-dev: because we were calling dict_resize() in
      PyDict_Next(), and because GC's dict_traverse() uses PyDict_Next(),
      and because PyTuple_New() can cause GC, and because dict_items() calls
      PyTuple_New(), it was possible for dict_items() to have the dict
      resized right under its nose.
      
      The solution is convoluted, and touches several places: keys(),
      values(), items(), popitem(), PyDict_Next(), and PyDict_SetItem().
      
      There are two parts to it. First, we no longer call dict_resize() in
      PyDict_Next(), which seems to solve the immediate problem.  But then
      PyDict_SetItem() must have a different policy about when *it* calls
      dict_resize(), because we want to guarantee (e.g. for an algorithm
      that Jeremy uses in the compiler) that you can loop over a dict using
      PyDict_Next() and make changes to the dict as long as those changes
      are only value replacements for existing keys using PyDict_SetItem().
      This is done by resizing *after* the insertion instead of before, and
      by remembering the size before we insert the item, and if the size is
      still the same, we don't bother to even check if we might need to
      resize.  An additional detail is that if the dict starts out empty, we
      must still resize it before the insertion.
      
      That was the first part. :-)
      
      The second part is to make keys(), values(), items(), and popitem()
      safe against side effects on the dict caused by allocations, under the
      assumption that if the GC can cause arbitrary Python code to run, it
      can cause other threads to run, and it's not inconceivable that our
      dict could be resized -- it would be insane to write code that relies
      on this, but not all code is sane.
      
      Now, I have this nagging feeling that the loops in lookdict probably
      are blissfully assuming that doing a simple key comparison does not
      change the dict's size.  This is not necessarily true (the keys could
      be class instances after all).  But that's a battle for another day.
      a4dd0112
  24. 21 Mar, 2001 1 kayıt (commit)
  25. 18 Ock, 2001 1 kayıt (commit)
    • Guido van Rossum's avatar
      Rich comparisons: · b932420c
      Guido van Rossum yazdı
      - Use PyObject_RichCompareBool() when comparing keys; this makes the
        error handling cleaner.
      
      - There were two implementations for dictionary comparison, an old one
        (#ifdef'ed out) and a new one.  Got rid of the old one, which was
        abandoned years ago.
      
      - In the characterize() function, part of dictionary comparison, use
        PyObject_RichCompareBool() to compare keys and values instead.  But
        continue to use PyObject_Compare() for comparing the final
        (deciding) elements.
      
      - Align the comments in the type struct initializer.
      
      Note: I don't implement rich comparison for dictionaries -- there
      doesn't seem to be much to be gained.  (The existing comparison
      already decides that shorter dicts are always smaller than longer
      dicts.)
      b932420c
  26. 03 Ock, 2001 1 kayıt (commit)
  27. 13 Ara, 2000 2 kayıt (commit)