Kayıtlar (commit) · 705f0f5a91fec6f4edaa2d8fd4faf17d75c4700c · Batuhan Osman TASKAYA / cpython

16 Agu, 2001 1 kayıt (commit)
- Patch #427190: Implement and use METH_NOARGS and METH_O. · e3eb1f2b
  Martin v. Löwis 23 years ago yazdı
  
  e3eb1f2b
10 Agu, 2001 1 kayıt (commit)

Add PyDict_Merge(a, b, override): · 05ac6de2

PyDict_Merge(a, b, 1) is the same as PyDict_Update(a, b).
PyDict_Merge(a, b, 0) does something similar but leaves existing items
unchanged.

05ac6de2

02 Agu, 2001 1 kayıt (commit)
- Merge of descr-branch back into trunk. · 6d6c1a35
  Tim Peters 23 years ago yazdı
  
  6d6c1a35
26 Haz, 2001 1 kayıt (commit)

dict_update(): Generalize this method so {}.update() accepts any · 66a0d1d9

Barry Warsaw 23 years ago yazdı

"mapping" object, specifically one that supports PyMapping_Keys() and
PyObject_GetItem().  This allows you to say e.g. {}.update(UserDict())

We keep the special case for concrete dict objects, although that
seems moderately questionable.  OTOH, the code exists and works, so
why change that?

.update()'s docstring already claims that D.update(E) implies calling
E.keys() so it's appropriate not to transform AttributeErrors in
PyMapping_Keys() to TypeErrors.

Patch eyeballed by Tim.

66a0d1d9

16 Haz, 2001 2 kayıt (commit)

dict_repr: Reuse one of the int vars (minor code simplification). · c6057841
Tim Peters 23 years ago yazdı

c6057841

SF bug 433228: repr(list) woes when len(list) big. · a7259597

Tim Peters 23 years ago yazdı

Gave Python linear-time repr() implementations for dicts, lists, strings.
This means, e.g., that repr(range(50000)) is no longer 50x slower than
pprint.pprint() in 2.2 <wink>.

I don't consider this a bugfix candidate, as it's a performance boost.

Added _PyString_Join() to the internal string API.  If we want that in the
public API, fine, but then it requires runtime error checks instead of
asserts.

a7259597

04 Haz, 2001 1 kayıt (commit)

Store the mask instead of the size in dictobjects. The mask is more · afb6ae84

Tim Peters 23 years ago yazdı

frequently used, and in particular this allows to drop the last
remaining obvious time-waster in the crucial lookdict() and
lookdict_string() functions.  Other changes consist mostly of changing
"i < ma_size" to "i <= ma_mask" everywhere.

afb6ae84

03 Haz, 2001 2 kayıt (commit)

lookdict: stop more insane core-dump mutating comparison cases. Should · 453163d8

Tim Peters 23 years ago yazdı

be possible to provoke unbounded recursion now, but leaving that to someone
else to provoke and repair.
Bugfix candidate -- although this is getting harder to backstitch, and the
cases it's protecting against are mondo contrived.

453163d8

lookdict: Reduce obfuscating code duplication with a judicious goto. · 7b5d0afb

Tim Peters 23 years ago yazdı

This code is likely to get even hairier to squash core dumps due to
mutating comparisons, and it's hard enough to follow without that.

7b5d0afb

02 Haz, 2001 4 kayıt (commit)
- Finish the dict->string coredump fix. Need sleep. · 19b77cfc
  Tim Peters 23 years ago yazdı
```
Bugfix candidate.
```
  19b77cfc
- Coredumpers from Michael Hudson, mutating dicts while printing or · 23cf6be2
  Tim Peters 23 years ago yazdı
```
converting to string.
Critical bugfix candidate -- if you take this seriously <wink>.
```
  23cf6be2
- dict_popitem(): Repaired last-second 2.1 comment, which misidentified the · f4b33f61
  Tim Peters 23 years ago yazdı
```
true reason for allocating the tuple before checking the dict size.
```
  f4b33f61
- New collision resolution scheme: no polynomials, simpler, faster, less · eb28ef20
  Tim Peters 23 years ago yazdı
```
code, less memory.  Tests have uncovered no drawbacks.  Christian and
Vladimir are the other two people who have burned many brain cells on the
dict code in recent years, and they like the approach too, so I'm checking
it in without further ado.
```
  eb28ef20
27 May, 2001 1 kayıt (commit)

Implement an old idea of Christian Tismer's: use polynomial division · 15d4929a

Tim Peters 23 years ago yazdı

instead of multiplication to generate the probe sequence.  The idea is
recorded in Python-Dev for Dec 2000, but that version is prone to rare
infinite loops.

The value is in getting *all* the bits of the hash code to participate;
and, e.g., this speeds up querying every key in a dict with keys
 [i << 16 for i in range(20000)] by a factor of 500.  Should be equally
valuable in any bad case where the high-order hash bits were getting
ignored.

Also wrote up some of the motivations behind Python's ever-more-subtle
hash table strategy.

15d4929a

24 May, 2001 2 kayıt (commit)
- Patch #424335: Implement string_richcompare, remove string_compare. · cd35306a
  Martin v. Löwis 23 years ago yazdı
```
               Use new _PyString_Eq in lookdict_string.
```
  cd35306a
- dictresize(): Rebuild small tables if there are any dummies, not just if · f8a548c2
  Tim Peters 23 years ago yazdı
```
they're entirely full.  Not a question of correctness, but of temporarily
misplaced common sense.
```
  f8a548c2
23 May, 2001 1 kayıt (commit)

Jack Jansen hit a bug in the new dict code, reported on python-dev. · 0c6010be

Tim Peters 23 years ago yazdı

dictresize() was too aggressive about never ever resizing small dicts.
If a small dict is entirely full, it needs to rebuild it despite that
it won't actually resize it, in order to purge old dummy entries thus
creating at least one virgin slot (lookdict assumes at least one such
exists).

Also took the opportunity to add some high-level comments to dictresize.

0c6010be

22 May, 2001 2 kayıt (commit)

Remove unused variable. · 0c23231f
Fred Drake 23 years ago yazdı

0c23231f

SF patch #425242: Patch which "inlines" small dictionaries. · dea48ec5

Tim Peters 23 years ago yazdı

The idea is Marc-Andre Lemburg's, the implementation is Tim's.
Add a new ma_smalltable member to dictobjects, an embedded vector of
MINSIZE (8) dictentry structs. Short course is that this lets us avoid
additional malloc(s) for dicts with no more than 5 entries.

The changes are widespread but mostly small.

Long course: WRT speed, all scalar operations (getitem, setitem, delitem)
on non-empty dicts benefit from no longer needing NULL-pointer checks
(ma_table is never NULL anymore). Bulk operations (copy, update, resize,
clearing slots during dealloc) benefit in some cases from now looping
on the ma_fill count rather than on ma_size, but that was an unexpected
benefit: the original reason to loop on ma_fill was to let bulk
operations on empty dicts end quickly (since the NULL-pointer checks
went away, empty dicts aren't special-cased any more).

Special considerations:

For dicts that remain empty, this change is a lose on two counts:
the dict object contains 8 new dictentry slots now that weren't
needed before, and dict object creation also spends time memset'ing
these doomed-to-be-unsused slots to NULLs.

For dicts with one or two entries that never get larger than 2, it's
a mix: a malloc()/free() pair is no longer needed, and the 2-entry case
gets to use 8 slots (instead of 4) thus decreasing the chance of
collision. Against that, dict object creation spends time memset'ing
4 slots that aren't strictly needed in this case.

For dicts with 3 through 5 entries that never get larger than 5, it's a
pure win: the dict is created with all the space they need, and they
never need to resize. Before they suffered two malloc()/free() calls,
plus 1 dict resize, to get enough space. In addition, the 8-slot
table they ended with consumed more memory overall, because of the
hidden overhead due to the additional malloc.

For dicts with 6 or more entries, the ma_smalltable member is wasted
space, but then these are large(r) dicts so 8 slots more or less doesn't
make much difference. They still benefit all the time from removing
ubiquitous dynamic null-pointer checks, and get a small benefit (but
relatively smaller the larger the dict) from not having to do two
mallocs, two frees, and a resize on the way *to* getting their sixth
entry.

All in all it appears a small but definite general win, with larger
benefits in specific cases. It's especially nice that it allowed to
get rid of several branches, gotos and labels, and overall made the
code smaller.

dea48ec5

19 May, 2001 1 kayıt (commit)

Bugfix candidate. · 91a364df

Tim Peters 23 years ago yazdı

Two exceedingly unlikely errors in dictresize():
1. The loop for finding the new size had an off-by-one error at the
   end (could over-index the polys[] vector).
2. The polys[] vector ended with a 0, apparently intended as a sentinel
   value but never used as such; i.e., it was never checked, so 0 could
   have been used *as* a polynomial.
Neither bug could trigger unless a dict grew to 2**30 slots; since that
would consume at least 12GB of memory just to hold the dict pointers,
I'm betting it's not the cause of the bug Fred's tracking down <wink>.

91a364df

17 May, 2001 1 kayıt (commit)

Speed dictresize by collapsing its two passes into one; the reason given · 1928314e

Tim Peters 23 years ago yazdı

in the comments for using two passes was bogus, as the only object that
can get decref'ed due to the copy is the dummy key, and decref'ing dummy
can't have side effects (for one thing, dummy is immortal! for another,
it's a string object, not a potentially dangerous user-defined object).

1928314e

13 May, 2001 2 kayıt (commit)

Aggressive reordering of dict comparisons. In case of collision, it stands · 342c65e1

Tim Peters 23 years ago yazdı

to reason that me_key is much more likely to match the key we're looking
for than to match dummy, and if the key is absent me_key is much more
likely to be NULL than dummy: most dicts don't even have a dummy entry.
Running instrumented dict code over the test suite and some apps confirmed
that matching dummy was 200-300x less frequent than matching key in
practice. So this reorders the tests to try the common case first.
It can lose if a large dict with many collisions is mostly deleted, not
resized, and then frequently searched, but that's hardly a case we
should be favoring.

342c65e1

Get rid of the superstitious "~" in dict hashing's "i = (~hash) & mask". · 2f228e75

Tim Peters 23 years ago yazdı

The comment following used to say:
	/* We use ~hash instead of hash, as degenerate hash functions, such
	   as for ints <sigh>, can have lots of leading zeros. It's not
	   really a performance risk, but better safe than sorry.
	   12-Dec-00 tim:  so ~hash produces lots of leading ones instead --
	   what's the gain? */
That is, there was never a good reason for doing it.  And to the contrary,
as explained on Python-Dev last December, it tended to make the *sum*
(i + incr) & mask (which is the first table index examined in case of
collison) the same "too often" across distinct hashes.

Changing to the simpler "i = hash & mask" reduced the number of string-dict
collisions (== # number of times we go around the lookup for-loop) from about
6 million to 5 million during a full run of the test suite (these are
approximate because the test suite does some random stuff from run to run).
The number of collisions in non-string dicts also decreased, but not as
dramatically.

Note that this may, for a given dict, change the order (wrt previous
releases) of entries exposed by .keys(), .values() and .items().  A number
of std tests suffered bogus failures as a result.  For dicts keyed by
small ints, or (less so) by characters, the order is much more likely to be
in increasing order of key now; e.g.,

>>> d = {}
>>> for i in range(10):
...    d[i] = i
...
>>> d
{0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7, 8: 8, 9: 9}
>>>

Unfortunately. people may latch on to that in small examples and draw a
bogus conclusion.

test_support.py
    Moved test_extcall's sortdict() into test_support, made it stronger,
    and imported sortdict into other std tests that needed it.
test_unicode.py
    Excluced cp875 from the "roundtrip over range(128)" test, because
    cp875 doesn't have a well-defined inverse for unicode("?", "cp875").
    See Python-Dev for excruciating details.
Cookie.py
    Chaged various output functions to sort dicts before building
    strings from them.
test_extcall
    Fiddled the expected-result file.  This remains sensitive to native
    dict ordering, because, e.g., if there are multiple errors in a
    keyword-arg dict (and test_extcall sets up many cases like that), the
    specific error Python complains about first depends on native dict
    ordering.

2f228e75

10 May, 2001 3 kayıt (commit)

Restore dicts' tp_compare slot, and change dict_richcompare to say it · 4fa58bfa

Tim Peters 23 years ago yazdı

doesn't know how to do LE, LT, GE, GT. dict_richcompare can't do the
latter any faster than dict_compare can. More importantly, for
cmp(dict1, dict2), Python *first* tries rich compares with EQ, LT, and
GT one at a time, even if the tp_compare slot is defined, and
dict_richcompare called dict_compare for the latter two because
it couldn't do them itself. The result was a lot of wasted calls to
dict_compare. Now dict_richcompare gives up at once the times Python
calls it with LT and GT from try_rich_to_3way_compare(), and dict_compare
is called only once (when Python gets around to trying the tp_compare
slot).
Continued mystery: despite that this cut the number of calls to
dict_compare approximately in half in test_mutants.py, the latter still
runs amazingly slowly. Running under the debugger doesn't show excessive
activity in the dict comparison code anymore, so I'm guessing the culprit
is somewhere else -- but where? Perhaps in the element (key/value)
comparison code? We clearly spend a lot of time figuring out how to
compare things.

4fa58bfa

Repair typo in comment. · 3918fb25
Tim Peters 23 years ago yazdı

3918fb25

SF bug #422121 Insecurities in dict comparison. · 95bf9390

Tim Peters 23 years ago yazdı

Fixed a half dozen ways in which general dict comparison could crash
Python (even cause Win98SE to reboot) in the presence of kay and/or
value comparison routines that mutate the dict during dict comparison.
Bugfix candidate.

95bf9390

08 May, 2001 1 kayıt (commit)

SF patch #421922: Implement rich comparison for dicts. · e63415ea

Tim Peters 23 years ago yazdı

d1 == d2 and d1 != d2 now work even if the keys and values in d1 and d2
don't support comparisons other than ==, and testing dicts for equality
is faster now (especially when inequality obtains).

e63415ea

02 May, 2001 1 kayıt (commit)

Mchael Hudson pointed out that the code for detecting changes in · b1f35bff

Guido van Rossum 23 years ago yazdı

dictionary size was comparing ma_size, the hash table size, which is
always a power of two, rather than ma_used, wich changes on each
insertion or deletion.  Fixed this.

b1f35bff

01 May, 2001 1 kayıt (commit)

Add experimental iterkeys(), itervalues(), iteritems() to dict · 09e563ab

Guido van Rossum 23 years ago yazdı

objects.

Tests show that iteritems() is 5-10% faster than iterating over the
dict and extracting the value with dict[key].

09e563ab

23 Nis, 2001 1 kayıt (commit)

Mondo changes to the iterator stuff, without changing how Python code · 213c7a6a

Guido van Rossum 23 years ago yazdı

sees it (test_iter.py is unchanged).

- Added a tp_iternext slot, which calls the iterator's next() method;
  this is much faster for built-in iterators over built-in types
  such as lists and dicts, speeding up pybench's ForLoop with about
  25% compared to Python 2.1.  (Now there's a good argument for
  iterators. ;-)

- Renamed the built-in sequence iterator SeqIter, affecting the C API
  functions for it.  (This frees up the PyIter prefix for generic
  iterator operations.)

- Added PyIter_Check(obj), which checks that obj's type has a
  tp_iternext slot and that the proper feature flag is set.

- Added PyIter_Next(obj) which calls the tp_iternext slot.  It has a
  somewhat complex return condition due to the need for speed: when it
  returns NULL, it may not have set an exception condition, meaning
  the iterator is exhausted; when the exception StopIteration is set
  (or a derived exception class), it means the same thing; any other
  exception means some other error occurred.

213c7a6a

20 Nis, 2001 3 kayıt (commit)

Iterators phase 1. This comprises: · 59d1d2b4

Guido van Rossum 23 years ago yazdı

new slot tp_iter in type object, plus new flag Py_TPFLAGS_HAVE_ITER
new C API PyObject_GetIter(), calls tp_iter
new builtin iter(), with two forms: iter(obj), and iter(function, sentinel)
new internal object types iterobject and calliterobject
new exception StopIteration
new opcodes for "for" loops, GET_ITER and FOR_ITER (also supported by dis.py)
new magic number for .pyc files
new special method for instances: __iter__() returns an iterator
iteration over dictionaries: "for x in dict" iterates over the keys
iteration over files: "for x in file" iterates over lines

TODO:

documentation
test suite
decide whether to use a different way to spell iter(function, sentinal)
decide whether "for key in dict" is a good idea
use iterators in map/filter/reduce, min/max, and elsewhere (in/not in?)
speed tuning (make next() a slot tp_next???)

59d1d2b4

Oops. Removed dictiter_new decl that wasn't supposed to go in yet. · 55ad67d7
Guido van Rossum 23 years ago yazdı

55ad67d7

Implement, test and document "key in dict" and "key not in dict". · 0dbb4fba

Guido van Rossum 23 years ago yazdı

I know some people don't like this -- if it's really controversial,
I'll take it out again.  (If it's only Alex Martelli who doesn't like
it, that doesn't count as "real controversial" though. :-)

That's why this is a separate checkin from the iterators stuff I'm
about to check in next.

0dbb4fba

16 Nis, 2001 1 kayıt (commit)

Tim pointed out a remaining vulnerability in popitem(): the · e04eaec5

Guido van Rossum 23 years ago yazdı

PyTuple_New() could *conceivably* clear the dict, so move the test for
an empty dict after the tuple allocation.  It means that we waste time
allocating and deallocating a 2-tuple when the dict is empty, but who
cares.  It also means that when the dict is empty *and* there's no
memory to allocate a 2-tuple, we raise MemoryError, not KeyError --
but that may actually a good idea: if there's no room for a lousy
2-tuple, what are the chances that there's room for a KeyError
instance?

e04eaec5

15 Nis, 2001 1 kayıt (commit)

Tentative fix for a problem that Tim discovered at the last moment, · a4dd0112

Guido van Rossum 23 years ago yazdı

and reported to python-dev: because we were calling dict_resize() in
PyDict_Next(), and because GC's dict_traverse() uses PyDict_Next(),
and because PyTuple_New() can cause GC, and because dict_items() calls
PyTuple_New(), it was possible for dict_items() to have the dict
resized right under its nose.

The solution is convoluted, and touches several places: keys(),
values(), items(), popitem(), PyDict_Next(), and PyDict_SetItem().

There are two parts to it. First, we no longer call dict_resize() in
PyDict_Next(), which seems to solve the immediate problem. But then
PyDict_SetItem() must have a different policy about when *it* calls
dict_resize(), because we want to guarantee (e.g. for an algorithm
that Jeremy uses in the compiler) that you can loop over a dict using
PyDict_Next() and make changes to the dict as long as those changes
are only value replacements for existing keys using PyDict_SetItem().
This is done by resizing *after* the insertion instead of before, and
by remembering the size before we insert the item, and if the size is
still the same, we don't bother to even check if we might need to
resize. An additional detail is that if the dict starts out empty, we
must still resize it before the insertion.

That was the first part. :-)

The second part is to make keys(), values(), items(), and popitem()
safe against side effects on the dict caused by allocations, under the
assumption that if the GC can cause arbitrary Python code to run, it
can cause other threads to run, and it's not inconceivable that our
dict could be resized -- it would be insane to write code that relies
on this, but not all code is sane.

Now, I have this nagging feeling that the loops in lookdict probably
are blissfully assuming that doing a simple key comparison does not
change the dict's size. This is not necessarily true (the keys could
be class instances after all). But that's a battle for another day.

a4dd0112

21 Mar, 2001 1 kayıt (commit)

Make PyDict_Next safe to use for loops that merely modify the values · 6783070e

Tim Peters 24 years ago yazdı

associated with existing dict keys.
This is a variant of part of Michael Hudson's patch #409864 "lazy fix for
Pings bizarre scoping crash".

6783070e

18 Ock, 2001 1 kayıt (commit)

Rich comparisons: · b932420c

Guido van Rossum 24 years ago yazdı

- Use PyObject_RichCompareBool() when comparing keys; this makes the
  error handling cleaner.

- There were two implementations for dictionary comparison, an old one
  (#ifdef'ed out) and a new one.  Got rid of the old one, which was
  abandoned years ago.

- In the characterize() function, part of dictionary comparison, use
  PyObject_RichCompareBool() to compare keys and values instead.  But
  continue to use PyObject_Compare() for comparing the final
  (deciding) elements.

- Align the comments in the type struct initializer.

Note: I don't implement rich comparison for dictionaries -- there
doesn't seem to be much to be gained.  (The existing comparison
already decides that shorter dicts are always smaller than longer
dicts.)

b932420c

03 Ock, 2001 1 kayıt (commit)
- dict_update has two boundary conditions: a.update(a) and a.update({}) · 1fb6088e
  Jeremy Hylton 24 years ago yazdı
```
Added test for second one.
```
  1fb6088e
13 Ara, 2000 2 kayıt (commit)
- Add long-overdue docstrings to dict methods. · f7f88b11
  Tim Peters 24 years ago yazdı
  
  f7f88b11
- Typo repair in comments. Fell for GregS's .popitem() poke. · f1c7c884
  Tim Peters 24 years ago yazdı
  
  f1c7c884