    Get rid of the superstitious "~" in dict hashing's "i = (~hash) & mask". · 2f228e75
    Tim Peters authored
    The comment following used to say:
    	/* We use ~hash instead of hash, as degenerate hash functions, such
    	   as for ints <sigh>, can have lots of leading zeros. It's not
    	   really a performance risk, but better safe than sorry.
    	   12-Dec-00 tim:  so ~hash produces lots of leading ones instead --
    	   what's the gain? */
    That is, there was never a good reason for doing it.  And to the contrary,
    as explained on Python-Dev last December, it tended to make the *sum*
    (i + incr) & mask (which is the first table index examined in case of
    collision) the same "too often" across distinct hashes.
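    A minimal Python sketch of the first table index under the old and new
    formulas. It assumes a power-of-two table, so mask is 2**k - 1. The mask
    value 7 is a hypothetical example, and the C code itself is not reproduced.
    Because ~h flips every bit, (~h) & mask always equals mask - (h & mask).
    The complement only mirrors the first slot within the table, so it cannot
    spread hashes any better than h & mask does.

```python
def slot_old(h: int, mask: int) -> int:
    # Pre-change first probe: complement the hash, then keep the low bits.
    return (~h) & mask


def slot_new(h: int, mask: int) -> int:
    # Post-change first probe: keep the low bits of the hash directly.
    return h & mask


mask = 7  # hypothetical 8-slot table (mask = 2**3 - 1)
for h in [0, 1, 5, 12, -3, 1 << 40]:
    # The complement just reflects the slot index within [0, mask].
    assert slot_old(h, mask) == mask - slot_new(h, mask)
    print(h, slot_new(h, mask), slot_old(h, mask))
```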
    
    Changing to the simpler "i = hash & mask" reduced the number of string-dict
    collisions (== the number of times we go around the lookup for-loop) from about
    6 million to 5 million during a full run of the test suite (these are
    approximate because the test suite does some random stuff from run to run).
    The number of collisions in non-string dicts also decreased, but not as
    dramatically.
    
    Note that this may, for a given dict, change the order (wrt previous
    releases) of entries exposed by .keys(), .values() and .items().  A number
    of std tests suffered bogus failures as a result.  For dicts keyed by
    small ints, or (less so) by characters, the order is much more likely to be
    in increasing order of key now; e.g.,
    
    >>> d = {}
    >>> for i in range(10):
    ...    d[i] = i
    ...
    >>> d
    {0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7, 8: 8, 9: 9}
    >>>
    
    Unfortunately, people may latch on to that in small examples and draw a
    bogus conclusion.
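    For small non-negative ints, CPython uses the value itself as the hash
    (hash(5) == 5). With hash & mask, key i therefore lands in slot i, and
    walking the slots in index order yields increasing keys. The sketch below
    models a hypothetical 16-slot table holding the keys 0..9 with no
    collisions. The helper slot_order is illustrative only. Under the old
    formula the same keys land in slot mask - i, so they come out in
    decreasing order. Since Python 3.7, dicts are guaranteed to preserve
    insertion order, so this models the table of this era rather than
    current behavior.

```python
def slot_order(keys, size, use_complement):
    """Return keys in table-slot order for a collision-free table (hypothetical helper).

    Assumes size is a power of two and that no two keys share a first-probe
    slot, so collision handling is deliberately not modeled.
    """
    mask = size - 1
    table = [None] * size
    for k in keys:
        h = hash(k)  # for small non-negative ints in CPython, hash(k) == k
        i = (~h & mask) if use_complement else (h & mask)
        assert table[i] is None, "collision: not modeled in this sketch"
        table[i] = k
    # Walk the slots in index order, as iteration over the table did then.
    return [k for k in table if k is not None]


print(slot_order(range(10), 16, use_complement=False))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(slot_order(range(10), 16, use_complement=True))   # [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
```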
    
    test_support.py
        Moved test_extcall's sortdict() into test_support, made it stronger,
        and imported sortdict into other std tests that needed it.
    test_unicode.py
        Excluded cp875 from the "roundtrip over range(128)" test, because
        cp875 doesn't have a well-defined inverse for unicode("?", "cp875").
        See Python-Dev for excruciating details.
    Cookie.py
        Changed various output functions to sort dicts before building
        strings from them.
    test_extcall
        Fiddled the expected-result file.  This remains sensitive to native
        dict ordering, because, e.g., if there are multiple errors in a
        keyword-arg dict (and test_extcall sets up many cases like that), the
        specific error Python complains about first depends on native dict
        ordering.
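    The commit does not show the body of sortdict. The following is a
    hypothetical sketch of the idea, which is also the idea behind the
    Cookie.py change. A helper formats a dict like repr() but sorts the items
    by key first, so the resulting string does not depend on native dict
    ordering. It assumes the keys are mutually orderable.

```python
def sortdict(d):
    """Like repr(d), but with items in sorted key order (hypothetical re-creation)."""
    # Sort on the key only; keys in a dict are unique, so values never decide order.
    pairs = sorted(d.items(), key=lambda kv: kv[0])
    return "{" + ", ".join("%r: %r" % pair for pair in pairs) + "}"


print(sortdict({"b": 2, "a": 1, "c": 3}))  # {'a': 1, 'b': 2, 'c': 3}
```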