    Update PyUnicode_DecodeUTF8 from RFC 2279 to RFC 3629. · e57e50c8
    Authored by Ezio Melotti
    1) #8271: when a byte sequence is invalid, only the start byte and all the
       valid continuation bytes are now replaced by U+FFFD, instead of replacing
       the number of bytes specified by the start byte.
       See http://www.unicode.org/versions/Unicode5.2.0/ch03.pdf (pages 94-95);
    2) 5- and 6-byte-long UTF-8 sequences are now considered invalid (no change
       in behavior);
    3) Add code and tests to reject surrogates (U+D800-U+DFFF) as defined in
       RFC 3629, but leave it commented out since it's not backward compatible;
    4) Change the error messages "unexpected code byte" to "invalid start byte"
       and "invalid data" to "invalid continuation byte";
    5) Add an extensive set of tests in test_unicode;
    6) Fix test_codeccallbacks because it was failing after this change.
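
The decoding behavior described in items 1, 2 and 4 can be observed from Python itself. The following is a minimal sketch, assuming a current CPython 3 interpreter in which this decoder behavior is still in place. The helper name `decode_reason` and the sample byte strings are illustrative assumptions, not part of the commit.

```python
def decode_reason(data):
    """Return the UnicodeDecodeError reason for strict UTF-8 decoding,
    or None if the bytes decode cleanly. (Hypothetical helper.)"""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.reason
    return None


# Item 1: 0xF1 announces a 4-byte sequence, but only one valid
# continuation byte (0x80) follows before plain ASCII resumes.
# Only the start byte and its valid continuation byte are replaced,
# so the trailing "ABC" survives.
truncated = b"\xf1\x80ABC"
replaced = truncated.decode("utf-8", "replace")

# Item 2: 0xF8 would start a 5-byte sequence under RFC 2279;
# RFC 3629 does not allow it, so it is rejected as a start byte.
five_byte = b"\xf8\x88\x80\x80\x80"

# Item 4: the two reworded error messages.
stray_continuation = b"\x80"   # continuation byte with no start byte
bad_continuation = b"\xc3("    # 2-byte start followed by ASCII "("

print(repr(replaced))
print(decode_reason(five_byte))
print(decode_reason(stray_continuation))
print(decode_reason(bad_continuation))
```

With the `"replace"` error handler the malformed prefix collapses to a single U+FFFD, while strict decoding reports either "invalid start byte" or "invalid continuation byte", depending on where the sequence breaks.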
Name
Last commit
Last update
Demo
Doc
Grammar
Include
Lib
Mac
Misc
Modules
Objects
PC
PCbuild
Parser
Python
RISCOS
Tools
.bzrignore
.hgignore
.hgtags
LICENSE
Makefile.pre.in
README
configure
configure.in
install-sh
pyconfig.h.in
setup.py