Files · e57e50c8e77bc64e1ebab7a9ddf6f13fc3440c48 · Batuhan Osman TASKAYA / cpython

Update PyUnicode_DecodeUTF8 from RFC 2279 to RFC 3629. · e57e50c8

Authored by Ezio Melotti on Jun 05, 2010

1) #8271: when a byte sequence is invalid, only the start byte and all the
valid continuation bytes are now replaced by U+FFFD, instead of replacing
the number of bytes specified by the start byte.
See http://www.unicode.org/versions/Unicode5.2.0/ch03.pdf (pages 94-95);
2) 5- and 6-byte-long UTF-8 sequences are now considered invalid (no changes
in behavior);
3) Add code and tests to reject surrogates (U+D800-U+DFFF) as defined in
RFC 3629, but leave it commented out since it's not backward compatible;
4) Change the error messages "unexpected code byte" to "invalid start byte"
and "invalid data" to "invalid continuation byte";
5) Add an extensive set of tests in test_unicode;
6) Fix test_codeccallbacks because it was failing after this change.
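
Items 1, 2 and 4 above can be observed from Python code. The following is a minimal sketch, assuming a current CPython 3 interpreter rather than the exact build produced by this commit; the helper names decode_replace and decode_error_reason are hypothetical and exist only for this illustration. Item 3 is left out because the commit message states that the surrogate check was left commented out.

```python
# Illustrative check of items 1, 2 and 4 of the commit message.
# Assumption: run on a current CPython 3 interpreter, not the exact
# build at this commit. Helper names are hypothetical.


def decode_replace(data: bytes) -> str:
    """Decode UTF-8, substituting U+FFFD for each invalid subsequence."""
    return data.decode("utf-8", errors="replace")


def decode_error_reason(data: bytes) -> str:
    """Return the UnicodeDecodeError reason, or '' if the input is valid."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.reason
    return ""


# Item 1: 0xF1 announces a 4-byte sequence and 0x80 is a valid
# continuation byte, but 0x41 ('A') is not. Only the start byte and the
# valid continuation byte are replaced, so 'A' is preserved.
truncated = decode_replace(b"\xf1\x80\x41")

# Item 2: 0xF8 would open a 5-byte sequence under RFC 2279. Under
# RFC 3629 it is an invalid start byte, and each following continuation
# byte is then invalid on its own.
five_byte = decode_replace(b"\xf8\x88\x80\x80\x80")

# Item 4: the reworded error messages.
bad_start = decode_error_reason(b"\xff")
bad_continuation = decode_error_reason(b"\xc3\x28")

print(ascii(truncated))
print(ascii(five_byte))
print(bad_start)
print(bad_continuation)
```

Under the old behavior described in item 1, the 'A' in the first example would have been consumed along with the start byte, because the start byte 0xF1 claims four bytes.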

Name
Demo
Doc
Grammar
Include
Lib
Mac
Misc
Modules
Objects
PC
PCbuild
Parser
Python
RISCOS
Tools
.bzrignore
.hgignore
.hgtags
LICENSE
Makefile.pre.in
README
configure
configure.in
install-sh
pyconfig.h.in
setup.py

README