cpython: commit b5507ecd
Authored Oct 19, 2001 by Marc-André Lemburg

Additional test and documentation for the unicode() changes.
This patch should also be applied to the 2.2b1 trunk.

Parent: f6fb171c

Showing 5 changed files with 93 additions and 37 deletions
Doc/lib/libfuncs.tex          +27 -13
Lib/test/output/test_unicode  +1 -0
Lib/test/test_unicode.py      +61 -21
Misc/NEWS                     +1 -1
Objects/unicodeobject.c       +3 -2

Doc/lib/libfuncs.tex
@@ -758,19 +758,33 @@ def my_import(name):
   \versionadded{2.0}
 \end{funcdesc}
 
-\begin{funcdesc}{unicode}{string\optional{, encoding\optional{, errors}}}
-  Create a Unicode string from an 8-bit string \var{string} using the
-  codec for \var{encoding}. The \var{encoding} parameter is a string
-  giving the name of an encoding. Error handling is done according to
-  \var{errors}; this specifies the treatment of characters which are
-  invalid in the input encoding. If \var{errors} is \code{'strict'}
-  (the default), a \exception{ValueError} is raised on errors, while a
-  value of \code{'ignore'} causes errors to be silently ignored, and a
-  value of \code{'replace'} causes the official Unicode replacement
-  character, \code{U+FFFD}, to be used to replace input characters
-  which cannot be decoded. The default behavior is to decode UTF-8 in
-  strict mode, meaning that encoding errors raise
-  \exception{ValueError}. See also the \refmodule{codecs} module.
+\begin{funcdesc}{unicode}{object\optional{, encoding\optional{, errors}}}
+  Return the Unicode string version of \var{object} using one of the
+  following modes:
+
+  If \var{encoding} and/or \var{errors} are given, \code{unicode()}
+  will decode the object which can either be an 8-bit string or a
+  character buffer using the codec for \var{encoding}. The
+  \var{encoding} parameter is a string giving the name of an encoding.
+  Error handling is done according to \var{errors}; this specifies the
+  treatment of characters which are invalid in the input encoding. If
+  \var{errors} is \code{'strict'} (the default), a
+  \exception{ValueError} is raised on errors, while a value of
+  \code{'ignore'} causes errors to be silently ignored, and a value of
+  \code{'replace'} causes the official Unicode replacement character,
+  \code{U+FFFD}, to be used to replace input characters which cannot
+  be decoded. See also the \refmodule{codecs} module.
+
+  If no optional parameters are given, \code{unicode()} will mimic the
+  behaviour of \code{str()} except that it returns Unicode strings
+  instead of 8-bit strings. More precisely, if \var{object} is an
+  Unicode string or subclass it will return a Unicode string without
+  any additional decoding applied. For objects which provide a
+  \code{__unicode__} method, it will call this method without
+  arguments to create a Unicode string. For all other objects, the
+  8-bit string version or representation is requested and then
+  converted to a Unicode string using the codec for the default
+  encoding in \code{'strict'} mode.
   \versionadded{2.0}
 \end{funcdesc}
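
The three error-handling modes described in the new documentation text can be tried out with a short sketch. It assumes Python 3, where the built-in `unicode()` no longer exists and `str(bytes, encoding, errors)` is used here as a stand-in; the sample byte string is a made-up input, not something taken from the commit.

```python
# Python 3 sketch: str(bytes, encoding, errors) stands in for the
# Python 2 call unicode(string, encoding, errors) documented in the hunk.
data = b"caf\xe9"  # hypothetical sample: Latin-1 bytes, not valid UTF-8

# 'strict' (the default) raises; UnicodeDecodeError subclasses ValueError.
try:
    str(data, "utf-8", "strict")
    strict_raised = False
except ValueError:
    strict_raised = True

# 'ignore' silently drops the byte that cannot be decoded.
ignored = str(data, "utf-8", "ignore")

# 'replace' substitutes U+FFFD, the official Unicode replacement character.
replaced = str(data, "utf-8", "replace")

print(strict_raised, repr(ignored), repr(replaced))
```

Catching `ValueError` rather than the more specific `UnicodeDecodeError` mirrors the wording of the documentation, which only promises a `ValueError`.
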
Lib/test/output/test_unicode
@@ -2,6 +2,7 @@ test_unicode
 Testing Unicode comparisons... done.
 Testing Unicode contains method... done.
 Testing Unicode formatting strings... done.
+Testing builtin unicode()... done.
 Testing builtin codecs... done.
 Testing standard mapping codecs... 0-127... 128-255... done.
 Testing Unicode string concatenation... done.
Lib/test/test_unicode.py
@@ -389,6 +389,67 @@ verify('%i %*.*s' % (10, 5,3,u'abc',) == u'10 abc')
 verify('%i%s %*.*s' % (10, 3, 5,3,u'abc',) == u'103   abc')
 print 'done.'
 
+print 'Testing builtin unicode()...',
+
+# unicode(obj) tests (this maps to PyObject_Unicode() at C level)
+
+verify(unicode(u'unicode remains unicode') == u'unicode remains unicode')
+
+class UnicodeSubclass(unicode):
+    pass
+
+verify(unicode(UnicodeSubclass('unicode subclass becomes unicode'))
+       == u'unicode subclass becomes unicode')
+
+verify(unicode('strings are converted to unicode')
+       == u'strings are converted to unicode')
+
+class UnicodeCompat:
+    def __init__(self, x):
+        self.x = x
+    def __unicode__(self):
+        return self.x
+
+verify(unicode(UnicodeCompat('__unicode__ compatible objects are recognized'))
+       == u'__unicode__ compatible objects are recognized')
+
+class StringCompat:
+    def __init__(self, x):
+        self.x = x
+    def __str__(self):
+        return self.x
+
+verify(unicode(StringCompat('__str__ compatible objects are recognized'))
+       == u'__str__ compatible objects are recognized')
+
+# unicode(obj) is compatible to str():
+
+o = StringCompat('unicode(obj) is compatible to str()')
+verify(unicode(o) == u'unicode(obj) is compatible to str()')
+verify(str(o) == 'unicode(obj) is compatible to str()')
+
+for obj in (123, 123.45, 123L):
+    verify(unicode(obj) == unicode(str(obj)))
+
+# unicode(obj, encoding, error) tests (this maps to
+# PyUnicode_FromEncodedObject() at C level)
+
+try:
+    unicode(u'decoding unicode is not supported', 'utf-8', 'strict')
+except TypeError:
+    pass
+else:
+    raise TestFailed, "decoding unicode should NOT be supported"
+
+verify(unicode('strings are decoded to unicode', 'utf-8', 'strict')
+       == u'strings are decoded to unicode')
+
+verify(unicode(buffer('character buffers are decoded to unicode'),
+               'utf-8', 'strict')
+       == u'character buffers are decoded to unicode')
+
+print 'done.'
+
 # Test builtin codecs
 print 'Testing builtin codecs...',
@@ -437,32 +498,11 @@ verify(unicode(''.join((chr(0xe2), chr(0x82), chr(0xac))),
 # * strict decoding testing for all of the
 #   UTF8_ERROR cases in PyUnicode_DecodeUTF8
 
 verify(unicode('hello','ascii') == u'hello')
 verify(unicode('hello','utf-8') == u'hello')
 verify(unicode('hello','utf8') == u'hello')
 verify(unicode('hello','latin-1') == u'hello')
 
-# Compatibility to str():
-
-class String:
-    x = ''
-
-    def __str__(self):
-        return self.x
-
-o = String()
-
-o.x = 'abc'
-verify(unicode(o) == u'abc')
-verify(str(o) == 'abc')
-
-o.x = u'abc'
-verify(unicode(o) == u'abc')
-verify(str(o) == 'abc')
-
-for obj in (123, 123.45, 123L):
-    verify(unicode(obj) == unicode(str(obj)))
-
 # Error handling
 try:
     u'Andr\202 x'.encode('ascii')
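
The behaviour these tests pin down has a close analogue in Python 3, where `str` takes the place of `unicode`. The sketch below is an illustration under that assumption, not part of the commit: `TextSubclass` is a hypothetical stand-in for `UnicodeSubclass`, and `memoryview` stands in for the Python 2 `buffer` object.

```python
# Python 3 sketch; str here plays the role of Python 2's unicode.

class TextSubclass(str):  # hypothetical, mirrors UnicodeSubclass in the test
    pass

class StringCompat:       # mirrors the StringCompat helper in the test
    def __init__(self, x):
        self.x = x
    def __str__(self):
        return self.x

# One-argument mode: no decoding step, and __str__ is consulted.
plain = str(TextSubclass("subclass becomes str"))
via_str = str(StringCompat("__str__ is honoured"))

# Decoding mode: input that is already text is rejected with TypeError ...
try:
    str("already text", "utf-8", "strict")
    rejected = False
except TypeError:
    rejected = True

# ... while bytes and other buffer objects are decoded.
from_bytes = str(b"bytes are decoded", "utf-8", "strict")
from_buffer = str(memoryview(b"buffers are decoded"), "utf-8", "strict")

print(plain, via_str, rejected, from_bytes, from_buffer, sep="\n")
```

The split between the two modes is the same one the tests draw: conversion without an encoding accepts arbitrary objects, while passing an encoding requires encoded input.
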
Misc/NEWS
@@ -44,7 +44,7 @@ Core and builtins
 - unicode(obj) now behaves more like str(obj), accepting arbitrary
   objects, and calling a __unicode__ method if it exists.
   unicode(obj, encoding) and unicode(obj, encoding, errors) still
-  require an 8-bit string argument.
+  require an 8-bit string or character buffer argument.
 
 - isinstance() now allows any object as the first argument and a
   class, a type or something with a __bases__ tuple attribute for the
Objects/unicodeobject.c
@@ -426,8 +426,9 @@ PyObject *PyUnicode_FromEncodedObject(register PyObject *obj,
 #if 0
     /* For b/w compatibility we also accept Unicode objects provided
-       that no encodings is given and then redirect to PyObject_Unicode()
-       which then applies the additional logic for Unicode subclasses.
+       that no encodings is given and then redirect to
+       PyObject_Unicode() which then applies the additional logic for
+       Unicode subclasses.
 
        NOTE: This API should really only be used for object which
        represent *encoded* Unicode !