Commit 2270d58a authored by Raymond Hettinger

Make an entry for the os module's bytes accessors.

Split codecs into a separate section.  Rewrite
the Unicode section.
Parent 03ca1a92
@@ -1135,6 +1135,28 @@ wrong results.
(Patch submitted by Nir Aides in :issue:`7610`.)

os
--
Different operating systems use various encodings for filenames and environment
variables. The :mod:`os` module provides two new functions,
:func:`~os.fsencode` and :func:`~os.fsdecode`, for encoding and decoding
filenames. On a system whose filesystem encoding is UTF-8:

>>> filename = 'словарь'
>>> os.fsencode(filename)
b'\xd1\x81\xd0\xbb\xd0\xbe\xd0\xb2\xd0\xb0\xd1\x80\xd1\x8c'
>>> os.fsdecode(os.fsencode(filename))
'словарь'

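
The doctest output depends on the interpreter's filesystem encoding. A minimal
sketch, assuming Python 3.6 or later (for ``sys.getfilesystemencodeerrors``) and
a filesystem encoding, such as UTF-8, that can represent the sample name. It
checks that :func:`os.fsencode` matches an explicit encode with the filesystem
encoding and error handler, that :func:`os.fsdecode` reverses it, and that both
functions return arguments of the target type unchanged. The ``roundtrip``
helper is hypothetical.

```python
import os
import sys

def roundtrip(name: str) -> str:
    """Hypothetical helper: str -> bytes -> str through the filesystem encoding."""
    raw = os.fsencode(name)   # encode with the filesystem encoding and error handler
    return os.fsdecode(raw)   # decode back with the same encoding and error handler

filename = 'словарь'
encoding = sys.getfilesystemencoding()
errors = sys.getfilesystemencodeerrors()  # available since Python 3.6

# os.fsencode() is equivalent to an explicit encode with these two settings.
print(os.fsencode(filename) == filename.encode(encoding, errors))
print(roundtrip(filename) == filename)

# Values that already have the target type (bytes for fsencode,
# str for fsdecode) are returned unchanged.
print(os.fsencode(b'data.txt'), os.fsdecode('data.txt'))
```
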

Some operating systems allow direct access to the raw bytes of the
environment. If so, the :data:`os.supports_bytes_environ` constant will be
true.

For direct access to environment variables as bytes (if available), use the
new :func:`os.getenvb` function or :data:`os.environb`, which is a bytes
version of :data:`os.environ`.

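
A minimal sketch of reading an environment variable as bytes through a
hypothetical helper, ``get_env_bytes``. It uses :func:`os.getenvb` when
:data:`os.supports_bytes_environ` is true and otherwise (for example on
Windows, where the native environment is ``str``) falls back to
:data:`os.environ` plus :func:`os.fsencode`. This is one possible design, not
an API of the :mod:`os` module; the variable names are arbitrary examples.

```python
import os

def get_env_bytes(key: bytes, default=None):
    """Hypothetical helper: return an environment variable's value as bytes.

    Returns ``default`` when the variable is not set.
    """
    if os.supports_bytes_environ:
        # POSIX: os.getenvb() looks the key up in os.environb.
        return os.getenvb(key, default)
    # No native bytes environment: encode the str value ourselves.
    value = os.environ.get(os.fsdecode(key))
    return default if value is None else os.fsencode(value)

# os.environ and os.environb are kept in sync, so this is visible to both.
os.environ['EXAMPLE_VAR'] = 'hello'
print(get_env_bytes(b'EXAMPLE_VAR'))
print(get_env_bytes(b'EXAMPLE_MISSING_VAR', b'fallback'))
```
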
shutil
------
@@ -1728,49 +1750,39 @@ multi-line arguments a bit faster (:issue:`7113` by Łukasz Langa).
Unicode
=======

Python has been updated to `Unicode 6.0.0
<http://unicode.org/versions/Unicode6.0.0/>`_. The update to the standard adds
over 2,000 new characters, including `emoji <http://en.wikipedia.org/wiki/Emoji>`_
symbols, which are important for mobile phones.

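The :mod:`unicodedata` module exposes the character database the interpreter
was built with. A small sketch, using U+1F680 ROCKET (one of the emoji added
in Unicode 6.0.0) purely as an example; the ``describe`` helper is
hypothetical, and the printed version is that of the running interpreter
rather than necessarily 6.0.0.

```python
import unicodedata

def describe(ch: str) -> str:
    """Hypothetical helper: format a character's code point, name and category."""
    name = unicodedata.name(ch, '<unnamed>')
    return f'U+{ord(ch):04X} {name} ({unicodedata.category(ch)})'

# Version of the Unicode database bundled with this interpreter.
print(unicodedata.unidata_version)

rocket = unicodedata.lookup('ROCKET')   # look a character up by its name
print(describe(rocket))
```
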
In addition, the updated standard has altered the character properties for two
Kannada characters (U+0CF1, U+0CF2) and one New Tai Lue numeric character
(U+19DA), making the former eligible for use in identifiers while disqualifying
the latter. For more information, see `Unicode Character Database Changes
<http://www.unicode.org/versions/Unicode6.0.0/#Database_Changes>`_.

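The general categories involved can be inspected with :mod:`unicodedata`, and
identifier eligibility with :meth:`str.isidentifier`. A minimal sketch; the
results reflect the Unicode database of the running interpreter, which on any
release with Unicode 6.0.0 or later reports ``Lo`` (other letter) for the two
Kannada signs and ``No`` (other number) for the New Tai Lue digit. Only the
Kannada characters are checked with ``isidentifier`` here.

```python
import unicodedata

# The characters whose properties changed in Unicode 6.0.0.
CHANGED = ['\u0CF1', '\u0CF2', '\u19DA']

for ch in CHANGED:
    # General category as recorded in this interpreter's Unicode database.
    print(f'U+{ord(ch):04X}', unicodedata.name(ch), unicodedata.category(ch))

# As letters (Lo), the Kannada signs can start or continue an identifier.
print('\u0CF1'.isidentifier(), 'x\u0CF2'.isidentifier())
```
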
Codecs
======

Support was added for *cp720* Arabic DOS encoding (:issue:`1616979`).

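A quick check of the *cp720* codec. The sample word is arbitrary, and the round
trip assumes each of its characters exists in the code page. Because cp720 is
a single-byte code page, each character encodes to exactly one byte.

```python
import codecs

text = 'مرحبا'   # Arabic "marhaba"; an arbitrary sample word
encoded = text.encode('cp720')   # strict error handling by default

print(codecs.lookup('cp720').name, len(text), len(encoded))
print(encoded.decode('cp720') == text)
```
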
MBCS encoding no longer ignores the error handler argument. In the default
strict mode, it raises a :exc:`UnicodeDecodeError` when it encounters an
undecodable byte sequence and a :exc:`UnicodeEncodeError` when it encounters
an unencodable character.

The MBCS codec supports ``'strict'`` and ``'ignore'`` error handlers for
decoding, and ``'strict'`` and ``'replace'`` for encoding.

To emulate the MBCS encoding of Python 3.1, select the ``'ignore'`` handler for
decoding and the ``'replace'`` handler for encoding.

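The ``'mbcs'`` codec exists only on Windows. The sketch below wraps the
Python 3.1-style handler choice in two hypothetical helpers that take the codec
name as a parameter; the demonstration passes ``'cp1252'`` as a stand-in so that
it also runs on other platforms. The behavior shown (strict raises, ``'ignore'``
drops undecodable bytes, ``'replace'`` substitutes ``?``) comes from the
standard error handlers and is not specific to MBCS.

```python
def decode_like_py31(data: bytes, encoding: str = 'mbcs') -> str:
    """Hypothetical helper: decode, silently dropping undecodable bytes."""
    return data.decode(encoding, 'ignore')

def encode_like_py31(text: str, encoding: str = 'mbcs') -> bytes:
    """Hypothetical helper: encode, replacing unencodable characters with '?'."""
    return text.encode(encoding, 'replace')

# Stand-in codec for the demonstration; 'mbcs' itself is Windows-only.
CODEC = 'cp1252'

try:
    b'abc\x81'.decode(CODEC)   # strict: byte 0x81 is undefined in cp1252
except UnicodeDecodeError as exc:
    print('strict decode failed:', exc.reason)

print(decode_like_py31(b'abc\x81', CODEC))   # the undecodable byte is dropped
print(encode_like_py31('a\u0416b', CODEC))   # CYRILLIC ZHE has no cp1252 byte
```
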
On Mac OS X, Python decodes command line arguments with ``'utf-8'`` rather than
the locale encoding.

By default, :mod:`tarfile` uses ``'utf-8'`` encoding on Windows (instead of
``'mbcs'``) and the ``'surrogateescape'`` error handler on all operating
systems.

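These settings can also be passed explicitly, so that archive handling does not
depend on platform defaults. A minimal in-memory sketch (the helpers and member
names are hypothetical) that writes and reads back a non-ASCII member name with
``encoding='utf-8'`` and ``errors='surrogateescape'``:

```python
import io
import tarfile

# Spell out the encoding and error handler instead of relying on defaults.
OPTIONS = {'encoding': 'utf-8', 'errors': 'surrogateescape'}

def make_archive(names):
    """Hypothetical helper: build an in-memory tar with one small member per name."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode='w', **OPTIONS) as tar:
        for name in names:
            payload = name.encode('utf-8')
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()

def list_names(archive: bytes):
    """Hypothetical helper: return the member names stored in a tar archive."""
    with tarfile.open(fileobj=io.BytesIO(archive), mode='r', **OPTIONS) as tar:
        return tar.getnames()

archive = make_archive(['словарь.txt', 'readme.txt'])
print(list_names(archive))
```
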
Documentation