Commit a1ce93f8 authored by Barry Warsaw

From http://mail.python.org/pipermail/i18n-sig/2003-April/001557.html

- Expose NullTranslations and GNUTranslations to __all__

- Set the default charset to iso-8859-1.  It used to be None, which
would cause problems with .ugettext() if the file had no charset
parameter.  Arguably, the po/mo file would be broken, but I still think
iso-8859-1 is a reasonable default.

- Add a "coerce" default argument to GNUTranslations's constructor.  The
reason for this is that in Zope, we want all msgids and msgstrs to be
Unicode.  For the latter, we could use .ugettext() but there isn't
currently a mechanism for Unicode-ifying msgids.

The plan then is that the charset parameter specifies the encoding for
both the msgids and msgstrs, and both are decoded to Unicode when read.
For example, we might encode po files with utf-8. I think the GNU
gettext tools don't care.

Since this could potentially break code [*] that wants to use the
encoded interface .gettext(), the constructor flag is added, defaulting
to False.  Most code I suspect will want to set this to True and use
.ugettext().

- A few other minor changes from the Zope project, including asserting
that a zero-length msgid must have a Project-Id-Version header for it to
be counted as the metadata record.
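
The charset default, the Project-Id-Version check, and the coerce flag
described above can be sketched roughly as follows.  This is a minimal
Python 3 illustration, not the code from this commit: bytes/str stand in
for Python 2's str/unicode, and parse_metadata() and load_catalog() are
hypothetical helpers that work on an already-unpacked dict rather than
on a real .mo file.

```python
DEFAULT_CHARSET = "iso-8859-1"  # the new default named in the commit message


def parse_metadata(tmsg):
    """Return (info, charset) for the msgstr of the zero-length msgid.

    Hypothetical helper: the record only counts as metadata when it
    starts with a Project-Id-Version header.
    """
    info = {}
    charset = DEFAULT_CHARSET
    if not tmsg.lower().startswith(b"project-id-version:"):
        return info, charset
    # Header lines are RFC 822-style "key: value" pairs.
    for item in tmsg.decode(DEFAULT_CHARSET).splitlines():
        item = item.strip()
        if not item or ":" not in item:
            continue
        key, value = item.split(":", 1)
        info[key.strip().lower()] = value.strip()
    content_type = info.get("content-type", "")
    if "charset=" in content_type:
        charset = content_type.split("charset=", 1)[1].strip()
    return info, charset


def load_catalog(raw, coerce=False):
    """Build a catalog from a dict of encoded msgid -> msgstr.

    The dict input is an assumption for illustration; a real loader
    would unpack these records from the .mo file.
    """
    info, charset = parse_metadata(raw.get(b"", b""))
    if not coerce:
        # Encoded interface: keys and values stay as bytes.
        return dict(raw), charset
    # Coerced interface: decode both msgids and msgstrs once, at load time.
    catalog = {k.decode(charset): v.decode(charset) for k, v in raw.items()}
    return catalog, charset


raw = {
    b"": b"Project-Id-Version: demo 1.0\n"
         b"Content-Type: text/plain; charset=utf-8\n",
    b"caf\xc3\xa9": b"Kaffee",
}
coerced, charset = load_catalog(raw, coerce=True)
encoded, _ = load_catalog(raw)
print(charset, coerced["caf\u00e9"], encoded[b"caf\xc3\xa9"])
```

With coerce left at its default, lookups must use the encoded msgid; with
coerce=True, both keys and values are already decoded, which is why the
two interfaces should not be mixed on one catalog.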
parent de354b74
@@ -285,13 +285,17 @@ The \module{gettext} module provides one additional class derived from
 \class{NullTranslations}: \class{GNUTranslations}. This class
 overrides \method{_parse()} to enable reading GNU \program{gettext}
 format \file{.mo} files in both big-endian and little-endian format.
+It also adds the ability to coerce both message ids and message
+strings to Unicode.

-It also parses optional meta-data out of the translation catalog. It
-is convention with GNU \program{gettext} to include meta-data as the
-translation for the empty string. This meta-data is in \rfc{822}-style
-\code{key: value} pairs. If the key \code{Content-Type} is found,
-then the \code{charset} property is used to initialize the
-``protected'' \member{_charset} instance variable. The entire set of
+\class{GNUTranslations} parses optional meta-data out of the
+translation catalog. It is convention with GNU \program{gettext} to
+include meta-data as the translation for the empty string. This
+meta-data is in \rfc{822}-style \code{key: value} pairs, and must
+contain the \code{Project-Id-Version}. If the key
+\code{Content-Type} is found, then the \code{charset} property is used
+to initialize the ``protected'' \member{_charset} instance variable,
+defaulting to \code{iso-8859-1} if not found. The entire set of
 key/value pairs are placed into a dictionary and set as the
 ``protected'' \member{_info} instance variable.

@@ -302,11 +306,27 @@ can raise \exception{IOError}.
 The other usefully overridden method is \method{ugettext()}, which
 returns a Unicode string by passing both the translated message string
 and the value of the ``protected'' \member{_charset} variable to the
-builtin \function{unicode()} function.
+builtin \function{unicode()} function. Note that if you use
+\method{ugettext()} you probably also want your message ids to be
+Unicode. To do this, set the variable \var{coerce} to \code{True} in
+the \class{GNUTranslations} constructor. This ensures that both the
+message ids and message strings are decoded to Unicode when the file
+is read, using the file's \code{charset} value. If you do this, you
+will not want to use the \method{gettext()} method -- always use
+\method{ugettext()} instead.

 To facilitate plural forms, the methods \method{ngettext} and
 \method{ungettext} are overridden as well.

+\begin{methoddesc}[GNUTranslations]{__init__}{
+\optional{fp\optional{, coerce}}
+Constructs and parses a translation catalog in GNU gettext format.
+\var{fp} is passed to the base class (\class{NullTranslations})
+constructor. \var{coerce} is a flag specifying whether message ids
+and message strings should be converted to Unicode when the file is
+parsed. It defaults to \code{False} for backward compatibility.
+\end{methoddesc}
+
 \subsubsection{Solaris message catalog support}

 The Solaris operating system defines its own binary
......
@@ -50,8 +50,10 @@ import copy, os, re, struct, sys
 from errno import ENOENT


-__all__ = ["bindtextdomain","textdomain","gettext","dgettext",
-           "find","translation","install","Catalog"]
+__all__ = ['NullTranslations', 'GNUTranslations', 'Catalog',
+           'find', 'translation', 'install', 'textdomain', 'bindtextdomain',
+           'dgettext', 'dngettext', 'gettext', 'ngettext',
+           ]

 _default_localedir = os.path.join(sys.prefix, 'share', 'locale')

@@ -170,7 +172,7 @@ def _expand_lang(locale):
 class NullTranslations:
     def __init__(self, fp=None):
         self._info = {}
-        self._charset = None
+        self._charset = 'iso-8859-1'
         self._fallback = None
         if fp is not None:
             self._parse(fp)
@@ -226,6 +228,12 @@ class GNUTranslations(NullTranslations):
     LE_MAGIC = 0x950412deL
     BE_MAGIC = 0xde120495L

+    def __init__(self, fp=None, coerce=False):
+        # Set this attribute before calling the base class constructor, since
+        # the latter calls _parse() which depends on self._coerce.
+        self._coerce = coerce
+        NullTranslations.__init__(self, fp)
+
     def _parse(self, fp):
         """Override this method to support alternative .mo formats."""
         unpack = struct.unpack
@@ -260,16 +268,22 @@ class GNUTranslations(NullTranslations):
                     # Plural forms
                     msgid1, msgid2 = msg.split('\x00')
                     tmsg = tmsg.split('\x00')
+                    if self._coerce:
+                        msgid1 = unicode(msgid1, self._charset)
+                        tmsg = [unicode(x, self._charset) for x in tmsg]
                     for i in range(len(tmsg)):
                         catalog[(msgid1, i)] = tmsg[i]
                 else:
+                    if self._coerce:
+                        msg = unicode(msg, self._charset)
+                        tmsg = unicode(tmsg, self._charset)
                     catalog[msg] = tmsg
             else:
                 raise IOError(0, 'File is corrupt', filename)
             # See if we're looking at GNU .mo conventions for metadata
-            if mlen == 0:
+            if mlen == 0 and tmsg.lower().startswith('project-id-version:'):
                 # Catalog description
-                for item in tmsg.split('\n'):
+                for item in tmsg.splitlines():
                     item = item.strip()
                     if not item:
                         continue
@@ -297,7 +311,6 @@ class GNUTranslations(NullTranslations):
                 return self._fallback.gettext(message)
             return message

-
     def ngettext(self, msgid1, msgid2, n):
         try:
             return self._catalog[(msgid1, self.plural(n))]
@@ -309,16 +322,17 @@ class GNUTranslations(NullTranslations):
             else:
                 return msgid2

-
     def ugettext(self, message):
-        try:
-            tmsg = self._catalog[message]
-        except KeyError:
+        missing = object()
+        tmsg = self._catalog.get(message, missing)
+        if tmsg is missing:
             if self._fallback:
                 return self._fallback.ugettext(message)
             tmsg = message
-        return unicode(tmsg, self._charset)
-
+        if not self._coerce:
+            return unicode(tmsg, self._charset)
+        # The msgstr is already coerced to Unicode
+        return tmsg

     def ungettext(self, msgid1, msgid2, n):
         try:
@@ -330,7 +344,10 @@ class GNUTranslations(NullTranslations):
                 tmsg = msgid1
             else:
                 tmsg = msgid2
-        return unicode(tmsg, self._charset)
+        if not self._coerce:
+            return unicode(tmsg, self._charset)
+        # The msgstr is already coerced to Unicode
+        return tmsg


 # Locate a .mo file using the gettext strategy
......