Issue #22088: Clarify base-64 alphabets and which characters are discarded

* There are only two base-64 alphabets defined by the RFCs, not three
* Due to the internal translation, plus (+) and slash (/) are never discarded
* standard_ and urlsafe_b64decode() discard characters as well

Also update the doc strings to clarify data types, based on revision 92760d2edc9e, correct the exception raised by b16decode(), and correct the parameter name for the base-85 functions.
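The three bullet points above can be checked with a short script. This is a minimal sketch that reuses the inputs from the test cases added in this commit. The variable names are illustrative only, and the behaviour shown is assumed to match the Python versions this commit targets.

```python
import base64
import binascii

# Expected bytes, taken from the test added in this commit.
expected = b'\xfb\xef\xbe\xff\xff\xff'

# '+' and '/' still decode when an alternative alphabet is given, because
# the alternative characters are translated to '+' and '/' before decoding.
with_altchars = base64.b64decode(b'++[[//]]', b'[]')
urlsafe = base64.urlsafe_b64decode(b'++--//__')

# With validate=False (the default), a character outside both alphabets
# (here a newline) is discarded by all three decoding functions.
decoders = (
    base64.b64decode,
    base64.standard_b64decode,
    base64.urlsafe_b64decode,
)
lenient = [decode(b'YWJj\nYWI=') for decode in decoders]

# With validate=True, the same input is rejected instead.
try:
    base64.b64decode(b'YWJj\nYWI=', validate=True)
    strict_error = None
except binascii.Error as exc:
    strict_error = exc
```

Only b64decode() takes the validate argument, so the strict check is shown for that function alone.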

Commit ee3074e1 authored Feb 23, 2016 by Martin Panter

Showing with 34 additions and 15 deletions

base64.rst Doc/library/base64.rst +16 -13

base64.py Lib/base64.py +0 -0

test_base64.py Lib/test/test_base64.py +18 -2

--- a/Doc/library/base64.rst
+++ b/Doc/library/base64.rst
@@ -24,8 +24,8 @@ POST request.  The encoding algorithm is not the same as the
 There are two interfaces provided by this module.  The modern interface
 supports encoding :term:`bytes-like objects <bytes-like object>` to ASCII
 :class:`bytes`, and decoding :term:`bytes-like objects <bytes-like object>` or
-strings containing ASCII to :class:`bytes`.  All three :rfc:`3548` defined
+strings containing ASCII to :class:`bytes`.  Both base-64 alphabets
-alphabets (normal, URL-safe, and filesystem-safe) are supported.
+defined in :rfc:`3548` (normal, and URL- and filesystem-safe) are supported.
 The legacy interface does not support decoding from strings, but it does
 provide functions for encoding and decoding to and from :term:`file objects
@@ -69,9 +69,10 @@ The modern interface provides:
   A :exc:`binascii.Error` exception is raised
   if *s* is incorrectly padded.
-   If *validate* is ``False`` (the default), non-base64-alphabet characters are
+   If *validate* is ``False`` (the default), characters that are neither
+   in the normal base-64 alphabet nor the alternative alphabet are
   discarded prior to the padding check.  If *validate* is ``True``,
-   non-base64-alphabet characters in the input result in a
+   these non-alphabet characters in the input result in a
   :exc:`binascii.Error`.
@@ -89,7 +90,8 @@ The modern interface provides:
 .. function:: urlsafe_b64encode(s)
-   Encode :term:`bytes-like object` *s* using a URL-safe alphabet, which
+   Encode :term:`bytes-like object` *s* using the
+   URL- and filesystem-safe alphabet, which
   substitutes ``-`` instead of ``+`` and ``_`` instead of ``/`` in the
   standard Base64 alphabet, and return the encoded :class:`bytes`.  The result
   can still contain ``=``.
@@ -97,7 +99,8 @@ The modern interface provides:
 .. function:: urlsafe_b64decode(s)
-   Decode :term:`bytes-like object` or ASCII string *s* using a URL-safe
+   Decode :term:`bytes-like object` or ASCII string *s*
+   using the URL- and filesystem-safe
   alphabet, which substitutes ``-`` instead of ``+`` and ``_`` instead of
   ``/`` in the standard Base64 alphabet, and return the decoded
   :class:`bytes`.
@@ -145,14 +148,14 @@ The modern interface provides:
   lowercase alphabet is acceptable as input.  For security purposes, the default
   is ``False``.
-   A :exc:`TypeError` is raised if *s* is
+   A :exc:`binascii.Error` is raised if *s* is
   incorrectly padded or if there are non-alphabet characters present in the
   input.
-.. function:: a85encode(s, *, foldspaces=False, wrapcol=0, pad=False, adobe=False)
+.. function:: a85encode(b, *, foldspaces=False, wrapcol=0, pad=False, adobe=False)
-   Encode the :term:`bytes-like object` *s* using Ascii85 and return the
+   Encode the :term:`bytes-like object` *b* using Ascii85 and return the
   encoded :class:`bytes`.
   *foldspaces* is an optional flag that uses the special short sequence 'y'
@@ -172,9 +175,9 @@ The modern interface provides:
   .. versionadded:: 3.4
-.. function:: a85decode(s, *, foldspaces=False, adobe=False, ignorechars=b' \\t\\n\\r\\v')
+.. function:: a85decode(b, *, foldspaces=False, adobe=False, ignorechars=b' \\t\\n\\r\\v')
-   Decode the Ascii85 encoded :term:`bytes-like object` or ASCII string *s* and
+   Decode the Ascii85 encoded :term:`bytes-like object` or ASCII string *b* and
   return the decoded :class:`bytes`.
   *foldspaces* is a flag that specifies whether the 'y' short sequence
@@ -192,9 +195,9 @@ The modern interface provides:
   .. versionadded:: 3.4
-.. function:: b85encode(s, pad=False)
+.. function:: b85encode(b, pad=False)
-   Encode the :term:`bytes-like object` *s* using base85 (as used in e.g.
+   Encode the :term:`bytes-like object` *b* using base85 (as used in e.g.
   git-style binary diffs) and return the encoded :class:`bytes`.
   If *pad* is true, the input is padded with ``b'\0'`` so its length is a

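The corrected b16decode() exception and the base-85 functions documented above can be exercised as follows. This is a sketch, not part of the commit: the helper b16_error and the payload value are hypothetical names introduced for illustration. The two invalid inputs are the ones used in the test file changed by this commit.

```python
import base64
import binascii


def b16_error(data):
    """Return the exception raised by b16decode(data), or None if it succeeds.

    Hypothetical helper, used only to capture the exception type.
    """
    try:
        base64.b16decode(data)
    except (binascii.Error, TypeError) as exc:
        return exc
    return None


# 'G' is not a base-16 digit; '010' has an odd number of digits.
bad_digit = b16_error('0102AG')
odd_length = b16_error('010')

# The first parameter of the base-85 functions is passed positionally here,
# so the rename from *s* to *b* in the documentation does not affect callers.
payload = b'base-85 demo'
a85_roundtrip = base64.a85decode(base64.a85encode(payload))
b85_roundtrip = base64.b85decode(base64.b85encode(payload))
```

Both invalid inputs are expected to produce binascii.Error rather than TypeError, which is the correction made to the b16decode() documentation.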
--- a/Lib/base64.py
+++ b/Lib/base64.py
--- a/Lib/test/test_base64.py
+++ b/Lib/test/test_base64.py
@@ -243,14 +243,26 @@ class BaseXYTestCase(unittest.TestCase):
                 (b'@@', b''),
                 (b'!', b''),
                 (b'YWJj\nYWI=', b'abcab'))
+        funcs = (
+            base64.b64decode,
+            base64.standard_b64decode,
+            base64.urlsafe_b64decode,
+        )
        for bstr, res in tests:
-            self.assertEqual(base64.b64decode(bstr), res)
+            for func in funcs:
-            self.assertEqual(base64.b64decode(bstr.decode('ascii')), res)
+                with self.subTest(bstr=bstr, func=func):
+                    self.assertEqual(func(bstr), res)
+                    self.assertEqual(func(bstr.decode('ascii')), res)
            with self.assertRaises(binascii.Error):
                base64.b64decode(bstr, validate=True)
            with self.assertRaises(binascii.Error):
                base64.b64decode(bstr.decode('ascii'), validate=True)
+        # Normal alphabet characters not discarded when alternative given
+        res = b'\xFB\xEF\xBE\xFF\xFF\xFF'
+        self.assertEqual(base64.b64decode(b'++[[//]]', b'[]'), res)
+        self.assertEqual(base64.urlsafe_b64decode(b'++--//__'), res)
    def test_b32encode(self):
        eq = self.assertEqual
        eq(base64.b32encode(b''), b'')
@@ -360,6 +372,10 @@ class BaseXYTestCase(unittest.TestCase):
           b'\x01\x02\xab\xcd\xef')
        eq(base64.b16decode(array('B', b"0102abcdef"), True),
           b'\x01\x02\xab\xcd\xef')
+        # Non-alphabet characters
+        self.assertRaises(binascii.Error, base64.b16decode, '0102AG')
+        # Incorrect "padding"
+        self.assertRaises(binascii.Error, base64.b16decode, '010')
    def test_a85encode(self):
        eq = self.assertEqual