Kaydet (Commit) 21024f06 authored tarafından Ratnadeep Debnath's avatar Ratnadeep Debnath Kaydeden (comit) Nick Coghlan

bpo-16285: Update urllib quoting to RFC 3986 (#173)

* bpo-16285: Update urllib quoting to RFC 3986

urllib.parse.quote is now based on RFC 3986, and hence
includes `'~'` in the set of characters that is not escaped
by default.

Patch by Christian Theune and Ratnadeep Debnath.
üst 140792bd
...@@ -451,13 +451,17 @@ task isn't already covered by the URL parsing functions above. ...@@ -451,13 +451,17 @@ task isn't already covered by the URL parsing functions above.
.. function:: quote(string, safe='/', encoding=None, errors=None) .. function:: quote(string, safe='/', encoding=None, errors=None)
Replace special characters in *string* using the ``%xx`` escape. Letters, Replace special characters in *string* using the ``%xx`` escape. Letters,
digits, and the characters ``'_.-'`` are never quoted. By default, this digits, and the characters ``'_.-~'`` are never quoted. By default, this
function is intended for quoting the path section of URL. The optional *safe* function is intended for quoting the path section of URL. The optional *safe*
parameter specifies additional ASCII characters that should not be quoted parameter specifies additional ASCII characters that should not be quoted
--- its default value is ``'/'``. --- its default value is ``'/'``.
*string* may be either a :class:`str` or a :class:`bytes`. *string* may be either a :class:`str` or a :class:`bytes`.
.. versionchanged:: 3.7
Moved from RFC 2396 to RFC 3986 for quoting URL strings. "~" is now
included in the set of reserved characters.
The optional *encoding* and *errors* parameters specify how to deal with The optional *encoding* and *errors* parameters specify how to deal with
non-ASCII characters, as accepted by the :meth:`str.encode` method. non-ASCII characters, as accepted by the :meth:`str.encode` method.
*encoding* defaults to ``'utf-8'``. *encoding* defaults to ``'utf-8'``.
......
...@@ -103,6 +103,13 @@ The :const:`~unittest.mock.sentinel` attributes now preserve their identity ...@@ -103,6 +103,13 @@ The :const:`~unittest.mock.sentinel` attributes now preserve their identity
when they are :mod:`copied <copy>` or :mod:`pickled <pickle>`. when they are :mod:`copied <copy>` or :mod:`pickled <pickle>`.
(Contributed by Serhiy Storchaka in :issue:`20804`.) (Contributed by Serhiy Storchaka in :issue:`20804`.)
urllib.parse
------------
:func:`urllib.parse.quote` has been updated to from RFC 2396 to RFC 3986,
adding `~` to the set of characters that is never quoted by default.
(Contributed by Christian Theune and Ratnadeep Debnath in :issue:`16285`.)
Optimizations Optimizations
============= =============
......
...@@ -733,7 +733,7 @@ FF ...@@ -733,7 +733,7 @@ FF
class QuotingTests(unittest.TestCase): class QuotingTests(unittest.TestCase):
r"""Tests for urllib.quote() and urllib.quote_plus() r"""Tests for urllib.quote() and urllib.quote_plus()
According to RFC 2396 (Uniform Resource Identifiers), to escape a According to RFC 3986 (Uniform Resource Identifiers), to escape a
character you write it as '%' + <2 character US-ASCII hex value>. character you write it as '%' + <2 character US-ASCII hex value>.
The Python code of ``'%' + hex(ord(<character>))[2:]`` escapes a The Python code of ``'%' + hex(ord(<character>))[2:]`` escapes a
character properly. Case does not matter on the hex letters. character properly. Case does not matter on the hex letters.
...@@ -761,7 +761,7 @@ class QuotingTests(unittest.TestCase): ...@@ -761,7 +761,7 @@ class QuotingTests(unittest.TestCase):
do_not_quote = '' .join(["ABCDEFGHIJKLMNOPQRSTUVWXYZ", do_not_quote = '' .join(["ABCDEFGHIJKLMNOPQRSTUVWXYZ",
"abcdefghijklmnopqrstuvwxyz", "abcdefghijklmnopqrstuvwxyz",
"0123456789", "0123456789",
"_.-"]) "_.-~"])
result = urllib.parse.quote(do_not_quote) result = urllib.parse.quote(do_not_quote)
self.assertEqual(do_not_quote, result, self.assertEqual(do_not_quote, result,
"using quote(): %r != %r" % (do_not_quote, result)) "using quote(): %r != %r" % (do_not_quote, result))
......
...@@ -704,7 +704,7 @@ def unquote_plus(string, encoding='utf-8', errors='replace'): ...@@ -704,7 +704,7 @@ def unquote_plus(string, encoding='utf-8', errors='replace'):
_ALWAYS_SAFE = frozenset(b'ABCDEFGHIJKLMNOPQRSTUVWXYZ' _ALWAYS_SAFE = frozenset(b'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
b'abcdefghijklmnopqrstuvwxyz' b'abcdefghijklmnopqrstuvwxyz'
b'0123456789' b'0123456789'
b'_.-') b'_.-~')
_ALWAYS_SAFE_BYTES = bytes(_ALWAYS_SAFE) _ALWAYS_SAFE_BYTES = bytes(_ALWAYS_SAFE)
_safe_quoters = {} _safe_quoters = {}
...@@ -736,15 +736,18 @@ def quote(string, safe='/', encoding=None, errors=None): ...@@ -736,15 +736,18 @@ def quote(string, safe='/', encoding=None, errors=None):
Each part of a URL, e.g. the path info, the query, etc., has a Each part of a URL, e.g. the path info, the query, etc., has a
different set of reserved characters that must be quoted. different set of reserved characters that must be quoted.
RFC 2396 Uniform Resource Identifiers (URI): Generic Syntax lists RFC 3986 Uniform Resource Identifiers (URI): Generic Syntax lists
the following reserved characters. the following reserved characters.
reserved = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" | reserved = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" |
"$" | "," "$" | "," | "~"
Each of these characters is reserved in some component of a URL, Each of these characters is reserved in some component of a URL,
but not necessarily in all of them. but not necessarily in all of them.
Python 3.7 updates from using RFC 2396 to RFC 3986 to quote URL strings.
Now, "~" is included in the set of reserved characters.
By default, the quote function is intended for quoting the path By default, the quote function is intended for quoting the path
section of a URL. Thus, it will not encode '/'. This character section of a URL. Thus, it will not encode '/'. This character
is reserved, but in typical usage the quote function is being is reserved, but in typical usage the quote function is being
......
No preview for this file type
...@@ -249,6 +249,10 @@ Extension Modules ...@@ -249,6 +249,10 @@ Extension Modules
Library Library
------- -------
- Issue #16285: urrlib.parse.quote is now based on RFC 3986 and hence includes
'~' in the set of characters that is not quoted by default. Patch by
Christian Theune and Ratnadeep Debnath.
- bpo-29532: Altering a kwarg dictionary passed to functools.partial() - bpo-29532: Altering a kwarg dictionary passed to functools.partial()
no longer affects a partial object after creation. no longer affects a partial object after creation.
......
Markdown is supported
0% or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment