Commit 7060380d, authored by INADA Naoki, committed by GitHub

bpo-31672: Fix string.Template accidentally matched non-ASCII identifiers (GH-3872)

The pattern `[a-z]` with the `IGNORECASE` flag can match some non-ASCII characters.

The straightforward solution is to use the `IGNORECASE | ASCII` flags.
But users may subclass `Template` and override only `idpattern`, so we want to
avoid changing `Template.flags`.

So this commit uses the local flag `-i` for `idpattern` and changes `[a-z]` to `[a-zA-Z]`.
(cherry picked from commit b22273ec)
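
The behaviour described in the commit message can be reproduced with `re` alone. The following is a minimal sketch, not part of the commit: the names `OLD_ID`, `NEW_ID` and `ASCII_ID` are made up for illustration, and the two sample characters are the ones used in the commit's new test.

```python
import re

DOTLESS_I = "\u0131"        # LATIN SMALL LETTER DOTLESS I
CAPITAL_I_DOT = "\u0130"    # LATIN CAPITAL LETTER I WITH DOT ABOVE

# Old default idpattern, compiled with the default Template flags.
OLD_ID = re.compile(r"[_a-z][_a-z0-9]*", re.IGNORECASE)

# New default idpattern: the local -i flag switches case-insensitivity off
# inside the group, so [a-zA-Z] lists both cases explicitly.
NEW_ID = re.compile(r"(?-i:[_a-zA-Z][_a-zA-Z0-9]*)", re.IGNORECASE)

# The alternative the commit message mentions but does not apply to Template.flags.
ASCII_ID = re.compile(r"[_a-z][_a-z0-9]*", re.IGNORECASE | re.ASCII)

for ch in (DOTLESS_I, CAPITAL_I_DOT):
    print(
        f"U+{ord(ch):04X}",
        "old:", bool(OLD_ID.fullmatch(ch)),
        "new:", bool(NEW_ID.fullmatch(ch)),
        "ascii:", bool(ASCII_ID.fullmatch(ch)),
    )
```

Only the old pattern accepts the two non-ASCII letters; ordinary identifiers such as `Who_1` are accepted by all three.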
parent 6234e906
@@ -746,8 +746,18 @@ to parse template strings. To do this, you can override these class attributes:
 
 * *idpattern* -- This is the regular expression describing the pattern for
   non-braced placeholders (the braces will be added automatically as
   appropriate). The default value is the regular expression
-  ``[_a-z][_a-z0-9]*``.
+  ``(?-i:[_a-zA-Z][_a-zA-Z0-9]*)``.
+
+  .. note::
+
+     Since default *flags* is ``re.IGNORECASE``, pattern ``[a-z]`` can match
+     with some non-ASCII characters. That's why we use local ``-i`` flag here.
+
+     While *flags* is kept to ``re.IGNORECASE`` for backward compatibility,
+     you can override it to ``0`` or ``re.IGNORECASE | re.ASCII`` when
+     subclassing.
 
 * *flags* -- The regular expression flags that will be applied when compiling
   the regular expression used for recognizing substitutions. The default value
......
@@ -78,7 +78,11 @@ class Template(metaclass=_TemplateMetaclass):
     """A string class for supporting $-substitutions."""
 
     delimiter = '$'
-    idpattern = r'[_a-z][_a-z0-9]*'
+    # r'[a-z]' matches to non-ASCII letters when used with IGNORECASE,
+    # but without ASCII flag. We can't add re.ASCII to flags because of
+    # backward compatibility. So we use local -i flag and [a-zA-Z] pattern.
+    # See https://bugs.python.org/issue31672
+    idpattern = r'(?-i:[_a-zA-Z][_a-zA-Z0-9]*)'
     flags = _re.IGNORECASE
 
     def __init__(self, template):
......
@@ -271,6 +271,12 @@ class TestTemplate(unittest.TestCase):
         raises(ValueError, s.substitute, dict(who='tim'))
         s = Template('$who likes $100')
         raises(ValueError, s.substitute, dict(who='tim'))
+        # Template.idpattern should match to only ASCII characters.
+        # https://bugs.python.org/issue31672
+        s = Template("$who likes $\u0131")  # (DOTLESS I)
+        raises(ValueError, s.substitute, dict(who='tim'))
+        s = Template("$who likes $\u0130")  # (LATIN CAPITAL LETTER I WITH DOT ABOVE)
+        raises(ValueError, s.substitute, dict(who='tim'))
 
     def test_idpattern_override(self):
         class PathPattern(Template):
......
``idpattern`` in ``string.Template`` matched some non-ASCII characters. Now
it uses ``-i`` regular expression local flag to avoid non-ASCII characters.
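
As a rough illustration of the user-visible effect, the sketch below mirrors the new test and the documentation note about overriding *flags* when subclassing. It assumes an interpreter that already includes this fix. The helper `is_valid` and the subclass `AsciiTemplate` are hypothetical names, not part of the commit.

```python
import re
from string import Template


def is_valid(text, template_cls=Template):
    """Return True if every placeholder in `text` parses and substitutes cleanly."""
    try:
        template_cls(text).substitute(who="tim", WHO="tim")
    except ValueError:
        # Raised for an invalid placeholder such as "$" followed by a
        # character that idpattern does not accept.
        return False
    return True


class AsciiTemplate(Template):
    # Hypothetical subclass: keeps the old-style idpattern but makes it
    # ASCII-only through the flags, as the documentation note suggests.
    idpattern = r"[_a-z][_a-z0-9]*"
    flags = re.IGNORECASE | re.ASCII


print(is_valid("$who likes pizza"))              # plain ASCII identifier
print(is_valid("$who likes $\u0131"))            # DOTLESS I after the delimiter
print(is_valid("$WHO likes $\u0130", AsciiTemplate))
```

With the fix in place, a `$` followed by either non-ASCII letter is an invalid placeholder, so `substitute` raises `ValueError` instead of treating the letter as an identifier.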