Kaydet (Commit) 7060380d authored tarafından INADA Naoki's avatar INADA Naoki Kaydeden (comit) GitHub

bpo-31672: Fix string.Template accidentally matched non-ASCII identifiers (GH-3872)

Pattern `[a-z]` with `IGNORECASE` flag can match to some non-ASCII characters.

Straightforward solution for this is using `IGNORECASE | ASCII` flag.
But users may subclass `Template` and override only `idpattern`. So we want to
avoid changing `Template.flags`.

So this commit uses local flag `-i` for `idpattern` and change `[a-z]` to `[a-zA-Z]`.
(cherry picked from commit b22273ec)
üst 6234e906
......@@ -746,8 +746,18 @@ to parse template strings. To do this, you can override these class attributes:
* *idpattern* -- This is the regular expression describing the pattern for
non-braced placeholders (the braces will be added automatically as
appropriate). The default value is the regular expression
``[_a-z][_a-z0-9]*``.
appropriate). The default value is the regular expression
``(?-i:[_a-zA-Z][_a-zA-Z0-9]*)``.
.. note::
Since default *flags* is ``re.IGNORECASE``, pattern ``[a-z]`` can match
with some non-ASCII characters. That's why we use local ``-i`` flag here.
While *flags* is kept to ``re.IGNORECASE`` for backward compatibility,
you can override it to ``0`` or ``re.IGNORECASE | re.ASCII`` when
subclassing.
* *flags* -- The regular expression flags that will be applied when compiling
the regular expression used for recognizing substitutions. The default value
......
......@@ -78,7 +78,11 @@ class Template(metaclass=_TemplateMetaclass):
"""A string class for supporting $-substitutions."""
delimiter = '$'
idpattern = r'[_a-z][_a-z0-9]*'
# r'[a-z]' matches to non-ASCII letters when used with IGNORECASE,
# but without ASCII flag. We can't add re.ASCII to flags because of
# backward compatibility. So we use local -i flag and [a-zA-Z] pattern.
# See https://bugs.python.org/issue31672
idpattern = r'(?-i:[_a-zA-Z][_a-zA-Z0-9]*)'
flags = _re.IGNORECASE
def __init__(self, template):
......
......@@ -271,6 +271,12 @@ class TestTemplate(unittest.TestCase):
raises(ValueError, s.substitute, dict(who='tim'))
s = Template('$who likes $100')
raises(ValueError, s.substitute, dict(who='tim'))
# Template.idpattern should match to only ASCII characters.
# https://bugs.python.org/issue31672
s = Template("$who likes $\u0131") # (DOTLESS I)
raises(ValueError, s.substitute, dict(who='tim'))
s = Template("$who likes $\u0130") # (LATIN CAPITAL LETTER I WITH DOT ABOVE)
raises(ValueError, s.substitute, dict(who='tim'))
def test_idpattern_override(self):
class PathPattern(Template):
......
``idpattern`` in ``string.Template`` matched some non-ASCII characters. Now
it uses ``-i`` regular expression local flag to avoid non-ASCII characters.
Markdown is supported
0% or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment