Commit 32eddc1b authored by Serhiy Storchaka

Issue #16203: Add re.fullmatch() function and regex.fullmatch() method,
which anchor the pattern at both ends of the string to match.

Original patch by Matthew Barnett.
parent 3ed82c55
@@ -481,7 +481,7 @@ form.
 .. note::

    The compiled versions of the most recent patterns passed to
-   :func:`re.match`, :func:`re.search` or :func:`re.compile` are cached, so
+   :func:`re.compile` and the module-level matching functions are cached, so
    programs that use only a few regular expressions at a time needn't worry
    about compiling regular expressions.
@@ -584,6 +584,16 @@ form.
    instead (see also :ref:`search-vs-match`).

+.. function:: fullmatch(pattern, string, flags=0)
+
+   If the whole *string* matches the regular expression *pattern*, return a
+   corresponding :ref:`match object <match-objects>`. Return ``None`` if the
+   string does not match the pattern; note that this is different from a
+   zero-length match.
+
+   .. versionadded:: 3.4
+
 .. function:: split(pattern, string, maxsplit=0, flags=0)

    Split *string* by the occurrences of *pattern*. If capturing parentheses are
@@ -778,6 +788,24 @@ attributes:
    :meth:`~regex.search` instead (see also :ref:`search-vs-match`).

+.. method:: regex.fullmatch(string[, pos[, endpos]])
+
+   If the whole *string* matches this regular expression, return a corresponding
+   :ref:`match object <match-objects>`. Return ``None`` if the string does not
+   match the pattern; note that this is different from a zero-length match.
+
+   The optional *pos* and *endpos* parameters have the same meaning as for the
+   :meth:`~regex.search` method.
+
+   >>> pattern = re.compile("o[gh]")
+   >>> pattern.fullmatch("dog") # No match as "o" is not at the start of "dog".
+   >>> pattern.fullmatch("ogre") # No match as not the full string matches.
+   >>> pattern.fullmatch("doggie", 1, 3) # Matches within given limits.
+   <_sre.SRE_Match object at ...>
+
+   .. versionadded:: 3.4
+
 .. method:: regex.split(string, maxsplit=0)

    Identical to the :func:`split` function, using the compiled pattern.
......
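As a hedged illustration of the behaviour documented in the hunk above (it assumes Python 3.4 or later, and the variable names are invented for this sketch), the following compares `fullmatch()` with `search()` and shows why a zero-length match is not the same as `None`:

```python
import re

pattern = re.compile("o[gh]")

# search() finds "og" anywhere in the string; fullmatch() requires the
# pattern to account for the whole string.
found = pattern.search("dog")
whole = pattern.fullmatch("dog")

# pos/endpos narrow the region that has to be matched in full.
limited = pattern.fullmatch("doggie", 1, 3)

# A zero-length match is still a match object, not None.
empty_ok = re.fullmatch("a*", "")
empty_fail = re.fullmatch("a+", "")

print(found.span(), whole, limited.span(), empty_ok.span(), empty_fail)
```

Code that tests the result should therefore use `is None` rather than checking the length of the matched text.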
@@ -647,6 +647,13 @@ behaviours has been improved substantially by the underlying changes in
 the :mod:`inspect` module.

+re
+--
+
+Added :func:`re.fullmatch` function and :meth:`regex.fullmatch` method,
+which anchor the pattern at both ends of the string to match.
+(Contributed by Matthew Barnett in :issue:`16203`.)
+
 resource
 --------
......
@@ -85,16 +85,17 @@ resulting RE will match the second character.
     \\ Matches a literal backslash.

 This module exports the following functions:
-    match    Match a regular expression pattern to the beginning of a string.
-    search   Search a string for the presence of a pattern.
-    sub      Substitute occurrences of a pattern found in a string.
-    subn     Same as sub, but also return the number of substitutions made.
-    split    Split a string by the occurrences of a pattern.
-    findall  Find all occurrences of a pattern in a string.
-    finditer Return an iterator yielding a match object for each match.
-    compile  Compile a pattern into a RegexObject.
-    purge    Clear the regular expression cache.
-    escape   Backslash all non-alphanumerics in a string.
+    match     Match a regular expression pattern to the beginning of a string.
+    fullmatch Match a regular expression pattern to all of a string.
+    search    Search a string for the presence of a pattern.
+    sub       Substitute occurrences of a pattern found in a string.
+    subn      Same as sub, but also return the number of substitutions made.
+    split     Split a string by the occurrences of a pattern.
+    findall   Find all occurrences of a pattern in a string.
+    finditer  Return an iterator yielding a match object for each match.
+    compile   Compile a pattern into a RegexObject.
+    purge     Clear the regular expression cache.
+    escape    Backslash all non-alphanumerics in a string.

 Some of the functions in this module takes flags as optional parameters:
     A ASCII For string patterns, make \w, \W, \b, \B, \d, \D
@@ -123,7 +124,7 @@ import sre_compile
 import sre_parse

 # public symbols
-__all__ = [ "match", "search", "sub", "subn", "split", "findall",
+__all__ = [ "match", "fullmatch", "search", "sub", "subn", "split", "findall",
     "compile", "purge", "template", "escape", "A", "I", "L", "M", "S", "X",
     "U", "ASCII", "IGNORECASE", "LOCALE", "MULTILINE", "DOTALL", "VERBOSE",
     "UNICODE", "error" ]
@@ -154,6 +155,11 @@ def match(pattern, string, flags=0):
     a match object, or None if no match was found."""
     return _compile(pattern, flags).match(string)

+def fullmatch(pattern, string, flags=0):
+    """Try to apply the pattern to all of the string, returning
+    a match object, or None if no match was found."""
+    return _compile(pattern, flags).fullmatch(string)
+
 def search(pattern, string, flags=0):
     """Scan through string looking for a match to the pattern, returning
     a match object, or None if no match was found."""
......
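For interpreters that predate this change, a common workaround is to wrap the pattern and anchor it explicitly. The helper below is a hypothetical sketch, not part of this commit: the name `fullmatch_compat` is invented, it handles `str` patterns only, and it assumes the pattern carries no inline global flags such as `(?i)`, which newer Python versions reject when they are not at the very start of the expression.

```python
import re


def fullmatch_compat(pattern, string, flags=0):
    """Hypothetical fallback approximating re.fullmatch() with re.match()."""
    # The non-capturing group keeps top-level alternation inside the anchor;
    # \Z matches only at the very end of the string (unlike "$", which also
    # matches just before a trailing newline).
    return re.match(r"(?:%s)\Z" % pattern, string, flags)


print(fullmatch_compat(r"a|ab", "ab").span())
print(fullmatch_compat(r"abc", "abc\n"))
```

The built-in `fullmatch()` avoids the string rewriting entirely by passing a flag to the matching engine, which is what the C changes in this commit implement.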
@@ -349,6 +349,36 @@ class ReTests(unittest.TestCase):
                          (None, 'b', None))
         self.assertEqual(pat.match('ac').group(1, 'b2', 3), ('a', None, 'c'))

+    def test_re_fullmatch(self):
+        # Issue 16203: Proposal: add re.fullmatch() method.
+        self.assertEqual(re.fullmatch(r"a", "a").span(), (0, 1))
+        for string in "ab", S("ab"):
+            self.assertEqual(re.fullmatch(r"a|ab", string).span(), (0, 2))
+        for string in b"ab", B(b"ab"), bytearray(b"ab"), memoryview(b"ab"):
+            self.assertEqual(re.fullmatch(br"a|ab", string).span(), (0, 2))
+        for a, b in "\xe0\xdf", "\u0430\u0431", "\U0001d49c\U0001d49e":
+            r = r"%s|%s" % (a, a + b)
+            self.assertEqual(re.fullmatch(r, a + b).span(), (0, 2))
+        self.assertEqual(re.fullmatch(r".*?$", "abc").span(), (0, 3))
+        self.assertEqual(re.fullmatch(r".*?", "abc").span(), (0, 3))
+        self.assertEqual(re.fullmatch(r"a.*?b", "ab").span(), (0, 2))
+        self.assertEqual(re.fullmatch(r"a.*?b", "abb").span(), (0, 3))
+        self.assertEqual(re.fullmatch(r"a.*?b", "axxb").span(), (0, 4))
+        self.assertIsNone(re.fullmatch(r"a+", "ab"))
+        self.assertIsNone(re.fullmatch(r"abc$", "abc\n"))
+        self.assertIsNone(re.fullmatch(r"abc\Z", "abc\n"))
+        self.assertIsNone(re.fullmatch(r"(?m)abc$", "abc\n"))
+        self.assertEqual(re.fullmatch(r"ab(?=c)cd", "abcd").span(), (0, 4))
+        self.assertEqual(re.fullmatch(r"ab(?<=b)cd", "abcd").span(), (0, 4))
+        self.assertEqual(re.fullmatch(r"(?=a|ab)ab", "ab").span(), (0, 2))
+        self.assertEqual(
+            re.compile(r"bc").fullmatch("abcd", pos=1, endpos=3).span(), (1, 3))
+        self.assertEqual(
+            re.compile(r".*?$").fullmatch("abcd", pos=1, endpos=3).span(), (1, 3))
+        self.assertEqual(
+            re.compile(r".*?").fullmatch("abcd", pos=1, endpos=3).span(), (1, 3))
+
     def test_re_groupref_exists(self):
         self.assertEqual(re.match('^(\()?([^()]+)(?(1)\))$', '(a)').groups(),
                          ('(', 'a'))
......
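A few of the cases exercised by the test above are easier to read side by side with `re.match()`. The snippet below is only an illustrative sketch (variable names are invented) and assumes Python 3.4 or later:

```python
import re

# Alternation: match() accepts the first alternative that succeeds, while
# fullmatch() keeps trying alternatives until the whole string is consumed.
prefix_span = re.match(r"a|ab", "ab").span()
full_span = re.fullmatch(r"a|ab", "ab").span()

# "$" can match just before a trailing newline, but the newline itself is
# then left unconsumed, so fullmatch() rejects the string.
dollar_match = re.match(r"abc$", "abc\n")
dollar_full = re.fullmatch(r"abc$", "abc\n")

# The body of a lookahead only has to match at its own position; it is not
# required to reach the end of the string.
lookahead_span = re.fullmatch(r"ab(?=c)cd", "abcd").span()

print(prefix_span, full_span, dollar_match.span(), dollar_full, lookahead_span)
```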
@@ -68,6 +68,10 @@ Core and Builtins
 Library
 -------

+- Issue #16203: Add re.fullmatch() function and regex.fullmatch() method,
+  which anchor the pattern at both ends of the string to match.
+  Original patch by Matthew Barnett.
+
 - Issue #13592: Improved the repr for regular expression pattern objects.
   Based on patch by Hugo Lopes Tavares.
......
@@ -4,24 +4,25 @@
  * regular expression matching engine
  *
  * partial history:
  * 1999-10-24 fl created (based on existing template matcher code)
  * 2000-03-06 fl first alpha, sort of
  * 2000-08-01 fl fixes for 1.6b1
  * 2000-08-07 fl use PyOS_CheckStack() if available
  * 2000-09-20 fl added expand method
  * 2001-03-20 fl lots of fixes for 2.1b2
  * 2001-04-15 fl export copyright as Python attribute, not global
  * 2001-04-28 fl added __copy__ methods (work in progress)
  * 2001-05-14 fl fixes for 1.5.2 compatibility
  * 2001-07-01 fl added BIGCHARSET support (from Martin von Loewis)
  * 2001-10-18 fl fixed group reset issue (from Matthew Mueller)
  * 2001-10-20 fl added split primitive; reenable unicode for 1.6/2.0/2.1
  * 2001-10-21 fl added sub/subn primitive
  * 2001-10-24 fl added finditer primitive (for 2.2 only)
  * 2001-12-07 fl fixed memory leak in sub/subn (Guido van Rossum)
  * 2002-11-09 fl fixed empty sub/subn return type
  * 2003-04-18 mvl fully support 4-byte codes
  * 2003-10-17 gn implemented non recursive scheme
+ * 2013-02-04 mrab added fullmatch primitive
  *
  * Copyright (c) 1997-2001 by Secret Labs AB. All rights reserved.
  *
@@ -558,6 +559,40 @@ pattern_match(PatternObject* self, PyObject* args, PyObject* kw)
     return pattern_new_match(self, &state, status);
 }

+static PyObject*
+pattern_fullmatch(PatternObject* self, PyObject* args, PyObject* kw)
+{
+    SRE_STATE state;
+    Py_ssize_t status;
+
+    PyObject* string;
+    Py_ssize_t start = 0;
+    Py_ssize_t end = PY_SSIZE_T_MAX;
+    static char* kwlist[] = { "pattern", "pos", "endpos", NULL };
+    if (!PyArg_ParseTupleAndKeywords(args, kw, "O|nn:fullmatch", kwlist,
+                                     &string, &start, &end))
+        return NULL;
+
+    string = state_init(&state, self, string, start, end);
+    if (!string)
+        return NULL;
+
+    state.match_all = 1;
+    state.ptr = state.start;
+
+    TRACE(("|%p|%p|FULLMATCH\n", PatternObject_GetCode(self), state.ptr));
+
+    status = sre_match(&state, PatternObject_GetCode(self));
+
+    TRACE(("|%p|%p|END\n", PatternObject_GetCode(self), state.ptr));
+    if (PyErr_Occurred())
+        return NULL;
+
+    state_fini(&state);
+
+    return pattern_new_match(self, &state, status);
+}
+
 static PyObject*
 pattern_search(PatternObject* self, PyObject* args, PyObject* kw)
 {
@@ -1223,6 +1258,10 @@ PyDoc_STRVAR(pattern_match_doc,
 "match(string[, pos[, endpos]]) -> match object or None.\n\
 Matches zero or more characters at the beginning of the string");

+PyDoc_STRVAR(pattern_fullmatch_doc,
+"fullmatch(string[, pos[, endpos]]) -> match object or None.\n\
+Matches against all of the string");
+
 PyDoc_STRVAR(pattern_search_doc,
 "search(string[, pos[, endpos]]) -> match object or None.\n\
 Scan through string looking for a match, and return a corresponding\n\
@@ -1258,6 +1297,8 @@ PyDoc_STRVAR(pattern_doc, "Compiled regular expression objects");
 static PyMethodDef pattern_methods[] = {
     {"match", (PyCFunction) pattern_match, METH_VARARGS|METH_KEYWORDS,
         pattern_match_doc},
+    {"fullmatch", (PyCFunction) pattern_fullmatch, METH_VARARGS|METH_KEYWORDS,
+        pattern_fullmatch_doc},
     {"search", (PyCFunction) pattern_search, METH_VARARGS|METH_KEYWORDS,
         pattern_search_doc},
     {"sub", (PyCFunction) pattern_sub, METH_VARARGS|METH_KEYWORDS,
......
@@ -86,6 +86,7 @@ typedef struct {
     SRE_REPEAT *repeat;
     /* hooks */
     SRE_TOLOWER_HOOK lower;
+    int match_all;
 } SRE_STATE;

 typedef struct {
......
@@ -454,17 +454,24 @@ do { \
 #define JUMP_ASSERT 12
 #define JUMP_ASSERT_NOT 13

-#define DO_JUMP(jumpvalue, jumplabel, nextpattern) \
+#define DO_JUMPX(jumpvalue, jumplabel, nextpattern, matchall) \
     DATA_ALLOC(SRE(match_context), nextctx); \
     nextctx->last_ctx_pos = ctx_pos; \
     nextctx->jump = jumpvalue; \
     nextctx->pattern = nextpattern; \
+    nextctx->match_all = matchall; \
     ctx_pos = alloc_pos; \
     ctx = nextctx; \
     goto entrance; \
     jumplabel: \
     while (0) /* gcc doesn't like labels at end of scopes */ \

+#define DO_JUMP(jumpvalue, jumplabel, nextpattern) \
+    DO_JUMPX(jumpvalue, jumplabel, nextpattern, ctx->match_all)
+
+#define DO_JUMP0(jumpvalue, jumplabel, nextpattern) \
+    DO_JUMPX(jumpvalue, jumplabel, nextpattern, 0)
+
 typedef struct {
     Py_ssize_t last_ctx_pos;
     Py_ssize_t jump;
@@ -477,6 +484,7 @@ typedef struct {
         SRE_CODE chr;
         SRE_REPEAT* rep;
     } u;
+    int match_all;
 } SRE(match_context);

 /* check if string matches the given pattern. returns <0 for
@@ -499,6 +507,7 @@ SRE(match)(SRE_STATE* state, SRE_CODE* pattern)
     ctx->last_ctx_pos = -1;
     ctx->jump = JUMP_NONE;
     ctx->pattern = pattern;
+    ctx->match_all = state->match_all;
     ctx_pos = alloc_pos;

 entrance:
@@ -571,8 +580,11 @@ entrance:
         case SRE_OP_SUCCESS:
             /* end of pattern */
             TRACE(("|%p|%p|SUCCESS\n", ctx->pattern, ctx->ptr));
-            state->ptr = ctx->ptr;
-            RETURN_SUCCESS;
+            if (!ctx->match_all || ctx->ptr == state->end) {
+                state->ptr = ctx->ptr;
+                RETURN_SUCCESS;
+            }
+            RETURN_FAILURE;

         case SRE_OP_AT:
             /* match at given position */
@@ -726,7 +738,8 @@ entrance:
             if (ctx->count < (Py_ssize_t) ctx->pattern[1])
                 RETURN_FAILURE;

-            if (ctx->pattern[ctx->pattern[0]] == SRE_OP_SUCCESS) {
+            if (ctx->pattern[ctx->pattern[0]] == SRE_OP_SUCCESS &&
+                (!ctx->match_all || ctx->ptr == state->end)) {
                 /* tail is empty. we're finished */
                 state->ptr = ctx->ptr;
                 RETURN_SUCCESS;
@@ -810,7 +823,8 @@ entrance:
                 ctx->ptr += ctx->count;
             }

-            if (ctx->pattern[ctx->pattern[0]] == SRE_OP_SUCCESS) {
+            if (ctx->pattern[ctx->pattern[0]] == SRE_OP_SUCCESS &&
+                (!ctx->match_all || ctx->ptr == state->end)) {
                 /* tail is empty. we're finished */
                 state->ptr = ctx->ptr;
                 RETURN_SUCCESS;
@@ -1082,7 +1096,7 @@ entrance:
             state->ptr = ctx->ptr - ctx->pattern[1];
             if (state->ptr < state->beginning)
                 RETURN_FAILURE;
-            DO_JUMP(JUMP_ASSERT, jump_assert, ctx->pattern+2);
+            DO_JUMP0(JUMP_ASSERT, jump_assert, ctx->pattern+2);
             RETURN_ON_FAILURE(ret);
             ctx->pattern += ctx->pattern[0];
             break;
@@ -1094,7 +1108,7 @@ entrance:
                    ctx->ptr, ctx->pattern[1]));
             state->ptr = ctx->ptr - ctx->pattern[1];
             if (state->ptr >= state->beginning) {
-                DO_JUMP(JUMP_ASSERT_NOT, jump_assert_not, ctx->pattern+2);
+                DO_JUMP0(JUMP_ASSERT_NOT, jump_assert_not, ctx->pattern+2);
                 if (ret) {
                     RETURN_ON_ERROR(ret);
                     RETURN_FAILURE;
......
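The idea behind the `match_all` flag can be sketched with a toy backtracking matcher in Python. This is not the SRE engine: `toy_match` is a hypothetical function that supports only literal characters, `.`, and a postfix `*`. It does mirror the key change in the `SRE_OP_SUCCESS` case, where reaching the end of the pattern counts as success only if the text is exhausted too, and otherwise the matcher reports failure so that earlier choices are retried.

```python
def toy_match(pattern, text, match_all=False):
    """Toy matcher; returns the end offset of the match, or None."""

    def step(p, i):
        if p == len(pattern):
            # Analogue of SRE_OP_SUCCESS: with match_all set, success also
            # requires that the whole text has been consumed.
            if not match_all or i == len(text):
                return i
            return None

        def one(j):
            # Does the pattern item at p match the character at offset j?
            return j < len(text) and pattern[p] in (".", text[j])

        if p + 1 < len(pattern) and pattern[p + 1] == "*":
            # Greedy repeat: consume as much as possible, then back off one
            # character at a time until the rest of the pattern succeeds.
            j = i
            while one(j):
                j += 1
            while j >= i:
                end = step(p + 2, j)
                if end is not None:
                    return end
                j -= 1
            return None

        return step(p + 1, i + 1) if one(i) else None

    return step(0, 0)


print(toy_match("a.*b", "axbx"))                  # prefix match is enough
print(toy_match("a.*b", "axbx", match_all=True))  # trailing "x" is left over
```

In the real engine the flag is switched off for the sub-patterns of lookaround assertions (the `DO_JUMP0` calls above), since an assertion body is not expected to run to the end of the string.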