Revised all texts concerning the ASCII flag: (1) put Unicode case first
(since that's the default), (2) made all descriptions consistent, (3) dropped mention of re.LOCALE in most places since it is not recommended.

Commit 6c4f6179 authored Aug 20, 2008 by Mark Summerfield

Showing with 54 additions and 48 deletions

re.rst Doc/library/re.rst +54 -48
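
Reviewer's sketch (not part of the commit): the revised wording in this change describes how ``\d``, ``\w``, ``\s`` and ``\b`` behave for str patterns with and without the ASCII flag, and how bytes patterns behave. The snippet below is a minimal illustration under the assumption of a current Python 3 interpreter; the sample strings and variable names are made up for the example.

```python
import re

# Digits: U+0663 and U+0664 are ARABIC-INDIC DIGIT THREE and FOUR.
arabic_digits = "\u0663\u0664"
digits_unicode = re.fullmatch(r"\d+", arabic_digits)          # Unicode default
digits_ascii = re.fullmatch(r"\d+", arabic_digits, re.ASCII)  # only [0-9]

# Word characters: "\u00e9" is LATIN SMALL LETTER E WITH ACUTE.
word = "\u00e9t\u00e9"
word_unicode = re.fullmatch(r"\w+", word)
word_ascii = re.fullmatch(r"\w+", word, re.ASCII)

# Whitespace: U+00A0 is NO-BREAK SPACE.
space_unicode = re.fullmatch(r"\s", "\u00a0")
space_ascii = re.fullmatch(r"\s", "\u00a0", re.ASCII)

# \b is the boundary between \w and \W, so it follows the same flag.
boundary_pattern = r"\b" + word + r"\b"
boundary_unicode = re.search(boundary_pattern, "un " + word)
boundary_ascii = re.search(boundary_pattern, "un " + word, re.ASCII)

# The flag affects the whole expression. An explicit [0-9] restricts only
# the digits while \w keeps matching Unicode letters (here Cyrillic).
mixed = re.compile(r"\w+-[0-9]+")
mixed_ok = mixed.fullmatch("\u043a\u043e\u0434-42")
mixed_rejected = mixed.fullmatch("\u043a\u043e\u0434-\u0664\u0662")

# Bytes patterns: \w is ASCII-only, and asking for Unicode matching fails.
bytes_word = re.fullmatch(rb"\w", b"\xe9")
try:
    re.compile(rb"\w", re.UNICODE)
    unicode_on_bytes_rejected = False
except ValueError:
    unicode_on_bytes_rejected = True
```

The ``mixed`` pattern shows the trade-off the new text hints at: writing ``[0-9]`` explicitly narrows one class without switching the whole expression to ASCII matching.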

--- a/Doc/library/re.rst
+++ b/Doc/library/re.rst
@@ -323,67 +323,78 @@ the second character.  For example, ``\$`` matches the character ``'$'``.
   Matches only at the start of the string.
 ``\b``
-   Matches the empty string, but only at the beginning or end of a word.  A word is
+   Matches the empty string, but only at the beginning or end of a word.
-   defined as a sequence of alphanumeric or underscore characters, so the end of a
+   A word is defined as a sequence of Unicode alphanumeric or underscore
-   word is indicated by whitespace or a non-alphanumeric, non-underscore character.
+   characters, so the end of a word is indicated by whitespace or a
-   Note that  ``\b`` is defined as the boundary between ``\w`` and ``\ W``, so the
+   non-alphanumeric, non-underscore Unicode character. Note that
-   precise set of characters deemed to be alphanumeric depends on the values of the
+   formally, ``\b`` is defined as the boundary between a ``\w`` and a
-   ``ASCII`` and ``LOCALE`` flags.  Inside a character range, ``\b`` represents
+   ``\W`` character (or vice versa). By default Unicode alphanumerics
-   the backspace character, for compatibility with Python's string literals.
+   are the ones used, but this can be changed by using the :const:`ASCII`
+   flag.  Inside a character range, ``\b`` represents the backspace
+   character, for compatibility with Python's string literals.
 ``\B``
   Matches the empty string, but only when it is *not* at the beginning or end of a
-   word.  This is just the opposite of ``\b``, so is also subject to the settings
+   word.  This is just the opposite of ``\b``, so word characters are
-   of ``ASCII`` and ``LOCALE`` .
+   Unicode alphanumerics or the underscore, although this can be changed
+   by using the :const:`ASCII` flag.
 ``\d``
   For Unicode (str) patterns:
-      When the :const:`ASCII` flag is specified, matches any decimal digit; this
+      Matches any Unicode digit (which includes ``[0-9]``, and also many
-      is equivalent to the set ``[0-9]``.  Otherwise, it will match whatever
+      other digit characters). If the :const:`ASCII` flag is used only
-      is classified as a digit in the Unicode character properties database
+      ``[0-9]`` is matched (but the flag affects the entire regular
-      (but this does include the standard ASCII digits and is thus a superset
+      expression, so in such cases using an explicit ``[0-9]`` may be a
-      of [0-9]).
+      better choice).
   For 8-bit (bytes) patterns:
-      Matches any decimal digit; this is equivalent to the set ``[0-9]``.
+      Matches any decimal digit; this is equivalent to ``[0-9]``.
 ``\D``
-   Matches any character which is not a decimal digit. This is the
+   Matches any character which is not a Unicode decimal digit. This is
-   opposite of ``\d`` and is therefore similarly subject to the settings of
+   the opposite of ``\d``. If the :const:`ASCII` flag is used this
-   ``ASCII`` and ``LOCALE``.
+   becomes the equivalent of ``[^0-9]`` (but the flag affects the entire
+   regular expression, so in such cases using an explicit ``[^0-9]`` may
+   be a better choice).
 ``\s``
   For Unicode (str) patterns:
-      When the :const:`ASCII` flag is specified, matches only ASCII whitespace
+      Matches Unicode whitespace characters (which includes
-      characters; this is equivalent to the set ``[ \t\n\r\f\v]``. Otherwise,
+      ``[ \t\n\r\f\v]``, and also many other characters, for example the
-      it will match this set whatever is classified as space in the Unicode
+      non-breaking spaces mandated by typography rules in many
-      character properties database (including for example the non-breaking
+      languages). If the :const:`ASCII` flag is used, only
-      spaces mandated by typography rules in many languages).
+      ``[ \t\n\r\f\v]`` is matched (but the flag affects the entire
+      regular expression, so in such cases using an explicit
+      ``[ \t\n\r\f\v]`` may be a better choice).
   For 8-bit (bytes) patterns:
      Matches characters considered whitespace in the ASCII character set;
-      this is equivalent to the set ``[ \t\n\r\f\v]``.
+      this is equivalent to ``[ \t\n\r\f\v]``.
 ``\S``
-   Matches any character which is not a whitespace character. This is the
+   Matches any character which is not a Unicode whitespace character. This is
-   opposite of ``\s`` and is therefore similarly subject to the settings of
+   the opposite of ``\s``. If the :const:`ASCII` flag is used this
-   ``ASCII`` and ``LOCALE``.
+   becomes the equivalent of ``[^ \t\n\r\f\v]`` (but the flag affects the entire
+   regular expression, so in such cases using an explicit ``[^ \t\n\r\f\v]`` may
+   be a better choice).
 ``\w``
   For Unicode (str) patterns:
-      When the :const:`ASCII` flag is specified, this is equivalent to the set
+      Matches Unicode word characters; this includes most characters
-      ``[a-zA-Z0-9_]``. Otherwise, it will match whatever is classified as
+      that can be part of a word in any language, as well as numbers and
-      alphanumeric in the Unicode character properties database (it will
+      the underscore. If the :const:`ASCII` flag is used, only
-      include most characters that can be part of a word in whatever language,
+      ``[a-zA-Z0-9_]`` is matched (but the flag affects the entire
-      as well as numbers and the underscore sign).
+      regular expression, so in such cases using an explicit
+      ``[a-zA-Z0-9_]`` may be a better choice).
   For 8-bit (bytes) patterns:
      Matches characters considered alphanumeric in the ASCII character set;
-      this is equivalent to the set ``[a-zA-Z0-9_]``. With :const:`LOCALE`, 
+      this is equivalent to ``[a-zA-Z0-9_]``.
-      it will additionally match whatever characters are defined as
-      alphanumeric for the current locale.
 ``\W``
-   Matches any character which is not an alphanumeric character. This is the
+   Matches any character which is not a Unicode word character. This is
-   opposite of ``\w`` and is therefore similarly subject to the settings of
+   the opposite of ``\w``. If the :const:`ASCII` flag is used this
-   ``ASCII`` and ``LOCALE``.
+   becomes the equivalent of ``[^a-zA-Z0-9_]`` (but the flag affects the
+   entire regular expression, so in such cases using an explicit
+   ``[^a-zA-Z0-9_]`` may be a better choice).
 ``\Z``
   Matches only at the end of the string.
@@ -471,16 +482,11 @@ form.
   matching instead of full Unicode matching. This is only meaningful for
   Unicode patterns, and is ignored for byte patterns.
-   Note that the :const:`re.U` flag still exists (as well as its synonym
+   Note that for backward compatibility, the :const:`re.U` flag still
-   :const:`re.UNICODE` and its embedded counterpart ``(?u)``), but it has
+   exists (as well as its synonym :const:`re.UNICODE` and its embedded
-   become useless in Python 3.0.
+   counterpart ``(?u)``), but these are redundant in Python 3.0 since
-   In previous Python versions, it was used to specify that 
+   matches are Unicode by default for strings (and Unicode matching
-   matching had to be Unicode dependent (the default was ASCII matching in
+   isn't allowed for bytes).
-   all circumstances). Starting from Python 3.0, the default is Unicode 
-   matching for Unicode strings (which can be changed by specifying the
-   ``'a'`` flag), and ASCII matching for 8-bit strings. Further, Unicode
-   dependent matching for 8-bit strings isn't allowed anymore and results
-   in a ValueError.
 .. data:: I