Commit c51da2b8 authored by Andrew Kuchling

#14332: provide a better explanation of junk in difflib docs

Initial patch by Alba Magallanes.
parent 2e3743cd
@@ -27,7 +27,9 @@ diffs. For comparing directories and files, see also, the :mod:`filecmp` module.
  little fancier than, an algorithm published in the late 1980's by Ratcliff and
  Obershelp under the hyperbolic name "gestalt pattern matching." The idea is to
  find the longest contiguous matching subsequence that contains no "junk"
- elements (the Ratcliff and Obershelp algorithm doesn't address junk). The same
+ elements; these "junk" elements are ones that are uninteresting in some
+ sense, such as blank lines or whitespace. (Handling junk is an
+ extension to the Ratcliff and Obershelp algorithm.) The same
  idea is then applied recursively to the pieces of the sequences to the left and
  to the right of the matching subsequence. This does not yield minimal edit
  sequences, but does tend to yield matches that "look right" to people.
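As a rough illustration of the junk handling described in this hunk, the sketch below is adapted from the example in the `find_longest_match` documentation. The sample strings and the choice to treat a blank as junk are illustrative only and are not part of this commit.

```python
from difflib import SequenceMatcher

a, b = " abcd", "abcd abcd"

# Without a junk function, the leading blank takes part in the longest match.
plain = SequenceMatcher(None, a, b)
plain_match = plain.find_longest_match(0, len(a), 0, len(b))

# Treating blanks as junk keeps them out of the anchoring match.
no_blanks = SequenceMatcher(lambda ch: ch == " ", a, b)
junk_match = no_blanks.find_longest_match(0, len(a), 0, len(b))

print(plain_match)  # Match(a=0, b=4, size=5)
print(junk_match)   # Match(a=1, b=0, size=4)
```

With blanks marked as junk, the match is anchored on `"abcd"` at the start of the second string instead of on `" abcd"` near its end.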
@@ -210,7 +212,7 @@ diffs. For comparing directories and files, see also, the :mod:`filecmp` module.
  Compare *a* and *b* (lists of strings); return a :class:`Differ`\ -style
  delta (a :term:`generator` generating the delta lines).
- Optional keyword parameters *linejunk* and *charjunk* are for filter functions
+ Optional keyword parameters *linejunk* and *charjunk* are filtering functions
  (or ``None``):
  *linejunk*: A function that accepts a single string argument, and returns
@@ -224,7 +226,7 @@ diffs. For comparing directories and files, see also, the :mod:`filecmp` module.
  *charjunk*: A function that accepts a character (a string of length 1), and
  returns if the character is junk, or false if not. The default is module-level
  function :func:`IS_CHARACTER_JUNK`, which filters out whitespace characters (a
- blank or tab; note: bad idea to include newline in this!).
+ blank or tab; it's a bad idea to include newline in this!).
  :file:`Tools/scripts/ndiff.py` is a command-line front-end to this function.
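A minimal sketch of how the two filter parameters documented above are passed to `ndiff()`. The input lines are made up for illustration; the keyword values shown are simply the documented defaults written out explicitly.

```python
from difflib import IS_CHARACTER_JUNK, ndiff, restore

a = "one\ntwo\nthree\n".splitlines(keepends=True)
b = "ore\ntree\nemu\n".splitlines(keepends=True)

# linejunk=None and charjunk=IS_CHARACTER_JUNK are the defaults; they are
# spelled out here only to show where the filter functions go.
delta = list(ndiff(a, b, linejunk=None, charjunk=IS_CHARACTER_JUNK))
print("".join(delta), end="")

# The delta carries both inputs, so either one can be recovered from it.
original = list(restore(delta, 1))
revised = list(restore(delta, 2))
```

`IS_CHARACTER_JUNK` treats only a blank or a tab as junk, which is consistent with the warning above about not including newline.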
@@ -624,6 +626,12 @@ The :class:`Differ` class has this constructor:
  length 1), and returns true if the character is junk. The default is ``None``,
  meaning that no character is considered junk.
+ These junk-filtering functions speed up matching to find
+ differences and do not cause any differing lines or characters to
+ be ignored. Read the description of the
+ :meth:`~SequenceMatcher.find_longest_match` method's *isjunk*
+ parameter for an explanation.
  :class:`Differ` objects are used (deltas generated) via a single method:
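The point made by the added paragraph can be checked with a small sketch: even with both junk filters enabled, a removed blank line and a whitespace-only change still appear in the delta. The sample lines are hypothetical.

```python
from difflib import Differ, IS_CHARACTER_JUNK, IS_LINE_JUNK, restore

a = ["alpha\n", "\n", "beta  gamma\n"]
b = ["alpha\n", "beta gamma\n"]

# Both filters are enabled, yet junk only steers how lines and characters
# are matched up; it does not hide differences from the result.
differ = Differ(linejunk=IS_LINE_JUNK, charjunk=IS_CHARACTER_JUNK)
delta = list(differ.compare(a, b))
print("".join(delta), end="")

# Lines marked "- " or "+ " are reported differences.
changed = [line for line in delta if line[:2] in ("- ", "+ ")]
```

Because nothing is dropped, `restore(delta, 1)` and `restore(delta, 2)` still reproduce both inputs exactly.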
......
@@ -853,10 +853,9 @@ class Differ:
  and return true iff the string is junk. The module-level function
  `IS_LINE_JUNK` may be used to filter out lines without visible
  characters, except for at most one splat ('#'). It is recommended
- to leave linejunk None; as of Python 2.3, the underlying
- SequenceMatcher class has grown an adaptive notion of "noise" lines
- that's better than any static definition the author has ever been
- able to craft.
+ to leave linejunk None; the underlying SequenceMatcher class has
+ an adaptive notion of "noise" lines that's better than any static
+ definition the author has ever been able to craft.
  - `charjunk`: A function that should accept a string of length 1. The
  module-level function `IS_CHARACTER_JUNK` may be used to filter out
@@ -1299,17 +1298,18 @@ def ndiff(a, b, linejunk=None, charjunk=IS_CHARACTER_JUNK):
  Compare `a` and `b` (lists of strings); return a `Differ`-style delta.
  Optional keyword parameters `linejunk` and `charjunk` are for filter
- functions (or None):
+ functions, or can be None:
- - linejunk: A function that should accept a single string argument, and
+ - linejunk: A function that should accept a single string argument and
  return true iff the string is junk. The default is None, and is
- recommended; as of Python 2.3, an adaptive notion of "noise" lines is
- used that does a good job on its own.
+ recommended; the underlying SequenceMatcher class has an adaptive
+ notion of "noise" lines.
- - charjunk: A function that should accept a string of length 1. The
- default is module-level function IS_CHARACTER_JUNK, which filters out
- whitespace characters (a blank or tab; note: bad idea to include newline
- in this!).
+ - charjunk: A function that accepts a character (string of length
+ 1), and returns true iff the character is junk. The default is
+ the module-level function IS_CHARACTER_JUNK, which filters out
+ whitespace characters (a blank or tab; note: it's a bad idea to
+ include newline in this!).
  Tools/scripts/ndiff.py is a command-line front-end to this function.
@@ -1680,7 +1680,7 @@ class HtmlDiff(object):
  tabsize -- tab stop spacing, defaults to 8.
  wrapcolumn -- column number where lines are broken and wrapped,
  defaults to None where lines are not wrapped.
- linejunk,charjunk -- keyword arguments passed into ndiff() (used to by
+ linejunk,charjunk -- keyword arguments passed into ndiff() (used by
  HtmlDiff() to generate the side by side HTML differences). See
  ndiff() documentation for argument default values and descriptions.
  """
......