Commit a49fe8b7 authored by Patrick Maupin

Several changes to round-tripping tool:

  - Rebased against master
  - Renamed from anti8 to rtrip
  - Defaults to round-tripping the stdlib
  - Shows usage if bad parameters given
  - Creates files with AST dumps if round-tripping fails
parent 9dfc534e
@@ -134,3 +134,29 @@ class ExplicitNodeVisitor(ast.NodeVisitor):
        method = 'visit_' + node.__class__.__name__
        visitor = getattr(self, method, abort)
        return visitor(node)


def allow_ast_comparison():
    """This ugly little monkey-patcher adds in a helper class
    to all the AST node types.  This helper class allows
    eq/ne comparisons to work, so that entire trees can
    be easily compared by Python's comparison machinery.
    Used by the anti8 functions to compare old and new ASTs.
    Could also be used by the test library.
    """

    class CompareHelper(object):
        def __eq__(self, other):
            return type(self) == type(other) and vars(self) == vars(other)

        def __ne__(self, other):
            return type(self) != type(other) or vars(self) != vars(other)

    for item in vars(ast).values():
        if type(item) != type:
            continue
        if issubclass(item, ast.AST):
            try:
                item.__bases__ = tuple(list(item.__bases__) + [CompareHelper])
            except TypeError:
                pass
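
The monkey-patch above relies on being able to reassign `__bases__` on the `ast` node classes, and it silently skips any class where that raises `TypeError`. As a rough alternative sketch (not part of this commit), the same structural comparison can be written as a plain recursive function. The name `ast_equal` is hypothetical and used only for illustration; it compares node fields via `ast.iter_fields`, which leaves out position attributes such as `lineno` and `col_offset`.

```python
import ast


def ast_equal(left, right):
    """Return True if two AST fragments have the same structure.

    Hypothetical helper for illustration; not part of astor.
    """
    if type(left) is not type(right):
        return False
    if isinstance(left, ast.AST):
        # iter_fields yields only the node's declared fields, so
        # line and column attributes never take part in the comparison.
        lfields = dict(ast.iter_fields(left))
        rfields = dict(ast.iter_fields(right))
        return (lfields.keys() == rfields.keys() and
                all(ast_equal(lfields[k], rfields[k]) for k in lfields))
    if isinstance(left, list):
        return (len(left) == len(right) and
                all(ast_equal(a, b) for a, b in zip(left, right)))
    # Leaves: strings, numbers, None, and so on.
    return left == right
```

With this approach no classes are modified, at the cost of not being able to write `srcast != dstast` directly.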
@@ -5,47 +5,57 @@ Part of the astor library for Python AST manipulation.
License: 3-clause BSD
Copyright 2015 (c) Patrick Maupin
Copyright (c) 2015 Patrick Maupin
Usage:
python -m astor.anti8 [readonly] <srcdir>
python -m astor.rtrip [readonly] [<srcdir>]
This will create a mirror directory named tmp_anti8 and will
If readonly is specified, then the source will be tested,
but no files will be written.
If srcdir is not specified, the standard library will be used.
This will create a mirror directory named tmp_rtrip and will
recursively round-trip all the Python source from the srcdir
into the tmp_anti8 dir, after compiling it and then reconstituting
it through codegen.
into the tmp_rtrip dir, after compiling it and then reconstituting
it through code_gen.to_source.
The purpose of rtrip is to place Python code into a canonical form.
This is useful both for functional testing of astor, and for
validating code edits.
For example, if you make manual edits for PEP8 compliance,
you can diff the rtrip output of the original code against
the rtrip output of the edited code, to ensure that you
didn't make any functional changes.
For testing astor itself, it is useful to point to a big codebase,
e.g::
The purpose of anti8 is to place Python code into a canonical form --
that just happens to be about as far away from PEP 8 as you can get.
python -m astor.rtrip
How is this possibly useful?
to round-trip the standard library.
Well, for a start, since it is a canonical form, you can compare the anti8
representation of a source tree against the anti8 representation of the
same tree after a PEP8 tool was run on it.
If any round-tripped files fail to be built or to match, the
tmp_rtrip directory will also contain fname.srcdmp and fname.dstdmp,
which are textual representations of the ASTs.
Or, maybe more importantly, after manual edits were made in the name
of PEP8. Trust, but verify.
Note 1: The canonical form is only canonical for a given version of
Note 1:
The canonical form is only canonical for a given version of
this module and the astor toolbox. It is not guaranteed to
be stable. The only desired guarantee is that two source modules
that parse to the same AST will be converted back into the same
canonical form.
Note 2: This tool WILL TRASH the tmp_anti8 directory -- as far as it is
concerned, it OWNS that directory.
Note 2:
This tool WILL TRASH the tmp_rtrip directory (unless readonly
is specified) -- as far as it is concerned, it OWNS that directory.
Note 3: This tool WILL CRASH if you don't give it exactly one parameter
on the command line -- the top of the tree you want to apply
anti8 to. You can read the traceback and figure this out, right?
Note 4: I lied a little bit in notes 2 and 3. You can also pass "readonly" as
a command line option for readonly (non-destructive mode).
This is primarily useful for testing astor itself.
Note 5: Why is it "readonly" and not "-r"? Because python -m slurps
Note 3: Why is it "readonly" and not "-r"? Because python -m slurps
all the thingies starting with the dash.
"""
@@ -56,33 +66,21 @@ import shutil
import logging
#Avoid import loops
from .misc import parsefile, striplinecol, pyfiles
from .codegen import to_source
def dump(top, dict=dict, list=list, type=type, isinstance=isinstance, len=len):
def dump(node, indent):
if isinstance(node, dict):
for key, value in sorted(node.items()):
result.append('%s%s' % (indent, key))
dump(value, indent+' ')
elif isinstance(node, list):
for i, value in enumerate(node):
result.append('%s[%s]' % (indent, i))
dump(value, indent+' ')
else:
result.append('%s%s' % (indent, repr(node)))
result = []
dump(top, '')
return '\n'.join(result)
def convert(srctree, dsttree='tmp_anti8', readonly=False):
from .code_gen import to_source
from .file_util import code_to_ast
from .node_util import allow_ast_comparison, dump_tree, strip_tree
def convert(srctree, dsttree='tmp_rtrip', readonly=False):
"""Walk the srctree, and convert/copy all python files
into the dsttree
"""
allow_ast_comparison()
parse_file = code_to_ast.parse_file
find_py_files = code_to_ast.find_py_files
srctree = os.path.normpath(srctree)
if not readonly:
@@ -91,13 +89,17 @@ def convert(srctree, dsttree='tmp_anti8', readonly=False):
logging.info('Trashing ' + dsttree)
shutil.rmtree(dsttree, True)
unknown_src_nodes = set()
unknown_dst_nodes = set()
badfiles = set()
broken = []
#TODO: When issue #26 resolved, remove UnicodeDecodeError
handled_exceptions = SyntaxError, UnicodeDecodeError
oldpath = ''
for srcpath, fname in pyfiles(srctree, None if readonly else dsttree):
allfiles = find_py_files(srctree, None if readonly else dsttree)
for srcpath, fname in allfiles:
# Create destination directory
if not readonly and srcpath != oldpath:
oldpath = srcpath
@@ -110,7 +112,7 @@ def convert(srctree, dsttree='tmp_anti8', readonly=False):
srcfname = os.path.join(srcpath, fname)
logging.info('Converting ' + srcfname)
try:
srcast = parsefile(srcfname)
srcast = parse_file(srcfname)
except handled_exceptions:
badfiles.add(srcfname)
continue
@@ -119,38 +121,91 @@ def convert(srctree, dsttree='tmp_anti8', readonly=False):
if not readonly:
dstfname = os.path.join(dstpath, fname)
with open(dstfname, 'w') as f:
f.write(dsttxt)
try:
with open(dstfname, 'w') as f:
f.write(dsttxt)
except UnicodeEncodeError:
badfiles.add(dstfname)
# As a sanity check, make sure that ASTs themselves
# round-trip OK
dstast = ast.parse(dsttxt) if readonly else parsefile(dstfname)
srcast = striplinecol(srcast)
dstast = striplinecol(dstast)
try:
dstast = ast.parse(dsttxt) if readonly else parse_file(dstfname)
except SyntaxError:
dstast = []
unknown_src_nodes.update(strip_tree(srcast))
unknown_dst_nodes.update(strip_tree(dstast))
if srcast != dstast:
broken.append((srcfname, srcast, dstast))
srcdump = dump_tree(srcast)
dstdump = dump_tree(dstast)
bad = srcdump != dstdump
logging.warning(' calculating dump -- %s' % ('bad' if bad else 'OK'))
if bad:
broken.append(srcfname)
if not readonly:
try:
with open(dstfname[:-3] +'.srcdmp', 'w') as f:
f.write(srcdump)
except UnicodeEncodeError:
badfiles.add(dstfname[:-3] +'.srcdmp')
try:
with open(dstfname[:-3] +'.dstdmp', 'w') as f:
f.write(dstdump)
except UnicodeEncodeError:
badfiles.add(dstfname[:-3] +'.dstdmp')
if badfiles:
logging.warning('')
logging.warning('Files not processed due to syntax errors:')
logging.warning('\nFiles not processed due to syntax errors:')
for fname in sorted(badfiles):
logging.warning(' ' + fname)
logging.warning(' %s' % fname)
if broken:
logging.warning('')
logging.warning('Files failed to round-trip to AST:')
for i, (fname, srcast, dstast) in enumerate(sorted(broken)):
logging.warning(' ' + fname)
with open('1_bad_anti8_%d.txt' % i, 'w') as f:
f.write(dump(srcast))
with open('2_bad_anti8_%d.txt' % i, 'w') as f:
f.write(dump(dstast))
logging.warning('\nFiles failed to round-trip to AST:')
for srcfname in broken:
logging.warning(' %s' % srcfname)
ok_to_strip = set(['col_offset', '_precedence', '_use_parens', 'lineno'])
bad_nodes = (unknown_dst_nodes | unknown_src_nodes) - ok_to_strip
if bad_nodes:
logging.error('\nERROR -- UNKNOWN NODES STRIPPED: %s' % bad_nodes)
logging.info('\n')
if __name__ == '__main__':
readonly = 'readonly' in sys.argv
import textwrap
args = sys.argv[1:]
readonly = 'readonly' in args
if readonly:
sys.argv.remove('readonly')
args.remove('readonly')
if not args:
args = [os.path.dirname(textwrap.__file__)]
msg = "Too many arguments" if len(args) != 1 else (
"%s is not a directory" % args[0] if not os.path.isdir(args[0])
else "")
if msg:
raise SystemExit(textwrap.dedent("""
Error: %s
Usage:
python -m astor.rtrip [readonly] [<srcdir>]
If readonly is specified, then the source will be tested,
but no files will be written.
If srcdir is not specified, the standard library will be used.
This will create a mirror directory named tmp_rtrip and will
recursively round-trip all the Python source from the srcdir
into the tmp_rtrip dir, after compiling it and then reconstituting
it through code_gen.to_source.
""") % msg)
srctree, = sys.argv[1:]
logging.basicConfig(format='%(msg)s', level=logging.INFO)
if convert(srctree, readonly=readonly):
if convert(args[0], readonly=readonly):
raise SystemExit('\nWARNING: Not all files converted\n')
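
The sanity check inside `convert` can be illustrated on a single source string. The sketch below is an assumption-laden stand-in, not astor's code: it uses the standard library's `ast.unparse` (Python 3.9+) in place of `code_gen.to_source`, and `ast.dump` in place of `dump_tree`. The name `round_trips` is hypothetical.

```python
import ast


def round_trips(source):
    """Return True if source -> AST -> source -> AST is stable.

    Hypothetical helper for illustration; not part of astor.
    """
    src_ast = ast.parse(source)
    # ast.unparse stands in for astor's to_source in this sketch.
    regenerated = ast.unparse(src_ast)
    try:
        dst_ast = ast.parse(regenerated)
    except SyntaxError:
        # Regenerated code that fails to parse counts as a failure,
        # much as convert() falls back to an empty tree.
        return False
    # ast.dump omits line and column attributes by default, so only
    # the structure of the two trees is compared.
    return ast.dump(src_ast) == ast.dump(dst_ast)
```

Comparing dumps rather than the trees themselves mirrors the second comparison in the diff, where `srcdump != dstdump` decides whether a file is reported as broken.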
@@ -211,29 +211,50 @@ Functions
Command line utilities
--------------------------
anti8
rtrip
''''''
There is currently one command-line utility::
python -m astor.anti8 [readonly] <srcdir>
python -m astor.rtrip [readonly] [<srcdir>]
This will create a mirror directory named tmp_anti8 and will
This utility tests round-tripping of Python source to AST
and back to source.
.. versionadded:: 0.6
If readonly is specified, then the source will be tested,
but no files will be written.
If srcdir is not specified, the standard library will be used.
This will create a mirror directory named tmp_rtrip and will
recursively round-trip all the Python source from the srcdir
into the tmp_anti8 dir, after compiling it and then reconstituting
it through codegen.
into the tmp_rtrip dir, after compiling it and then reconstituting
it through code_gen.to_source.
The purpose of rtrip is to place Python code into a canonical form.
This is useful both for functional testing of astor, and for
validating code edits.
For example, if you make manual edits for PEP8 compliance,
you can diff the rtrip output of the original code against
the rtrip output of the edited code, to ensure that you
didn't make any functional changes.
For testing astor itself, it is useful to point to a big codebase,
e.g::
The purpose of anti8 is to place Python code into a canonical form --
that just happens to be about as far away from PEP 8 as you can get.
python -m astor.rtrip
How is this possibly useful?
to round-trip the standard library.
Well, for a start, since it is a canonical form, you can compare the anti8
representation of a source tree against the anti8 representation of the
same tree after a PEP8 tool was run on it.
If any round-tripped files fail to be built or to match, the
tmp_rtrip directory will also contain fname.srcdmp and fname.dstdmp,
which are textual representations of the ASTs.
Or, maybe more importantly, after manual edits were made in the name
of PEP8. Trust, but verify.
Note 1:
The canonical form is only canonical for a given version of
@@ -243,18 +264,8 @@ Note 1:
canonical form.
Note 2:
This tool WILL TRASH the tmp_anti8 directory -- as far as it is
concerned, it OWNS that directory.
Note 3:
This tool WILL CRASH if you don't give it exactly one parameter
on the command line -- the top of the tree you want to apply
anti8 to. You can read the traceback and figure this out, right?
Note 4:
I lied a little bit in notes 2 and 3. You can also pass **readonly**
as a command line option for readonly (non-destructive mode).
This is primarily useful for testing astor itself.
This tool WILL TRASH the tmp_rtrip directory (unless readonly
is specified) -- as far as it is concerned, it OWNS that directory.
.. _GitHub: https://github.com/berkerpeksag/astor/
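
The edit-validation workflow described in the rtrip documentation above (diffing canonical forms before and after a PEP8 cleanup) rests on one idea: two sources that parse to the same AST are functionally identical. A minimal sketch of that idea, using only the standard library rather than rtrip itself, might look like the following; `same_ast` is a hypothetical name.

```python
import ast


def same_ast(original, edited):
    """Return True if both source strings parse to the same AST.

    Hypothetical helper for illustration; not part of astor.
    """
    # ast.dump omits position attributes by default, so whitespace
    # and line-break changes do not affect the result.
    return ast.dump(ast.parse(original)) == ast.dump(ast.parse(edited))


# A formatting-only edit: spacing and line breaks change, code does not.
original = "x=[1,2,\n   3]\ny = x [0]\n"
edited = "x = [1, 2, 3]\ny = x[0]\n"
```

Comments are not part of the AST, so a check like this does not notice comment edits; docstrings are ordinary string nodes and are compared.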