:mod:`parser` --- Access Python parse trees
===========================================

.. module:: parser
   :synopsis: Access parse trees for Python source code.
.. moduleauthor:: Fred L. Drake, Jr. <fdrake@acm.org>
.. sectionauthor:: Fred L. Drake, Jr. <fdrake@acm.org>


.. Copyright 1995 Virginia Polytechnic Institute and State University and Fred
   L. Drake, Jr.  This copyright notice must be distributed on all copies, but
   this document otherwise may be distributed as part of the Python
   distribution.  No fee may be charged for this document in any representation,
   either on paper or electronically.  This restriction does not affect other
   elements in a distributed package in any way.

.. index:: single: parsing; Python source code

The :mod:`parser` module provides an interface to Python's internal parser and
byte-code compiler.  The primary purpose for this interface is to allow Python
code to edit the parse tree of a Python expression and create executable code
from this.  This is better than trying to parse and modify an arbitrary Python
code fragment as a string because parsing is performed in a manner identical to
the code forming the application.  It is also faster.

.. note::

   From Python 2.5 onward, it's much more convenient to cut in at the Abstract
   Syntax Tree (AST) generation and compilation stage, using the :mod:`ast`
   module.

There are a few things to note about this module which are important to making
use of the data structures created.  This is not a tutorial on editing the parse
trees for Python code, but some examples of using the :mod:`parser` module are
presented.

Most importantly, a good understanding of the Python grammar processed by the
internal parser is required.  For full information on the language syntax, refer
to :ref:`reference-index`.  The parser
itself is created from a grammar specification defined in the file
:file:`Grammar/Grammar` in the standard Python distribution.  The parse trees
stored in the ST objects created by this module are the actual output from the
internal parser when created by the :func:`expr` or :func:`suite` functions,
described below.  The ST objects created by :func:`sequence2st` faithfully
simulate those structures.  Be aware that the values of the sequences which are
considered "correct" will vary from one version of Python to another as the
formal grammar for the language is revised.  However, transporting code from one
Python version to another as source text will always allow correct parse trees
to be created in the target version, with the only restriction being that
migrating to an older version of the interpreter will not support more recent
language constructs.  The parse trees are not typically compatible from one
version to another, whereas source code has always been forward-compatible.

Each element of the sequences returned by :func:`st2list` or :func:`st2tuple`
has a simple form.  Sequences representing non-terminal elements in the grammar
always have a length greater than one.  The first element is an integer which
identifies a production in the grammar.  These integers are given symbolic names
in the C header file :file:`Include/graminit.h` and the Python module
:mod:`symbol`.  Each additional element of the sequence represents a component
of the production as recognized in the input string: these are always sequences
which have the same form as the parent.  An important aspect of this structure
which should be noted is that keywords used to identify the parent node type,
such as the keyword :keyword:`if` in an :const:`if_stmt`, are included in the
node tree without any special treatment.  For example, the :keyword:`if` keyword
is represented by the tuple ``(1, 'if')``, where ``1`` is the numeric value
associated with all :const:`NAME` tokens, including variable and function names
defined by the user.  In an alternate form returned when line number information
is requested, the same token might be represented as ``(1, 'if', 12)``, where
the ``12`` represents the line number at which the terminal symbol was found.

Terminal elements are represented in much the same way, but without any child
elements and the addition of the source text which was identified.  The example
of the :keyword:`if` keyword above is representative.  The various types of
terminal symbols are defined in the C header file :file:`Include/token.h` and
the Python module :mod:`token`.
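The node layout described above can be explored without building a real parse
tree. The following is a minimal sketch that walks a hand-written tuple tree and
lists its terminal tokens using the :mod:`token` module. The non-terminal
identifiers ``300`` and ``301`` and the helper name ``iter_terminals`` are
made up for illustration; real non-terminal values come from the grammar of the
running interpreter and differ between versions.

```python
import token

# Hand-written tree in the shape described above.  The non-terminal
# identifiers (300 and 301) are hypothetical placeholders, not real
# grammar symbols.
tree = (300,
        (301, (token.NAME, 'a')),
        (token.PLUS, '+'),
        (301, (token.NUMBER, '5')))


def iter_terminals(node):
    """Yield (token_name, source_text) for every terminal below *node*."""
    if token.ISTERMINAL(node[0]):
        # Terminal: (type, text) or (type, text, line_number).
        yield token.tok_name[node[0]], node[1]
    else:
        # Non-terminal: every element after the first is a child node.
        for child in node[1:]:
            yield from iter_terminals(child)


terminals = list(iter_terminals(tree))
print(terminals)  # [('NAME', 'a'), ('PLUS', '+'), ('NUMBER', '5')]
```

Keywords such as :keyword:`if` would appear in the output as ordinary ``NAME``
tokens, which is why the walker needs no special case for them.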

The ST objects are not required to support the functionality of this module,
but are provided for three purposes: to allow an application to amortize the
cost of processing complex parse trees, to provide a parse tree representation
which conserves memory space when compared to the Python list or tuple
representation, and to ease the creation of additional modules in C which
manipulate parse trees.  A simple "wrapper" class may be created in Python to
hide the use of ST objects.

The :mod:`parser` module defines functions for a few distinct purposes.  The
most important purposes are to create ST objects and to convert ST objects to
other representations such as parse trees and compiled code objects, but there
are also functions which serve to query the type of parse tree represented by an
ST object.


.. seealso::

   Module :mod:`symbol`
      Useful constants representing internal nodes of the parse tree.

   Module :mod:`token`
      Useful constants representing leaf nodes of the parse tree and functions for
      testing node values.


.. _creating-sts:

Creating ST Objects
-------------------

ST objects may be created from source code or from a parse tree. When creating
an ST object from source, different functions are used to create the ``'eval'``
and ``'exec'`` forms.


.. function:: expr(source)

   The :func:`expr` function parses the parameter *source* as if it were an input
   to ``compile(source, 'file.py', 'eval')``.  If the parse succeeds, an ST object
   is created to hold the internal parse tree representation, otherwise an
   appropriate exception is raised.


.. function:: suite(source)

   The :func:`suite` function parses the parameter *source* as if it were an input
   to ``compile(source, 'file.py', 'exec')``.  If the parse succeeds, an ST object
   is created to hold the internal parse tree representation, otherwise an
   appropriate exception is raised.


.. function:: sequence2st(sequence)

   This function accepts a parse tree represented as a sequence and builds an
   internal representation if possible.  If it can validate that the tree conforms
   to the Python grammar and all nodes are valid node types in the host version of
   Python, an ST object is created from the internal representation and returned
   to the caller.  If there is a problem creating the internal representation, or
   if the tree cannot be validated, a :exc:`ParserError` exception is raised.  An
   ST object created this way should not be assumed to compile correctly; normal
   compilation exceptions may still be raised when the ST object is
   passed to :func:`compilest`.  This may indicate problems not related to syntax
   (such as a :exc:`MemoryError` exception), but may also be due to constructs such
   as the result of parsing ``del f(0)``, which escapes the Python parser but is
   checked by the bytecode compiler.

   Sequences representing terminal tokens may be represented as either two-element
   lists of the form ``(1, 'name')`` or as three-element lists of the form ``(1,
   'name', 56)``.  If the third element is present, it is assumed to be a valid
   line number.  The line number may be specified for any subset of the terminal
   symbols in the input tree.
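Because line numbers may be present on only some terminals, code that compares
or rewrites sequence trees sometimes needs to normalize them first. The sketch
below is one way to do that in plain Python; it is not part of the
:mod:`parser` module. The helper name ``strip_line_numbers`` and the
non-terminal identifier ``300`` are hypothetical, and the helper assumes that
every terminal carries its source text as a string in the second position.

```python
def strip_line_numbers(node):
    """Return a copy of *node* with any line numbers removed from terminals."""
    if isinstance(node[1], str):
        # Terminal: keep only the token type and its source text.
        return (node[0], node[1])
    # Non-terminal: keep the production identifier and recurse into children.
    return (node[0],) + tuple(strip_line_numbers(child) for child in node[1:])


# Line numbers are given for only a subset of the terminals.
mixed = (300, (1, 'name', 56), (14, '+'), (2, '5', 56))
normalized = strip_line_numbers(mixed)
print(normalized)  # (300, (1, 'name'), (14, '+'), (2, '5'))
```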


.. function:: tuple2st(sequence)

   This is the same function as :func:`sequence2st`.  This entry point is
   maintained for backward compatibility.


.. _converting-sts:

Converting ST Objects
---------------------

ST objects, regardless of the input used to create them, may be converted to
parse trees represented as list- or tuple- trees, or may be compiled into
executable code objects.  Parse trees may be extracted with or without line
numbering information.


.. function:: st2list(st, line_info=False, col_info=False)

   This function accepts an ST object from the caller in *st* and returns a
   Python list representing the equivalent parse tree.  The resulting list
   representation can be used for inspection or the creation of a new parse tree in
   list form.  This function does not fail so long as memory is available to build
   the list representation.  If the parse tree will only be used for inspection,
   :func:`st2tuple` should be used instead to reduce memory consumption and
   fragmentation.  When the list representation is required, this function is
   significantly faster than retrieving a tuple representation and converting that
   to nested lists.

   If *line_info* is true, line number information will be included for all
   terminal tokens as a third element of the list representing the token.  Note
   that the line number provided specifies the line on which the token *ends*.
   This information is omitted if the flag is false or omitted.
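For comparison, the slower route mentioned above (taking a tuple tree and
converting it to nested lists) can be sketched in plain Python as follows. The
helper name ``to_lists`` and the non-terminal identifier ``300`` are
hypothetical. The sample tree carries a line number as the third element of each
terminal, as produced when *line_info* is true.

```python
def to_lists(node):
    """Recursively convert a tuple-based parse tree into nested lists."""
    return [to_lists(item) if isinstance(item, tuple) else item
            for item in node]


# A small tuple tree whose terminals include line numbers.
tuple_tree = (300, (1, 'a', 1), (14, '+', 1), (2, '5', 1))
list_tree = to_lists(tuple_tree)
print(list_tree)  # [300, [1, 'a', 1], [14, '+', 1], [2, '5', 1]]
```

The result is mutable, so individual nodes can be edited in place before a new
tree is built from it.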


.. function:: st2tuple(st, line_info=False, col_info=False)

   This function accepts an ST object from the caller in *st* and returns a
   Python tuple representing the equivalent parse tree.  Other than returning a
   tuple instead of a list, this function is identical to :func:`st2list`.

   If *line_info* is true, line number information will be included for all
   terminal tokens as a third element of the tuple representing the token.  This
   information is omitted if the flag is false or omitted.


.. function:: compilest(st, filename='<syntax-tree>')

   .. index::
      builtin: exec
      builtin: eval

   The Python byte compiler can be invoked on an ST object to produce code objects
   which can be used as part of a call to the built-in :func:`exec` or :func:`eval`
   functions. This function provides the interface to the compiler, passing it the
   internal parse tree from *st* and using the source file name
   specified by the *filename* parameter. The default value supplied for *filename*
   indicates that the source was an ST object.

   Compiling an ST object may result in exceptions related to compilation; an
   example would be a :exc:`SyntaxError` caused by the parse tree for ``del f(0)``:
   this statement is considered legal within the formal grammar for Python but is
   not a legal language construct.  The :exc:`SyntaxError` raised for this
   condition is actually generated by the Python byte-compiler normally, which is
   why it can be raised at this point by the :mod:`parser` module.  Most causes of
   compilation failure can be diagnosed programmatically by inspection of the parse
   tree.
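The same class of error can be observed with the built-in :func:`compile`
function, without going through :func:`compilest`. The snippet below is only an
illustration of the ``del f(0)`` case discussed above; the file name
``'<example>'`` is arbitrary, and the exact wording of the error message depends
on the Python version.

```python
# Compile a statement that is rejected as an illegal language construct.
try:
    compile('del f(0)', '<example>', 'exec')
except SyntaxError as exc:
    error = exc
else:
    error = None

# Report the exception type and the version-dependent message.
print(type(error).__name__)
if error is not None:
    print(error.msg)
```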


.. _querying-sts:

Queries on ST Objects
---------------------

Two functions are provided which allow an application to determine if an ST was
created as an expression or a suite.  Neither of these functions can be used to
determine if an ST was created from source code via :func:`expr` or
:func:`suite` or from a parse tree via :func:`sequence2st`.


.. function:: isexpr(st)

   .. index:: builtin: compile

   When *st* represents an ``'eval'`` form, this function returns true, otherwise
   it returns false.  This is useful, since code objects normally cannot be queried
   for this information using existing built-in functions.  Note that the code
   objects created by :func:`compilest` cannot be queried like this either, and
   are identical to those created by the built-in :func:`compile` function.


.. function:: issuite(st)

   This function mirrors :func:`isexpr` in that it reports whether an ST object
   represents an ``'exec'`` form, commonly known as a "suite."  It is not safe to
   assume that this function is equivalent to ``not isexpr(st)``, as additional
   syntactic fragments may be supported in the future.


.. _st-errors:

Exceptions and Error Handling
-----------------------------

The parser module defines a single exception, but may also pass other built-in
exceptions from other portions of the Python runtime environment.  See each
function for information about the exceptions it can raise.


.. exception:: ParserError

   Exception raised when a failure occurs within the parser module.  This is
   generally produced for validation failures rather than the built-in
   :exc:`SyntaxError` raised during normal parsing. The exception argument is
   either a string describing the reason for the failure or a tuple containing a
   sequence causing the failure from a parse tree passed to :func:`sequence2st`
   and an explanatory string.  Calls to :func:`sequence2st` need to be able to
   handle either form of the argument, while calls to other functions in the module
   will only need to be aware of the simple string values.

Note that the functions :func:`compilest`, :func:`expr`, and :func:`suite` may
raise exceptions which are normally raised by the parsing and compilation
process.  These include the built-in exceptions :exc:`MemoryError`,
:exc:`OverflowError`, :exc:`SyntaxError`, and :exc:`SystemError`.  In these
cases, these exceptions carry all the meaning normally associated with them.
Refer to the descriptions of each function for detailed information.


.. _st-objects:

ST Objects
----------

Ordered and equality comparisons are supported between ST objects. Pickling of
ST objects (using the :mod:`pickle` module) is also supported.


.. data:: STType

   The type of the objects returned by :func:`expr`, :func:`suite` and
   :func:`sequence2st`.

ST objects have the following methods:


.. method:: ST.compile(filename='<syntax-tree>')

   Same as ``compilest(st, filename)``.


.. method:: ST.isexpr()

   Same as ``isexpr(st)``.


.. method:: ST.issuite()

   Same as ``issuite(st)``.


.. method:: ST.tolist(line_info=False, col_info=False)

   Same as ``st2list(st, line_info, col_info)``.


.. method:: ST.totuple(line_info=False, col_info=False)

   Same as ``st2tuple(st, line_info, col_info)``.


Example: Emulation of :func:`compile`
-------------------------------------

While many useful operations may take place between parsing and bytecode
generation, the simplest operation is to do nothing.  For this purpose, using
the :mod:`parser` module to produce an intermediate data structure is equivalent
to the code ::

   >>> code = compile('a + 5', 'file.py', 'eval')
   >>> a = 5
   >>> eval(code)
   10

The equivalent operation using the :mod:`parser` module is somewhat longer, and
allows the intermediate internal parse tree to be retained as an ST object::

   >>> import parser
   >>> st = parser.expr('a + 5')
   >>> code = st.compile('file.py')
   >>> a = 5
   >>> eval(code)
   10

An application which needs both ST and code objects can package this code into
readily available functions::

   import parser

   def load_suite(source_string):
       st = parser.suite(source_string)
       return st, st.compile()

   def load_expression(source_string):
       st = parser.expr(source_string)
       return st, st.compile()
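The note near the top of this page recommends the :mod:`ast` module for most
uses. As a rough counterpart to the helpers above, the following sketch performs
the same two-step "parse, then compile" emulation with :mod:`ast`. The function
names ``load_suite_ast`` and ``load_expression_ast`` and the file name
``'<ast>'`` are hypothetical choices for this example; note that the first item
returned is an AST node, not an ST object.

```python
import ast


def load_suite_ast(source_string):
    """Parse *source_string* in 'exec' mode; return (tree, code object)."""
    tree = ast.parse(source_string, mode='exec')
    return tree, compile(tree, '<ast>', 'exec')


def load_expression_ast(source_string):
    """Parse *source_string* in 'eval' mode; return (tree, code object)."""
    tree = ast.parse(source_string, mode='eval')
    return tree, compile(tree, '<ast>', 'eval')


# Evaluate the expression from the earlier example.
tree, code = load_expression_ast('a + 5')
result = eval(code, {'a': 5})
print(result)  # 10

# Execute a small suite and read back the variable it assigns.
suite_tree, suite_code = load_suite_ast('x = 1 + 2')
namespace = {}
exec(suite_code, namespace)
print(namespace['x'])  # 3
```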