\section{\module{multifile} ---
         Support for files containing distinct parts}

\declaremodule{standard}{multifile}
\modulesynopsis{Support for reading files which contain distinct
                parts, such as some MIME data.}
\sectionauthor{Eric S. Raymond}{esr@snark.thyrsus.com}

The \class{MultiFile} object enables you to treat sections of a text
file as file-like input objects, with \code{''} being returned by
\method{readline()} when a given delimiter pattern is encountered.  The
defaults of this class are designed to make it useful for parsing
MIME multipart messages, but by subclassing it and overriding methods 
it can be easily adapted for more general use.

\begin{classdesc}{MultiFile}{fp\optional{, seekable}}
Create a multi-file.  You must instantiate this class with an input
object argument for the \class{MultiFile} instance to get lines from,
such as a file object returned by \function{open()}.

\class{MultiFile} only ever looks at the input object's
\method{readline()}, \method{seek()} and \method{tell()} methods, and
the latter two are only needed if you want random access to the
individual MIME parts. To use \class{MultiFile} on a non-seekable
stream object, set the optional \var{seekable} argument to false; this
will prevent using the input object's \method{seek()} and
\method{tell()} methods.
\end{classdesc}

It will be useful to know that in \class{MultiFile}'s view of the world, text
is composed of three kinds of lines: data, section-dividers, and
end-markers.  \class{MultiFile} is designed to support parsing of
messages that may have multiple nested message parts, each with its
own pattern for section-divider and end-marker lines.
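
The three kinds of lines can be illustrated with a small standalone
sketch.  It does not use the \module{multifile} module; the helper name
\function{classify()} is hypothetical, and the sketch assumes the
MIME-style \code{'-}\code{-'} decoration with a single boundary string.

\begin{verbatim}
def classify(line, boundary):
    """Classify one input line relative to a single boundary string.

    Hypothetical helper for illustration only; it assumes the MIME-style
    '--' decoration and ignores trailing whitespace on marker lines.
    """
    marker = line.rstrip()
    if marker == '--' + boundary + '--':
        return 'end-marker'
    if marker == '--' + boundary:
        return 'section-divider'
    return 'data'

lines = ['--XYZ\n', 'hello\n', '--XYZ\n', 'world\n', '--XYZ--\n']
kinds = [classify(line, 'XYZ') for line in lines]
\end{verbatim}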

\begin{seealso}
  \seemodule{email}{Comprehensive email handling package; supersedes
                    the \module{multifile} module.}
\end{seealso}


\subsection{MultiFile Objects \label{MultiFile-objects}}

A \class{MultiFile} instance has the following methods:

\begin{methoddesc}{readline}{}
Read a line.  If the line is data (not a section-divider or end-marker
or real EOF) return it.  If the line matches the most-recently-stacked
boundary, return \code{''} and set \code{self.last} to 1 if the match is
an end-marker or to 0 if it is a section-divider.  If the line matches any other
stacked boundary, raise an error.  On encountering end-of-file on the
underlying stream object, the method raises \exception{Error} unless
all boundaries have been popped.
\end{methoddesc}

\begin{methoddesc}{readlines}{}
Return all lines remaining in this part as a list of strings.
\end{methoddesc}

\begin{methoddesc}{read}{}
Read all lines, up to the next section.  Return them as a single
(multiline) string.  Note that this doesn't take a size argument!
\end{methoddesc}

\begin{methoddesc}{seek}{pos\optional{, whence}}
Seek.  Seek indices are relative to the start of the current section.
The \var{pos} and \var{whence} arguments are interpreted as for a file
seek.
\end{methoddesc}

\begin{methoddesc}{tell}{}
Return the file position relative to the start of the current section.
\end{methoddesc}

\begin{methoddesc}{next}{}
Skip lines to the next section (that is, read lines until a
section-divider or end-marker has been consumed).  Return true if
there is such a section, false if an end-marker is seen.  Re-enable
the most-recently-pushed boundary.
\end{methoddesc}

\begin{methoddesc}{is_data}{str}
Return true if \var{str} is data and false if it might be a section
boundary.  As written, it tests for a prefix other than \code{'-}\code{-'} at
start of line (which all MIME boundaries have) but it is declared so
it can be overridden in derived classes.

Note that this test is intended as a fast guard for the real
boundary tests; if it always returns false it will merely slow
processing, not cause it to fail.
\end{methoddesc}

\begin{methoddesc}{push}{str}
Push a boundary string.  When an appropriately decorated version of
this boundary is found as an input line, it will be interpreted as a
section-divider or end-marker.  All subsequent
reads will return the empty string to indicate end-of-file, until a
call to \method{pop()} removes the boundary or a \method{next()} call
reenables it.

It is possible to push more than one boundary.  Encountering the
most-recently-pushed boundary will return EOF; encountering any other
boundary will raise an error.
\end{methoddesc}

\begin{methoddesc}{pop}{}
Pop a section boundary.  This boundary will no longer be interpreted
as EOF.
\end{methoddesc}
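
The interplay of \method{push()}, \method{next()} and \method{pop()}
can be sketched with a toy reader.  The class \class{PartReader} below
is hypothetical and is not the implementation of \class{MultiFile}: it
assumes the MIME-style \code{'-}\code{-'} decoration, recognizes only
the most-recently-pushed boundary, and omits the error handling
described above.

\begin{verbatim}
class PartReader(object):
    """Toy stand-in for MultiFile; illustration only, not the real class."""

    def __init__(self, lines):
        self.lines = list(lines)
        self.index = 0
        self.stack = []       # pushed boundaries, most recent last
        self.at_eof = False   # True once the current part has ended
        self.last = 0         # 1 if the part ended at an end-marker

    def push(self, boundary):
        self.stack.append(boundary)

    def pop(self):
        # The popped boundary is no longer interpreted as EOF.
        self.stack.pop()
        self.at_eof = False
        self.last = 0

    def readline(self):
        if self.at_eof or self.index >= len(self.lines):
            return ''
        line = self.lines[self.index]
        self.index += 1
        if self.stack:
            marker = line.rstrip()
            boundary = self.stack[-1]
            if marker == '--' + boundary:
                self.at_eof, self.last = True, 0
                return ''
            if marker == '--' + boundary + '--':
                self.at_eof, self.last = True, 1
                return ''
        return line

    def read(self):
        collected = []
        line = self.readline()
        while line:
            collected.append(line)
            line = self.readline()
        return ''.join(collected)

    def next(self):
        # Skip to the next section, then re-enable the boundary.
        while self.readline():
            pass
        if self.last or self.index >= len(self.lines):
            return False
        self.at_eof = False
        return True

message = ['preamble\n', '--XYZ\n', 'first part\n', '--XYZ\n',
           'second part\n', '--XYZ--\n', 'epilogue\n']
reader = PartReader(message)
reader.push('XYZ')
parts = []
while reader.next():
    parts.append(reader.read())
reader.pop()
epilogue = reader.read()
\end{verbatim}

Once the boundary has been popped, the line that follows the
end-marker is read as ordinary data again.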

\begin{methoddesc}{section_divider}{str}
Turn a boundary into a section-divider line.  By default, this
method prepends \code{'-}\code{-'} (which MIME section boundaries have) but
it is declared so it can be overridden in derived classes.  This
method need not append LF or CR-LF, as comparison with the result
ignores trailing whitespace. 
\end{methoddesc}

\begin{methoddesc}{end_marker}{str}
Turn a boundary string into an end-marker line.  By default, this
method prepends \code{'-}\code{-'} and appends \code{'-}\code{-'} (like a
MIME-multipart end-of-message marker) but it is declared so it can be
overridden in derived classes.  This method need not append LF or
CR-LF, as comparison with the result ignores trailing whitespace.
\end{methoddesc}
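
As a rough illustration of how overriding these two methods changes
what counts as a boundary line, the following standalone sketch models
the decoration step only.  The classes \class{MimeDecoration} and
\class{HashDecoration}, the \method{matches()} helper, and the
\code{'\#\# '} line format are all hypothetical; none of them is part
of the \module{multifile} module.

\begin{verbatim}
class MimeDecoration(object):
    """MIME-style decoration: '--boundary' and '--boundary--'."""

    def section_divider(self, boundary):
        return '--' + boundary

    def end_marker(self, boundary):
        return '--' + boundary + '--'

    def matches(self, line, boundary):
        """Return 'divider', 'end' or None for an input line.

        Trailing whitespace (including LF or CR-LF) is ignored, so the
        decoration methods need not append a line ending.
        """
        marker = line.rstrip()
        if marker == self.section_divider(boundary):
            return 'divider'
        if marker == self.end_marker(boundary):
            return 'end'
        return None


class HashDecoration(MimeDecoration):
    """Hypothetical format: '## name' divides, '## name end' ends."""

    def section_divider(self, boundary):
        return '## ' + boundary

    def end_marker(self, boundary):
        return '## ' + boundary + ' end'
\end{verbatim}

A real subclass of \class{MultiFile} that used such a format would
also need to override \method{is_data()}, since the default
implementation treats any line that does not start with
\code{'-}\code{-'} as data.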

Finally, \class{MultiFile} instances have two public instance variables:

\begin{memberdesc}{level}
Nesting depth of the current part.
\end{memberdesc}

\begin{memberdesc}{last}
True if the last end-of-file was for an end-of-message marker. 
\end{memberdesc}


\subsection{\class{MultiFile} Example \label{multifile-example}}
\sectionauthor{Skip Montanaro}{skip@mojam.com}

\begin{verbatim}
import mimetools
import multifile
import StringIO

def extract_mime_part_matching(stream, mimetype):
    """Return the first element in a multipart MIME message on stream
    matching mimetype."""

    # Parse the top-level headers; stream is left at the message body.
    msg = mimetools.Message(stream)
    msgtype = msg.gettype()

    data = StringIO.StringIO()
    if msgtype[:10] == "multipart/":

        # Treat each part of the body as its own file-like object.
        file = multifile.MultiFile(stream)
        file.push(msg.getparam("boundary"))
        while file.next():
            # Parse the headers of this part, then decode its body.
            submsg = mimetools.Message(file)
            try:
                data = StringIO.StringIO()
                mimetools.decode(file, data, submsg.getencoding())
            except ValueError:
                # Unknown transfer encoding; skip this part.
                continue
            if submsg.gettype() == mimetype:
                break
        file.pop()
    return data.getvalue()
\end{verbatim}
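
Since the \module{email} package supersedes this module, a comparable
task can be sketched with it.  This is a minimal sketch, not a drop-in
replacement for the function above: the name
\function{first_part_matching()} and the sample message are
hypothetical, the function takes the message text rather than a
stream, and it returns \code{None} when no part matches.

\begin{verbatim}
import email

def first_part_matching(text, mimetype):
    """Return the decoded payload of the first part of type mimetype.

    Returns None if no part matches.
    """
    msg = email.message_from_string(text)
    for part in msg.walk():
        # Multipart containers carry no payload of their own.
        if part.is_multipart():
            continue
        if part.get_content_type() == mimetype:
            return part.get_payload(decode=True)
    return None

sample = (
    'MIME-Version: 1.0\n'
    'Content-Type: multipart/mixed; boundary="XYZ"\n'
    '\n'
    '--XYZ\n'
    'Content-Type: text/plain\n'
    '\n'
    'hello\n'
    '--XYZ\n'
    'Content-Type: text/html\n'
    '\n'
    '<p>hello</p>\n'
    '--XYZ--\n'
)
html = first_part_matching(sample, 'text/html')
\end{verbatim}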