\section{Standard Module \sectcode{rfc822}}
\stmodindex{rfc822}

\renewcommand{\indexsubitem}{(in module rfc822)}

This module defines a class, \code{Message}, which represents a
collection of ``email headers'' as defined by the Internet standard
RFC 822.  It is used in various contexts, usually to read such headers
from a file.

(Note that there's a separate, currently undocumented, module to read
Unix style mailbox files: \code{mailbox}.)

A \code{Message} instance is instantiated with an open file object as
parameter.  The optional \code{seekable} parameter indicates whether the
file object is seekable; the default value is 1 (true).
Instantiation reads headers from the file up to a blank line and
stores them in the instance; after instantiation, the file is
positioned directly after the blank line that terminates the headers.

Input lines as read from the file may either be terminated by CR-LF or
by a single linefeed; a terminating CR-LF is replaced by a single
linefeed before the line is stored.

All header matching is done independent of upper or lower case;
e.g. \code{m['From']}, \code{m['from']} and \code{m['FROM']} all yield
the same result.
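
The reading rules above can be illustrated with a minimal sketch.  It
is not the module's implementation: \code{read_headers} and
\code{lookup} are hypothetical helpers, continuation lines are
ignored, and an in-memory \code{io.StringIO} object stands in for an
open file.

```python
import io


def read_headers(fp):
    """Read header lines from fp up to the terminating blank line."""
    headers = []
    while True:
        line = fp.readline()
        if not line:                  # end of file before any blank line
            break
        if line.endswith("\r\n"):     # replace a terminating CR-LF by a linefeed
            line = line[:-2] + "\n"
        if line == "\n":              # blank line: end of the headers
            break
        headers.append(line)
    return headers


def lookup(headers, name):
    """Case-independent lookup of the first header called name."""
    prefix = name.lower() + ":"
    for line in headers:
        if line.lower().startswith(prefix):
            return line[len(prefix):].strip()
    return None


fp = io.StringIO("From: jack@cwi.nl (Jack Jansen)\r\nSubject: test\n\nbody text\n")
headers = read_headers(fp)
body = fp.read()                      # the file is now positioned at the body
```

After the call, \code{headers} holds the two header lines, each ending
in a single linefeed, and \code{fp} is positioned at the start of the
body.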

\begin{funcdesc}{parsedate}{date}
Attempts to parse a date according to the rules in RFC 822.  However,
some mailers don't follow that format as specified, so
\code{parsedate()} tries to guess correctly in such cases.
\var{date} is a string containing an RFC 822 date, such as
\code{"Mon, 20 Nov 1995 19:12:08 -0500"}.  If it succeeds in parsing
the date, \code{parsedate()} returns a 9-tuple that can be passed
directly to \code{time.mktime()}; otherwise \code{None} is
returned.
\end{funcdesc}
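
A short usage sketch follows.  It assumes that
\code{email.utils.parsedate()} from the standard \code{email} package
behaves like the function described here; that substitution is an
assumption, made so that the example also runs where \code{rfc822}
cannot be imported.

```python
import time
from email.utils import parsedate  # stand-in for rfc822.parsedate (assumption)

parsed = parsedate("Mon, 20 Nov 1995 19:12:08 -0500")

# The first six fields are year, month, day, hour, minute, second.
date_fields = parsed[:6]

# The 9-tuple can be passed directly to time.mktime(), which interprets
# it as local time; the "-0500" zone is not taken into account here.
timestamp = time.mktime(parsed)

# A string that cannot be parsed yields None.
unparsable = parsedate("not a date")
```

Because \code{time.mktime()} works in local time, the value of
\code{timestamp} depends on the time zone of the machine running the
example.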

\begin{funcdesc}{parsedate_tz}{date}
Performs the same function as \code{parsedate()}, but returns either
\code{None} or a 10-tuple; the first 9 elements make up a tuple that
can be passed directly to \code{time.mktime()}, and the tenth is the
offset, in seconds, of the date's time zone from UTC (which is the
official term for Greenwich Mean Time).
\end{funcdesc}
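
The sketch below shows one way the tenth element might be used to
obtain a time in UTC.  It assumes that
\code{email.utils.parsedate_tz()} is an acceptable stand-in for the
function described here, and the arithmetic with
\code{calendar.timegm()} is an illustration rather than part of this
module.

```python
import calendar
import time
from email.utils import parsedate_tz  # stand-in for rfc822.parsedate_tz (assumption)

parsed = parsedate_tz("Mon, 20 Nov 1995 19:12:08 -0500")

# Tenth element: offset from UTC in seconds (negative means west of UTC).
offset = parsed[9]

# Read the first nine fields as if they were UTC, then subtract the
# offset to arrive at the real number of seconds since the epoch.
utc_seconds = calendar.timegm(parsed[:9]) - offset
utc_fields = time.gmtime(utc_seconds)[:6]
```

With an offset of five hours west of UTC, 19:12:08 on 20 November
corresponds to 00:12:08 UTC on 21 November.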

\subsection{Message Objects}

A \code{Message} instance has the following methods:

\begin{funcdesc}{rewindbody}{}
Seek to the start of the message body.  This only works if the file
object is seekable.
\end{funcdesc}

\begin{funcdesc}{getallmatchingheaders}{name}
Return a list of lines consisting of all headers matching
\var{name}, if any.  Each physical line, whether it is a continuation
line or not, is a separate list item.  Return the empty list if no
header matches \var{name}.
\end{funcdesc}

\begin{funcdesc}{getfirstmatchingheader}{name}
Return a list of lines comprising the first header matching
\var{name}, and its continuation line(s), if any.  Return \code{None}
if there is no header matching \var{name}.
\end{funcdesc}

\begin{funcdesc}{getrawheader}{name}
Return a single string consisting of the text after the colon in the
first header matching \var{name}.  This includes leading whitespace,
the trailing linefeed, and internal linefeeds and whitespace if
any continuation line(s) were present.  Return \code{None} if there is
no header matching \var{name}.
\end{funcdesc}

\begin{funcdesc}{getheader}{name}
Like \code{getrawheader(\var{name})}, but strip leading and trailing
whitespace (but not internal whitespace).
\end{funcdesc}
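
The relationship between \code{getfirstmatchingheader()},
\code{getrawheader()} and \code{getheader()} can be sketched on a
plain list of header lines.  The three functions below are
hypothetical helpers written for illustration under the descriptions
above; they are not the module's own code.

```python
def first_matching_header(headers, name):
    """Return the first header matching name plus its continuation lines."""
    prefix = name.lower() + ":"
    found = []
    for line in headers:
        if found:
            # A continuation line starts with a space or a tab.
            if line[:1] in (" ", "\t"):
                found.append(line)
                continue
            break
        if line.lower().startswith(prefix):
            found.append(line)
    return found or None


def raw_header(headers, name):
    """Text after the colon, keeping whitespace and linefeeds."""
    lines = first_matching_header(headers, name)
    if lines is None:
        return None
    return lines[0][len(name) + 1:] + "".join(lines[1:])


def header(headers, name):
    """Like raw_header(), but with leading and trailing whitespace stripped."""
    raw = raw_header(headers, name)
    return None if raw is None else raw.strip()


headers = [
    "Received: from a.example\n",
    "\tby b.example\n",
    "Subject:  hello \n",
    "Received: from c.example\n",
]
received_raw = raw_header(headers, "received")
subject = header(headers, "SUBJECT")
```

Here \code{received_raw} keeps the leading blank, the internal
linefeed and tab, and the trailing linefeed, while \code{subject} is
stripped at both ends.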

\begin{funcdesc}{getaddr}{name}
Return a pair (full name, email address) parsed from the string
returned by \code{getheader(\var{name})}.  If no header matching
\var{name} exists, return \code{(None, None)}; otherwise both the full
name and the address are (possibly empty) strings.

Example: If \code{m}'s first \code{From} header contains the string\\
\code{'jack@cwi.nl (Jack Jansen)'}, then
\code{m.getaddr('From')} will yield the pair
\code{('Jack Jansen', 'jack@cwi.nl')}.
If the header contained
\code{'Jack Jansen <jack@cwi.nl>'} instead, it would yield the
exact same result.
\end{funcdesc}
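
The two address forms in the example can be tried directly on strings.
This sketch assumes that \code{email.utils.parseaddr()} is a fair
stand-in for the parsing done by \code{getaddr()}; it works on the
header value itself rather than on a \code{Message} instance.

```python
from email.utils import parseaddr  # stand-in for the parsing in getaddr() (assumption)

# Address followed by the full name as a parenthesized comment.
comment_form = parseaddr("jack@cwi.nl (Jack Jansen)")

# Full name followed by the address in angle brackets.
angle_form = parseaddr("Jack Jansen <jack@cwi.nl>")
```

Both calls give the same (full name, email address) pair.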

\begin{funcdesc}{getaddrlist}{name}
This is similar to \code{getaddr(\var{name})}, but parses a header
containing a list of email addresses (e.g. a \code{To} header) and
returns a list of (full name, email address) pairs (even if there was
only one address in the header).  If there is no header matching
\var{name}, return an empty list.

XXX The current version of this function is not really correct.  It
yields bogus results if a full name contains a comma.
\end{funcdesc}
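
A list-valued header can be explored in the same way.  The sketch
assumes that \code{email.utils.getaddresses()} is a reasonable
stand-in for \code{getaddrlist()}; the second address is made up for
the example, and the input avoids commas inside full names.

```python
from email.utils import getaddresses  # stand-in for getaddrlist() (assumption)

# Value of a hypothetical To header holding two addresses.
to_value = "jack@cwi.nl (Jack Jansen), Guido van Rossum <guido@example.org>"

# getaddresses() takes a list of header values and returns a list of
# (full name, email address) pairs.
pairs = getaddresses([to_value])

# A header with a single address still gives a list.
single = getaddresses(["jack@cwi.nl (Jack Jansen)"])
```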

\begin{funcdesc}{getdate}{name}
Retrieve a header using \code{getheader} and parse it into a 9-tuple
compatible with \code{time.mktime()}.  If there is no header matching
\var{name}, or it is unparsable, return \code{None}.

Date parsing appears to be a black art, and not all mailers adhere to
the standard.  Although this function has been tested and found correct
on a large collection of email from many sources, it may still
occasionally yield an incorrect result.
\end{funcdesc}

\begin{funcdesc}{getdate_tz}{name}
Retrieve a header using \code{getheader} and parse it into a 10-tuple;
the first 9 elements make up a tuple compatible with
\code{time.mktime()}, and the 10th is a number giving the offset of
the date's time zone from UTC.  As with \code{getdate()}, if
there is no header matching \var{name}, or it is unparsable, return
\code{None}.
\end{funcdesc}

\code{Message} instances also support a read-only mapping interface.
In particular: \code{m[name]} is the same as \code{m.getheader(name)};
and \code{len(m)}, \code{m.has_key(name)}, \code{m.keys()},
\code{m.values()} and \code{m.items()} act as expected (and
consistently).
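
The mapping interface can be illustrated with a short sketch.  It
assumes that a message object built by
\code{email.message_from_string()} is close enough to a
\code{Message} instance for this purpose; details such as the case of
the names returned by \code{keys()} may differ, and the \code{in}
operator is used here in place of \code{has_key()}.

```python
import email  # stand-in for rfc822.Message (assumption)

m = email.message_from_string(
    "From: jack@cwi.nl (Jack Jansen)\n"
    "Subject: test\n"
    "\n"
    "body text\n"
)

sender = m["From"]                  # header lookup by name
same = m["from"] == m["FROM"]       # matching ignores upper or lower case
count = len(m)                      # number of headers
present = "subject" in m            # membership test, also case independent
names = [name.lower() for name in m.keys()]
pairs = m.items()                   # (name, value) pairs in header order
```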

Finally, \code{Message} instances have two public instance variables:

\begin{datadesc}{headers}
A list containing the entire set of header lines, in the order in
which they were read.  Each line contains a trailing newline.  The
blank line terminating the headers is not contained in the list.
\end{datadesc}

\begin{datadesc}{fp}
The file object passed at instantiation time.
\end{datadesc}