"""RFC-822 message manipulation class.

XXX This is only a very rough sketch of a full RFC-822 parser;
in particular the tokenizing of addresses does not adhere to all the
quoting rules.

Directions for use:

To create a Message object: first open a file, e.g.:
  fp = open(file, 'r')
You can use any other legal way of getting an open file object, e.g. use
sys.stdin or call os.popen().
Then pass the open file object to the Message() constructor:
  m = Message(fp)

This class can work with any input object that supports a readline
method.  If the input object has seek and tell capability, the
rewindbody method will work; also illegal lines will be pushed back
onto the input stream.  If the input object lacks seek but has an
`unread' method that can push back a line of input, Message will use
that to push back illegal lines.  Thus this class can be used to parse
messages coming from a buffered stream.

The optional `seekable' argument is provided as a workaround for
certain stdio libraries in which tell() discards buffered data before
discovering that the lseek() system call doesn't work.  For maximum
portability, you should set the seekable argument to zero to prevent
that initial tell() when passing in an unseekable object such as
a file object created from a socket object.  If it is 1 on entry --
which it is by default -- the tell() method of the open file object is
called once; if this raises an exception, seekable is reset to 0.  For
other nonzero values of seekable, this test is not made.

To get the text of a particular header there are several methods:
  str = m.getheader(name)
  str = m.getrawheader(name)
where name is the name of the header, e.g. 'Subject'.
The difference is that getheader() strips the leading and trailing
whitespace, while getrawheader() doesn't.  Both functions retain
embedded whitespace (including newlines) exactly as they are
specified in the header, and leave the case of the text unchanged.

For addresses and address lists there are functions
  realname, mailaddress = m.getaddr(name) and
  list = m.getaddrlist(name)
where the latter returns a list of (realname, mailaddr) tuples.

There is also a method
  time = m.getdate(name)
which parses a Date-like field and returns a time-compatible tuple,
i.e. a tuple such as returned by time.localtime() or accepted by
time.mktime().

See the class definition for lower level access methods.

There are also some utility functions here.
"""
# Cleanup and extensions by Eric S. Raymond <esr@thyrsus.com>

import time


_blanklines = ('\r\n', '\n')            # Optimization for islast()


class Message:
    """Represents a single RFC-822-compliant message."""

    def __init__(self, fp, seekable = 1):
        """Initialize the class instance and read the headers."""
        if seekable == 1:
            # Exercise tell() to make sure it works
            # (and then assume seek() works, too)
            try:
                fp.tell()
            except:
                seekable = 0
            else:
                seekable = 1
        self.fp = fp
        self.seekable = seekable
        self.startofheaders = None
        self.startofbody = None
        #
        if self.seekable:
            try:
                self.startofheaders = self.fp.tell()
            except IOError:
                self.seekable = 0
        #
        self.readheaders()
        #
        if self.seekable:
            try:
                self.startofbody = self.fp.tell()
            except IOError:
                self.seekable = 0

    def rewindbody(self):
        """Rewind the file to the start of the body (if seekable)."""
        if not self.seekable:
            raise IOError, "unseekable file"
        self.fp.seek(self.startofbody)

    def readheaders(self):
        """Read header lines.

        Read header lines up to the entirely blank line that
        terminates them.  The (normally blank) line that ends the
        headers is skipped, but not included in the returned list.
        If a non-header line ends the headers (which is an error),
        an attempt is made to backspace over it; it is never
        included in the returned list.

        The variable self.status is set to the empty string if all
        went well, otherwise it is an error message.
        The variable self.headers is a completely uninterpreted list
        of lines contained in the header (so printing them will
        reproduce the header exactly as it appears in the file).
        """
        self.dict = {}
        self.unixfrom = ''
        self.headers = list = []
        self.status = ''
        headerseen = ""
        firstline = 1
        startofline = unread = tell = None
        if hasattr(self.fp, 'unread'):
            unread = self.fp.unread
        elif self.seekable:
            tell = self.fp.tell
        while 1:
            if tell:
                try:
                    startofline = tell()
                except IOError:
                    startofline = tell = None
                    self.seekable = 0
            line = self.fp.readline()
            if not line:
                self.status = 'EOF in headers'
                break
            # Skip unix From name time lines
            if firstline and line.startswith('From '):
                self.unixfrom = self.unixfrom + line
                continue
            firstline = 0
            if headerseen and line[0] in ' \t':
                # It's a continuation line.
                list.append(line)
                x = (self.dict[headerseen] + "\n " + line.strip())
                self.dict[headerseen] = x.strip()
                continue
            elif self.iscomment(line):
                # It's a comment.  Ignore it.
                continue
            elif self.islast(line):
                # Note! No pushback here!  The delimiter line gets eaten.
                break
            headerseen = self.isheader(line)
            if headerseen:
                # It's a legal header line, save it.
                list.append(line)
                self.dict[headerseen] = line[len(headerseen)+1:].strip()
                continue
            else:
                # It's not a header line; throw it back and stop here.
                if not self.dict:
                    self.status = 'No headers'
                else:
                    self.status = 'Non-header line where header expected'
                # Try to undo the read.
                if unread:
                    unread(line)
                elif tell:
                    self.fp.seek(startofline)
                else:
                    self.status = self.status + '; bad seek'
                break

    def isheader(self, line):
        """Determine whether a given line is a legal header.

        This method should return the header name, suitably canonicalized.
        You may override this method in order to use Message parsing
        on tagged data in RFC822-like formats with special header formats.
        """
        i = line.find(':')
        if i > 0:
            return line[:i].lower()
        else:
            return None

    def islast(self, line):
        """Determine whether a line is a legal end of RFC-822 headers.

        You may override this method if your application wants
        to bend the rules, e.g. to strip trailing whitespace,
        or to recognize MH template separators ('--------').
        For convenience (e.g. for code reading from sockets) a
        line consisting of \r\n also matches.
        """
        return line in _blanklines

    def iscomment(self, line):
        """Determine whether a line should be skipped entirely.

        You may override this method in order to use Message parsing
        on tagged data in RFC822-like formats that support embedded
        comments or free-text data.
        """
        return None

    def getallmatchingheaders(self, name):
        """Find all header lines matching a given header name.

        Look through the list of headers and find all lines
        matching a given header name (and their continuation
        lines).  A list of the lines is returned, without
        interpretation.  If the header does not occur, an
        empty list is returned.  If the header occurs multiple
        times, all occurrences are returned.  Case is not
        important in the header name.
        """
        name = name.lower() + ':'
        n = len(name)
        list = []
        hit = 0
        for line in self.headers:
            if line[:n].lower() == name:
                hit = 1
            elif not line[:1].isspace():
                hit = 0
            if hit:
                list.append(line)
        return list

    def getfirstmatchingheader(self, name):
        """Get the first header line matching name.

        This is similar to getallmatchingheaders, but it returns
        only the first matching header (and its continuation
        lines).
        """
        name = name.lower() + ':'
        n = len(name)
        list = []
        hit = 0
        for line in self.headers:
            if hit:
                if not line[:1].isspace():
                    break
            elif line[:n].lower() == name:
                hit = 1
            if hit:
                list.append(line)
        return list

    def getrawheader(self, name):
        """A higher-level interface to getfirstmatchingheader().

        Return a string containing the literal text of the
        header but with the keyword stripped.  All leading,
        trailing and embedded whitespace is kept in the
        string, however.
        Return None if the header does not occur.
        """

        list = self.getfirstmatchingheader(name)
        if not list:
            return None
        list[0] = list[0][len(name) + 1:]
        return ''.join(list)

    def getheader(self, name, default=None):
        """Get the header value for a name.

        This is the normal interface: it returns a stripped
        version of the header value for a given header name,
        or `default' (None unless specified) if it doesn't exist.
        This uses the dictionary version, which finds the *last*
        such header.
        """
        try:
            return self.dict[name.lower()]
        except KeyError:
            return default
    get = getheader

    def getheaders(self, name):
        """Get all values for a header.

        This returns a list of values for headers given more than once;
        each value in the result list is stripped in the same way as the
        result of getheader().  If the header is not given, return an
        empty list.
        """
        result = []
        current = ''
        have_header = 0
        for s in self.getallmatchingheaders(name):
            if s[0].isspace():
                if current:
                    current = "%s\n %s" % (current, s.strip())
                else:
                    current = s.strip()
            else:
                if have_header:
                    result.append(current)
                current = s[s.find(":") + 1:].strip()
                have_header = 1
        if have_header:
            result.append(current)
        return result

    def getaddr(self, name):
        """Get a single address from a header, as a tuple.

        An example return value:
        ('Guido van Rossum', 'guido@cwi.nl')
        """
        # New, by Ben Escoto
        alist = self.getaddrlist(name)
        if alist:
            return alist[0]
        else:
            return (None, None)

    def getaddrlist(self, name):
        """Get a list of addresses from a header.

        Retrieves a list of addresses from a header, where each address is a
        tuple as returned by getaddr().  Scans all named headers, so it works
        properly with multiple To: or Cc: headers for example.

        """
        raw = []
        for h in self.getallmatchingheaders(name):
            if h[0] in ' \t':
                raw.append(h)
            else:
                if raw:
                    raw.append(', ')
                i = h.find(':')
                if i > 0:
                    # Only append when a field name precedes the colon;
                    # otherwise `addr' would be unbound (or stale).
                    raw.append(h[i+1:])
        alladdrs = ''.join(raw)
        a = AddrlistClass(alladdrs)
        return a.getaddrlist()

    def getdate(self, name):
        """Retrieve a date field from a header.

        Retrieves a date field from the named header, returning
        a tuple compatible with time.mktime().
        """
        try:
            data = self[name]
        except KeyError:
            return None
        return parsedate(data)

    def getdate_tz(self, name):
        """Retrieve a date field from a header as a 10-tuple.

        The first 9 elements make up a tuple compatible with
        time.mktime(), and the 10th is the offset of the poster's
        time zone from GMT/UTC.
        """
        try:
            data = self[name]
        except KeyError:
            return None
        return parsedate_tz(data)


    # Access as a dictionary (only finds *last* header of each type):

    def __len__(self):
        """Get the number of headers in a message."""
        return len(self.dict)

    def __getitem__(self, name):
        """Get a specific header, as from a dictionary."""
        return self.dict[name.lower()]

    def __setitem__(self, name, value):
        """Set the value of a header.

        Note: This is not a perfect inversion of __getitem__, because
        any changed headers get stuck at the end of the raw-headers list
        rather than where the altered header was.
        """
        del self[name] # Won't fail if it doesn't exist
        self.dict[name.lower()] = value
        text = name + ": " + value
        lines = text.split("\n")
        for line in lines:
            self.headers.append(line + "\n")

    def __delitem__(self, name):
        """Delete all occurrences of a specific header, if it is present."""
        name = name.lower()
        if not self.dict.has_key(name):
            return
        del self.dict[name]
        name = name + ':'
        n = len(name)
        list = []
        hit = 0
        for i in range(len(self.headers)):
            line = self.headers[i]
            if line[:n].lower() == name:
                hit = 1
            elif not line[:1].isspace():
                hit = 0
            if hit:
                list.append(i)
        list.reverse()
        for i in list:
            del self.headers[i]

    def has_key(self, name):
        """Determine whether a message contains the named header."""
        return self.dict.has_key(name.lower())

    def keys(self):
        """Get all of a message's header field names."""
        return self.dict.keys()

    def values(self):
        """Get all of a message's header field values."""
        return self.dict.values()

    def items(self):
        """Get all of a message's headers.

        Returns a list of name, value tuples.
        """
        return self.dict.items()

    def __str__(self):
        str = ''
        for hdr in self.headers:
            str = str + hdr
        return str


# Utility functions
# -----------------

# XXX Should fix unquote() and quote() to be really conformant.
# XXX The inverses of the parse functions may also be useful.


def unquote(str):
    """Remove quotes from a string."""
    if len(str) > 1:
        if str[0] == '"' and str[-1:] == '"':
            return str[1:-1]
        if str[0] == '<' and str[-1:] == '>':
            return str[1:-1]
    return str


def quote(str):
    """Escape backslashes and double quotes in a string."""
    return str.replace('\\', '\\\\').replace('"', '\\"')


def parseaddr(address):
    """Parse an address into a (realname, mailaddr) tuple."""
    a = AddrlistClass(address)
    list = a.getaddrlist()
    if not list:
        return (None, None)
    else:
        return list[0]


class AddrlistClass:
    """Address parser class by Ben Escoto.

    To understand what this class does, it helps to have a copy of
    RFC-822 in front of you.

    Note: this class interface is deprecated and may be removed in the future.
    Use rfc822.AddressList instead.
    """

    def __init__(self, field):
        """Initialize a new instance.

        `field' is an unparsed address header field, containing
        one or more addresses.
        """
        self.specials = '()<>@,:;.\"[]'
        self.pos = 0
        self.LWS = ' \t'
        self.CR = '\r\n'
        self.atomends = self.specials + self.LWS + self.CR
        self.field = field
        self.commentlist = []

    def gotonext(self):
        """Parse up to the start of the next address."""
        while self.pos < len(self.field):
            if self.field[self.pos] in self.LWS + '\n\r':
                self.pos = self.pos + 1
            elif self.field[self.pos] == '(':
                self.commentlist.append(self.getcomment())
            else: break

    def getaddrlist(self):
        """Parse all addresses.

        Returns a list containing all of the addresses.
        """
        ad = self.getaddress()
        if ad:
            return ad + self.getaddrlist()
        else: return []

    def getaddress(self):
        """Parse the next address."""
        self.commentlist = []
        self.gotonext()

        oldpos = self.pos
        oldcl = self.commentlist
        plist = self.getphraselist()

        self.gotonext()
        returnlist = []

        if self.pos >= len(self.field):
            # Bad email address technically, no domain.
            if plist:
                returnlist = [(' '.join(self.commentlist), plist[0])]

        elif self.field[self.pos] in '.@':
            # email address is just an addrspec
            # this isn't very efficient since we start over
            self.pos = oldpos
            self.commentlist = oldcl
            addrspec = self.getaddrspec()
            returnlist = [(' '.join(self.commentlist), addrspec)]

        elif self.field[self.pos] == ':':
            # address is a group
            returnlist = []

            fieldlen = len(self.field)
            self.pos = self.pos + 1
            while self.pos < len(self.field):
                self.gotonext()
                if self.pos < fieldlen and self.field[self.pos] == ';':
                    self.pos = self.pos + 1
                    break
                returnlist = returnlist + self.getaddress()

        elif self.field[self.pos] == '<':
            # Address is a phrase then a route addr
            routeaddr = self.getrouteaddr()

            if self.commentlist:
                returnlist = [(' '.join(plist) + ' (' + \
                         ' '.join(self.commentlist) + ')', routeaddr)]
            else: returnlist = [(' '.join(plist), routeaddr)]

        else:
            if plist:
                returnlist = [(' '.join(self.commentlist), plist[0])]
            elif self.field[self.pos] in self.specials:
                self.pos = self.pos + 1

        self.gotonext()
        if self.pos < len(self.field) and self.field[self.pos] == ',':
            self.pos = self.pos + 1
        return returnlist

    def getrouteaddr(self):
        """Parse a route address (Return-path value).

        This method just skips all the route stuff and returns the addrspec.
        """
        if self.field[self.pos] != '<':
            return

        expectroute = 0
        self.pos = self.pos + 1
        self.gotonext()
        adlist = None
        while self.pos < len(self.field):
            if expectroute:
                self.getdomain()
                expectroute = 0
            elif self.field[self.pos] == '>':
                self.pos = self.pos + 1
                break
            elif self.field[self.pos] == '@':
                self.pos = self.pos + 1
                expectroute = 1
            elif self.field[self.pos] == ':':
                self.pos = self.pos + 1
                expectaddrspec = 1
            else:
                adlist = self.getaddrspec()
                self.pos = self.pos + 1
                break
            self.gotonext()

        return adlist

    def getaddrspec(self):
        """Parse an RFC-822 addr-spec."""
        aslist = []

        self.gotonext()
        while self.pos < len(self.field):
            if self.field[self.pos] == '.':
                aslist.append('.')
                self.pos = self.pos + 1
            elif self.field[self.pos] == '"':
                aslist.append('"%s"' % self.getquote())
            elif self.field[self.pos] in self.atomends:
                break
            else: aslist.append(self.getatom())
            self.gotonext()

        if self.pos >= len(self.field) or self.field[self.pos] != '@':
            return ''.join(aslist)

        aslist.append('@')
        self.pos = self.pos + 1
        self.gotonext()
        return ''.join(aslist) + self.getdomain()

    def getdomain(self):
        """Get the complete domain name from an address."""
        sdlist = []
        while self.pos < len(self.field):
            if self.field[self.pos] in self.LWS:
                self.pos = self.pos + 1
            elif self.field[self.pos] == '(':
                self.commentlist.append(self.getcomment())
            elif self.field[self.pos] == '[':
                sdlist.append(self.getdomainliteral())
            elif self.field[self.pos] == '.':
                self.pos = self.pos + 1
                sdlist.append('.')
            elif self.field[self.pos] in self.atomends:
                break
            else: sdlist.append(self.getatom())
        return ''.join(sdlist)

    def getdelimited(self, beginchar, endchars, allowcomments = 1):
        """Parse a header fragment delimited by special characters.

        `beginchar' is the start character for the fragment.
        If self is not looking at an instance of `beginchar' then
        getdelimited returns the empty string.

        `endchars' is a sequence of allowable end-delimiting characters.
        Parsing stops when one of these is encountered.

        If `allowcomments' is non-zero, embedded RFC-822 comments
        are allowed within the parsed fragment.
        """
        if self.field[self.pos] != beginchar:
            return ''

        slist = ['']
        quote = 0
        self.pos = self.pos + 1
        while self.pos < len(self.field):
            if quote == 1:
                slist.append(self.field[self.pos])
                quote = 0
            elif self.field[self.pos] in endchars:
                self.pos = self.pos + 1
                break
            elif allowcomments and self.field[self.pos] == '(':
                slist.append(self.getcomment())
            elif self.field[self.pos] == '\\':
                quote = 1
            else:
                slist.append(self.field[self.pos])
            self.pos = self.pos + 1

        return ''.join(slist)

    def getquote(self):
        """Get a quote-delimited fragment from self's field."""
        return self.getdelimited('"', '"\r', 0)

    def getcomment(self):
        """Get a parenthesis-delimited fragment from self's field."""
        return self.getdelimited('(', ')\r', 1)

    def getdomainliteral(self):
        """Parse an RFC-822 domain-literal."""
        return '[%s]' % self.getdelimited('[', ']\r', 0)

    def getatom(self):
        """Parse an RFC-822 atom."""
        atomlist = ['']

        while self.pos < len(self.field):
            if self.field[self.pos] in self.atomends:
                break
            else: atomlist.append(self.field[self.pos])
            self.pos = self.pos + 1

        return ''.join(atomlist)

    def getphraselist(self):
        """Parse a sequence of RFC-822 phrases.

        A phrase is a sequence of words, which are in turn either
        RFC-822 atoms or quoted-strings.  Phrases are canonicalized
        by squeezing all runs of contiguous whitespace into one space.
        """
        plist = []

        while self.pos < len(self.field):
            if self.field[self.pos] in self.LWS:
                self.pos = self.pos + 1
            elif self.field[self.pos] == '"':
                plist.append(self.getquote())
            elif self.field[self.pos] == '(':
                self.commentlist.append(self.getcomment())
            elif self.field[self.pos] in self.atomends:
                break
            else: plist.append(self.getatom())

        return plist

class AddressList(AddrlistClass):
    """An AddressList encapsulates a list of parsed RFC822 addresses."""
    def __init__(self, field):
        AddrlistClass.__init__(self, field)
        if field:
            self.addresslist = self.getaddrlist()
        else:
            self.addresslist = []

    def __len__(self):
        return len(self.addresslist)

    def __str__(self):
        return ", ".join(map(dump_address_pair, self.addresslist))

    def __add__(self, other):
        # Set union
        newaddr = AddressList(None)
        newaddr.addresslist = self.addresslist[:]
        for x in other.addresslist:
            if not x in self.addresslist:
                newaddr.addresslist.append(x)
        return newaddr

    def __iadd__(self, other):
        # Set union, in-place
        for x in other.addresslist:
            if not x in self.addresslist:
                self.addresslist.append(x)
        return self

    def __sub__(self, other):
        # Set difference
        newaddr = AddressList(None)
        for x in self.addresslist:
            if not x in other.addresslist:
                newaddr.addresslist.append(x)
        return newaddr

    def __isub__(self, other):
        # Set difference, in-place
        for x in other.addresslist:
            if x in self.addresslist:
                self.addresslist.remove(x)
        return self

    def __getitem__(self, index):
        # Make indexing, slices, and 'in' work
        return self.addresslist[index]
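
# Illustrative sketch, not part of the original module: the operators above
# treat an address list as an order-preserving set of (name, address) pairs.
# The helper names below are hypothetical and operate on plain lists, so the
# behaviour can be tried without parsing a header first.
def _example_union(left, right):
    """Return left followed by the pairs of right that left lacks."""
    result = left[:]
    for pair in right:
        if pair not in left:
            result.append(pair)
    return result

def _example_difference(left, right):
    """Return the pairs of left that do not occur in right."""
    result = []
    for pair in left:
        if pair not in right:
            result.append(pair)
    return result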

def dump_address_pair(pair):
    """Dump a (name, address) pair in a canonicalized form."""
    if pair[0]:
        return '"' + pair[0] + '" <' + pair[1] + '>'
    else:
        return pair[1]

# Parse a date field

_monthnames = ['jan', 'feb', 'mar', 'apr', 'may', 'jun', 'jul',
               'aug', 'sep', 'oct', 'nov', 'dec',
               'january', 'february', 'march', 'april', 'may', 'june', 'july',
               'august', 'september', 'october', 'november', 'december']
_daynames = ['mon', 'tue', 'wed', 'thu', 'fri', 'sat', 'sun']

# The timezone table does not include the military time zones defined
# in RFC822, other than Z.  According to RFC1123, the description in
# RFC822 gets the signs wrong, so we can't rely on any such time
# zones.  RFC1123 recommends that numeric timezone indicators be used
# instead of timezone names.

_timezones = {'UT':0, 'UTC':0, 'GMT':0, 'Z':0,
              'AST': -400, 'ADT': -300,  # Atlantic (used in Canada)
              'EST': -500, 'EDT': -400,  # Eastern
              'CST': -600, 'CDT': -500,  # Central
              'MST': -700, 'MDT': -600,  # Mountain
              'PST': -800, 'PDT': -700   # Pacific
              }
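
# Illustrative sketch, not part of the original module: the table stores
# offsets as signed HHMM integers (-500 means five hours west of UTC), the
# same form a numeric zone such as -0500 takes once passed through int().
# The hypothetical helper below shows the conversion to seconds; the sign is
# split off first so that it does not disturb the minutes.
def _example_hhmm_to_seconds(hhmm):
    """Convert a signed HHMM integer to seconds east of UTC."""
    if hhmm < 0:
        sign = -1
        hhmm = -hhmm
    else:
        sign = 1
    hours, minutes = divmod(hhmm, 100)
    return sign * (hours * 3600 + minutes * 60)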


def parsedate_tz(data):
    """Convert a date string to a 10-tuple, or None if it cannot be parsed.

    The first nine items have the layout of a time tuple (the weekday,
    yearday and DST fields are always 0); the tenth is the zone's offset
    from UTC in seconds, or None if no recognizable zone was given.
    Of the military time zones, only Z is recognized.
    """
    data = data.split()
    if not data:
        # Empty or all-whitespace input
        return None
    if data[0][-1] in (',', '.') or data[0].lower() in _daynames:
        # There's a dayname here. Skip it
        del data[0]
    if len(data) == 3: # RFC 850 date, deprecated
        stuff = data[0].split('-')
        if len(stuff) == 3:
            data = stuff + data[1:]
    if len(data) == 4:
        s = data[3]
        i = s.find('+')
        if i > 0:
            # The zone is glued to the time, e.g. 08:49:37+0100
            data[3:] = [s[:i], s[i+1:]]
        else:
            data.append('') # Dummy tz
    if len(data) < 5:
        return None
    data = data[:5]
    [dd, mm, yy, tm, tz] = data
    mm = mm.lower()
    if mm not in _monthnames:
        # Perhaps the day and the month are swapped
        dd, mm = mm, dd.lower()
        if mm not in _monthnames:
            return None
    mm = _monthnames.index(mm)+1
    # The full month names follow the twelve abbreviations in the list
    if mm > 12: mm = mm - 12
    if dd[-1] == ',':
        dd = dd[:-1]
    i = yy.find(':')
    if i > 0:
        # The year and the time are swapped
        yy, tm = tm, yy
    if yy[-1] == ',':
        yy = yy[:-1]
    if not yy[:1].isdigit():
        # The year and the zone are swapped
        yy, tz = tz, yy
    if tm[-1] == ',':
        tm = tm[:-1]
    tm = tm.split(':')
    if len(tm) == 2:
        [thh, tmm] = tm
        tss = '0'
    elif len(tm) == 3:
        [thh, tmm, tss] = tm
    else:
        return None
    try:
        yy = int(yy)
        dd = int(dd)
        thh = int(thh)
        tmm = int(tmm)
        tss = int(tss)
    except ValueError:
        return None
    tzoffset = None
    tz = tz.upper()
    if tz in _timezones:
        tzoffset = _timezones[tz]
    else:
        try:
            tzoffset = int(tz)
        except ValueError:
            pass
    # Convert a timezone offset into seconds ; -0500 -> -18000
    if tzoffset:
        if tzoffset < 0:
            tzsign = -1
            tzoffset = -tzoffset
        else:
            tzsign = 1
        tzoffset = tzsign * ( (tzoffset//100)*3600 + (tzoffset % 100)*60)
    return (yy, mm, dd, thh, tmm, tss, 0, 0, 0, tzoffset)


def parsedate(data):
    """Convert a time string to a time tuple."""
    t = parsedate_tz(data)
    if isinstance(t, tuple):
        # Drop the trailing zone offset
        return t[:9]
    else:
        return t


def mktime_tz(data):
    """Turn a 10-tuple as returned by parsedate_tz() into a UTC timestamp."""
    if data[9] is None:
        # No zone info, so localtime is better assumption than GMT
        return time.mktime(data[:8] + (-1,))
    else:
        # mktime() reads the tuple as local standard time.  Subtracting
        # time.timezone reinterprets it as UTC; subtracting the message's
        # own offset then gives the real UTC timestamp.
        t = time.mktime(data[:8] + (0,))
        return t - data[9] - time.timezone
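
# Illustrative alternative, not part of the original module: calendar.timegm()
# interprets a time tuple as UTC directly, so the local zone never enters the
# calculation.  The helper name is hypothetical, and the sketch assumes the
# offset in data[9] is known (not None).
import calendar

def _example_mktime_tz(data):
    """Turn a 10-tuple with a known zone offset into a UTC timestamp."""
    return calendar.timegm(data[:9]) - data[9]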

def formatdate(timeval=None):
    """Returns time format preferred for Internet standards.

    Sun, 06 Nov 1994 08:49:37 GMT  ; RFC 822, updated by RFC 1123
    """
    if timeval is None:
        timeval = time.time()
    return time.strftime('%a, %d %b %Y %H:%M:%S GMT',
                         time.gmtime(timeval))
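
# Illustrative alternative, not part of the original module: strftime()'s %a
# and %b follow the current locale, so the result is only guaranteed to use
# English names in the C locale.  The hypothetical helper below spells the
# names out itself and therefore does not depend on the locale.
import time

def _example_formatdate(timeval):
    """Format a timestamp as an RFC 1123 date with English names."""
    days = ('Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun')
    months = ('Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
              'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec')
    t = time.gmtime(timeval)
    # t[6] is the weekday with Monday as 0; t[1] is the month, 1 to 12
    return '%s, %02d %s %04d %02d:%02d:%02d GMT' % (
        days[t[6]], t[2], months[t[1] - 1], t[0], t[3], t[4], t[5])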


# When used as script, run a small test program.
# The first command line argument must be a filename containing one
# message in RFC-822 format.

if __name__ == '__main__':
    import sys, os
    file = os.path.join(os.environ['HOME'], 'Mail/inbox/1')
    if sys.argv[1:]: file = sys.argv[1]
    f = open(file, 'r')
    m = Message(f)
    print 'From:', m.getaddr('from')
    print 'To:', m.getaddrlist('to')
    print 'Subject:', m.getheader('subject')
    print 'Date:', m.getheader('date')
    date = m.getdate_tz('date')
    if date:
        # Only index into the result once we know the date parsed
        tz = date[-1]
        date = time.localtime(mktime_tz(date))
        print 'ParsedDate:', time.asctime(date),
        if tz is not None:
            # Print the offset as +HHMM; split off the sign first so that
            # offsets such as -0330 are not distorted by divmod()
            sign = '+'
            if tz < 0:
                sign, tz = '-', -tz
            hhmm, ss = divmod(tz, 60)
            hh, mm = divmod(hhmm, 60)
            print "%s%02d%02d" % (sign, hh, mm),
            if ss: print ".%02d" % ss,
        print
    else:
        print 'ParsedDate:', None
    m.rewindbody()
    n = 0
    while f.readline():
        n = n + 1
    print 'Lines:', n
    print '-'*70
    print 'len =', len(m)
    if m.has_key('Date'): print 'Date =', m['Date']
    if m.has_key('X-Nonsense'): pass
    print 'keys =', m.keys()
    print 'values =', m.values()
    print 'items =', m.items()