Commit aca8fd7a authored by Senthil Kumaran

Documentation updates for the urllib package. Modified the documentation for the following module renames:

urllib, urllib2 -> urllib.request, urllib.error
urlparse -> urllib.parse
RobotParser -> urllib.robotparser

Updated tutorial references and other module references (http.client.rst,
ftplib.rst, contextlib.rst)
Updated the examples in the urllib2-howto

Addresses Issue 3142.
Parent d11a4431
......@@ -98,9 +98,9 @@ Functions provided:
And lets you write code like this::
from contextlib import closing
import urllib
import urllib.request
with closing(urllib.urlopen('http://www.python.org')) as page:
with closing(urllib.request.urlopen('http://www.python.org')) as page:
for line in page:
print(line)
......
......@@ -13,7 +13,6 @@ that aren't markup languages or are related to e-mail.
csv.rst
configparser.rst
robotparser.rst
netrc.rst
xdrlib.rst
plistlib.rst
......@@ -13,9 +13,9 @@
This module defines the class :class:`FTP` and a few related items. The
:class:`FTP` class implements the client side of the FTP protocol. You can use
this to write Python programs that perform a variety of automated FTP jobs, such
as mirroring other ftp servers. It is also used by the module :mod:`urllib` to
handle URLs that use FTP. For more information on FTP (File Transfer Protocol),
see Internet :rfc:`959`.
as mirroring other ftp servers. It is also used by the module
:mod:`urllib.request` to handle URLs that use FTP. For more information on FTP
(File Transfer Protocol), see Internet :rfc:`959`.
Here's a sample session using the :mod:`ftplib` module::
......
......@@ -9,10 +9,11 @@
pair: HTTP; protocol
single: HTTP; http.client (standard module)
.. index:: module: urllib
.. index:: module: urllib.request
This module defines classes which implement the client side of the HTTP and
HTTPS protocols. It is normally not used directly --- the module :mod:`urllib`
HTTPS protocols. It is normally not used directly --- the module
:mod:`urllib.request`
uses it to handle URLs that use HTTP and HTTPS.
.. note::
......@@ -484,8 +485,8 @@ Here is an example session that uses the ``GET`` method::
Here is an example session that shows how to ``POST`` requests::
>>> import http.client, urllib
>>> params = urllib.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
>>> import http.client, urllib.parse
>>> params = urllib.parse.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
>>> headers = {"Content-type": "application/x-www-form-urlencoded",
... "Accept": "text/plain"}
>>> conn = http.client.HTTPConnection("musi-cal.mojam.com:80")
......
......@@ -24,8 +24,10 @@ is currently supported on most popular platforms. Here is an overview:
cgi.rst
cgitb.rst
wsgiref.rst
urllib.rst
urllib2.rst
urllib.request.rst
urllib.parse.rst
urllib.error.rst
urllib.robotparser.rst
http.client.rst
ftplib.rst
poplib.rst
......@@ -35,7 +37,6 @@ is currently supported on most popular platforms. Here is an overview:
smtpd.rst
telnetlib.rst
uuid.rst
urlparse.rst
socketserver.rst
http.server.rst
http.cookies.rst
......
:mod:`urllib.error` --- Exception classes raised by urllib.request
==================================================================
.. module:: urllib.error
:synopsis: Exception classes raised by urllib.request.
.. moduleauthor:: Jeremy Hylton <jhylton@users.sourceforge.net>
.. sectionauthor:: Senthil Kumaran <orsenthil@gmail.com>
The :mod:`urllib.error` module defines the exception classes raised by
:mod:`urllib.request`. The base exception class is :exc:`URLError`, which
inherits from :exc:`IOError`.
The following exceptions are raised by :mod:`urllib.error` as appropriate:
.. exception:: URLError
The handlers raise this exception (or derived exceptions) when they run into a
problem. It is a subclass of :exc:`IOError`.
.. attribute:: reason
The reason for this error. It can be a message string or another exception
instance (:exc:`socket.error` for remote URLs, :exc:`OSError` for local
URLs).
.. exception:: HTTPError
Though being an exception (a subclass of :exc:`URLError`), an :exc:`HTTPError`
can also function as a non-exceptional file-like return value (the same thing
that :func:`urlopen` returns). This is useful when handling exotic HTTP
errors, such as requests for authentication.
.. attribute:: code
An HTTP status code as defined in :rfc:`2616`.
This numeric value corresponds to a value found in the dictionary of
codes as found in :attr:`http.server.BaseHTTPRequestHandler.responses`.
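As a rough illustration of the hierarchy described above (a sketch, not part of the patch itself; the URL handling shown is the conventional pattern), :exc:`HTTPError` can be caught specifically or through its :exc:`URLError` base class:

```python
import urllib.error
import urllib.request

# HTTPError specializes URLError, which in turn is an IOError, so a
# single `except urllib.error.URLError` clause would catch both.
assert issubclass(urllib.error.HTTPError, urllib.error.URLError)
assert issubclass(urllib.error.URLError, IOError)

def fetch(url):
    """Return page contents, or None on any urllib error (sketch)."""
    try:
        return urllib.request.urlopen(url).read()
    except urllib.error.HTTPError as err:
        print('server error:', err.code)            # numeric HTTP status
    except urllib.error.URLError as err:
        print('failed to reach server:', err.reason)
```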
.. exception:: ContentTooShortError(msg[, content])
This exception is raised when the :func:`urlretrieve` function detects that the
amount of the downloaded data is less than the expected amount (given by the
*Content-Length* header). The :attr:`content` attribute stores the downloaded
(and supposedly truncated) data.
:mod:`urlparse` --- Parse URLs into components
==============================================
:mod:`urllib.parse` --- Parse URLs into components
==================================================
.. module:: urlparse
.. module:: urllib.parse
:synopsis: Parse URLs into or assemble them from components.
......@@ -24,7 +24,7 @@ following URL schemes: ``file``, ``ftp``, ``gopher``, ``hdl``, ``http``,
``rsync``, ``rtsp``, ``rtspu``, ``sftp``, ``shttp``, ``sip``, ``sips``,
``snews``, ``svn``, ``svn+ssh``, ``telnet``, ``wais``.
The :mod:`urlparse` module defines the following functions:
The :mod:`urllib.parse` module defines the following functions:
.. function:: urlparse(urlstring[, default_scheme[, allow_fragments]])
......@@ -37,7 +37,7 @@ The :mod:`urlparse` module defines the following functions:
result, except for a leading slash in the *path* component, which is retained if
present. For example:
>>> from urlparse import urlparse
>>> from urllib.parse import urlparse
>>> o = urlparse('http://www.cwi.nl:80/%7Eguido/Python.html')
>>> o # doctest: +NORMALIZE_WHITESPACE
ParseResult(scheme='http', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html',
......@@ -154,7 +154,7 @@ The :mod:`urlparse` module defines the following functions:
particular the addressing scheme, the network location and (part of) the path,
to provide missing components in the relative URL. For example:
>>> from urlparse import urljoin
>>> from urllib.parse import urljoin
>>> urljoin('http://www.cwi.nl/%7Eguido/Python.html', 'FAQ.html')
'http://www.cwi.nl/%7Eguido/FAQ.html'
......@@ -183,6 +183,52 @@ The :mod:`urlparse` module defines the following functions:
If there is no fragment identifier in *url*, returns *url* unmodified and an
empty string.
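For instance (a quick sketch using the renamed module, not taken from the patch):

```python
from urllib.parse import urldefrag

# The fragment is split off; a URL without one comes back unchanged.
url, frag = urldefrag('http://www.python.org/doc/#intro')
print(url)   # http://www.python.org/doc/
print(frag)  # intro
```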
.. function:: quote(string[, safe])
Replace special characters in *string* using the ``%xx`` escape. Letters,
digits, and the characters ``'_.-'`` are never quoted. The optional *safe*
parameter specifies additional characters that should not be quoted --- its
default value is ``'/'``.
Example: ``quote('/~connolly/')`` yields ``'/%7econnolly/'``.
.. function:: quote_plus(string[, safe])
Like :func:`quote`, but also replaces spaces by plus signs, as required for
quoting HTML form values. Plus signs in the original string are escaped unless
they are included in *safe*. It also does not have *safe* default to ``'/'``.
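The difference in the *safe* default and in space handling can be seen side by side (a small sketch, not part of the original patch):

```python
from urllib.parse import quote, quote_plus

# quote keeps '/' safe by default; quote_plus does not, and it
# encodes spaces as '+' instead of '%20'.
print(quote('/a path/'))       # /a%20path/
print(quote_plus('/a path/'))  # %2Fa+path%2F
```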
.. function:: unquote(string)
Replace ``%xx`` escapes by their single-character equivalent.
Example: ``unquote('/%7Econnolly/')`` yields ``'/~connolly/'``.
.. function:: unquote_plus(string)
Like :func:`unquote`, but also replaces plus signs by spaces, as required for
unquoting HTML form values.
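The two unquoting functions differ only in how they treat plus signs (a small sketch, not from the patch):

```python
from urllib.parse import unquote, unquote_plus

print(unquote('a+b%20c'))       # a+b c   (plus signs are untouched)
print(unquote_plus('a+b%20c'))  # a b c   (plus signs become spaces)
```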
.. function:: urlencode(query[, doseq])
Convert a mapping object or a sequence of two-element tuples to a "url-encoded"
string, suitable to pass to :func:`urlopen` above as the optional *data*
argument. This is useful to pass a dictionary of form fields to a ``POST``
request. The resulting string is a series of ``key=value`` pairs separated by
``'&'`` characters, where both *key* and *value* are quoted using
:func:`quote_plus` above. If the optional parameter *doseq* is present and
evaluates to true, individual ``key=value`` pairs are generated for each element
of the sequence. When a sequence of two-element tuples is used as the *query*
argument, the first element of each tuple is a key and the second is a value.
The order of parameters in the encoded string will match the order of parameter
tuples in the sequence. The :mod:`cgi` module provides the functions
:func:`parse_qs` and :func:`parse_qsl` which are used to parse query strings
into Python data structures.
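The ordering and *doseq* behaviour described above can be sketched as follows (an illustration, not part of the original patch):

```python
from urllib.parse import urlencode

# A sequence of two-element tuples preserves parameter order.
print(urlencode([('spam', 1), ('eggs', 2)]))         # spam=1&eggs=2

# With doseq true, sequence values expand to repeated keys.
print(urlencode({'key': ['v1', 'v2']}, doseq=True))  # key=v1&key=v2
```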
.. seealso::
......@@ -219,14 +265,14 @@ described in those functions, as well as provide an additional method:
The result of this method is a fixpoint if passed back through the original
parsing function:
>>> import urlparse
>>> import urllib.parse
>>> url = 'HTTP://www.Python.org/doc/#'
>>> r1 = urlparse.urlsplit(url)
>>> r1 = urllib.parse.urlsplit(url)
>>> r1.geturl()
'http://www.Python.org/doc/'
>>> r2 = urlparse.urlsplit(r1.geturl())
>>> r2 = urllib.parse.urlsplit(r1.geturl())
>>> r2.geturl()
'http://www.Python.org/doc/'
......
:mod:`urllib.robotparser` --- Parser for robots.txt
====================================================
.. module:: urllib.robotparser
:synopsis: Loads a robots.txt file and answers questions about
fetchability of other URLs.
.. sectionauthor:: Skip Montanaro <skip@pobox.com>
.. index::
single: WWW
single: World Wide Web
single: URL
single: robots.txt
This module provides a single class, :class:`RobotFileParser`, which answers
questions about whether or not a particular user agent can fetch a URL on the
Web site that published the :file:`robots.txt` file. For more details on the
structure of :file:`robots.txt` files, see http://www.robotstxt.org/orig.html.
.. class:: RobotFileParser()
This class provides a set of methods to read, parse and answer questions
about a single :file:`robots.txt` file.
.. method:: set_url(url)
Sets the URL referring to a :file:`robots.txt` file.
.. method:: read()
Reads the :file:`robots.txt` URL and feeds it to the parser.
.. method:: parse(lines)
Parses the lines argument.
.. method:: can_fetch(useragent, url)
Returns ``True`` if the *useragent* is allowed to fetch the *url*
according to the rules contained in the parsed :file:`robots.txt`
file.
.. method:: mtime()
Returns the time the ``robots.txt`` file was last fetched. This is
useful for long-running web spiders that need to check for new
``robots.txt`` files periodically.
.. method:: modified()
Sets the time the ``robots.txt`` file was last fetched to the current
time.
The following example demonstrates basic use of the :class:`RobotFileParser` class. ::
>>> import urllib.robotparser
>>> rp = urllib.robotparser.RobotFileParser()
>>> rp.set_url("http://www.musi-cal.com/robots.txt")
>>> rp.read()
>>> rp.can_fetch("*", "http://www.musi-cal.com/cgi-bin/search?city=San+Francisco")
False
>>> rp.can_fetch("*", "http://www.musi-cal.com/")
True
......@@ -147,11 +147,11 @@ Internet Access
===============
There are a number of modules for accessing the internet and processing internet
protocols. Two of the simplest are :mod:`urllib2` for retrieving data from urls
and :mod:`smtplib` for sending mail::
protocols. Two of the simplest are :mod:`urllib.request` for retrieving data
from urls and :mod:`smtplib` for sending mail::
>>> import urllib2
>>> for line in urllib2.urlopen('http://tycho.usno.navy.mil/cgi-bin/timer.pl'):
>>> import urllib.request
>>> for line in urllib.request.urlopen('http://tycho.usno.navy.mil/cgi-bin/timer.pl'):
... if 'EST' in line or 'EDT' in line: # look for Eastern Time
... print(line)
......