Commit aca8fd7a authored by Senthil Kumaran

Documentation updates for the urllib package. Modified the documentation for the following module renames:

urllib, urllib2 -> urllib.request, urllib.error
urlparse -> urllib.parse
RobotParser -> urllib.robotparser

Updated tutorial references and other module references (http.client.rst,
ftplib.rst, contextlib.rst)
Updated the examples in the urllib2-howto

Addresses Issue 3142.
Parent d11a4431
......@@ -98,9 +98,9 @@ Functions provided:
And lets you write code like this::
from contextlib import closing
import urllib
import urllib.request
with closing(urllib.urlopen('http://www.python.org')) as page:
with closing(urllib.request.urlopen('http://www.python.org')) as page:
for line in page:
print(line)
......
......@@ -13,7 +13,6 @@ that aren't markup languages or are related to e-mail.
csv.rst
configparser.rst
robotparser.rst
netrc.rst
xdrlib.rst
plistlib.rst
......@@ -13,9 +13,9 @@
This module defines the class :class:`FTP` and a few related items. The
:class:`FTP` class implements the client side of the FTP protocol. You can use
this to write Python programs that perform a variety of automated FTP jobs, such
as mirroring other ftp servers. It is also used by the module :mod:`urllib` to
handle URLs that use FTP. For more information on FTP (File Transfer Protocol),
see Internet :rfc:`959`.
as mirroring other ftp servers. It is also used by the module
:mod:`urllib.request` to handle URLs that use FTP. For more information on FTP
(File Transfer Protocol), see Internet :rfc:`959`.
Here's a sample session using the :mod:`ftplib` module::
......
......@@ -9,10 +9,11 @@
pair: HTTP; protocol
single: HTTP; http.client (standard module)
.. index:: module: urllib
.. index:: module: urllib.request
This module defines classes which implement the client side of the HTTP and
HTTPS protocols. It is normally not used directly --- the module :mod:`urllib`
HTTPS protocols. It is normally not used directly --- the module
:mod:`urllib.request`
uses it to handle URLs that use HTTP and HTTPS.
.. note::
......@@ -484,8 +485,8 @@ Here is an example session that uses the ``GET`` method::
Here is an example session that shows how to ``POST`` requests::
>>> import http.client, urllib
>>> params = urllib.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
>>> import http.client, urllib.parse
>>> params = urllib.parse.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
>>> headers = {"Content-type": "application/x-www-form-urlencoded",
... "Accept": "text/plain"}
>>> conn = http.client.HTTPConnection("musi-cal.mojam.com:80")
......
......@@ -24,8 +24,10 @@ is currently supported on most popular platforms. Here is an overview:
cgi.rst
cgitb.rst
wsgiref.rst
urllib.rst
urllib2.rst
urllib.request.rst
urllib.parse.rst
urllib.error.rst
urllib.robotparser.rst
http.client.rst
ftplib.rst
poplib.rst
......@@ -35,7 +37,6 @@ is currently supported on most popular platforms. Here is an overview:
smtpd.rst
telnetlib.rst
uuid.rst
urlparse.rst
socketserver.rst
http.server.rst
http.cookies.rst
......
:mod:`urllib.error` --- Exception classes raised by urllib.request
==================================================================
.. module:: urllib.error
:synopsis: Exception classes raised by urllib.request.
.. moduleauthor:: Jeremy Hylton <jhylton@users.sourceforge.net>
.. sectionauthor:: Senthil Kumaran <orsenthil@gmail.com>
The :mod:`urllib.error` module defines the exception classes raised by
:mod:`urllib.request`. The base exception class is :exc:`URLError`, which
inherits from :exc:`IOError`.
The following exceptions are raised by :mod:`urllib.error` as appropriate:
.. exception:: URLError
The handlers raise this exception (or derived exceptions) when they run into a
problem. It is a subclass of :exc:`IOError`.
.. attribute:: reason
The reason for this error. It can be a message string or another exception
instance (:exc:`socket.error` for remote URLs, :exc:`OSError` for local
URLs).
.. exception:: HTTPError
Though being an exception (a subclass of :exc:`URLError`), an :exc:`HTTPError`
can also function as a non-exceptional file-like return value (the same thing
that :func:`urlopen` returns). This is useful when handling exotic HTTP
errors, such as requests for authentication.
.. attribute:: code
An HTTP status code as defined in :rfc:`2616`.
This numeric value corresponds to a value found in the dictionary of
codes as found in :attr:`http.server.BaseHTTPRequestHandler.responses`.
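As a rough illustration of the hierarchy described above (a sketch, not part of the patch itself; the URL handling shown is the conventional pattern), :exc:`HTTPError` can be caught specifically or through its :exc:`URLError` base class:

```python
import urllib.error
import urllib.request

# HTTPError specializes URLError, which in turn is an IOError, so a
# single `except urllib.error.URLError` clause would catch both.
assert issubclass(urllib.error.HTTPError, urllib.error.URLError)
assert issubclass(urllib.error.URLError, IOError)

def fetch(url):
    """Return page contents, or None on any urllib error (sketch)."""
    try:
        return urllib.request.urlopen(url).read()
    except urllib.error.HTTPError as err:
        print('server error:', err.code)            # numeric HTTP status
    except urllib.error.URLError as err:
        print('failed to reach server:', err.reason)
```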
.. exception:: ContentTooShortError(msg[, content])
This exception is raised when the :func:`urlretrieve` function detects that the
amount of the downloaded data is less than the expected amount (given by the
*Content-Length* header). The :attr:`content` attribute stores the downloaded
(and supposedly truncated) data.
:mod:`urlparse` --- Parse URLs into components
==============================================
:mod:`urllib.parse` --- Parse URLs into components
==================================================
.. module:: urlparse
.. module:: urllib.parse
:synopsis: Parse URLs into or assemble them from components.
......@@ -24,7 +24,7 @@ following URL schemes: ``file``, ``ftp``, ``gopher``, ``hdl``, ``http``,
``rsync``, ``rtsp``, ``rtspu``, ``sftp``, ``shttp``, ``sip``, ``sips``,
``snews``, ``svn``, ``svn+ssh``, ``telnet``, ``wais``.
The :mod:`urlparse` module defines the following functions:
The :mod:`urllib.parse` module defines the following functions:
.. function:: urlparse(urlstring[, default_scheme[, allow_fragments]])
......@@ -37,7 +37,7 @@ The :mod:`urlparse` module defines the following functions:
result, except for a leading slash in the *path* component, which is retained if
present. For example:
>>> from urlparse import urlparse
>>> from urllib.parse import urlparse
>>> o = urlparse('http://www.cwi.nl:80/%7Eguido/Python.html')
>>> o # doctest: +NORMALIZE_WHITESPACE
ParseResult(scheme='http', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html',
......@@ -154,7 +154,7 @@ The :mod:`urlparse` module defines the following functions:
particular the addressing scheme, the network location and (part of) the path,
to provide missing components in the relative URL. For example:
>>> from urlparse import urljoin
>>> from urllib.parse import urljoin
>>> urljoin('http://www.cwi.nl/%7Eguido/Python.html', 'FAQ.html')
'http://www.cwi.nl/%7Eguido/FAQ.html'
......@@ -183,6 +183,52 @@ The :mod:`urlparse` module defines the following functions:
If there is no fragment identifier in *url*, returns *url* unmodified and an
empty string.
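For instance (a quick sketch using the renamed module, not taken from the patch):

```python
from urllib.parse import urldefrag

# The fragment is split off; a URL without one comes back unchanged.
url, frag = urldefrag('http://www.python.org/doc/#intro')
print(url)   # http://www.python.org/doc/
print(frag)  # intro
```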
.. function:: quote(string[, safe])
Replace special characters in *string* using the ``%xx`` escape. Letters,
digits, and the characters ``'_.-'`` are never quoted. The optional *safe*
parameter specifies additional characters that should not be quoted --- its
default value is ``'/'``.
Example: ``quote('/~connolly/')`` yields ``'/%7econnolly/'``.
.. function:: quote_plus(string[, safe])
Like :func:`quote`, but also replaces spaces by plus signs, as required for
quoting HTML form values. Plus signs in the original string are escaped unless
they are included in *safe*. It also does not have *safe* default to ``'/'``.
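The difference in the *safe* default and in space handling can be seen side by side (a small sketch, not part of the original patch):

```python
from urllib.parse import quote, quote_plus

# quote keeps '/' safe by default; quote_plus does not, and it
# encodes spaces as '+' instead of '%20'.
print(quote('/a path/'))       # /a%20path/
print(quote_plus('/a path/'))  # %2Fa+path%2F
```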
.. function:: unquote(string)
Replace ``%xx`` escapes by their single-character equivalent.
Example: ``unquote('/%7Econnolly/')`` yields ``'/~connolly/'``.
.. function:: unquote_plus(string)
Like :func:`unquote`, but also replaces plus signs by spaces, as required for
unquoting HTML form values.
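The two unquoting functions differ only in how they treat plus signs (a small sketch, not from the patch):

```python
from urllib.parse import unquote, unquote_plus

print(unquote('a+b%20c'))       # a+b c   (plus signs are untouched)
print(unquote_plus('a+b%20c'))  # a b c   (plus signs become spaces)
```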
.. function:: urlencode(query[, doseq])
Convert a mapping object or a sequence of two-element tuples to a "url-encoded"
string, suitable to pass to :func:`urlopen` above as the optional *data*
argument. This is useful to pass a dictionary of form fields to a ``POST``
request. The resulting string is a series of ``key=value`` pairs separated by
``'&'`` characters, where both *key* and *value* are quoted using
:func:`quote_plus` above. If the optional parameter *doseq* is present and
evaluates to true, individual ``key=value`` pairs are generated for each element
of the sequence. When a sequence of two-element tuples is used as the *query*
argument, the first element of each tuple is a key and the second is a value.
The order of parameters in the encoded string will match the order of parameter
tuples in the sequence. The :mod:`cgi` module provides the functions
:func:`parse_qs` and :func:`parse_qsl` which are used to parse query strings
into Python data structures.
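The ordering and *doseq* behaviour described above can be sketched as follows (an illustration, not part of the original patch):

```python
from urllib.parse import urlencode

# A sequence of two-element tuples preserves parameter order.
print(urlencode([('spam', 1), ('eggs', 2)]))         # spam=1&eggs=2

# With doseq true, sequence values expand to repeated keys.
print(urlencode({'key': ['v1', 'v2']}, doseq=True))  # key=v1&key=v2
```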
.. seealso::
......@@ -219,14 +265,14 @@ described in those functions, as well as provide an additional method:
The result of this method is a fixpoint if passed back through the original
parsing function:
>>> import urlparse
>>> import urllib.parse
>>> url = 'HTTP://www.Python.org/doc/#'
>>> r1 = urlparse.urlsplit(url)
>>> r1 = urllib.parse.urlsplit(url)
>>> r1.geturl()
'http://www.Python.org/doc/'
>>> r2 = urlparse.urlsplit(r1.geturl())
>>> r2 = urllib.parse.urlsplit(r1.geturl())
>>> r2.geturl()
'http://www.Python.org/doc/'
......
:mod:`urllib.robotparser` --- Parser for robots.txt
====================================================
.. module:: urllib.robotparser
:synopsis: Loads a robots.txt file and answers questions about
fetchability of other URLs.
.. sectionauthor:: Skip Montanaro <skip@pobox.com>
.. index::
single: WWW
single: World Wide Web
single: URL
single: robots.txt
This module provides a single class, :class:`RobotFileParser`, which answers
questions about whether or not a particular user agent can fetch a URL on the
Web site that published the :file:`robots.txt` file. For more details on the
structure of :file:`robots.txt` files, see http://www.robotstxt.org/orig.html.
.. class:: RobotFileParser()
This class provides a set of methods to read, parse and answer questions
about a single :file:`robots.txt` file.
.. method:: set_url(url)
Sets the URL referring to a :file:`robots.txt` file.
.. method:: read()
Reads the :file:`robots.txt` URL and feeds it to the parser.
.. method:: parse(lines)
Parses the lines argument.
.. method:: can_fetch(useragent, url)
Returns ``True`` if the *useragent* is allowed to fetch the *url*
according to the rules contained in the parsed :file:`robots.txt`
file.
.. method:: mtime()
Returns the time the ``robots.txt`` file was last fetched. This is
useful for long-running web spiders that need to check for new
``robots.txt`` files periodically.
.. method:: modified()
Sets the time the ``robots.txt`` file was last fetched to the current
time.
The following example demonstrates basic use of the :class:`RobotFileParser` class. ::
>>> import urllib.robotparser
>>> rp = urllib.robotparser.RobotFileParser()
>>> rp.set_url("http://www.musi-cal.com/robots.txt")
>>> rp.read()
>>> rp.can_fetch("*", "http://www.musi-cal.com/cgi-bin/search?city=San+Francisco")
False
>>> rp.can_fetch("*", "http://www.musi-cal.com/")
True
......@@ -147,11 +147,11 @@ Internet Access
===============
There are a number of modules for accessing the internet and processing internet
protocols. Two of the simplest are :mod:`urllib2` for retrieving data from urls
and :mod:`smtplib` for sending mail::
protocols. Two of the simplest are :mod:`urllib.request` for retrieving data
from urls and :mod:`smtplib` for sending mail::
>>> import urllib2
>>> for line in urllib2.urlopen('http://tycho.usno.navy.mil/cgi-bin/timer.pl'):
>>> import urllib.request
>>> for line in urllib.request.urlopen('http://tycho.usno.navy.mil/cgi-bin/timer.pl'):
... if 'EST' in line or 'EDT' in line: # look for Eastern Time
... print(line)
......