Commit 5e0759d3, authored by Guido van Rossum

Add chapter on classes (mostly from ../misc/CLASSES).

parent 2d4aa4f5
@@ -57,6 +57,7 @@ a more formal definition of the language.
\pagenumbering{arabic}
\chapter{Whetting Your Appetite}
If you ever wrote a large shell script, you probably know this
@@ -141,6 +142,7 @@ should read the Library Reference, which gives complete (though terse)
reference material about built-in and standard types, functions and
modules that can save you a lot of time when writing Python programs.
\chapter{Using the Python Interpreter}
\section{Invoking the Interpreter}
@@ -380,6 +382,7 @@ completion mechanism might use the interpreter's symbol table. A
command to check (or even suggest) matching parentheses, quotes etc.
would also be useful.
\chapter{An Informal Introduction to Python}
In the following examples, input and output are distinguished by the
@@ -786,6 +789,7 @@ prompt if the last line was not completed.
\end{itemize}
\chapter{More Control Flow Tools}
Besides the {\tt while} statement just introduced, Python knows the
@@ -1065,6 +1069,7 @@ it is equivalent to {\tt result = result + [b]}, but more efficient.
\end{itemize}
\chapter{Odds and Ends}
This chapter describes some things you've learned about already in
@@ -1359,6 +1364,7 @@ to their numeric value, so 0 equals 0.0, etc.%
the language.
}
\chapter{Modules}
If you quit from the Python interpreter and enter it again, the
@@ -1581,6 +1587,7 @@ meError', 'SystemError', 'TypeError', 'abs', 'chr', 'dir', 'divmod', 'eval',
>>>
\end{verbatim}\ecode
\chapter{Output Formatting}
So far we've encountered two ways of writing values: {\em expression
@@ -1675,6 +1682,7 @@ signs:%
>>>
\end{verbatim}\ecode
\chapter{Errors and Exceptions}
Until now error messages haven't been more than mentioned, but if you
@@ -1963,4 +1971,594 @@ handler (and even if another exception occurred in the handler).
It is also executed when the {\tt try} statement is left via a
{\tt break} or {\tt return} statement.
\chapter{Classes}
Python's class mechanism adds classes to the language with a minimum
of new syntax and semantics. It is a mixture of the class mechanisms
found in C++ and Modula-3. As is true for modules, classes in Python
do not put an absolute barrier between definition and user, but rather
rely on the politeness of the user not to ``break into the
definition.'' The most important features of classes are retained
with full power, however: the class inheritance mechanism allows
multiple base classes, a derived class can override any methods of its
base class(es), and a method can call the method of a base class with the
same name. Objects can contain an arbitrary amount of private data.
In C++ terminology, all class members (including the data members) are
{\em public}, and all member functions are {\em virtual}. There are
no special constructors or destructors. As in Modula-3, there are no
shorthands for referencing the object's members from its methods: the
method function is declared with an explicit first argument
representing the object, which is provided implicitly by the call. As
in Smalltalk, classes themselves are objects, albeit in the wider
sense of the word: in Python, all data types are objects. This
provides semantics for importing and renaming. But, just like in C++
or Modula-3, built-in types cannot be used as base classes for
extension by the user. Also, like in Modula-3 but unlike in C++, the
built-in operators with special syntax (arithmetic operators,
subscripting etc.) cannot be redefined for class members.
\section{A word about terminology}
Lacking universally accepted terminology to talk about classes, I'll
make occasional use of Smalltalk and C++ terms. (I'd use Modula-3
terms, since its object-oriented semantics are closer to those of
Python than C++, but I expect that few readers have heard of it...)
I also have to warn you that there's a terminological pitfall for
object-oriented readers: the word ``object'' in Python does not
necessarily mean a class instance. Like C++ and Modula-3, and unlike
Smalltalk, not all types in Python are classes: the basic built-in
types like integers and lists aren't, and even somewhat more exotic
types like files aren't. However, {\em all} Python types share a little
bit of common semantics that is best described by using the word
object.
Objects have individuality, and multiple names (in multiple scopes)
can be bound to the same object. This is known as aliasing in other
languages. This is usually not appreciated on a first glance at
Python, and can be safely ignored when dealing with immutable basic
types (numbers, strings, tuples). However, aliasing has an
(intended!) effect on the semantics of Python code involving mutable
objects such as lists, dictionaries, and most types representing
entities outside the program (files, windows, etc.). This is usually
used to the benefit of the program, since aliases behave like pointers
in some respects. For example, passing an object is cheap since only
a pointer is passed by the implementation; and if a function modifies
an object passed as an argument, the caller will see the change --- this
obviates the need for two different argument passing mechanisms as in
Pascal.
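The effect of aliasing on mutable objects can be seen in a short
sketch. The names \verb\a\, \verb\b\, \verb\c\ and \verb\append_zero\
are made up for this illustration, and the code is written for a
current Python 3 interpreter, which may differ in details from the
interpreter described here.

```python
# Two names bound to one list object: 'b' is an alias of 'a', not a copy.
a = [1, 2, 3]
b = a
b.append(4)          # modifies the single shared list

def append_zero(seq):
    # The function receives a reference to the caller's list,
    # so the caller sees the modification.
    seq.append(0)

append_zero(a)

# An explicit copy is a different object and is not affected
# by later changes to the original.
c = a[:]
a.append(5)
```

After this runs, \verb\a\ and \verb\b\ still name the same list, which
now holds all the appended items, while the copy \verb\c\ lacks the
final 5.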
\section{Python scopes and name spaces}
Before introducing classes, I first have to tell you something about
Python's scope rules. Class definitions play some neat tricks with
name spaces, and you need to know how scopes and name spaces work to
fully understand what's going on. Incidentally, knowledge about this
subject is useful for any advanced Python programmer.
Let's begin with some definitions.
A {\em name space} is a mapping from names to objects. Most name
spaces are currently implemented as Python dictionaries, but that's
normally not noticeable in any way (except for performance), and it
may change in the future. Examples of name spaces are: the set of
built-in names (functions such as \verb\abs()\, and built-in exception
names); the global names in a module; and the local names in a
function invocation. In a sense the set of attributes of an object
also forms a name space. The important thing to know about name
spaces is that there is absolutely no relation between names in
different name spaces; for instance, two different modules may both
define a function ``maximize'' without confusion --- users of the
modules must prefix it with the module name.
By the way, I use the word {\em attribute} for any name following a
dot --- for example, in the expression \verb\z.real\, \verb\real\ is
an attribute of the object \verb\z\. Strictly speaking, references to
names in modules are attribute references: in the expression
\verb\modname.funcname\, \verb\modname\ is a module object and
\verb\funcname\ is an attribute of it. In this case there happens to
be a straightforward mapping between the module's attributes and the
global names defined in the module: they share the same name space!%
\footnote{
Except for one thing. Module objects have a secret read-only
attribute called {\tt __dict__} which returns the dictionary
used to implement the module's name space; the name
{\tt __dict__} is an attribute but not a global name.
Obviously, using this violates the abstraction of name space
implementation, and should be restricted to things like
post-mortem debuggers...
}
Attributes may be read-only or writable. In the latter case,
assignment to attributes is possible. Module attributes are writable:
you can write \verb\modname.the_answer = 42\. Writable attributes may
also be deleted with the \verb\del\ statement, e.g.
\verb\del modname.the_answer\.
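As a small sketch of writable module attributes, the following uses
the standard \verb\math\ module in place of \verb\modname\; any module
object would do. The helper functions \verb\vars()\ and
\verb\hasattr()\ are those of a current Python 3 interpreter and are an
assumption as far as the interpreter described here is concerned.

```python
import math   # stands in for 'modname'; chosen only as an example

# Module attributes are writable: this adds a new name to the module.
math.the_answer = 42
answer_seen = math.the_answer

# The attribute and the module's global name live in one name space.
shared = 'the_answer' in vars(math)

# Writable attributes can be deleted again.
del math.the_answer
gone = not hasattr(math, 'the_answer')
```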
Name spaces are created at different moments and have different
lifetimes. The name space containing the built-in names is created
when the Python interpreter starts up, and is never deleted. The
global name space for a module is created when the module definition
is read in; normally, module name spaces also last until the
interpreter quits. The statements executed by the top-level
invocation of the interpreter, either read from a script file or
interactively, are considered part of a module called \verb\__main__\,
so they have their own global name space. (The built-in names
actually also live in a module; this is called \verb\builtin\,
although it should really have been called \verb\__builtin__\.)
The local name space for a function is created when the function is
called, and deleted when the function returns or raises an exception
that is not handled within the function. (Actually, forgetting would
be a better way to describe what actually happens.) Of course,
recursive invocations each have their own local name space.
A {\em scope} is a textual region of a Python program where a name space
is directly accessible. ``Directly accessible'' here means that an
unqualified reference to a name attempts to find the name in the name
space.
Although scopes are determined statically, they are used dynamically.
At any time during execution, exactly three nested scopes are in use
(i.e., exactly three name spaces are directly accessible): the
innermost scope, which is searched first, contains the local names,
the middle scope, searched next, contains the current module's global
names, and the outermost scope (searched last) is the name space
containing built-in names.
Usually, the local scope references the local names of the (textually)
current function. Outside functions, the local scope references
the same name space as the global scope: the module's name space.
Class definitions place yet another name space in the local scope.
It is important to realize that scopes are determined textually: the
global scope of a function defined in a module is that module's name
space, no matter from where or by what alias the function is called.
On the other hand, the actual search for names is done dynamically, at
run time --- however, the language definition is evolving towards
static name resolution, at ``compile'' time, so don't rely on dynamic
name resolution! (In fact, local variables are already determined
statically.)
A special quirk of Python is that assignments always go into the
innermost scope. Assignments do not copy data --- they just
bind names to objects. The same is true for deletions: the statement
\verb\del x\ removes the binding of \verb\x\ from the name space referenced by the
local scope. In fact, all operations that introduce new names use the
local scope: in particular, import statements and function definitions
bind the module or function name in the local scope. (The
\verb\global\ statement can be used to indicate that particular
variables live in the global scope.)
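The rule that assignments go into the innermost scope, and the way the
\verb\global\ statement changes this, can be sketched as follows. The
names \verb\counter\, \verb\bump_local\ and \verb\bump_global\ are
invented for the illustration, and the code is assumed to be executed
at the top level of a module.

```python
counter = 0            # a name in the module's global name space

def bump_local():
    # Assignment binds 'counter' in the function's local name space;
    # the global 'counter' is left untouched.
    counter = 100
    return counter

def bump_global():
    # The global statement makes assignments to 'counter' go to the
    # module's global name space instead.
    global counter
    counter = counter + 1
    return counter

local_result = bump_local()
after_local = counter          # the global name is still bound to 0
global_result = bump_global()
after_global = counter         # the global name is now bound to 1
```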
\section{A first look at classes}
Classes introduce a little bit of new syntax, three new object types,
and some new semantics.
\subsection{Class definition syntax}
The simplest form of class definition looks like this:
\begin{verbatim}
class ClassName:
    <statement-1>
    .
    .
    .
    <statement-N>
\end{verbatim}
Class definitions, like function definitions (\verb\def\ statements),
must be executed before they have any effect. (You could conceivably
place a class definition in a branch of an \verb\if\ statement, or
inside a function.)
In practice, the statements inside a class definition will usually be
function definitions, but other statements are allowed, and sometimes
useful --- we'll come back to this later. The function definitions
inside a class normally have a peculiar form of argument list,
dictated by the calling conventions for methods --- again, this is
explained later.
When a class definition is entered, a new name space is created, and
used as the local scope --- thus, all assignments to local variables
go into this new name space. In particular, function definitions bind
the name of the new function here.
When a class definition is left normally (via the end), a {\em class
object} is created. This is basically a wrapper around the contents
of the name space created by the class definition; we'll learn more
about class objects in the next section. The original local scope
(the one in effect just before the class definition was entered) is
reinstated, and the class object is bound here to the class name given in
the class definition header (ClassName in the example).
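A minimal sketch of these rules, assuming a current Python 3
interpreter: the class statement is executed like any other statement,
the names assigned in its body end up as attributes of the class
object, and the class name is bound in the enclosing scope. The names
\verb\verbose\ and \verb\Greeter\ are hypothetical.

```python
verbose = True   # hypothetical flag, used only for this illustration

if verbose:
    # A class definition may appear in a branch of an if statement;
    # it takes effect only when that branch is executed.
    class Greeter:
        greeting = 'hello'      # bound in the class's new name space
        def greet(self):        # the function name is bound there too
            return self.greeting

# Once the class statement has finished, 'Greeter' is bound in the
# enclosing scope and the names from the class body are its attributes.
names = [n for n in ('greeting', 'greet') if hasattr(Greeter, n)]
```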
\subsection{Class objects}
Class objects support two kinds of operations: attribute references
and instantiation.
{\em Attribute references} use the standard syntax used for all
attribute references in Python: \verb\obj.name\. Valid attribute
names are all the names that were in the class's name space when the
class object was created. So, if the class definition looked like
this:
\begin{verbatim}
class MyClass:
    i = 12345
    def f(x):
        return 'hello world'
\end{verbatim}
then \verb\MyClass.i\ and \verb\MyClass.f\ are valid attribute
references, returning an integer and a function object, respectively.
Class attributes can also be assigned to, so you can change the
value of \verb\MyClass.i\ by assignment.
Class {\em instantiation} uses function notation. Just pretend that
the class object is a parameterless function that returns a new
instance of the class. For example, (assuming the above class):
\begin{verbatim}
x = MyClass()
\end{verbatim}
creates a new {\em instance} of the class and assigns this object to
the local variable \verb\x\.
\subsection{Instance objects}
Now what can we do with instance objects? The only operations
understood by instance objects are attribute references. There are
two kinds of valid attribute names.
The first I'll call {\em data attributes}. These correspond to
``instance variables'' in Smalltalk, and to ``data members'' in C++.
Data attributes need not be declared; like local variables, they
spring into existence when they are first assigned to. For example,
if \verb\x\ is the instance of \verb\MyClass\ created above, the
following piece of code will print the value 16, without leaving a
trace:
\begin{verbatim}
x.counter = 1
while x.counter < 10:
    x.counter = x.counter * 2
print x.counter
del x.counter
\end{verbatim}
The second kind of attribute reference understood by instance objects
is a {\em method}. A method is a function that ``belongs to'' an
object. (In Python, the term method is not unique to class instances:
other object types can have methods as well, e.g., list objects have
methods called append, insert, remove, sort, and so on. However,
below, we'll use the term method exclusively to mean methods of class
instance objects, unless explicitly stated otherwise.)
Valid method names of an instance object depend on its class. By
definition, all attributes of a class that are (user-defined) function
objects define corresponding methods of its instances. So in our
example, \verb\x.f\ is a valid method reference, since
\verb\MyClass.f\ is a function, but \verb\x.i\ is not, since
\verb\MyClass.i\ is not. But \verb\x.f\ is not the
same thing as \verb\MyClass.f\ --- it is a {\em method object}, not a
function object.
\subsection{Method objects}
Usually, a method is called immediately, e.g.:
\begin{verbatim}
x.f()
\end{verbatim}
In our example, this will return the string \verb\'hello world'\.
However, it is not necessary to call a method right away: \verb\x.f\
is a method object, and can be stored away and called at a later
moment, for example:
\begin{verbatim}
xf = x.f
while 1:
    print xf()
\end{verbatim}
will continue to print \verb\hello world\ until the end of time.
What exactly happens when a method is called? You may have noticed
that \verb\x.f()\ was called without an argument above, even though
the function definition for \verb\f\ specified an argument. What
happened to the argument? Surely Python raises an exception when a
function that requires an argument is called without any --- even if
the argument isn't actually used...
Actually, you may have guessed the answer: the special thing about
methods is that the object is passed as the first argument of the
function. In our example, the call \verb\x.f()\ is exactly equivalent
to \verb\MyClass.f(x)\. In general, calling a method with a list of
{\em n} arguments is equivalent to calling the corresponding function
with an argument list that is created by inserting the method's object
before the first argument.
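The equivalence can be checked directly with the example class. This
sketch repeats the definition of \verb\MyClass\ so that it is
self-contained; the variable names \verb\via_method\ and
\verb\via_function\ are made up.

```python
class MyClass:
    i = 12345
    def f(x):
        return 'hello world'

x = MyClass()

via_method = x.f()            # the instance is supplied implicitly
via_function = MyClass.f(x)   # the same call with the instance passed explicitly
```

Both calls return the same string.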
If you still don't understand how methods work, a look at the
implementation can perhaps clarify matters. When an instance
attribute is referenced that isn't a data attribute, its class is
searched. If the name denotes a valid class attribute that is a
function object, a method object is created by packing (pointers to)
the instance object and the function object just found together in an
abstract object: this is the method object. When the method object is
called with an argument list, it is unpacked again, a new argument
list is constructed from the instance object and the original argument
list, and the function object is called with this new argument list.
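The packing and unpacking described above can be imitated in ordinary
Python code. This is only a sketch of the idea, not the interpreter's
actual implementation: the helpers \verb\pack_method\ and
\verb\call_method\ and the class \verb\Account\ are hypothetical, and
the \verb\*args\ notation and \verb\getattr()\ with a default are
features of a current Python 3 interpreter.

```python
def pack_method(instance, function):
    # The "method object" of this sketch is simply a pair holding
    # the instance and the function found in its class.
    return (instance, function)

def call_method(method, *args):
    # Unpack the pair and call the function with the instance
    # inserted in front of the original argument list.
    instance, function = method
    return function(instance, *args)

class Account:
    def deposit(self, amount):
        # 'balance' springs into existence on first use.
        self.balance = getattr(self, 'balance', 0) + amount
        return self.balance

acct = Account()
m = pack_method(acct, Account.deposit)
first = call_method(m, 10)     # same effect as acct.deposit(10)
second = call_method(m, 5)     # same effect as acct.deposit(5)
```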
\section{Random remarks}
[These should perhaps be placed more carefully...]
Data attributes override method attributes with the same name; to
avoid accidental name conflicts, which may cause hard-to-find bugs in
large programs, it is wise to use some kind of convention that
minimizes the chance of conflicts, e.g., capitalize method names,
prefix data attribute names with a small unique string (perhaps just
an underscore), or use verbs for methods and nouns for data attributes.
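To see how a data attribute hides a method of the same name, consider
this sketch (the class \verb\Box\ and its method \verb\size\ are
invented for the purpose):

```python
class Box:
    def size(self):
        return 3

b = Box()
before = b.size()      # the method is found through the class

b.size = 10            # a data attribute with the same name now hides it
after = b.size         # the integer 10, no longer a method

del b.size             # removing the data attribute uncovers the method
restored = b.size()
```

While the data attribute exists, an attempt to call \verb\b.size()\
fails, because an integer cannot be called.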
Data attributes may be referenced by methods as well as by ordinary
users (``clients'') of an object. In other words, classes are not
usable to implement pure abstract data types. In fact, nothing in
Python makes it possible to enforce data hiding --- it is all based
upon convention. (On the other hand, the Python implementation,
written in C, can completely hide implementation details and control
access to an object if necessary; this can be used by extensions to
Python written in C.)
Clients should use data attributes with care --- clients may mess up
invariants maintained by the methods by stamping on their data
attributes. Note that clients may add data attributes of their own to
an instance object without affecting the validity of the methods, as
long as name conflicts are avoided --- again, a naming convention can
save a lot of headaches here.
There is no shorthand for referencing data attributes (or other
methods!) from within methods. I find that this actually increases
the readability of methods: there is no chance of confusing local
variables and instance variables when glancing through a method.
Conventionally, the first argument of methods is often called
\verb\self\. This is nothing more than a convention: the name
\verb\self\ has absolutely no special meaning to Python. (Note,
however, that by not following the convention your code may be less
readable by other Python programmers, and it is also conceivable that
a {\em class browser} program be written which relies upon such a
convention.)
Any function object that is a class attribute defines a method for
instances of that class. It is not necessary that the function
definition is textually enclosed in the class definition: assigning a
function object to a local variable in the class is also ok. For
example:
\begin{verbatim}
# Function defined outside the class
def f1(self, x, y):
return min(x, x+y)
class C:
f = f1
def g(self):
return 'hello world'
h = g
\end{verbatim}
Now \verb\f\, \verb\g\ and \verb\h\ are all attributes of class
\verb\C\ that refer to function objects, and consequently they are all
methods of instances of \verb\C\ --- \verb\h\ being exactly equivalent
to \verb\g\. Note that this practice usually only serves to confuse
the reader of a program.
Methods may call other methods by using method attributes of the
\verb\self\ argument, e.g.:
\begin{verbatim}
class Bag:
    def empty(self):
        self.data = []
    def add(self, x):
        self.data.append(x)
    def addtwice(self, x):
        self.add(x)
        self.add(x)
\end{verbatim}
The instantiation operation (``calling'' a class object) creates an
empty object. Many classes like to create objects in a known initial
state. There is no special syntax to enforce this, but a convention
works almost as well: add a method named \verb\init\ to the class,
which initializes the instance (by assigning to some important data
attributes) and returns the instance itself. For example, class
\verb\Bag\ above could have the following method:
\begin{verbatim}
    def init(self):
        self.empty()
        return self
\end{verbatim}
The client can then create and initialize an instance in one
statement, as follows:
\begin{verbatim}
x = Bag().init()
\end{verbatim}
Of course, the \verb\init\ method may have arguments for greater
flexibility.
Warning: a common mistake is to forget the \verb\return self\ at the
end of an init method!
Methods may reference global names in the same way as ordinary
functions. The global scope associated with a method is the module
containing the class definition. (The class itself is never used as a
global scope!) While one rarely encounters a good reason for using
global data in a method, there are many legitimate uses of the global
scope: for one thing, functions and modules imported into the global
scope can be used by methods, as well as functions and classes defined
in it. Usually, the class containing the method is itself defined in
this global scope, and in the next section we'll find some good
reasons why a method would want to reference its own class!
\section{Inheritance}
Of course, a language feature would not be worthy of the name ``class''
without supporting inheritance. The syntax for a derived class
definition looks as follows:
\begin{verbatim}
class DerivedClassName(BaseClassName):
    <statement-1>
    .
    .
    .
    <statement-N>
\end{verbatim}
The name \verb\BaseClassName\ must be defined in a scope containing
the derived class definition. Instead of a base class name, an
expression is also allowed. This is useful when the base class is
defined in another module, e.g.,
\begin{verbatim}
class DerivedClassName(modname.BaseClassName):
\end{verbatim}
Execution of a derived class definition proceeds the same as for a
base class. When the class object is constructed, the base class is
remembered. This is used for resolving attribute references: if a
requested attribute is not found in the class, it is searched in the
base class. This rule is applied recursively if the base class itself
is derived from some other class.
There's nothing special about instantiation of derived classes:
\verb\DerivedClassName()\ creates a new instance of the class. Method
references are resolved as follows: the corresponding class attribute
is searched, descending down the chain of base classes if necessary,
and the method reference is valid if this yields a function object.
Derived classes may override methods of their base classes. Because
methods have no special privileges when calling other methods of the
same object, a method of a base class that calls another method
defined in the same base class, may in fact end up calling a method of
a derived class that overrides it. (For C++ programmers: all methods
in Python are ``virtual functions''.)
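A short sketch of this ``virtual'' behaviour, using the hypothetical
classes \verb\Shape\ and \verb\Circle\:

```python
class Shape:
    def name(self):
        return 'shape'
    def describe(self):
        # self.name() is looked up starting from the class of the
        # actual instance, so an overriding method is used if present.
        return 'a ' + self.name()

class Circle(Shape):
    def name(self):            # overrides Shape.name
        return 'circle'

base_text = Shape().describe()
derived_text = Circle().describe()
```

Although \verb\describe\ is defined only in \verb\Shape\, calling it on
a \verb\Circle\ instance ends up in \verb\Circle.name\.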
An overriding method in a derived class may in fact want to extend
rather than simply replace the base class method of the same name.
There is a simple way to call the base class method directly: just
call \verb\BaseClassName.methodname(self, arguments)\. This is
occasionally useful to clients as well. (Note that this only works if
the base class is defined or imported directly in the global scope.)
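Extending a base class method might look like the following sketch,
where \verb\Logger\ and \verb\LoudLogger\ are hypothetical classes
defined in the same global scope:

```python
class Logger:
    def format(self, message):
        return 'log: ' + message

class LoudLogger(Logger):
    def format(self, message):
        # Extend rather than replace: call the base class method
        # directly, passing self explicitly, then add to its result.
        return Logger.format(self, message) + '!'

text = LoudLogger().format('disk full')
```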
\subsection{Multiple inheritance}
Python supports a limited form of multiple inheritance as well. A
class definition with multiple base classes looks as follows:
\begin{verbatim}
class DerivedClassName(Base1, Base2, Base3):
    <statement-1>
    .
    .
    .
    <statement-N>
\end{verbatim}
The only rule necessary to explain the semantics is the resolution
rule used for class attribute references. This is depth-first,
left-to-right. Thus, if an attribute is not found in
\verb\DerivedClassName\, it is searched in \verb\Base1\, then
(recursively) in the base classes of \verb\Base1\, and only if it is
not found there, it is searched in \verb\Base2\, and so on.
(To some people breadth first --- searching \verb\Base2\ and
\verb\Base3\ before the base classes of \verb\Base1\ --- looks more
natural. However, this would require you to know whether a particular
attribute of \verb\Base1\ is actually defined in \verb\Base1\ or in
one of its base classes before you can figure out the consequences of
a name conflict with an attribute of \verb\Base2\. The depth-first
rule makes no distinction between direct and inherited attributes of
\verb\Base1\.)
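The resolution rule can be tried out with a small hierarchy. The class
\verb\Root1\ and the method \verb\who\ are invented for this sketch.
Note the assumption: current Python 3 interpreters use a refined
ordering that differs from plain depth-first search when base classes
share a common ancestor; this example has no such shared user-defined
ancestor, so it should give the same answer under either rule.

```python
class Root1:
    def who(self):
        return 'Root1'

class Base1(Root1):
    pass                      # inherits who() from Root1

class Base2:
    def who(self):
        return 'Base2'

class Derived(Base1, Base2):
    pass

# Search order: Derived, then Base1, then Base1's base class Root1,
# and only after that Base2.
found = Derived().who()
```

The method inherited through \verb\Base1\ wins over the one defined
directly in \verb\Base2\.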
It is clear that indiscriminate use of multiple inheritance is a
maintenance nightmare, given the reliance in Python on conventions to
avoid accidental name conflicts. A well-known problem with multiple
inheritance is a class derived from two classes that happen to have a
common base class. While it is easy enough to figure out what happens
in this case (the instance will have a single copy of ``instance
variables'' or data attributes used by the common base class), it is
not clear that these semantics are in any way useful.
\section{Odds and ends}
Sometimes it is useful to have a data type similar to the Pascal
``record'' or C ``struct'', bundling together a couple of named data
items. An empty class definition will do nicely, e.g.:
\begin{verbatim}
class Employee:
    pass

john = Employee() # Create an empty employee record

# Fill the fields of the record
john.name = 'John Doe'
john.dept = 'computer lab'
john.salary = 1000
\end{verbatim}
A piece of Python code that expects a particular abstract data type
can often be passed a class that emulates the methods of that data
type instead. For instance, if you have a function that formats some
data from a file object, you can define a class with methods
\verb\read()\ and \verb\readline()\ that gets the data from a string
buffer instead, and pass it as an argument. (Unfortunately, this
technique has its limitations: a class can't define operations that
are accessed by special syntax such as sequence subscripting or
arithmetic operators, and assigning such a ``pseudo-file'' to
\verb\sys.stdin\ will not cause the interpreter to read further input
from it.)
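A pseudo-file of this kind could be sketched as follows. The class
\verb\StringReader\ and the function \verb\count_lines\ are
hypothetical, the \verb\init\ method follows the convention described
earlier in this chapter, and the string method \verb\find()\ is that of
a current Python 3 interpreter.

```python
class StringReader:
    # Hypothetical pseudo-file that serves data from a string buffer.
    def init(self, text):
        self.buf = text
        self.pos = 0
        return self
    def read(self):
        # Return everything that has not been read yet.
        data = self.buf[self.pos:]
        self.pos = len(self.buf)
        return data
    def readline(self):
        # Return the next line including its newline, or '' at the end.
        end = self.buf.find('\n', self.pos)
        if end < 0:
            end = len(self.buf)
        else:
            end = end + 1
        line = self.buf[self.pos:end]
        self.pos = end
        return line

def count_lines(f):
    # Works with any object that has a readline() method.
    n = 0
    while f.readline():
        n = n + 1
    return n

line_count = count_lines(StringReader().init('one\ntwo\nthree'))
```

The function \verb\count_lines\ never checks the type of its argument;
it only relies on \verb\readline()\ returning an empty string at the
end of the data.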
Instance method objects have attributes, too: \verb\m.im_self\ is the
instance object to which the method belongs, and \verb\m.im_func\ is the
function object corresponding to the method.
XXX Mention bw compat hacks.
\end{document}
......@@ -57,6 +57,7 @@ a more formal definition of the language.
\pagenumbering{arabic}
\chapter{Whetting Your Appetite}
If you ever wrote a large shell script, you probably know this
......@@ -141,6 +142,7 @@ should read the Library Reference, which gives complete (though terse)
reference material about built-in and standard types, functions and
modules that can save you a lot of time when writing Python programs.
\chapter{Using the Python Interpreter}
\section{Invoking the Interpreter}
......@@ -380,6 +382,7 @@ completion mechanism might use the interpreter's symbol table. A
command to check (or even suggest) matching parentheses, quotes etc.
would also be useful.
\chapter{An Informal Introduction to Python}
In the following examples, input and output are distinguished by the
......@@ -786,6 +789,7 @@ prompt if the last line was not completed.
\end{itemize}
\chapter{More Control Flow Tools}
Besides the {\tt while} statement just introduced, Python knows the
......@@ -1065,6 +1069,7 @@ it is equivalent to {\tt result = result + [b]}, but more efficient.
\end{itemize}
\chapter{Odds and Ends}
This chapter describes some things you've learned about already in
......@@ -1359,6 +1364,7 @@ to their numeric value, so 0 equals 0.0, etc.%
the language.
}
\chapter{Modules}
If you quit from the Python interpreter and enter it again, the
......@@ -1581,6 +1587,7 @@ meError', 'SystemError', 'TypeError', 'abs', 'chr', 'dir', 'divmod', 'eval',
>>>
\end{verbatim}\ecode
\chapter{Output Formatting}
So far we've encountered two ways of writing values: {\em expression
......@@ -1675,6 +1682,7 @@ signs:%
>>>
\end{verbatim}\ecode
\chapter{Errors and Exceptions}
Until now error messages haven't been more than mentioned, but if you
......@@ -1963,4 +1971,594 @@ handler (and even if another exception occurred in the handler).
It is also executed when the {\tt try} statement is left via a
{\tt break} or {\tt return} statement.
\chapter{Classes}
Python's class mechanism adds classes to the language with a minimum
of new syntax and semantics. It is a mixture of the class mechanisms
found in C++ and Modula-3. As is true for modules, classes in Python
do not put an absolute barrier between definition and user, but rather
rely on the politeness of the user not to ``break into the
definition.'' The most important features of classes are retained
with full power, however: the class inheritance mechanism allows
multiple base classes, a derived class can override any methods of its
base class(es), a method can call the method of a base class with the
same name. Objects can contain an arbitrary amount of private data.
In C++ terminology, all class members (including the data members) are
{\em public}, and all member functions are {\em virtual}. There are
no special constructors or desctructors. As in Modula-3, there are no
shorthands for referencing the object's members from its methods: the
method function is declared with an explicit first argument
representing the object, which is provided implicitly by the call. As
in Smalltalk, classes themselves are objects, albeit in the wider
sense of the word: in Python, all data types are objects. This
provides semantics for importing and renaming. But, just like in C++
or Modula-3, built-in types cannot be used as base classes for
extension by the user. Also, like in Modula-3 but unlike in C++, the
built-in operators with special syntax (arithmetic operators,
subscriptong etc.) cannot be redefined for class members.
\section{A word about terminology}
Lacking universally accepted terminology to talk about classes, I'll
make occasional use of Smalltalk and C++ terms. (I'd use Modula-3
terms, since its object-oriented semantics are closer to those of
Python than C++, but I expect that few readers have heard of it...)
I also have to warn you that there's a terminological pitfall for
object-oriented readers: the word ``object'' in Python does not
necessarily mean a class instance. Like C++ and Modula-3, and unlike
Smalltalk, not all types in Python are classes: the basic built-in
types like integers and lists aren't, and even somewhat more exotic
types like files aren't. However, {\em all} Python types share a little
bit of common semantics that is best described by using the word
object.
Objects have individuality, and multiple names (in multiple scopes)
can be bound to the same object. This is known as aliasing in other
languages. This is usually not appreciated on a first glance at
Python, and can be safely ignored when dealing with immutable basic
types (numbers, strings, tuples). However, aliasing has an
(intended!) effect on the semantics of Python code involving mutable
objects such as lists, dictionaries, and most types representing
entities outside the program (files, windows, etc.). This is usually
used to the benefit of the program, since aliases behave like pointers
in some respects. For example, passing an object is cheap since only
a pointer is passed by the implementation; and if a function modifies
an object passed as an argument, the caller will see the change --- this
obviates the need for two different argument passing mechanisms as in
Pascal.
\section{Python scopes and name spaces}
Before introducing classes, I first have to tell you something about
Python's scope rules. Class definitions play some neat tricks with
name spaces, and you need to know how scopes and name spaces work to
fully understand what's going on. Incidentally, knowledge about this
subject is useful for any advanced Python programmer.
Let's begin with some definitions.
A {\em name space} is a mapping from names to objects. Most name
spaces are currently implemented as Python dictionaries, but that's
normally not noticeable in any way (except for performance), and it
may change in the future. Examples of name spaces are: the set of
built-in names (functions such as \verb\abs()\, and built-in exception
names); the global names in a module; and the local names in a
function invocation. In a sense the set of attributes of an object
also forms a name space. The important thing to know about name
spaces is that there is absolutely no relation between names in
different name spaces; for instance, two different modules may both
define a function ``maximize'' without confusion --- users of the
modules must prefix it with the module name.
By the way, I use the word {\em attribute} for any name following a
dot --- for example, in the expression \verb\z.real\, \verb\real\ is
an attribute of the object \verb\z\. Strictly speaking, references to
names in modules are attribute references: in the expression
\verb\modname.funcname\, \verb\modname\ is a module object and
\verb\funcname\ is an attribute of it. In this case there happens to
be a straightforward mapping between the module's attributes and the
global names defined in the module: they share the same name space!%
\footnote{
Except for one thing. Module objects have a secret read-only
attribute called {\tt __dict__} which returns the dictionary
used to implement the module's name space; the name
{\tt __dict__} is an attribute but not a global name.
Obviously, using this violates the abstraction of name space
implementation, and should be restricted to things like
post-mortem debuggers...
}
Attributes may be read-only or writable. In the latter case,
assignment to attributes is possible. Module attributes are writable:
you can write \verb\modname.the_answer = 42\. Writable attributes may
also be deleted with the del statement, e.g.
\verb\del modname.the_answer\.
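
A minimal sketch of writable module attributes follows. It assumes the
standard \verb\math\ module purely as a convenient example; any
imported module should behave the same way.
\begin{verbatim}
import math                  # any module will do; math is just an example

math.the_answer = 42         # adds a name to the module's name space
answer = math.the_answer     # reads it back through the module object
del math.the_answer          # removes the binding again
still_there = hasattr(math, 'the_answer')
\end{verbatim}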
Name spaces are created at different moments and have different
lifetimes. The name space containing the built-in names is created
when the Python interpreter starts up, and is never deleted. The
global name space for a module is created when the module definition
is read in; normally, module name spaces also last until the
interpreter quits. The statements executed by the top-level
invocation of the interpreter, either read from a script file or
interactively, are considered part of a module called \verb\__main__\,
so they have their own global name space. (The built-in names
actually also live in a module; this is called \verb\builtin\,
although it should really have been called \verb\__builtin__\.)
The local name space for a function is created when the function is
called, and deleted when the function returns or raises an exception
that is not handled within the function. (Actually, forgetting would
be a better way to describe what actually happens.) Of course,
recursive invocations each have their own local name space.
A {\em scope} is a textual region of a Python program where a name space
is directly accessible. ``Directly accessible'' here means that an
unqualified reference to a name attempts to find the name in the name
space.
Although scopes are determined statically, they are used dynamically.
At any time during execution, exactly three nested scopes are in use
(i.e., exactly three name spaces are directly accessible): the
innermost scope, which is searched first, contains the local names,
the middle scope, searched next, contains the current module's global
names, and the outermost scope (searched last) is the name space
containing built-in names.
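
The search order can be sketched as follows; the names \verb\limit\,
\verb\clamp\ and \verb\smallest\ are invented for this illustration.
\begin{verbatim}
limit = 10                    # global name of the current module

def clamp(value):
    smallest = 0              # local name, found in the innermost scope
    # 'limit' is not local, so it is found among the module's globals;
    # 'min' and 'max' are found last, among the built-in names.
    return max(smallest, min(value, limit))

low = clamp(-5)
mid = clamp(7)
high = clamp(99)
\end{verbatim}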
Usually, the local scope references the local names of the (textually)
current function. Outside functions, the local scope references
the same name space as the global scope: the module's name space.
Class definitions place yet another name space in the local scope.
It is important to realize that scopes are determined textually: the
global scope of a function defined in a module is that module's name
space, no matter from where or by what alias the function is called.
On the other hand, the actual search for names is done dynamically, at
run time. However, the language definition is evolving towards
static name resolution, at ``compile'' time, so don't rely on dynamic
name resolution! (In fact, local variables are already determined
statically.)
A special quirk of Python is that assignments always go into the
innermost scope. Assignments do not copy data --- they just
bind names to objects. The same is true for deletions: the statement
\verb\del x\ removes the binding of \verb\x\ from the name space referenced by the
local scope. In fact, all operations that introduce new names use the
local scope: in particular, import statements and function definitions
bind the module or function name in the local scope. (The
\verb\global\ statement can be used to indicate that particular
variables live in the global scope.)
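
Here is a short sketch of the difference, using made-up function names.
The first function binds a new local name; the second uses the
\verb\global\ statement to rebind the module's global name instead.
\begin{verbatim}
counter = 0

def bump_local():
    counter = 100        # binds a new local name; the global is untouched
    return counter

def bump_global():
    global counter       # 'counter' now refers to the module's global name
    counter = counter + 1

local_result = bump_local()
after_local = counter    # still 0
bump_global()
after_global = counter   # now 1
\end{verbatim}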
\section{A first look at classes}
Classes introduce a little bit of new syntax, three new object types,
and some new semantics.
\subsection{Class definition syntax}
The simplest form of class definition looks like this:
\begin{verbatim}
class ClassName:
    <statement-1>
    .
    .
    .
    <statement-N>
\end{verbatim}
Class definitions, like function definitions (\verb\def\ statements),
must be executed before they have any effect. (You could conceivably
place a class definition in a branch of an \verb\if\ statement, or
inside a function.)
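
To illustrate, the sketch below places two alternative class
definitions in the branches of an \verb\if\ statement; only the one
that is actually executed takes effect. The names \verb\verbose\ and
\verb\Settings\ are hypothetical, and reading \verb\Settings.level\
uses an attribute reference on the class object, which is explained
further on.
\begin{verbatim}
verbose = 1

if verbose:
    class Settings:
        level = 2
else:
    class Settings:
        level = 0

chosen = Settings.level   # 2, since only the first definition was executed
\end{verbatim}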
In practice, the statements inside a class definition will usually be
function definitions, but other statements are allowed, and sometimes
useful --- we'll come back to this later. The function definitions
inside a class normally have a peculiar form of argument list,
dictated by the calling conventions for methods --- again, this is
explained later.
When a class definition is entered, a new name space is created, and
used as the local scope --- thus, all assignments to local variables
go into this new name space. In particular, function definitions bind
the name of the new function here.
When a class definition is left normally (via the end), a {\em class
object} is created. This is basically a wrapper around the contents
of the name space created by the class definition; we'll learn more
about class objects in the next section. The original local scope
(the one in effect just before the class definition was entered) is
reinstated, and the class object is bound here to the class name given in
the class definition header (\verb\ClassName\ in the example).
\subsection{Class objects}
Class objects support two kinds of operations: attribute references
and instantiation.
{\em Attribute references} use the standard syntax used for all
attribute references in Python: \verb\obj.name\. Valid attribute
names are all the names that were in the class's name space when the
class object was created. So, if the class definition looked like
this:
\begin{verbatim}
class MyClass:
    i = 12345
    def f(x):
        return 'hello world'
\end{verbatim}
then \verb\MyClass.i\ and \verb\MyClass.f\ are valid attribute
references, returning an integer and a function object, respectively.
Class attributes can also be assigned to, so you can change the
value of \verb\MyClass.i\ by assignment.
Class {\em instantiation} uses function notation. Just pretend that
the class object is a parameterless function that returns a new
instance of the class. For example (assuming the above class):
\begin{verbatim}
x = MyClass()
\end{verbatim}
creates a new {\em instance} of the class and assigns this object to
the local variable \verb\x\.
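
Both operations can be tried together in a small sketch; the class is
repeated so that the fragment stands on its own, and the variable names
\verb\before\ and \verb\after\ are just for illustration.
\begin{verbatim}
class MyClass:
    i = 12345
    def f(x):
        return 'hello world'

before = MyClass.i      # attribute reference on the class object
MyClass.i = 54321       # class attributes can be assigned to
after = MyClass.i
x = MyClass()           # instantiation: call the class without arguments
\end{verbatim}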
\subsection{Instance objects}
Now what can we do with instance objects? The only operations
understood by instance objects are attribute references. There are
two kinds of valid attribute names.
The first I'll call {\em data attributes}. These correspond to
``instance variables'' in Smalltalk, and to ``data members'' in C++.
Data attributes need not be declared; like local variables, they
spring into existence when they are first assigned to. For example,
if \verb\x\ is the instance of \verb\MyClass\ created above, the
following piece of code will print the value 16, without leaving a
trace:
\begin{verbatim}
x.counter = 1
while x.counter < 10:
    x.counter = x.counter * 2
print x.counter
del x.counter
\end{verbatim}
The second kind of attribute reference understood by instance objects
is the {\em method}. A method is a function that ``belongs to'' an
object. (In Python, the term method is not unique to class instances:
other object types can have methods as well, e.g., list objects have
methods called append, insert, remove, sort, and so on. However,
below, we'll use the term method exclusively to mean methods of class
instance objects, unless explicitly stated otherwise.)
Valid method names of an instance object depend on its class. By
definition, all attributes of a class that are (user-defined) function
objects define corresponding methods of its instances. So in our
example, \verb\x.f\ is a valid method reference, since
\verb\MyClass.f\ is a function, but \verb\x.i\ is not, since
\verb\MyClass.i\ is not. But \verb\x.f\ is not the
same thing as \verb\MyClass.f\ --- it is a {\em method object}, not a
function object.
\subsection{Method objects}
Usually, a method is called immediately, e.g.:
\begin{verbatim}
x.f()
\end{verbatim}
In our example, this will return the string \verb\'hello world'\.
However, it is not necessary to call a method right away: \verb\x.f\
is a method object, and can be stored away and called at a later
moment, for example:
\begin{verbatim}
xf = x.f
while 1:
    print xf()
\end{verbatim}
will continue to print \verb\hello world\ until the end of time.
What exactly happens when a method is called? You may have noticed
that \verb\x.f()\ was called without an argument above, even though
the function definition for \verb\f\ specified an argument. What
happened to the argument? Surely Python raises an exception when a
function that requires an argument is called without any --- even if
the argument isn't actually used...
Actually, you may have guessed the answer: the special thing about
methods is that the object is passed as the first argument of the
function. In our example, the call \verb\x.f()\ is exactly equivalent
to \verb\MyClass.f(x)\. In general, calling a method with a list of
{\em n} arguments is equivalent to calling the corresponding function
with an argument list that is created by inserting the method's object
before the first argument.
If you still don't understand how methods work, a look at the
implementation can perhaps clarify matters. When an instance
attribute is referenced that isn't a data attribute, its class is
searched. If the name denotes a valid class attribute that is a
function object, a method object is created by packing (pointers to)
the instance object and the function object just found together in an
abstract object: this is the method object. When the method object is
called with an argument list, it is unpacked again, a new argument
list is constructed from the instance object and the original argument
list, and the function object is called with this new argument list.
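
The equivalence, and the packing and unpacking just described, can be
imitated in plain Python. In the sketch below, \verb\PackedMethod\ is a
hypothetical stand-in written only for illustration; real method
objects are created by the interpreter, not by a class like this.
\begin{verbatim}
class MyClass:
    def f(x):
        return 'hello world'

x = MyClass()
direct = x.f()               # the instance is passed implicitly
explicit = MyClass.f(x)      # the same call, written out

# Hypothetical stand-in for a method object: it packs an instance and
# a function together and unpacks them again when called.
class PackedMethod:
    def pack(self, instance, function):
        self.instance = instance
        self.function = function
        return self
    def call(self, *args):
        # Insert the instance before the caller's own arguments.
        return self.function(self.instance, *args)

m = PackedMethod().pack(x, MyClass.f)
packed = m.call()
\end{verbatim}
All three calls return the same string.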
\section{Random remarks}
[These should perhaps be placed more carefully...]
Data attributes override method attributes with the same name; to
avoid accidental name conflicts, which may cause hard-to-find bugs in
large programs, it is wise to use some kind of convention that
minimizes the chance of conflicts, e.g., capitalize method names,
prefix data attribute names with a small unique string (perhaps just
an underscore), or use verbs for methods and nouns for data attributes.
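
A brief sketch of such a conflict, with an invented class
\verb\Account\, shows how a data attribute hides a method of the same
name for one instance:
\begin{verbatim}
class Account:
    def balance(self):
        return 0

a = Account()
as_method = a.balance()    # found in the class: this is the method
a.balance = 250            # a data attribute with the same name...
as_data = a.balance        # ...now hides the method for this instance
\end{verbatim}
After the assignment, \verb\a.balance()\ would fail, because
\verb\a.balance\ is now an integer rather than a method.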
Data attributes may be referenced by methods as well as by ordinary
users (``clients'') of an object. In other words, classes are not
usable to implement pure abstract data types. In fact, nothing in
Python makes it possible to enforce data hiding --- it is all based
upon convention. (On the other hand, the Python implementation,
written in C, can completely hide implementation details and control
access to an object if necessary; this can be used by extensions to
Python written in C.)
Clients should use data attributes with care --- clients may mess up
invariants maintained by the methods by stamping on their data
attributes. Note that clients may add data attributes of their own to
an instance object without affecting the validity of the methods, as
long as name conflicts are avoided --- again, a naming convention can
save a lot of headaches here.
There is no shorthand for referencing data attributes (or other
methods!) from within methods. I find that this actually increases
the readability of methods: there is no chance of confusing local
variables and instance variables when glancing through a method.
Conventionally, the first argument of methods is often called
\verb\self\. This is nothing more than a convention: the name
\verb\self\ has absolutely no special meaning to Python. (Note,
however, that by not following the convention your code may be less
readable by other Python programmers, and it is also conceivable that
a {\em class browser} program be written which relies upon such a
convention.)
Any function object that is a class attribute defines a method for
instances of that class. It is not necessary that the function
definition is textually enclosed in the class definition: assigning a
function object to a local variable in the class is also ok. For
example:
\begin{verbatim}
# Function defined outside the class
def f1(self, x, y):
    return min(x, x+y)

class C:
    f = f1
    def g(self):
        return 'hello world'
    h = g
\end{verbatim}
Now \verb\f\, \verb\g\ and \verb\h\ are all attributes of class
\verb\C\ that refer to function objects, and consequently they are all
methods of instances of \verb\C\ --- \verb\h\ being exactly equivalent
to \verb\g\. Note that this practice usually only serves to confuse
the reader of a program.
Methods may call other methods by using method attributes of the
\verb\self\ argument, e.g.:
\begin{verbatim}
class Bag:
    def empty(self):
        self.data = []
    def add(self, x):
        self.data.append(x)
    def addtwice(self, x):
        self.add(x)
        self.add(x)
\end{verbatim}
The instantiation operation (``calling'' a class object) creates an
empty object. Many classes like to create objects in a known initial
state. There is no special syntax to enforce this, but a convention
works almost as well: add a method named \verb\init\ to the class,
which initializes the instance (by assigning to some important data
attributes) and returns the instance itself. For example, class
\verb\Bag\ above could have the following method:
\begin{verbatim}
    def init(self):
        self.empty()
        return self
\end{verbatim}
The client can then create and initialize an instance in one
statement, as follows:
\begin{verbatim}
x = Bag().init()
\end{verbatim}
Of course, the \verb\init\ method may have arguments for greater
flexibility.
Warning: a common mistake is to forget the \verb\return self\ at the
end of an init method!
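
Putting the pieces together, a complete version of the class with its
\verb\init\ method might look like the sketch below. The items added
to the bag are arbitrary examples.
\begin{verbatim}
class Bag:
    def init(self):
        self.empty()
        return self          # without this, Bag().init() yields None
    def empty(self):
        self.data = []
    def add(self, x):
        self.data.append(x)
    def addtwice(self, x):
        self.add(x)
        self.add(x)

x = Bag().init()             # create and initialize in one statement
x.add('apple')
x.addtwice('pear')
contents = x.data
\end{verbatim}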
Methods may reference global names in the same way as ordinary
functions. The global scope associated with a method is the module
containing the class definition. (The class itself is never used as a
global scope!) While one rarely encounters a good reason for using
global data in a method, there are many legitimate uses of the global
scope: for one thing, functions and modules imported into the global
scope can be used by methods, as well as functions and classes defined
in it. Usually, the class containing the method is itself defined in
this global scope, and in the next section we'll find some good
reasons why a method would want to reference its own class!
\section{Inheritance}
Of course, a language feature would not be worthy of the name ``class''
without supporting inheritance. The syntax for a derived class
definition looks as follows:
\begin{verbatim}
class DerivedClassName(BaseClassName):
    <statement-1>
    .
    .
    .
    <statement-N>
\end{verbatim}
The name \verb\BaseClassName\ must be defined in a scope containing
the derived class definition. Instead of a base class name, an
expression is also allowed. This is useful when the base class is
defined in another module, e.g.,
\begin{verbatim}
class DerivedClassName(modname.BaseClassName):
\end{verbatim}
Execution of a derived class definition proceeds the same as for a
base class. When the class object is constructed, the base class is
remembered. This is used for resolving attribute references: if a
requested attribute is not found in the class, it is searched in the
base class. This rule is applied recursively if the base class itself
is derived from some other class.
There's nothing special about instantiation of derived classes:
\verb\DerivedClassName()\ creates a new instance of the class. Method
references are resolved as follows: the corresponding class attribute
is searched, descending down the chain of base classes if necessary,
and the method reference is valid if this yields a function object.
Derived classes may override methods of their base classes. Because
methods have no special privileges when calling other methods of the
same object, a method of a base class that calls another method
defined in the same base class, may in fact end up calling a method of
a derived class that overrides it. (For C++ programmers: all methods
in Python are ``virtual functions''.)
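
A small sketch of this effect, using invented class names: the base
class method \verb\greet\ calls \verb\self.name()\, and for an instance
of the derived class this ends up in the overriding method.
\begin{verbatim}
class Greeter:
    def greet(self):
        # name() is looked up starting at the instance's own class
        return 'Hello, ' + self.name()
    def name(self):
        return 'stranger'

class FriendlyGreeter(Greeter):
    def name(self):              # overrides Greeter.name
        return 'friend'

plain = Greeter().greet()
friendly = FriendlyGreeter().greet()
\end{verbatim}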
An overriding method in a derived class may in fact want to extend
rather than simply replace the base class method of the same name.
There is a simple way to call the base class method directly: just
call \verb\BaseClassName.methodname(self, arguments)\. This is
occasionally useful to clients as well. (Note that this only works if
the base class is defined or imported directly in the global scope.)
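
As a sketch of extending a base class method (the classes
\verb\Logger\ and \verb\PrefixLogger\ are made up for illustration),
the overriding method below adds a prefix and then delegates to the
base class:
\begin{verbatim}
class Logger:
    def init(self):
        self.lines = []
        return self
    def log(self, text):
        self.lines.append(text)

class PrefixLogger(Logger):
    def log(self, text):
        # Extend rather than replace: call the base class method directly.
        Logger.log(self, 'LOG: ' + text)

p = PrefixLogger().init()
p.log('started')
logged = p.lines
\end{verbatim}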
\subsection{Multiple inheritance}
Python supports a limited form of multiple inheritance as well. A
class definition with multiple base classes looks as follows:
\begin{verbatim}
class DerivedClassName(Base1, Base2, Base3):
    <statement-1>
    .
    .
    .
    <statement-N>
\end{verbatim}
The only rule necessary to explain the semantics is the resolution
rule used for class attribute references. This is depth-first,
left-to-right. Thus, if an attribute is not found in
\verb\DerivedClassName\, it is searched in \verb\Base1\, then
(recursively) in the base classes of \verb\Base1\, and only if it is
not found there, it is searched in \verb\Base2\, and so on.
(To some people breadth first --- searching \verb\Base2\ and
\verb\Base3\ before the base classes of \verb\Base1\ --- looks more
natural. However, this would require you to know whether a particular
attribute of \verb\Base1\ is actually defined in \verb\Base1\ or in
one of its base classes before you can figure out the consequences of
a name conflict with an attribute of \verb\Base2\. The depth-first
rule makes no distinction between direct and inherited attributes of
\verb\Base1\.)
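
The rule can be seen at work in the following sketch, whose class names
are invented. The method \verb\who\ is defined in a base class of
\verb\Base1\ and also in \verb\Base2\. The classes deliberately share
no user-defined common base class, a case that some Python versions
treat differently.
\begin{verbatim}
class Root1:
    def who(self):
        return 'Root1'

class Base1(Root1):
    pass

class Base2:
    def who(self):
        return 'Base2'

class Derived(Base1, Base2):
    pass

# Base1 and its base classes are searched before Base2.
found = Derived().who()
\end{verbatim}
Here \verb\found\ is \verb\'Root1'\, since the search reaches
\verb\Root1\ through \verb\Base1\ before it ever looks at \verb\Base2\.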
It is clear that indiscriminate use of multiple inheritance is a
maintenance nightmare, given the reliance in Python on conventions to
avoid accidental name conflicts. A well-known problem with multiple
inheritance is a class derived from two classes that happen to have a
common base class. While it is easy enough to figure out what happens
in this case (the instance will have a single copy of ``instance
variables'' or data attributes used by the common base class), it is
not clear that these semantics are in any way useful.
\section{Odds and ends}
Sometimes it is useful to have a data type similar to the Pascal
``record'' or C ``struct'', bundling together a couple of named data
items. An empty class definition will do nicely, e.g.:
\begin{verbatim}
class Employee:
    pass

john = Employee() # Create an empty employee record

# Fill the fields of the record
john.name = 'John Doe'
john.dept = 'computer lab'
john.salary = 1000
\end{verbatim}
A piece of Python code that expects a particular abstract data type
can often be passed a class that emulates the methods of that data
type instead. For instance, if you have a function that formats some
data from a file object, you can define a class with methods
\verb\read()\ and \verb\readline()\ that gets the data from a string
buffer instead, and pass it as an argument. (Unfortunately, this
technique has its limitations: a class can't define operations that
are accessed by special syntax such as sequence subscripting or
arithmetic operators, and assigning such a ``pseudo-file'' to
\verb\sys.stdin\ will not cause the interpreter to read further input
from it.)
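
A minimal sketch of such a ``pseudo-file'' follows. The class
\verb\StringReader\ and the function \verb\count_lines\ are
hypothetical names, and the code assumes that string objects offer a
\verb\find\ method.
\begin{verbatim}
class StringReader:
    def init(self, text):
        self.text = text
        self.pos = 0
        return self
    def read(self):
        # Return everything that has not been read yet.
        rest = self.text[self.pos:]
        self.pos = len(self.text)
        return rest
    def readline(self):
        # Return the next line including its newline, or '' at the end.
        end = self.text.find('\n', self.pos)
        if end < 0:
            end = len(self.text)
        else:
            end = end + 1
        line = self.text[self.pos:end]
        self.pos = end
        return line

def count_lines(f):
    # Works with any object that has a readline() method.
    n = 0
    while f.readline():
        n = n + 1
    return n

reader = StringReader().init('one\ntwo\nthree')
number_of_lines = count_lines(reader)
\end{verbatim}
The function never checks what kind of object it was given; it only
relies on \verb\readline()\ being there.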
Instance method objects have attributes, too: \verb\m.im_self\ is the
object of which the method is an instance, and \verb\m.im_func\ is the
function object corresponding to the method.
XXX Mention bw compat hacks.
\end{document}