Python in a Nutshell

Second Edition, August 2006
ISBN 978-0-596-10046-9
734 pages
EUR 32.00

Table of Contents

Chapter 1: Introduction to Python
Preview
Python is a general-purpose programming language. It has been around for quite a while: Guido van Rossum, Python's creator, started developing Python back in 1990. This stable and mature language is very high-level, dynamic, object-oriented, and cross-platform—all characteristics that are very attractive to developers. Python runs on all major hardware platforms and operating systems, so it doesn't constrain your platform choices.
Python offers high productivity for all phases of the software life cycle: analysis, design, prototyping, coding, testing, debugging, tuning, documentation, deployment, and, of course, maintenance. Python's popularity has seen steady, unflagging growth over the years. Today, familiarity with Python is an advantage for every programmer, as Python has infiltrated every niche and has useful roles to play as a part of any software solution.
Python provides a unique mix of elegance, simplicity, practicality, and power. You'll quickly become productive with Python, thanks to its consistency and regularity, its rich standard library, and the many third-party modules that are readily available for it. Python is easy to learn, so it is quite suitable if you are new to programming, yet at the same time, it is powerful enough for the most sophisticated expert.
End of preview. The rest of this section is not available here.
The Python Language
Preview
The Python language, while not minimalist, is rather spare for good pragmatic reasons. Once a language offers one good way to express a design idea, adding other ways has only modest benefits, while the cost in terms of language complexity grows more than linearly with the number of features. A complicated language is harder to learn and master (and implement efficiently and without bugs) than a simpler one. Any complications and quirks in a language hamper productivity in software maintenance, particularly in large projects, where many developers cooperate and often maintain code originally written by others.
Python is simple, but not simplistic. It adheres to the idea that if a language behaves a certain way in some contexts, it should ideally work similarly in all contexts. Python also follows the principle that a language should not have "convenient" shortcuts, special cases, ad hoc exceptions, overly subtle distinctions, or mysterious and tricky under-the-covers optimizations. A good language, like any other designed artifact, must balance such general principles with taste, common sense, and a high degree of practicality.
Python is a general-purpose programming language, so Python's traits are useful in just about any area of software development. There is no area where Python cannot be part of an optimal solution. "Part" is an important word here; while many developers find that Python fills all of their needs, Python does not have to stand alone. Python programs can easily cooperate with a variety of other software components, making it an ideal language for gluing together components written in other languages.
Python is a very-high-level language (VHLL). This means that Python uses a higher level of abstraction, conceptually farther from the underlying machine, than do classic compiled languages such as C, C++, and Fortran, which are traditionally called high-level languages. Python is also simpler, faster to process, and more regular than classic high-level languages. This affords high programmer productivity and makes Python an attractive development tool. Good compilers for classic compiled languages can often generate binary machine code that runs much faster than Python code. However, in most cases, the performance of Python-coded applications proves sufficient. When it doesn't, you can apply the optimization techniques covered in "Optimization" on page 474 to enhance your program's performance while keeping the benefits of high programming productivity.
End of preview. The rest of this section is not available here.
The Python Standard Library and Extension Modules
Preview
There is more to Python programming than just the Python language: the standard Python library and other extension modules are almost as important for effective Python use as the language itself. The Python standard library supplies many well-designed, solid, 100 percent pure Python modules for convenient reuse. It includes modules for such tasks as representing data, string and text processing, interacting with the operating system and filesystem, and web programming. Because these modules are written in Python, they work on all platforms supported by Python.
Extension modules, from the standard library or from elsewhere, let Python code access functionality supplied by the underlying operating system or other software components such as graphical user interfaces (GUIs), databases, and networks. Extensions also afford maximal speed in computationally intensive tasks such as XML parsing and numeric array computations. Extension modules that are not coded in Python, however, do not necessarily enjoy the same cross-platform portability as pure Python code.
You can write special-purpose extension modules in lower-level languages to achieve maximum performance for small, computationally intensive parts that you originally prototyped in Python. You can also use tools such as SWIG to wrap existing C/C++ libraries into Python extension modules, as we'll see in "Extending Python Without Python's C API" on page 645. Finally, you can embed Python in applications coded in other languages, exposing existing application functionality to Python scripts via dedicated Python extension modules.
This book documents many modules, both from the standard library and from other sources, in areas such as client- and server-side network programming, GUIs, numerical array processing, databases, manipulation of text and binary files, and interaction with the operating system.
End of preview. The rest of this section is not available here.
Python Implementations
Preview
Python currently has three production-quality implementations, known as CPython, Jython, and IronPython, and several other experimental implementations, such as PyPy. This book primarily addresses CPython, the most widely used implementation, which I refer to as just Python for simplicity. However, the distinction between a language and its implementations is an important one.
Classic Python (a.k.a. CPython, often just called Python) is the fastest, most up-to-date, most solid and complete implementation of Python. Therefore, it can be considered the "reference implementation" of the language. CPython is a compiler, interpreter, and set of built-in and optional extension modules, all coded in standard C. CPython can be used on any platform where the C compiler complies with the ISO/IEC 9899:1990 standard (i.e., all modern, popular platforms). In Chapter 2, I'll explain how to download and install CPython. All of this book, except Chapter 26 and a few sections explicitly marked otherwise, applies to CPython, since CPython is the most widely used version of Python.
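The distinction between the language and its implementations can be observed from inside a running program. The following is a minimal sketch, not taken from the book: it assumes a Python release that provides platform.python_implementation(), which was added to the standard library after the versions this text covers, and the helper name describe_interpreter is purely illustrative.

```python
import platform
import sys


def describe_interpreter():
    """Return a short label naming the running implementation and language version."""
    # platform.python_implementation() reports a name such as 'CPython',
    # 'Jython', 'IronPython', or 'PyPy', depending on what runs this code.
    implementation = platform.python_implementation()
    # sys.version_info holds the language version the implementation provides.
    major, minor = sys.version_info[:2]
    return "%s (Python %d.%d)" % (implementation, major, minor)


print(describe_interpreter())
```

The same source file should run unchanged on each implementation; only the reported label differs.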
Jython is a Python implementation for any Java Virtual Machine (JVM) compliant with Java 1.2 or better. Such JVMs are available for all popular, modern platforms. With Jython, you can use all Java libraries and frameworks. For optimal use of Jython, you need some familiarity with fundamental Java classes. You do not have to code in Java, but documentation and examples for existing Java classes are couched in Java terms, so you need a nodding acquaintance with Java to read and understand them. You also need to use Java supporting tools for tasks such as manipulating .jar files and signing applets. This book deals with Python, not with Java. For Jython usage, you should complement this book with Jython Essentials, by Noel Rappin and Samuele Pedroni (O'Reilly), possibly
End of preview. The rest of this section is not available here.
Python Development and Versions
Preview
Python is developed, maintained, and released by a team of core developers headed by Guido van Rossum, Python's inventor, architect, and Benevolent Dictator For Life (BDFL). This title means that Guido has the final say on what becomes part of the Python language and standard libraries. Python's intellectual property is vested in the Python Software Foundation (PSF), a nonprofit corporation devoted to promoting Python, with dozens of individual members (nominated for their contributions to Python, and including all of the Python core team) and corporate sponsors. Most PSF members have commit privileges to Python's SVN repositories (http://svn.python.org/projects/), and most Python SVN committers are members of the PSF.
Proposed changes to Python are detailed in public documents called Python Enhancement Proposals (PEPs), debated (and sometimes advisorily voted on) by Python developers and the wider Python community, and finally approved or rejected by Guido, who takes debates and votes into account but is not bound by them. Many hundreds of people actively contribute to Python development through PEPs, discussion, bug reports, and proposed patches to Python sources, libraries, and documentation.
The Python core team releases minor versions of Python (2.x, for growing values of x), currently at a pace of about once every year or two. Python 2.2 was released in December 2001, 2.3 in July 2003, and 2.4 in November 2004. Python 2.5 is scheduled to be released in the summer of 2006 (at the time of this writing, the first alpha release of 2.5 has just appeared). Each minor release adds features that make Python more powerful and simpler to use, but also takes care to maintain backward compatibility. One day there will be a Python 3.0 release, which will be allowed to break backward compatibility to some extent in order to remove some redundant "legacy" features and simplify the language even further. However, that release is still years in the future, and no specific schedules for it currently exist; the current state of Guido's ideas about Python 3.0 can be studied at
End of preview. The rest of this section is not available here.
Python Resources
Preview
The richest of all Python resources is the Internet. The best starting point is Python's site, http://www.python.org, which is full of interesting links to explore. http://www.jython.org is a must if you have any interest in Jython. For IronPython, at the time of writing the most relevant site is http://workspaces.gotdotnet.com/ironpython, but the IronPython team's near-future plans include reviving the site http://ironpython.com; by the time you read this, http://ironpython.com should be back in its role as the primary IronPython web site.
Python, Jython, and IronPython come with good documentation. The manuals are available in many formats, suitable for viewing, searching, and printing. You can browse the manuals on the Web at http://www.python.org/doc/current/. You can find links to the various formats you can download at http://www.python.org/doc/current/download.html, and http://www.python.org/doc/ has links to a large variety of documents. For Jython, http://www.jython.org/docs/ has links to Jython-specific documents as well as general Python ones. The Python FAQ (Frequently Asked Questions) document is at http://www.python.org/doc/FAQ.html, and the Jython-specific FAQ document is at http://www.jython.org/cgi-bin/faqw.py?req=index.
Most Python documentation (including this book) assumes some software development knowledge. However, Python is quite suitable for first-time programmers, so there are exceptions to this rule. A few good introductory online texts for nonprogrammers are:
End of preview. The rest of this section is not available here.
Chapter 2: Installation
Preview
You can install Python, in classic (CPython), JVM (Jython), and .NET (IronPython) versions, on most platforms. With a suitable development system (C for CPython, Java for Jython, .NET for IronPython), you can install Python from its source code distribution. On popular platforms, you also have the alternative of installing from pre-built binary distributions. If your platform comes with a pre-installed version of Python, you may still want to install another richer or better updated one: if you do, I recommend you do not remove nor overwrite your platform's original version—rather, install the other version "side by side" with the first one. In this way, you can be sure you are not going to disturb any other software that is installed as part of your platform: such software might well rely on the exact Python version that came with the platform itself.
Installing CPython from a binary distribution is faster, saves you substantial work on some platforms, and is the only possibility if you have no suitable C compiler. Installing from sources gives you more control and flexibility and is the only possibility if you can't find a suitable pre-built binary distribution for your platform. Even if you install from binaries, I recommend you also download the source distribution because it includes examples and demos that may be missing from pre-built binary packages.
End of preview. The rest of this section is not available here.
Installing Python from Source Code
Preview
To install CPython from source code, you need a platform with an ISO-compliant C compiler and ancillary tools such as make. On Windows, the normal way to build Python is with Microsoft Visual Studio (version 7.1, a.k.a. VS2003, for Python 2.4 and 2.5).
To download Python source code, visit http://www.python.org and follow the link labeled Download. The latest version at the time of this writing is:
The .tgz file extension is equivalent to .tar.gz (i.e., a tar archive of files, compressed by the powerful and popular gzip compressor). You can also get a version with an extension of .tar.bz2 instead of .tgz, compressed with the even more powerful bzip2 compressor, if you're able to deal with Bzip-2 compression (most popular utilities can nowadays).
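Once Python is available, its own tarfile module can read both archive flavors. The sketch below is illustrative only and assumes a recent Python 3 interpreter with gzip and bzip2 support compiled in; the archive contents, file names, and helper names (list_archive, make_demo_archive) are hypothetical and stand in for a real source distribution.

```python
import os
import tarfile
import tempfile


def list_archive(path):
    """Return the member names of a tar archive, whatever its compression."""
    # Mode "r:*" asks tarfile to detect gzip, bzip2, or no compression itself.
    with tarfile.open(path, "r:*") as archive:
        return archive.getnames()


def make_demo_archive(directory, suffix, mode):
    """Create a one-file archive (hypothetical demo content) and return its path."""
    source = os.path.join(directory, "README.txt")
    with open(source, "w") as handle:
        handle.write("demo\n")
    target = os.path.join(directory, "demo" + suffix)
    with tarfile.open(target, mode) as archive:
        archive.add(source, arcname="demo/README.txt")
    return target


with tempfile.TemporaryDirectory() as workdir:
    for suffix, mode in ((".tgz", "w:gz"), (".tar.bz2", "w:bz2")):
        print(suffix, list_archive(make_demo_archive(workdir, suffix, mode)))
```

Because the reader uses automatic detection, the same call handles a .tgz, a .tar.gz, and a .tar.bz2 file.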
To download sources for Python 2.5, see http://www.python.org/download/releases/2.5/. At the same URL, you will also find Python 2.5 documentation and binary releases. At the time of this writing, the first alpha release of 2.5 had just appeared, but by the time you read this book the final release of 2.5 is likely to be available.
On Windows, installing Python from source code can be a chore unless you are already familiar with Microsoft Visual Studio and also used to working at the Windows command line (i.e., in the text-oriented windows known as MS-DOS Prompt or Command Prompt, depending on your version of Windows).
If the following instructions give you trouble, I suggest you skip ahead to "Installing Python from Binaries" on page 18. It may be a good idea to do an installation from binaries anyway, even if you also install from source code. This way, if you notice anything strange while using the version you installed from source code, you can double-check with the installation from binaries. If the strangeness goes away, it must be due to some quirk in your installation from source code, and then you know you must double-check the latter.
End of preview. The rest of this section is not available here.
Installing Python from Binaries
Preview
If your platform is popular and current, you may find pre-built and packaged binary versions of Python ready for installation. Binary packages are typically self-installing, either directly as executable programs, or via appropriate system tools, such as the RedHat Package Manager (RPM) on Linux and the Microsoft Installer (MSI) on Windows. Once you have downloaded a package, install it by running the program and interactively choosing installation parameters, such as the directory where Python is to be installed.
To download Python binaries, visit http://www.python.org and follow the link labeled Download. At the time of this writing, the binary installers directly available from the main Python site are a Windows Installer (MSI) package:
and a Mac OS X Disk Image (.dmg) package suitable for Mac OS X 10.3.9 and later on either a PowerPC or Intel processor ("Universal" format):
Many third parties supply free binary Python installers for other platforms. For Linux distributions, see http://rpmfind.net if your distribution is RPM-based (RedHat, Fedora, Mandriva, SUSE, etc.) or http://www.debian.org for Debian and Ubuntu. The site http://www.python.org/download/ provides links to binary distributions for OS/2, Amiga, RISC OS, QNX, VxWorks, IBM AS/400, Sony PlayStation 2, Sharp Zaurus, and Windows CE (also known as "Pocket PC"). Older Python versions, starting from 1.5.2, are also usable and functional, though not as powerful and polished as the current Python 2.4.3. The download page provides links to 1.5.2 and other installers for older or less popular platforms (MS-DOS, Windows 3.1, Psion, BeOS, etc.).
To get Python for Nokia Series 60 cellphones, see http://www.forum.nokia.com/python.
ActivePython (http://www.activestate.com/Products/ActivePython) is a binary package of Python 2.4, with several third-party extensions included, available for AIX, HP-UX, Linux (x86 processors only), Mac OS X, Solaris (SPARC, x64, and x86 processors), and Windows (all versions from Windows 95 to Windows XP and Windows Server 2003).
End of preview. The rest of this section is not available here.
Installing Jython
Preview
To install Jython, you need a Java Virtual Machine (JVM) that complies with Java 1.1 or higher. See http://www.jython.org/platform.html for advice on JVMs for your platform.
To download Jython, visit http://www.jython.org and follow the link labeled Download. The latest version at the time of this writing (supporting some Python 2.3 features, as well as all of Python 2.2) is:
In the following section, for clarity, I assume you have created a new directory named C:\Jy and downloaded jython-22.class there. Of course, you can choose to name and place the directory as it best suits you. On Unix-like platforms, in particular, the directory name will probably be something like ~/Jy.
The Jython installer .class file is a self-installing program. Open an MS-DOS Prompt window (or a shell prompt on a Unix-like platform), change directory to C:\Jy, and run your Java interpreter on the Jython installer. Make sure to include directory C:\Jy in the Java CLASSPATH. With most releases of Sun's Java Development Kit (JDK), for example, you can run:
C:\Jy>java -cp . jython-22
This runs a GUI installer that lets you choose destination directory and options. If you want to avoid the GUI, you can use the -o switch on the command line. The switch lets you specify the installation directory and options on the command line. For example:
C:\Jy>java -cp . jython-22 -o C:\Jython-2.2 demo lib source
installs Jython, with all optional components (demos, libraries, and source code), in directory C:\Jython-2.2. The Jython installation builds two small, useful command files. One, run as jython (named jython.bat on Windows), runs the interpreter. The other, run as jythonc, compiles Python source into JVM bytecode. You can add the Jython installation directory to your PATH or copy these command files into any directory on your
End of preview. The rest of this section is not available here.
Installing IronPython
Preview
To install IronPython, you need to have a current Common Language Runtime (CLR) implementation installed on your machine. Both the latest version of Mono (see http://www.mono-project.com/Main_Page), and Microsoft .NET Framework 2.0, work fine with IronPython. To download IronPython, visit http://workspaces.gotdotnet.com/ironpython (or http://ironpython.com, which will eventually become IronPython's main site, but is still out of date at the time of this writing) and follow download instructions on that page. The latest version at the time of this writing is 1.0. The same site also provides up-to-date installation instructions. I cannot provide such instructions in this book because they are still in flux at the time of this writing.
End of preview. The rest of this section is not available here.
Chapter 3: The Python Interpreter
Preview
To develop software systems in Python, you write text files that contain Python source code and documentation. You can use any text editor, including those in Integrated Development Environments (IDEs). You then process the source files with the Python compiler and interpreter. You can do this directly, implicitly inside an IDE, or via another program that embeds Python. The Python interpreter also lets you execute Python code interactively, as do IDEs.
End of preview. The rest of this section is not available here.
The python Program
Preview
The Python interpreter program is run as python (it's named python.exe on Windows). python includes both the interpreter itself and the Python compiler, which is implicitly invoked, as needed, on imported modules. Depending on your system, the program may have to be in a directory listed in your PATH environment variable. Alternatively, as with any other program, you can give a complete pathname to it at a command (shell) prompt, or in the shell script (or .BAT file, shortcut target, etc.) that runs it. On Windows, you can also use Start → Programs → Python 2.4 → Python (command line).
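To see which interpreter is actually running and how PATH is searched, a short script can help. This is a sketch under assumptions rather than part of the book: it relies on shutil.which, which exists only in recent Python 3 releases, the command name "python" is just an example, and the helper path_directories is a hypothetical name.

```python
import os
import shutil
import sys


def path_directories(environ=None):
    """Return the directories listed in PATH, in search order."""
    if environ is None:
        environ = os.environ
    # Entries are separated by os.pathsep (":" on Unix-like, ";" on Windows).
    return [d for d in environ.get("PATH", "").split(os.pathsep) if d]


# Full path of the interpreter that is executing this code.
print("running:", sys.executable)

# shutil.which searches the PATH directories much as a shell would;
# it returns None when no matching program is found.
print("first 'python' on PATH:", shutil.which("python"))

for directory in path_directories():
    print("  ", directory)
```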
Besides PATH, other environment variables affect the python program. Some environment variables have the same effects as options passed to python on the command line, as documented in the next section. A few environment variables provide settings not available via command-line options:
PYTHONHOME
The Python installation directory. A lib subdirectory, containing the standard Python library modules, should exist under this directory. On Unix-like systems, the standard library modules should be in subdirectory lib/python2.3 for Python 2.3, lib/python2.4 for Python 2.4, and so on.
PYTHONPATH
A list of directories separated by colons on Unix-like systems and by semicolons on Windows. Modules are imported from these directories. This list extends the initial value for Python's sys.path variable. Modules, importing, and the sys.path variable are covered in Chapter 7.
PYTHONSTARTUP
The name of a Python source file that is automatically executed each time an interactive interpreter session starts. No such file is run if this variable is not set or if it is set to the path of a file that is not found. The PYTHONSTARTUP file is not used when you run a Python script; it is used only when you start an interactive session.
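The effect of these variables can be inspected from within Python. The following sketch is illustrative and assumes nothing beyond the standard os and sys modules; the helper name pythonpath_entries is hypothetical.

```python
import os
import sys


def pythonpath_entries(environ=None):
    """Split PYTHONPATH into its directories; empty list if unset or empty."""
    if environ is None:
        environ = os.environ
    raw = environ.get("PYTHONPATH", "")
    # os.pathsep is ":" on Unix-like systems and ";" on Windows.
    return [entry for entry in raw.split(os.pathsep) if entry]


print("PYTHONPATH entries:", pythonpath_entries())
print("PYTHONSTARTUP:", os.environ.get("PYTHONSTARTUP", "(not set)"))
# Directories from PYTHONPATH, when set, appear near the front of sys.path.
print("first sys.path entries:", sys.path[:3])
```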
End of preview. The rest of this section is not available here.
Python Development Environments
Preview
The Python interpreter's built-in interactive mode is the simplest development environment for Python. It is a bit primitive, but it is lightweight, has a small footprint, and starts fast. Together with an appropriate text editor (as discussed in "Free Text Editors with Python Support" on page 27), and line-editing and history facilities, the interactive interpreter (or, alternatively, IPython) offers a usable and popular development environment. However, there are a number of other development environments that you can also use.
Python's Integrated DeveLopment Environment (IDLE) comes with the standard Python distribution. IDLE is a cross-platform, 100 percent pure Python application based on Tkinter (see Chapter 17). IDLE offers a Python shell similar to interactive Python interpreter sessions but richer in functionality. It also includes a text editor optimized to edit Python source code, an integrated interactive debugger, and several specialized browsers/viewers.
IDLE is mature, stable, easy to use, and fairly rich in functionality. Promising new Python IDEs that share IDLE's free and cross-platform nature are emerging. Red Hat's Source Navigator (http://sources.redhat.com/sourcenav/) supports many languages. It runs on Linux, Solaris, HP-UX, and Windows. Boa Constructor (http://boa-constructor.sf.net/) is Python-only and still beta-level, but well worth trying out. Boa Constructor includes a GUI builder for the wxWindows cross-platform GUI toolkit.
eric3 (http://www.die-offenbachs.de/detlev/eric3.html) is a full-featured IDE for Python and Ruby, based on the PyQt 3.1 cross-platform GUI toolkit.
The popular cross-platform, cross-language modular IDE Eclipse has plug-ins that support CPython and Jython; see http://pydev.sourceforge.net/ for more information.
Another new but very popular cross-platform Python editor and IDE is SPE, "Stani's Python Editor" (
End of preview. The rest of this section is not available here.
Running Python Programs
Preview
Whatever tools you use to produce your Python application, you can see your application as a set of Python source files, which are normal text files. A script is a file that you can run directly. A module is a file that you can import (as covered in Chapter 7) to provide functionality to other files or to interactive sessions. A Python file can be both a module and a script, exposing functionality when imported, but is also suitable for being run directly. A useful and widespread convention is that Python files that are primarily intended to be imported as modules, when run directly, should execute some simple self-test operations, as covered in "Testing" on page 452.
The Python interpreter automatically compiles Python source files as needed. Python source files normally have extension .py. Python saves the compiled bytecode file for each module in the same directory as the module's source, with the same basename and extension .pyc (or .pyo if Python is run with option -O). Python does not save the compiled bytecode form of a script when you run the script directly; rather, Python recompiles the script each time you run it. Python saves bytecode files only for modules you import. It automatically rebuilds each module's bytecode file whenever necessary—for example, when you edit the module's source. Eventually, for deployment, you may package Python modules using tools covered in Chapter 27.
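Compilation to bytecode can also be requested explicitly with the standard py_compile module. The sketch below assumes a recent Python 3 interpreter, where imported modules are cached under a __pycache__ subdirectory by default; passing cfile explicitly reproduces the side-by-side layout the text describes. The module name greeting.py and the helper compile_next_to_source are hypothetical.

```python
import os
import py_compile
import tempfile


def compile_next_to_source(source_path):
    """Compile source_path and write the bytecode beside it as <name>.pyc."""
    bytecode_path = os.path.splitext(source_path)[0] + ".pyc"
    # cfile names the output explicitly; doraise=True turns compile errors
    # into py_compile.PyCompileError instead of a message on stderr.
    py_compile.compile(source_path, cfile=bytecode_path, doraise=True)
    return bytecode_path


with tempfile.TemporaryDirectory() as workdir:
    module_path = os.path.join(workdir, "greeting.py")  # hypothetical module
    with open(module_path, "w") as handle:
        handle.write("MESSAGE = 'hello'\n")
    compiled = compile_next_to_source(module_path)
    print(os.path.basename(compiled), os.path.exists(compiled))
```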
You can run Python code interactively with the Python interpreter or an IDE. Normally, however, you initiate execution by running a top-level script. To run a script, give its path as an argument to python, as covered earlier in "The python Program" on page 22. Depending on your operating system, you can invoke python directly from a shell script or in a command file. On Unix-like systems, you can make a Python script directly executable by setting the file's permission bits
End of preview. The rest of this section is not available here.
The jython Interpreter
Preview
The jython interpreter built during installation (see "Installing Jython" on page 20) is run similarly to the python program:
[path]jython {options} [ -j jar | -c command | file | - ] {arguments}
-j jar tells jython that the main script to run is __run__.py in the .jar file. Options -i, -S, and -v are the same as for python. --help is like python's -h, and --version is like python's -V. Instead of environment variables, jython uses a text file named registry in the installation directory to record properties with structured names. Property python.path, for example, is the Jython equivalent of Python's environment variable PYTHONPATH. You can also set properties with jython command-line options in the form -Dname=value.
End of preview. The rest of this section is not available here.
The IronPython Interpreter
Preview
IronPython may be run similarly to the python program:
[path]IronPythonConsole {options} [-c command | file | - ] {arguments}
Unfortunately, details are still in flux at the time of this writing, so I cannot provide them in this book. See http://ironpython.com for up-to-date information.
End of preview. The rest of this section is not available here.
Chapter 4: The Python Language
Preview
This chapter is a quick guide to the Python language. To learn Python from scratch, I suggest you start with Learning Python, by Mark Lutz and David Ascher (O'Reilly). If you already know other programming languages and just want to learn the specific differences of Python, this chapter is for you. However, I'm not trying to teach Python here, so we're going to cover a lot of ground at a pretty fast pace. I focus on teaching the rules, and only secondarily on pointing out best practices and recommended style; for a standard Python style guide, see http://python.org/doc/peps/pep-0008/.
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Lexical Structure
Inhaltsvorschau
The lexical structure of a programming language is the set of basic rules that govern how you write programs in that language. It is the lowest-level syntax of the language and specifies such things as what variable names look like and which characters denote comments. Each Python source file, like any other text file, is a sequence of characters. You can also usefully consider it as a sequence of lines, tokens, or statements. These different lexical views complement and reinforce each other. Python is very particular about program layout, especially with regard to lines and indentation, so you'll want to pay attention to this information if you are coming to Python from another language.
A Python program is composed of a sequence of logical lines, each made up of one or more physical lines. Each physical line may end with a comment. A hash sign (#) that is not inside a string literal begins a comment. All characters after the # and up to the physical line end are part of the comment, and the Python interpreter ignores them. A line containing only whitespace, possibly with a comment, is known as a blank line, and Python totally ignores it. In an interactive interpreter session, you must enter an empty physical line (without any whitespace or comment) to terminate a multiline statement.
In Python, the end of a physical line marks the end of most statements. Unlike in other languages, you don't normally terminate Python statements with a delimiter, such as a semicolon (;). When a statement is too long to fit on a single physical line, you can join two adjacent physical lines into a logical line by ensuring that the first physical line has no comment and ends with a backslash (\). However, Python also automatically joins adjacent physical lines into one logical line if an open parenthesis ((), bracket ([), or brace ({) has not yet been closed, and taking advantage of this mechanism generally produces more readable code than explicitly inserting backslashes at physical line ends. Triple-quoted string literals can also span physical lines. Physical lines after the first one in a logical line are known as
Data Types
The operation of a Python program hinges on the data it handles. All data values in Python are objects, and each object, or value, has a type. An object's type determines which operations the object supports, or, in other words, which operations you can perform on the data value. The type also determines the object's attributes and items (if any) and whether the object can be altered. An object that can be altered is known as a mutable object, while one that cannot be altered is an immutable object. I cover object attributes and items in detail in "Object attributes and items" on page 46.
The built-in type(obj) accepts any object as its argument and returns the type object that is the type of obj. The built-in function isinstance(obj, type) returns True if object obj has type type (or any subclass thereof); otherwise, it returns False.
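As a small illustration of these two built-ins, here is a minimal sketch. The names n, s, MyList, and items are made up for the example, and the code is written so that it also runs on current Python versions.

```python
# Illustrative values only; these names are not from the book.
n = 42
s = 'spam'

# type(obj) returns the type object of obj.
assert type(n) is int
assert type(s) is str

# isinstance(obj, type) is True for the type itself and for any subclass.
class MyList(list):
    pass

items = MyList([1, 2, 3])
assert isinstance(items, MyList)
assert isinstance(items, list)      # MyList is a subclass of list
assert type(items) is not list      # the exact type is MyList, not list
assert not isinstance(n, str)
```

The last three checks show why isinstance is usually the more flexible test: it accepts instances of subclasses, while comparing type(obj) to a type object does not.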
Python has built-in types for fundamental data types such as numbers, strings, tuples, lists, and dictionaries, as covered in the following sections. You can also create user-defined types, known as classes, as discussed in "Classes and Instances" on page 82.
The built-in number objects in Python support integers (plain and long), floating-point numbers, and complex numbers. In Python 2.4, the standard library also offers decimal floating-point numbers, covered in "The decimal Module" on page 372. All numbers in Python are immutable objects, meaning that when you perform any operation on a number object, you always produce a new number object. Operations on numbers, also known as arithmetic operations, are covered in "Numeric Operations" on page 52.
Note that numeric literals do not include a sign: a leading + or -, if present, is a separate operator, as discussed in "Arithmetic Operations" on page 52.

Section 4.2.1.1: Integer numbers

Integer literals can be decimal, octal, or hexadecimal. A decimal literal is represented by a sequence of digits in which the first digit is nonzero. To denote an octal literal, use
Variables and Other References
A Python program accesses data values through references. A reference is a name that refers to the location in memory of a value (object). References take the form of variables, attributes, and items. In Python, a variable or other reference has no intrinsic type. The object to which a reference is bound at a given time always has a type, but a given reference may be bound to objects of various types during the program's execution.
In Python there are no declarations. The existence of a variable begins with a statement that binds the variable, or, in other words, sets a name to hold a reference to some object. You can also unbind a variable, resetting the name so it no longer holds a reference. Assignment statements are the most common way to bind variables and other references. The del statement unbinds references.
Binding a reference that was already bound is also known as rebinding it. Whenever I mention binding in this book, I implicitly include rebinding except where I explicitly exclude it. Rebinding or unbinding a reference has no effect on the object to which the reference was bound, except that an object disappears when nothing refers to it. The automatic cleanup of objects bereft of references is known as garbage collection.
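The binding, rebinding, and unbinding described above can be sketched as follows. The names x, y, and x_is_bound are chosen only for illustration.

```python
x = 42          # binds the name x to an integer object
x = 'hello'     # rebinds x to a string object; the name itself has no type
y = x           # y now refers to the same string object as x
del x           # unbinds x; the string survives because y still refers to it

# Referencing an unbound name raises NameError.
try:
    x
except NameError:
    x_is_bound = False
else:
    x_is_bound = True
```

After this runs, y is still 'hello' and x_is_bound is False: unbinding x had no effect on the object that y refers to.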
You can name a variable with any identifier except the 30 that are reserved as Python's keywords (see "Keywords" on page 35). A variable can be global or local. A global variable is an attribute of a module object (Chapter 7 covers modules). A local variable lives in a function's local namespace (see "Namespaces" on page 76).

Section 4.3.1.1: Object attributes and items

The main distinction between the attributes and items of an object is in the syntax you use to access them. An attribute of an object is denoted by a reference to the object, followed by a period (
Expressions and Operators
An expression is a phrase of code that Python evaluates to produce a value. The simplest expressions are literals and identifiers. You build other expressions by joining subexpressions with the operators and/or delimiters in Table 4-2. This table lists operators in decreasing order of precedence, higher precedence before lower. Operators listed together have the same precedence. The third column lists the associativity of the operator: L (left-to-right), R (right-to-left), or NA (nonassociative).
Table 4-2: Operator precedence in expressions
Operator                      Description                                               Associativity
`expr,...`                    String conversion                                         NA
{key:expr,...}                Dictionary creation                                       NA
[expr,...]                    List creation                                             NA
(expr,...)                    Tuple creation or just parentheses                        NA
f(expr,...)                   Function call                                             L
x[index:index]                Slicing                                                   L
x[index]                      Indexing                                                  L
x.attr                        Attribute reference                                       L
x ** y                        Exponentiation (x to yth power)                           R
~x                            Bitwise NOT                                               NA
+x, -x                        Unary plus and minus                                      NA
x * y, x / y, x // y, x % y   Multiplication, division, truncating division, remainder  L
x + y, x - y                  Addition, subtraction                                     L
x << y, x >> y                Left-shift, right-shift                                   L
x & y                         Bitwise AND                                               L
x ^ y                         Bitwise XOR                                               L
x | y                         Bitwise OR                                                L
x < y, x <= y, x > y, x >= y, x <> y, x != y, x == y  Comparisons (less than, less than or equal, greater than, greater than or equal, inequality, equality)
Numeric Operations
Python supplies the usual numeric operations, as we've just seen in Table 4-2. Numbers are immutable objects: when you perform numeric operations on number objects, you always produce a new number object and never modify existing ones. You can access the parts of a complex object z as read-only attributes z.real and z.imag. Trying to rebind these attributes on a complex object raises an exception.
A number's optional + or - sign, and the + that joins a floating-point literal to an imaginary one to make a complex number, are not part of the literals' syntax. They are ordinary operators, subject to normal operator precedence rules (see Table 4-2). For example, -2**2 evaluates to -4: exponentiation has higher precedence than unary minus, so the whole expression parses as -(2**2), not as (-2)**2.
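The precedence rules just described can be checked directly. This is a minimal sketch with a hypothetical variable z. It catches both AttributeError and TypeError, because the exception type raised when rebinding a read-only attribute differs between Python versions.

```python
# Unary minus binds less tightly than exponentiation.
assert -2 ** 2 == -4        # parsed as -(2 ** 2)
assert (-2) ** 2 == 4       # parentheses force the other grouping

# Exponentiation is right-associative.
assert 2 ** 3 ** 2 == 512   # parsed as 2 ** (3 ** 2)

# The + joining a float literal and an imaginary literal is an ordinary operator.
z = 1.5 + 2j
assert z.real == 1.5 and z.imag == 2.0

# z.real and z.imag are read-only.
try:
    z.real = 3.0
except (AttributeError, TypeError):
    rebinding_failed = True
else:
    rebinding_failed = False
```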
You can perform arithmetic operations and comparisons between any two numbers of Python built-in types. If the operands' types differ, coercion applies: Python converts the operand with the "smaller" type to the "larger" type. The types, in order from smallest to largest, are integers, long integers, floating-point numbers, and complex numbers.
You can request an explicit conversion by passing a noncomplex numeric argument to any of the built-in number types: int, long, float, and complex. int and long drop their argument's fractional part, if any (e.g., int(9.8) is 9). You can also call complex with two numeric arguments, giving real and imaginary parts. You cannot convert a complex to another numeric type in this way, because there is no single unambiguous way to convert a complex number into, e.g., a float.
Each built-in numeric type can also take a string argument with the syntax of an appropriate numeric literal, with small extensions: the argument string may have leading and/or trailing whitespace, may start with a sign, and, for complex numbers, may sum or subtract a real part and an imaginary one.
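A few of these coercions and conversions, sketched so that they run on current Python versions. The long type is omitted because it exists only in Python 2.x, and the name complex_to_float_failed is invented for the example.

```python
# Coercion: the "smaller" operand is converted to the "larger" type.
assert type(1 + 2.0) is float
assert type(1.0 + 2j) is complex

# Explicit conversions; int drops the fractional part.
assert int(9.8) == 9
assert int(-9.8) == -9
assert float(3) == 3.0
assert complex(1, 2) == 1 + 2j      # real and imaginary parts

# String arguments may carry surrounding whitespace and a sign.
assert int('  42 ') == 42
assert float('-2.5') == -2.5
assert complex('1+2j') == 1 + 2j

# A complex number cannot be converted to another numeric type this way.
try:
    float(1 + 2j)
except TypeError:
    complex_to_float_failed = True
else:
    complex_to_float_failed = False
```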
Sequence Operations
Python supports a variety of operations applicable to all sequences, including strings, lists, and tuples. Some sequence operations apply to all containers (including, for example, sets and dictionaries, which are not sequences), and some apply to all iterables (meaning "any object on which you can loop," as covered in "Iterables" on page 40; all containers, be they sequences or otherwise, are iterable, and so are many objects that are not containers, such as files, covered in "File Objects" on page 216, and generators, covered in "Generators" on page 78). In the following, I use the terms sequence, container, and iterable quite precisely and specifically to indicate exactly which operations apply to each category.
Sequences are containers with items that are accessible by indexing or slicing. The built-in len function takes any container as an argument and returns the number of items in the container. The built-in min and max functions take one argument, a nonempty iterable whose items are comparable, and return the smallest and largest items, respectively. You can also call min and max with multiple arguments, in which case they return the smallest and largest arguments, respectively. The built-in sum function takes one argument, an iterable whose items are numbers, and returns the sum of the numbers.
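For instance, the built-ins just mentioned behave as follows. The list words is only an illustrative value.

```python
words = ['pear', 'apple', 'fig']

assert len(words) == 3                 # number of items in a container
assert len({'a': 1, 'b': 2}) == 2      # works for non-sequence containers too

assert min(words) == 'apple'           # one iterable argument
assert max(words) == 'pear'
assert min(3, 1, 2) == 1               # several arguments
assert max(3, 1, 2) == 3

assert sum([1, 2, 3.5]) == 6.5         # iterable of numbers
```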

Section 4.6.1.1: Sequence conversions

There is no implicit conversion between different sequence types, except that plain strings are converted to Unicode strings if needed. (String conversion is covered in detail in "Unicode" on page 198.) You can call the built-ins tuple and list with a single argument (any iterable) to get a new instance of the type you're calling, with the same items (in the same order) as in the argument.
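A short sketch of these explicit conversions, using names (letters, numbers, squares, copy) that are purely illustrative:

```python
letters = tuple('abc')                     # any iterable works, here a string
assert letters == ('a', 'b', 'c')

numbers = list((1, 2, 3))                  # tuple to list
assert numbers == [1, 2, 3]

squares = list(x * x for x in range(4))    # a generator is iterable too
assert squares == [0, 1, 4, 9]

# The result is always a new object with the same items in the same order.
copy = list(numbers)
assert copy == numbers and copy is not numbers
```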

Section 4.6.1.2: Concatenation and repetition

You can concatenate sequences of the same type with the
Set Operations
Python provides a variety of operations applicable to sets. Since sets are containers, the built-in len function can take a set as its single argument and return the number of items in the set object. A set is iterable, so you can pass it to any function or method that takes an iterable argument. In this case, the items of the set are iterated upon, in some arbitrary order. For example, for any set S, min(S) returns the smallest item in S.
The k in S operator checks whether object k is one of the items of set S. It returns True if it is and False if it isn't. Similarly, k not in S is just like not (k in S).
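These basic set operations can be sketched as follows, with S as an example set:

```python
S = set([3, 1, 2])

assert len(S) == 3            # sets are containers
assert min(S) == 1            # sets are iterable
assert sorted(S) == [1, 2, 3] # iteration order is arbitrary, so sort to compare

assert 2 in S
assert 5 not in S
assert (5 not in S) == (not (5 in S))
```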
Set objects provide several methods, as shown in Table 4-4. Nonmutating methods return a result without altering the object to which they apply and can also be called on instances of type frozenset, while mutating methods may alter the object to which they apply and can be called only on instances of type set. In Table 4-4, S and S1 indicate any set object, and x any hashable object.
Table 4-4: Set object methods
Method                Description
Nonmutating methods
S.copy()              Returns a shallow copy of the set (a copy whose items are the same objects as S's, not copies thereof)
S.difference(S1)      Returns the set of all items of S that aren't in S1
S.intersection(S1)    Returns the set of all items of S that are also in S1
S.issubset(S1)        Returns True if all items of S are also in S1; otherwise, returns False
S.issuperset(S1)      Returns True if all items of
Dictionary Operations
Python provides a variety of operations applicable to dictionaries. Since dictionaries are containers, the built-in len function can take a dictionary as its single argument and return the number of items (key/value pairs) in the dictionary object. A dictionary is iterable, so you can pass it to any function or method that takes an iterable argument. In this case, only the keys of the dictionary are iterated upon, in some arbitrary order. For example, for any dictionary D, min(D) returns the smallest key in D.
The k in D operator checks whether object k is one of the keys of the dictionary D. It returns True if it is and False if it isn't. k not in D is just like not (k in D).
The value in a dictionary D that is currently associated with key k is denoted by an indexing: D[k]. Indexing with a key that is not present in the dictionary raises an exception. For example:
d = { 'x':42, 'y':3.14, 'z':7 }
d['x']                           # 42
d['z']                           # 7
d['a']                           # raises KeyError exception
Plain assignment to a dictionary indexed with a key that is not yet in the dictionary (e.g., D[newkey] = value) is a valid operation and adds the key and value as a new item in the dictionary. For instance:
d = { 'x':42, 'y':3.14}
d['a'] = 16                      # d is now {'x':42, 'y':3.14,'a':16}
The del statement, in the form del D[k], removes from the dictionary the item whose key is k. If k is not a key in dictionary D, del D[k] raises an exception (KeyError).
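Adding and removing items can be sketched together like this. The key 'missing' and the flag missing_key_raised are made up for the example.

```python
d = {'x': 42, 'y': 3.14}

d['a'] = 16                  # adds a new key/value pair
assert len(d) == 3

del d['x']                   # removes the item whose key is 'x'
assert 'x' not in d
assert d == {'y': 3.14, 'a': 16}

# Deleting a key that is not present raises KeyError.
try:
    del d['missing']
except KeyError:
    missing_key_raised = True
else:
    missing_key_raised = False
```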
Dictionary objects provide several methods, as shown in Table 4-5. Nonmutating methods return a result without altering the object to which they apply, while mutating methods may alter the object to which they apply. In Table 4-5, D and D1 indicate any dictionary object,
The print Statement
A print statement is denoted by the keyword print followed by zero or more expressions separated by commas. print is a handy, simple way to output values in text form, mostly for debugging purposes. print outputs each expression x as a string that's just like the result of calling str(x) (covered in str on page 157). print implicitly outputs a space between expressions, and implicitly outputs \n after the last expression, unless the last expression is followed by a trailing comma (,). Here are some examples of print statements:
letter = 'c'
print "give me a", letter, "..."           # prints: give me a c ...
answer = 42
print "the answer is:", answer             # prints: the answer is: 42
The destination of print's output is the file or file-like object that is the value of the stdout attribute of the sys module (covered in "The sys Module" on page 168). If you want to direct the output from a certain print statement to a specific file object f (which must be open for writing), you can use the special syntax:
print >>f, rest of print statement
(if f is None, the destination is sys.stdout, just as it would be without the >>f). You can also use the write or writelines methods of file objects, as covered in "Attributes and Methods of File Objects" on page 218. However, print is very simple to use, and simplicity is important in the common case where all you need are the simple output strategies that print supplies. In particular, this is often the case for the kind of simple output statements you may temporarily add to a program for debugging purposes. "The print Statement" on page 256 has more advice and examples concerning the use of print.
Control Flow Statements
A program's control flow is the order in which the program's code executes. The control flow of a Python program is regulated by conditional statements, loops, and function calls. (This section covers the if statement and for and while loops; functions are covered in "Functions" on page 70.) Raising and handling exceptions also affects control flow; exceptions are covered in Chapter 6.
Often, you need to execute some statements only if some condition holds, or choose statements to execute depending on several mutually exclusive conditions. The Python compound statement if, comprising if, elif, and else clauses, lets you conditionally execute blocks of statements. Here's the syntax for the if statement:
if expression:
    statement(s)
elif expression:
    statement(s)
elif expression:
    statement(s)
...
else:
    statement(s)
The elif and else clauses are optional. Note that, unlike some languages, Python does not have a switch statement. Use if, elif, and else for all conditional processing.
Here's a typical if statement with all three kinds of clauses:
if x < 0: print "x is negative"
elif x % 2: print "x is positive and odd"
else: print "x is even and non-negative"
When there are multiple statements in a clause (i.e., the clause controls a block of statements), the statements are placed on separate logical lines after the line containing the clause's keyword (known as the header line of the clause), indented rightward from the header line. The block terminates when the indentation returns to that of the clause header (or further left from there). When there is just a single simple statement, as here, it can follow the : on the same logical line as the header, but it can also be on a separate logical line, immediately after the header line and indented rightward from it. Most Python programmers prefer the separate-line style, with four-space indents for the guarded statements. Such a style is considered more general and more readable.
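For comparison, here is the same if statement in the separate-line style with four-space indents. To keep the sketch easy to run and check on any Python version, it is wrapped in a hypothetical function, describe, that returns the message instead of printing it.

```python
def describe(x):
    """Classify x, using one indented statement per clause."""
    if x < 0:
        return "x is negative"
    elif x % 2:
        return "x is positive and odd"
    else:
        return "x is even and non-negative"
```

For example, describe(-3) returns "x is negative", describe(7) returns "x is positive and odd", and describe(0) returns "x is even and non-negative".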
Functions
Most statements in a typical Python program are grouped and organized into functions (code in a function body may be faster than at a module's top level, as covered in "Avoiding exec and from...import *" on page 486, so there are excellent practical reasons to put most of your code into functions). A function is a group of statements that execute upon request. Python provides many built-in functions and allows programmers to define their own functions. A request to execute a function is known as a function call. When you call a function, you can pass arguments that specify data upon which the function performs its computation. In Python, a function always returns a result value, either None or a value that represents the results of the computation. Functions defined within class statements are also known as methods. Issues specific to methods are covered in "Bound and Unbound Methods" on page 91; the general coverage of functions in this section, however, also applies to methods.
In Python, functions are objects (values) that are handled like other objects. Thus, you can pass a function as an argument in a call to another function. Similarly, a function can return another function as the result of a call. A function, just like any other object, can be bound to a variable, an item in a container, or an attribute of an object. Functions can also be keys into a dictionary. For example, if you need to quickly find a function's inverse given the function, you could define a dictionary whose keys and values are functions and then make the dictionary bidirectional. Here's a small example of this idea, using some functions from module math, covered in "The math and cmath Modules" on page 365:
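from math import sin, asin, cos, acos, tan, atan, log, exp

inverse = {sin:asin, cos:acos, tan:atan, log:exp}
# keys( ) returns a list in Python 2.x, so adding items while looping is safe
for f in inverse.keys( ): inverse[inverse[f]] = f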
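The other uses of functions as objects mentioned above (passing a function as an argument, returning one as a result, and using functions as dictionary keys) can be sketched as follows. The helper names apply_twice, make_adder, and add3 are hypothetical, and the loop goes over a list copy of the items so that the sketch also runs on current Python versions.

```python
import math

def apply_twice(f, x):
    """Call the function object f on x, then again on the result."""
    return f(f(x))

def make_adder(n):
    """Return a new function that adds n to its argument."""
    def add(x):
        return x + n
    return add

add3 = make_adder(3)                 # a function bound to a variable
assert apply_twice(add3, 10) == 16   # a function passed as an argument

# Functions as dictionary keys and values, made bidirectional.
inverse = {math.sin: math.asin, math.exp: math.log}
for f, g in list(inverse.items()):
    inverse[g] = f
assert inverse[math.asin] is math.sin
```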
The fact that functions are ordinary objects in Python is often expressed by saying that functions are
Chapter 5: Object-Oriented Python
Python is an object-oriented programming language. Unlike some other object-oriented languages, Python doesn't force you to use the object-oriented paradigm exclusively. Python also supports procedural programming with modules and functions, so you can select the most suitable programming paradigm for each part of your program. Generally, the object-oriented paradigm is suitable when you want to group state (data) and behavior (code) together in handy packets of functionality. It's also useful when you want to use some of Python's object-oriented mechanisms covered in this chapter, such as inheritance or special methods. The procedural paradigm, based on modules and functions, may be simpler, and thus more suitable when you don't need any of the benefits of object-oriented programming. With Python, you can mix and match the two paradigms.
Python today is in transition between two slightly different object models. This chapter mainly describes the new-style, or new object model, which is simpler, more regular, more powerful, and the one I recommend you always use; whenever I speak of classes or instances, without explicitly specifying otherwise, I mean new-style classes or instances. However, for backward compatibility, the default object model in all Python 2.x versions, for every value of x, is the legacy object model, also known as the classic or old-style object model; the new-style object model will become the default (and the legacy one will disappear) in a few years, when Python 3.0 comes out. Therefore, in each section, after describing how the new-style object model works, this chapter covers the small differences between the new and legacy object models, and discusses how to use both object models with Python 2.x. Finally, the chapter covers special methods, in "Special Methods" on page 104, and then two advanced concepts known as decorators, in "Decorators" on page 115, and
Classes and Instances
If you're already familiar with object-oriented programming in other languages such as C++ or Java, then you probably have a good intuitive grasp of classes and instances: a class is a user-defined type, which you can instantiate to obtain instances, meaning objects of that type. Python supports these concepts through its class and instance objects.
A class is a Python object with several characteristics:
  • You can call a class object as if it were a function. The call returns another object, known as an instance of the class; the class is also known as the type of the instance.
  • A class has arbitrarily named attributes that you can bind and reference.
  • The values of class attributes can be descriptors (including functions), covered in "Descriptors" on page 85, or normal data objects.
  • Class attributes bound to functions are also known as methods of the class.
  • A method can have a special Python-defined name with two leading and two trailing underscores. Python implicitly invokes such special methods, if a class supplies them, when various kinds of operations take place on instances of that class.
  • A class can inherit from other classes, meaning it delegates to other class objects the lookup of attributes that are not found in the class itself.
An instance of a class is a Python object with arbitrarily named attributes that you can bind and reference. An instance object implicitly delegates to its class the lookup of attributes not found in the instance itself. The class, in turn, may delegate the lookup to the classes from which it inherits, if any.
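The delegation of attribute lookup from instance to class to base class can be sketched with two hypothetical classes, Base and Derived:

```python
class Base(object):
    greeting = 'hello'               # a class attribute

    def greet(self):                 # a method: a class attribute bound to a function
        return self.greeting

class Derived(Base):                 # inherits from Base, defines nothing itself
    pass

d = Derived()                        # calling the class returns an instance

# Neither d nor Derived has 'greeting', so the lookup reaches Base.
assert 'greeting' not in d.__dict__
assert 'greeting' not in Derived.__dict__
assert d.greet() == 'hello'

# Binding the attribute on the instance shadows the class attribute.
d.greeting = 'hi'
assert d.greet() == 'hi'
assert Base.greeting == 'hello'      # the class itself is unchanged
```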
In Python, classes are objects (values) and are handled like other objects. Thus, you can pass a class as an argument in a call to a function. Similarly, a function can return a class as the result of a call. A class, just like any other object, can be bound to a variable (local or global), an item in a container, or an attribute of an object. Classes can also be keys into a dictionary. The fact that classes are ordinary objects in Python is often expressed by saying that classes are
Special Methods
A class may define or inherit special methods (i.e., methods whose names begin and end with double underscores). Each special method relates to a specific operation. Python implicitly invokes a special method whenever you perform the related operation on an instance object. In most cases, the method's return value is the operation's result, and attempting an operation when its related method is not present raises an exception. Throughout this section, I will point out the cases in which these general rules do not apply. In the following, x is the instance of class C on which you perform the operation, and y is the other operand, if any. The formal argument self of each method also refers to instance object x. Whenever, in the following sections, I mention calls to x.__name__(...), keep in mind that, for new-style classes, the exact call happening is rather, pedantically speaking, x.__class__.__name__(x,...).
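The pedantic point about new-style classes can be seen in a small sketch. The class C and the per-instance __len__ attribute are only for illustration: an implicit invocation looks the special method up on the class, not on the instance.

```python
class C(object):
    def __len__(self):
        return 3

x = C()
x.__len__ = lambda: 99            # an attribute on the instance only

assert x.__len__() == 99          # an explicit call finds the instance attribute
assert len(x) == 3                # the implicit call uses the class's method
assert x.__class__.__len__(x) == 3
```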
Some special methods relate to general-purpose operations. A class that defines or inherits these methods allows its instances to control such operations. These operations can be divided into the following categories:
Initialization and finalization
A class can control its instances' initialization (a frequent need) via special methods __new__ (new-style classes only) and __init__, and/or their finalization (a rare need) via __del__.
Representation as string
A class can control how Python represents its instances as strings via special methods __repr__, __str__, and __unicode__.
Comparison, hashing, and use in a Boolean context
A class can control how its instances compare with other objects (methods __lt__, __le__, __gt__, __ge__, __eq__, __ne__, and __cmp__), how dictionaries use them as keys and sets as members (
Decorators
Due to the existence of descriptor types such as staticmethod and classmethod, covered in "Class-Level Methods" on page 99, which take as their argument a function object, Python somewhat frequently uses, within class bodies, idioms such as:
def f(cls,...):
    ...definition of f snipped...
f = classmethod(f)
Having the call to classmethod occur textually after the def statement may decrease code readability because, while reading f's definition, the reader of the code is not yet aware that f is destined to become a class method rather than an ordinary instance method. The code would be more readable if the mention of classmethod could be placed right before, rather than after, the def. Python 2.4 allows such placement, through the new syntax form known as decoration:
@classmethod
def f(cls,...):
    ...definition of f snipped...
The @classmethod decoration must be immediately followed by a def statement and means that f = classmethod(f) executes right after the def statement (for whatever name f the def defines). More generally, @expression evaluates the expression (which must be a name, possibly qualified, or a call) and binds the result to an internal temporary name (say, __aux); any such decoration must be immediately followed by a def statement and means that f = __aux(f) executes right after the def statement (for whatever name f the def defines). The object bound to __aux is known as a decorator, and it's said to decorate function f.
Decoration affords a handy shorthand for some higher-order functions (and other callables that work similarly to higher-order functions). You may apply decoration to any def statement, not just to def statements occurring in class bodies. You may also code custom decorators, which are just higher-order functions, accepting a function object as an argument and returning a function object as the result. For example, here is a decorator that does not modify the function it decorates, but rather emits the function's docstring to standard output at function-definition time:
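The preview ends before the book's own listing, so what follows is only a minimal sketch of a decorator of this kind, not the original code. The names showdoc and square are assumptions, and sys.stdout.write is used so that the sketch runs unchanged on both old and current Python versions.

```python
import sys

def showdoc(f):
    """Hypothetical decorator: report f's docstring, then return f unchanged."""
    if f.__doc__:
        sys.stdout.write('%s: %s\n' % (f.__name__, f.__doc__))
    else:
        sys.stdout.write('%s: no docstring\n' % f.__name__)
    return f

@showdoc                       # runs showdoc(square) right after the def
def square(x):
    """Return x multiplied by itself."""
    return x * x
```

Because showdoc returns the very function object it receives, square behaves exactly as if it had not been decorated; the only effect is the line written to standard output when the def statement executes.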
Metaclasses
Any object, even a class object, has a type. In Python, types and classes are also first-class objects. The type of a class object is also known as the class's metaclass. An object's behavior is mostly determined by the type of the object. This also holds for classes: a class's behavior is mostly determined by the class's metaclass. Metaclasses are an advanced subject, and you may want to skip the rest of this section on first reading. However, fully grasping metaclasses can help you obtain a deeper understanding of Python, and occasionally it can be useful to define your own custom metaclasses.
The distinction between legacy and new-style classes relies on the fact that each class's behavior is determined by its metaclass. In other words, the reason legacy classes behave differently from new-style classes is that legacy and new-style classes are objects of different types (metaclasses):
class Classic: pass
class Newstyle(object): pass
print type(Classic)                  # prints: <type 'classobj'>
print type(Newstyle)                 # prints: <type 'type'>
The type of Classic is object types.ClassType from standard module types, while the type of Newstyle is built-in object type. type is also the metaclass of all Python built-in types, including itself (i.e., print type(type) also prints <type 'type'>).
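The new-style half of this comparison can be checked with a short sketch that also runs on current Python versions. Legacy classes exist only in Python 2.x, so the Classic case is deliberately left out here.

```python
class Newstyle(object):
    pass

assert type(Newstyle()) is Newstyle   # an instance's type is its class
assert type(Newstyle) is type         # the class's type is its metaclass
assert type(int) is type              # built-in types share that metaclass
assert type(type) is type             # type is its own metaclass
```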
To execute a class statement, Python first collects the base classes into a tuple t (an empty one if there are no base classes) and executes the class body in a temporary dictionary d. Then, Python determines the metaclass M to use for the new class object C that the class statement is creating.
When '__metaclass__' is a key in d, M is d['__metaclass__']. Thus, you can explicitly control class C's metaclass by binding the attribute __metaclass__ in C's class body. Otherwise, when t is nonempty (i.e., when
Chapter 6: Exceptions
Python uses exceptions to communicate errors and anomalies. An exception is an object that indicates an error or anomalous condition. When Python detects an error, it raises an exception—that is, Python signals the occurrence of an anomalous condition by passing an exception object to the exception-propagation mechanism. Your code can explicitly raise an exception by executing a raise statement.
Handling an exception means receiving the exception object from the propagation mechanism and performing whatever actions are needed to deal with the anomalous situation. If a program does not handle an exception, the program terminates with an error traceback message. However, a program can handle exceptions and keep running despite errors or other abnormal conditions.
Python also uses exceptions to indicate some special situations that are not errors, and are not even abnormal. For example, as covered in "Iterators" on page 65, an iterator's next method raises the exception StopIteration when the iterator has no more items. This is not an error, and it is not even an anomaly since most iterators run out of items eventually. The optimal strategies for checking and handling errors and other special situations in Python are therefore different from what might be best in other languages, and I cover such considerations in "Error-Checking Strategies" on page 134. This chapter also covers the logging module of the Python standard library, in "Logging Errors" on page 136, and the assert Python statement, in "The assert Statement" on page 138.
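As a small illustration of StopIteration being an ordinary signal rather than an error, consider the sketch below. It uses the built-in next function of current Python versions instead of calling the iterator's next method directly, which is an adaptation and not the book's 2.x spelling.

```python
it = iter([10, 20])
print(next(it))        # 10
print(next(it))        # 20

exhausted = False
try:
    next(it)           # no items left
except StopIteration:
    exhausted = True   # the normal end-of-iteration signal

# A for loop (or a comprehension) relies on the same signal and simply stops.
collected = [item for item in iter([10, 20])]
print(exhausted, collected)
```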
The try Statement
The try statement provides Python's exception-handling mechanism. It is a compound statement that can take one of two different forms:
  • A try clause followed by one or more except clauses (and optionally an else clause)
  • A try clause followed by exactly one finally clause
In Python 2.5, a try statement can have except clauses (and optionally an else clause) followed by a finally clause; however, in all previous versions of Python, the two forms cannot be merged, so I present them separately in the following. See "The try/except/finally statement" on page 124 for this small 2.5 enhancement to try statement syntax.
Here's the syntax for the try/except form of the try statement:
try:
    statement(s)
except [expression [, target]]:
    statement(s)
[else:
    statement(s)]
This form of the try statement has one or more except clauses, as well as an optional else clause.
The body of each except clause is known as an exception handler. The code executes if the expression in the except clause matches an exception object propagating from the try clause. expression is a class or tuple of classes, and matches any instance of one of those classes or any of their subclasses. The optional target is an identifier that names a variable that Python binds to the exception object just before the exception handler executes. A handler can also obtain the current exception object by calling the exc_info function of module sys (covered in exc_info on page 168).
Here is an example of the try/except form of the try statement:
try: 1/0
except ZeroDivisionError: print "caught divide-by-0 attempt"
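A slightly fuller sketch can show a tuple of classes as the expression, a target, and an else clause together. The function parse_ratio is hypothetical, and the block uses the "except expression as target" spelling accepted by Python 2.6 and later (including Python 3) in place of the comma form shown in the syntax above.

```python
import sys

def parse_ratio(text):
    """Return a/b for text such as '3/4', or None when text is unusable."""
    try:
        a, b = text.split("/")
        result = int(a) / int(b)
    except (ValueError, ZeroDivisionError) as err:   # a tuple of classes
        # err is the target; sys.exc_info() reports the same exception object
        assert sys.exc_info()[1] is err
        print("cannot use %r: %s" % (text, err))
        return None
    else:
        # runs only when the try clause raised no exception
        return result

print(parse_ratio("3/4"))
print(parse_ratio("3/0"))
print(parse_ratio("three"))
```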
If a try statement has several except clauses, the exception-propagation mechanism tests the except clauses in order; the first except clause whose expression matches the exception object is used as the handler. Thus, you must always list handlers for specific cases before you list handlers for more general cases. If you list a general case first, the more specific
Exception Propagation
When an exception is raised, the exception-propagation mechanism takes control. The normal control flow of the program stops, and Python looks for a suitable exception handler. Python's try statement establishes exception handlers via its except clauses. The handlers deal with exceptions raised in the body of the try clause, as well as exceptions propagating from any of the functions called by that code, directly or indirectly. If an exception is raised within a try clause that has an applicable except handler, the try clause terminates and the handler executes. When the handler finishes, execution continues with the statement after the try statement.
If the statement raising the exception is not within a try clause that has an applicable handler, the function containing the statement terminates, and the exception propagates "upward" along the stack of function calls to the statement that called the function. If the call to the terminated function is within a try clause that has an applicable handler, that try clause terminates, and the handler executes. Otherwise, the function containing the call terminates, and the propagation process repeats, unwinding the stack of function calls until an applicable handler is found.
If Python cannot find any applicable handler, by default the program prints an error message to the standard error stream (the file sys.stderr). The error message includes a traceback that gives details about functions terminated during propagation. You can change Python's default error-reporting behavior by setting sys.excepthook (covered in excepthook on page 168). After error reporting, Python goes back to the interactive session, if any, or terminates if no interactive session is active. When the exception class is SystemExit, termination is silent, and ends the interactive session, if any.
Here are some functions that you can use to see exception propagation at work:
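One possible set of such functions, offered as a sketch rather than as the book's own listing, records each step in a list named trace (an assumption of this example) so that the order of events is easy to inspect.

```python
trace = []

def f():
    trace.append("f starts")
    1 / 0                          # raises ZeroDivisionError
    trace.append("f ends")         # never reached

def g():
    trace.append("g starts")
    f()                            # no handler here, so g terminates too
    trace.append("g ends")         # never reached

def h():
    trace.append("h starts")
    try:
        g()
    except ZeroDivisionError:
        trace.append("handled in h")
    trace.append("h continues")    # execution resumes after the try statement

h()
print(trace)
```

The exception raised in f passes through g, which has no applicable handler, and is caught in h; neither f nor g runs the statement that follows the failing call.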
The raise Statement
You can use the raise statement to raise an exception explicitly. raise is a simple statement with the following syntax:
raise [expression1[, expression2]]
Only an exception handler (or a function that a handler calls, directly or indirectly) can use raise without any expressions. A plain raise statement re-raises the same exception object that the handler received. The handler terminates, and the exception propagation mechanism keeps searching for other applicable handlers. Using raise without expressions is useful when a handler discovers that it is unable to handle an exception it receives, or can handle the exception only partially, so the exception should keep propagating to allow handlers up the call stack to perform handling and clean-up.
When only expression1 is present, it can be a legacy-style instance object or class object. (Python 2.5 relaxes this rule to also allow new-style classes that inherit from the new built-in new-style class BaseException, and instances of such classes.) In this case, if expression1 is an instance object, Python raises that instance. When expression1 is a class object, raise instantiates the class without arguments and raises the resulting instance. When both expressions are present, expression1 must be a legacy-style class object. raise instantiates the class, with expression2 as the argument (or multiple arguments if expression2 is a tuple), and raises the resulting instance. Note that, in Python 2.3 and 2.4, the raise statement is the only remaining construct that accepts only legacy classes and instances, not new-style ones. In Python 2.5, on the other hand, built-in exception classes are all new-style, although, for backward compatibility, legacy classes are still allowed in a raise statement in all Python 2.x versions and are removed only in Python 3.0.
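The sketch below illustrates raising a class, raising an instance, and re-raising with a plain raise. ConfigError, load, and log are hypothetical names; the two-expression form (raise Class, argument) is specific to Python 2.x, so the block uses the raise Class(argument) spelling, which also runs on Python 3.

```python
class ConfigError(Exception):
    """Hypothetical error type used only in this sketch."""

log = []

def load(value):
    try:
        if value is None:
            raise ConfigError                    # a class: instantiated without arguments
        if value < 0:
            raise ConfigError("negative value", value)   # an instance
        return value
    except ConfigError as err:
        log.append(err.args)                     # partial handling: record the problem
        raise                                    # re-raise the same exception object

for candidate in (None, -1):
    try:
        load(candidate)
    except ConfigError as err:
        print("propagated:", repr(err))
```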
Exception Objects
Exceptions are instances of subclasses of the built-in, legacy-style Exception class. An instance of any subclass of Exception has an attribute args, the tuple of arguments used to create the instance. args holds error-specific information, usable for diagnostic or recovery purposes. In Python 2.5, class Exception is new-style, since it inherits from the new built-in new-style class BaseException; this, in turn, makes all built-in exception classes new-style in Python 2.5. For detailed explanations of how the standard exception hierarchy changes in 2.5, and how things will move further in 3.0, see http://python.org/doc/peps/pep-0352/.
All exceptions that Python itself raises are instances of subclasses of Exception. The inheritance structure of exception classes is important, as it determines which except clauses handle which exceptions. In Python 2.5, however, classes KeyboardInterrupt and SystemExit inherit directly from the new class BaseException and are not subclasses of Exception: the new arrangement makes it more likely that a general handler clause coded as except Exception: does what's intended, since you rarely want to catch KeyboardInterrupt and SystemExit (exception handlers are covered in "try/except" on page 122).
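The arrangement introduced in Python 2.5 is also the one found in Python 3, so it can be checked directly. The following sketch shows that a general except Exception: handler lets SystemExit pass, and also displays the args attribute mentioned above.

```python
print(issubclass(KeyboardInterrupt, Exception))    # False
print(issubclass(SystemExit, Exception))           # False
print(issubclass(SystemExit, BaseException))       # True

caught = None
try:
    try:
        raise SystemExit(3)
    except Exception:
        caught = "general handler"    # not reached: SystemExit is not an Exception
except SystemExit as err:
    caught = "outer handler, args=%r" % (err.args,)

print(caught)
```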
In Python 2.3 and 2.4, the SystemExit, StopIteration, and Warning classes inherit directly from Exception. Instances of SystemExit are normally raised by the exit function in module sys (covered in exit on page 169). StopIteration is used in the iteration protocol, covered in "Iterators" on page 65. Warning is covered in "The warnings Module" on page 471. Other standard exceptions derive from StandardError, a direct subclass of Exception. In Python 2.5, class GeneratorExit, covered in "Generator enhancements" on page 126, also inherits directly from Exception, while the rest of the standard exception hierarchy is like the one in 2.3 and 2.4, covered in the following.
Custom Exception Classes
You can subclass any of the standard exception classes in order to define your own exception class. Often, such a subclass adds nothing more than a docstring:
class InvalidAttribute(AttributeError):
    "Used to indicate attributes that could never be valid"
As covered in "The pass Statement" on page 69, you don't need a pass statement to make up the body of this class; the docstring (which you should always write) is quite sufficient to keep Python happy. Best style for such "empty" classes, just like for "empty" functions, is to have a docstring and no pass.
Given the semantics of try/except, raising a custom exception class such as InvalidAttribute is almost the same as raising its standard exception superclass, AttributeError. Any except clause that can handle AttributeError can handle InvalidAttribute just as well. In addition, client code that knows specifically about your InvalidAttribute custom exception class can handle it specifically, without having to handle all other cases of AttributeError if it is not prepared for those. For example:
class SomeFunkyClass(object):
    "much hypothetical functionality snipped"
    def __getattr__(self, name):
        "this __getattr__ only clarifies the kind of attribute error"
        if name.startswith('_'):
            raise InvalidAttribute, "Unknown private attribute "+name
        else:
            raise AttributeError, "Unknown attribute "+name
Now client code can be more selective in its handlers. For example:
import warnings

s = SomeFunkyClass()
try:
    # thename is a string holding the name of the attribute wanted
    value = getattr(s, thename)
except InvalidAttribute, err:
    warnings.warn(str(err))
    value = None
# other cases of AttributeError just propagate, as they're unexpected
It's an excellent idea to define, and raise, custom exception classes in your modules rather than plain standard exceptions: by using custom exceptions, you make it easier for all callers of your module's code to handle exceptions that come from your module separately from others.
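One common way to follow this advice, sketched here under the assumption of a hypothetical parser module, is to give the module a single base exception class and derive every module-specific error from it. All names below are invented for illustration, and the syntax targets Python 3.

```python
class ParserError(Exception):
    """Base class for every error raised by this hypothetical parser module."""

class EmptyInput(ParserError):
    """Raised when there is nothing to parse."""

class BadToken(ParserError, ValueError):
    """Raised for a token that cannot be converted; also a ValueError."""

def parse(text):
    if not text.strip():
        raise EmptyInput("no input")
    try:
        return [int(tok) for tok in text.split()]
    except ValueError as err:
        raise BadToken(str(err))

for sample in ("1 2 3", "", "1 x"):
    try:
        print(parse(sample))
    except ParserError as err:        # only errors that come from this module
        print("parser problem: %s: %s" % (type(err).__name__, err))
```

Callers that only know about ValueError still catch BadToken, while callers that know the module can catch ParserError and nothing else.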
Error-Checking Strategies
Most programming languages that support exceptions are geared to raise exceptions only in rare cases. Python's emphasis is different. In Python, exceptions are considered appropriate whenever they make a program simpler and more robust, even if that means that exceptions are raised rather frequently.
A common idiom in other languages, sometimes known as "look before you leap" (LBYL), is to check in advance, before attempting an operation, for all circumstances that might make the operation invalid. This approach is not ideal for several reasons:
  • The checks may diminish the readability and clarity of the common, mainstream cases where everything is okay.
  • The work needed for checking may duplicate a substantial part of the work done in the operation itself.
  • The programmer might easily err by omitting some needed check.
  • The situation might change between the moment the checks are performed and the moment the operation is attempted.
The preferred idiom in Python is generally to attempt the operation in a try clause and to handle, in except clauses, the exceptions that may result. This idiom, which shares none of the defects of LBYL, is known as "it's easier to ask forgiveness than permission" (EAFP), a motto widely credited to Admiral Grace Murray Hopper, co-inventor of COBOL. Here is a function written using the LBYL idiom:
def safe_divide_1(x, y):
    if y==0:
        print "Divide-by-0 attempt detected"
        return None
    else:
        return x/y
With LBYL, the checks come first, and the mainstream case is somewhat hidden at the end of the function. Here is the equivalent function written using the EAFP idiom:
def safe_divide_2(x, y):
    try:
        return x/y
    except ZeroDivisionError:
        print "Divide-by-0 attempt detected"
        return None
With EAFP, the mainstream case is up front in a try clause, and the anomalies are handled in an except clause.
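A second pair of functions, hypothetical and written for Python 3, may help illustrate the point that a programmer can easily omit a needed check. The LBYL version below looks reasonable, yet it rejects inputs that the conversion itself would accept.

```python
def to_int_lbyl(text):
    # Look before you leap: test first, then convert.
    if text.isdigit():
        return int(text)
    return None

def to_int_eafp(text):
    # Easier to ask forgiveness: just try, and handle the failure.
    try:
        return int(text)
    except ValueError:
        return None

for sample in ("42", "-7", " 8 ", "x"):
    print(repr(sample), to_int_lbyl(sample), to_int_eafp(sample))
```

The isdigit test does not anticipate a leading sign or surrounding whitespace, both of which int accepts; the EAFP version has no such blind spot because it lets int decide.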
Chapter 7: Modules
A typical Python program is made up of several source files. Each source file corresponds to a module, grouping program code and data for reuse. Modules are normally independent of each other so that other programs can reuse the specific modules they need. A module explicitly establishes dependencies upon another module by using import or from statements. In some other programming languages, global variables can provide a hidden conduit for coupling between modules. In Python, however, global variables are not global to all modules, but rather attributes of a single module object. Thus, Python modules communicate in explicit and maintainable ways.
Python also supports extensions, which are components written in other languages, such as C, C++, Java, or C#, for use with Python. Extensions are seen as modules by the Python code that uses them (known as client code). From the client code viewpoint, it does not matter whether a module is 100 percent pure Python or an extension. You can always start by coding a module in Python. Later, should you need better performance, you can recode some modules in lower-level languages without changing the client code that uses the modules. Chapters 25 and 26 discuss writing extensions in C and Java.
This chapter discusses module creation and loading. It also covers grouping modules into packages, which are modules that contain other modules, in a hierarchical, tree-like structure. Finally, the chapter discusses using Python's distribution utilities (distutils) to prepare packages for distribution and to install distributed packages.
Module Objects
A module is a Python object with arbitrarily named attributes that you can bind and reference. The Python code for a module named aname normally resides in a file named aname.py, as covered in "Module Loading" on page 144.
In Python, modules are objects (values) and are handled like other objects. Thus, you can pass a module as an argument in a call to a function. Similarly, a function can return a module as the result of a call. A module, just like any other object, can be bound to a variable, an item in a container, or an attribute of an object. For example, the sys.modules dictionary, covered in "Module Loading" on page 144, holds module objects as its values. The fact that modules are ordinary objects in Python is often expressed by saying that modules are first-class objects.
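The following sketch shows each of these uses with two standard library modules. The helper names describe, pick, and registry are assumptions made for the example.

```python
import sys
import math
import json

def describe(module):
    """Accept a module as an argument and return a fact about it."""
    return module.__name__

def pick(name):
    """Return a module object as the result of a call."""
    return sys.modules[name]

registry = {"numbers": math, "text": json}   # modules as items in a container

print(describe(math))
print(pick("json") is json)
print(registry["numbers"].sqrt(16.0))
```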
You can use any Python source file as a module by executing an import statement in some other Python source file. import has the following syntax:
import modname [as varname][,...]
The import keyword is followed by one or more module specifiers, separated by commas. In the simplest, most common case, a module specifier is just modname, an identifier—a variable that Python binds to the module object when the import statement finishes. In this case, Python looks for the module of the same name to satisfy the import request. For example:
import MyModule
looks for the module named MyModule and binds the variable named MyModule in the current scope to the module object. modname can also be a sequence of identifiers separated by dots (.) to name a module in a package, as covered in "Packages" on page 149.
When as varname is part of a module specifier, Python looks for a module named modname but then binds the module object as variable varname. For example:
import MyModule as Alias
looks for the module named MyModule and binds the module object to variable
Module Loading
Module-loading operations rely on attributes of the built-in sys module (covered in "The sys Module" on page 168). The module-loading process described in this section is carried out by built-in function __import__. Your code can call __import__ directly, with the module name string as an argument. __import__ returns the module object or raises ImportError if the import fails.
To import a module named M, __import__ first checks dictionary sys.modules, using string M as the key. When key M is in the dictionary, __import__ returns the corresponding value as the requested module object. Otherwise, __import__ binds sys.modules[M] to a new empty module object with a __name__ of M, then looks for the right way to initialize (load) the module, as covered in "Searching the Filesystem for a Module" on page 144.
Thanks to this mechanism, the relatively slow loading operation takes place only the first time a module is imported in a given run of the program. When a module is imported again, the module is not reloaded, since __import__ rapidly finds and returns the module's entry in sys.modules. Thus, all imports of a given module after the first one are very fast: they're just dictionary lookups. (To force a reload, see "The reload Function" on page 146.)
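The caching behavior can be observed directly. In this sketch the module json stands in for any importable module, and the nonexistent module name is invented for the example.

```python
import sys

first = __import__("json")             # loads json unless it is already loaded
print(sys.modules["json"] is first)    # the entry that later imports reuse

import json                            # already in sys.modules: a dictionary lookup
print(json is first)

try:
    __import__("no_such_module_for_this_sketch")
except ImportError as err:
    print("import failed:", err)
```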
When a module is loaded, __import__ first checks whether the module is built-in. Built-in modules are listed in tuple sys.builtin_module_names, but rebinding that tuple does not affect module loading. When Python loads a built-in module, as when it loads any other extension, Python calls the module's initialization function. The search for built-in modules also looks for modules in platform-specific locations, such as resource forks and frameworks on the Mac, and the Registry in Windows.
If module M is not built-in, __import__ looks for
Packages
A package is a module that contains other modules. Some or all of the modules in a package may be subpackages, resulting in a hierarchical tree-like structure. A package named P resides in a subdirectory, also called P, of some directory in sys.path. Packages can also live in ZIP files; in the following, I explain the case in which the package lives on the filesystem, since the case in which a package lives in a ZIP file is strictly analogous, relying on the hierarchical filesystem-like structure inside the ZIP file.
The module-body of P is in the file P/__init__.py. You must have a file named P/__init__.py, even if it's empty (representing an empty module-body), in order to indicate to Python that directory P is indeed a package. The module-body of a package is loaded when you first import the package (or any of the package's modules) and behaves in all respects like any other Python module. The other .py files in directory P are the modules of package P. Subdirectories of P containing __init__.py files are subpackages of P. (Python 2.5 as released keeps this requirement: a subdirectory without an __init__.py file is not treated as a subpackage.) Nesting can continue to any depth.
You can import a module named M in package P as P.M. More dots let you navigate a hierarchical package structure. (A package's module-body is always loaded before any module in the package is loaded.) If you use the syntax import P.M, variable P is bound to the module object of package P, and attribute M of object P is bound to module P.M. If you use the syntax import P.M as V, variable V is bound directly to module P.M.
Using from P import M to import a specific module M from package P is a perfectly acceptable practice: the from statement is specifically okay in this case.
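To make the three import forms concrete, the sketch below builds a tiny throwaway package in a temporary directory and imports from it. The package name demo_pkg_sketch and its contents are invented for the example.

```python
import os
import sys
import tempfile

root = tempfile.mkdtemp()                        # scratch directory for the sketch
pkg_dir = os.path.join(root, "demo_pkg_sketch")
os.mkdir(pkg_dir)

# The package's module-body; an empty file would also do.
with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    f.write("loaded_first = True\n")

# An ordinary module inside the package.
with open(os.path.join(pkg_dir, "tools.py"), "w") as f:
    f.write("def double(n):\n    return 2 * n\n")

sys.path.insert(0, root)                         # make the package findable

import demo_pkg_sketch.tools                     # binds the name demo_pkg_sketch
import demo_pkg_sketch.tools as tools_mod        # binds tools_mod to the module itself
from demo_pkg_sketch import tools                # the from form

print(demo_pkg_sketch.loaded_first)
print(demo_pkg_sketch.tools is tools_mod is tools)
print(tools.double(21))
```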
The Distribution Utilities (distutils)
Python modules, extensions, and applications can be packaged and distributed in several forms:
Compressed archive files
Generally .zip for Windows and .tar.gz (a.k.a. .tgz) for Unix-based systems, but both forms are portable
Self-unpacking or self-installing executables
Normally .exe for Windows
Self-contained, ready-to-run executables that require no installation
For example, .exe for Windows, ZIP archives with a short script prefix on Unix, .app for the Mac, and so on
Platform-specific installers
For example, .msi on Windows, .rpm and .srpm on most Linux distributions, and .deb on Debian GNU/Linux and Ubuntu
Python Eggs
A very popular third-party extension, covered in "Python Eggs" on page 151
When you distribute a package as a self-installing executable or platform-specific installer, a user can then install the package simply by running the installer. How to run such an installer program depends on the platform, but it no longer matters which language the program was written in. How to build self-contained, ready-to-run executables for various platforms is covered in Chapter 27.
When you distribute a package as an archive file or as an executable that unpacks but does not install itself, it does matter that the package was coded in Python. In this case, the user must first unpack the archive file into some appropriate directory, say C:\Temp\MyPack on a Windows machine or ~/MyPack on a Unix-like machine. Among the extracted files there should be a script, conventionally named setup.py, which uses the Python facility known as the distribution utilities (standard library package distutils). The distributed package is then almost as easy to install as a self-installing executable would be. The user opens a command-prompt window and changes to the directory into which the archive is unpacked. Then the user runs, for example:
Chapter 8: Core Built-ins
The term built-in has more than one meaning in Python. In most contexts, a built-in means an object directly accessible to Python code without an import statement. "Python built-ins" on page 141 shows the mechanism that Python uses to allow this direct access. Built-in types in Python include numbers, sequences, dictionaries, sets, functions (covered in Chapter 4), classes (covered in "Python Classes" on page 82), standard exception classes (covered in "Exception Objects" on page 129), and modules (covered in "Module Objects" on page 139). The built-in file object is covered in "File Objects" on page 216, and "Internal Types" on page 331 covers some built-in types intrinsic to Python's internal operation. This chapter provides additional coverage of the core built-in types (in "Built-in Types" on page 154) and covers built-in functions available in module __builtin__ (in "Built-in Functions" on page 158).
As I mentioned in "Python built-ins" on page 141, some modules are known as "built-in" because they are an integral part of the Python standard library (even though it takes an import statement to access them), as distinguished from separate, optional add-on modules, also called Python extensions. This chapter documents some core built-in modules, essentially those that offer functionality that, in some other languages, is built into the languages themselves: namely, modules sys in "The sys Module" on page 168, copy in "The copy Module" on page 172, collections in "The collections Module" on page 173, functional (2.5 only) in "The functional Module" on page 175, bisect in "The bisect Module" on page 176, heapq in "The heapq Module" on page 177, UserDict in "The UserDict Module" on page 178, optparse in "The optparse Module" on page 179, and itertools in "The itertools Module" on page 183. Chapter 9 covers some string-related core built-in modules (
Built-in Types
This section documents Python's core built-in types, such as int, float, dict, and many others. More details about many of these types, and about operations on their instances, are found throughout Chapter 4. In the rest of this section, by "number" I mean, specifically, "noncomplex number."
basestring
basestring
Description
Noninstantiable (abstract) common basetype of types str and unicode. Used mostly to ascertain whether some object x is a string (either plain or Unicode) by testing isinstance(x, basestring).
bool
bool(x)
Description
Returns False if argument x evaluates as false; returns True if argument x evaluates as true. (See also "Boolean Values" on page 45.) bool is a subclass of int, and built-in names False and True refer to the only two instances of type bool. These instances are also integer numbers, equal to 0 and 1, respectively, but str(True) is 'True', and str(False) is 'False'.
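These properties are easy to confirm interactively; the checks below are a sketch that runs on Python 3 as well as on the 2.x versions the text describes.

```python
print(issubclass(bool, int))                       # True
print(True == 1, False == 0)                       # True True
print(True + True)                                 # 2
print(str(True), str(False))                       # True False
print(bool([]), bool([0]), bool(""), bool("no"))   # False True False True
print(bool(0) is False, bool(7) is True)           # True True
```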
buffer
buffer(obj,offset=0,size=-1)
Description
Returns a read-only buffer object that refers to a compact slice of obj's data, starting at the given offset and with the given size (all the way to the end of obj's data, if size < 0, or if obj's data is too short to provide size bytes after the offset). obj must be of a type that supports the buffer call interface, such as a string or array.
classmethod
classmethod(function)
Description
Returns a class method object. In practice, you call this built-in type only within a class body, and, most often, in Python 2.4, you use it as a decorator. See "Class methods" on page 99.
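As a brief sketch of the decorator usage, with hypothetical class and method names, a class method can serve as an alternative constructor that also works for subclasses:

```python
class Temperature(object):
    def __init__(self, celsius):
        self.celsius = celsius

    @classmethod
    def from_fahrenheit(cls, degrees):
        # cls is the class the method was called on, not an instance
        return cls((degrees - 32) * 5.0 / 9.0)

class Reading(Temperature):
    pass

t = Temperature.from_fahrenheit(212)
r = Reading.from_fahrenheit(32)
print(t.celsius, type(r).__name__)
```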
complex
complex(real,imag=0)
Description
Converts any number, or a suitable string, to a complex number. imag may be present only when real is a number and is the imaginary part of the resulting complex number. See also "Complex numbers" on page 40.
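A few calls, given here as a sketch, show both accepted argument forms and the restriction on imag:

```python
z = complex(3, 4)          # real and imaginary parts given as numbers
print(z, abs(z))           # (3+4j) 5.0
print(complex(2))          # (2+0j)
print(complex("1+2j"))     # (1+2j)

try:
    complex("1+2j", 5)     # imag is not allowed when real is a string
except TypeError as err:
    print("TypeError:", err)
```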
Built-in Functions
This section documents the Python functions available in module __builtin__ in alphabetical order. Note that the names of these built-ins are not reserved words. Thus, your program can bind for its own purposes, in local or global scope, an identifier that has the same name as a built-in function. Names bound in local or global scope have priority over names bound in built-in scope, so local and global names hide built-in ones. You can also rebind names in built-in scope, as covered in "Python built-ins" on page 141. Be very careful, however, to avoid accidentally hiding built-ins that your code might need. It's tempting to use, for your own variables, natural names such as file, input, list, filter, but don't do it: these are all names of built-in Python types or functions, and, unless you get into the habit of never shadowing such built-in names with your own, you'll end up with some mysterious bug in your code sooner or later due to such hiding.
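The kind of bug that such hiding produces can be reproduced in a few lines. The functions below are hypothetical and exist only to show a local name list hiding the built-in type.

```python
def total_lengths(words):
    list = [len(w) for w in words]    # hides the built-in list in this scope
    try:
        return list("abc")            # meant the built-in; calls our list object
    except TypeError as err:
        return "mysterious bug: %s" % err

print(total_lengths(["a", "bc"]))

def total_lengths_fixed(words):
    lengths = [len(w) for w in words]   # a name that hides nothing
    return sum(lengths), list("abc")

print(total_lengths_fixed(["a", "bc"]))
```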
Like most built-in functions and types, the functions documented in this section cannot normally be called with named arguments, only with positional ones; in the following, I specifically mention any case in which this limitation does not hold.
__import__
__import__(module_name[,globals[,locals[,fromlist]]])
Description
Loads the module named by string module_name and returns the resulting module object. globals, which defaults to the result of globals(), and locals, which defaults to the result of locals() (both covered in this section), are dictionaries that __import__ treats as read-only and uses only to get context for package-relative imports, covered in "Packages" on page 149. fromlist defaults to an empty list, but can be a list of strings that name the module attributes to be imported in a from statement. See "Module Loading" on page 144 for more details on module loading.
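One detail that the description above does not spell out, and that this sketch takes from the behavior of current interpreters, is what __import__ returns for a dotted name: with an empty fromlist it returns the top-level package, and with a nonempty fromlist it returns the named submodule. The standard library package logging is used here purely as an example.

```python
top = __import__("logging.handlers", globals(), locals(), [])
print(top.__name__)        # the top-level package

sub = __import__("logging.handlers", globals(), locals(), ["RotatingFileHandler"])
print(sub.__name__)        # the submodule itself
```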
The sys Module
The attributes of the sys module are bound to data and functions that provide information on the state of the Python interpreter or affect the interpreter directly. This section documents the most frequently used attributes of sys, in alphabetical order.
argv
Description
The list of command-line arguments passed to the main script. argv[0] is the name or full path of the main script, or '-c' if the -c option was used. See "The optparse Module" on page 179 for a good way to use sys.argv.
displayhook
displayhook(value)
Description
In interactive sessions, the Python interpreter calls displayhook, passing it the result of each expression statement entered. The default displayhook does nothing if value is None; otherwise, it preserves and displays value:
if value is not None:
    __builtin__._ = value
    print repr(value)
You can rebind sys.displayhook in order to change interactive behavior. The original value is available as sys.__displayhook__.
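A small sketch of such rebinding follows. The hook name quiet_hook and the list shown are assumptions of the example; since the interpreter calls the hook only in interactive sessions, the block calls it by hand to show the effect.

```python
import sys

shown = []

def quiet_hook(value):
    """A custom display hook: record reprs instead of printing them."""
    if value is not None:
        shown.append(repr(value))

sys.displayhook = quiet_hook             # would affect expressions typed interactively
sys.displayhook(6 * 7)                   # called by hand, as the interpreter would
sys.displayhook(None)                    # ignored, as with the default hook
sys.displayhook = sys.__displayhook__    # restore the original
print(shown)
```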
excepthook
excepthook(type,value,traceback)
Description
When an exception is not caught by any handler, and thus propagates all the way up the call stack, Python calls excepthook, passing it the exception class, exception object, and traceback object, as covered in "Exception Propagation" on page 126. The default excepthook displays the error and traceback. You can rebind sys.excepthook to change what is displayed for uncaught exceptions (just before Python returns to the interactive loop or terminates). The original value is available as sys.__excepthook__.
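The sketch below installs a hypothetical hook named brief_hook that keeps a one-line report instead of displaying a traceback. Python itself calls the hook only for uncaught exceptions, so the example invokes it by hand with the three items returned by sys.exc_info, then restores the default.

```python
import sys

reports = []

def brief_hook(exc_type, exc_value, tb):
    """A custom hook: keep a one-line report instead of a full traceback."""
    reports.append("%s: %s (line %d)" % (exc_type.__name__, exc_value, tb.tb_lineno))

sys.excepthook = brief_hook
try:
    {}["missing"]
except KeyError:
    # Called by hand here; for an uncaught exception Python would do this itself.
    sys.excepthook(*sys.exc_info())
finally:
    sys.excepthook = sys.__excepthook__    # restore the default

print(reports[0])
```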
exc_info
exc_info( )
Description
If the current thread is handling an exception, exc_info returns a tuple whose three items are the class, object, and traceback for the exception. If the current thread is not handling any exception, exc_info returns (None,None,None). A traceback object indirectly holds references to all variables of all functions that propagated the exception. Thus, if you hold a reference to the traceback object (for example, indirectly, by binding a variable to the whole tuple that
The copy Module
As discussed in "Assignment Statements" on page 47, assignment in Python does not copy the righthand side object being assigned. Rather, assignment adds a reference to the righthand side object. When you want a copy of object x, you can ask x for a copy of itself, or you can ask x's type to make a new instance copied from x. If x is a list, list(x) returns a copy of x, as does x[:]. If x is a dictionary, dict(x) and x.copy( ) return a copy of x. If x is a set, set(x) and x.copy( ) return a copy of x. In each of these cases, my strong personal preference is for the uniform and readable idiom of calling the type, but there is no consensus on this style issue in the community of Python experts.
The copy module supplies a copy function that creates and returns a copy of many types of objects. Normal copies, such as list(x) for a list x and copy.copy(x), are also known as shallow copies: when x has references to other objects (as items or attributes), a normal (shallow) copy of x has distinct references to the same objects. Sometimes, however, you need a deep copy, where referenced objects are copied recursively; fortunately, this requirement is rare, because a deep copy can take a lot of memory and time. Module copy supplies a deepcopy(x) function to create and return a deep copy.
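The difference between shallow and deep copies shows up as soon as a referenced object is mutated. The sketch below uses a nested list as illustrative data.

```python
import copy

x = [[1, 2], [3, 4]]
shallow = copy.copy(x)      # same effect as list(x) or x[:]
deep = copy.deepcopy(x)     # the inner lists are copied as well

x[0].append(99)             # mutate an object that x refers to

print(shallow[0])           # the shallow copy shares x[0], so it sees the change
print(deep[0])              # the deep copy has its own inner list
```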
copy
copy(x)
Description
Creates and returns a shallow copy of x, for x of many types (copies of several types, such as modules, classes, files, frames, and other internal types, are, however, not supported). If x is immutable, copy.copy(x) may return x itself as an optimization. A class can customize the way copy.copy copies its instances by having a special method __copy__(self) that returns a new object, a copy of
The collections Module
The collections module (introduced in Python 2.4) is intended to eventually supply several interesting types that are collections (i.e., containers).
In Python 2.4, the collections module supplies only one type, deque, whose instances are "double-ended queues" (i.e., sequence-like containers suitable for additions and removals at both ends). Call deque with a single argument, any iterable, to obtain a new deque instance whose items are those of the iterable in the same order, or call deque without arguments to obtain a new empty deque instance. A deque instance d is a mutable sequence and thus can be indexed and iterated on (however, d cannot be sliced, only indexed one item at a time, whether for access, rebinding, or deletion). A deque instance d supplies the following methods.
append
d.append(item)
Description
Appends item at the right (end) of d.
appendleft
d.appendleft(item)
Description
Appends item at the left (start) of d.
clear
d.clear( )
Description
Removes all items from d.
extend
d.extend(iterable)
Description
Appends all items of iterable at the right (end) of d.
extendleft
d.extendleft(iterable)
Description
Appends all items of iterable at the left (start) of d, one at a time, so that they end up in d in the reverse of their order in iterable.
pop
d.pop( )
Description
Removes and returns the last (rightmost) item from d. If d is empty, raises IndexError.
popleft
d.popleft( )
Description
Removes and returns the first (leftmost) item from d. If d is empty, raises IndexError.
rotate
d.rotate(n=1)
Description
Rotates d n steps to the right (if n < 0, rotates left).
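The methods above can be tried out in a short session such as the following sketch; the values are arbitrary.

```python
from collections import deque

d = deque([1, 2, 3])
d.append(4)                  # deque([1, 2, 3, 4])
d.appendleft(0)              # deque([0, 1, 2, 3, 4])
d.extendleft([-1, -2])       # items are added one by one at the left
after_extend = list(d)
d.rotate(2)                  # the last two items move to the front
after_rotate = list(d)
last = d.pop()               # removes the rightmost item
first = d.popleft()          # removes the leftmost item
```

Note that extendleft leaves its items in reversed order, because each one is pushed in turn onto the left end.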
In Python 2.5, the collections module also supplies type defaultdict. defaultdict subclasses
The functional Module
The functional module (introduced in Python 2.5) is intended to eventually supply several interesting functions and types to support functional programming in Python. At the time of writing, there is debate about renaming the module to functools, or perhaps introducing a separate functools module, to hold higher-order functions not strictly connected to the idioms of functional programming (in particular, built-in decorators). However, again at the time of writing, only one function has been definitively accepted for inclusion in the functional module.
partial
partial(func,*a,**k)
Description
func is any callable. partial returns another callable p that is just like func, but with some positional and/or named parameters already bound to the values given in a and k. In other words, p is a partial application of func, often also known (with debatable correctness, but colorfully) as a currying of func to the given arguments (named in honor of mathematician Haskell Curry). For example, say that we have a list of numbers L and want to clip the negative ones to 0. In Python 2.5, one way to do it is:
L = map(functional.partial(max, 0), L)
as an alternative to the lambda-using snippet:
L = map(lambda x: max(0, x), L)
and the list-comprehension:
L = [max(0, x) for x in L]
functional.partial really comes into its own in situations that require callbacks, such as event-driven programming for GUIs (covered in Chapter 17) and networking applications (covered in "Event-Driven Socket Programs" on page 533).
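As a sketch of the callback use case, the example below pre-binds one argument of a handler. It assumes Python 3 and imports partial from functools, the module name under which this function is available in later Python releases. The handler on_click and the button names are hypothetical.

```python
from functools import partial   # assumption: the functools spelling of the module

def on_click(button_name, event):
    """Hypothetical callback that needs to know which button fired."""
    return "%s got %s" % (button_name, event)

# Bind the first argument now; the framework would supply the event later.
handlers = [partial(on_click, name) for name in ("ok", "cancel")]
results = [handler("click") for handler in handlers]

# The clipping example from the text, with list() since map is lazy in Python 3.
clipped = list(map(partial(max, 0), [-2, 5, -1, 3]))
```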
The bisect Module
The bisect module uses a bisection algorithm to keep a list in sorted order as items are inserted. bisect's operation is faster than calling a list's sort method after each insertion. This section documents the main functions supplied by bisect.
bisect
bisect(seq,item,lo=0,hi=None)
Description
Returns the index i into seq where item should be inserted to keep seq sorted. In other words, i is such that each item in seq[:i] is less than or equal to item, and each item in seq[i:] is greater than item. seq must be a sorted sequence. For any nonempty sorted sequence seq, seq[bisect(seq, y)-1] == y is equivalent to y in seq, but is faster if len(seq) is large. You may pass optional arguments lo and hi to operate on the slice seq[lo:hi].
insort
insort(seq,item,lo=0,hi=None)
Description
Like seq.insert(bisect(seq, item), item). In other words, seq must be a sorted mutable sequence (usually a sorted list), and insort modifies seq by inserting item at the right spot so that seq remains sorted. You may pass optional arguments lo and hi to operate on the slice seq[lo:hi].
Module bisect also supplies functions bisect_left, bisect_right, insort_left, and insort_right for explicit control of search and insertion strategies into sequences that contain duplicates. bisect is a synonym for bisect_right, and insort is a synonym for insort_right.
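A brief sketch of these functions on a list with duplicates follows. The helper contains is a hypothetical name; it guards the index so that it also works on an empty sequence.

```python
import bisect

scores = [10, 20, 20, 30]
right = bisect.bisect(scores, 20)        # insertion point after the existing 20s
left = bisect.bisect_left(scores, 20)    # insertion point before the existing 20s
bisect.insort(scores, 25)                # scores remains sorted

def contains(seq, y):
    """Hypothetical helper: membership test on a sorted sequence."""
    i = bisect.bisect(seq, y)
    return i > 0 and seq[i - 1] == y     # i > 0 avoids wrapping around to seq[-1]
```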
The heapq Module
The heapq module uses heap algorithms to keep a list in "nearly sorted" order as items are inserted and extracted. heapq's operation is faster than either calling a list's sort method after each insertion or using bisect, and, for many purposes (such as implementing "priority queues"), the nearly sorted order supported by heapq may be just as useful as a fully sorted order. Module heapq supplies the following functions.
heapify
heapify(alist)
Description
Permutes alist as needed to make it satisfy the heap condition: for any i >= 0, alist[i] <= alist[2*i+1] and alist[i] <= alist[2*i+2] (if all the indices in question are < len(alist)). Note that if a list satisfies the heap condition, the list's first item is the smallest (or equal-smallest) one. A sorted list satisfies the heap condition, but there are many other permutations of a list that satisfy the heap condition without requiring the list to be fully sorted. heapify runs in O(len(alist)) time.
heappop
heappop(alist)
Description
Removes and returns the smallest (first) item of alist, a list that satisfies the heap condition, and permutes some of the remaining items of alist to ensure the heap condition is still satisfied after the removal. heappop runs in O(log(len( alist ))) time.
heappush
heappush(alist,item)
Description
Inserts the item in alist, a list that satisfies the heap condition, and permutes some items of alist to ensure the heap condition is still satisfied after the insertion. heappush runs in O(log(len( alist ))) time.
heapreplace
heapreplace(alist,item)
Description
Logically equivalent to heappop followed by heappush, similar to:
def heapreplace(alist, item):
    try: return heappop(alist)
    finally: heappush(alist, item)
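The functions described above combine naturally into a small priority queue. In this sketch the (priority, description) tuples are made-up data; a lower number means higher priority.

```python
import heapq

tasks = [(3, "write docs"), (1, "fix bug"), (2, "review")]
heapq.heapify(tasks)                             # establish the heap condition
heapq.heappush(tasks, (0, "deploy hotfix"))      # insert a more urgent task

first = heapq.heappop(tasks)                     # smallest item comes out first
replaced = heapq.heapreplace(tasks, (5, "refactor"))   # pop smallest, then push

# Draining the heap yields the remaining items in sorted order.
order = [heapq.heappop(tasks) for _ in range(len(tasks))]
```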
The UserDict Module
The main content of the UserDict module of Python's standard library (analogous to those of now semi-obsolete modules UserList and UserString) used to be classes that emulated the behavior of standard built-in container types. Their usefulness used to be that, in the legacy object model, you could not inherit from built-in types, and what we now call the legacy object model used to be the only object model available in Python. Nowadays, you can simply subclass built-in types such as list and dict, so there isn't much point in emulating the built-in types by Python-coded classes. However, the UserDict module does contain one class that is still extremely useful.
Implementing the full interface defined as a "mapping" (i.e., the interface of a dictionary) takes a lot of programming because dictionaries have so many useful and convenient methods. The UserDict module supplies one class, DictMixin, that makes it easy for you to code classes that offer the complete mapping interface while coding a minimal number of methods: your class just needs to inherit (possibly multiply inherit) from UserDict.DictMixin. At a minimum, your class must define methods __getitem__, keys, and copy; if instances of your class are meant to be mutable, then your class must also define methods __setitem__ and __delitem__.
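UserDict.DictMixin exists only in Python 2. To show the same idea in runnable form on a current interpreter, the sketch below swaps in collections.abc.MutableMapping, the closest Python 3 counterpart. This is a substitution, not the class the text describes, and it asks for a slightly different minimal set of methods (__getitem__, __setitem__, __delitem__, __iter__, and __len__). LowerCaseDict is a hypothetical example class.

```python
from collections.abc import MutableMapping   # stand-in for UserDict.DictMixin

class LowerCaseDict(MutableMapping):
    """Hypothetical mapping whose string keys are case-insensitive."""

    def __init__(self):
        self._data = {}

    # Only these five methods are written by hand...
    def __getitem__(self, key):
        return self._data[key.lower()]

    def __setitem__(self, key, value):
        self._data[key.lower()] = value

    def __delitem__(self, key):
        del self._data[key.lower()]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)

d = LowerCaseDict()
d["Host"] = "example.org"
d.update(PORT=80)     # ...while update, get, items, etc. come from the mixin
```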
Optionally, for efficiency, you may also choose to define methods __contains__, __iter__, and/or iteritems; if you don't define such methods, your DictMixin superclass defines them on your behalf, but the versions you get by inheritance from DictMixin are probably substantially less efficient than ones you could define yourself on the basis of your knowledge of your class's specific concrete structure (the many other methods of the mapping interface, on the other hand, tend to get reasonably efficient implementations even just by inheritance from DictMixin
The optparse Module
The optparse module offers rich, powerful ways to parse the command-line options (a.k.a. flags) that the user passed upon starting your programs (by using syntax elements such as -x or --foo=bar on the command line, after your program name and before other program arguments). Instantiate (without arguments) the OptionParser class supplied by the module, populate the instance so it knows about your program's options (by calls to its add_option method), and finally call the instance's parse_args method to handle your program's command line, dealing with each option appropriately and returning the collection of option values and a list of nonoption arguments.
optparse supports many advanced ways to customize and fine-tune your program's option-parsing behavior. In most cases, you can accept optparse's reasonable defaults and use it effectively with just the two methods of class OptionParser that I cover here (omitting many advanced options of these methods, which you do not normally need). For all of the powerful and complex details, consult Python's online docs.
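The instantiate, populate, and parse workflow might look like the following sketch. The option names and the explicit argument list are invented for the example; a real program would normally call parse_args( ) without arguments so that it reads sys.argv[1:].

```python
from optparse import OptionParser

parser = OptionParser()
# One short-form and one long-form option string per option.
parser.add_option("-v", "--verbose", action="store_true",
                  dest="verbose", default=False)
parser.add_option("-o", "--output", dest="output", default="out.txt")

# Hypothetical command line, passed explicitly to keep the sketch self-contained.
options, args = parser.parse_args(
    ["-v", "--output=result.txt", "input1", "input2"])

print(options.verbose, options.output, args)
```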
An OptionParser instance p supplies the following methods.
add_option
p.add_option(opt_str,*opt_strs,**kw)
Description
The positional arguments specify the option strings, i.e., the strings that the user passes as part of your program's command line to set this option. Each argument can be a short-form option string (a string starting with a single dash [hyphen] followed by a single letter or digit) or a long-form option string (a string starting with two dashes followed by an identifier that may also contain dashes). You normally pass exactly one short-form and one long-form option string, but it's also okay to pass multiple "synonyms," or just short forms, or just long forms.
The named (optional) arguments (kw) are where the action is...literally, too, because the most important argument is the named one,
The itertools Module
The itertools module offers many powerful, high-performance building blocks to build or manipulate iterator objects. Manipulating iterators is often better than manipulating lists thanks to iterators' intrinsic "lazy evaluation" approach: items of an iterator are produced one at a time, as needed, while all the items of a list (or other sequence) must exist in memory at the same time (this "lazy" approach even makes it feasible to build and manipulate unbounded iterators, while all lists must always have finite numbers of items).
This section documents the most frequently used attributes of module itertools; each of them is an iterator type, which you can call to get an instance of the type in question.
chain
chain(*iterables)
Description
Builds and returns an iterator whose items are all those from the first iterable passed, followed by all those from the second iterable passed, and so on until the end of the last iterable passed, just like the generator expression:
(item for iterable in iterables for item in iterable)
count
count(firstval=0)
Description
Builds and returns an unbounded iterator whose items are consecutive integers starting from firstval, just like the generator:
def count(firstval=0):
    while True:
        yield firstval
        firstval += 1
cycle
cycle(iterable)
Description
Builds and returns an unbounded iterator whose items are the items of iterable, endlessly repeating the items from the beginning each time the cycle reaches the end, just like the generator:
def cycle(iterable):
    buffer = []
    for item in iterable:
        yield item
        buffer.append(item)
    while True:
        for item in buffer: yield item
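Because count and cycle are unbounded, they need to be cut off before being turned into lists. The sketch below uses itertools.islice for that purpose, a function not described in this excerpt; the data are arbitrary.

```python
from itertools import chain, count, cycle, islice

# chain: items of each iterable in turn
merged = list(chain("ab", [1, 2], (None,)))

# count: unbounded, so take only the first three even numbers from 10 upward
evens = list(islice((n for n in count(10) if n % 2 == 0), 3))

# cycle: unbounded repetition, truncated to seven items
labels = list(islice(cycle("xyz"), 7))
```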
ifilter
ifilter(func,iterable)
Description
Builds and returns an iterator whose items are those items of iterable for which func is true, just like the generator expression:
(item for item in iterable if func(item))
func can be any callable object that accepts a single argument, or None. When func is None, ifilter tests for true items, just like the generator expression:
Chapter 9: Strings and Regular Expressions
Python supports plain and Unicode strings extensively, with statements, operators, built-in functions, methods, and dedicated modules. This chapter covers the methods of string objects, in "Methods of String Objects" on page 186; string formatting, in "String Formatting" on page 193; modules string (in "The string Module" on page 191), pprint (in "The pprint Module" on page 197), and repr (in "The repr Module" on page 198); and issues related to Unicode strings, in "Unicode" on page 198.
Regular expressions let you specify pattern strings and allow searches and substitutions. Regular expressions are not easy to master, but they can be a powerful tool for processing text. Python offers rich regular expression functionality through the built-in re module, documented in "Regular Expressions and the re Module" on page 201.
Methods of String Objects
Plain and Unicode strings are immutable sequences, as covered in "Strings" on page 55. All immutable-sequence operations (repetition, concatenation, indexing, slicing) apply to strings. A string object s also supplies several nonmutating methods, as documented in this section. Unless otherwise noted, each method returns a plain string when s is a plain string, or a Unicode string when s is a Unicode string. Terms such as "letters," "whitespace," and so on, refer to the corresponding attributes of the string module, covered in "The string Module" on page 191. See also "Locale Sensitivity" on page 192.
capitalize
s.capitalize( )
Description
Returns a copy of s where the first character, if a letter, is uppercase, and all other letters, if any, are lowercase.
center
s.center(n,fillchar=' ')
Description
Returns a string of length max(len(s), n), with a copy of s in the central part, surrounded by equal numbers of copies of character fillchar on both sides (e.g., 'ciao'.center(2) is 'ciao' and 'x'.center(4,'_') is '_x__').
count
s.count(sub,start=0,end=sys.maxint)
Description
Returns the number of nonoverlapping occurrences of substring sub in s[start:end].
decode
s.decode(codec=None,errors='strict')
Description
Returns a (typically Unicode) string obtained from s with the given codec and error handling. See "Unicode" on page 198 for more details.
encode
s.encode(codec=None,errors='strict')
Description
Returns a plain string obtained from s with the given codec and error handling. See "Unicode" on page 198 for more details.
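The methods covered so far can be exercised as in the sketch below. It assumes Python 3, where every str is a Unicode string, encode returns a bytes object, and decode is a method of bytes rather than of str.

```python
# Assumption: Python 3 (str is Unicode; encode gives bytes; bytes has decode).
s = "hello WORLD"
capitalized = s.capitalize()          # first letter upper, the rest lower
padded = "x".center(4, "_")           # the extra fill character goes on the right
unchanged = "ciao".center(2)          # n smaller than len(s): s comes back as is
hits = "banana".count("ana")          # occurrences counted do not overlap

encoded = "caf\u00e9".encode("utf-8") # Unicode text to bytes
decoded = encoded.decode("utf-8")     # and back again
```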
endswith
s.endswith(suffix,start=0,end=sys.maxint)
Description
Returns True when s [ start : end
The string Module
The string module supplies functions that duplicate each method of string objects, as covered in "Methods of String Objects" on page 186. Each function takes the (plain or Unicode) string object as its first argument. Module string also supplies several useful plain-string attributes:
ascii_letters
The string ascii_lowercase+ascii_uppercase
ascii_lowercase
The string 'abcdefghijklmnopqrstuvwxyz'
ascii_uppercase
The string 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
digits
The string '0123456789'
hexdigits
The string '0123456789abcdefABCDEF'
letters
The string lowercase+uppercase
lowercase
A string containing all characters that are deemed lowercase letters: at least 'abcdefghijklmnopqrstuvwxyz', but more letters (e.g., accented ones) may be present, depending on the active locale
octdigits
The string '01234567'
punctuation
The string '!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~' (i.e., all ASCII characters that are deemed punctuation characters in the 'C' locale; does not depend on which locale is active)
printable
The string of those characters that are deemed printable (i.e., digits, letters, punctuation, and whitespace)
uppercase
A string containing all characters that are deemed uppercase letters: at least 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', but more letters (e.g., accented ones) may be present, depending on the active locale
whitespace
A string containing all characters that are deemed whitespace: at least space, tab, linefeed, and carriage return, but more characters (e.g., certain control characters) may be present, depending on the active locale
You should not rebind these attributes, since other parts of the Python library may rely on them and the effects of rebinding them are undefined.
Module string also supplies class Template, covered in "Template Strings" on page 196.
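These attributes are handy for simple character-class checks. The sketch below uses only the locale-independent attributes (the locale-dependent letters, lowercase, and uppercase are absent from Python 3, which this example assumes). The function names are hypothetical.

```python
import string

def is_hex(text):
    """Hypothetical check: nonempty and made only of hexadecimal digits."""
    return bool(text) and all(ch in string.hexdigits for ch in text)

def strip_punctuation(text):
    """Hypothetical helper: drop every ASCII punctuation character."""
    return "".join(ch for ch in text if ch not in string.punctuation)

# Characters allowed in a simple identifier-like token.
IDENT_CHARS = string.ascii_letters + string.digits + "_"
```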
The locale module is covered in "The locale Module" on page 269. Locale setting affects some attributes of module
String Formatting
In Python, a string-formatting expression has the syntax:
format % values
where format is a plain or Unicode string containing format specifiers and values is any single object or a collection of objects in a tuple or dictionary. Python's string-formatting operator has roughly the same set of features as the C language's printf and operates in a similar way. Each format specifier is a substring of format that starts with a percent sign (%) and ends with one of the conversion characters shown in Table 9-1.
Table 9-1: String-formatting conversion characters
| Character | Output format | Notes |
|---|---|---|
| d, i | Signed decimal integer | Value must be number. |
| u | Unsigned decimal integer | Value must be number. |
| o | Unsigned octal integer | Value must be number. |
| x | Unsigned hexadecimal integer (lowercase letters) | Value must be number. |
| X | Unsigned hexadecimal integer (uppercase letters) | Value must be number. |
| e | Floating-point value in exponential form (lowercase e for exponent) | Value must be number. |
| E | Floating-point value in exponential form (uppercase E for exponent) | Value must be number. |
| f, F | Floating-point value in decimal form | Value must be number. |
| g, G | Like e or E when exp is < -4 or >= precision; otherwise, like f or F | exp is the exponent of the number being converted. |
| c | Single character | Value can be integer or single-character string. |
| r | String | Converts any value with repr. |
| s | String | Converts any value with str. |
| % | Literal % character | Consumes no value. |
Between the % and the conversion character, you can specify a number of optional modifiers, as we'll discuss shortly.
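A few of the conversion characters from Table 9-1 in action, as a sketch with arbitrary values (run here on Python 3, where the % operator on strings behaves the same way for these cases):

```python
# Single value, tuple of values, and dictionary of values.
one = "%d items" % 3
several = "%x / %X / %o" % (255, 255, 8)
by_name = "%(name)s is %(age)d" % {"name": "Ann", "age": 30}

# Floating-point conversions (default precision is 6).
exp_form = "%e" % 12345.678
dec_form = "%f" % 3.14159
g_small = "%g" % 0.00001234      # exponent below -4: exponential form
g_plain = "%g" % 1234.5          # otherwise: decimal form

# c accepts an integer or a one-character string; r uses repr, s uses str.
chars = "%c%c" % (65, "b")
quoted = "%r vs %s" % ("hi", "hi")
percent = "100%%" % ()           # %% consumes no value
```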
The result of a formatting expression is a string that is a copy of format where each format specifier is replaced by the corresponding item of values
The pprint Module
The pprint module pretty-prints complicated data structures, with formatting that may be more readable than that supplied by built-in function repr (covered in repr on page 166). To fine-tune the formatting, you can instantiate the PrettyPrinter class supplied by module pprint and apply detailed control, helped by auxiliary functions also supplied by module pprint. Most of the time, however, one of the two main functions exposed by module pprint suffices.
pformat
pformat(obj)
Description
Returns a string representing the pretty-printing of obj.
pprint
pprint(obj,stream=sys.stdout)
Description
Outputs the pretty-printing of obj to file object stream, with a terminating newline.
The following two statements have the same effect:
print pprint.pformat(x)
pprint.pprint(x)
Either of these constructs will be roughly the same as print x in many cases, such as when the string representation of x fits within one line. However, with something like x = range(30), print x displays x in two lines, breaking at an arbitrary point, while using module pprint displays x over 30 lines, one line per item. You can use module pprint when you prefer the module's specific display effects to the ones of normal string representation.
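The behavior just described can be checked with a short sketch. It assumes Python 3, so range(30) is wrapped in list( ), and it writes to an in-memory stream instead of sys.stdout.

```python
import io
import pprint

x = list(range(30))               # Python 3: materialize the range as a list
text = pprint.pformat(x)          # repr(x) is too wide, so one item per line

buf = io.StringIO()
pprint.pprint(x, stream=buf)      # same text, plus a terminating newline

short = pprint.pformat([1, 2, 3]) # fits on one line: same as repr
```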
The repr Module
The repr module supplies an alternative to the built-in function repr (covered in repr on page 198), with limits on length for the representation string. To fine-tune the length limits, you can instantiate or subclass the Repr class supplied by module repr and apply detailed control. Most of the time, however, the main function exposed by module repr suffices.
repr
repr(obj)
Description
Returns a string representing obj, with sensible limits on length.
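In Python 3 this module is named reprlib, and the sketch below assumes that name; the function and the Repr class work as described here. The limit set on maxlist is an arbitrary choice for the example.

```python
import reprlib   # assumption: Python 3 name of the module the book calls repr

short_list = reprlib.repr(list(range(1000)))   # only the first few items shown
short_str = reprlib.repr("a" * 100)            # the middle of the string is elided

# Fine-tuning the limits through an instance of the Repr class.
limiter = reprlib.Repr()
limiter.maxlist = 3
tiny = limiter.repr([1, 2, 3, 4, 5])
```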
Unicode
Plain strings are converted into Unicode strings either explicitly, with the unicode built-in, or implicitly, when you pass a plain string to a function that expects Unicode. In either case, the conversion is done by an auxiliary object known as a codec (for coder-decoder). A codec can also convert Unicode strings to plain strings, either explicitly, with the encode method of Unicode strings, or implicitly.
To identify a codec, pass the codec name to unicode or encode. When you pass no codec name, and for implicit conversion, Python uses a default encoding, normally 'ascii'. You can change the default encoding in the startup phase of a Python program, as covered in "The site and sitecustomize Modules" on page 338; see also setdefaultencoding on page 170. However, such a change is not a good idea for most "serious" Python code: it might too easily interfere with code in the standard Python libraries or third-party modules, written to expect the normal 'ascii'.
Every conversion has a parameter errors, a string specifying how conversion errors are to be handled. The default is 'strict', meaning any error raises an exception. When errors is 'replace', the conversion replaces each character that causes an error with '?' in a plain-string result and with u'\ufffd' in a Unicode result. When errors is 'ignore', the conversion silently skips characters that cause errors. When errors is 'xmlcharrefreplace', the conversion replaces each character that causes an error with the XML character reference representation of that character in the result. You may also code your own function to implement a conversion-error-handling strategy and register it under an appropriate name by calling codecs.register_error.
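The error-handling strategies can be compared side by side. The sketch below assumes Python 3, where text is str and encoded data is bytes. The handler dash_errors and the name 'dash' under which it is registered are invented for this example.

```python
import codecs

text = "na\u00efve caf\u00e9"        # two characters that ASCII cannot encode

replaced = text.encode("ascii", "replace")
ignored = text.encode("ascii", "ignore")
xml = text.encode("ascii", "xmlcharrefreplace")
decoded = b"caf\xe9".decode("ascii", "replace")

try:
    text.encode("ascii")             # errors defaults to 'strict'
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

def dash_errors(exc):
    """Hypothetical handler: one '-' per character that cannot be encoded."""
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    # Return the replacement text and the position at which to resume.
    return ("-" * (exc.end - exc.start), exc.end)

codecs.register_error("dash", dash_errors)
dashed = text.encode("ascii", "dash")
```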
The mapping of codec names to codec objects is handled by the codecs module. This module also lets you develop your own codec objects and register them so that they can be looked up by name, just like built-in codecs. Module
Regular Expressions and the re Module
A regular expression (RE) is a string that represents a pattern. With RE functionality, you can check any string against the pattern and see whether any part of the string matches it.
The re module supplies Python's RE functionality. The compile function builds a RE object from a pattern string and optional flags. The methods of a RE object look for matches of the RE in a string or perform substitutions. Module re also exposes functions equivalent to a RE's methods, but with the RE's pattern string as the first argument.
REs can be difficult to master, and this book does not purport to teach them; I cover only the ways in which you can use REs in Python. For general coverage of REs, I recommend the book Mastering Regular Expressions, by Jeffrey Friedl (O'Reilly). Friedl's book offers thorough coverage of REs at both tutorial and advanced levels. Many tutorials and references on REs can also be found online.
The pattern string representing a regular expression follows a specific syntax:
  • Alphabetic and numeric characters stand for themselves. A RE whose pattern is a string of letters and digits matches the same string.
  • Many alphanumeric characters acquire special meaning in a pattern when they are preceded by a backslash (\).
  • Punctuation works the other way around: self-matching when escaped, special meaning when unescaped.
  • The backslash character is matched by a repeated backslash (i.e., the pattern \\).
Since RE patterns often contain backslashes, you often specify them using raw-string syntax (covered in "Strings" on page 40). Pattern elements (e.g., r'\t', equivalent to the non-raw string literal '\\t') do match the corresponding special characters (e.g., the tab character '\t'). Therefore, you can use raw-string syntax even when you do need a literal match for some such special character.
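The syntax rules and the raw-string advice above can be illustrated with a few small patterns. The sample strings in this sketch are arbitrary.

```python
import re

# A backslash gives the letter d a special meaning (any digit).
number_re = re.compile(r"\d+")
numbers = number_re.findall("a1 b22 c333")

# Punctuation works the other way around.
dot_any = re.match(r"a.b", "axb") is not None        # unescaped '.' is special
dot_literal = re.match(r"a\.b", "axb") is not None   # escaped '.' matches only '.'

# Raw and non-raw spellings of the same two-character pattern.
same_pattern = (r"\t" == "\\t")
tab_found = re.search(r"\t", "a\tb") is not None     # still matches a real tab

# A literal backslash is matched by a repeated backslash.
backslash_found = re.search(r"\\", "C:\\temp") is not None

# Module-level function with the pattern string as first argument.
collapsed = re.sub(r"\s+", " ", "a \t\n b")
```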
Table 9-2 lists the special elements in RE pattern syntax. The exact meanings of some pattern elements change when you use optional flags, together with the pattern string, to build the RE object. The optional flags are covered in "Optional Flags" on page 205.
Chapter 10: File and Text Operations
This chapter covers most of the issues related to dealing with files and the filesystem in Python. A file is a stream of bytes that a program can read and/or write; a filesystem is a hierarchical repository of files on a computer system.
Other Chapters That Also Deal with Files
Because files are such a crucial concept in programming, even though this chapter is the largest one in the book, several other chapters also contain material that is relevant when you're handling specific kinds of files. In particular, Chapter 11 deals with many kinds of files related to persistence and database functionality (marshal files in "The marshal Module" on page 278, pickle files in "The pickle and cPickle Modules" on page 279, shelve files in "The shelve Module" on page 284, DBM and DBM-like files in "DBM Modules" on page 285, Berkeley database files in "Berkeley DB Interfacing" on page 288), Chapter 23 deals with files in HTML format, and Chapter 24 deals with files in XML format.
Organization of This Chapter
First of all, this chapter, in "File Objects" on page 216, discusses the most typical way in which Python programs read and write data, which is via built-in file objects. Immediately after that, the chapter covers the polymorphic concept of file-like objects (objects that are not files but behave to some extent like files) in "File-Like Objects and Polymorphism" on page 222.
The chapter next covers modules that deal with temporary files and file-like objects (tempfile in "The tempfile Module" on page 223, StringIO and cStringIO in "The StringIO and cStringIO Modules" on page 229).
Next comes the coverage of modules that help you access the contents of text and binary files (fileinput in "The fileinput Module" on page 224, linecache in "The linecache Module" on page 226, struct in "The struct Module" on page 227) and support compressed files and other data archives (gzip in "The gzip Module" on page 230, bz2 in "The bz2 Module" on page 232, tarfile in "The tarfile Module" on page 233, zipfile in "The zipfile Module" on page 235, zlib in "The zlib Module" on page 239).
In Python, the os module supplies many of the functions that operate on the filesystem, so this chapter continues by introducing the os module in "The os Module" on page 240. The chapter then covers, in "Filesystem Operations" on page 241, operations on the filesystem (comparing, copying, and deleting directories and files, working with file paths, and accessing low-level file descriptors) offered by os (in "File and Directory Functions of the os Module" on page 242), os.path (in "The os.path Module" on page 246), and other modules (dircache in "listdir", stat in "The stat Module" on page 249, filecmp in "The filecmp Module" on page 250, and shutil in "The shutil Module" on page 252).
Many modern programs rely on a graphical user interface (GUI) (as covered in Chapter 17), but text-based, nongraphical user interfaces are still useful, since they're simple, fast to program, and lightweight. This chapter concludes with material about text input and output in Python in "Text Input and Output" on page 256, richer text I/O in "Richer-Text I/O" on page 258, interactive command-line sessions in "Interactive Command Sessions" on page 265, and, finally, information about presenting text that is understandable to different users, no matter where they are or what language they speak, in "Internationalization" on page 269. This subject is generally known as
End of content preview. The rest of this section is not viewable here.
File Objects
Content preview
As mentioned in "Organization of This Chapter" on page 215, file is a built-in type in Python and the single most common way for your Python programs to read or write data. With a file object, you can read and/or write data to a file as seen by the underlying operating system. Python reacts to any I/O error related to a file object by raising an instance of built-in exception class IOError. Errors that cause this exception include open failing to open or create a file, calls to a method on a file object to which that method doesn't apply (e.g., calling write on a read-only file object, or calling seek on a nonseekable file), and I/O errors diagnosed by a file object's methods. This section covers file objects, as well as the important issue of making temporary files.
To create a Python file object, call the built-in open with the following syntax:
open(filename, mode='r', bufsize=-1)
open opens the file named by plain string filename, which denotes any path to a file. open returns a Python file object f, which is an instance of the built-in type file. Currently, calling file directly is like calling open, but you should call open, which may become a factory function in some future release of Python. If you explicitly pass a mode string, open can also create filename if the file does not already exist (depending on the value of mode, as we'll discuss in a moment). In other words, despite its name, open is not just for opening existing files: it can also create new ones.
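The calls described above can be sketched as follows. This is a minimal illustration written for a current Python 3 interpreter, which is an assumption: there the built-in type file no longer exists and IOError is an alias of OSError. The scratch directory and the filenames are hypothetical.

```python
import os
import tempfile

# Work in a scratch directory so the example leaves nothing behind.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "example.txt")  # hypothetical filename

# Mode 'w' creates the file, because it does not exist yet.
f = open(path, "w")
f.write("first line\n")
f.close()

# The default mode 'r' requires the file to exist already.
f = open(path)
content = f.read()
f.close()

# Opening a missing file for reading raises IOError
# (an alias of OSError in Python 3).
try:
    open(os.path.join(workdir, "missing.txt"))
    open_failed = False
except IOError as exc:
    open_failed = True
    print("open failed:", exc.strerror)

os.remove(path)
os.rmdir(workdir)
```

Closing each file object explicitly, as soon as you are done with it, keeps the example independent of when the interpreter happens to reclaim the object.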

Section 10.3.1.1: File mode

mode is a string that indicates how the file is to be opened (or created). mode can be:
'r'
The file must already exist, and it is opened in read-only mode.
'w'
The file is opened in write-only mode. The file is truncated and overwritten if it already exists, or created if it does not exist.
'a'
The file is opened in write-only mode. The file is kept intact if it already exists, and the data you write is appended to what's already in the file. The file is created if it does not exist. Calling
End of content preview. The rest of this section is not viewable here.
Auxiliary Modules for File I/O
Content preview
File objects supply all the minimal indispensable functionality needed for file I/O. Some auxiliary Python library modules, however, offer convenient supplementary functionality, making I/O even easier and handier in several important cases.
The fileinput module lets you loop over all the lines in a list of text files. Performance is good, comparable to the performance of direct iteration on each file, since fileinput uses buffering to minimize I/O. You can therefore use module fileinput for line-oriented file input whenever you find the module's rich functionality convenient, with no worries about performance. The input function is the key function of module fileinput, and the module also provides a FileInput class whose methods support the same functionality as the module's functions.
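As a rough illustration of looping over the lines of several files, the sketch below creates two small scratch files (the names a.txt and b.txt are hypothetical) and iterates over both with fileinput.input, recording the values of fileinput.filename and fileinput.filelineno for each line. It assumes a current Python 3 interpreter.

```python
import fileinput
import os
import tempfile

# Create two small text files to iterate over.
workdir = tempfile.mkdtemp()
paths = []
for name, text in [("a.txt", "alpha\nbeta\n"), ("b.txt", "gamma\n")]:
    path = os.path.join(workdir, name)
    with open(path, "w") as f:
        f.write(text)
    paths.append(path)

# One loop covers every line of every file, in order.
seen = []
for line in fileinput.input(files=paths):
    seen.append((os.path.basename(fileinput.filename()),
                 fileinput.filelineno(),
                 line.rstrip("\n")))
fileinput.close()  # make sure no file of the sequence remains open

for path in paths:
    os.remove(path)
os.rmdir(workdir)
```

Note that the per-file line number starts again at 1 when the loop moves on to the second file.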
close
close( )
Description
Closes the whole sequence so that iteration stops and no file remains open.
FileInput
class FileInput(files=None, inplace=False, backup='', bufsize=0)
Description
Creates and returns an instance f of class FileInput. Arguments are the same as for fileinput.input, and methods of f have the same names, arguments, and semantics as functions of module fileinput. f also supplies a method readline, which reads and returns the next line. You can use class FileInput explicitly when you want to nest or mix loops that read lines from more than one sequence of files.
filelineno
filelineno( )
Description
Returns the number of lines read so far from the file now being read. For example, returns 1 if the first line has just been read from the current file.
filename
filename( )
Description
Returns the name of the file being read, or None if no line has been read yet.
input
input(files=None, inplace=False, backup='', bufsize=0)
Description
Returns the sequence of lines in the files, suitable for use in a
End of content preview. The rest of this section is not viewable here.
The StringIO and cStringIO Modules
Content preview
You can implement file-like objects by writing Python classes that supply the methods you need. If all you want is for data to reside in memory, rather than on a file as seen by the operating system, use modules StringIO or cStringIO. The two modules are almost identical: each supplies a factory that is callable to create in-memory file-like objects. The difference between them is that objects created by module StringIO are instances of class StringIO.StringIO. You may inherit from this class to create your own customized file-like objects, overriding the methods that you need to specialize, and you can perform both input and output on objects of this class. Objects created by module cStringIO, on the other hand, are instances of either of two special-purpose types (one just for input, the other just for output), not of a class. Performance is better when you can use cStringIO, but inheritance is not supported, and neither is doing both input and output on the same object. Furthermore, cStringIO does not support Unicode.
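A minimal sketch of an in-memory file-like object follows. It assumes a current Python 3 interpreter, where io.StringIO takes over the role that this section assigns to modules StringIO and cStringIO; that substitution is my assumption, not something the text states.

```python
import io

# Writing: start empty, write as you would to a file, then collect the text.
out = io.StringIO()
out.write("first line\n")
out.write("second line\n")
text = out.getvalue()

# Reading: initialize the object from a string and iterate over its lines.
src = io.StringIO(text)
lines = [line.rstrip("\n") for line in src]
```

Code that only calls write, or only iterates over lines, cannot tell such an object from a real file, which is what makes in-memory objects handy for testing.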
Each module supplies a factory function StringIO that returns a file-like object fl.
StringIO
StringIO([s])
Description
Creates and returns an in-memory file-like object fl, with all the methods and attributes of a built-in file object. The data contents of fl are initialized to be a copy of argument s, which must be a plain string for the StringIO factory function in cStringIO, though it can be a plain or Unicode string for the function in StringIO. When s is present, cStringIO.StringIO produces an object suitable for reading from; when s is not present, cStringIO.StringIO produces an object suitable for writing to.
Besides all methods and attributes of built-in file objects (as covered in "Attributes and Methods of File Objects" on page 218), fl supplies one supplementary method,
End of content preview. The rest of this section is not viewable here.
Compressed Files
Content preview
Storage space and transmission bandwidth are increasingly cheap and abundant, but in many cases you can save such resources, at the expense of some computational effort, by using compression. Computational power grows cheaper and more abundant even faster than other resources, such as bandwidth, so compression's popularity keeps growing. Python makes it easy for your programs to support compression, since the Python standard library contains several modules dedicated to compression.
Since Python offers so many ways to deal with compression, some guidance may be helpful. Files containing data compressed with the zlib module are not automatically interchangeable with other programs, except for those files built with the zipfile module, which respects the standard format of ZIP file archives. You can write custom programs, with any language able to use InfoZip's free zlib compression library, to read files produced by Python programs using the zlib module. However, if you do need to interchange compressed data with programs coded in other languages, but have a choice of compression methods, I suggest you use modules bz2 (best), gzip, or zipfile instead. Module zlib, however, may be useful when you want to compress some parts of datafiles that are in some proprietary format of your own and need not be interchanged with any other program except those that make up your own application.
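For the last case, compressing a chunk of data that only your own application reads back, a minimal round trip with module zlib might look like this. The payload is made up, and the sketch assumes Python 3, where the data must be a bytes object.

```python
import zlib

# Highly repetitive data compresses well.
payload = b"spam and eggs " * 100

packed = zlib.compress(payload)     # compressed bytes, private format
restored = zlib.decompress(packed)  # the original bytes again

print(len(payload), "bytes compressed to", len(packed))
```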
The gzip module lets you read and write files compatible with those handled by the powerful GNU compression programs gzip and gunzip. The GNU programs support many compression formats, but module gzip supports only the highly effective native gzip format, normally denoted by appending the extension .gz to a filename. Module gzip supplies the GzipFile class and an open factory function.
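A small sketch of the open factory function follows: it writes a .gz file and reads it back. The filename data.txt.gz is hypothetical, and the code assumes Python 3, where binary modes read and write bytes.

```python
import gzip
import os
import tempfile

workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "data.txt.gz")  # hypothetical filename

# Write compressed data; gunzip could expand the resulting file.
f = gzip.open(path, "wb")
f.write(b"some text worth compressing\n" * 10)
f.close()

# Read it back; decompression happens transparently.
f = gzip.open(path, "rb")
data = f.read()
f.close()

os.remove(path)
os.rmdir(workdir)
```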
GzipFile
class GzipFile(filename=None, mode=None, compresslevel=9,
End of content preview. The rest of this section is not viewable here.
The os Module
Content preview
The os module is an umbrella module that presents a reasonably uniform cross-platform view of the different capabilities of various operating systems. The module provides ways to create and handle files and directories, and to create, manage, and destroy processes. This section covers the filesystem-related capabilities of the os module; "Running Other Programs with the os Module" on page 354 covers the process-related capabilities.
The os module supplies a name attribute, which is a string that identifies the kind of platform on which Python is being run. Common values for name are 'posix' (all kinds of Unix-like platforms, including Mac OS X), 'nt' (all kinds of 32-bit Windows platforms), 'mac' (old Mac systems), and 'java' (Jython). You can often exploit unique capabilities of a platform, at least in part, through functions supplied by os. However, this book deals with cross-platform programming, not with platform-specific functionality, so I do not cover parts of os that exist only on one kind of platform, nor do I cover platform-specific modules. All functionality covered in this book is available at least on both 'posix' and 'nt' platforms. However, I do cover any differences among the ways in which a given functionality is provided on different platforms.
When a request to the operating system fails, os raises an exception, which is an instance of OSError. os also exposes built-in exception class OSError with the name os.error. Instances of OSError expose three useful attributes:
errno
The numeric error code of the operating system error
strerror
A string that summarily describes the error
filename
The name of the file on which the operation failed (for file-related functions only)
os functions can also raise other standard exceptions, typically TypeError or ValueError, when the cause of the error is that you have called them with invalid argument types or values so that the underlying operating system functionality has not even been attempted.
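The three attributes of OSError can be inspected as in the following sketch, which deliberately asks os.remove to delete a file that does not exist. The path is hypothetical; the code assumes Python 3, where the exception raised here is a subclass of OSError.

```python
import errno
import os
import tempfile

workdir = tempfile.mkdtemp()
missing = os.path.join(workdir, "no-such-file")  # hypothetical path

try:
    os.remove(missing)
    caught = None
except OSError as exc:
    caught = exc
    # errno: numeric code; strerror: short description;
    # filename: the path on which the operation failed.
    print(exc.errno, exc.strerror, exc.filename)

os.rmdir(workdir)
```

Comparing errno against the symbolic constants of module errno, rather than against literal numbers, keeps such checks readable and portable.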
End of content preview. The rest of this section is not viewable here.
Filesystem Operations
Content preview
Using the os module, you can manipulate the filesystem in a variety of ways: creating, copying, and deleting files and directories, comparing files, and examining filesystem information about files and directories. This section documents the attributes and methods of the os module that you use for these purposes, and covers some related modules that operate on the filesystem.
A file or directory is identified by a string, known as its path, whose syntax depends on the platform. On both Unix-like and Windows platforms, Python accepts Unix syntax for paths, with a slash (/) as the directory separator. On non-Unix-like platforms, Python also accepts platform-specific path syntax. On Windows, in particular, you may use a backslash (\) as the separator. However, you then need to double-up each backslash as \\ in string literals, or use raw-string syntax as covered in "Literals" on page 37; you also needlessly lose portability. Unix path syntax is handier, and usable everywhere, so I strongly recommend that you always use it. In the rest of this chapter, for brevity, I assume Unix path syntax in both explanations and examples.
Module os supplies attributes that provide details about path strings on the current platform. You should typically use the higher-level path manipulation operations covered in "The os.path Module" on page 246 rather than lower-level string operations based on these attributes. However, the attributes may be useful at times.
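To make the contrast concrete, the sketch below prints a few of the attributes in question and then performs the same kind of work with functions of os.path. The path docs/chapter10.txt is hypothetical and need not exist, since these functions only manipulate strings.

```python
import os
import os.path

# Low-level details about path strings on the current platform.
print(repr(os.curdir), repr(os.extsep), repr(os.linesep))

# Higher-level operations are usually the better choice.
joined = os.path.join("docs", "chapter10.txt")  # hypothetical path
root, ext = os.path.splitext(joined)
base = os.path.basename(joined)
```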
curdir
The string that denotes the current directory ('.' on Unix and Windows)
defpath
The default search path for programs, used if the environment lacks a PATH environment variable
linesep
The string that terminates text lines ('\n' on Unix; '\r\n' on Windows)
extsep
The string that separates the extension part of a file's name from the rest of the name ('.'
End of content preview. The rest of this section is not viewable here.
Text Input and Output
Content preview
Python presents non-GUI text input and output channels to your programs as file objects, so you can use the methods of file objects (covered in "Attributes and Methods of File Objects" on page 218) to manipulate these channels.
The sys module (covered in "The sys Module" on page 168) has attributes stdout and stderr, which are writeable file objects. Unless you are using shell redirection or pipes, these streams connect to the terminal running your script. Nowadays, actual terminals are rare: a so-called "terminal" is generally a screen window that supports text I/O (e.g., a Command Prompt console on Windows or an xterm window on Unix).
The distinction between sys.stdout and sys.stderr is a matter of convention. sys.stdout, known as your script's standard output, is where your program emits results. sys.stderr, known as your script's standard error, is where error messages go. Separating results from error messages helps you use shell redirection effectively. Python respects this convention, using sys.stderr for errors and warnings.
Programs that output results to standard output often need to write to sys.stdout. Python's print statement (covered in "The print Statement" on page 61) can be a convenient alternative to sys.stdout.write.
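The convention can be sketched with a small function that sends results to standard output and complaints to standard error. The function report and its parameters are hypothetical names introduced for illustration; passing the streams as arguments is simply one way to make the function easy to test.

```python
import sys

def report(total, out=None, err=None):
    """Write a result to out, or an error message to err."""
    out = sys.stdout if out is None else out
    err = sys.stderr if err is None else err
    if total < 0:
        err.write("error: total must not be negative\n")
        return False
    out.write("total: %d\n" % total)
    return True

report(42)   # goes to standard output
report(-1)   # goes to standard error
```

With this separation, a shell command such as redirecting standard output to a file captures only the results, while error messages still reach the terminal.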
print works well for the kind of informal output used during development to help you debug your code. For production output, you often need more control of formatting than print affords. You may need to control spacing, field widths, number of decimals for floating-point values, and so on. In this case, prepare the output as a string with the string-formatting operator % (covered in "String Formatting" on page 193), then output the string, normally with the write method of the appropriate file object. (You can also pass formatted strings to print, but
End of content preview. The rest of this section is not viewable here.
Richer-Text I/O
Content preview
The tools we have covered so far support the minimal subset of text I/O functionality that all platforms supply. Most platforms also offer richer-text I/O capabilities, such as responding to single keypresses (not just entire lines of text) and showing text in any spot on the terminal (not just sequentially).
Python extensions and core Python modules let you access platform-specific functionality. Unfortunately, various platforms expose this functionality in different ways. To develop cross-platform Python programs with rich-text I/O functionality, you may need to wrap different modules uniformly, importing platform-specific modules conditionally (usually with the try/except idiom covered in "try/except" on page 122).
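One possible shape for such a wrapper is sketched below: it tries the Windows-only module msvcrt first and falls back to the Unix-only modules termios and tty. The names read_key and backend are hypothetical, and the sketch assumes Python 3 on either a Windows or a Unix-like platform.

```python
try:
    import msvcrt  # available on Windows only

    def read_key():
        """Return one keypress without waiting for Enter (Windows)."""
        return msvcrt.getwch()

    backend = "msvcrt"
except ImportError:
    import sys
    import termios  # available on Unix-like platforms only
    import tty

    def read_key():
        """Return one keypress without waiting for Enter (Unix-like)."""
        fd = sys.stdin.fileno()
        saved = termios.tcgetattr(fd)
        try:
            tty.setraw(fd)            # deliver keys immediately, unechoed
            return sys.stdin.read(1)
        finally:
            # Always restore the terminal's previous settings.
            termios.tcsetattr(fd, termios.TCSADRAIN, saved)

    backend = "termios"
```

The rest of the program calls read_key without caring which branch defined it.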
The readline module wraps the GNU Readline Library. GNU Readline lets the user edit text lines during interactive input, and recall previous lines for editing and reentry. Readline is installed on many Unix-like platforms, and it's available at http://cnswww.cns.cwru.edu/~chet/readline/rltop.html. A Windows port (http://starship.python.net/crew/kernr/) is available, but is not widely deployed. Chris Gonnerman's module, Alternative Readline for Windows, implements a subset of Python's standard readline module (using a small dedicated .pyd file instead of Readline) and is found at http://newcenturycomputers.net/projects/readline.html. One way to use Readline on Windows is to install Gary Bishop's version of readline (http://sourceforge.net/projects/uncpythontools); this version does, however, require two other Python add-ons (ctypes and PyWin32), and so is not quite trivial to install.
When readline is available, Python uses it for all line-oriented input, such as raw_input. The interactive Python interpreter always tries to load readline to enable line editing and recall for interactive sessions. Some readline
End of content preview. The rest of this section is not viewable here.
Interactive Command Sessions
Content preview
The cmd module offers a simple way to handle interactive sessions of commands. Each command is a line of text. The first word of each command is a verb defining the requested action. The rest of the line is passed as an argument to the method that implements the verb's action.
Module cmd supplies class Cmd to use as a base class, and you define your own subclass of cmd.Cmd. Your subclass supplies methods with names starting with do_ and help_, and may optionally override some of Cmd's methods. When the user enters a command line such as verb and the rest, as long as your subclass defines a method named do_verb, Cmd.onecmd calls:
self.do_verb('and the rest')
Similarly, as long as your subclass defines a method named help_verb, Cmd.do_help calls the method when the command line starts with 'help verb' or '? verb'. Cmd, by default, shows suitable error messages if the user tries to use, or asks for help about, a verb for which the subclass does not define the needed method.
Your subclass of cmd.Cmd, if it defines its own __init__ special method, must call the base class's __init__, whose signature is as follows.
__init__
Cmd.__init__(self, completekey='Tab', stdin=sys.stdin, stdout=sys.stdout)
Description
Initializes instance self with specified or default values for completekey (name of the key to use for command completion with the readline module; pass None to disable command completion), stdin (file object to get input from), and stdout (file object to emit output to).
If your subclass does not define __init__, then it inherits the one from base class cmd.Cmd. In this case, to instantiate your subclass, call it, with optional parameters completekey, stdin, and stdout, as documented in the previous paragraph.
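A minimal sketch of such a subclass follows. The class Greeter and its verbs greet and quit are hypothetical; the subclass does not define __init__, so it is instantiated with the optional stdout parameter of the base class, here an in-memory object so that the output can be inspected. The code assumes Python 3.

```python
import cmd
import io

class Greeter(cmd.Cmd):
    """Toy command interpreter with the verbs 'greet' and 'quit'."""

    def do_greet(self, rest):
        # rest is the remainder of the command line after the verb.
        self.stdout.write("Hello, %s!\n" % (rest or "world"))

    def help_greet(self):
        self.stdout.write("greet [name]: print a greeting\n")

    def do_quit(self, rest):
        return True  # a true result asks the command loop to stop

out = io.StringIO()
shell = Greeter(stdout=out)

shell.onecmd("greet and the rest")  # calls do_greet('and the rest')
shell.onecmd("help greet")          # calls help_greet()
stop = shell.onecmd("quit")
```

In an interactive program you would normally call the instance's cmdloop method instead of feeding it lines through onecmd.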
An instance
End of content preview. The rest of this section is not viewable here.
Internationalization
Content preview
Most programs present some information to users as text. Such text should be understandable and acceptable to the user. For example, in some countries and cultures, the date "March 7" can be concisely expressed as "3/7." Elsewhere, "3/7" indicates "July 3," and the string that means "March 7" is "7/3." In Python, such cultural conventions are handled with the help of standard module locale.
Similarly, a greeting can be expressed in one natural language by the string "Benvenuti," while in another language the string to use is "Welcome." In Python, such translations are handled with the help of standard module gettext.
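As a hedged sketch of how a program might look up such a translation, the code below asks module gettext for an Italian message catalog. The domain name myapp and the directory locale are hypothetical; since no such catalog is assumed to exist, fallback=True makes gettext return an object that hands back each message unchanged.

```python
import gettext

# Look for locale/it/LC_MESSAGES/myapp.mo (hypothetical catalog).
# With fallback=True, a missing catalog is not an error.
translations = gettext.translation(
    "myapp", localedir="locale", languages=["it"], fallback=True)
_ = translations.gettext

# Translated if a catalog was found, otherwise returned as is.
greeting = _("Welcome")
print(greeting)
```

Binding the lookup function to the name _ is a widespread convention that keeps translatable strings easy to spot in source code.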
Both kinds of issues are commonly called internationalization (often abbreviated i18n, as there are 18 letters between i and n in the full spelling). This is a misnomer, as the same issues also apply to users of different languages or cultures within a single nation.
Python's support for cultural conventions imitates that of C, though it is slightly simplified. In this architecture, a program operates in an environment of cultural conventions known as a locale. The locale setting permeates the program and is typically set at program startup. The locale is not thread-specific, and module locale is not thread-safe. In a multithreaded program, set the program's locale before starting secondary threads.
If a program does not call locale.setlocale, the program operates in a neutral locale known as the C locale. The C locale is named from this architecture's origins in the C language and is similar, but not identical, to the U.S. English locale. Alternatively, a program can find out and accept the user's default locale. In this case, module locale interacts with the operating system (via the environment or in other system-dependent ways) to establish the user's preferred locale. Finally, a program can set a specific locale, presumably determining which locale to set on the basis of user interaction or via persistent configuration settings such as a program initialization file.
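The choices described above can be sketched as follows. The code first selects the C locale explicitly, then tries to accept the user's default locale, and finally restores the C locale. It assumes Python 3 and is only an illustration: what the user's default locale is, and whether it can be set at all, depends on the system.

```python
import locale

# The neutral C locale: no thousands separator, '.' as decimal point.
locale.setlocale(locale.LC_ALL, "C")
plain = locale.format_string("%.2f", 1234.5, grouping=True)

# An empty string asks for the user's default locale.
try:
    locale.setlocale(locale.LC_ALL, "")
    localized = locale.format_string("%.2f", 1234.5, grouping=True)
except locale.Error:
    localized = plain  # the default locale is not usable here

print(plain, localized)

# Restore the neutral locale so later code is unaffected.
locale.setlocale(locale.LC_ALL, "C")
```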
End of content preview. The rest of this section is not viewable here.
Chapter 11: Persistence and Databases
Content preview
Python supports a variety of ways of making data persistent. One such way, known as serialization, involves viewing the data as a collection of Python objects. These objects can be saved, or serialized, to a byte stream, and later loaded and recreated, or deserialized, back from the byte stream. Object persistence is layered on top of serialization and adds such features as object naming. This chapter covers the built-in Python modules that support serialization and object persistence.
Another way to make data persistent is to store it in a database. One simple type of database is just a file format that uses keyed access to enable selective reading and updating of relevant parts of the data. This chapter covers Python standard library modules that support several variations of this file format, known as DBM.
A relational database management system (RDBMS), such as MySQL or Oracle, provides a more powerful approach to storing, searching, and retrieving persistent data. Relational databases rely on dialects of Structured Query Language (SQL) to create and alter a database's schema, insert and update data in the database, and query the database according to search criteria. (This chapter does not provide reference material on SQL. For that purpose, I recommend SQL in a Nutshell, by Kevin Kline [O'Reilly].) Unfortunately, despite the existence of SQL standards, no two RDBMSes implement exactly the same SQL dialect.
The Python standard library does not come with an RDBMS interface. However, many free third-party modules let your Python programs access a specific RDBMS. Such modules mostly follow the Python Database API 2.0 standard, also known as the DBAPI. This chapter covers the DBAPI standard and mentions some of the third-party modules that implement it.
End of content preview. The rest of this section is not viewable here.
Serialization
Content preview
Python supplies a number of modules dealing with I/O operations that serialize (save) entire Python objects to various kinds of byte streams and deserialize (load and recreate) Python objects back from such streams. Serialization is also known as marshaling.
The marshal module supports the specific serialization tasks needed to save and reload compiled Python files (.pyc and .pyo). marshal handles only fundamental built-in data types: None, numbers (int, long, float, complex), strings (plain and Unicode), code objects, and built-in containers (tuple, list, dict) whose items are instances of elementary types. marshal handles neither sets nor user-defined types and classes. marshal is faster than other serialization modules, and is the only such module that supports code objects. Module marshal supplies the following functions.

dump, dumps
dump(value, fileobj)
dumps(value)
Description
dumps returns a string representing object value. dump writes the same string to file object fileobj, which must be opened for writing in binary mode. dump(v, f) is just like f.write(dumps(v)). fileobj cannot be just any file-like object: it must be specifically an instance of type file.
load, loads
load(fileobj)
loads(str)
Description
loads creates and returns the object v previously dumped to string str so that, for any object v of a supported type, v == loads(dumps(v)). If str is longer than dumps(v), loads ignores the extra bytes. load reads the right number of bytes from file object fileobj, which must be opened for reading in binary mode, and creates and returns the object v represented by those bytes.
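The four functions can be exercised with a short round trip such as the one below. The sample value is made up, and the sketch assumes Python 3, where dumps returns a bytes object rather than a plain string.

```python
import marshal
import os
import tempfile

value = {"name": "spam", "counts": [1, 2, 3], "ratio": 0.5, "none": None}

# In-memory round trip.
data = marshal.dumps(value)
restored = marshal.loads(data)

# Bytes beyond the end of the dumped object are ignored.
padded = marshal.loads(data + b"trailing bytes")

# Round trip through a real file opened in binary mode.
fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, "wb") as f:
    marshal.dump(value, f)
with open(path, "rb") as f:
    from_file = marshal.load(f)
os.remove(path)
```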

End of content preview. The rest of this section is not viewable here.
DBM Modules
Content preview
A DBM-like file is a file that contains pairs of strings (key, data), with support for fetching or storing the data given a key, known as keyed access. DBM-like files were developed on early Unix systems, with functionality roughly equivalent to that of access methods popular on mainframes and minicomputers of the time, such as ISAM, the Indexed-Sequential Access Method. Today, many libraries, available for many platforms, let programs written in many different languages create, update, and read DBM-like files.
Keyed access, while not as powerful as the data access functionality of relational databases, may often suffice for a program's needs. If DBM-like files are sufficient, you may end up with a program that is smaller and faster than one using an RDBMS.
The classic dbm library, whose first version introduced DBM-like files many years ago, has limited functionality but tends to be available on many Unix platforms. The GNU version, gdbm, is richer and very widespread. The BSD version, dbhash, offers superior functionality. Python supplies modules that interface with each of these libraries if the relevant underlying library is installed on your system. Python also offers a minimal DBM module, dumbdbm (usable anywhere, as it does not rely on other installed libraries), and generic DBM modules, which are able to automatically identify, select, and wrap the appropriate DBM library to deal with an existing or new DBM file. Depending on your platform, your Python distribution, and what dbm-like libraries you have installed on your computer, the default Python build may install some subset of these modules. In general, as a minimum, you can rely on having module dbm on all Unix-like platforms, module dbhash on Windows, and dumbdbm on any platform.
The anydbm module is a generic interface to any other DBM module. anydbm supplies a single factory function.
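Keyed access through a generic DBM interface can be sketched as follows. The code assumes a current Python 3 interpreter, where the generic interface is the package dbm rather than the module anydbm (a substitution on my part), and where stored data comes back as bytes. The file name cache is hypothetical.

```python
import dbm
import os
import shutil
import tempfile

workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "cache")  # hypothetical file name

# Flag 'c' opens the file for reading and writing, creating it if needed.
db = dbm.open(path, "c")
db["greeting"] = "Benvenuti"
db.close()

# Reopen read-only; the appropriate DBM library is identified automatically.
db = dbm.open(path, "r")
stored = db["greeting"]
db.close()

# The underlying library may create more than one file, so remove the directory.
shutil.rmtree(workdir)
```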
End of content preview. The rest of this section is not viewable here.
Berkeley DB Interfacing
Content preview
Python comes with the bsddb package, which wraps the Berkeley Database (also known as BSD DB) library if that library is installed on your system and your Python installation is built to support it. With the BSD DB library, you can create hash, binary-tree, or record-based files that generally behave like persistent dictionaries. On Windows, Python includes a port of the BSD DB library, thus ensuring that module bsddb is always usable. To download BSD DB sources, binaries for other platforms, and detailed documentation on BSD DB itself, see http://www.sleepycat.com.
Module bsddb itself provides a simplified, backward-compatible interface to a subset of BSD DB's functionality, as covered by the Python online documentation at http://www.python.org/doc/2.4/lib/module-bsddb.html. However, the standard Python library also comes with many modules in package bsddb, starting with bsddb.db. This set of modules closely mimics BSD DB's current rich, complex functionality and interfaces, and is documented at http://pybsddb.sourceforge.net/bsddb3.html. At this URL, you'll see the package documented under the slightly different name bsddb3, which is the name of a package you can separately download and install even on very old versions of Python. However, to use the version of this package that comes as part of the Python standard library, what you need to import are modules named bsddb.db and the like, not bsddb3.db and the like. Apart from this naming detail, the Sourceforge documentation fully applies to the modules in package bsddb in the Python standard library (db, dbshelve, dbtables, dbutil, dbobj, dbrecio).
Entire books can be (and have been) written about the full interface to BSD DB and its functionality, so I do not cover this rich, complete, and complex interface in this book. (If you need to exploit BSD DB's complete functionality, I suggest, in addition to studying the URLs mentioned above, the book
End of content preview. The rest of this section is not viewable here.
The Python Database API (DBAPI) 2.0
Content preview
As I mentioned earlier, the Python standard library does not come with an RDBMS interface, but there are many free third-party modules that let your Python programs access specific databases. Such modules mostly follow the Python Database API 2.0 standard, also known as the DBAPI. A new version of the DBAPI (possibly to be known as 3.0) is likely to appear in the future, but currently there are no firm plans or schedules for one. Programs written against DBAPI 2.0 should work with minimal or no changes with any future DBAPI 3.0, although 3.0, if and when it comes, will no doubt offer further enhancements that future programs will be able to take advantage of.
If your Python program runs only on Windows, you might prefer to access databases by using Microsoft's ADO package through COM. For more information on using Python on Windows, see Python Programming on Win32, by Mark Hammond and Andy Robinson (O'Reilly). Since ADO and COM are platform-specific, and this book focuses on cross-platform use of Python, I do not cover ADO and COM further in this book. However, at http://adodbapi.sourceforge.net/ you will find a useful Python extension that lets you access ADO indirectly through DBAPI.
After importing a DBAPI-compliant module, call the module's connect function with suitable parameters. connect returns an instance of Connection, which represents a connection to the database. The instance supplies commit and rollback methods to deal with transactions, a close method to call as soon as you're done with the database, and a cursor method to return an instance of Cursor. The cursor supplies the methods and attributes used for database operations. A DBAPI-compliant module also supplies exception classes, descriptive attributes, factory functions, and type-description attributes.
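The sequence of calls might look like the sketch below. It uses module sqlite3 as the DBAPI-compliant module, with an in-memory database, purely as an assumption for illustration: this section does not mention sqlite3, and the ? placeholders are specific to modules that use the same parameter style. The table books and its columns are hypothetical.

```python
import sqlite3  # assumed here as an example of a DBAPI-compliant module

conn = sqlite3.connect(":memory:")  # returns a Connection instance
cur = conn.cursor()                 # returns a Cursor instance

cur.execute("CREATE TABLE books (title TEXT, pages INTEGER)")
cur.execute("INSERT INTO books VALUES (?, ?)",
            ("Python in a Nutshell", 734))
conn.commit()                       # make the transaction permanent

cur.execute("SELECT title, pages FROM books")
rows = cur.fetchall()

cur.close()
conn.close()                        # as soon as you're done with the database
```

With a different DBAPI-compliant module, the arguments to connect and the placeholder syntax would change, while the overall sequence of calls stays the same.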
A DBAPI-compliant module supplies exception classes Warning, Error, and several subclasses of
End of preview. The rest of this section is not available here.
Chapter 12: Time Operations
Preview
A Python program can handle time in several ways. Time intervals are floating-point numbers in units of seconds (a fraction of a second is the fractional part of the interval). Particular instants in time are expressed in seconds since a reference instant, known as the epoch. (Midnight, UTC, of January 1, 1970, is a popular epoch used on both Unix and Windows platforms.) Time instants often also need to be expressed as a mixture of units of measurement (e.g., years, months, days, hours, minutes, and seconds), particularly for I/O purposes. I/O, of course, also requires the ability to format times and dates into human-readable strings, and parse them back from string formats.
This chapter covers the time module, which supplies Python's core time-handling functionality. The time module is somewhat dependent on the underlying system's C library. The chapter also presents the datetime, sched, and calendar modules from the standard Python library, the third-party modules dateutil and pytz, and some essentials of the popular extension mx.DateTime. mx.DateTime has been around for many years, with behavior across platforms more uniform than time's, which helps account for its popularity, particularly for date-time representation in database interfaces.
End of preview. The rest of this section is not available here.
The time Module
Preview
The underlying C library determines the range of dates that the time module can handle. On Unix systems, years 1970 and 2038 are typical cut-off points, a limitation that datetime and mx.DateTime let you avoid. Time instants are normally specified in UTC (Coordinated Universal Time, once known as GMT, or Greenwich Mean Time). Module time also supports local time zones and daylight saving time (DST), but only to the extent that support is supplied by the underlying C system library.
As an alternative to seconds since the epoch, a time instant can be represented by a tuple of nine integers, called a time-tuple. (Time-tuples are covered in Table 12-1.) All items are integers: time-tuples don't keep track of fractions of a second. A time-tuple is an instance of struct_time. You can use it as a tuple, and access the items as read-only attributes x.tm_year, x.tm_mon, and so on, with the attribute names listed in Table 12-1. Wherever a function requires a time-tuple argument, you can pass an instance of struct_time or any other sequence whose items are nine integers in the right ranges.
Table 12-1: Tuple form of time representation
Item  Meaning   Field name  Range      Notes
0     Year      tm_year     1970–2038  Wider on some platforms.
1     Month     tm_mon      1–12       1 is January; 12 is December.
2     Day       tm_mday     1–31
3     Hour      tm_hour     0–23       0 is midnight; 12 is noon.
4     Minute    tm_min      0–59
5     Second    tm_sec      0–61       60 and 61 for leap seconds.
6     Weekday   tm_wday     0–6        0 is Monday; 6 is Sunday.
7     Year day  tm_yday     1–366      Day number within the year.
8     DST flag  tm_isdst    -1 to 1    -1 means library determines DST.
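As a small illustration of the time-tuple fields in Table 12-1, the sketch below inspects the epoch itself. It is written for current Python 3 and assumes the usual 1970 epoch; calendar.timegm is used for the reverse conversion because it treats the tuple as UTC, which keeps the result independent of the local time zone.

```python
import calendar
import time

epoch = time.gmtime(0)            # the epoch itself, as a UTC time-tuple
print(epoch.tm_year, epoch.tm_mon, epoch.tm_mday)   # 1970 1 1
print(epoch[0] == epoch.tm_year)  # True: index and attribute access agree
print(epoch.tm_wday, epoch.tm_yday)   # 3 1: a Thursday, first day of the year

# Any sequence of nine integers in the right ranges works as a time-tuple
plain = (1970, 1, 2, 0, 0, 0, 4, 2, 0)   # one day after the epoch
seconds = calendar.timegm(plain)          # interpret the tuple as UTC
print(seconds)                            # 86400
```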
To translate a time instant from a "seconds since the epoch" floating-point value into a time-tuple, pass the floating-point value to a function (e.g., localtime) that returns a time-tuple with all nine items valid. When you convert in the other direction,
End of preview. The rest of this section is not available here.
The datetime Module
Preview
datetime provides classes representing date and time objects, which can be either aware of time zones or naïve (the default). Class tzinfo is abstract: module datetime supplies no implementation (for all the gory details, see http://docs.python.org/lib/datetime-tzinfo.html). (See module pytz, in "The pytz Module" on page 313, for a good, simple implementation of tzinfo, which lets you easily create aware objects.) All the types in module datetime have immutable instances, so the instances' attributes are all read-only and the instances can be keys in a dict or items in a set.
Class date instances represent a date (no time of day in particular within that date), are always naïve, and assume the Gregorian calendar was always in effect. date instances have three read-only integer attributes: year, month, and day.
date
date(year,month,day)
Description
Class date also supplies some class methods usable as alternative constructors.
fromordinal
date.fromordinal(ordinal)
Description
Returns a date object corresponding to the proleptic Gregorian ordinal ordinal, where a value of 1 corresponds to the first day of the year 1.
fromtimestamp
date.fromtimestamp(timestamp)
Description
Returns a date object corresponding to the instant timestamp expressed in seconds since the epoch.
today
date.today( )
Description
Returns a date object representing today's date.
Instances of class date support some arithmetic: the difference between date instances is a timedelta instance, and you can add or subtract a date instance and a timedelta instance to construct another date instance. You can also compare any two instances of class date (the one that's later in time is considered greater).
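The arithmetic and comparison rules above can be tried out with a few arbitrary dates. The following is a minimal sketch for current Python 3; the variable names and the dates themselves are chosen only for illustration.

```python
import datetime

release = datetime.date(2006, 8, 1)
holiday = datetime.date(2006, 12, 25)

gap = holiday - release                 # the difference is a timedelta
print(gap.days)                         # 146
print(release + datetime.timedelta(days=30))   # 2006-08-31
print(release < holiday)                # True: the later date is greater
print(datetime.date.fromordinal(1))     # 0001-01-01

# Instances are immutable, so they can serve as dictionary keys
notes = {release: "start", holiday: "end"}
print(notes[datetime.date(2006, 12, 25)])      # end
```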
An instance d of class date also supplies the following methods.
End of preview. The rest of this section is not available here.
The pytz Module
Preview
The third-party pytz module offers the simplest way to create tzinfo instances, which are used to make time zone-aware instances of classes time and datetime. (pytz is based on the Olson library of time zones, documented at http://www.twinsun.com/tz/tz-link.htm. pytz is available at http://pytz.sourceforge.net/.) The best general way to program around the traps and pitfalls of time zones is to always use the UTC time zone internally, converting to other time zones only for display purposes.
pytz supplies common_timezones, a list of over 400 strings that name the most common time zones you might want to use (mostly of the form continent / city, with some alternatives such as 'UTC' and 'US/Pacific'), and all_timezones, a list of over 500 strings that also supply some synonyms for the time zones. For example, to specify the time zone of Lisbon, Portugal, by Olson library standards, you would normally use the string 'Europe/Lisbon', and that is what you find in common_timezones; however, you may also use 'Portugal', which you find only in all_timezones. pytz also supplies utc and UTC, two names for the same object, a tzinfo instance representing Coordinated Universal Time (UTC).
pytz also supplies two functions.
country_timezones
country_timezones(code)
Description
Returns a list of time zone names corresponding to the country whose two-letter ISO code is code. For example, pytz.country_timezones('US') returns a list of 22 strings, from 'America/New_York' to 'Pacific/Honolulu'.
timezone
timezone(name)
Description
Returns an instance of tzinfo corresponding to the time zone named name.
For example, to display the Honolulu equivalent of midnight, December 31, 2005, in New York:
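import datetime, pytz

eastern = pytz.timezone('America/New_York')
# localize attaches the zone's proper offset for that date; a pytz zone passed
# directly as tzinfo= to the datetime constructor may get a wrong offset
dt = eastern.localize(datetime.datetime(2005,12,31))

print dt.astimezone(pytz.timezone('Pacific/Honolulu'))
# emits 2005-12-30 19:00:00-10:00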
End of preview. The rest of this section is not available here.
The dateutil Module
Preview
The third-party package dateutil (http://labix.org/python-dateutil) offers modules to manipulate dates in various ways, including time deltas, recurrence, time zones, and fuzzy parsing. (See the module's web site for complete documentation of its rich functionality.) dateutil's main modules are the following.
easter
easter.easter(year)
Description
Returns the datetime.date object for Easter of the given year. For example:
from dateutil import easter
print easter.easter(2006)
emits 2006-04-16.
parser
parser.parse(s)
Description
Returns the datetime.datetime object denoted by string s, with very permissive (a.k.a. "fuzzy") parsing rules. For example:
from dateutil import parser
print parser.parse("Saturday, January 28, 2006, at 11:15pm")
emits 2006-01-28 23:15:00.
relativedelta
relativedelta.relativedelta(...)
Description
You can call relativedelta with two instances of datetime.datetime: the resulting relativedelta instance captures the relative difference between the two arguments. Alternatively, you can call relativedelta with a keyword argument representing absolute information (year, month, day, hour, minute, second, microsecond); relative information (years, months, weeks, days, hours, minutes, seconds, microseconds), which may have positive or negative values; or the special keyword weekday, which can be a number from 0 (Monday) to 6 (Sunday), or one of the module attributes MO, TU,..., SU, which can also be called with a numeric argument n to specify the nth weekday. In any case, the resulting relativedelta instance captures the information in the call. For example, after:
from dateutil import relativedelta
r = relativedelta.relativedelta(weekday=relativedelta.MO(1))
r means "next Monday." You can add a relativedelta instance to a datetime.datetime instance to get a new
End of preview. The rest of this section is not available here.
The sched Module
Preview
The sched module implements an event-scheduler, letting you easily deal, along a single thread of execution, with events that may be scheduled in either a "real" or a "simulated" time-scale. sched supplies a scheduler class.
scheduler
class scheduler(timefunc,delayfunc)
Description
An instance s of scheduler holds two functions to use for all time-related operations. timefunc is callable without arguments to get the current time instant (in any unit of measure); for example, you can pass time.time. delayfunc is callable with one argument (a time duration, in the same units as timefunc) to delay the current thread for that time; for example, you can pass time.sleep. scheduler calls delayfunc(0) after each event to give other threads a chance; this is compatible with time.sleep. By taking functions as arguments, scheduler lets you use whatever "simulated time" or "pseudotime" fits your application's needs.
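To make the idea of simulated time concrete, here is a minimal sketch for current Python 3. FakeClock is a hypothetical helper invented for this example: its time method plays the role of timefunc, and its sleep method plays the role of delayfunc by simply advancing the clock. The sketch also uses the scheduler's run method to process the queued events.

```python
import sched

class FakeClock(object):
    """Hypothetical simulated clock: 'sleeping' just advances the time."""
    def __init__(self):
        self.now = 0
    def time(self):
        return self.now
    def sleep(self, duration):
        self.now += duration

clock = FakeClock()
s = sched.scheduler(clock.time, clock.sleep)
fired = []

s.enterabs(10, 1, fired.append, ("b at 10, priority 1",))
s.enterabs(10, 0, fired.append, ("a at 10, priority 0",))
s.enterabs(5, 1, fired.append, ("first at 5",))
token = s.enterabs(7, 1, fired.append, ("never runs",))
s.cancel(token)          # remove the event before it happens

s.run()                  # returns at once in real time: no actual sleeping
print(fired)
print(clock.now)         # 10: simulated time of the last event
```

Because the delay function never really sleeps, the whole schedule runs immediately, which is one reason a simulated clock can be convenient when testing scheduling logic.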
A scheduler instance s supplies the following methods.
cancel
s.cancel(event_token)
Description
Removes an event from s's queue. event_token must be the result of a previous call to s.enter or s.enterabs, and the event must not yet have happened; otherwise, cancel raises RuntimeError.
empty
s.empty( )
Description
Returns True if s's queue is empty; otherwise, False.
enterabs
s.enterabs(when,priority,func,args)
Description
Schedules a future event (a callback to func(*args)) at time when. when is in the units used by the time functions of s. If several events are scheduled for the same time, s executes them in increasing order of priority. enterabs returns an event token t, which you may pass to s.cancel to cancel this event.
enter
End of preview. The rest of this section is not available here.
The calendar Module
Preview
The calendar module supplies calendar-related functions, including functions to print a text calendar for a given month or year. By default, calendar takes Monday as the first day of the week and Sunday as the last one. To change this, call calendar.setfirstweekday. calendar handles years in module time's range, typically 1970 to 2038. Module calendar supplies the following functions.
calendar
calendar(year,w=2,l=1,c=6)
Description
Returns a multiline string with a calendar for year year formatted into three columns separated by c spaces. w is the width in characters of each date; each line has length 21*w+18+2*c. l is the number of lines for each week.
firstweekday
firstweekday( )
Description
Returns the current setting for the weekday that starts each week. By default, when calendar is first imported, this is 0, meaning Monday.
isleap
isleap(year)
Description
Returns True if year is a leap year; otherwise, False.
leapdays
leapdays(y1,y2)
Description
Returns the total number of leap days in the years within range(y1, y2).
month
month(year,month,w=2,l=1)
Description
Returns a multiline string with a calendar for month month of year year, one line per week plus two header lines. w is the width in characters of each date; each line has length 7*w+6. l is the number of lines for each week.
monthcalendar
monthcalendar(year,month)
Description
Returns a list of lists of ints. Each sublist denotes a week. Days outside month month of year year are set to 0; days within the month are set to their day-of-month, 1 and up.
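A few of the functions described so far can be exercised together. The sketch below is written for current Python 3 and assumes the default first weekday (Monday) has not been changed; the month used is arbitrary.

```python
import calendar

print(calendar.firstweekday())                      # 0, meaning Monday
print(calendar.isleap(2004), calendar.isleap(2006)) # True False
print(calendar.leapdays(2000, 2009))                # 3: 2000, 2004, 2008

weeks = calendar.monthcalendar(2006, 8)   # August 2006 starts on a Tuesday
print(weeks[0])      # [0, 1, 2, 3, 4, 5, 6]: Monday belongs to July
print(weeks[-1])     # [28, 29, 30, 31, 0, 0, 0]
```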
monthrange
monthrange(year,month)
Description
Returns two integers. The first one is the code of the weekday for the first day of the month month in year year; the second one is the number of days in the month. Weekday codes are
End of preview. The rest of this section is not available here.
The mx.DateTime Module
Preview
DateTime is one of the modules in the mx package made available by eGenix GmbH. mx is mostly open source, and, at the time of this writing, mx.DateTime has liberal license conditions similar to those of Python itself. mx.DateTime's popularity stems from its functional richness and cross-platform portability. I present only an essential subset of mx.DateTime's rich functionality here; the module comes with detailed documentation about its advanced time- and date-handling features.
Package DateTime supplies several date and time types whose instances are immutable (thus, suitable as dictionary keys). Type DateTime represents a time instant and includes an absolute date, the number of days since an epoch of January 1, year 1 CE, according to the Gregorian calendar (0001-01-01 is day 1), and an absolute time, a floating-point number of seconds since midnight. Type DateTimeDelta represents an interval of time, which is a floating-point number of seconds. Class RelativeDateTime lets you specify dates in relative terms, such as "next Monday" or "first day of next month." DateTime and DateTimeDelta are covered in detail (respectively in "The DateTime Type" on page 319 and "The DateTimeDelta Type" on page 324), but RelativeDateTime is not.
Date and time types supply customized string conversion, invoked via the built-in str or automatically during implicit conversion (e.g., in a print statement). The resulting strings are in standard ISO8601 formats, such as:
YYYY-MM-DD HH:MM:SS.ss
For finer-grained control of string formatting, use method strftime. Function DateTimeFrom constructs DateTime instances from strings. Various modules of package mx.DateTime also supply other formatting and parsing functions.
Module DateTime supplies factory functions to build instances of type DateTime, which in turn supply methods, attributes, and arithmetic operators.
End of preview. The rest of this section is not available here.
Chapter 13: Controlling Execution
Preview
Python documents (in other words, directly exposes and supports) many of the internal mechanisms it uses. This support may help you understand Python at an advanced level, and lets you hook your own code into such documented Python mechanisms and control them to some extent. For example, "Python built-ins" on page 141 covers the way Python arranges for built-ins to be made implicitly visible. This chapter covers other advanced Python control techniques, while Chapter 18 covers techniques that apply specifically to testing, debugging, and profiling.
End of preview. The rest of this section is not available here.
Dynamic Execution and the exec Statement
Preview
Python's exec statement can execute code that you read, generate, or otherwise obtain during a program's run. exec dynamically executes a statement or a suite of statements. exec is a simple keyword statement with the following syntax:
exec code[ in globals[,locals]]
code can be a string, an open file-like object, or a code object. globals and locals are dictionaries (in Python 2.4, locals can be any mapping, but globals must be specifically a dict; in Python 2.5, either or both can be any mapping). If both are present, they are the global and local namespaces in which code executes. If only globals is present, exec uses globals in the role of both namespaces. If neither globals nor locals is present, code executes in the current scope. Running exec in the current scope is a bad idea, since it can bind, rebind, or unbind any name. To keep things under control, use exec only with specific, explicit dictionaries.
Use exec only when it's really indispensable. Most often, it's best to avoid exec and choose more specific, well-controlled mechanisms instead: exec pries loose your control on your code's namespace, damages your program's performance, and exposes you to numerous, hard-to-find bugs.
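For the rare cases where dynamic execution is warranted, the advice about explicit dictionaries might look like the sketch below. One assumption to keep in mind: the sketch targets current Python 3, where exec is a built-in function rather than a statement, so exec(source, namespace) corresponds to the statement form exec source in namespace. The names source and namespace are made up for the example.

```python
source = "total = price * quantity"
namespace = {"price": 4, "quantity": 3}   # explicit dictionary for the code

# The code reads and binds names only in namespace, not in the current scope
exec(source, namespace)

print(namespace["total"])                 # 12
```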
For example, a frequently asked question about Python is "How do I set a variable whose name I just read or built?" Strictly speaking, exec lets you do this. For example, if the name of the variable you want to set is in varname, you might use:
exec varname+'=23'
Don't do this. An exec statement like this in current scope makes you lose control of your namespace, leading to bugs that are extremely hard to track, and more generally making your program unfathomably difficult to understand. An improvement is to keep the "variables" that you need to set not as variables, but as entries in a dictionary, say
End of preview. The rest of this section is not available here.
Internal Types
Preview
Some of the internal Python objects that I mention in this section are hard to use. Using such objects correctly and to the best effect requires some study of Python's own C (or Java, or C#) sources. Such black magic is rarely needed, except to build general-purpose development frameworks and similar wizardly tasks. Once you do understand things in depth, Python empowers you to exert control if and when you need to. Since Python exposes many kinds of internal objects to your Python code, you can exert that control by coding in Python, even when a nodding acquaintance with C (or Java, or C#) is needed to read Python's sources in order to understand what is going on.
The built-in type named type acts as a factory object, returning objects that are types. Type objects don't have to support any special operations except equality comparison and representation as strings. However, most type objects are callable and return new instances of the type when called. In particular, built-in types such as int, float, list, str, tuple, set, and dict all work this way. The attributes of the types module are the built-in types, each with one or more names. For example, types.DictType and types.DictionaryType both refer to type({}), also known as dict. Besides being callable to generate instances, many type objects are useful because you can subclass them, as covered in "Classes and Instances" on page 82.
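A brief sketch of type objects acting as factories and as base classes, written for current Python 3. Tally is a hypothetical subclass invented for this example; the types module aliases mentioned above are not used here.

```python
kind = type({})              # ask for the type of an empty dictionary
print(kind is dict)          # True
print(kind(a=1))             # {'a': 1}: calling the type builds an instance

class Tally(dict):           # a built-in type can also be subclassed
    def bump(self, key):
        self[key] = self.get(key, 0) + 1

t = Tally()
t.bump("spam")
t.bump("spam")
print(t)                                        # {'spam': 2}
print(type(t) is Tally, isinstance(t, dict))    # True True
```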
As well as by using built-in function compile, you can get a code object via the func_code attribute of a function or method object. (For the attributes of code objects, see "Compile and Code Objects" on page 329.) Code objects are not callable, but you can rebind the func_code attribute of a function object with the correct number of parameters in order to wrap a code object into callable form. For example:
def g(x): print 'g', x
code_object = g.func_code
def f(x): pass
f.func_code = code_object
f(23)     # emits
End of preview. The rest of this section is not available here.
Garbage Collection
Preview
Python's garbage collection normally proceeds transparently and automatically, but you can choose to exert some direct control. The general principle is that Python collects each object x at some time after x becomes unreachable—that is, when no chain of references can reach x by starting from a local variable of a function instance that is executing, nor from a global variable of a loaded module.
Normally, an object x becomes unreachable when there are no references at all to x. In addition, a group of objects can be unreachable when they reference each other but no global or local variable references any of them, even indirectly (such a situation is known as a mutual reference loop).
Classic Python keeps with each object x a count, known as a reference count, of how many references to x are outstanding. When x's reference count drops to 0, CPython immediately collects x. Function getrefcount of module sys accepts any object and returns its reference count (at least 1, since getrefcount itself has a reference to the object it's examining). Other versions of Python, such as Jython or IronPython, rely on other garbage-collection mechanisms supplied by the platform they run on (e.g., the JVM or the MSCLR). Modules gc and weakref therefore apply only to CPython.
When Python garbage-collects x and there are no references at all to x, Python then finalizes x (i.e., calls x.__del__()) and makes the memory that x occupied available for other uses. If x held any references to other objects, Python removes the references, which in turn may make other objects collectable by leaving them unreachable.
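A mutual reference loop, and its collection by module gc, can be observed with a weak reference as a probe. The following is a CPython-specific sketch for current Python 3; Node and the attribute name partner are invented for the example, and automatic collection is switched off briefly so that the two stages stay clearly separated.

```python
import gc
import weakref

class Node(object):
    pass

gc.disable()                 # keep automatic collection out of the picture
a = Node()
b = Node()
a.partner = b
b.partner = a                # a and b now form a mutual reference loop
probe = weakref.ref(a)       # a weak reference does not keep a alive

del a, b                     # unreachable, but reference counts stay positive
still_alive = probe() is not None
print(still_alive)           # True: reference counting alone cannot free them

gc.collect()                 # explicitly look for cyclic garbage
collected = probe() is None
print(collected)             # True
gc.enable()
```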
The gc module exposes the functionality of Python's garbage collector. gc deals only with objects that are unreachable in a subtle way, being part of mutual reference loops. In such a loop, each object in the loop refers to others, keeping the reference counts of all objects positive. However, no outside references to any one of the set of mutually referencing objects exist any longer. Therefore, the whole group, also known as cyclic garbage, is unreachable, and therefore garbage-collectable. Looking for such cyclic garbage loops takes some time, which is why module
End of preview. The rest of this section is not available here.
Termination Functions
Preview
The atexit module lets you register termination functions (i.e., functions to be called at program termination, in "last in, first out" order). Termination functions are similar to clean-up handlers established by try/finally. However, termination functions are globally registered and get called at the end of the whole program, while clean-up handlers are established lexically and get called at the end of a specific try clause. Both termination functions and clean-up handlers are called whether the program terminates normally or abnormally, but not when the program ends specifically by calling os._exit. Module atexit supplies a single function called register.
register
register(func,*args,**kwds)
Description
Ensures that func(*args,**kwds) is called at program termination time.
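A minimal sketch of registering termination functions with positional and named arguments, written for current Python 3. The function note and the list log are hypothetical names used only for this example.

```python
import atexit

def note(log, message, prefix="cleanup"):
    log.append(prefix + ": " + message)

log = []
atexit.register(note, log, "registered first")
atexit.register(note, log, "registered last", prefix="final")

# Registration only records the calls; nothing has run yet
print(log)     # []
```

At program termination, the call registered last runs first, followed by the one registered first.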
End of preview. The rest of this section is not available here.
Site and User Customization
Preview
Python provides a specific "hook" to let each site customize some aspects of Python's behavior at the start of each run. Customization by each single user is not enabled by default, but Python specifies how programs that want to run user-provided code at startup can explicitly request such customization.
Python loads standard module site just before the main script. If Python is run with option -S, Python does not load site. -S allows faster startup but saddles the main script with initialization chores. site's tasks are:
  • Putting sys.path in standard form (absolute paths, no duplicates).
  • Interpreting each .pth file found in the Python home directory, adding entries to sys.path, and/or importing modules, as each .pth file indicates.
  • Adding built-ins used to display information in interactive sessions (quit, exit, copyright, credits, and license).
  • Setting the default Unicode encoding to 'ascii'. site's source code includes two blocks, each guarded by if 0:, one to set the default encoding to be locale-dependent, and the other to completely disable any default encoding between Unicode and plain strings. You may optionally edit site.py to select either block.
  • Trying to import sitecustomize (should import sitecustomize raise an ImportError exception, site catches and ignores it). sitecustomize is the module that each site's installation can optionally use for further site-specific customization beyond site's tasks. It is generally best not to edit site.py, since any Python upgrade or reinstallation might overwrite such customizations. sitecustomize's main task can be to set the correct default encoding for the site. A Western European site, for example, may choose to call from sitecustomize sys.setdefaultencoding('iso-8859-1').
  • After sitecustomize
End of preview. The rest of this section is not available here.
Chapter 14: Threads and Processes
Preview
A thread is a flow of control that shares global state with other threads; all threads appear to execute simultaneously. Threads are not easy to master, but once you do master them, they may let you tackle some problems with a simpler architecture, and sometimes better performance, in comparison to traditional "single-thread" programming. This chapter covers the facilities that Python provides for dealing with threads, including the thread, threading, and Queue modules.
A process is an instance of a running program. Sometimes, particularly on Unix-like operating systems or on multiprocessor computers, you can get better results with multiple processes than with threads. The operating system protects processes from one another. Processes that want to communicate must explicitly arrange to do so via local inter-process communication (IPC). Processes may communicate via files (covered in Chapter 10) or via databases (covered in Chapter 11). In both cases, the general way in which processes communicate using such data storage mechanisms is that one process can write data, and another process can later read that data back. This chapter covers Python standard library module subprocess; the process-related parts of module os, including simple IPC by means of pipes; and a cross-platform IPC mechanism known as memory-mapped files, which is supplied to Python programs by module mmap.
Network mechanisms are well suited for IPC, as they work between processes that run on different nodes of a network as well as those that run on the same node. (Chapter 20 covers low-level network mechanisms that provide a flexible basis for IPC.) Other, higher-level mechanisms, known as distributed computing, such as CORBA, DCOM/COM+, EJB, SOAP, XML-RPC, and .NET, make IPC easier, whether locally or remotely. However, distributed computing is not covered in this book.
End of preview. The rest of this section is not available here.
Threads in Python
Preview
Python offers multithreading on platforms that support threads, such as Win32, Linux, and most other variants of Unix. The Classic Python interpreter does not freely switch threads. In the current implementation, Python uses a global interpreter lock (GIL) to ensure that switching between threads happens only between bytecode instructions or when C code deliberately releases the GIL (Python's own C code releases the GIL around blocking I/O and sleep operations). An action is said to be atomic if it's guaranteed that no thread switching within Python's process occurs between the start and the end of the action. In practice, in the current implementation, operations that look atomic (such as simple assignments and accesses) actually are atomic when executed on objects of built-in types (augmented and multiple assignments, however, are not atomic). In general, though, it is not a good idea to rely on atomicity. For example, you never know when you might be dealing with a derived class rather than an object of a built-in type, meaning there might be callbacks to Python code, and any assumptions of atomicity would then prove unwarranted. Moreover, relying on implementation-dependent atomicity would lock your code into a specific implementation and might preclude future upgrades. A much better strategy is to use the synchronization facilities covered in the rest of this chapter.
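As an illustration of preferring explicit synchronization over assumed atomicity, the sketch below guards an augmented assignment with a lock. It is written for current Python 3 and uses threading.Lock; the function add_many, the dictionary totals, and the thread and repetition counts are all chosen just for this example.

```python
import threading

def add_many(totals, lock, repetitions):
    for _ in range(repetitions):
        with lock:                    # hold the lock around read-modify-write
            totals["count"] += 1      # augmented assignment: not atomic

totals = {"count": 0}
lock = threading.Lock()
workers = [threading.Thread(target=add_many, args=(totals, lock, 10000))
           for _ in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()

print(totals["count"])    # 40000
```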
Python offers multithreading in two different flavors. An older and lower-level module, thread, offers a bare minimum of functionality and is not recommended for direct use by your code. The higher-level module threading, built on top of thread, was loosely inspired by Java's threads and is the recommended tool. The key design issue in multithreading systems is most often how best to coordinate multiple threads. threading therefore supplies several synchronization objects. Module
End of preview. The rest of this section is not available here.
The thread Module
Preview
The only part of the thread module that your code should use directly is the lock objects that module thread supplies. Locks are simple thread-synchronization primitives. Technically, thread's locks are non-reentrant and unowned: they do not keep track of which thread last locked them, so there is no specific owner thread for a lock. A lock is in one of two states, locked or unlocked.
To get a new lock object (in the unlocked state), call the factory function named allocate_lock without arguments. This function is supplied by both modules thread and threading. A lock object L supplies three methods.
acquire
L.acquire(wait=True)
Description
When wait is True, acquire locks L. If L is already locked, the calling thread suspends and waits until L is unlocked, then locks L. Even if the calling thread was the one that last locked L, it still suspends and waits until another thread releases L. When wait is False and L is unlocked, acquire locks L and returns True. When wait is False and L is locked, acquire does not affect L and returns False.
locked
L.locked( )
Description
Returns True if L is locked; otherwise, False.
release
L.release( )
Description
Unlocks L, which must be locked. When L is locked, any thread may call L.release, not just the thread that last locked L. When more than one thread is waiting on L (i.e., has called L.acquire, finding L locked, and is now waiting for L to be unlocked), release wakes up an arbitrary waiting thread. The thread that calls release is not suspended: it remains ready and continues to execute.
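As a rough illustration of acquire, locked, and release, the following sketch guards a shared counter with a lock. It assumes threading.Lock as the lock factory (an alternative spelling to allocate_lock that also exists in later Python versions) and uses Thread objects, which are covered later in this chapter; the names counter, add_many, and workers are made up for the example.

```python
import threading

lock = threading.Lock()      # a new lock, initially unlocked
counter = 0

def add_many(n):
    """Increment the shared counter n times, holding the lock each time."""
    global counter
    for _ in range(n):
        lock.acquire()        # suspends the caller until the lock is free
        try:
            counter += 1      # augmented assignment is not atomic by itself
        finally:
            lock.release()    # always unlock, even if an exception occurs

workers = [threading.Thread(target=add_many, args=(10000,)) for _ in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()

# Non-blocking acquire returns True or False instead of waiting.
got_it = lock.acquire(False)        # lock is free, so this locks it
got_it_again = lock.acquire(False)  # already locked, so this fails
is_locked = lock.locked()
lock.release()
```

Because the lock is non-reentrant, the second acquire(False) fails even though the same thread holds the lock; with wait left at True, that call would suspend forever.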
End of preview. The rest of this section is not available here.
The Queue Module
The Queue module supplies first-in, first-out (FIFO) queues that support multithread access, with one main class and two exception classes.
Queue
class Queue(maxsize=0)
Description
Queue is the main class for module Queue and is covered in "Methods of Queue Instances" on page 343. When maxsize is greater than 0, the new Queue instance q is deemed full when q has maxsize items. A thread inserting an item with the block option, when q is full, suspends until another thread extracts an item. When maxsize is less than or equal to 0, q is never considered full and is limited in size only by available memory, like normal Python containers.
Empty
Description
Empty is the class of the exception that q.get(False) raises when q is empty.
Full
Description
Full is the class of the exception that q.put(x,False) raises when q is full.
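The interplay of maxsize, Full, and Empty can be sketched as follows. This is only an illustration: the import fallback is an assumption that lets the same code run where the module is spelled queue (later Python versions) as well as Queue (the spelling used in this text), and the variable names are invented.

```python
try:
    import queue              # spelling in later Python versions (assumption)
except ImportError:
    import Queue as queue     # spelling used in this text

q = queue.Queue(2)            # maxsize=2: q is full once it holds two items

q.put('a', False)             # non-blocking puts succeed while there is room
q.put('b', False)
try:
    q.put('c', False)         # q is full, so a non-blocking put raises Full
    overflowed = False
except queue.Full:
    overflowed = True

drained = [q.get(False), q.get(False)]   # items come back in FIFO order
try:
    q.get(False)              # q is empty, so a non-blocking get raises Empty
    underflowed = False
except queue.Empty:
    underflowed = True
```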
An instance q of class Queue supplies the following methods.
empty
q.empty( )
Description
Returns True if q is empty; otherwise, False.
full
q.full( )
Description
Returns True if q is full; otherwise, False.
get, get_nowait
q.get(block=True,timeout=None)
Description
When block is False, get removes and returns an item from q if one is available; otherwise, get raises Empty. When block is True and timeout is None, get removes and returns an item from q, suspending the calling thread, if need be, until an item is available. When block is True and timeout is not None, timeout must be a number >=0 (which may include a fractional part to specify a fraction of a second), and get waits for no longer than timeout seconds (if no item is yet available by then,
End of preview. The rest of this section is not available here.
The threading Module
The threading module is built on top of module thread and supplies multithreading functionality in a more usable, higher-level form. The general approach of threading is similar to that of Java, but locks and conditions are modeled as separate objects (in Java, such functionality is part of every object), and threads cannot be directly controlled from the outside (which means there are no priorities, groups, destruction, or stopping). All methods of objects supplied by threading are atomic.
threading provides numerous classes for dealing with threads, including Thread, Condition, Event, RLock, and Semaphore. Besides factory functions for the classes detailed in the following sections, threading supplies the currentThread factory function.
currentThread
currentThread( )
Description
Returns a Thread object for the calling thread. If the calling thread was not created by module threading, currentThread creates and returns a semi-dummy Thread object with limited functionality.
A Thread object t models a thread. You can pass t's main function as an argument when you create t, or you can subclass Thread and override the run method (you may also override __init__ but should not override other methods). t is not ready to run when you create it; to make t ready (active), call t.start( ). Once t is active, it terminates when its main function ends, either normally or by propagating an exception. A Thread t can be a daemon, meaning that Python can terminate even if t is still active, while a normal (nondaemon) thread keeps Python alive until the thread terminates. Class Thread exposes the following constructor and methods.
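Before the reference entries, here is a minimal sketch of the two ways to give a thread its main function. The names work, Worker, and results are hypothetical, and the join method used to wait for termination is not described in this preview.

```python
import threading

results = []

def work(label):
    """Main function for the example threads: record a label."""
    results.append(label)

# 1) Pass the main function as target, using named arguments.
t1 = threading.Thread(target=work, args=('from target',))

# 2) Subclass Thread and override run.
class Worker(threading.Thread):
    def run(self):
        work('from run override')

t2 = Worker()

for t in (t1, t2):
    t.start()       # a thread does not run until start is called
for t in (t1, t2):
    t.join()        # wait until each thread's main function has ended
```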
Thread
class Thread(name=None,target=None,args=( ),kwargs={ })
Description
Always call Thread with named arguments. The number and order of formal arguments may change in the future, but the names of existing arguments are guaranteed to stay. When you instantiate class
End of preview. The rest of this section is not available here.
Threaded Program Architecture
A threaded program should always arrange for a single thread to deal with any given object or subsystem that is external to the program (such as a file, a database, a GUI, or a network connection). Having multiple threads that deal with the same external object can often cause unpredictable problems.
Whenever your threaded program must deal with some external object, devote a thread to such dealings using a Queue object from which the external-interfacing thread gets work requests that other threads post. The external-interfacing thread can return results by putting them on one or more other Queue objects. The following example shows how to package this architecture into a general, reusable class, assuming that each unit of work on the external subsystem can be represented by a callable object:
import threading, Queue

class ExternalInterfacing(threading.Thread):
    def __init__(self, externalCallable, **kwds):
        threading.Thread.__init__(self, **kwds)
        self.setDaemon(1)
        self.externalCallable = externalCallable
        self.workRequestQueue = Queue.Queue( )
        self.resultQueue = Queue.Queue( )
        # the thread starts serving requests as soon as it is created
        self.start( )
    def request(self, *args, **kwds):
        "called by other threads as externalCallable would be"
        self.workRequestQueue.put((args,kwds))
        return self.resultQueue.get( )
    def run(self):
        # only this thread ever calls externalCallable
        while 1:
            args, kwds = self.workRequestQueue.get( )
            self.resultQueue.put(self.externalCallable(*args, **kwds))
Once some ExternalInterfacing object ei is instantiated, all other threads may call ei.request just as they would call someExternalCallable without such a mechanism (with or without arguments as appropriate). The advantage of the ExternalInterfacing mechanism is that all calls upon someExternalCallable are now serialized. This means they are performed by just one thread (the thread object bound to ei) in some defined sequential order, without overlap, race conditions (hard-to-debug errors that depend on which thread happens to get there first), or other anomalies that might otherwise result.
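A usage sketch may make this concrete. The class is repeated here in a form that should also run on later Python versions (the queue import fallback, the daemon attribute, and current_thread are assumptions about those versions), and fake_device is a hypothetical stand-in for a real external subsystem.

```python
import threading
try:
    import queue              # spelling in later Python versions (assumption)
except ImportError:
    import Queue as queue     # spelling used in this text

class ExternalInterfacing(threading.Thread):
    def __init__(self, external_callable, **kwds):
        threading.Thread.__init__(self, **kwds)
        self.daemon = True                  # do not keep Python alive
        self.external_callable = external_callable
        self.work_request_queue = queue.Queue()
        self.result_queue = queue.Queue()
        self.start()

    def request(self, *args, **kwds):
        """Called by other threads as external_callable would be."""
        self.work_request_queue.put((args, kwds))
        return self.result_queue.get()

    def run(self):
        while True:
            args, kwds = self.work_request_queue.get()
            self.result_queue.put(self.external_callable(*args, **kwds))

def fake_device(x, y=1):
    """Hypothetical external call; also reports which thread ran it."""
    return x * y, threading.current_thread().name

ei = ExternalInterfacing(fake_device, name='device-thread')
product, ran_in = ei.request(6, y=7)    # called from the main thread
```

The request is issued from the main thread, yet fake_device reports that it ran in the thread bound to ei, which is the serialization the text describes.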
End of preview. The rest of this section is not available here.
Process Environment
The operating system supplies each process P with an environment, which is a set of environment variables whose names are identifiers (most often, by convention, uppercase identifiers) and whose contents are strings. In "Environment Variables" on page 22, I covered environment variables that affect Python's operations. Operating system shells offer ways to examine and modify the environment via shell commands and other means mentioned in "Environment Variables" on page 22.
The environment of any process P is determined when P starts. After startup, only P itself can change P's environment. Nothing that P does affects the environment of P's parent process (the process that started P), nor of those of child processes previously started from P and now running, nor of processes unrelated to P. Changes to P's environment affect only P itself: the environment is not a means of IPC. Child processes of P normally get a copy of P's environment as their starting environment. In this sense, changes to P's environment do affect child processes that P starts after such changes.
Module os supplies attribute environ, which is a mapping that represents the current process's environment. os.environ is initialized from the process environment when Python starts. Changes to os.environ update the current process's environment if the platform supports such updates. Keys and values in os.environ must be strings. On Windows, but not on Unix-like platforms, keys into os.environ are implicitly uppercased. For example, here's how to try to determine which shell or command processor you're running under:
import os
shell = os.environ.get('COMSPEC')
if shell is None: shell = os.environ.get('SHELL')
if shell is None: shell = 'an unknown command processor'
print 'Running under', shell
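The inheritance rules described above (a child starts with a copy of its parent's environment, and nothing the child does reaches back to the parent) can be checked with a short sketch. It assumes the subprocess module's check_output function, available in later Python versions, and a made-up variable name NUTSHELL_DEMO.

```python
import os
import subprocess
import sys

# Change this process's environment before starting a child.
os.environ['NUTSHELL_DEMO'] = 'set-by-parent'   # hypothetical variable name

# The child reports what it inherited, then changes its own copy.
child_code = (
    "import os\n"
    "print(os.environ.get('NUTSHELL_DEMO'))\n"
    "os.environ['NUTSHELL_DEMO'] = 'changed-by-child'\n"
)
output = subprocess.check_output([sys.executable, '-c', child_code])
seen_by_child = output.decode().strip()

# The child's change does not affect the parent's environment.
still_in_parent = os.environ['NUTSHELL_DEMO']
```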
When a Python program changes its environment (e.g., via
End of preview. The rest of this section is not available here.
Running Other Programs
You can run other programs via functions in the os module or, in Python 2.4, by using the new subprocess module.
In Python 2.4, the best way for your program to run other processes is with the new subprocess module, covered in "The Subprocess Module" on page 358. However, the os module also offers several ways to do this, which in some cases may be simpler or allow your code to remain backward-compatible to older versions of Python.
The simplest way to run another program is through function os.system, although this offers no way to control the external program. The os module also provides a number of functions whose names start with exec. These functions offer fine-grained control. A program run by one of the exec functions replaces the current program (i.e., the Python interpreter) in the same process. In practice, therefore, you use the exec functions mostly on platforms that let a process duplicate itself by fork (i.e., Unix-like platforms). os functions whose names start with spawn and popen offer intermediate simplicity and power: they are cross-platform and not quite as simple as system, but simple and usable enough for most purposes.
The exec and spawn functions run a specified executable file, given the executable file's path, arguments to pass to it, and optionally an environment mapping. The system and popen functions execute a command, which is a string passed to a new instance of the platform's default shell (typically /bin/sh on Unix; command.com or cmd.exe on Windows). A command is a more general concept than an executable file, as it can include shell functionality (pipes, redirection, built-in shell commands) using the normal shell syntax specific to the current platform.
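The distinction between a command and an executable file can be sketched with os.system and os.spawnv. This is a minimal example that assumes a Unix-like platform (argument quoting for spawnv differs on Windows); the echoed text and the exit code 3 are arbitrary choices for the illustration.

```python
import os
import sys

# system: the string is a command, handed to the platform's default shell.
shell_status = os.system('echo hello from the shell')

# spawnv: an executable file's path plus an explicit argument list; no shell
# is involved. With os.P_WAIT, the call waits and returns the exit code.
exit_code = os.spawnv(
    os.P_WAIT,
    sys.executable,
    [sys.executable, '-c', 'import sys; sys.exit(3)'],
)
```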
execl, execle, execlp, execv, execve, execvp, execvpe
execl(path,*args) execle(path,*args) execlp(path,*args) execv(path,args
End of preview. The rest of this section is not available here.
The mmap Module
The mmap module supplies memory-mapped file objects. An mmap object behaves similarly to a plain (not Unicode) string, so you can often pass an mmap object where a plain string is expected. However, there are differences:
  • An mmap object does not supply the methods of a string object.
  • An mmap object is mutable, while string objects are immutable.
  • An mmap object also corresponds to an open file and behaves polymorphically to a Python file object (as covered in "File-Like Objects and Polymorphism" on page 222).
An mmap object m can be indexed or sliced, yielding plain strings. Since m is mutable, you can also assign to an indexing or slicing of m. However, when you assign to a slice of m, the righthand side of the assignment statement must be a string of exactly the same length as the slice you're assigning to. Therefore, many of the useful tricks available with list slice assignment (covered in "Modifying a list" on page 56) do not apply to mmap slice assignment.
Module mmap supplies a factory function that is slightly different on Unix-like systems and Windows.
mmap
mmap(filedesc,length,tagname='') # Windows
mmap(filedesc,length,flags=MAP_SHARED,prot=PROT_READ|PROT_WRITE) # Unix
Description
Creates and returns an mmap object m that maps into memory the first length bytes of the file indicated by file descriptor filedesc. filedesc must normally be a file descriptor opened for both reading and writing (except, on Unix-like platforms, when argument prot requests only reading or only writing). (File descriptors are covered in "File Descriptor Operations" on page 253.) To get an mmap object m that refers to a Python file object f, use m = mmap.mmap(f.fileno( ), length).
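A small sketch of slicing and same-length slice assignment on an mmap object follows. It assumes a later Python version, where slices of m are bytes objects rather than the plain strings described here, and it catches both IndexError and ValueError because the exact exception for a wrong-size assignment is not stated in this text. The temporary file and its contents are invented for the example.

```python
import mmap
import os
import tempfile

fd, path = tempfile.mkstemp()
try:
    os.write(fd, b'hello world')          # 11 bytes to map
    os.close(fd)

    with open(path, 'r+b') as f:          # opened for reading and writing
        m = mmap.mmap(f.fileno(), 11)     # map the first 11 bytes
        first_word = m[0:5]               # slicing yields a copy of the bytes
        m[0:5] = b'HELLO'                 # same length as the slice: allowed
        try:
            m[0:5] = b'HI'                # different length: rejected
            wrong_size_rejected = False
        except (IndexError, ValueError):
            wrong_size_rejected = True
        m.flush()                         # write changes back to the file
        m.close()

    with open(path, 'rb') as f:
        contents = f.read()
finally:
    os.remove(path)
```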
On Windows, all memory mappings are readable and writable and shared between processes so that all processes with a memory mapping on a file can see changes made by each such process. On Windows only, you can pass a string
End of preview. The rest of this section is not available here.
Chapter 15: Numeric Processing
You can perform some numeric computations with operators (covered in "Numeric Operations" on page 52) and built-in functions (covered in "Built-in Functions" on page 158). Python also provides modules that support additional numeric computation functionality, as documented in this chapter: math, cmath in "The math and cmath Modules" on page 365, operator in "The operator Module" on page 368, random in "The random module" on page 370, and decimal in "The decimal Module" on page 372. "The gmpy Module" on page 373 also covers third-party module gmpy, which further extends Python's numeric computation abilities. Numeric processing often requires, more specifically, the processing of arrays of numbers, covered in Chapter 16.
End of preview. The rest of this section is not available here.
The math and cmath Modules
The math module supplies mathematical functions on floating-point numbers, while the cmath module supplies equivalent functions on complex numbers. For example, math.sqrt(-1) raises an exception, but cmath.sqrt(-1) returns 1j.
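The sqrt contrast can be tried directly. In this sketch the exception is assumed to be ValueError, which the text does not name, and the variable names are illustrative.

```python
import cmath
import math

try:
    math.sqrt(-1)                 # no real square root exists
    real_outcome = 'no error'
except ValueError:
    real_outcome = 'ValueError'

complex_outcome = cmath.sqrt(-1)  # the complex square root
```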
Each module exposes two attributes of type float bound to the values of fundamental mathematical constants, pi and e, and the following functions.
acos
acos(x)
Returns the arccosine of x in radians.
math and cmath
acosh
acosh(x)
Returns the arc hyperbolic cosine of x in radians.
cmath only
asin
asin(x)
Returns the arcsine of x in radians.
math and cmath
asinh
asinh(x)
Returns the arc hyperbolic sine of x in radians.
cmath only
atan
atan(x)
Returns the arctangent of x in radians.
math and cmath
atanh
atanh(x)
Returns the arc hyperbolic tangent of x in radians.
cmath only
atan2
atan2(y,x)
Like atan(y/x), except that atan2 properly takes into account the signs of both arguments. For example:
>>> import math
>>> math.atan(-1./-1.)
0.78539816339744828
>>> math.atan2(-1., -1.)
-2.3561944901923448
Also, when x equals 0, atan2 returns pi/2 if y is positive (and -pi/2 if y is negative), while dividing by x would raise ZeroDivisionError.
math only
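As a quick check of how atan2 uses the signs of both arguments, the following sketch evaluates it in several directions. The variable names are illustrative only.

```python
import math

quadrant_1 = math.atan2(1.0, 1.0)      # both positive: pi/4
quadrant_3 = math.atan2(-1.0, -1.0)    # both negative: -3*pi/4
plain_atan = math.atan(-1.0 / -1.0)    # the signs cancel: pi/4 again

straight_up = math.atan2(1.0, 0.0)     # x == 0, y > 0: pi/2, no division
straight_down = math.atan2(-1.0, 0.0)  # x == 0, y < 0: -pi/2
```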
ceil
ceil(x)
Returns float( i ), where i is the lowest integer such that i >= x.
math only
cos
cos(x)
Returns the cosine of x in radians.
End of preview. The rest of this section is not available here.
The operator Module
The operator module supplies functions that are equivalent to Python's operators. These functions are handy in cases where callables must be stored, passed as arguments, or returned as function results. The functions in operator have the same names as the corresponding special methods (covered in "Special Methods" on page 104). Each function is available with two names, with and without leading and trailing double underscores (e.g., both operator.add(a,b) and operator.__add__(a,b) return a + b). Table 15-1 lists the functions supplied by the operator module.
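A short sketch shows the kind of use the paragraph has in mind: operator functions stored in a dictionary and passed as arguments. It assumes functools.reduce (the import location in later Python versions) and sticks to functions that should exist across versions; the names handlers and apply_op are made up.

```python
import operator
from functools import reduce   # import location in later versions (assumption)

# Passed as an argument: sum a list without writing a lambda.
total = reduce(operator.add, [1, 2, 3, 4])

# Both spellings name the same operation.
same_result = operator.add(2, 3) == operator.__add__(2, 3)

# Stored in a data structure and looked up at runtime.
handlers = {'+': operator.add, '//': operator.floordiv, 'in': operator.contains}

def apply_op(symbol, a, b):
    """Apply the operator function registered under symbol."""
    return handlers[symbol](a, b)

seven = apply_op('+', 3, 4)
three = apply_op('//', 7, 2)
found = apply_op('in', [1, 2, 3], 2)   # contains(a, b) behaves like b in a
```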
Table 15-1: Functions supplied by the operator module
Method       Signature            Behaves like
abs          abs(a)               abs(a)
add          add(a,b)             a + b
and_         and_(a,b)            a & b
concat       concat(a,b)          a + b
contains     contains(a,b)        b in a
countOf      countOf(a,b)         a.count(b)
delitem      delitem(a,b)         del a[b]
delslice     delslice(a,b,c)      del a[b:c]
div          div(a,b)             a / b
eq           eq(a,b)              a == b
floordiv     floordiv(a,b)        a // b
ge           ge(a,b)              a >= b
getitem      getitem(a,b)         a[b]
getslice     getslice(a,b,c)      a[b:c]
gt           gt(a,b)              a > b
indexOf      indexOf(a,b)         a.index(b)
invert, inv  invert(a), inv(a)    ~a
is_          is_(a,b)             a is b
is_not       is_not(a,b)          a is not b
le           le(a,b)              a <= b
lshift       lshift(
End of preview. The rest of this section is not available here.
Random and Pseudorandom Numbers
The random module of the standard Python library generates pseudorandom numbers with various distributions. The underlying uniform pseudorandom generator uses the Mersenne Twister algorithm, with a period of length 2**19937-1. (Older versions of Python used the Wichmann-Hill algorithm, with a period of length 6,953,607,871,644.)

Section 15.3.1.1: Physically random and cryptographically strong random numbers

Pseudorandom numbers provided by module random, while very good, are not of cryptographic quality. If you want higher-quality random numbers (ideally, physically generated random numbers rather than algorithmically generated pseudorandom numbers), in Python 2.4, you can call os.urandom (from module os, not random).
urandom
urandom(n)
Description
Returns n random bytes, read from physical sources of random bits such as /dev/urandom on recent Linux releases or from cryptographical-strength sources such as the CryptGenRandom API on Windows. If no suitable source exists on the current system, urandom raises NotImplementedError.
For an alternative source of physically random numbers, see http://www.fourmilab.ch/hotbits.
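A minimal sketch of calling os.urandom follows. The 16-byte length and the hex encoding step (via binascii.hexlify) are arbitrary choices made for the illustration.

```python
import binascii
import os

token = os.urandom(16)                 # 16 random bytes from the OS source
hex_token = binascii.hexlify(token)    # printable form: two hex digits per byte
```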

Section 15.3.1.2: The random module

All functions of module random are methods of one hidden global instance of class random.Random. You can instantiate Random explicitly to get multiple generators that do not share state. Explicit instantiation is advisable if you require random numbers in multiple threads (threads are covered in Chapter 14). This section documents the most frequently used functions exposed by module random.
choice
choice(seq)
Description
Returns a random item from nonempty sequence seq.
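The difference between the hidden global instance and explicit Random instances can be sketched as follows. The seed value 42 and the names gen_a and gen_b are arbitrary, and passing a seed to the constructor is an assumption not covered in this preview.

```python
import random

# Two explicit generators with the same seed, each with private state.
gen_a = random.Random(42)
gen_b = random.Random(42)

seq_a = [gen_a.random() for _ in range(3)]
random.random()     # uses the hidden global instance, not gen_a or gen_b
seq_b = [gen_b.random() for _ in range(3)]

# Module-level functions such as choice are also methods of each instance.
colors = ['red', 'green', 'blue']
pick = gen_a.choice(colors)
```

Because gen_b does not share state with the global instance, the intervening call to random.random() leaves its sequence identical to gen_a's.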
getrandbits
getrandbits(k)
Description
Returns a nonnegative long integer with k random bits, like choice(xrange(2**k)) (but much faster and with no problems for large
End of preview. The rest of this section is not available here.
The decimal Module
A Python float is a binary floating-point number, normally in accordance with the standard known as IEEE 754 and implemented in hardware in modern computers. A concise, practical introduction to floating-point arithmetic and its issues can be found in David Goldberg's essay "What Every Computer Scientist Should Know about Floating-Point Arithmetic," at http://docs.sun.com/source/806-3568/ncg_goldberg.html. Often, particularly for money-related computations, you may prefer to use decimal floating-point numbers; Python 2.4 supplies an implementation of the standard known as IEEE 854, for base 10, in standard library module decimal. At http://docs.python.org/lib/module-decimal.html, you can find complete reference documentation, pointers to the applicable standards, a tutorial, and an advocacy for decimal. Here, I cover only a small subset of decimal's functionality that corresponds to the most frequently used parts of the module.
Module decimal supplies a class Decimal whose immutable instances are decimal numbers, exception classes, and classes and functions to deal with the arithmetic context, which specifies such things as precision, rounding, and which computational anomalies (such as division by zero, overflow, underflow, and so on) will raise exceptions if they occur. In the default context, precision is 28 decimal digits, rounding is "half-even" (round results to the closest representable decimal number: when a result is exactly halfway between two such numbers, round to the one whose last digit is even), and anomalies that raise exceptions are invalid operation, division by zero, and overflow.
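The default-context settings listed above can be inspected with a short sketch. It builds Decimal values from strings, assumes that a freshly created Context( ) carries the default settings, and uses the quantize method, which this preview does not describe, to show half-even rounding.

```python
from decimal import Context, Decimal, DivisionByZero, ROUND_HALF_EVEN

ctx = Context()                 # assumed to carry the default settings
precision = ctx.prec            # number of significant decimal digits
rounding = ctx.rounding

# Decimal arithmetic is exact where binary floats are not.
exact_sum = Decimal('0.1') + Decimal('0.2')

# Half-even: a tie goes to the neighbor whose last digit is even.
two = Decimal('2.5').quantize(Decimal('1'), rounding=ROUND_HALF_EVEN)
four = Decimal('3.5').quantize(Decimal('1'), rounding=ROUND_HALF_EVEN)

# Division by zero is one of the anomalies that raise an exception.
try:
    ctx.divide(Decimal(1), Decimal(0))
    trapped = False
except DivisionByZero:
    trapped = True
```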
To build a decimal number, call decimal.Decimal with one argument: an integer or a string. If you start with a float, you must pass to Decimal a string form of that float to control all the digits involved. For example, decimal.Decimal(0.1) is an error; use
End of preview. The rest of this section is not available here.
The gmpy Module
The gmpy module (http://gmpy.sourceforge.net) wraps the GMP library (http://www.swox.com/gmp/) to extend and accelerate Python's abilities for multiple-precision arithmetic, or arithmetic in which the precision of the numbers involved is bounded only by the amount of memory available. Python "out of the box" supplies multiple-precision arithmetic for integers through the built-in type long, covered in Chapter 4; gmpy supplies another integer-number type, named mpz, which affords even faster operations than Python's built-in long, and other functions and methods for a vast variety of fast number-theoretical computations (Fibonacci numbers, factorials, binomial coefficients, probabilistic determination of primality, etc.) and bit-string operations. gmpy also supplies a rational-number type (named mpq), a floating-point-number type with arbitrary precision (named mpf), and fast random-number generators.
gmpy's reference documentation is part of the gmpy-sources tarball, which you can download from http://sourceforge.net/projects/gmpy/; on the same page, you will find precompiled, ready-to-install downloads for Windows and Mac OS X 10.4 versions of Python (2.3 and 2.4 at the time of this writing). You need to download and unpack the sources package anyway, even if you're installing a precompiled version, because the sources package is the only one that includes the documentation (which you'll find in subdirectory doc once you have unpacked the tarball). Further, in subdirectory test there are over 1,000 unit tests to verify the complete correctness of your installation (run python test/gmpy_test.py, at a command prompt in the directory into which you've unpacked the tarball, to run all tests; it will take only a few seconds) and a few timing examples to show you the performance, on your specific machine, of gmpy types compared to Python's built-in ones. For example, on my laptop, by running
End of preview. The rest of this section is not available here.
Chapter 16: Array Processing
You can represent arrays with lists (covered in "Lists" on page 43), as well as with the array standard library module (covered in "The array Module" on page 375). You can manipulate arrays with loops; list comprehensions; iterators; generators; genexps (all covered in Chapter 4); built-ins such as map, reduce, and filter (all covered in "Built-in Functions" on page 158); and standard library modules such as itertools (covered in "The itertools Module" on page 183). However, to process large arrays of numbers, such functions may be slower and less convenient than extensions such as Numeric, numarray, and numpy (covered in "Extensions for Numeric Array Computation" on page 377).
End of preview. The rest of this section is not available here.
The array Module
The array module supplies a type, also called array, whose instances are mutable sequences, like lists. An array a is a one-dimensional sequence whose items can be only characters, or only numbers of one specific numeric type, fixed when you create a.
array.array's main advantage is that, compared to a list, it can save memory to hold objects all of the same (numeric or character) type. An array object a has a one-character, read-only attribute a.typecode, which is set on creation and gives the type of a's items. Table 16-1 shows the possible typecodes for array.
Table 16-1: Typecodes for the array module
Typecode  C type          Python type         Minimum size
'c'       char            str (length 1)      1 byte
'b'       char            int                 1 byte
'B'       unsigned char   int                 1 byte
'u'       unicode char    unicode (length 1)  2 bytes
'h'       short           int                 2 bytes
'H'       unsigned short  int                 2 bytes
'i'       int             int                 2 bytes
'I'       unsigned        long                2 bytes
'l'       long            int                 4 bytes
'L'       unsigned long   long                4 bytes
'f'       float           float               4 bytes
'd'       double          float               8 bytes
The size in bytes of each item may be larger than the minimum, depending on the machine's architecture, and is available as the read-only attribute a.itemsize. Module array supplies just the type object called array.
array
array(typecode,init='')
Description
Creates and returns an array object a with the given typecode. init can be a plain string whose length is a multiple of itemsize; the string's bytes, interpreted as machine values, directly initialize a's items. Alternatively, init can be any iterable (of characters when typecode is 'c', otherwise of numbers): each item of the iterable initializes one item of a.
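A minimal sketch of building arrays from iterables follows. It uses only the numeric typecodes 'd' and 'i', and it assumes that appending a float to an integer array raises TypeError, which this text does not state; the names a and squares are illustrative.

```python
import array

# Each item of the iterable initializes one item of the array.
a = array.array('d', [1.0, 2.5, 4.0])
squares = array.array('i', (n * n for n in range(5)))

a.append(8.0)               # arrays are mutable sequences, like lists

kind = a.typecode           # one-character code fixed at creation
size = a.itemsize           # bytes per item, at least the table's minimum

try:
    squares.append(1.5)     # not a value of the array's item type
    rejected = False
except TypeError:
    rejected = True
```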
Array objects expose all the methods and operations of mutable sequences (as covered in "Sequence Operations" on page 53), except method sort. Concatenation with + or +=, and assignment to slices, require both operands to be arrays with the same typecode; in contrast, the argument to
End of preview. The rest of this section is not available here.
Extensions for Numeric Array Computation
From http://sourceforge.net/project/showfiles.php?group_id=1369, you can freely download any of three extension packages that are compatible with each other: Numeric, Numarray, and NumPy. Each is available either as source code (easy to build and install on many platforms) or as a pre-built self-installing .exe file for Windows; some are also available in other pre-built forms, such as .rpm files for Linux or .dmg files for Apple Mac OS X. From the same URL, you can also download an extensive tutorial on Numeric and find links to other resources, such as bug trackers, mailing lists, and the Python Scientific Computing home page (http://numeric.scipy.org/).
Each of these extensions focuses on processing large arrays of numbers, which are often multidimensional (such as matrices). High-performance support for advanced computations such as linear algebra, Fast Fourier Transforms, and image processing, is supplied by many auxiliary modules, some of which come with the extension itself, while others can be downloaded separately from other sites. Each of the extensions is a large, rich package. For a fuller understanding, study the tutorial, work through the examples, and experiment interactively. This chapter presents a reference to an essential subset of Numeric on the assumption that you already have some grasp of array manipulation and numeric computing issues. If you are unfamiliar with this subject, the Numeric tutorial can help.
Numeric is not under active development anymore; it is widely considered "stable" by its users and "old" by its detractors. numarray is newer and richer, still under active development, and well documented and supported at its home site, http://www.stsci.edu/resources/software_hardware/numarray, where you will also find pointers to abundant, excellent documentation. NumPy is newest, richest, and under very active development (not quite up to a stable 1.0 release at the time of this writing); in the future, as it matures, you can confidently expect that
End of preview. The rest of this section is not available here.
The Numeric Package
The main module in the Numeric package is the Numeric module, which supplies the array type, functions that act on array instances, and so-called "universal functions" that operate on arrays and other sequences. Numeric is one of the few Python packages that is often used with the idiom from Numeric import *, even though that idiom does give occasional problems even in this case. A popular alternative, probably the best compromise between conciseness and clarity, is to import Numeric with a short name (e.g., import Numeric as N) and qualify each name by preceding it with N.
End of preview. The rest of this section is not available here.
Array Objects
Numeric supplies a type array that represents a grid of items. An array object a has a given number of dimensions, known as its rank, up to some arbitrarily high limit (normally 30, when Numeric is built with default options). A scalar (i.e., a single number) has rank 0, a vector has rank 1, a matrix has rank 2, and so forth.
The values in the grid cells of an array object, known as the elements of the array, are homogeneous, meaning they are all of the same type, and all element values are stored within one memory area. This contrasts with a list, where items may be of different types, each stored as a separate Python object. This means a Numeric array occupies far less memory than a Python list with the same number of items. The type of a's elements is encoded as a's typecode, a one-character string, as shown in Table 16-2. Factory functions that build array instances (covered in "Factory Functions" on page 384) take a typecode argument that is one of the values in Table 16-2.
Table 16-2: Typecodes for Numeric arrays
Typecode  C type          Python type     Synonym
'c'       char            str (length 1)  Character
'b'       unsigned char   int             UnsignedInt8
'1'       signed char     int             Int8
's'       short           int             Int16
'w'       unsigned short  int             UnsignedInt16
'i'       int             int             Int32
'u'       unsigned        int             UnsignedInt32
'l'       long            int             Int
'f'       float           float           Float32
'F'       Two floats      complex         Complex32
'd'       double          float           Float
'D'       Two doubles     complex         Complex
'O'       PyObject*       any             PyObject
Numeric supplies readable attribute names for each typecode, as shown in the last column of Table 16-2. Numeric also supplies, on all platforms, the names Int0, Float0, Float8, Float16, Float64, Complex0, Complex8, Complex16, and Complex64. In each case, the name refers to the smallest type of the requested kind with at least that many bits. For example, Float8 is the smallest floating-point type of at least 8 bits (generally the same as
Universal Functions (ufuncs)
Numeric supplies named functions with the same semantics as Python's arithmetic, comparison, and bitwise operators, and mathematical functions like those supplied by built-in modules math and cmath (covered in "The math and cmath Modules" on page 365), such as sin, cos, log, and exp.
These functions are objects of type ufunc (which stands for "universal function") and share several traits in addition to those they have in common with array operators (element-wise operation, broadcasting, coercion). Every ufunc instance u is callable, is applicable to sequences as well as to arrays, and accepts an optional output argument. If u is binary (i.e., if u accepts two operand arguments), u also has four callable attributes, named u.accumulate, u.outer, u.reduce, and u.reduceat. The ufunc objects supplied by Numeric apply only to arrays with numeric typecodes (i.e., not to arrays with typecode 'O' or 'c') and Python sequences of numbers.
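To give a feel for three of the callable attributes named above, here is a minimal pure-Python sketch of what reduce, accumulate, and outer compute for rank-1 data. It does not use Numeric at all: the helper names reduce_, accumulate_, and outer_ are hypothetical, the code assumes a current Python 3 interpreter, and it ignores broadcasting, typecodes, and higher ranks (reduceat is left out).

```python
import functools
import itertools
import operator


def reduce_(op, seq):
    """Combine all items with op, as u.reduce does on a rank-1 array."""
    return functools.reduce(op, seq)


def accumulate_(op, seq):
    """Running results of op, as u.accumulate does on a rank-1 array."""
    return list(itertools.accumulate(seq, op))


def outer_(op, xs, ys):
    """Table of op(x, y) for every pair of items, as u.outer does."""
    return [[op(x, y) for y in ys] for x in xs]


data = [1, 2, 3, 4]
print(reduce_(operator.add, data))                 # 10
print(accumulate_(operator.add, data))             # [1, 3, 6, 10]
print(outer_(operator.mul, [1, 2], [10, 20, 30]))  # [[10, 20, 30], [20, 40, 60]]
```

With a real ufunc such as Numeric's add, the corresponding calls would be the attributes on the ufunc itself; the lists above only stand in for the arrays those calls return.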
When you start with a list L, it's faster to call u directly on L rather than to convert L to an array. u's return value is an array a; you can perform further computation, if any, on a; if you need a list result, convert the resulting array to a list at the end by calling method tolist. For example, say you must compute the logarithm of each item of a list and return another list. On my laptop, with N set to 2222, a list comprehension such as:
import math

def logsupto(N):
    return [math.log(x) for x in range(2,N)]
takes about 5.2 milliseconds. Using Python's built-in map:
def logsupto(N):
    return map(math.log, range(2,N))
is faster, about 3.7 milliseconds. Using Numeric's ufunc named log:
def logsupto(N):
    return Numeric.log(Numeric.arange(2,N)).tolist()
reduces the time to about 2.1 milliseconds. Taking some care to exploit the output argument to the log ufunc:
def logsupto(N):
    temp = Numeric.arange(2, N, typecode=Numeric.Float)
    Numeric.log(temp, output=temp)
    return temp.tolist()
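Timings like the ones quoted above depend heavily on the machine and the Python version, so they are worth measuring again rather than taking on trust. The following is a minimal sketch of such a measurement using only the standard library's timeit module. It assumes a current Python 3 interpreter (where map returns an iterator, hence the list call) and leaves Numeric out; the names logs_listcomp, logs_map, and best_ms, and the repeat counts, are illustrative choices rather than anything from the book.

```python
import math
import timeit

N = 2222  # same upper bound as in the text


def logs_listcomp(n):
    """Logarithms of 2..n-1 via a list comprehension."""
    return [math.log(x) for x in range(2, n)]


def logs_map(n):
    """Same result via map; list() is needed on Python 3."""
    return list(map(math.log, range(2, n)))


def best_ms(func, n, number=100, repeat=5):
    """Best per-call time in milliseconds over several repeats."""
    timer = timeit.Timer(lambda: func(n))
    return min(timer.repeat(repeat=repeat, number=number)) / number * 1000.0


if __name__ == "__main__":
    for f in (logs_listcomp, logs_map):
        print("%-14s %.3f ms" % (f.__name__, best_ms(f, N)))
```

Taking the minimum over several repeats is a common way to reduce noise from other processes; the absolute numbers will differ from those in the text.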
Auxiliary Numeric Modules
Many other modules are built on top of Numeric or cooperate with it. Some of these extra modules are included in the Numeric package, and their documentation is also part of Numeric's documentation. A rich collection of scientific and engineering computing tools that work with Numeric is available at http://www.scipy.org; have a look at it if you are using Python for any kind of scientific or engineering computing.
Here are the key auxiliary modules that come with Numeric:
MLab
MLab supplies many Python functions, written on top of Numeric, that are similar in name and operation to functions supplied by the product Matlab.
FFT
FFT supplies Python-callable Fast Fourier Transforms (FFTs) of data held in Numeric arrays. FFT can wrap either the well-known FFTPACK Fortran-coded library or the compatible C-coded fftpack library, which comes with FFT.
LinearAlgebra
LinearAlgebra supplies Python-callable functions, operating on data held in Numeric arrays, wrapping either the LAPACK Fortran-coded library or the compatible C-coded lapack_lite library. LinearAlgebra lets you invert matrices, solve linear systems, compute eigenvalues and eigenvectors, perform singular value decomposition, and least-squares-solve overdetermined linear systems.
RandomArray
RandomArray supplies fast, high-quality pseudorandom number generators to build Numeric arrays with various random distributions.
MA
MA supports masked arrays (i.e., arrays that can have missing or invalid values). MA supplies a large subset of Numeric's functionality, albeit sometimes at reduced speed. MA also lets you associate to each array an optional mask, which is an auxiliary array of Booleans, where True indicates array elements that are missing, unknown, or invalid. Computations propagate masks; you can turn masked arrays into plain
Chapter 17: Tkinter GUIs
Most professional client-side applications interact with the user through a graphical user interface (GUI). A GUI is programmed through a toolkit, which is a library that supplies controls (also known as widgets), visible objects such as buttons, labels, text entry fields, and menus. A GUI toolkit lets you compose controls into a coherent whole, display them on-screen, and interact with the user, receiving input via keyboard and mouse.
Python gives you a choice among many GUI toolkits. Some are platform-specific, but most are cross-platform, supporting at least Windows and Unix-like platforms and often the Mac as well. http://wiki.python.org/moin/GuiProgramming lists dozens of GUI toolkits for Python. The most popular Python GUI toolkit today is probably wxPython (http://www.wxpython.org/), but the one distributed with Python itself is Tkinter.
Tkinter is an object-oriented Python wrapper around the cross-platform toolkit Tk, which is also used with other scripting languages such as Tcl (for which it was originally developed), Ruby, and Perl. Tkinter, like the underlying Tcl/Tk, runs on Windows, Macintosh, and Unix-like platforms. On Windows, the standard Python distribution also includes, as well as Tkinter itself, the Tcl/Tk library needed to run Tkinter. On other platforms, you may have to obtain and install Tcl/Tk separately (you may also have to install or reinstall Python after Tcl/Tk, depending on the Python distribution's details).
This chapter covers an essential subset of Tkinter that is sufficient to build simple graphical frontends for Python applications. (More complete documentation is available at http://docs.python.org/lib/tkinter.html.) All the scripts in this chapter are meant to be run standalone (i.e., from a command line, or in a platform-dependent way, such as by double-clicking on a script's icon). Running a GUI script from inside another program that has its own GUI, such as a Python integrated development environment (e.g., IDLE or PythonWin), can often cause anomalies. This can be a particular problem when the GUI script attempts to terminate (and thus close down the GUI), since the script's GUI and the development environment's GUI may interfere with each other.
Tkinter Fundamentals
Tkinter makes it easy to build simple GUI applications. You import Tkinter, create, configure, and position the widgets you want, then enter the Tkinter main loop. Your application becomes event-driven: the user interacts with the widgets, causing events, and your application responds via the functions you installed as handlers for these events.
The following example shows a simple application that exhibits this general structure:
import sys, Tkinter
Tkinter.Label(text="Welcome!").pack()
Tkinter.Button(text="Exit", command=sys.exit).pack()
Tkinter.mainloop()
The calls to Label and Button create the respective widgets and return them as results. Since we specify no parent windows, Tkinter puts the widgets directly in the application's main window. The named arguments specify each widget's configuration. In this simple case, we don't need to bind variables to the widgets. We just call the pack method on each widget, handing control of the widget's layout to a layout manager object known as the packer. A layout manager is an invisible component whose job is to position widgets within other widgets (known as container or parent widgets), handling geometrical layout issues. The previous example passes no arguments to control the packer's operation, which lets the packer operate in a default way.
When the user clicks on the button, the command callable of the Button widget executes without arguments. The example passes function sys.exit as the argument named command when it creates the Button. Therefore, when the user clicks on the button, sys.exit( ) executes and terminates the application (as covered in exit on page 169).
After creating and packing the widgets, the example calls Tkinter's mainloop function, and thus enters the Tkinter main loop and becomes event-driven. Since the only event for which the example installs a handler is a click on the button, nothing happens from the application's viewpoint until the user clicks the button. Meanwhile, however, the
Widget Fundamentals
Tkinter supplies many kinds of widgets, and most of them have several things in common. All widgets are instances of classes that inherit from class Widget. Class Widget itself is abstract; that is, you never instantiate Widget itself. You only instantiate concrete subclasses corresponding to specific kinds of widgets. Class Widget's functionality is common to all the widgets you instantiate.
To instantiate any kind of widget, call the widget's class. The first argument is the parent window of the widget, also known as the widget's master. If you omit this positional argument, the widget's master is the application's main window. All other arguments are in named form: option=value. To set or change options on an existing widget w, call w.config(option=value). To get an option of w, call w.cget('option'), which returns the option's value. Each widget w is a mapping, so you can also get an option as w['option'], and set or change it with w['option']=value.
Many widgets accept some common options. Some options affect a widget's colors, while others affect lengths (normally in pixels), and there are various other kinds. This section details the most commonly used options.

Section 17.2.1.1: Color options

Tkinter represents colors with strings. The string can be a color name, such as 'red' or 'orange', or it may be of the form '#RRGGBB', where each of R, G, and B is a hexadecimal digit, so that each pair of digits gives the value of the red, green, or blue component on a scale of 0 to 255. Don't worry if your screen can't display millions of different colors, as implied by this scheme: Tkinter maps any requested color to the closest color that your screen can display. The common color options are:
activebackground
Background color for the widget when the widget is
Commonly Used Simple Widgets
The Tkinter module provides a number of simple widgets that cover most of the needs of simple GUI applications. This section documents the Button, Checkbutton, Entry, Label, Listbox, Radiobutton, Scale, and Scrollbar widgets.
Class Button implements a pushbutton, which the user clicks to execute an action. Instantiate Button with option text=somestring to let the button show text, or image=imageobject to let the button show an image. You normally also use command=callable to have callable execute without arguments when the user clicks the button. callable can be a function, a bound method of an object, an instance of a class with a __call__ method, or a lambda.
Besides methods common to all widgets, an instance b of class Button supplies two button-specific methods.
flash
b.flash( )
Description
Draws the user's attention to button b by redrawing b a few times, alternately in normal and active states.
invoke
b.invoke( )
Description
Calls without arguments the callable object that is b's command option, just like b.cget('command')(). This can be handy when, within some other action, you want the program to act just as if the button had been clicked.
Class Checkbutton implements a checkbox, which is a little box, optionally displaying a checkmark, that the user clicks to toggle on or off. Instantiate Checkbutton with exactly one of the two options text=somestring to label the box with text, or image=imageobject to label the box with an image. Optionally, use command=callable to have callable execute without arguments when the user clicks the box. callable can be a function, a bound method of an object, an instance of a class with a
Container Widgets
The Tkinter module supplies widgets whose purpose is to contain other widgets. A Frame instance does nothing else but act as a container. A Toplevel instance (including Tkinter's root window, also known as the application's main window) is a top-level window, so your window manager interacts with it (typically by supplying suitable decoration and handling requests). To ensure that a widget parent, which must be a Frame or Toplevel instance, is the parent (a.k.a. the master) of another widget child, pass parent as the first parameter when you instantiate child.
Class Frame is a rectangular area of the screen contained in other frames or top-level windows. Frame's only purpose is to contain other widgets. Option borderwidth defaults to 0, so an instance of Frame normally displays no border. You can configure the option with borderwidth=1 if you want the frame border's outline to be visible.
Class Toplevel represents a rectangular area of the screen that is a top-level window and therefore receives decoration from whichever window manager handles your screen. Each instance of Toplevel interacts with the window manager and contains other widgets. Every Tkinter program has at least one top-level window, known as the root window. You can instantiate Tkinter's root window explicitly with root = Tkinter.Tk(); otherwise, Tkinter instantiates its root window implicitly as and when it is first needed. If you want more than one top-level window, instantiate the main one with root = Tkinter.Tk(). Later, instantiate other top-level windows as needed, with calls such as another_toplevel = Tkinter.Toplevel().
An instance T of class Toplevel supplies many methods that enable interaction with the window manager. Many are platform-specific, relevant only with some window managers for the X Window System (used mostly on Unix-like systems). The cross-platform methods used most often are as follows.
Menus
Class Menu implements all kinds of menus: menu bars of top-level windows, submenus, and pop-up menus. To use a Menu instance m as the menu bar for a top-level window w, set w's configuration option menu=m. To use m as a submenu of a Menu instance x, call x.add_cascade with menu=m. To use m as a pop-up menu, call m.post.
Besides configuration options covered in "Common Widget Options" on page 409, a Menu instance m supports option postcommand=callable. Tkinter calls callable without arguments each time it is about to display m (because of a call to m.post or because of user actions). Use this option to update a dynamic menu just in time when necessary.
By default, a Tkinter menu shows a tear-off entry (a dashed line before other entries), which lets the user get a copy of the menu in a separate Toplevel window. Since such tear-offs are not part of user interface standards on popular platforms, you may want to disable tear-off functionality by using configuration option tearoff=0 for the menu.
Besides methods common to all widgets, an instance m of class Menu supplies several menu-specific methods.
add, add_cascade, add_checkbutton, add_command, add_radiobutton, add_separator
m.add(entry_kind, **entry_options)
Description
Adds after m's existing entries a new entry whose kind is the string entry_kind, one of 'cascade', 'checkbutton', 'command', 'radiobutton', or 'separator'. "Menu Entries" on page 425 covers entry kinds and options.
Methods whose names start with add_ work like method add, but accept no positional argument; the kind of entry each method adds is implied by the method's name.
delete
m.delete(i[,j])
Description
m.delete(i) removes m's entry i. m.delete(
The Text Widget
Class Text implements a powerful multiline text editor, which can display images and embedded widgets as well as text in one or more fonts and colors. An instance t of Text supports many ways to refer to specific points in t's contents. t supplies methods and configuration options that allow fine-grained control of operations, content, and rendering. This section covers a large, frequently used subset of this vast functionality. In some very simple cases, you can get by with just three Text-specific idioms:
t.delete('1.0', END)             # clear the widget's contents
t.insert(END, astring)           # append astring to the widget's contents
somestring = t.get('1.0', END)   # get the widget's contents as a string
END is an index on any Text instance t, indicating the end of t's text. '1.0' is also an index, indicating the start of t's text (first line, first column). For more about indices, see "Indices" on page 432.
The ScrolledText module of Python's standard library supplies a class named ScrolledText. To construct a ScrolledText instance, call ScrolledText.ScrolledText in exactly the same way you would call Tkinter.Text. A ScrolledText instance s is exactly the same as a Text instance, except that s automatically provides a scrollbar for the Text instance it wraps.
An instance t of Text supplies many methods. (Methods dealing with marks and tags are covered in "Marks" on page 428 and "Tags" on page 429.) Many methods accept one or two indices into t's contents. The most frequently used methods are the following.
delete
t.delete(i[,j])
Description
t.delete(i) removes t's character at index i. t.delete(i, j) removes all characters from index i up to, but not including, index j.
get
t.get(i[,j])
Description
The Canvas Widget
Class Canvas is a powerful, flexible widget used for many purposes, including plotting and, in particular, building custom widgets. Building custom widgets is an advanced topic, and I do not cover it further in this book. This section covers only a subset of Canvas functionality used for the simplest kind of plotting.
Coordinates within a Canvas instance c are in pixels, with origin at the upper-left corner of c, and positive coordinates growing rightward and down. Some advanced methods let you change c's coordinate system, but I do not cover them in this book.
What you draw on a Canvas instance c are canvas items: lines, polygons, Tkinter images, arcs, ovals, texts, and others. Each item has an item handle by which you can refer to the item. You can also assign symbolic names called tags to sets of canvas items (the sets of items with different tags can overlap). ALL is a predefined tag that applies to all items; CURRENT is a predefined tag that applies to the item under the mouse pointer.
Tags on a Canvas are different from tags on a Text. Canvas tags are nothing more than sets of items, with no independent existence. When you perform any operation with a Canvas tag as the item identifier, the operation occurs on the items that are in the tag's current set. It makes no difference if items are later removed from or added to that tag.
To create a canvas item, call on c a method with a name of the form create_kindofitem, which returns the new item's handle. Methods itemcget and itemconfig of c let you get and change items' options.
A Canvas instance c supplies methods that you can call on items. Unless otherwise indicated in the method's description, the item argument can be an item's handle (as returned, for example, by c.create_line) or a tag, meaning all items in that tag's set (no items at all if the tag's set is currently empty).
Layout Management
In all examples so far, we have made widgets visible via method pack. This is typical of real-life Tkinter usage. However, two other layout managers are sometimes useful. This section covers all three layout managers provided by Tkinter: the packer, gridder, and placer. Never mix layout managers for the same container widget: all children of a given container widget must be handled by the same layout manager, or very strange effects (including Tkinter going into infinite loops) may result.
Calling method pack on a widget delegates widget layout management to a simple, flexible layout manager known as the Packer. The Packer sizes and positions widgets within a container (parent) widget according to each widget's space needs (including padx and pady). Each widget w supplies the following Packer-related methods.
pack
w.pack(**pack_options)
Description
Delegates layout management to the packer. pack_options may include:
expand
When true, w expands to fill any space not otherwise used in w's parent.
fill
Determines whether w fills any extra space allocated to it by the packer or keeps its own minimal dimensions: NONE (default), X (fill only horizontally), Y (fill only vertically), or BOTH (fill both horizontally and vertically).
side
Determines which side of the parent w packs against: TOP (default), BOTTOM, LEFT, or RIGHT. To avoid confusion, don't mix different values for option side= in widgets that are children of the same container. When more than one child requests the same side (for example, TOP), the rule is first come, first served: the first child packs at the top, the second child packs second from the top, and so on.
pack_forget
w
Tkinter Events
So far, we've seen only one kind of event handling: callbacks performed on callables set with the command= option of buttons and menu entries. Tkinter also lets you set callables to handle a variety of events. Tkinter does not let you create custom events: you are limited to working with events predefined by Tkinter itself.
General event callbacks must accept one argument event that is a Tkinter event object. Such an event object has several attributes that describe the event:
char
A single-character string that is the key's code (only for keyboard events)
keysym
A string that is the key's symbolic name (only for keyboard events)
num
Button number (only for mouse-button events); 1 and up
x, y
Mouse position, in pixels, relative to the upper-left corner of the widget
x_root, y_root
Mouse position, in pixels, relative to the upper-left corner of the screen
widget
The widget in which the event has occurred
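As a rough illustration of how a callback reads these attributes, the sketch below defines two handlers and exercises them with stand-in objects, so it runs without opening a window. It assumes a current Python 3 interpreter; on_key, on_click, and the SimpleNamespace stand-ins are hypothetical, and in a real program Tkinter would build the event object and pass it to whatever callback you bound.

```python
from types import SimpleNamespace


def on_key(event):
    """Handler for keyboard events: reads char and keysym."""
    message = "key %r (keysym %s)" % (event.char, event.keysym)
    print(message)
    return message


def on_click(event):
    """Handler for mouse-button events: reads num, x, and y."""
    message = "button %d at (%d, %d)" % (event.num, event.x, event.y)
    print(message)
    return message


# Stand-ins for the event objects Tkinter would pass to bound callbacks.
key_event = SimpleNamespace(char="a", keysym="a")
click_event = SimpleNamespace(num=1, x=40, y=12)

on_key(key_event)      # key 'a' (keysym a)
on_click(click_event)  # button 1 at (40, 12)
```

The handlers return their message only to make them easy to check in isolation; callbacks used purely for their side effects would normally return nothing.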
To bind a callback to an event in a widget w, call w.bind and describe the event with a string, usually enclosed in angle brackets ('<...>'). The following example prints 'Hello World' each time the user presses the Enter key:
from Tkinter import *

root = Tk()
def greet(*ignore): print 'Hello World'
root.bind('<Return>', greet)
root.mainloop()
Method tag_bind of classes Canvas and Text (covered in tag_bind on page 430 and tag_bind on page 437) binds event callbacks to sets of items of a Canvas instance, or ranges within a Text instance.
Common event names, almost all of which are enclosed in angle brackets, fall into a few categories.

Section 17.9.3.1: Keyboard events

Key
The user pressed any key. The event object's attribute char tells you which key, for normal keys only. Attribute keysym is equal to char for letters and digits, the character's name for punctuation, and the key's name for special keys.
Chapter 18: Testing, Debugging, and Optimizing
You're not finished with a programming task when you're done writing the code; you're finished when the code runs correctly and with acceptable performance. Testing (covered in "Testing" on page 452) means verifying that code runs correctly by exercising the code under known conditions and checking that results are as expected. Debugging (covered in "Debugging" on page 461) means discovering causes of incorrect behavior and repairing them (repair is often easy, once you figure out the causes).
Optimizing (covered in "Optimization" on page 474) is often used as an umbrella term for activities meant to ensure acceptable performance. Optimizing breaks down into benchmarking (measuring performance for given tasks to check that it's within acceptable bounds), profiling (instrumenting the program to identify performance bottlenecks), and optimizing proper (removing bottlenecks to make overall program performance acceptable). Clearly, you can't remove performance bottlenecks until you've found out where they are (using profiling), which in turn requires knowing that there are performance problems (using benchmarking).
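The difference between benchmarking and profiling can be sketched with two standard library tools. The example below is only an illustration under stated assumptions: it targets a current Python 3 interpreter, uses timeit for the benchmark and cProfile with pstats for the profile, and the functions slow_part, fast_part, and task are made up for the purpose.

```python
import cProfile
import io
import pstats
import timeit


def slow_part(n):
    """Deliberately heavier work: the likely bottleneck."""
    return sum(i * i for i in range(n))


def fast_part(n):
    """Trivial work that should barely show up in a profile."""
    return n * 2


def task():
    return slow_part(20000) + fast_part(20000)


def benchmark(repeat=3, number=20):
    """Benchmarking: best wall-clock seconds per call of task()."""
    return min(timeit.repeat(task, repeat=repeat, number=number)) / number


def profile_report():
    """Profiling: per-function statistics for one call of task()."""
    profiler = cProfile.Profile()
    profiler.runcall(task)
    out = io.StringIO()
    pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(10)
    return out.getvalue()


if __name__ == "__main__":
    print("task() takes about %.6f s per call" % benchmark())
    print(profile_report())
```

The benchmark answers "is this fast enough?", while the profile report shows which function accounts for the time and is therefore worth optimizing.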
This chapter covers the three subjects in the natural order in which they occur in development: testing first and foremost, debugging next, and optimizing last. However, most programmers' enthusiasm focuses on optimization: testing and debugging are often (wrongly, in my opinion) perceived as being chores, while optimization is perceived as being fun. As a consequence, if you were to read only one section of the chapter, I would suggest that section be "Developing a Fast-Enough Python Application" on page 474, which summarizes the Pythonic approach to optimization—close to Jackson's classic "Rules of Optimization: Rule 1: Don't do it. Rule 2 (for experts only): Don't do it yet."
All of these tasks are large and important, and each could fill at least a book by itself. This chapter does not even come close to exploring every related technique and implication; it focuses on Python-specific techniques, approaches, and tools.
Testing
In this chapter, I distinguish between two rather different kinds of testing: unit testing and system testing. Testing is a rich and important field, and many more distinctions could be drawn, but I focus on the issues of most immediate importance to software developers. Many developers are reluctant to spend time on testing, seeing it as time subtracted from "real" development, but this attitude is short-sighted: defects are easier to fix the earlier you find out about them, so each hour spent developing tests can amply pay back for itself by finding defects ASAP, thus saving many hours of debugging that would otherwise have been needed in later phases of the software development cycle.
Unit testing means writing and running tests to exercise a single module or an even smaller unit, such as a class or function. System testing (also known as functional or integration testing) involves running an entire program with known inputs. Some classic books on testing also draw the distinction between white-box testing, done with knowledge of a program's internals, and black-box testing, done without such knowledge. This classic viewpoint parallels, but does not exactly duplicate, the modern one of unit versus system testing.
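A unit test in this sense can be as small as the following sketch, which exercises a single function with the standard library's unittest module. The function mean and the test names are hypothetical examples, and the code assumes a current Python 3 interpreter.

```python
import unittest


def mean(values):
    """Arithmetic mean of a non-empty sequence of numbers."""
    if not values:
        raise ValueError("mean() requires at least one value")
    return sum(values) / float(len(values))


class MeanTest(unittest.TestCase):
    def test_typical(self):
        self.assertEqual(mean([1, 2, 3, 4]), 2.5)

    def test_single_item(self):
        self.assertEqual(mean([7]), 7.0)

    def test_empty_is_an_error(self):
        self.assertRaises(ValueError, mean, [])


def run_tests():
    """Run the test case and return the unittest result object."""
    suite = unittest.TestLoader().loadTestsFromTestCase(MeanTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    run_tests()
```

Each test method checks one known condition against an expected result, which is what makes a failure easy to localize to the unit under test.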
Unit and system testing serve different goals. Unit testing proceeds apace with development; you can and should test each unit as you're developing it. One modern approach is known as test-driven development (TDD): for each feature that your program must have, you first write unit tests and only then do you proceed to write code that implements the feature. TDD may seem upside-down, but it has several advantages. For example, it ensures that you won't omit unit tests for some feature. Further, developing test-first is helpful because it urges you to focus first on the tasks a certain function, class, or method should accomplish, and to deal only afterward with how to implement that function, class, or method. A recent important innovation along the lines of TDD is
Debugging
Since Python's development cycle is so fast, the most effective way to debug is often to edit your code so that it outputs relevant information at key points. Python has many ways to let your code explore its own state in order to extract information that may be relevant for debugging. The inspect and traceback modules specifically support such exploration, which is also known as reflection or introspection.
Once you have obtained debugging-relevant information, the print statement is often the simplest way to display it. You can also log debugging information to files. Logging is particularly useful for programs that run unattended for a long time, such as server programs. Displaying debugging information is just like displaying other kinds of information, as covered in Chapters 10 and 17. Logging such information is mostly like writing to files (as covered in Chapter 10) or otherwise persisting information, as covered in Chapter 11; however, to help with the specific task of logging, Python's standard library also supplies a logging module, covered in "The logging module" on page 136. As covered in excepthook on page 168, rebinding attribute excepthook of module sys lets your program log detailed error information just before your program is terminated by a propagating exception.
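The idea of rebinding sys.excepthook so that error details get logged can be sketched as follows. This is a minimal illustration assuming a current Python 3 interpreter: the logger name "myapp.crash", the in-memory log_stream (a stand-in for a real log file), and the functions log_uncaught and install_hook are all hypothetical.

```python
import io
import logging
import sys
import traceback

log_stream = io.StringIO()  # stand-in for a real log file
logger = logging.getLogger("myapp.crash")
logger.setLevel(logging.ERROR)
logger.addHandler(logging.StreamHandler(log_stream))
logger.propagate = False  # keep the example's output out of the root logger


def log_uncaught(exc_type, exc_value, exc_tb):
    """Log the full traceback, then defer to Python's default hook."""
    text = "".join(traceback.format_exception(exc_type, exc_value, exc_tb))
    logger.error("Uncaught exception:\n%s", text)
    sys.__excepthook__(exc_type, exc_value, exc_tb)


def install_hook():
    """Make log_uncaught the handler for propagating exceptions."""
    sys.excepthook = log_uncaught
```

After install_hook() has been called, any exception that propagates out of the main program is written to the log before the usual traceback is printed and the program terminates.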
Python also offers hooks that enable interactive debugging. Module pdb supplies a simple text-mode interactive debugger. Other interactive debuggers for Python are part of integrated development environments (IDEs), such as IDLE and various commercial offerings. However, I do not cover IDEs in this book.
Before you embark on possibly lengthy debugging explorations, make sure you have thoroughly checked your Python sources with the tools mentioned in Chapter 3. Such tools can catch only a subset of the bugs in your code, but they're much faster than interactive debugging, and so their use amply repays itself.
The warnings Module
Warnings are messages about errors or anomalies that may not be serious enough to be worth disrupting the program's control flow (as would happen by raising a normal exception). The warnings module affords fine-grained control over which warnings are output and what happens to them. You can conditionally output a warning by calling function warn in module warnings. Other functions in the module let you control how warnings are formatted, set their destinations, and conditionally suppress some warnings (or transform some warnings into exceptions).
Module warnings supplies several exception classes that represent warnings. Class Warning subclasses Exception and is the base class for all warnings. You may define your own warning classes; they must subclass Warning, either directly or via one of its other existing subclasses, which are:
DeprecationWarning
Uses deprecated features supplied only for backward compatibility
RuntimeWarning
Uses features whose semantics are error-prone
SyntaxWarning
Uses features whose syntax is error-prone
UserWarning
Other user-defined warnings that don't fit any of the above cases
Python supplies no concrete warning objects. A warning is composed of a message (a text string), a category (a subclass of Warning), and two pieces of information that identify where the warning was raised from: module (name of the module that raised the warning) and lineno (line number of the source code line that raised the warning). Conceptually, you may think of these as attributes of a warning object w, and I use attribute notation later for clarity, but no specific warning object w actually exists.
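The pieces described above (message, category, and location) can be observed with a short sketch. It assumes a current Python 3 interpreter and relies on warnings.catch_warnings, which was added to the standard library after the Python versions this book covers; in such versions the recorded entries do expose these pieces as attributes. ConfigWarning and load_setting are hypothetical names used only for illustration.

```python
import warnings


class ConfigWarning(UserWarning):
    """Hypothetical application-specific warning category."""


def load_setting(name, default):
    """Pretend lookup that warns instead of failing."""
    warnings.warn("setting %r missing, using default" % name,
                  ConfigWarning, stacklevel=2)
    return default


# Record warnings instead of printing them, to inspect their parts.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    value = load_setting("timeout", 30)

w = caught[0]
print(w.category.__name__, w.message, w.lineno)

# Turn the same category into an exception.
with warnings.catch_warnings():
    warnings.simplefilter("error", ConfigWarning)
    try:
        load_setting("timeout", 30)
    except ConfigWarning as exc:
        print("raised:", exc)
```

The second block shows the "transform some warnings into exceptions" behavior: with an "error" filter in place for the category, the call to warn raises instead of printing.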
At any time, module warnings keeps a list of active filters for warnings. When you import warnings for the first time in a run, the module examines sys.warnoptions to determine the initial set of filters. You can run Python with option
Optimization
"First make it work. Then make it right. Then make it fast." This quotation, often with slight variations, is widely known as "the golden rule of programming." As far as I've been able to ascertain, the quotation is by Kent Beck, who credits his father with it. Being widely known makes the principle no less important, particularly because it's more honored in the breach than in the observance. A negative form, slightly exaggerated for emphasis, is in a quotation by Don Knuth (who credits Hoare with it): "Premature optimization is the root of all evil in programming."
Optimization is premature if your code is not working yet, or if you're not sure about what, exactly, your code should be doing (since then you cannot be sure if it's working). First make it work. Optimization is also premature if your code is working but you are not satisfied with the overall architecture and design. Remedy structural flaws before worrying about optimization: first make it work, then make it right. These first two steps are not optional; working, well-architected code is always a must.
In contrast, you don't always need to make it fast. Benchmarks may show that your code's performance is already acceptable after the first two steps. When performance is not acceptable, profiling often shows that all performance issues are in a small part of the code, perhaps 10 to 20 percent of the code where your program spends 80 or 90 percent of the time. Such performance-crucial regions of your code are known as its bottlenecks, or hot spots. It's a waste of effort to optimize large portions of code that account for, say, 10 percent of your program's running time. Even if you made that part run 10 times as fast (a rare feat), your program's overall runtime would only decrease by 9 percent, a speedup no user would even notice. If optimization is needed, focus your efforts where they'll matter—on bottlenecks. You can optimize bottlenecks while keeping your code 100 percent pure Python, thus not preventing future porting to other Python implementations. In some cases, you can resort to recoding some computational bottlenecks as Python extensions (as covered in Chapter 25), potentially gaining even better performance (possibly at the expense of some potential future portability).
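The arithmetic behind the 9 percent figure can be checked with a few lines of Python. This is only a sketch; the function name overall_saving is made up for the illustration.

```python
def overall_saving(fraction, speedup):
    """Fraction of total runtime saved when a part that takes
    `fraction` of the runtime is made `speedup` times as fast."""
    return fraction - fraction / float(speedup)

# The text's example: code that takes 10 percent of the runtime, made 10 times as fast.
small_part = overall_saving(0.10, 10)

# The opposite case: a bottleneck that takes 80 percent of the runtime,
# made only twice as fast.
bottleneck = overall_saving(0.80, 2)
```

The first case saves 0.09 of the total runtime; the second, a far more modest speedup applied to a bottleneck, saves 0.40.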
Chapter 19: Client-Side Network Protocol Modules
A program can work on the Internet as a client (a program that accesses resources) or as a server (a program that makes services available). Both kinds of programs deal with protocol issues, such as how to access and communicate data, and with data-formatting issues. For order and clarity, the Python library deals with these issues in several different modules. This book covers these topics in several chapters. This chapter deals with the modules in the Python library that support protocol issues of client programs. Chapter 20 deals with lower-level modules such as socket, used in both client and server programs, and modules that support protocol issues in server programs. Data-format issues are covered in Chapters 22, 23, and 24. Chapter 21 deals specifically with server-side programs that produce web pages, either standalone or in cooperation with existing web servers such as Apache or IIS.
Data access can often be achieved most simply through Uniform Resource Locators (URLs). Python supports URLs with modules urlparse, urllib, and urllib2. For rarer cases, such as when you need fine-grained control of data access protocols normally accessed via URLs, Python supplies modules httplib and ftplib. Protocols for which URLs are often insufficient include mail (modules poplib and smtplib), Network News (module nntplib), and Telnet (module telnetlib). Python also supports the XML-RPC protocol for distributed computing with module xmlrpclib.
URL Access
A URL identifies a resource on the Internet. A URL is a string composed of several optional parts, called components, known as scheme, location, path, query, and fragment. A URL with all its parts looks something like:
scheme://lo.ca.ti.on/pa/th?query#fragment
For example, in http://www.python.org:80/faq.cgi?src=fie, the scheme is http, the location is www.python.org:80, the path is /faq.cgi, the query is src=fie, and there is no fragment. Some of the punctuation characters form a part of one of the components they separate, while others are just separators and are part of no component. Omitting punctuation implies missing components. For example, in mailto:me@you.com, the scheme is mailto, the path is me@you.com, and there is no location, query, or fragment. The missing // means the URL has no location part, the missing ? means it has no query part, and the missing # means it has no fragment part.
The urlparse module supplies functions to analyze and synthesize URL strings. The most frequently used functions of module urlparse are urljoin, urlsplit, and urlunsplit.
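A short sketch of urlsplit and urlunsplit applied to the two URLs discussed above. The fallback import is an assumption about the reader's interpreter: in Python 3 the same functions live in urllib.parse.

```python
try:
    from urlparse import urlsplit, urlunsplit       # module name used in the text
except ImportError:
    from urllib.parse import urlsplit, urlunsplit   # same functions in Python 3

url = 'http://www.python.org:80/faq.cgi?src=fie'
parts = urlsplit(url)
# The result unpacks as the five components named in the text.
scheme, location, path, query, fragment = parts

# A URL without '//', '?', or '#' has empty location, query, and fragment.
mail = urlsplit('mailto:me@you.com')

# urlunsplit is the inverse operation: it rebuilds the URL string.
rebuilt = urlunsplit(parts)
```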
urljoin
urljoin(base_url_string,relative_url_string)
Description
Returns a URL string u, obtained by joining relative_url_string, which may be relative, with base_url_string. The joining procedure that urljoin performs to obtain its result u may be summarized as follows:
  • When either of the argument strings is empty, u is the other argument.
  • When relative_url_string explicitly specifies a scheme that is different from that of base_url_string, u is relative_url_string. Otherwise, u's scheme is that of base_url_string.
  • When the scheme does not allow relative URLs (e.g., mailto), or relative_url_string explicitly specifies a location (even when it is the same as the location of
Email Protocols
Most email today is sent via servers that implement the Simple Mail Transfer Protocol (SMTP) and received via servers that implement the Post Office Protocol version 3 (POP3). These protocols are supported by the Python standard library modules smtplib and poplib. Some servers, instead of or in addition to POP3, implement the richer and more advanced Internet Message Access Protocol version 4 (IMAP4), supported by the Python standard library module imaplib, which I do not cover in this book.
The poplib module supplies a class POP3 to access a POP mailbox. The specifications of the POP protocol are at http://www.ietf.org/rfc/rfc1939.txt.
POP3
class POP3(host,port=110)
Description
Returns an instance p of class POP3 connected to the given host and port.
Instance p supplies many methods, of which the most frequently used are the following.
dele
p.dele(msgnum)
Description
Marks message msgnum for deletion. The server will perform deletions when this connection terminates by calling p.quit. dele returns the server response string.
list
p.list(msgnum=None)
Description
Returns a pair (response, messages), where response is the server response string and messages is a list of strings, each of two words 'msgnum bytes', giving the message number and the length in bytes of each message in the mailbox. When msgnum is not None, messages has only one item: a 'msgnum bytes' string for the given msgnum.
pass_
p.pass_(password)
Description
Sends the password. Must be called after p.user. The trailing underscore in the name is needed because pass is a Python keyword. Returns the server response string.
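A sketch of how the methods above might combine to list message sizes in a mailbox. The host, user, and password arguments are placeholders, mailbox_sizes and parse_listing are hypothetical helper names, and the sketch takes the message list as the second item of the result of p.list; nothing here contacts a server until mailbox_sizes is called.

```python
import poplib

def parse_listing(line):
    """Turn one 'msgnum bytes' item into a pair of ints."""
    if isinstance(line, bytes):          # some versions return byte strings
        line = line.decode('ascii')
    msgnum, size = line.split()
    return int(msgnum), int(size)

def mailbox_sizes(host, user, password, port=110):
    """Return a list of (msgnum, bytes) pairs; needs a reachable POP3 server."""
    p = poplib.POP3(host, port)
    try:
        p.user(user)
        p.pass_(password)                # must follow p.user
        messages = p.list()[1]           # second item: the 'msgnum bytes' strings
        return [parse_listing(m) for m in messages]
    finally:
        p.quit()                         # the server performs pending deletions here
```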
The HTTP and FTP Protocols
Modules urllib and urllib2 are often the handiest ways to access servers for http, https, and ftp protocols. The Python standard library also supplies specific modules for these protocols. The protocols' specifications are at http://www.ietf.org/rfc/rfc2616.txt, http://www.ietf.org/rfc/rfc2818.txt, and http://www.ietf.org/rfc/rfc959.txt.
Module httplib supplies a class HTTPConnection to connect to an HTTP server.
HTTPConnection
class HTTPConnection(host,port=80)
Description
Returns an instance h of class HTTPConnection, ready for connection (but not yet connected) to the given host and port.
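A minimal sketch of a GET request with HTTPConnection, assuming the fallback import for Python 3, where the module is named http.client. The fetch helper is a hypothetical name; it is only defined here, not called, since calling it needs network access.

```python
try:
    import httplib                    # module name used in the text
except ImportError:
    import http.client as httplib     # same module in Python 3

def fetch(host, path='/', port=80):
    """Return (status, reason, body) for a GET request; needs network access."""
    h = httplib.HTTPConnection(host, port)
    try:
        h.request('GET', path)        # connects on demand, then sends the request
        r = h.getresponse()
        return r.status, r.reason, r.read()
    finally:
        h.close()

# Creating the instance opens no socket: it is ready, but not yet connected.
h = httplib.HTTPConnection('www.python.org')
not_yet_connected = h.sock is None
```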
Instance h supplies several methods, of which the most frequently used are the following.
close
h.close( )
Description
Closes the connection to the HTTP server.
getresponse
h.getresponse( )
Description
Returns an instance r of class HTTPResponse, which represents the response received from the HTTP server. Call after method request has returned. Instance r supplies the following attributes and methods:
r.getheader(name, default=None)
Returns the contents of header name, or default if no such header exists.
r.msg
An instance of class Message of module mimetools, covered in "The Message Classes of the rfc822 and mimetools Modules" on page 573. You can use r.msg to access the response's headers and body.
r.read( )
Returns a string that is the body of the server's response.
r.reason
The string that the server gave as the reason for errors or anomalies, if any. If the request was successful, r.reason is normally the string 'OK'.
r.status
An int, which is the status code that the server returned. If the request was successful,
Network News
Network News, also known as Usenet, is mostly transmitted with the Network News Transfer Protocol (NNTP). The specifications of the NNTP protocol are at http://www.ietf.org/rfc/rfc977.txt and http://www.ietf.org/rfc/rfc2980.txt. The Python standard library supports this protocol in module nntplib. The nntplib module supplies a class NNTP to connect to an NNTP server.
NNTP
class NNTP(host, port=119, user=None, password=None, readermode=False, usenetrc=True)
Description
Returns an instance n of class NNTP connected to the host and port, and optionally authenticated with the given user and password if user is not None. When readermode is True, also sends a 'mode reader' command; you may need this, depending on the NNTP server and on the NNTP commands you send to that server. When usenetrc is True, tries getting user and password for authentication from a file named .netrc in the current user's home directory, if not explicitly specified.
An instance n of NNTP supplies many methods. Each of n's methods returns a tuple whose first item is a string (known as response in the following), which is the response from the NNTP server to the NNTP command that corresponds to the method (method post just returns the response string, not a tuple). Each method returns the response string just as the NNTP server supplies it. The string starts with an integer in decimal form (the integer is known as the return code), followed by a space, followed by explanatory text.
For some commands, the extra text after the return code is just a comment or explanation supplied by the NNTP server. For other commands, the NNTP standard specifies the format of the text that follows the return code on the response line. In those cases, the relevant method also parses the text in question, yielding other items in the method's resulting tuple, so your code need not perform such parsing itself; rather, you just access further items in the method's result tuple, as specified in the following sections.
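When you do need the return code from a response string, the layout described above (decimal integer, space, explanatory text) is easy to take apart. This is a sketch only: split_response is a hypothetical helper, not part of nntplib, and the sample response line is made up for the illustration.

```python
def split_response(response):
    """Split an NNTP response line into (return_code, text)."""
    code, _, text = response.partition(' ')
    return int(code), text

# A made-up response line in the documented layout.
code, text = split_response('211 1234 3000234 3002322 misc.test')
```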
Telnet
Telnet is an old protocol, specified by RFC 854 (see http://www.ietf.org/rfc/rfc854.txt), and is normally used for interactive user sessions. The Python standard library supports this protocol in its module telnetlib. Module telnetlib supplies a class Telnet to connect to a Telnet server.
Telnet
class Telnet(host=None,port=23)
Description
Returns an instance t of class Telnet. When host (and optionally port) is given, implicitly calls t.open(host, port).
Instance t supplies many methods, of which the most frequently used are as follows.
close
t.close( )
Description
Closes the connection.
expect
t.expect(res,timeout=None)
Description
Reads data from the connection until it matches any of the regular expressions that are the items of list res, or until timeout seconds elapse when timeout is not None. (Regular expressions and match objects are covered in "Regular Expressions and the re Module" on page 201.) Returns a tuple of three items (i, mo, txt), where i is the index in res of the regular expression that matched, mo is the match object, and txt is all the text read up to and including the match. Raises EOFError when the connection is closed and no data is available; otherwise, when it gets no match, returns (-1, None, txt), where txt is all the text read, or possibly '' if nothing was read before a timeout. Results are nondeterministic if more than one item in res can match, or if any of the items in res include greedy parts (such as '.*').
interact
t.interact( )
Description
Enters interactive mode, connecting standard input and output to the two channels of the connection, like a dumb Telnet client.
open
Distributed Computing
There are many standards for distributed computing, from simple Remote Procedure Call (RPC) ones to rich object-oriented ones such as CORBA. You can find many third-party Python modules that support these standards on the Internet.
The Python standard library supports both server and client use of a simple yet powerful standard known as XML-RPC. For in-depth coverage of XML-RPC, I recommend the book Programming Web Services with XML-RPC, by Simon St.Laurent and Joe Johnston (O'Reilly). XML-RPC uses HTTP or HTTPS as the underlying transport and encodes requests and replies in XML. For server-side support, see "The Message Classes of the rfc822 and mimetools Modules" on page 573. Client-side support is supplied by module xmlrpclib.
The xmlrpclib module supplies a class ServerProxy, which you instantiate to connect to an XML-RPC server. An instance s of ServerProxy is a proxy for the server it connects to: you call arbitrary methods on s, and s packages the method name and argument values as an XML-RPC request, sends the request to the XML-RPC server, receives the server's response, and unpacks the response as the method's result. The arguments to such method calls can be of any type supported by XML-RPC:
Boolean
The built-in bool constants True and False
Integers, floating-point numbers, strings, arrays
Passed and returned as Python int, float, Unicode, and list values
Structures
Passed and returned as Python dict values whose keys must be strings
Dates
Passed as instances of class xmlrpclib.DateTime; value is represented in seconds since the epoch, as in module time (see Chapter 12).
Binary data
Passed as instances of class xmlrpclib.Binary; value is an arbitrary byte string
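To see how these types travel, you can marshal a call to XML and back without contacting any server. The sketch below uses the module's dumps and loads functions, which are not described in the text above; the method name 'store' and the argument values are hypothetical, and the fallback import assumes Python 3, where the module is named xmlrpc.client.

```python
try:
    import xmlrpclib                       # module name used in the text
except ImportError:
    import xmlrpc.client as xmlrpclib      # same module in Python 3

# Arguments for a hypothetical remote method named 'store':
# a structure (dict with string keys) and some binary data.
params = ({'name': 'spam', 'tags': ['a', 'b'], 'count': 3, 'ok': True},
          xmlrpclib.Binary(b'\x00\x01\x02'))

# Package method name and arguments as the XML text of a request.
request_xml = xmlrpclib.dumps(params, methodname='store')

# Unpack the XML again, as the receiving side would.
decoded_params, method = xmlrpclib.loads(request_xml)
record, blob = decoded_params
```

A ServerProxy instance performs the same packaging and unpacking for you on each method call, with the HTTP exchange in between.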
Module xmlrpclib supplies several classes.
Binary
class Binary(x)
Description
x is a Python string of arbitrary bytes. An instance b of class Binary wraps the bytes as an XML-RPC binary object.
Other Protocols
While the standard Python library is quite rich, the set of protocols used on the Net is even richer. You can find support for these protocols in many third-party extensions. For the RSS protocol (described at http://blogs.law.harvard.edu/tech/rss), for example, you can check http://wiki.python.org/moin/RssLibraries, where you will find a fair summary of many available modules. For SSH (see http://www.snailbook.com/protocols.html), a very secure protocol that does not require third-party involvement of a Certification Authority, your best choice is probably paramiko, found at http://www.lag.net/paramiko/. SSH is often the most secure, handiest alternative to Telnet, FTP, and similar old protocols, and paramiko is an excellent implementation of SSH for Python.
Chapter 20: Sockets and Server-Side Network Protocol Modules
To communicate with the Internet, programs use objects known as sockets. The Python library supports sockets through module socket, as well as wrapping them into higher-level client-side modules, as covered in Chapter 19. To help you write server programs, the Python library also supplies higher-level modules to use as frameworks for socket servers. Standard and third-party Python modules and extensions also support asynchronous socket operations. This chapter covers socket, in "The socket Module" on page 521; server-side framework modules, in "The SocketServer Module" on page 528; asynchronous operation with standard Python library modules, in "Event-Driven Socket Programs" on page 533; and the bare essentials of the rich and powerful Twisted third-party package, in "The Twisted Framework" on page 539.
The modules covered in this chapter offer many conveniences compared to C-level socket programming. However, in the end, the modules rely on native socket functionality supplied by the underlying operating system. While it is often possible to write effective network clients by using just the modules covered in Chapter 19 without really needing to understand sockets, writing effective network servers most often does require some understanding of sockets. Thus, the lower-level module socket is covered in this chapter and not in Chapter 19, even though both clients and servers use sockets.
However, I cover only the ways in which module socket lets your program access sockets; I do not try to impart a detailed understanding of sockets, TCP/IP, and other aspects of network behavior independent of Python that you may need to make use of socket's functionality. To understand socket behavior in detail on any kind of platform, I recommend W. Richard Stevens's Unix Network Programming, Volume 1 (Prentice Hall). Higher-level modules are simpler and more powerful, but a detailed understanding of the underlying technology is always useful, and sometimes it can prove indispensable.
The socket Module
The socket module supplies a factory function, also named socket, that you call to generate a socket object s. To perform network operations, call methods on s. In a client program, connect to a server by calling s.connect. In a server program, wait for clients to connect by calling s.bind and s.listen. When a client requests a connection, accept the request by calling s.accept, which returns another socket object s1 connected to the client. Once you have a connected socket object, transmit data by calling its method send and receive data by calling its method recv.
Python supports both current Internet Protocol (IP) standards. IPv4 is more widespread; IPv6 is newer. In IPv4, a network address is a pair (host, port). host is a Domain Name System (DNS) hostname such as 'www.python.org' or a dotted-quad IP address such as '194.109.137.226'. port is an integer that indicates a socket's port number. In IPv6, a network address is a tuple (host, port, flowinfo, scopeid). IPv6 infrastructure is not yet widely deployed; I do not cover IPv6 further in this book. When host is a DNS hostname, Python looks up the name on your platform's DNS infrastructure, using the IP address that corresponds to the name.
Module socket supplies an exception class error. Functions and methods of socket raise error to diagnose socket-specific errors. Module socket also supplies many functions. Many of these functions translate data, such as integers, between your host's native format and network standard format. The higher-level protocol that your program and its counterpart are using on a socket determines what conversions you must perform.
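The client and server calls described above (bind, listen, and accept on one side, connect on the other, then send and recv) can be sketched on the local machine. The function names serve_once and echo_roundtrip are hypothetical, and the sketch assumes that a short message arrives in a single recv call, which real protocols must not rely on.

```python
import socket
import threading

def serve_once(server_sock):
    """Accept one client, echo back what it sends, then close."""
    s1, client_address = server_sock.accept()   # s1 is connected to the client
    try:
        data = s1.recv(1024)
        s1.sendall(data)          # sendall repeats send until all data is sent
    finally:
        s1.close()

def echo_roundtrip(message):
    """Send message to a one-shot local echo server and return the reply."""
    server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server_sock.bind(('127.0.0.1', 0))          # port 0: the OS picks a free port
    server_sock.listen(1)
    address = server_sock.getsockname()         # the (host, port) pair in use

    t = threading.Thread(target=serve_once, args=(server_sock,))
    t.daemon = True
    t.start()

    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.connect(address)
        s.sendall(message)
        reply = s.recv(1024)
    finally:
        s.close()
    t.join()
    server_sock.close()
    return reply
```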
The most frequently used functions of module socket are as follows.
getdefaulttimeout
getdefaulttimeout( )
Description
Returns a float that is the timeout (in seconds, possibly with a fractional part) currently set by default on newly created socket objects, or
The SocketServer Module
The Python library supplies a framework module, SocketServer, to help you implement simple Internet servers. SocketServer supplies server classes TCPServer, for connection-oriented servers using TCP, and UDPServer, for datagram-oriented servers using UDP, with the same interface.
An instance s of either TCPServer or UDPServer supplies many attributes and methods, and you can subclass either class and override some methods to architect your own specialized server framework. However, I do not cover such advanced and rarely used possibilities in this book.
Classes TCPServer and UDPServer implement synchronous servers that can serve one request at a time. Classes ThreadingTCPServer and ThreadingUDPServer implement threaded servers, spawning a new thread per request. You are responsible for synchronizing the resulting threads as needed. Threading is covered in "Threads in Python" on page 341.
For normal use of SocketServer, subclass the BaseRequestHandler class provided by SocketServer and override the handle method. Then instantiate a server class, passing the address pair on which to serve and your subclass of BaseRequestHandler. Finally, call serve_forever on the server instance.
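The three steps just described (subclass BaseRequestHandler, instantiate a server class, call serve_forever) might look as follows. This is a sketch under stated assumptions: UpperHandler, start_server, and ask are hypothetical names, the server runs in a background thread only so that the same program can also act as client, and the fallback import covers Python 3, where the module is named socketserver.

```python
try:
    import SocketServer                      # module name used in the text
except ImportError:
    import socketserver as SocketServer      # same module in Python 3
import socket
import threading

class UpperHandler(SocketServer.BaseRequestHandler):
    """Hypothetical handler: replies with the request text in uppercase."""
    def handle(self):
        # For a TCP server, self.request is the socket connected to the client.
        data = self.request.recv(1024)
        self.request.sendall(data.upper())

def start_server():
    """Serve on a free local port in a background thread; return the server."""
    server = SocketServer.TCPServer(('127.0.0.1', 0), UpperHandler)
    t = threading.Thread(target=server.serve_forever)
    t.daemon = True
    t.start()
    return server

def ask(address, message):
    """Client side: send message to the server at address, return the reply."""
    s = socket.create_connection(address)
    try:
        s.sendall(message)
        return s.recv(1024)
    finally:
        s.close()
```

After server = start_server(), a call such as ask(server.server_address, b'spam') exercises the handler; server.shutdown() then stops the serve_forever loop.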
An instance h of BaseRequestHandler supplies the following methods and attributes.
client_address
Description
The h.client_address attribute is the pair (host,port) of the client, set by the base class at connection.
handle
h.handle( )
Description
Your subclass overrides this method, and the server calls the method on a new instance of your subclass for each incoming request. For a TCP server, your implementation of handle conducts a conversation with the client on socket h.request to service the request. For a UDP server, your implementation of handle examines the datagram in h.request[0] and sends a reply string with
Event-Driven Socket Programs
Socket programs, particularly servers, must often perform many tasks at once. Example 20-1 accepts a connection request, then serves a single client until that client has finished—other requests must wait. This is not acceptable for servers in production use. Clients cannot wait too long: the server must be able to service multiple clients at once.
One way to let your program perform several tasks at once is threading, covered in "Threads in Python" on page 341. Module SocketServer optionally supports threading, as covered in "The SocketServer Module" on page 528. An alternative to threading that can offer better performance and scalability is event-driven (also known as asynchronous) programming.
An event-driven program sits in an event loop and waits for events. In networking, typical events are "a client requests connection," "data arrived on a socket," and "a socket is available for writing." The program responds to each event by executing a small slice of work to service that event, then goes back to the event loop to wait for the next event. The Python library provides minimal support for event-driven network programming with the low-level select module and the higher-level asyncore and asynchat modules. Much richer support for event-driven programming is in the Twisted package (available at http://www.twistedmatrix.com), particularly in subpackage twisted.internet.
The select module exposes a cross-platform, low-level function to implement asynchronous network servers and clients. Module select has additional functionality on Unix-like platforms, but I cover only cross-platform functionality in this book.
select
select(inputs,outputs,excepts,timeout=None)
Description
inputs, outputs, and excepts are lists of socket objects that wait for input events, output events, and exceptional conditions, respectively.
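A small sketch of a select call, using a connected pair of sockets from socket.socketpair so that no network is involved. select.select returns three lists, the subsets of inputs, outputs, and excepts that are ready; the variable names here are only illustrative.

```python
import select
import socket

a, b = socket.socketpair()     # two sockets connected to each other

# With a timeout of 0, select returns at once (a "poll").
# Nothing has been sent yet, so a is ready for writing but not for reading.
readable, writable, exceptional = select.select([a], [a], [], 0)
before = (a in readable, a in writable)

b.sendall(b'ping')

# Now wait up to one second for the event "data arrived on a socket".
readable, writable, exceptional = select.select([a], [], [], 1.0)
after = a in readable
data = a.recv(1024)

a.close()
b.close()
```

An event loop repeats such a call forever, servicing whichever sockets each call reports as ready.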
Chapter 21: CGI Scripting and Alternatives
When a web browser (or any other web client) requests a page from a web server, the server may return either static or dynamic content. Serving dynamic content involves server-side web programs to generate and deliver content on the fly, often based on information stored in a database. The long-standing web-wide standard for server-side programming is known as CGI, which stands for Common Gateway Interface:
  1. A web client (typically a browser) sends a structured request to a web server.
  2. The server executes another program, passing the content of the request.
  3. The server captures the standard output of the other program.
  4. The server sends that output to the client as the response to the original request.
In other words, the server's role is that of a gateway between the client and the other program. The other program is called a CGI program, or CGI script.
CGI enjoys the typical advantages of standards. When you program to the CGI standard, your program can be deployed on all web servers, and work despite the differences. This chapter focuses on CGI scripting in Python. It also mentions the downsides of CGI (basically, issues of scalability under high load) and, in "Other Server-Side Approaches" on page 557, some of the many alternative, nonstandard server-side architectures that you can use instead of CGI. Nowadays, the nonstandard alternatives are often superior because they do not constrain your deployment much, and they can support higher-level abstractions to enhance productivity. For example, such alternatives can use cookies implicitly and transparently to provide a "session" abstraction, while in CGI you must deal with cookies directly, using the Cookie module covered in "The Cookie Module" on page 553, if you want any session continuity. However, at least for low-level CGI alternatives (such as FastCGI, mentioned in "FastCGI" on page 557), most of what you learn about CGI programming still comes in handy.
CGI in Python
The CGI standard lets you use any language to code CGI scripts. Python is a very high-level, high-productivity language, and thus quite suitable for CGI coding. The Python standard library supplies modules to handle typical CGI-related tasks.
CGI scripts often handle submitted HTML forms. In this case, the action attribute of the form tag specifies the URL for a CGI script to handle the form, and the method attribute is GET or POST, indicating how the form data is sent to the script. According to the CGI standard, the GET method should be used only for forms without side effects, such as asking the server to query a database and display results, while the POST method is meant for forms with side effects, such as asking the server to update a database. In practice, however, GET is also often used to create side effects. The distinction between GET and POST in practical use is that GET encodes the form's contents as a query string joined to the action URL to form a longer URL, while POST transmits the form's contents as an encoded stream of data, which a CGI script sees as standard input.
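The practical difference between GET and POST can be sketched from the script's point of view. The read_form helper below is hypothetical and is not the standard library's CGI support: it handles only URL-encoded forms (not multipart/form-data), and it assumes the usual CGI environment variables REQUEST_METHOD, QUERY_STRING, and CONTENT_LENGTH. The fallback import covers Python 3, where parse_qs lives in urllib.parse.

```python
import io
try:
    from urlparse import parse_qs           # Python 2
except ImportError:
    from urllib.parse import parse_qs       # Python 3

def read_form(environ, stdin):
    """Return the form fields as a dict mapping each name to a list of values."""
    if environ.get('REQUEST_METHOD', 'GET').upper() == 'POST':
        # POST: the encoded form arrives on standard input.
        length = int(environ.get('CONTENT_LENGTH') or 0)
        encoded = stdin.read(length)
        if isinstance(encoded, bytes):
            encoded = encoded.decode('ascii')
    else:
        # GET: the encoded form is the query string of the URL.
        encoded = environ.get('QUERY_STRING', '')
    return parse_qs(encoded)

# Simulated requests; a real script would pass os.environ and sys.stdin.
get_fields = read_form({'REQUEST_METHOD': 'GET',
                        'QUERY_STRING': 'src=fie&n=1&n=2'}, io.BytesIO())
post_fields = read_form({'REQUEST_METHOD': 'POST',
                         'CONTENT_LENGTH': '7'}, io.BytesIO(b'src=fie'))
```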
GET is slightly faster. You can use a fixed GET-form URL wherever you can use a hyperlink. However, GET cannot send large amounts of data to the server, since many clients and servers limit URL lengths (you're safe up to about 200 bytes). The POST method has no size limits. You must use POST when the form contains input tags with type=file—the form tag must then have enctype=multipart/form-data.
The CGI standard does not specify whether a single script can access both the query string (used for GET) and the script's standard input (used for POST). Many clients and servers let you get away with it, but relying on this nonstandard practice may negate the portability advantages that you would otherwise get from the fact that CGI is a standard. Python's standard module
Cookies
HTTP, per se, is a stateless protocol, meaning that it retains no session state between transactions. Cookies, as specified by the HTTP 1.1 standard, let web clients and servers cooperate to build a stateful session from a sequence of HTTP transactions.
Each time a server sends a response to a client's request, the server may initiate or continue a session by sending one or more Set-Cookie headers, whose contents are small data items called cookies. When a client sends another request to the server, the client may continue a session by sending Cookie headers with cookies previously received from that server or other servers in the same domain. Each cookie is a pair of strings, the name and value of the cookie, plus optional attributes. Attribute max-age is the maximum number of seconds the cookie should be kept. The client should discard saved cookies after their maximum age. If max-age is missing, then the client should discard the cookie when the user's interactive session ends.
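A brief sketch of building a Set-Cookie header and of reading a cookie back, using class SimpleCookie. The cookie name and value are made up, and the fallback import is for Python 3, where the Cookie module is named http.cookies.

```python
try:
    from Cookie import SimpleCookie          # module name in Python 2
except ImportError:
    from http.cookies import SimpleCookie    # same class in Python 3

# Server side: prepare a cookie to send with a response.
outgoing = SimpleCookie()
outgoing['session'] = 'abc123'               # name and value (made up)
outgoing['session']['max-age'] = 3600        # keep for at most one hour
header = outgoing.output()                   # text of a Set-Cookie header line

# Server side, next request: parse the Cookie header the client sent back.
incoming = SimpleCookie()
incoming.load('session=abc123')
value = incoming['session'].value
```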
Cookies provide no intrinsic privacy or authentication. Cookies travel in the clear on the Internet and are vulnerable to sniffing. A malicious client might return cookies different from cookies previously received. To use cookies for authentication or identification, or to hold sensitive information, the server must encrypt and encode cookies sent to clients, and decode, decrypt, and verify cookies received back from clients.
Encryption, encoding, decoding, decryption, and verification may all be slow when applied to large amounts of data. Decryption and verification require the server to keep some amount of server-side state. Sending substantial amounts of data back and forth on the network is also slow. The server should therefore persist most state data locally in files or databases. In most cases, a server should use cookies only as small, encrypted, verifiable keys that confirm the identity of a user or session, using DBM files or a relational database (both covered in Chapter 11) for session state. HTTP sets a limit of 2 KB on cookie size, but I suggest you normally use even smaller cookies.
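One way to make a small cookie verifiable is to append a keyed hash of its value. Note the assumptions: this sketch uses HMAC signing from the standard hmac and hashlib modules, which detects tampering but does not encrypt (the session ID stays readable); SECRET_KEY, sign, and verify are hypothetical names; and hmac.compare_digest requires a reasonably recent Python.

```python
import hashlib
import hmac

SECRET_KEY = b'replace-with-a-real-secret'   # hypothetical; keep it server-side

def _digest(session_id):
    return hmac.new(SECRET_KEY, session_id.encode('utf-8'),
                    hashlib.sha256).hexdigest()

def sign(session_id):
    """Return the cookie value to send: the session ID plus its signature."""
    return session_id + '.' + _digest(session_id)

def verify(cookie_value):
    """Return the session ID if the signature matches, else None."""
    session_id, _, digest = cookie_value.rpartition('.')
    expected = _digest(session_id)
    # compare_digest takes the same time whether or not the strings match.
    if hmac.compare_digest(digest.encode('utf-8'), expected.encode('utf-8')):
        return session_id
    return None
```

The verified session ID then serves as the key for looking up session state kept in files or a database on the server.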
Other Server-Side Approaches
A CGI script runs as a new process each time a client requests it. Process startup time, interpreter initialization, connection to databases, and script initialization add up to measurable overhead. On fast, modern server platforms, the overhead is bearable for light to moderate loads. On a busy server, CGI may not scale up well. Web servers support many server-specific ways to reduce overhead, running scripts in processes that can serve for several hits rather than starting up a new CGI process per hit.
Microsoft's ASP (Active Server Pages) is a server extension that leverages a lower-level library, ISAPI, and Microsoft's COM technology. Most ASP pages are coded in the VBScript language, but ASP is language-independent. As the reptilian connection suggests, Python and ASP go very well together, as long as Python is installed with the platform-specific win32all extensions, specifically ActiveScripting. Many other server extensions are cross-platform, not tied to specific operating systems.
The popular application server Zope (http://www.zope.org) is a Python application. If you need advanced management features, Zope (and the higher-level content-management system Plone, http://plone.org/, built on top of Zope) should be among the solutions you consider. Zope and Plone are large, powerful systems and need full books of their own to do them justice. I do not cover Zope and Plone further in this book.
FastCGI lets you write scripts similar to CGI scripts, in a variety of languages, using each process to handle multiple hits, either sequentially or simultaneously in separate threads. See http://www.fastcgi.com for FastCGI overviews and details, including pointers about FastCGI support on all kinds of servers as well as Python support for FastCGI. A streamlined variant of FastCGI is SCGI (http://www.mems-exchange.org/software/scgi/).
The Python Web Server Gateway Interface (WSGI) is the emerging standard "middleware" approach that interfaces higher-level Python web development frameworks to underlying web servers, and is documented at
End of preview. The rest of this section is not available here.
Chapter 22: MIME and Network Encodings
What travels on a network are streams of bytes or text. However, what you want to send over the network often has more structure. The Multipurpose Internet Mail Extensions (MIME) and other encoding standards bridge the gap by specifying how to represent structured data as bytes or text. Python supports such encodings through many library modules, such as base64, quopri, and uu (covered in "Encoding Binary Data as Text" on page 561), and the modules of the email package (covered in "MIME and Email Format Handling" on page 564).
End of preview. The rest of this section is not available here.
Encoding Binary Data as Text
Several kinds of media (e.g., email messages) contain only text. When you want to transmit arbitrary binary data via such media, you need to encode the data as text strings. The Python standard library supplies modules that support the standard encodings known as Base 64, Quoted Printable, and UU.
The base64 module supports the encoding specified in RFC 1521 as Base 64. The Base 64 encoding is a compact way to represent arbitrary binary data as text, without any attempt to produce human-readable results. Module base64 supplies four functions.
decode
decode(infile,outfile)
Description
Reads text-file-like object infile by calling infile.readline until end of file (i.e., until a call to infile.readline returns an empty string), decodes the Base 64-encoded text thus read, and writes the decoded data to binary-file-like object outfile.
decodestring
decodestring(s)
Description
Decodes text string s, which contains one or more complete lines of Base 64-encoded text, and returns the byte string with the corresponding decoded data.
encode
encode(infile,outfile)
Description
Reads binary-file-like object infile by calling infile.read (for 57 bytes at a time, which is the amount of data that Base 64 encodes into 76 characters in each output line) until end of file (i.e., until a call to infile.read returns an empty string). It encodes the data thus read in Base 64, and writes the encoded text, one line at a time, to text-file-like object outfile, appending \n to each line of text it emits, including the last one.
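The file-based functions described above can be sketched as a round trip. This example assumes Python 3, where both arguments to encode and decode are binary file-like objects (here io.BytesIO, standing in for real files). The string-based functions are left out because their names differ between Python versions.

```python
import base64
import io

data = bytes(range(100))  # 100 arbitrary bytes of sample input

# encode reads the input 57 bytes at a time and writes one
# newline-terminated line of Base 64 text per chunk read.
encoded = io.BytesIO()
base64.encode(io.BytesIO(data), encoded)
lines = encoded.getvalue().splitlines()

# decode reads the encoded text line by line and writes the
# original bytes back out.
decoded = io.BytesIO()
base64.decode(io.BytesIO(encoded.getvalue()), decoded)

print(len(lines), [len(line) for line in lines])
print(decoded.getvalue() == data)
```

The first 57 input bytes fill a complete 76-character line, and the remaining 43 bytes produce a shorter final line that still ends with a newline.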
encodestring
encodestring(s)
Description
Encodes binary string s, which contains arbitrary bytes, and returns a text string with one or more complete lines of Base 64-encoded data joined by newline characters (
End of preview. The rest of this section is not available here.
MIME and Email Format Handling
Python supplies the email package to handle parsing, generation, and manipulation of MIME files such as email messages, network news posts, and so on. The Python standard library also contains other modules that handle some parts of these jobs. However, the email package offers a complete and systematic approach to these important tasks. I suggest you use package email, not the older modules that partially overlap with parts of email's functionality. Package email has nothing to do with receiving or sending email; for such tasks, see modules poplib and smtplib, covered in "Email Protocols" on page 503. email deals with handling messages after you receive them or before you send them.
Package email supplies two factory functions that return an instance m of class email.Message.Message. These functions rely on class email.Parser.Parser, but the factory functions are handier and simpler. Therefore, I do not cover module Parser further in this book.
message_from_string
message_from_string(s)
Description
Builds m by parsing string s.
message_from_file
message_from_file(f)
Description
Builds m by parsing the contents of file-like object f, which must be open for reading.
The email.Message module supplies class Message. All parts of package email make, modify, or use instances of class Message. An instance m of Message models a MIME message, including headers and a payload (data content). To create an initially empty m, call class Message with no arguments. More often, you create m by parsing via functions message_from_string and message_from_file of module email, or by other indirect means such as the classes covered in "Creating Messages" on page 568. m's payload can be a string, a single other instance of Message, or a list of other Message instances for a multipart message.
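A short sketch of the two ways of obtaining a message described above: parsing with message_from_string and instantiating Message directly. It assumes Python 3, where the module is spelled email.message (lowercase) rather than email.Message. The addresses and text are made-up sample data.

```python
import email
from email.message import Message

raw = (
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Subject: Greetings\n"
    "\n"
    "Hello, Bob!\n"
)

# Build m by parsing a string; headers are accessed like mapping keys.
m = email.message_from_string(raw)
subject = m["Subject"]
body = m.get_payload()          # a string for a non-multipart message
multipart = m.is_multipart()    # False: the payload is not a list

# Build an initially empty message and fill it in by hand.
empty = Message()
empty["Subject"] = "Built by hand"
empty.set_payload("Payload set explicitly.\n")

print(subject, multipart)
print(empty.as_string())
```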
End of preview. The rest of this section is not available here.
Chapter 23: Structured Text: HTML
Most documents on the Web use HTML, the HyperText Markup Language. Markup is the insertion of special tokens, known as tags, in a text document to give structure to the text. HTML is, in theory, an application of the large, general standard known as SGML, the Standard General Markup Language. In practice, many of the Web's documents use HTML in sloppy or incorrect ways. Browsers have evolved many practical heuristics over the years to try and compensate for this, but even so, it still sometimes happens that a browser displays an incorrect web page in some weird way (don't blame the browser: 9 times out of 10, all the blame and then some is deserved by the web page's author!).
HTML is not suitable for much more than presenting documents on a screen. Complete and precise extraction of the information in the document, working backward from the document's presentation, is often unfeasible. To tighten things up, HTML has evolved into a more rigorous standard called XHTML. XHTML is very similar to traditional HTML, but it is defined in terms of XML and more precisely than HTML. You can handle XHTML with the tools covered in Chapter 24.
Despite the difficulties, it's often possible to extract at least some useful information from HTML documents (a task often known as screen-scraping, or just scraping). Python supplies the sgmllib, htmllib, and HTMLParser modules for the task of parsing HTML documents, whether this parsing is for the purpose of presenting the documents, or, more typically, as part of an attempt to extract (scrape) information. When you're dealing with broken web pages, third-party module BeautifulSoup offers your best, last hope. Generating HTML and embedding Python in HTML are also frequent tasks. No standard Python library module supports HTML generation or embedding directly, but you can use normal Python string manipulation, and third-party modules can also help.
End of preview. The rest of this section is not available here.
The sgmllib Module
The name of the sgmllib module is misleading: sgmllib parses only a tiny subset of SGML, but it is still a good way to get information from HTML files. sgmllib supplies one class, SGMLParser, which you subclass, overriding methods. The most frequently used methods of an instance s of your subclass X of SGMLParser are as follows.
close
s.close( )
Description
Tells the parser that there is no more input data. When X overrides close, s.close must call SGMLParser.close to ensure that buffered data is processed.
do_tag
s.do_tag(attributes)
Description
X supplies a method with such a name for each tag, with no corresponding end tag, that X wants to process. tag must be lowercase in the method name, but can be in any case in the parsed text (the SGML standard, like HTML, is case-insensitive, in contrast to XML and XHTML, which are case-sensitive). SGMLParser's handle_tag method calls do_tag when appropriate. attributes is a list of pairs (name, value), where name is an attribute's name, lowercased, and value is the value, processed to resolve entity and character references and remove surrounding quotes.
end_tag
s.end_tag( )
Description
X supplies a method with such a name for each tag whose end tag X wants to process. tag must be lowercase in the method name, but can be in any case in the parsed text. X must also supply a method named start_tag; otherwise, end_tag is ignored. SGMLParser's handle_endtag method calls end_tag when appropriate.
feed
s.feed(data)
Description
Passes to the parser some of the text being parsed. The parser may process some prefix of the text, holding the rest in a buffer until the next call to
End of preview. The rest of this section is not available here.
The htmllib Module
The htmllib module supplies a class named HTMLParser that subclasses SGMLParser and defines start_tag, do_tag, and end_tag methods for HTML 2.0 tags. HTMLParser implements and overrides methods to perform calls to methods of a formatter object, covered in "The formatter Module" on page 581. You can subclass HTMLParser and override methods. In addition to start_tag, do_tag, and end_tag methods, an instance h of HTMLParser supplies the following attributes and methods.
anchor_bgn
h.anchor_bgn(href,name,type)
Description
Called for each <a> tag. href, name, and type are the string values of the tag's attributes with the same names. HTMLParser's implementation of anchor_bgn maintains a list of outgoing hyperlink targets (i.e., href arguments of method h.anchor_bgn) in an instance attribute named h.anchorlist.
anchor_end
h.anchor_end( )
Description
Called for each </a> end tag. HTMLParser's implementation of anchor_end emits to the formatter a footnote reference that is an index within h.anchorlist. In other words, by default, HTMLParser asks the formatter to format an <a>/</a> tag pair as the text inside the tag, followed by a footnote reference number that points to the URL in the <a> tag. Of course, it's up to the formatter to deal with this formatting request.
anchorlist
Description
The h.anchorlist attribute contains the list of outgoing hyperlink target URLs, as built by method h.anchor_bgn.
formatter
Description
The h.formatter attribute is the formatter object f associated with h, which you pass as the only argument when you instantiate HTMLParser(f).
handle_image
h.handle_image(source,alt,ismap='',align='',width='',height
End of preview. The rest of this section is not available here.
The HTMLParser Module
Module HTMLParser supplies one class, HTMLParser, that you subclass to override methods. HTMLParser.HTMLParser is similar to sgmllib.SGMLParser, but is simpler and able to parse XHTML as well. The main differences between HTMLParser and SGMLParser are the following:
  • HTMLParser does not call methods named do_tag, start_tag, and end_tag. To process tags and end tags, your subclass X of HTMLParser must override methods handle_starttag and/or handle_endtag and check explicitly for the tags it wants to process.
  • HTMLParser does not keep track of, nor check, tag nesting in any way.
  • HTMLParser does nothing, by default, to resolve character and entity references. Your subclass X of HTMLParser must override methods handle_charref and/or handle_entityref if it needs to perform processing of such references.
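The first difference listed above, checking explicitly for tags inside handle_starttag, can be sketched as follows. The example assumes Python 3, where the module is named html.parser. The class name LinkCollector and the sample markup are hypothetical. Recent Python 3 versions also convert character references in text by default, unlike the behavior described in the list.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href values of all <a> tags fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Called for every start tag; tag and attribute names arrive
        # lowercased, so the subclass filters on the tag name itself.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


h = LinkCollector()
# Text may be fed in pieces; the parser buffers incomplete input.
h.feed('<p>See <A HREF="http://www.python.org/">Python</A> ')
h.feed('and <a href="http://example.com/">an example</a>.</p>')
h.close()

print(h.links)
```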
Commonly used methods of an instance h of subclass X of HTMLParser are as follows.
close
h.close( )
Description
Tells the parser that there is no more input data. When X overrides close, h.close must also call HTMLParser.close to ensure that all buffered data is processed.
feed
h.feed(data)
Description
Passes to the parser a part of the text being parsed. The parser processes some prefix of the text and holds the rest in a buffer until the next call to h.feed or h.close.
handle_charref
h.handle_charref(ref)
Description
Called to process a character reference '&#ref;'. HTMLParser's implementation of handle_charref does nothing.
handle_comment
h.handle_comment(comment)
Description
Called to handle comments. comment is the string within '<!--...-->', without the delimiters. HTMLParser's implementation of
End of preview. The rest of this section is not available here.
The BeautifulSoup Extension
BeautifulSoup (http://www.crummy.com/software/BeautifulSoup/) lets you parse HTML that may be badly formed and uses simple heuristics to compensate for likely HTML brokenness (it succeeds in this difficult task with surprisingly good frequency). Module BeautifulSoup supplies a class, also named BeautifulSoup, which you instantiate with either a file-like object (which is read to give the HTML text to parse) or a string (which is the text to parse). The module also supplies other classes (BeautifulStoneSoup and ICantBelieveItsBeautifulSoup) that are quite similar, but suitable for slightly different XML parsing tasks. An instance b of class BeautifulSoup supplies many attributes and methods to ease the task of searching for information in the parsed HTML input, returning instances of classes Tag and NavigableText, which in turn let you keep navigating or dig for more information.
The following example uses BeautifulSoup to perform the same task as previous examples: fetch a page from the Web with urllib, parse it, and output the hyperlinks:
import urllib, urlparse, BeautifulSoup

f = urllib.urlopen('http://www.python.org/index.html')
b = BeautifulSoup.BeautifulSoup(f)

seen = set()
for anchor in b.fetch('a'):
    url = anchor.get('href')
    if url is None or url in seen: continue
    seen.add(url)
    pieces = urlparse.urlparse(url)
    if pieces[0]=='http':
        print urlparse.urlunparse(pieces)
The example calls the fetch method of class BeautifulSoup.BeautifulSoup to obtain all instances of a certain tag (here, tag '<a>'), then the get method of instances of class Tag to obtain the value of an attribute (here, 'href'), or None when that attribute is missing. The logic to analyze and emit the target URLs of outgoing hyperlinks is just the same as in previous examples.
End of preview. The rest of this section is not available here.
Generating HTML
Python does not come with tools to generate HTML. If you want an advanced framework for structured HTML generation, I recommend Robin Friedrich's HTMLGen 2.2 (available at http://starship.python.net/crew/friedrich/HTMLgen/html/main.html), but I do not cover the package in this book. To generate XHTML, you can use the approaches covered in Chapter 24.
If your favorite approach is to embed Python code within HTML in the manner made popular by JSP, ASP, and PHP, one possibility is to use the Python Server Pages (PSP) supplied by Webware (mentioned in "Webware" on page 559). Another package, focused particularly on the embedding approach, is Spyce (available at http://spyce.sf.net/). For all but the simplest problems, however, development and maintenance are eased by separating logic and presentation issues through templating, covered in the next section. Both Webware and Spyce optionally support templating in lieu of embedding.
To generate HTML, the best approach is often templating. With templating, you start with a template, which is a text string (often read from a file, database, etc.) that is valid HTML, but includes markers, also known as placeholders, where dynamically generated text must be inserted. Your program generates the needed text and substitutes it into the template. In the simplest case, you can use markers of the form '%(name)s'. Set the dynamically generated text as the value for key 'name' in some dictionary d. The Python string formatting operator % (covered in "String Formatting" on page 193) now does all you need: if t is your template, t%d is a copy of the template with all values properly substituted.
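The simplest case just described can be sketched in a few lines. The template text and the keys title and message are made-up sample data. The call to html.escape (Python 3) goes beyond what the text above describes and is included on the assumption that the generated text may contain characters that are special in HTML.

```python
import html

# The template is valid HTML apart from its %(name)s markers.
t = ("<html><head><title>%(title)s</title></head>"
     "<body><h1>%(title)s</h1><p>%(message)s</p></body></html>")

# Dynamically generated text, keyed by marker name.
d = {
    "title": html.escape("Status report"),
    "message": html.escape("Disk usage < 80% & falling"),
}

# t % d is a copy of the template with every marker substituted.
page = t % d
print(page)
```

A marker may appear any number of times in the template, as title does here. A literal percent sign in the template itself would have to be written as %%, while percent signs in the substituted values need no special treatment.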
For advanced templating tasks, I recommend Cheetah (available at http://www.cheetahtemplate.org). Cheetah interoperates particularly well with Webware and other Python server-side web frameworks, as mentioned in "Webware" on page 559. When you have Webware installed, Cheetah's template objects are Webware servlets, so you can immediately deploy them under Webware. You can also use Cheetah in many other contexts: for example, Spyce and
End of preview. The rest of this section is not available here.
Chapter 24: Structured Text: XML
XML, the eXtensible Markup Language, has become very widespread over the last few years. Like SGML (mentioned in "The sgmllib Module" on page 576), XML is a metalanguage, a language to describe markup languages. On top of XML 1.0, the XML community (mostly within the World Wide Web Consortium [W3C]) has standardized many other technologies, such as schema languages, Namespaces, XPath, XLink, XPointer, and XSLT.
Industry consortia in many fields have defined industry-specific markup languages on top of XML to facilitate data exchange among applications in those fields. Such industry standards let applications exchange data even when the applications are coded in different languages and deployed on different platforms by different firms. XML, related technologies, and XML-based markup languages are the basis for inter-application, cross-language, cross-platform data interchange in modern applications.
Python has excellent support for XML. The standard Python library supplies the xml package, which lets you use fundamental XML technology quite simply. The third-party package PyXML (http://pyxml.sf.net) extends the standard library's xml with validating parsers, richer DOM implementations, and advanced technologies such as XPath and XSLT. Downloading and installing PyXML upgrades Python's own xml packages, so it can be a good idea to do so even if you don't use PyXML-specific features.
On top of PyXML, you can choose to install yet another freely available third-party package, 4Suite (available at http://4suite.org). 4Suite provides even more XML parsers for special niches, advanced technologies such as XLink and XPointer, and code supporting standards built on top of XML, such as the Resource Description Framework (RDF).
A highly Pythonic alternative for XML processing is ElementTree (http://effbot.org/zone/element-index.htm), most of whose functionality is also slated for release in Python 2.5 as standard library module
End of preview. The rest of this section is not available here.
An Overview of XML Parsing
When your application must parse XML documents, your first, fundamental choice is what kind of parsing to use. You can use event-driven parsing, in which the parser reads the document sequentially and calls back to your application each time it parses a significant aspect of the document (such as an element), or you can use object-based parsing, in which the parser reads the whole document and builds in-memory data structures, representing the document, that you can then navigate. SAX is the main way to perform event-driven parsing, and DOM is the main way to perform object-based parsing. In each case, there are alternatives, such as direct use of expat for event-driven parsing, or ElementTree for object-based parsing, but I do not cover these alternatives in this book. Another interesting possibility is pull-based parsing, supported by pulldom, covered later in this chapter (and also, to some extent, by ElementTree, via the iterparse function of C-coded module cElementTree).
Event-driven parsing requires fewer resources, which makes it particularly suitable to parse very large documents. However, event-driven parsing requires you to structure your application accordingly, performing your processing (and typically building auxiliary data structures) in your methods called by the parser. Object-based parsing gives you more flexibility to structure your application, which may make it more suitable when you need to perform very complicated processing, as long as you can afford the extra resources needed for object-based parsing (typically, this means that you are not dealing with very large documents). Object-based approaches also support programs that need to modify or create XML documents, as covered in "Changing and Generating XML" on page 606.
As a general guideline, when you are still undecided after studying the various trade-offs, I suggest you try event-driven parsing first, whenever you can see a reasonably direct way to perform your program's tasks through this approach. Event-driven parsing is more scalable: if your program can perform its task via event-driven parsing, it will be more applicable to larger documents than it would be otherwise. If event-driven parsing is just too confining, then try pull-based parsing instead, via
End of preview. The rest of this section is not available here.
Parsing XML with SAX
In most cases, the best way to extract information from an XML document is to parse the document with an event-driven parser compliant with SAX, the Simple API for XML. SAX defines a standard API that can be implemented on top of many different underlying parsers. The SAX approach to parsing has similarities to most of the HTML parsers covered in Chapter 23. As the parser encounters XML elements, text contents, and other significant events in the input stream, the parser calls back to methods of your classes. Such event-driven parsing, based on callbacks to your methods as relevant events occur, also has similarities to the event-driven approach that is almost universal in GUIs and in some of the best, most scalable networking frameworks, such as Twisted, mentioned in Chapter 19. Event-driven approaches in various programming fields may not appear natural to beginners, but enable high performance and particularly high scalability, making them very suitable for high-workload cases.
To use SAX, you define a content handler class, subclassing a library class and overriding some methods. Then you build a parser object p, install an instance of your class as p's handler, and feed p the input stream to parse. p calls methods on your handler to reflect the document's structure and contents. Your handler's methods perform application-specific processing. The xml.sax package supplies a factory function to build p, and convenience functions for simpler operation in typical cases. xml.sax also supplies exception classes, raised in cases of invalid input and other errors.
Optionally, you can also register with parser p other kinds of handlers besides the content handler. You can supply a custom error handler to use an error diagnosis strategy different from normal exception raising, for example in order to diagnose several errors during a parse. You can supply a custom DTD handler to receive information about notation and unparsed entities from the XML document's Document Type Definition (DTD). You can supply a custom entity resolver to handle external entity references in advanced, customized ways. These advanced possibilities are rarely used, and I do not cover them further in this book.
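The basic steps for using SAX (define a content handler, build a parser p, install the handler, feed p the input) can be sketched as follows, assuming Python 3. The class name TitleCollector and the sample document are hypothetical. Only the content handler is registered; the optional handlers are left at their defaults.

```python
import io
import xml.sax


class TitleCollector(xml.sax.ContentHandler):
    """Collects the text content of every <title> element."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False
        self._chunks = []

    def startElement(self, name, attrs):
        if name == "title":
            self._in_title = True
            self._chunks = []

    def characters(self, content):
        # Text may arrive in several pieces, so accumulate it.
        if self._in_title:
            self._chunks.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._chunks))
            self._in_title = False


document = (b"<books><book><title>Python in a Nutshell</title></book>"
            b"<book><title>Another Title</title></book></books>")

handler = TitleCollector()
p = xml.sax.make_parser()           # factory function that builds p
p.setContentHandler(handler)        # install the handler instance
p.parse(io.BytesIO(document))       # p calls back into the handler

print(handler.titles)
```

The handler keeps only the strings it needs, so memory use does not grow with the size of the rest of the document.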
End of preview. The rest of this section is not available here.
Parsing XML with DOM
SAX parsing does not build any structure in memory to represent the XML document. This makes SAX fast and highly scalable, as your application builds exactly as little or as much in-memory structure as needed for its specific tasks. However, for particularly complicated processing tasks involving reasonably small XML documents, you may prefer to let the library build in-memory structures that represent the whole XML document, and then traverse those structures. The XML standards describe the DOM (Document Object Model) for XML. A DOM object represents an XML document as a tree whose root is the document object, while other nodes correspond to elements, text contents, element attributes, and so on. The ElementTree module mentioned in the introduction of this chapter provides a different, more Pythonic (and faster) approach to build an in-memory representation of an XML document, while DOM mimics existing W3C standards (mostly developed with other languages, such as Java, in mind).
The Python standard library supplies a minimal implementation of the XML DOM standard: xml.dom.minidom. minidom builds everything up in memory, with the typical pros and cons of the DOM approach to parsing. The Python standard library also supplies a different DOM-like approach in module xml.dom.pulldom. pulldom occupies an interesting middle ground between SAX and DOM, presenting the stream of parsing events as a Python iterator object so that you do not code callbacks, but rather loop over the events and examine each event to see if it's of interest. When you do find an event of interest to your application, you ask pulldom to build the DOM subtree rooted in that event's node by calling method expandNode, and then work with that subtree as you would in minidom. Paul Prescod, pulldom's author and XML and Python expert, describes the net result as "80 percent of the performance of SAX, 80 percent of the convenience of DOM." Other DOM parsers are part of the PyXML and 4Suite extension packages, mentioned at the start of this chapter.
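The pulldom style described above, looping over events and expanding only the nodes of interest, might look like the following sketch, assuming Python 3. The sample document and the choice of book elements as "events of interest" are illustrative only.

```python
from xml.dom import pulldom

document = ("<books>"
            "<book id='1'><title>A</title></book>"
            "<book id='2'><title>B</title></book>"
            "</books>")

titles = []
events = pulldom.parseString(document)
for event, node in events:
    # Examine each event; no callbacks are involved.
    if event == pulldom.START_ELEMENT and node.tagName == "book":
        # Build the DOM subtree rooted at this node, then use
        # ordinary minidom-style navigation on the subtree.
        events.expandNode(node)
        title = node.getElementsByTagName("title")[0]
        titles.append((node.getAttribute("id"), title.firstChild.data))

print(titles)
```

Only one book subtree is built at a time, so the whole document never needs to be held in memory as a single tree.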
End of preview. The rest of this section is not available here.
Changing and Generating XML
Just like for HTML and other kinds of structured text, the simplest way to output an XML document is often to prepare and write it using Python's normal string and file operations, covered in Chapter 9 and "File Objects" on page 216. Templating (covered in "Templating" on page 586) is also often the best approach. Subclassing class XMLGenerator (covered in "XMLGenerator" on page 597) is a good way to generate an XML document that is like an input XML document except for a few changes.
The xml.dom.minidom module offers yet another possibility because its classes support methods to generate, insert, remove, and alter nodes in a DOM tree that represents the document. You can create a DOM tree by parsing and then alter it, or you can create an empty DOM tree and populate it from scratch. You can output the resulting XML document with methods toxml, toprettyxml, or writexml of the Document instance. You can also output a subtree by calling these methods on the Node that is the subtree's root. The ElementTree module, mentioned in this chapter's introduction, also offers similar functionality (but with a more Pythonic API and much better performance).
The Document class supplies factory methods to create instances of Node subclasses. The most frequently used factory methods of a Document instance d are as follows.
createComment
d.createComment(data)
Description
Builds and returns an instance c of class Comment for a comment with text data.
createElement
d.createElement(tagname)
Description
Builds and returns an instance e of class Element for an element with the given tag.
createTextNode
d.createTextNode(data)
Description
Builds and returns an instance t of class Text for a text node with text data.
An instance e of class Element supplies methods to remove and add attributes.
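As a sketch of the factory methods above, the following Python 3 code populates an empty DOM tree from scratch and outputs it. The element name greeting and the attribute names are made up. setAttribute and removeAttribute are the standard DOM methods for adding and removing attributes, chosen here as an assumption about which methods the truncated text goes on to describe.

```python
from xml.dom import minidom

d = minidom.Document()                 # an empty DOM tree

e = d.createElement("greeting")        # factory methods build nodes...
d.appendChild(e)                       # ...which are then attached
e.appendChild(d.createComment("generated"))
e.appendChild(d.createTextNode("hello"))

e.setAttribute("lang", "en")           # add attributes
e.setAttribute("draft", "yes")
e.removeAttribute("draft")             # and remove one again

subtree_xml = e.toxml()                # output just this subtree
document_xml = d.toxml()               # output the whole document

print(subtree_xml)
```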
End of preview. The rest of this section is not available here.
Chapter 25: Extending and Embedding Classic Python
Classic Python runs on a portable, C-coded virtual machine. Python's built-in objects, such as numbers, sequences, dictionaries, sets, and files, are coded in C, as are several modules in Python's standard library. Modern platforms support dynamic-load libraries, with file extensions such as .dll on Windows and .so on Linux and Mac, and building Python produces such binary files. You can code your own extension modules for Python in C, using the Python C API covered in this chapter, to produce and deploy dynamic libraries that Python scripts and interactive sessions can later use with the import statement, covered in "The import Statement" on page 140.
Extending Python means building modules that Python code can import to access the features the modules supply. Embedding Python means executing Python code from your application. For such execution to be useful, Python code must in turn be able to access some of your application's functionality. In practice, therefore, embedding implies some extending, as well as a few embedding-specific operations. The three main reasons for wishing to extend Python can be summarized as follows:
  • Reimplementing some functionality (originally coded in Python) in a lower-level language, hoping to get better performance
  • Letting Python code access some existing functionality supplied by libraries coded in (or, at any rate, callable from) lower-level languages
  • Letting Python code access some existing functionality of an application that is in the process of embedding Python as the application's scripting language
Embedding and extending are covered extensively in Python's online documentation; you can find an in-depth tutorial at http://www.python.org/doc/ext/ext.html and a reference manual at http://www.python.org/doc/api/api.html. Many details are best studied in Python's extensively documented sources. Download Python's source distribution and study the sources of Python's core, C-coded extension modules and the example extensions supplied for study purposes.
End of preview. The rest of this section is not available here.
Extending Python with Python's C API
A Python extension module named x resides in a dynamic library with the same filename (x.pyd on Windows; x.so on most Unix-like platforms) in an appropriate directory (often the site-packages subdirectory of the Python library directory). You generally build the x extension module from a C source file x.c whose overall structure is:
#include <Python.h>

/* omitted: the body of the x module */

void initx(void)
{
    /* omitted: the code that initializes the module named x */
}
When you have built and installed the extension module, a Python statement import x loads the dynamic library, then locates and calls the function named initx, which must do all that is needed to initialize the module object named x.
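The exact file name that import looks for varies with the platform and the interpreter version. As a minimal sketch, assuming a Python 3 interpreter (newer than the release this text describes, and with a differently named initialization function), the standard importlib.machinery module lists the suffixes the running interpreter accepts for extension modules. The helper name candidate_filenames is hypothetical, and x is only the placeholder module name used above.

```python
import importlib.machinery


def candidate_filenames(module_name):
    """Return the file names the running interpreter would accept
    for an extension module called module_name."""
    return [module_name + suffix
            for suffix in importlib.machinery.EXTENSION_SUFFIXES]


if __name__ == "__main__":
    # Typically names ending in .so on Unix-like systems and .pyd on Windows
    for filename in candidate_filenames("x"):
        print(filename)
```

Running this on your own machine shows which of the listed names a freshly built library would need to carry in order to be importable there.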
To build and install a C-coded Python extension module, it's simplest and most productive to use the distribution utilities, distutils, covered in "The Distribution Utilities (distutils)" on page 150. In the same directory as x.c, place a file named setup.py that contains the following statements:
from distutils.core import setup, Extension
setup(name='x', ext_modules=[Extension('x', sources=['x.c'])])
From a shell prompt in this directory, you can now run:
C:\>python setup.py install
to build the module and install it so that it becomes usable in your Python installation. distutils performs all needed compilation and linking steps, with the right compiler and linker commands and flags, and copies the resulting dynamic library into an appropriate directory, dependent on your Python installation (depending on that installation's details, you may need to have administrator or super-user privileges for the installation; for example, on a Mac or Linux, you may need to run sudo python setup.py install). Your Python code can then access the resulting module with the statement import x.
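To see which directory counts as "appropriate" for a given installation, you can ask the interpreter itself. The following is a small sketch, assuming a Python version that ships the standard sysconfig module (which is newer than the release described here); the function name extension_install_dir is hypothetical.

```python
import sysconfig


def extension_install_dir():
    """Return the directory where platform-specific (compiled) modules
    are installed for the running interpreter."""
    return sysconfig.get_path("platlib")


if __name__ == "__main__":
    print(extension_install_dir())
```

If the printed directory is not writable by your user account, that is a sign the install step will need elevated privileges, as noted above.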

Section 25.1.1.1: The C compiler you need

End of preview. The rest of this section is not available here.
Extending Python Without Python's C API
Preview
You can code Python extensions in other classic compiled languages besides C. For Fortran, the choice is between Paul Dubois's Pyfort (available at http://pyfortran.sf.net) and Pearu Peterson's F2PY (available at http://cens.ioc.ee/projects/f2py2e/). Both packages support and require the Numeric package covered in "The Numeric Package" on page 378, since numeric processing is Fortran's typical application area.
For C++, you have many choices. SCXX (available at http://davidf.sjsoft.com/mirrors/mcmillan-inc/scxx.html) is a simple, lightweight package that uses no templates and is thus suitable for older C++ compilers. PyCXX (available at http://cxx.sf.net) uses a modest amount of templates, essentially ones from the C++ standard library. SIP (available at http://www.riverbankcomputing.co.uk/sip/index.php) also supports the C++ extensions needed to use the powerful Qt cross-platform libraries, but, while it fully supports Qt, it does not require it. The Boost Python Library (available at http://www.boost.org/libs/python/doc) is part of Boost, a vast treasury of powerful, template-rich C++ libraries of uniformly high quality that require modern C++ compilers with solid template support.
Of course, you may also choose to use Python's C API from your C++ code, using C++ in this respect as if it were C, and forgoing the extra convenience that C++ affords. However, if you're already using C++ rather than C anyway, then using SCXX, PyCXX, SIP, or Boost can substantially improve your programming productivity when compared to using Python's underlying C API.
If your Python extension is basically a wrapper over an existing C or C++ library (as many are), consider SWIG, the Simplified Wrapper and Interface Generator (available at http://www.swig.org). SWIG generates the C source code for your extension based on the library's header files, generally with some help in terms of further annotations in an interface description file. If you specifically need to wrap an existing dynamic library (a
End of preview. The rest of this section is not available here.
Embedding Python
Preview
If you have an application already written in C or C++ (or another classic compiled language), you may want to embed Python as your application's scripting language. To embed Python in languages other than C, the other language must be able to call C functions. In the following, I cover only the C view of things, since other languages vary widely regarding what you have to do in order to call C functions from them.
In order for Python scripts to communicate with your application, your application must supply extension modules with Python-accessible functions and classes that expose your application's functionality. If these modules are linked with your application, rather than residing in dynamic libraries that Python can load when necessary, register your modules with Python as additional built-in modules by calling the PyImport_AppendInittab C API function.
PyImport_AppendInittab
int PyImport_AppendInittab(char* name,void (*initfunc)(void))
Description
name is the module name, which Python scripts use in import statements to access the module. initfunc is the module initialization function, taking no argument and returning no result, as covered in "Module Initialization" on page 617 (i.e., initfunc is the module's function that would be named initname for a normal extension module in a dynamic library). PyImport_AppendInittab must be called before Py_Initialize.
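From the Python side, modules that are compiled into the executable rather than loaded from a dynamic library are listed in sys.builtin_module_names. The sketch below only inspects that tuple for the interpreter running it. Whether a module registered through PyImport_AppendInittab shows up there in your embedding is an assumption you should verify; the helper name is_builtin is hypothetical.

```python
import sys


def is_builtin(module_name):
    """Report whether module_name is compiled into the interpreter
    rather than loaded from a file."""
    return module_name in sys.builtin_module_names


if __name__ == "__main__":
    print(is_builtin("sys"))                 # sys is always built in
    print(is_builtin("no_such_module_xyz"))  # a made-up name
```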
You may want to set the program name and arguments, which Python scripts can access as sys.argv, by calling either or both of the following C API functions.
Py_SetProgramName
void Py_SetProgramName(char* name)
Description
Sets the program name, which Python scripts can access as sys.argv[0]. Must be called before Py_Initialize.
PySys_SetArgv
void PySys_SetArgv(int argc,char** argv)
Description
End of preview. The rest of this section is not available here.
Pyrex
Preview
The Pyrex language (http://www.cosc.canterbury.ac.nz/~greg/python/Pyrex/) is often the most convenient way to write extensions for Python. Pyrex is a large subset of Python, with the addition of optional C-like types for variables: you can automatically compile Pyrex programs (source files with extension .pyx) into machine code (via an intermediate stage of generated C code), producing Python-importable extensions. See the above URL for all the details of Pyrex programming; in this section, I cover only a few essentials to let you get started with Pyrex.
The limitations of the Pyrex language, compared with Python, are the following:
  • No nesting of def and class statements in other statements (except that one level of def within one level of class is okay, and indeed is the proper and normal way to define a class's methods).
  • No import *, generators, list comprehensions, decorators, or augmented assignment.
  • No globals and locals built-ins.
  • To give a class a staticmethod or classmethod, you must first def the function outside the class statement (in Python, it's normally defed within the class).
As you can see, while not quite as rich as Python proper, Pyrex is a vast subset indeed. More importantly, Pyrex adds to Python a few statements that allow C-like declarations, enabling easy generation of machine code (via an intermediate C-code generation step). Here is a simple example; code it in source file hello.pyx in a new empty directory:
def hello(char *name):
    return "Hello, " + name + "!"
This is almost exactly like Python—except that parameter name is preceded by a char*, declaring that its type must always be a C 0-terminated string (but, as you see from the body, in Pyrex, you can use its value as you would a normal Python string).
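For comparison, here is the plain Python counterpart of the same function, a sketch that runs on an ordinary interpreter without Pyrex. The only difference from the Pyrex source is the missing C type declaration on the parameter.

```python
def hello(name):
    # No C type declaration here: name is expected to be a Python str
    return "Hello, " + name + "!"


if __name__ == "__main__":
    print(hello("World"))  # prints: Hello, World!
```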
When you install Pyrex (by the usual python setup.py install route), you also gain a way to build Pyrex source files into Python dynamic-library extensions through the usual
End of preview. The rest of this section is not available here.
Chapter 26: Extending and Embedding Jython
Preview
Jython implements the Python language on a Java Virtual Machine (JVM). Jython's built-in objects, such as numbers, sequences, dictionaries, and files, are coded in Java. To extend Classic Python with C, you code C modules using the Python C API (as covered in "Extending Python with Python's C API" on page 614). To extend Jython with Java, you do not have to code Java modules in special ways: every Java package on the Java CLASSPATH (or on Jython's sys.path) is automatically available to your Jython scripts and Jython interactive sessions for use with the import statement covered in "The import Statement" on page 140. This automatic availability applies to Java's standard libraries, third-party Java libraries you have installed, and Java classes you have coded yourself. You can extend Java with C using the Java Native Interface (JNI), and such extensions will be available to Jython code, just as if they were coded in pure Java rather than in JNI-compliant C.
For details on interoperation between Java and Jython, I recommend Jython Essentials, by Samuele Pedroni and Noel Rappin (O'Reilly). In this chapter, I offer a brief overview of the simplest interoperation scenarios, just enough for a large number of practical needs. In most cases, importing, using, extending, and implementing Java classes and interfaces in Jython just works. In some cases, however, you need to be aware of issues related to accessibility, type conversions, and overloading, as covered in this chapter. Embedding the Jython interpreter in Java-coded applications is similar to embedding the Python interpreter in C-coded applications (as covered in "Embedding Python" on page 647), but the Jython task is easier. Jython offers yet another possibility for interoperation with Java, using the jythonc compiler to turn your Python sources into classic, static JVM bytecode .class and .jar files. You can then use these bytecode files in Java applications and frameworks, exactly as if their source code had been in Java rather than in Python.
End of preview. The rest of this section is not available here.
Importing Java Packages in Jython
Preview
Unlike Java, Jython does not implicitly and automatically import java.lang. Your Jython code can explicitly import java.lang, or even just import java, and then use classes such as java.lang.System and java.lang.String as if they were Python classes. Specifically, your Jython code can use imported Java classes as if they were Python classes with a __slots__ class attribute (i.e., you cannot create arbitrary new instance attributes). You can subclass a Java class with your own Python class, and instances of your class do let you create new attributes just by binding them, as usual.
You may choose to import a top-level Java package (such as java) rather than specific subpackages (such as java.lang). Your Python code acquires the ability to access all subpackages when you import the top-level package. For example, after import java, your code can use classes java.lang.String, java.util.Vector, and so on.
The Jython runtime wraps every Java class you import in a transparent proxy, which manages communication between Python and Java code behind the scenes. This gives an extra reason to avoid the dubious idiom from somewhere import *, in addition to the reasons mentioned in "The from...import * statement" on page 143. When you perform such a bulk import, the Jython runtime must build proxy wrappers for all the Java classes in package somewhere, spending substantial amounts of memory and time wrapping many classes your code will probably not use. Avoid from...import *, except for occasional convenience in interactive exploratory sessions, and stick with the import statement. Alternatively, it's okay to use specific, explicit from statements for classes you know your Python code specifically wants to use (e.g., from java.lang import System).
Jython relies on a registry of Java properties as a cross-platform equivalent of the kind of settings that would normally use the Windows Registry, or environment variables on Unix-like systems. Jython's registry file is a standard Java properties file named
End of preview. The rest of this section is not available here.
Embedding Jython in Java
Preview
Your Java-coded application can embed the Jython interpreter in order to use Jython for scripting. jython.jar must be in your Java CLASSPATH. Your Java code must import org.python.core.* and org.python.util.* in order to access Jython's classes. To initialize Jython's state and instantiate an interpreter, use the Java statements:
PySystemState.initialize();
PythonInterpreter interp = new PythonInterpreter();
Jython also supplies several advanced overloads of this method and constructor in order to let you determine in detail how PySystemState is set up, and to control the system state and global scope for each interpreter instance. However, in typical, simple cases, the previous Java code is all your application needs.
Once you have an instance interp of class PythonInterpreter, you can call method interp.eval to have the interpreter evaluate a Python expression held in a Java string. You can also call any of several overloads of interp.exec and interp.execfile to have the interpreter execute Python statements held in a Java string, a precompiled Jython code object, a file, or a Java InputStream.
The Python code you execute can import your Java classes in order to access your application's functionality. Your Java code can set attributes in the interpreter namespace by calling overloads of interp.set, and get attributes from the interpreter namespace by calling overloads of interp.get. The methods' overloads give you a choice. You can work with native Java data and let Jython perform type conversions, or you can work directly with PyObject, the base class of all Python objects, covered in "The PyObject Class" on page 661. The most frequently used methods and overloads of a PythonInterpreter instance interp are the following.

eval
PyObject interp.eval(String s)
Description
Evaluates, in interp's namespace, the Python expression held in Java string

End of preview. The rest of this section is not available here.
Compiling Python into Java
Preview
Jython comes with the jythonc compiler. You can feed jythonc your .py source files, and jythonc compiles them into normal JVM bytecode and packages them into .class and .jar files. Since jythonc generates traditional static bytecode, it cannot quite cope with the whole range of dynamic possibilities that Python allows. For example, jythonc cannot successfully compile Python classes that determine their base classes dynamically at runtime, as the normal Python interpreters allow. However, except for such extreme examples of dynamically changeable class structures, jythonc does support compilation of essentially the whole Python language into Java bytecode.
jythonc resides in the Tools/jythonc directory of your Jython installation. You invoke it from a shell (console) command line with the syntax:
jythonc options modules
options are zero or more option flags starting with --. modules are zero or more names of Python source files to compile, either as Python-style names of modules residing on Python's sys.path, or as relative or absolute paths to Python source files. Include the .py extension in each path to a source file, but not in a module name.
More often than not, you will specify the jythonc option --jar jarfile to build a .jar file of compiled bytecode rather than separate .class files. Most other options deal with what to put in the .jar file. You can choose to make the file self-sufficient (for browsers and other Java runtime environments that do not support the use of multiple .jar files) at the expense of making the file larger. Option --all ensures all Jython core classes are copied into the .jar file, while --core tries to be more conservative, copying as few core classes as feasible. Option --addpackages packages lets you list (in
End of preview. The rest of this section is not available here.
Chapter 27: Distributing Extensions and Programs
Preview
Python's distutils allows you to package Python programs and extensions in several ways, and to install programs and extensions to work with your Python installation. As I mentioned in "Building and Installing C-Coded Python Extensions" on page 614, distutils also affords the simplest and most effective way to build C-coded extensions you write yourself, even when you are not interested in distributing such extensions to anybody else. This chapter covers distutils, as well as third-party tools that complement distutils and let you package Python programs for distribution as standalone applications, installable on machines with specific hardware and operating systems without a separate installation of Python. A simpler and more powerful way to package Python programs and extensions is offered by the freely downloadable third-party framework covered in "Python Eggs" on page 151.
End of preview. The rest of this section is not available here.
Python's distutils
Preview
distutils is a rich and flexible set of tools to package Python programs and extensions for distribution to third parties. I cover typical, simple uses of distutils for the most common packaging needs. For an in-depth, highly detailed discussion of distutils, I recommend two manuals that are part of Python's online documentation: Distributing Python Modules (available at http://www.python.org/doc/current/dist/) and Installing Python Modules (available at http://www.python.org/doc/current/inst/), both by Greg Ward, the principal author of distutils.
A distribution is the set of files to package into a single file for distribution purposes. A distribution may include zero, one, or more Python packages and other Python modules (as covered in Chapter 7), as well as, optionally, Python scripts, C-coded (and other) extensions, supporting datafiles, and auxiliary files containing metadata about the distribution itself. A distribution is said to be pure if all code it includes is Python, and nonpure if it includes non-Python code (most often, C-coded or Pyrex extensions).
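The pure/nonpure distinction can be illustrated with a rough check on file names. This is only a sketch: the helper is_pure and the list of suffixes are assumptions made for illustration, not something distutils itself provides, and a real distribution may contain non-Python code under other suffixes.

```python
import os

# Assumption for illustration: these suffixes mark non-Python source code.
NON_PYTHON_SUFFIXES = {".c", ".h", ".cpp", ".pyx", ".f"}


def is_pure(filenames):
    """Return True if none of the files looks like non-Python source code."""
    return not any(os.path.splitext(name)[1].lower() in NON_PYTHON_SUFFIXES
                   for name in filenames)


if __name__ == "__main__":
    print(is_pure(["setup.py", "mypkg/__init__.py", "README"]))  # True
    print(is_pure(["setup.py", "x.c"]))                          # False
```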
You should normally place all the files of a distribution in a directory, known as the distribution root directory, and in subdirectories of the distribution root. Mostly, you can arrange the subtree of files and directories rooted at the distribution root to suit your own organizational needs. However, as covered in "Packages" on page 149, a Python package must reside in its own directory, and a package's directory must contain a file named __init__.py (and subdirectories with __init__.py files, for the package's subpackages, if any) as well as other modules that belong to that package.
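The layout rule can be tried out in a throwaway directory. The following sketch, assuming a Python 3 interpreter, builds a temporary distribution root containing a package with one subpackage and then imports a module from it. The names demo_pkg, sub, mod, and build_and_import are hypothetical and chosen only for this illustration.

```python
import importlib
import pathlib
import sys
import tempfile


def build_and_import():
    """Create a temporary distribution root holding package demo_pkg
    with subpackage sub, then import a module from the subpackage."""
    with tempfile.TemporaryDirectory() as tmp:
        root = pathlib.Path(tmp)
        sub = root / "demo_pkg" / "sub"
        sub.mkdir(parents=True)
        # Each package directory gets its own __init__.py
        (root / "demo_pkg" / "__init__.py").write_text("")
        (sub / "__init__.py").write_text("")
        (sub / "mod.py").write_text("VALUE = 42\n")
        # Make the distribution root visible to the import system
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            return importlib.import_module("demo_pkg.sub.mod").VALUE
        finally:
            sys.path.remove(tmp)


if __name__ == "__main__":
    print(build_and_import())  # prints: 42
```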
The distribution root directory must contain a Python script that by convention is named setup.py. The setup.py script can, in theory, contain arbitrary Python code. However, in practice,
End of preview. The rest of this section is not available here.
py2exe
Preview
distutils helps you package your Python extensions and applications. However, an end user can install the resulting packaged form only after installing Python. This is particularly a problem on Windows, where end users want to run a single installer to get an application working on their machine. Installing Python first and then running your application's installer may prove too much of a hassle for such end users.
Thomas Heller has developed a simple solution, a distutils add-on named py2exe, freely available for download from http://starship.python.net/crew/theller/py2exe/. This URL also contains detailed documentation of py2exe, and I recommend you study this documentation if you intend to use py2exe in advanced ways. However, the simplest uses, which I cover in the rest of this section, cover most practical needs.
After downloading and installing py2exe (on a Windows machine where Microsoft VS 2003 is also installed), you just need to add the line:
import py2exe
at the start of your otherwise normal distutils script setup.py. Now, in addition to other distutils commands, you have one more option. Running:
python setup.py py2exe
builds and collects in a subdirectory of your distribution root directory an .exe file and one or more .dll files. If your distribution's name metadata is, for example, myapp, then the directory into which the .exe and .dll files are collected is named dist\myapp\. Any files specified by option data_files in your setup.py script are placed in subdirectories of dist\myapp\. The .exe file corresponds to your application's first or only entry in the scripts keyword argument value, and contains the bytecode-compiled form of all Python modules and packages that your setup.py specifies or implies. Among the .dll files is, at minimum, the Python dynamic load library—for example, python24.dll if you use Python 2.4—plus any other
End of preview. The rest of this section is not available here.
py2app
Preview
py2app (http://undefined.org/python/py2app.html) is a distutils extension that builds standalone Python applications for the Mac. py2app is distributed with PyObjC (http://pyobjc.sourceforge.net/), the Python/Objective C bridge that offers an excellent way to create Mac applications with Cocoa interfaces in Python; however, py2app is also fully compatible with all major cross-platform GUI toolkits for Python, including Tkinter, wxPython, pygame, and PyQt. Moreover, py2app lets you build installer packages (.mpkg files) directly. Refer to the URL for all practical usage details.
End of preview. The rest of this section is not available here.
cx_Freeze
Preview
cx_Freeze (http://starship.python.net/crew/atuining/cx_Freeze/) is a standalone utility (not a distutils extension) that builds standalone Python applications for Windows and Linux. Refer to the URL for all practical usage details.
End of preview. The rest of this section is not available here.
PyInstaller
Preview
PyInstaller (http://pyinstaller.hpcf.upr.edu/cgi-bin/trac.cgi) is another standalone utility that builds standalone Python applications for Windows, Linux, and Irix. Refer to the URL for all practical usage details.
End of preview. The rest of this section is not available here.
	
