Third Edition, December 2002
ISBN 978-0-596-00203-9
588 pages
EUR 37.00
Table of Contents
- Chapter 1: Getting Started
- Preview: Apache is the dominant web server on the Internet today, filling a key place in the infrastructure of the Internet. This chapter will explore what web servers do and why you might choose the Apache web server, examine how your web server fits into the rest of your network infrastructure, and conclude by showing you how to install Apache on a variety of different systems.
End of preview. The rest of this section cannot be viewed here.
- What Does a Web Server Do?
- Preview: The whole business of a web server is to translate a URL either into a filename, and then send that file back over the Internet, or into a program name, and then run that program and send its output back. That is the meat of what it does: all the rest is trimming.
When you fire up your browser and connect to the URL of someone's home page (say, the notional http://www.butterthlies.com/ we shall meet later on), you send a message across the Internet to the machine at that address. That machine, you hope, is up and running; its Internet connection is working; and it is ready to receive and act on your message.
URL stands for Uniform Resource Locator. A URL such as http://www.butterthlies.com/ comes in three parts:
<scheme>://<host>/<path>
So, in our example, <scheme> is http, meaning that the browser should use HTTP (Hypertext Transfer Protocol); <host> is www.butterthlies.com; and <path> is /, traditionally meaning the top page of the host. The <host> may contain either an IP address or a name, which the browser will then convert to an IP address. Using HTTP 1.1, your browser might send the following request to the computer at that IP address:
GET / HTTP/1.1
Host: www.butterthlies.com
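As a rough illustration of the two steps just described, the sketch below splits a URL into the three parts named above and assembles the matching HTTP/1.1 request text. It is only a sketch: the function names split_url and build_get_request are made up for this example, it handles only the simple <scheme>://<host>/<path> form, and a real browser does considerably more.

```python
def split_url(url):
    """Split a URL of the form <scheme>://<host>/<path> into its three parts."""
    scheme, _, rest = url.partition("://")
    host, _, path = rest.partition("/")
    # The path always starts with "/", even when nothing follows the host.
    return scheme, host, "/" + path


def build_get_request(url):
    """Return the text of a minimal HTTP/1.1 GET request for the URL."""
    _scheme, host, path = split_url(url)
    # HTTP separates lines with CRLF; an empty line ends the headers.
    return f"GET {path} HTTP/1.1\r\nHost: {host}\r\n\r\n"


if __name__ == "__main__":
    print(split_url("http://www.butterthlies.com/"))
    print(build_get_request("http://www.butterthlies.com/"))
```

The request text is what would be written to a TCP connection opened to the host; opening that connection is deliberately left out here.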
The request arrives at port 80 (the default HTTP port) on the host www.butterthlies.com. The message is in four parts: a method (an HTTP method, not a URL method), which in this case is GET but could equally be PUT, POST, DELETE, or CONNECT; the Uniform Resource Identifier (URI), /; the version of the protocol we are using; and a series of headers that modify the request (in this case, a Host header, which is used for name-based virtual hosting: see Chapter 4). It is then up to the web server running on that host to make something of this message.
The host machine may be a whole cluster of hypercomputers costing an oil sheik's ransom or just a humble PC. In either case, it had better be running a web server, a program that listens to the network and accepts and acts on this sort of message.
- How Apache Works
- Preview: Apache is a program that runs under a suitable multitasking operating system. In the examples in this book, the operating systems are Unix and Windows 95/98/2000/Me/NT/..., which we call Win32. There are many others: flavors of Unix, IBM's OS/2, and Novell Netware. Mac OS X has a FreeBSD foundation and ships with Apache.
The Apache binary is called httpd under Unix and apache.exe under Win32 and normally runs in the background. Each copy of httpd/apache that is started has its attention directed at a web site, which is, for our purposes, a directory. Regardless of operating system, a site directory typically contains four subdirectories:
- conf
- Contains the configuration file(s), of which httpd.conf is the most important. It is referred to throughout this book as the Config file. It specifies the URLs that will be served.
- htdocs
- Contains the HTML files to be served up to the site's clients. This directory and those below it, the web space, are accessible to anyone on the Web and therefore pose a severe security risk if used for anything other than public data.
- logs
- Contains the log data, both of accesses and errors.
- cgi-bin
- Contains the CGI scripts. These are programs or shell scripts written by or for the webmaster that can be executed by Apache on behalf of its clients. It is most important, for security reasons, that this directory not be in the web space — that is, in .../htdocs or below.
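The layout above can be sketched in a few lines of Python. This is an illustration only, not part of Apache: the helper names make_site and in_web_space and the site name site.example are invented here, and the check simply tests whether one directory lies at or below another.

```python
import tempfile
from pathlib import Path

# The four conventional subdirectories of a site directory.
SUBDIRS = ("conf", "htdocs", "logs", "cgi-bin")


def make_site(root):
    """Create the four conventional subdirectories under root."""
    root = Path(root)
    paths = {}
    for name in SUBDIRS:
        paths[name] = root / name
        paths[name].mkdir(parents=True, exist_ok=True)
    return paths


def in_web_space(path, htdocs):
    """Return True if path is htdocs itself or lies anywhere below it."""
    path, htdocs = Path(path).resolve(), Path(htdocs).resolve()
    return path == htdocs or htdocs in path.parents


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        site = make_site(Path(tmp) / "site.example")
        print(sorted(p.name for p in site.values()))
        # cgi-bin sits beside htdocs, not inside it.
        print(in_web_space(site["cgi-bin"], site["htdocs"]))
```

Keeping cgi-bin as a sibling of htdocs, rather than a child, is what makes the second check come out false.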
In its idling state, Apache does nothing but listen to the IP addresses specified in its Config file. When a request appears, Apache receives it and analyzes the headers. It then applies the rules it finds in the Config file and takes the appropriate action.
The webmaster's main control over Apache is through the Config file. The webmaster has some 200 directives at her disposal, and most of this book is an account of what these directives do and how to use them to reasonable advantage. The webmaster also has a dozen flags she can use when Apache starts up.
- Apache and Networking
- Preview: At its core, Apache is about communication over networks. Apache uses the TCP/IP protocol as its foundation, providing an implementation of HTTP. Developers who want to use Apache should have at least a basic understanding of TCP/IP and may need more advanced skills if they need to integrate Apache servers with other network infrastructure like firewalls and proxy servers.
To understand the substance of this book, you need a modest knowledge of what TCP/IP is and what it does. You'll find more than enough information in Craig Hunt and Robert Bruce Thompson's books on TCP/IP, but what follows is, we think, what is necessary to know for our book's purposes.
TCP/IP (Transmission Control Protocol/Internet Protocol) is a set of protocols enabling computers to talk to each other over networks. The two protocols that give the suite its name are among the most important, but there are many others, and we shall meet some of them later. These protocols are embodied in programs on your computer written by someone or other; it doesn't much matter who. TCP/IP seems unusual among computer standards in that the programs that implement it actually work, and their authors have not tried too much to improve on the original conceptions.
TCP/IP is generally only used where there is a network. Each computer on a network that wants to use TCP/IP has an IP address, for example, 192.168.123.1.
There are four parts in the address, separated by periods. Each part corresponds to a byte, so the whole address is four bytes long. You will, in consequence, seldom see any of the parts outside the range 0-255.
Although not required by the protocol, by convention there is a dividing line somewhere inside this number: to the left is the network number and to the right, the host number.
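To make the arithmetic concrete, here is a small sketch using Python's standard ipaddress module. The position of the dividing line (24 bits here) is purely an assumption chosen for the example, and octets and split_address are hypothetical helper names.

```python
import ipaddress


def octets(address):
    """Return the four byte values of a dotted IPv4 address."""
    return list(ipaddress.IPv4Address(address).packed)


def split_address(address, network_bits):
    """Split an address into (network number, host number).

    network_bits is how many of the 32 bits lie left of the dividing line.
    """
    value = int(ipaddress.IPv4Address(address))
    host_bits = 32 - network_bits
    # Shift away the host bits to get the network number,
    # and mask off everything else to get the host number.
    return value >> host_bits, value & ((1 << host_bits) - 1)


if __name__ == "__main__":
    print(octets("192.168.123.1"))             # [192, 168, 123, 1]
    print(split_address("192.168.123.1", 24))
```

With the line drawn after 24 bits, the first three parts of the address form the network number and the last part is the host number.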
Two machines on the same physical network (usually a local area network, or LAN) normally have the same network number and communicate directly using TCP/IP.
How do we know where the dividing line is between network number and host number? The default dividing line used to be determined by the first of the four numbers, but a shortage of addresses required a change to the use of
- How HTTP Clients Work
- Preview: Once the server is set up, we can get down to business. The client has the easy end: it wants web action on a particular site, and it sends a request with a URL that begins with http to indicate what service it wants (other common services are ftp for File Transfer Protocol or https for HTTP with Secure Sockets Layer, SSL) and continues with these possible parts:
//<user>:<password>@<host>:<port>/<url-path>
RFC 1738 says:
Some or all of the parts "<user>:<password>@", ":<password>", ":<port>", and "/<url-path>" may be omitted. The scheme specific data start with a double slash "//" to indicate that it complies with the common Internet scheme syntax.
In real life, URLs look more like http://www.apache.org/; that is, there is no user and password pair, and there is no port. What happens?
The browser observes that the URL starts with http: and deduces that it should be using the HTTP protocol. The client then contacts a name server, which uses DNS to resolve www.apache.org to an IP address. At the time of writing, this was 63.251.56.142. One way to check the validity of a hostname is to go to the operating-system prompt and type:
ping www.apache.org
If that host is connected to the Internet, a response is returned:
Pinging www.apache.org [63.251.56.142] with 32 bytes of data:
Reply from 63.251.56.142: bytes=32 time=278ms TTL=49
Reply from 63.251.56.142: bytes=32 time=620ms TTL=49
Reply from 63.251.56.142: bytes=32 time=285ms TTL=49
Reply from 63.251.56.142: bytes=32 time=290ms TTL=49
Ping statistics for 63.251.56.142:
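Returning to the URL form quoted from RFC 1738, the following sketch uses urlsplit from Python's standard urllib.parse module to show which parts are present. The second URL, with its user, password, and port, is invented for illustration, and describe is a hypothetical helper.

```python
from urllib.parse import urlsplit


def describe(url):
    """Return the optional parts of a URL; omitted parts come back as None."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "user": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,
        "path": parts.path,
    }


if __name__ == "__main__":
    # The everyday form: no user, no password, no port.
    print(describe("http://www.apache.org/"))
    # A made-up URL that exercises every optional part.
    print(describe("http://user:secret@www.apache.org:8000/docs/"))
```

For the everyday URL, the user, password, and port entries all come back as None, which matches the observation that real-life URLs usually omit them.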
A URL can be given more precision by attaching a port number: the web address http://www.apache.org doesn't include a port because it is port 80, the default, and the browser takes it for granted. If some other port is wanted, it is included in the URL after a colon, for example, http://www.apache.org:8000/. We will have more to do with ports later.
The URL always includes a path, even if it is only /. If the path is left out by the careless user, most browsers put it back in. If the path were
- What Happens at the Server End?
- Preview: We assume that the server is well set up and running Apache. What does Apache do? In the simplest terms, it gets a URL from the Internet, turns it into a filename, and sends the file (or its output if it is a program) back down the Internet. That's all it does, and that's all this book is about!
Two main cases arise:
The Unix server has a standalone Apache that listens to one or more ports (port 80 by default) on one or more IP addresses mapped onto the interfaces of its machine. In this mode (known as standalone mode), Apache actually runs several copies of itself to handle multiple connections simultaneously.
On Windows, there is a single process with multiple threads. Each thread services a single connection. This currently limits Apache 1.3 to 64 simultaneous connections, because there's a system limit of 64 objects for which you can wait at once. This is something of a disadvantage because a busy site can have several hundred simultaneous connections. It has been improved in Apache 2.0. The default maximum is now 1920, but even that can be extended at compile time.
Both cases boil down to an Apache server with an incoming connection. Remember our first statement in this section, namely, that the object of the whole exercise is to resolve the incoming request either into a filename or the name of a script, which generates data internally on the fly. Apache thus first determines which IP address and port number were used by asking the operating system where the connection is connecting to. Apache then uses the IP address, the port number, and (in HTTP 1.1) the Host header to decide which virtual host is the target of this request. The virtual host then looks at the path, which was handed to it in the request, and reads that against its configuration to decide on the appropriate response, which it then returns.
Most of this book is about the possible appropriate responses and how Apache decides which one to use.
- Planning the Apache Installation
- Preview: Unless you're using a prepackaged installation, you'll want to do some planning before setting up the software. You'll need to consider network integration, operating system choices, Apache version choices, and the many modules available for Apache. Even if you're just using Apache at an ISP, you may want to know which choices the ISP made in its installation.
Apache installations come in many flavors. If an installation is intended only for local use on a developer's machine, it probably needs much less integration with network systems than an installation meant as a public host supporting thousands of simultaneous hits. Apache itself provides network and security functionality, but you'll need to set up supporting services separately, like the DNS that identifies your server to the network or the routing that connects it to the rest of the network. Some servers operate behind firewalls, and firewall configuration may also be an issue. If these are concerns for you, involve your network administrator early in the process.
Many webmasters have no choice of operating system (they have to use what's in the box on their desks), but if they have a choice, the first decision to make is between Unix and Windows. As the reader who persists with us will discover, much of the Apache Group and your authors prefer Unix. It is, itself, essentially open source. Over the last 30 years it has been the subject of intense scrutiny and improvement by many thousands of people. On the other hand, Windows is widely available, and Apache support for Windows has improved substantially in Apache 2.0.
The choice is commonly between some sort of Linux and FreeBSD. Both are technically acceptable. If you already know someone who has one of these OSs and is willing to help you get used to yours, then it would make sense to follow them.
If you are an Apple user, OS X has a Unix core and includes Apache.
Failing that, the difference between the two paths is mainly a legal one, turning on their different interpretations of open source licensing.
Linux lives at http://www.linux.org, and there are more than 160 different distributions from which Linux can be obtained free or in prepackaged pay-for formats. It is rather ominously described as a "Unix-type" operating system, which sometimes means that long-established Unix standards have been "improved", not always in an upwards direction.
- Windows?
- Preview: The main problem with the Win32 version of Apache lies in its security, which must depend, in turn, on the security of the underlying operating system. Unfortunately, Windows 95, Windows 98, and their successors have no effective security worth mentioning. Windows NT and Windows 2000 have a large number of security features, but they are poorly documented, hard to understand, and have not been subjected to the decades of public inspection, discussion, testing, and hacking that have forged Unix security into a fortress that can pretty well be relied upon.
It is a grave drawback to Windows that the source code is kept hidden in Microsoft's hands so that it does not benefit from the scrutiny of the computing community. It is precisely because the source code of free software is exposed to millions of critical eyes that it works as well as it does.
In the view of the Apache development group, the Win32 version is useful for easy testing of a proposed web site. But if money is involved, you would be wise to transfer the site to Unix before exposure to the public and the Bad Guys.
- Which Apache?
- Preview: At the time this edition was prepared, Apache 1.3.26 was the stable release. It has an improved build system (see the section that follows). Both the Unix and Windows versions were thought to be in good shape. Apache 2.0 had made it through beta test into full release. We suggest that if you are working under Unix and you don't need Apache 2.0's improved features (which are multitudinous but not fundamental for the ordinary webmaster), you go for Version 1.3.26 or later.
Apache 2.0 is a major new version. The main new features are multithreading (on platforms that support it), layered I/O (also known as filters), and a rationalized API. The ordinary user will see very little difference, but the programmer writing new modules (see the section that follows) will find a substantial change, which is reflected in our rewritten Chapter 20 and Chapter 21. However, the improvements in Apache v2.0 look to the future rather than trying to improve the present. The authors are not planning to transfer their own web sites to v2.0 any time soon and do not expect many other sites to do so either. In fact, many sites are still happily running Apache v1.2, which was nominally superseded several years ago. There are good security reasons for them to upgrade to v1.3.
Apache 2.0 is designed to run on Windows NT and 2000. The binary installer will only work with x86 processors. In all cases, TCP/IP networking must be installed. If you are using NT 4.0, install Service Pack 3 or 6, since Pack 4 had TCP/IP problems. It is not recommended that Windows 95 or 98 ever be used for production servers and, when we went to press, Apache 2.0 would not run under either at all. See http://httpd.apache.org/docs/windows.html.
- Installing Apache
- Preview: There are two ways of getting Apache running on your machine: by downloading an appropriate executable or by getting the source code and compiling it. Which is better depends on your operating system.
The fairly painless business of compiling Apache, which is described later, can now be circumvented by downloading a precompiled binary for the Unix of your choice. When we went to press, the following operating systems (mostly versions of Unix) were supported, but check before you decide. (See http://httpd.apache.org/dist/httpd/binaries.)
aix, aux, beos, bs2000-osd, bsdi, darwin, dgux, digitalunix, freebsd, hpux, irix, linux, macosx, macosxserver, netbsd, netware, openbsd, os2, os390, osf1, qnx, reliantunix, rhapsody, sinix, solaris, sunos
- Building Apache 1.3.X Under Unix
- Preview: There are two methods for building Apache: the "Semimanual Method" and "Out of the Box". They each involve the user in about the same amount of keyboard work: if you are happy with the defaults, you need do very little; if you want to do a custom build, you have to do more typing to specify what you want.
Both methods rely on a shell script that, when run, creates a Makefile. When you run make, this, in turn, builds the Apache executable with the side orders you asked for. Then you copy the executable to its home (Semimanual Method) or run make install (Out of the Box) and the various necessary files are moved to the appropriate places around the machine.
Between the two methods, there is not a tremendous amount to choose. We prefer the Semimanual Method because it is older and more reliable. It is also nearer to the reality of what is happening and generates its own record of what you did last time so you can do it again without having to perform feats of memory. Out of the Box is easier if you want a default build. If you want a custom build and you want to be able to repeat it later, you would do the build from a script that can get quite large. On the other hand, you can create several different scripts to trigger different builds if you need to.
Until Apache 1.3, there was no real out-of-the-box batch-capable build and installation procedure for the complete Apache package. This method is provided by a top-level configure script and a corresponding top-level Makefile.tmpl file. The goal is to provide a GNU Autoconf-style frontend that is capable of driving the old src/Configure stuff in batch.
Once you have extracted the sources (see earlier), the build process can be done in a minimum of three command lines, which is how most Unix software is built nowadays. Change yourself to root before you run ./configure; otherwise, if you use the default build configuration (which we suggest you do not), the server will be looking at port 8080 and will, confusingly, refuse requests to the default port, 80.
The result is, as you will be told during the process, probably not what you really want:
- New Features in Apache v2
- Preview: The procedure for configuring and compiling Apache has changed, as we will see later.
High-level decisions about the way Apache works internally can now be made at compile time by including one of a series of Multi Processing Modules (MPMs). This is done by attaching a flag to configure:
./configure <other flags> --with-mpm=<name of MPM>
Although MPMs are rather like ordinary modules, only one can be used at a time. Some of them are designed to adapt Apache to different operating systems; others offer a range of different optimizations for Unix.
The MPM in use will be shown, along with the other compiled-in modules, by executing httpd -l. When we went to press, these were the possible MPMs under Unix:
- prefork
- Default. Most closely imitates behavior of v1.3. Currently the default for Unix and sites that require stability, though we hope that threading will become the default later on.
- threaded
- Suitable for sites that require the benefits brought by threading, particularly reduced memory footprint and improved interthread communications. But see "prefork" earlier in this list.
- perchild
- Allows different hosts to have different user IDs.
- mpmt_pthread
- Similar to prefork, but each child process has a specified number of threads. It is possible to specify a minimum and maximum number of idle threads.
- Dexter
- Multiprocess, multithreaded MPM that allows you to specify a static number of processes.
- Perchild
- Similar to Dexter, but you can define a separate user and group for each child process to increase server security.
Other operating systems have their own MPMs:
- spmt_os2
- For OS/2.
- beos
- For the Be OS.
- WinNT
- Win32-specific version, taking advantage of completion ports and native function calls to give better network performance.
To begin with, accept the default MPM. More advanced users should refer to http://httpd.apache.org/docs-2.0/mpm.html and http://httpd.apache.org/docs-2.0/misc/perf-tuning.html.
See the entry for the AcceptMutex directive in Chapter 3.
Version 2.0 makes the following changes to the Config file:
- CacheNegotiatedDocs now takes the argument on/off. Existing instances of
- Making and Installing Apache v2 Under Unix
- Preview: Disregard all the previous instructions for Apache compilation. There is no longer a .../src directory. Even the name of the Unix source file has changed. We downloaded httpd-2_0_40.tar.gz and unpacked it in /usr/src/apache as usual. You should read the file INSTALL. The scheme for building Apache v2 is now much more in line with that for most other downloaded packages and utilities.
Set up the configuration file with this:
./configure --prefix=/usr/local
or wherever it is you want to keep the Apache bits, which will appear in various subdirectories. The executable, for instance, will be in .../sbin. If you are compiling under FreeBSD, as we were, --with-mpm=prefork is automatically used internally, since threads do not currently work well under this operating system. To see all the configuration possibilities:
./configure --help | more
If you want to preserve your Apache 1.3.X executable, you might rename it to httpd.13, wherever it is, and then:
make
which takes a surprising amount of time to run. Then:
make install
The result is a nice new httpd in /usr/local/sbin.
- Apache Under Windows
- Preview: Apache 1.3 will work under Windows NT 4.0 and 2000. Its performance under Windows 95 and 98 is not guaranteed. If running on Windows 95, the "Winsock2" upgrade must be installed before Apache will run. "Winsock2" for Windows 95 is available at http://www.microsoft.com/windows95/downloads/contents/WUAdminTools/S_WUNetworkingTools/W95Sockets2. Be warned that the Dialup Networking 1.2 (MS DUN) updates include a Winsock2 that is entirely insufficient, and the Winsock2 update must be reinstalled after installing Windows 95 dialup networking. Windows 98, NT (Service Pack 3 or later), and 2000 users need to take no special action; those versions provide Winsock2 as distributed.
Apache v2 will run under Windows 2000 and NT, but, when we went to press, it did not work under Win 95, 98, or Me. These different versions are the same as far as Apache is concerned, except that under NT, Apache can also be run as a service. From Apache v1.3.14, emulators are available to provide NT services under the other Windows platforms. Performance under Win32 may not be as good as under Unix, but this will probably improve over coming months.
Since Win32 is considerably more consistent than the sprawling family of Unices, and since it loads extra modules as DLLs at runtime rather than compiling them at make time, it is practical for the Apache Group to offer a precompiled binary executable as the standard distribution. Go to http://www.apache.org/dist, and click on the version you want, which will be in the form of a self-installing .exe file (the .exe extension is how you tell which one is the Win32 Apache). Download it into, say, c:\temp, and then run it from the Win32 Start menu's Run option.
The executable will create an Apache directory, C:\Program Files\Apache, by default. Everything to do with Win32 Apache happens in an MS-DOS window, so get into a window and type:
> cd c:\<apache directory>
> dir
and you should see something like this:
Volume in drive C has no label
Volume Serial Number is 294C-14EE
Directory of C:\apache
.              <DIR>        21/05/98   7:27 .
..             <DIR>        21/05/98   7:27 ..
DEISL1   ISU       12,818   29/07/98  15:12 DeIsL1.isu
HTDOCS         <DIR>        29/07/98  15:12 htdocs
MODULES        <DIR>        29/07/98  15:12 modules
ICONS          <DIR>        29/07/98  15:12 icons
LOGS           <DIR>        29/07/98  15:12 logs
CONF           <DIR>        29/07/98  15:12 conf
CGI-BIN        <DIR>        29/07/98  15:12 cgi-bin
ABOUT_~1           12,921   15/07/98  13:31 ABOUT_APACHE
ANNOUN~1            3,090   18/07/98  23:50 Announcement
KEYS               22,763   15/07/98  13:31 KEYS
LICENSE             2,907   31/03/98  13:52 LICENSE
APACHE   EXE        3,072   19/07/98  11:47 Apache.exe
APACHE~1 DLL      247,808   19/07/98  12:11 ApacheCore.dll
MAKEFI~1 TMP       21,025   15/07/98  18:03 Makefile.tmpl
README              2,109   01/04/98  13:59 README
README~1 TXT        2,985   30/05/98  13:57 README-NT.TXT
INSTALL  DLL       54,784   19/07/98  11:44 install.dll
_DEISREG ISR          147   29/07/98  15:12 _DEISREG.ISR
_ISREG32 DLL       40,960   23/04/97   1:16 _ISREG32.DLL
13 file(s) 427,389 bytes
8 dir(s) 520,835,072 bytes free
- Chapter 2: Configuring Apache: The First Steps
- Preview: After the installation described in Chapter 1, you now have a shiny bright apache/httpd, and you're ready for anything. For our next step, we will be creating a number of demonstration web sites.
- What's Behind an Apache Web Site?
- Preview: It might be a good idea to get a firm idea of what, in the Apache business, a web site is: it is a directory somewhere on the server, say, /usr/www/APACHE3/site.for_instance. It usually contains at least four subdirectories. The first three are essential:
- conf
- Contains the Config file, usually httpd.conf, which tells Apache how to respond to different kinds of requests.
- htdocs
- Contains the documents, images, data, and so forth that you want to serve up to your clients.
- logs
- Contains the log files that record what happened. You should consult .../logs/error_log whenever anything fails to work as expected.
- cgi-bin
- Contains any CGI scripts that are needed. If you don't use scripts, you don't need the directory.
In our standard installation, there will also be a file called go in the site directory, which contains a script for starting Apache.
Nothing happens until you start Apache. In this example, you do it from the command line. If your computer experience so far has been entirely with Windows or other Graphical User Interfaces (GUIs), you may find the command line rather stark and intimidating to begin with. However, it offers a great deal of flexibility and something which is often impossible through a GUI: the ability to write scripts (Unix) or batch files (Win32) to automate the executables you want to run and the inputs they need, as we shall see later.
If the conf subdirectory is not in the default location (and it usually isn't), you need a flag that tells Apache where it is.
httpd -d /usr/www/APACHE3/site.for_instance -f...
apache -d c:/usr/www/APACHE3/site.for_instanceNotice that the executable names are different under Win32 and Unix. The Apache Group decided to make this change, despite the difficulties it causes for documentation, because "httpd" is not a particularly sensible name for a specific web server and, indeed, is used by other web servers. However, it was felt that the name change would cause too many backward-compatibility issues on Unix, and so the new name is implemented only on Win32.Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - site.toddle
- Preview: You can't do much with Apache without a web site to play with. To embody our first shaky steps, we created site.toddle as a subdirectory, /usr/www/APACHE3/site.toddle, which you will find on the code download. Since you may want to keep your demonstration sites somewhere else, we normally refer to this path as ... /. So we will talk about ... /site.toddle. (Windows users, please read this as ...\site.toddle.)
In ... /site.toddle, we created the three subdirectories that Apache expects: conf, logs, and htdocs. The README file in Apache's root directory states:
    The next step is to edit the configuration files for the server. In the subdirectory called conf you should find distribution versions of the three configuration files: srm.conf-dist, access.conf-dist, and httpd.conf-dist.
As a legacy from the NCSA server, Apache will accept these three Config files. But we strongly advise you to put everything you need in httpd.conf and to delete the other two. It is much easier to manage the Config file if there is only one of them. From Apache v1.3.4-dev on, this has become Group doctrine. In earlier versions of Apache, it was necessary to disable these files explicitly once they were deleted, but in v1.3 it is enough that they do not exist.
The README file continues with advice about editing these files, which we will disregard. In fact, we don't have to set about this job yet; we will learn more later. A simple expedient for now is to run Apache with no configuration and to let it prompt us for what it needs.
End of preview. The rest of this section is not shown here.
- Setting Up a Unix Server
- Preview: We can point httpd at our site with the -d flag (notice the full pathname to the site.toddle directory, which will probably be different on your machine):
    % httpd -d /usr/www/APACHE3/site.toddle
Since you will be typing this a lot, it's sensible to copy it into a script called go. This can go in /usr/local/bin or in each local site. We have done the latter since it is convenient to change it slightly from time to time. Create it by typing:
    % cat > /usr/local/bin/go
    test -d logs || mkdir logs
    httpd -f `pwd`/conf/httpd$1.conf -d `pwd`
    ^d
^d is shorthand for Ctrl-D, which ends the input and gets your prompt back. This go will work on every site. It creates a logs directory if one does not exist, and it explicitly specifies paths for the ServerRoot directory (-d) and the Config file (-f). The command `pwd` finds the current directory with the Unix command pwd. The back-ticks are essential: they substitute pwd's value into the script. In other words, we will run Apache with whatever configuration is in our current directory. To accommodate sites where we have more than one Config file, we have used ...httpd$1... where you might expect to see ...httpd... The symbol $1 copies the first argument (if any) given to the command go. Thus ./go 2 will run the Config file called httpd2.conf, and ./go by itself will run httpd.conf.
Remember that you have to be in the site directory. If you try to run this script from somewhere else, pwd's return will be nonsense, and Apache will complain that it 'could not open document config file ...'.
Make go runnable, and run it by typing the following (note that you have to be in the directory .../site.toddle when you run go):
    % chmod +x go
    % go
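To see how the current directory and the optional argument combine, here is a dry-run sketch of the go script. It is an illustration only: go_command is a hypothetical helper that prints the httpd command line instead of starting the server, so it can be tried safely in any directory.

```shell
#!/bin/sh
# Sketch: print the command line that go would run, without starting Apache.
# go_command is a hypothetical helper used only for this illustration.
go_command() {
    site=$(pwd)    # the same value that the back-ticked pwd substitutes
    # $1 is empty when no argument is given, so httpd$1.conf becomes httpd.conf
    echo "httpd -f $site/conf/httpd$1.conf -d $site"
}

go_command      # command line for conf/httpd.conf
go_command 2    # command line for conf/httpd2.conf
```

Running it from two different site directories shows why go only makes sense when started from the site directory itself.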
If you get the error message:
    go: command not found
you need to type:
    % ./go
This launches Apache in the background. Check that it's running by typing something like this (arguments to ps vary from Unix to Unix):
    % ps -aux
End of preview. The rest of this section is not shown here.
- Setting Up a Win32 Server
- Preview: There is no point trying to run Apache unless TCP/IP is set up and running on your machine. A quick test is to ping some IP address, and if you can't think of a real one, ping yourself:
    >ping 127.0.0.1
If TCP/IP is working, you should see some confirming message, like this:
    Pinging 127.0.0.1 with 32 bytes of data:
    Reply from 127.0.0.1: bytes=32 time<10ms TTL=32
    ....
If you don't see something along these lines, defer further operations until TCP/IP is working.
It is important to remember that internally, Windows Apache is essentially the same as the Unix version and that it uses Unix-style forward slashes (/) rather than MS-DOS- and Windows-style backslashes (\) in its file and directory names, as specified in various files.
There are two ways of running Apache under Win32. In addition to the command-line approach, you can run Apache as a "service" (available on Windows NT/2000, or a pseudoservice on Windows 95, 98, or Me). This is the best option if you want Apache to start automatically when your machine boots and to keep Apache running when you log off.
To run Apache from a console window, select the Apache server option from the Start menu.
Alternatively (and under Win95/98, this is all you can do), click on the MS-DOS prompt to get a DOS session window. Go to the /Program Files/Apache directory with this:
    >cd "\Program Files\apache"
The Apache executable, apache.exe, is sitting here. We can start it running, to see what happens, with this:
    >apache -s
You might want to automate your Apache startup by putting the necessary line into a file called go.bat. You then only need to type:
    go[RETURN]
Since this is the same as for the Unix version, we will simply say "type go" throughout the book when Apache is to be started, and thus save lengthy explanations.
When we ran Apache, we received the following lines:
    Apache/<version number>
    Syntax error on line 44 of /apache/conf/httpd.conf
    ServerRoot must be a valid directory
To deal with the first complaint, we looked at the file
End of preview. The rest of this section is not shown here.
- Directives
- Preview: Here we go over the directives again, giving formal definitions for reference.
ServerName gives the hostname of the server to use when creating redirection URLs, that is, if you use a <Location> directive or access a directory without a trailing /.
    ServerName hostname
    Server config, virtual host
It will also be useful when we consider Virtual Hosting (see Chapter 4).
DocumentRoot sets the directory from which Apache will serve files.
    DocumentRoot directory
    Default: /usr/local/apache/htdocs
    Server config, virtual host
Unless matched by a directive like Alias, the server appends the path from the requested URL to the document root to make the path to the document. For example:
    DocumentRoot /usr/web
An access to http://www.my.host.com/index.html now refers to /usr/web/index.html.
There appears to be a bug in the relevant Module, mod_dir, that causes problems when the directory specified in DocumentRoot has a trailing slash (e.g., DocumentRoot /usr/web/), so please avoid that. It is worth bearing in mind that the deeper DocumentRoot goes, the longer it takes Apache to check out the directories. For the sake of performance, adopt the British Army's universal motto: KISS (Keep It Simple, Stupid)!
ServerRoot specifies where the subdirectories conf and logs can be found.
    ServerRoot directory
    Default directory: /usr/local/etc/httpd
    Server config
If you start Apache with the -f (file) option, you need to include the ServerRoot directive. On the other hand, if you use the -d (directory) option, as we do, this directive is not needed.
The ErrorLog directive sets the name of the file to which the server will log any errors it encounters.
    ErrorLog filename|syslog[:facility]
    Default: ErrorLog logs/error_log
    Server config, virtual host
If the filename does not begin with a slash (/), it is assumed to be relative to the server root.
If the filename begins with a pipe (|), it is assumed to be a command to spawn a file to handle the error log.
Apache 1.3 and above: using
End of preview. The rest of this section is not shown here.
- Shared Objects
- Preview: If you are using the DSO mechanism, you need quite a lot of stuff in your Config file.
In Apache v1.3 the order of these directives is important, so it is probably easiest to generate the list by doing an "out of the box" build using the flag --enable-shared=max. You will find /usr/etc/httpd/httpd.conf.default: copy the list from it into your own Config file, and edit it as you need.
    LoadModule env_module libexec/mod_env.so
    LoadModule config_log_module libexec/mod_log_config.so
    LoadModule mime_module libexec/mod_mime.so
    LoadModule negotiation_module libexec/mod_negotiation.so
    LoadModule status_module libexec/mod_status.so
    LoadModule includes_module libexec/mod_include.so
    LoadModule autoindex_module libexec/mod_autoindex.so
    LoadModule dir_module libexec/mod_dir.so
    LoadModule cgi_module libexec/mod_cgi.so
    LoadModule asis_module libexec/mod_asis.so
    LoadModule imap_module libexec/mod_imap.so
    LoadModule action_module libexec/mod_actions.so
    LoadModule userdir_module libexec/mod_userdir.so
    LoadModule alias_module libexec/mod_alias.so
    LoadModule access_module libexec/mod_access.so
    LoadModule auth_module libexec/mod_auth.so
    LoadModule setenvif_module libexec/mod_setenvif.so
    # Reconstruction of the complete module list from all available modules
    # (static and shared ones) to achieve correct module execution order.
    # [WHENEVER YOU CHANGE THE LOADMODULE SECTION ABOVE UPDATE THIS, TOO]
    ClearModuleList
    AddModule mod_env.c
    AddModule mod_log_config.c
    AddModule mod_mime.c
    AddModule mod_negotiation.c
    AddModule mod_status.c
    AddModule mod_include.c
    AddModule mod_autoindex.c
    AddModule mod_dir.c
    AddModule mod_cgi.c
    AddModule mod_asis.c
    AddModule mod_imap.c
    AddModule mod_actions.c
    AddModule mod_userdir.c
    AddModule mod_alias.c
    AddModule mod_access.c
    AddModule mod_auth.c
    AddModule mod_so.c
    AddModule mod_setenvif.c
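Because the LoadModule and AddModule lists above have to be kept in step, it can help to derive one from the other. The following is a rough sketch, not a tool that ships with Apache: addmodule_lines is a hypothetical helper, and it assumes (as holds for the list above) that each shared object mod_xxx.so corresponds to a source name mod_xxx.c. Statically compiled modules, such as mod_so.c in the list above, still have to be added by hand in the right place.

```shell
#!/bin/sh
# Sketch: derive AddModule lines from LoadModule lines.
# addmodule_lines is a hypothetical helper; it reads the named files,
# or standard input when no file is given.
addmodule_lines() {
    # LoadModule env_module libexec/mod_env.so  ->  AddModule mod_env.c
    # Lines that are not LoadModule directives are skipped.
    sed -n 's|^LoadModule  *[^ ]*  *.*/\(mod_[^/]*\)\.so$|AddModule \1.c|p' "$@"
}

printf '%s\n' \
    'LoadModule env_module libexec/mod_env.so' \
    'LoadModule config_log_module libexec/mod_log_config.so' |
    addmodule_lines
```

The output keeps the order of the input, which matters in Apache v1.3 where the order of these directives is significant.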
Notice that the list comes in three parts: LoadModules, then ClearModuleList, followed by AddModules to activate the ones you want. As we said earlier, it is all rather cumbersome and easy to get wrong. You might want to put the list in a separate file and then
End of preview. The rest of this section is not shown here.
- Chapter 3: Toward a Real Web Site
- Preview: Now that we have the server running with a basic configuration, we can start to explore more sophisticated possibilities in greater detail. Fortunately, the differences between the Windows and Unix versions of Apache fade as we get past the initial setup and configuration, so it's easier to focus on the details of making a web site work.
- More and Better Web Sites: site.simple
- Preview: We are now in a position to start creating real(ish) web sites, which can be found in the sample code at the web site for the book, http://oreilly.com/catalog/apache3/. For the sake of a little extra realism, we will base the site loosely round a simple web business, Butterthlies, Inc., that creates and sells picture postcards. We need to give it some web addresses, but since we don't yet want to venture into the outside world, they should be variants on your own network ID. This way, all the machines in the network realize that they don't have to go out on the Web to make contact. For instance, we edited the \windows\hosts file on the Windows 95 machine running the browser and the /etc/hosts file on the Unix machine running the server to read as follows:
    127.0.0.1 localhost
    192.168.123.2 www.butterthlies.com
    192.168.123.2 sales.butterthlies.com
    192.168.123.3 sales-IP.butterthlies.com
    192.168.124.1 www.faraway.com
localhost is obligatory, so we left it in, but you should not make any server requests to it since the results are likely to be confusing.
You probably need to consult your network manager to make similar arrangements.
site.simple is site.toddle with a few small changes. The script go will work anywhere. To get started, do the following, depending on your operating environment:
    test -d logs || mkdir logs
    httpd -d `pwd` -f `pwd`/conf/httpd.conf
Open an MS-DOS window and from the command line, type:
    c>cd \program files\apache group\apache
    c>apache -k start
    c>Apache/1.3.26 (Win32) running ...
To stop Apache, open a second MS-DOS window:
    c>apache -k stop
    c>cd logs
    c>edit error.log
This will be true of each site in the demonstration setup, so we will not mention it again.
From here on, there will be minimal differences between the server setups necessary for Win32 and those for Unix. Unless one or the other is specifically mentioned, you should assume that the text refers to both.
It would be nice to have a log of what goes on. In the first edition of this book, we found that a file
End of preview. The rest of this section is not shown here.
- Butterthlies, Inc., Gets Going
- Preview: The httpd.conf file (to be found in ... /site.first) contains the following:
    User webuser
    Group webgroup
    ServerName my586
    DocumentRoot /usr/www/APACHE3/APACHE3/site.first/htdocs
    TransferLog logs/access_log
    #Listen is needed for Apache2
    Listen 80
In the first edition of this book, we mentioned the directives AccessConfig and ResourceConfig here. If set with /dev/null (NUL under Win32), they disable the srm.conf and access.conf files, and they were formerly required if those files were absent. New versions of Apache ignore these files if they are not present, so the directives are no longer required. If the directives are present, however, the files they mention will be included in the Config file. In Apache Version 1.3.14 and later, they can be given a directory rather than a filename, and all files in that directory and its subdirectories will be parsed as configuration files.
In Apache v2 the directives AccessConfig and ResourceConfig are abolished and will cause an error. However, you can write:
    Include conf/srm.conf
    Include conf/access.conf
in that order, and at the end of the Config file.
Apache v2 also, rather oddly, insists on a Listen directive. If you don't include it in your Config file, you will get the error message:
    ...no listening sockets available, shutting down.
If you are using Win32, note that the User and Group directives are not supported, so these can be removed.
Apache's role in life is delivering documents, and so far we have not done much of that. We therefore begin in a modest way with a little HTML document that lists our cards, gives their prices, and tells interested parties how to get them.
We can look at the Netscape Help item "Creating Net Sites" and download "A Beginners Guide to HTML" as well as the next web person can, then rough out a little brochure in no time flat:
    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN">
    <html>
    <head>
    <title> Butterthlies Catalog</title>
    </head>
    <body>
    <h1> Welcome to Butterthlies Inc</h1>
    <h2>Summer Catalog</h2>
    <p> All our cards are available in packs of 20 at $2 a pack.
    There is a 10% discount if you order more than 100.
    </p>
    <hr>
    <p>
    Style 2315
    <p align=center>
    <img src="bench.jpg" alt="Picture of a bench">
    <p align=center>
    Be BOLD on the bench
    <hr>
    <p>
    Style 2316
    <p align=center>
    <img src="hen.jpg" alt="Picture of a hencoop like a pagoda">
    <p align=center>
    Get SCRAMBLED in the henhouse
    <hr>
    <p>
    Style 2317
    <p align=center>
    <img src="tree.jpg" alt="Very nice picture of tree">
    <p align=center>
    Get HIGH in the treehouse
    <hr>
    <p>
    Style 2318
    <p align=center>
    <img src="bath.jpg" alt="Rather puzzling picture of a bathtub">
    <p align=center>
    Get DIRTY in the bath
    <hr>
    <p align=right>
    Postcards designed by Harriet@alart.demon.co.uk
    <hr>
    <br>
    Butterthlies Inc, Hopeful City, Nevada 99999
    </body>
    </html>
End of preview. The rest of this section is not shown here.
- Block Directives
- Preview: Apache has a number of block directives that limit the application of other directives within them to operations on particular virtual hosts, directories, or files. These are extremely important to the operation of a real web site because within these blocks, particularly <VirtualHost>, the webmaster can, in effect, set up a large number of individual servers run by a single invocation of Apache. This will make more sense when you get to Section 4.1.
The syntax of the block directives is detailed next.
<VirtualHost>
    <VirtualHost host[:port]>
    ...
    </VirtualHost>
    Server config
The <VirtualHost> directive within a Config file acts like a tag in HTML: it introduces a block of text containing directives referring to one host; when we're finished with it, we stop with </VirtualHost>. For example:
    ....
    <VirtualHost www.butterthlies.com>
    ServerAdmin sales@butterthlies.com
    DocumentRoot /usr/www/APACHE3/APACHE3/site.virtual/htdocs/customers
    ServerName www.butterthlies.com
    ErrorLog /usr/www/APACHE3/APACHE3/site.virtual/name-based/logs/error_log
    TransferLog /usr/www/APACHE3/APACHE3/site.virtual/name-based/logs/access_log
    </VirtualHost>
    ...
<VirtualHost> also specifies which IP address we're hosting and, optionally, the port. If port is not specified, the default port is used, which is either the standard HTTP port, 80, or the port specified in a Port directive (not in Apache v2). host can also be _default_, in which case it matches anything no other <VirtualHost> section matches.
In a real system, this address would be the hostname of our server. There are three more similar directives that also limit the application of other directives:
    <Directory>
    <Files>
    <Location>
This list shows the analogues in ascending order of authority, so that <Directory> is overruled by <Files>, and <Files> by <Location>. Files can be nested within <Directory> blocks. Execution proceeds in groups, in the following order:
End of preview. The rest of this section is not shown here.
- Other Directives
- Preview: Other housekeeping directives are listed here.
ServerName
    ServerName fully-qualified-domain-name
    Server config, virtual host
The ServerName directive sets the hostname of the server; this is used when creating redirection URLs. If it is not specified, then the server attempts to deduce it from its own IP address; however, this may not work reliably or may not return the preferred hostname. For example:
    ServerName www.example.com
could be used if the canonical (main) name of the actual machine were simple.example.com, but you would like visitors to see www.example.com.
UseCanonicalName
    UseCanonicalName on|off
    Default: on
    Server config, virtual host, directory, .htaccess
This directive controls how Apache forms URLs that refer to itself, for example, when redirecting a request for http://www.domain.com/some/directory to the correct http://www.domain.com/some/directory/ (note the trailing /). If UseCanonicalName is on (the default), then the hostname and port used in the redirect will be those set by ServerName and Port (not Apache v2). If it is off, then the name and port used will be the ones in the original request.
One instance where this directive may be useful is when users are in the same domain as the web server (for example, on an intranet). In this case, they may use the "short" name for the server (www, for example), instead of the fully qualified domain name (www.domain.com, say). If a user types a URL such as http://www/somedir (without the trailing slash), then, with UseCanonicalName switched on, the user will be directed to http://www.domain.com/somedir/. With UseCanonicalName switched off, she will be redirected to http://www/somedir/. An obvious case in which this is useful is when user authentication is switched on: reusing the server name that the user typed means she won't be asked to reauthenticate when the server name appears to the browser to have changed. More obscure cases relate to name/address translation caused by some firewalling techniques.
ServerAdmin
    ServerAdmin email_address
    Server config, virtual host
End of preview. The rest of this section is not shown here.
- HTTP Response Headers
- Preview: The webmaster can set and remove HTTP response headers for special purposes, such as setting metainformation for an indexer or PICS labels. Note that Apache doesn't check whether what you are doing is at all sensible, so make sure you know what you are up to, or very strange things may happen.
HeaderName
    HeaderName filename
    Server config, virtual host, directory, .htaccess
The HeaderName directive sets the name of the file that will be inserted at the top of the index listing. filename is the name of the file to include.
Apache 1.3.6 and Earlier
The module first attempts to include filename.html as an HTML document; otherwise, it will try to include filename as plain text. filename is treated as a filesystem path relative to the directory being indexed. In no case is SSI (server-side includes; see Chapter 14) processing done. For example:
    HeaderName HEADER
When indexing the directory /web, the server will first look for the HTML file /web/HEADER.html and include it if found; otherwise, it will include the plain text file /web/HEADER, if it exists.
Apache Versions After 1.3.6
filename is treated as a URI path relative to the one used to access the directory being indexed, and it must resolve to a document with a major content type of "text" (e.g., text/html, text/plain, etc.). This means that filename may refer to a CGI script if the script's actual file type (as opposed to its output) is marked as text/html, such as with a directive like:
    AddType text/html .cgi
Content negotiation will be performed if the MultiViews option is enabled. If filename resolves to a static text/html document (not a CGI script) and the Includes option is enabled, the file will be processed for server-side includes (see the mod_include documentation). This directive needs mod_autoindex.
Header
    Header [set|add|unset|append] HTTP-header "value"
    Header remove HTTP-header
    Anywhere
The Header directive takes two or three arguments: the first may be set, add
End of preview. The rest of this section is not shown here.
- Restarts
- Preview: A webmaster will sometimes want to kill Apache and restart it with a new Config file, often to add or remove a virtual host as people's web sites come and go. This can be done the brutal way, by running ps -aux to get Apache's PID, doing kill <PID> to stop httpd and restarting it. This method causes any transactions in progress to fail in an annoying and disconcerting way for logged-on clients. A recent innovation in Apache allowed restarts of the main server without suddenly chopping off any child processes that were running.
There are three ways to restart Apache under Unix (see Chapter 2):
- Kill and reload Apache, which then rereads all its Config files and restarts:
    % kill PID
    % httpd [ flags ]
- The same effect is achieved with less typing by using the flag -HUP to kill Apache:
    % kill -HUP PID
- A graceful restart is achieved with the flag -USR1. This rereads the Config files but lets the child processes run to completion, finishing any client transactions in progress, before they are replaced with updated children. In most cases, this is the best way to proceed, because it won't interrupt people who are browsing at the time (unless you messed up the Config files):
    % kill -USR1 PID
A script to do the job automatically (assuming you are in the server root directory when you run it) is as follows:
    #!/bin/sh
    kill -USR1 `cat logs/httpd.pid`
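A slightly more defensive variant of that script might look like the sketch below. It is an illustration under stated assumptions: graceful_restart is a hypothetical helper, and the PID file is taken to be logs/httpd.pid relative to the current directory, as in the script above, unless another path is passed in.

```shell
#!/bin/sh
# Sketch: graceful restart that checks the PID file before signalling.
# graceful_restart is a hypothetical helper, not an Apache command.
graceful_restart() {
    pidfile=${1:-logs/httpd.pid}
    if [ ! -r "$pidfile" ]; then
        echo "cannot read $pidfile: is Apache running?" >&2
        return 1
    fi
    pid=$(cat "$pidfile")
    # kill -0 delivers no signal; it only tests that the process exists
    if ! kill -0 "$pid" 2>/dev/null; then
        echo "no process with PID $pid" >&2
        return 1
    fi
    # USR1 asks for a graceful restart, as described in the list above
    kill -USR1 "$pid"
}
```

Called with no argument from the server root, graceful_restart behaves like the one-line script but reports a stale or missing PID file instead of passing nonsense to kill.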
Under Win32 it is enough to open a second MS-DOS window and type:
    apache -k shutdown|restart
See Chapter 2.
End of preview. The rest of this section is not shown here.
- .htaccess
- Preview: An alternative to restarting to change Config files is to use the .htaccess mechanism, which is explained in Chapter 5. In effect, the changeable parts of the Config file are stored in a secondary file kept in .../htdocs. Unlike the Config file, which is read by Apache at startup, this file is read at each access. The advantage is flexibility, because the webmaster can edit it whenever he likes without interrupting the server. The disadvantage is a fairly serious degradation in performance, because the file has to be laboriously parsed to serve each request. The webmaster can limit what people do in their .htaccess files with the AllowOverride directive.
He may also want to prevent clients seeing the .htaccess files themselves. This can be achieved by including these lines in the Config file:
    <Files .htaccess>
    order allow,deny
    deny from all
    </Files>
End of preview. The rest of this section is not shown here.
- CERN Metafiles
- Preview: A metafile is a file with extra header data to go with the file served; for example, you could add a Refresh header. There seems no obvious place for this material, so we will put it here, with apologies to those readers who find it rather odd.
MetaFiles
    MetaFiles [on|off]
    Default: off
    Directory
Turns metafile processing on or off on a directory basis.
MetaDir
    MetaDir directory_name
    Default directory_name: .web
    Directory
Names the directory in which Apache is to look for metafiles. This is usually a "hidden" subdirectory of the directory where the file is held. Set to the value . to look in the same directory.
MetaSuffix
    MetaSuffix file_suffix
    Default file_suffix: .meta
    Directory
Names the suffix of the file containing metainformation.
The default values for these directives will cause a request for DOCUMENT_ROOT/mydir/fred.html to look for metainformation (supplementing the MIME header) in DOCUMENT_ROOT/mydir/.web/fred.html.meta.
End of preview. The rest of this section is not shown here.
- Expirations
- Preview: Apache Version 1.2 brought the expires module, mod_expires, into the main distribution. The point of this module is to allow the webmaster to set the returned headers to pass information to clients' browsers about documents that will need to be reloaded because they are apt to change or, alternatively, that are not going to change for a long time and can therefore be cached. There are three directives:
ExpiresActive
    ExpiresActive [on|off]
    Anywhere, .htaccess when AllowOverride Indexes
ExpiresActive simply switches the expiration mechanism on and off.
ExpiresByType
    ExpiresByType mime-type time
    Anywhere, .htaccess when AllowOverride Indexes
ExpiresByType takes two arguments. mime-type specifies a MIME type of file; time specifies how long these files are to remain active. There are two versions of the syntax. The first is this:
    code seconds
There is no space between code and seconds. code is one of the following:
- A
- Access time (or now, in other words)
- M
- Last modification time of the file
seconds is simply a number. For example:
    A565656
specifies 565,656 seconds after the access time.
The more readable second format is:
    base [plus] number type [number type ...]
where base is one of the following:
- access
- Access time
- now
- Synonym for access
- modification
- Last modification time of the file
The plus keyword is optional, and type is one of the following:
- years
- months
- weeks
- days
- hours
End of preview. The rest of this section is not shown here.
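Raw second counts in the compact code-seconds form are hard to judge by eye, so a small conversion can help when checking a value such as the A565656 example above. This is just an arithmetic sketch: describe_seconds is a hypothetical helper and has nothing to do with mod_expires itself.

```shell
#!/bin/sh
# Sketch: break a number of seconds into days, hours, minutes and seconds.
# describe_seconds is a hypothetical helper for checking values by hand.
describe_seconds() {
    s=$1
    d=$((s / 86400)); s=$((s % 86400))   # 86400 seconds in a day
    h=$((s / 3600));  s=$((s % 3600))    # 3600 seconds in an hour
    m=$((s / 60));    s=$((s % 60))      # 60 seconds in a minute
    echo "${d}d ${h}h ${m}m ${s}s"
}

describe_seconds 565656    # prints: 6d 13h 7m 36s
```

So A565656 expires a document a little over six and a half days after it is accessed.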
- Chapter 4: Virtual Hosts
- Two Sites and Apache
- Preview: Our business has now expanded, and we have a team of salespeople. They need their own web site (with different prices, gossip about competitors, conspiracies, plots, plans, and so on) that is separate from the customers' web site we have been talking about. There are essentially two ways of doing this:
- Run a single copy of Apache that maintains two or more web sites as virtual sites. This is the most common method.
- Run two (or more) copies of Apache, each maintaining a single site. You may want to do this to optimize two versions of Apache in different ways: for instance, one serving images and the other running scripts.
End of preview. The rest of this section is not shown here.
- Virtual Hosts
- Preview: On site.twocopy (see Section 4.3, later in this chapter) we run two different versions of Apache, each serving a different host. As we have said, you might want to do this to optimize the two versions in different ways. However, it is more common to run a number of virtual Apache servers that steer incoming requests on different URLs (usually with the same IP address) to different sets of documents. These might well be home pages for members of your organization or your clients.
In the first edition of this book, we showed how to do this for Apache 1.2 and HTTP 1.0. The result was rather clumsy, with a main host and a virtual host, but it coped with HTTP 1.0 clients. However, the setup can now be done much more neatly with the NameVirtualHost directive. The possible combinations of IP-based and name-based hosts can become quite complex. A full explanation with examples and the underlying theology can be found at http://www.apache.org/docs/vhosts, but several of the possible permutations are unlikely to be very useful in practice.
Name-based virtual hosting is by far the preferred method of managing virtual hosts, taking advantage of the ability of HTTP 1.1-compliant browsers (or at least browsers that support the Host header, which is pretty much all of them at this point) to send the name of the site they want to access. At .../site.virtual/Name-based we have www.butterthlies.com and sales.butterthlies.com on 192.168.123.2. Of course, these sites must have their names registered in DNS (or, if you are dummying the setup as we did, included in /etc/hosts). The Config file is as follows:
    User webuser
    Group webgroup
    NameVirtualHost 192.168.123.2
    <VirtualHost www.butterthlies.com>
    ServerName www.butterthlies.com
    ServerAdmin sales@butterthlies.com
    DocumentRoot /usr/www/APACHE3/APACHE3/site.virtual/htdocs/customers
    ErrorLog /usr/www/APACHE3/APACHE3/site.virtual/Name-based/logs/error_log
    TransferLog /usr/www/APACHE3/APACHE3/site.virtual/Name-based/logs/access_log
    </VirtualHost>
    <VirtualHost sales.butterthlies.com>
    ServerName sales.butterthlies.com
    ServerAdmin sales@butterthlies.com
    DocumentRoot /usr/www/APACHE3/APACHE3/site.virtual/htdocs/salesmen
    ErrorLog /usr/www/APACHE3/APACHE3/site.virtual/Name-based/logs/error_log
    TransferLog /usr/www/APACHE3/APACHE3/site.virtual/Name-based/logs/access_log
    </VirtualHost>
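As a rough illustration of what name-based virtual hosting amounts to, the following Python sketch picks a document root from the Host header of a request. It is not Apache's code: the function names and the DEFAULT_HOST constant are hypothetical, the two document roots are copied from the Config file above, and falling back to the first listed host when no Host header matches is an assumption about the default.

```python
# Hypothetical sketch of name-based virtual host selection (not Apache's code).
# Document roots are taken from the Config file in the text.
VHOSTS = {
    "www.butterthlies.com": "/usr/www/APACHE3/APACHE3/site.virtual/htdocs/customers",
    "sales.butterthlies.com": "/usr/www/APACHE3/APACHE3/site.virtual/htdocs/salesmen",
}
# Assumption: requests without a usable Host header go to the first host listed.
DEFAULT_HOST = "www.butterthlies.com"


def parse_host_header(request_text):
    """Return the lowercased Host header value without any port, or None."""
    for line in request_text.split("\r\n")[1:]:  # skip the request line
        if not line:  # a blank line ends the headers
            break
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":
            return value.strip().lower().split(":")[0]
    return None


def document_root(request_text):
    """Map a raw HTTP request to the document root of the matching host."""
    host = parse_host_header(request_text)
    return VHOSTS.get(host, VHOSTS[DEFAULT_HOST])


if __name__ == "__main__":
    request = "GET / HTTP/1.1\r\nHost: sales.butterthlies.com\r\n\r\n"
    print(document_root(request))
```

Both sites share one IP address here; only the Host header tells them apart, which is why an HTTP 1.0 client that sends no Host header always ends up at the default site in this sketch.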
End of preview. The rest of this section cannot be viewed here.
- Two Copies of Apache
- Preview: To illustrate the possibilities, we will run two copies of Apache with different IP addresses on different consoles, as if they were on two completely separate machines. This is not something you want to do often, but on a heavily loaded site it may be useful to run two Apaches optimized in different ways. The different virtual hosts probably need very different configurations, such as different values for ServerType, User, TypesConfig, or ServerRoot (none of these directives can apply to a virtual host, since they are global to all servers, which is why you have to run two copies to get the desired effect). If you are expecting a lot of hits, you should avoid running more than one copy, as doing so will generally load the machine more.
You can find the necessary machinery in .../site.twocopy. There are two subdirectories: customers and sales.
The Config file in .../customers contains the following:
    User webuser
    Group webgroup
    ServerName www.butterthlies.com
    DocumentRoot /usr/www/APACHE3/APACHE3/site.twocopy/customers/htdocs
    BindAddress www.butterthlies.com
    TransferLog logs/access_log
In .../sales the Config file is as follows:
    User webuser
    Group webgroup
    ServerName sales.butterthlies.com
    DocumentRoot /usr/www/APACHE3/APACHE3/site.twocopy/sales/htdocs
    Listen sales-not-vh.butterthlies.com:80
    TransferLog logs/access_log
On this occasion, we will exercise the sales-not-vh.butterthlies.com URL. For the first time, we have more than one copy of Apache running, and we have to associate requests on specific URLs with different copies of the server. There are three more directives for making these associations:
BindAddress
    BindAddress addr
    Default addr: any
    Server config
This directive forces Apache to bind to a particular IP address, rather than listening to all IP addresses on the machine. It has been abolished in Apache v2: use Listen instead.
Port
    Port port
    Default port: 80
    Server config
When used in the main server configuration (i.e., outside any <VirtualHost> sections) and in the absence of a
End of preview. The rest of this section cannot be viewed here.
- Dynamically Configured Virtual Hosting
- Preview: An even neater method of managing Virtual Hosting is provided by mod_vhost_alias, which lets you define a single boilerplate configuration and then fills in the details at service time from the IP address and/or the Host header in the HTTP request.
All the directives in this module interpolate a string into a pathname. The interpolated string (called the "name") may be either the server name (see the UseCanonicalName directive for details on how this is determined) or the IP address of the virtual host on the server in dotted-quad format (xxx.xxx.xxx.xxx).
The interpolation is controlled by a mantra, %<code-letter>, which is replaced by some value you supply in the Config file. It's not unlike the controls for logging (see Chapter 10).
These are the possible formats:
- %%: Insert a literal %.
- %p: Insert the port number of the virtual host.
- %N.M: Insert (part of) the name.
N and M are numbers, used to specify substrings of the name. N selects from the dot-separated components of the name, and M selects characters within whatever N has selected. M is optional and defaults to zero if it isn't present. The dot must be present if and only if M is present. If we are trying to parse sales.butterthlies.com, the interpretation of N is as follows:
- 0: The whole name: sales.butterthlies.com
- 1: The first part: sales
- 2: The second part: butterthlies
- -1: The last part: com
- -2: The penultimate part: butterthlies
- 2+: The second and all subsequent parts: butterthlies.com
- -2+
End of preview. The rest of this section cannot be viewed here.
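The selection rules for N can be tried out with a short Python sketch. This is an illustration only, not the module's code: the function name select_parts is hypothetical, the M (character) selector is left out, and two behaviours are assumptions drawn from the Apache mod_vhost_alias documentation rather than from the truncated list above, namely that -2+ means "the penultimate and all preceding parts" and that a part that does not exist is replaced by a single underscore.

```python
def select_parts(name, spec):
    """Select dot-separated parts of `name` for an N spec such as
    '0', '1', '-1', '2+' or '-2+' (hypothetical helper, M is not handled)."""
    parts = name.split(".")
    plus = spec.endswith("+")       # a trailing + widens the selection
    n = int(spec.rstrip("+"))
    if n == 0:
        return name                 # 0 is the whole name
    if abs(n) > len(parts):
        return "_"                  # assumption: missing parts become "_"
    if n > 0:
        # Count from the left; with +, take this part and everything after it.
        chosen = parts[n - 1:] if plus else [parts[n - 1]]
    else:
        # Count from the right; with +, take this part and everything before it.
        index = len(parts) + n
        chosen = parts[:index + 1] if plus else [parts[index]]
    return ".".join(chosen)


if __name__ == "__main__":
    for spec in ("0", "1", "2", "-1", "-2", "2+", "-2+"):
        print(spec, select_parts("sales.butterthlies.com", spec))
```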
- Chapter 5: Authentication
- Preview: The volume of business Butterthlies, Inc. is doing is stupendous, and naturally our competitors are anxious to look at sensitive information such as the discounts we give our salespeople. We have to seal our site off from their vulgar gaze by authenticating those who log on to it.
End of preview. The rest of this section cannot be viewed here.
- Authentication Protocol
- Preview: Authentication is simple in principle. The client sends his name and password to Apache. Apache looks up its file of names and encrypted passwords to see whether the client is entitled to access. The webmaster can store a number of clients in a list, either as a simple text file or as a database, and thereby control access person by person.
It is also possible to group a number of people into named groups and to give or deny access to these groups as a whole. So, throughout this chapter, bill and ben are in the group directors, and daphne and sonia are in the group cleaners. The webmaster can require user so and so or require group such and such, or even simply require that visitors be registered users. If you have to deal with large numbers of people, it is obviously easier to group them in this way. To make the demonstration simpler, the password is always theft. Naturally, you would not use so short and obvious a password in real life, or one so open to a dictionary attack.
Each username/password pair is valid for a particular realm, which is named when the passwords are created. The browser asks for a URL; the server sends back "Authentication Required" (code 401) and the realm. If the browser already has a username/password for that realm, it sends the request again with the username/password. If not, it prompts the user, usually including the realm's name in the prompt, and sends that.
Of course, all this is worryingly insecure since the password is sent unencrypted over the Web (base64 encoding is easily reversed), and any malign observer simply has to watch the traffic to get the password, which is as good in his hands as in the legitimate client's. Digest authentication improves on this by using a challenge/handshake protocol to avoid revealing the actual password. In the two earlier editions of this book, we had to report that no browsers actually supported this technique; now things are a bit better. Using SSL (see Chapter 11) also improves this.
Examples are found in site.authent. The first Config file, .../conf/httpd1.conf
End of preview. The rest of this section cannot be viewed here.
- Authentication Directives
- Preview: From Apache v1.3 on, filenames are relative to the server root unless they are absolute. A filename is taken as absolute if it starts with / or, on Win32, if it starts with drive:/. It seems sensible for us to write them in absolute form to prevent misunderstandings. The directives are as follows:
AuthType
    AuthType type
    directory, .htaccess
AuthType specifies the type of authorization control. Basic was originally the only possible type, but Apache 1.1 introduced Digest, which uses an MD5 digest and a shared secret.
If the directive AuthType is used, we must also use AuthName, AuthGroupFile, and AuthUserFile.
AuthName
    AuthName auth-realm
    directory, .htaccess
AuthName gives the name of the realm in which the users' names and passwords are valid. If the name of the realm includes spaces, you will need to surround it with quotation marks:
    AuthName "sales people"
AuthGroupFile
    AuthGroupFile filename
    directory, .htaccess
AuthGroupFile has nothing to do with the Group webgroup directive at the top of the Config file. It gives the name of another file that contains group names and their members:
    cleaners: daphne sonia
    directors: bill ben
We put this into .../ok_users/groups and set AuthGroupFile to match. The AuthGroupFile directive has no effect unless the require directive is suitably set.
AuthUserFile
    AuthUserFile filename
AuthUserFile names a file of usernames and their encrypted passwords. There is quite a lot to this; see Section 5.3, Section 5.4, and Section 5.5 later in this chapter.
AuthAuthoritative
    AuthAuthoritative on|off
    Default: AuthAuthoritative on
    directory, .htaccess
Setting the AuthAuthoritative directive explicitly to off allows for both authentication and authorization to be passed on to lower-level modules (as defined in the Config and modules.c files) if there is no user ID or rule matching the supplied user ID. If there is a user ID and/or rule specified, the usual password and access checks will be applied, and a failure will give an Authorization Required reply.
End of preview. The rest of this section cannot be viewed here.
- Passwords Under Unix
- Preview: Authentication of salespeople is managed by the password file sales, stored in /usr/www/APACHE3/ok_users. This is safely above the document root, so that the Bad Guys cannot get at it to mess with it. The file sales is maintained using the Apache utility htpasswd. The source code for this utility is to be found in .../apache_1.3.1/src/support/htpasswd.c, and we have to compile it with this:
    % make htpasswd
htpasswd now links, and we can set it to work. Since we don't know how it functions, the obvious thing is to prod it with this:
    % htpasswd -?
It responds that the correct usage is as follows:
    Usage:
        htpasswd [-cmdps] passwordfile username
        htpasswd -b[cmdps] passwordfile username password
     -c  Create a new file.
     -m  Force MD5 encryption of the password.
     -d  Force CRYPT encryption of the password (default).
     -p  Do not encrypt the password (plaintext).
     -s  Force SHA encryption of the password.
     -b  Use the password from the command line rather than prompting for it.
    On Windows and TPF systems the '-m' flag is used by default.
    On all other systems, the '-p' flag will probably not work.
This seems perfectly reasonable behavior, so let's create a user bill with the password "theft" (in real life, you would never use so obvious a password for a character such as Bill of the notorious Butterthlies sales team, because it would be subject to a dictionary attack, but this is not real life):
    % htpasswd -m -c .../ok_users/sales bill
We are asked to type his password twice, and the job is done. If we look in the password file, there is something like the following:
    bill:$1$Pd$E5BY74CgGStbs.L/fsoEU0
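To give a feel for what a password-file entry contains, here is a small Python sketch of the entry written by the -s (SHA) option. It deliberately does not reproduce the MD5 entry shown above, whose salted algorithm is more involved. The layout it assumes, the username followed by {SHA} and the base64-encoded SHA-1 digest of the password, reflects our understanding of Apache's format rather than anything stated in the text, and the function names are hypothetical.

```python
import base64
import hashlib


def sha_entry(username, password):
    """Build a password-file line in the assumed {SHA} layout."""
    digest = hashlib.sha1(password.encode("utf-8")).digest()
    encoded = base64.b64encode(digest).decode("ascii")
    return f"{username}:{{SHA}}{encoded}"


def check_sha_entry(entry, password):
    """Return True if `password` hashes to the value stored in `entry`."""
    username, _, stored = entry.partition(":")
    return stored == sha_entry(username, password).partition(":")[2]


if __name__ == "__main__":
    line = sha_entry("bill", "theft")
    print(line)
    print(check_sha_entry(line, "theft"), check_sha_entry(line, "wrong"))
```

Because this scheme has no salt, the same password always yields the same entry, which is one more reason not to choose anything as guessable as theft.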
Add subsequent users (the -c flag creates a new file, so we shouldn't use it after the first one):
    % htpasswd .../ok_users/sales ben
End of preview. The rest of this section cannot be viewed here.
- Passwords Under Win32
- Preview: Since Win32 lacks an encryption function, passwords are stored in plain text. This is not very secure, but one hopes it will change for the better. The passwords would be stored in the file named by the AuthUserFile directive, and Bill's entry would be:
    bill:theft
except that in real life you would use a better password.
End of preview. The rest of this section cannot be viewed here.
- Passwords over the Web
- Preview: The security of these passwords on your machine becomes somewhat irrelevant when we realize that they are transmitted unencrypted over the Web. The Base64 encoding used for Basic password transmission keeps passwords from being readable at a glance, but it is very easily decoded. Authentication, as described here, should only be used for the most trivial security tasks. If a compromised password could cause any serious trouble, then it is essential to encrypt it using SSL (see Chapter 11).
End of preview. The rest of this section cannot be viewed here.
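How little protection Base64 offers is easy to demonstrate. The Python sketch below builds the Authorization header a browser would send for Basic authentication and then recovers the username and password from it, as an eavesdropper could. The function names are hypothetical; the bill/theft credentials are the chapter's running example.

```python
import base64


def basic_header(username, password):
    """Build the Authorization header for Basic authentication."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Authorization: Basic " + base64.b64encode(raw).decode("ascii")


def decode_basic_header(header):
    """Recover (username, password) from a Basic Authorization header."""
    token = header.split(" ", 2)[2]          # text after "Authorization: Basic"
    raw = base64.b64decode(token).decode("utf-8")
    username, _, password = raw.partition(":")
    return username, password


if __name__ == "__main__":
    header = basic_header("bill", "theft")
    print(header)
    print(decode_basic_header(header))
```

No key or secret is involved in the decoding step, so anyone who can read the traffic can read the password.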
- From the Client's Point of View
- Preview: If you run Apache using httpd1.conf, you will find you can access www.butterthlies.com as before. But if you go to sales.butterthlies.com, you will have to give a username and password.
The file is httpd2.conf. These are the relevant bits:
    ...
    AuthType Digest
    AuthName darkness
    AuthDigestDomain http://sales.butterthlies.com
    AuthDigestFile /usr/www/APACHE3/ok_digest/digest_users
Run it with ./go2. At the client end, Microsoft Internet Explorer (MSIE) v5 displayed a password screen decorated with a key and worked as you would expect; Netscape v4.05 asked for a username and password in the usual way and returned error 401 "Authorization required."
End of preview. The rest of this section cannot be viewed here.
- CGI Scripts
- Preview: Authentication (both Basic and Digest) can also protect CGI scripts. Simply provide a suitable <Directory .../cgi-bin> block.
End of preview. The rest of this section cannot be viewed here.
- Variations on a Theme
- Preview: You may find that logging in again is a bit more elaborate than you would think. We found that both MSIE and Netscape were annoyingly helpful in remembering the password used for the last login and using it again. To make sure you are really exercising the security features, you have to exit your browser completely each time and reload it to get a fresh crack.
You might like to try the effect of inserting these lines in either of the previous Config files:
    ....
    #require valid-user
    #require user daphne bill
    #require group cleaners
    #require group directors
    ...
and uncommenting them one line at a time (remember to kill and restart Apache each time).
End of preview. The rest of this section cannot be viewed here.
- Order, Allow, and Deny
- Preview: So far we have dealt with potential users on an individual basis. We can also allow access from or deny access to specific IP addresses, hostnames, or groups of addresses and hostnames. The commands are allow from and deny from.
The order in which the allow and deny commands are applied is not set by the order in which they appear in your file. The default order is deny then allow: if a client is excluded by deny, it is excluded unless it matches allow. If neither is matched, the client is granted access.
The order in which these commands are applied can be set by the order directive.
allow from
    allow from host host ...
    directory, .htaccess
The allow directive controls access to a directory. The argument host can be one of the following:
- all: All hosts are allowed access.
- A (partial) domain name: All hosts whose names match or end in this string are allowed access.
- A full IP address: Only that IP address is allowed access.
- A partial IP address: The first one to three bytes of an IP address are allowed access, for subnet restriction.
- A network/netmask pair: Network a.b.c.d and netmask w.x.y.z are allowed access, to give finer-grained subnet control. For instance, 10.1.0.0/255.255.0.0.
- A network CIDR specification: The netmask consists of nnn high-order 1-bits. For instance, 10.1.0.0/16 is the same as 10.1.0.0/255.255.0.0.
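The host forms and the deny-then-allow rule described above can be modelled in a few lines of Python. This is a sketch under stated assumptions, not Apache's implementation: the names host_matches and access_allowed are hypothetical, domain matching is simplified to an exact match or a dotted suffix match, and the behaviour of the opposite order (allow,deny, where an unmatched client is refused) comes from our reading of the Apache documentation rather than from the text.

```python
import ipaddress


def host_matches(spec, client_ip, client_name=""):
    """Return True if the client matches one allow from / deny from argument."""
    if spec == "all":
        return True
    if spec.replace(".", "").isdigit() and spec.count(".") < 3:
        # Partial IP address: compare the leading one to three bytes.
        prefix = spec.rstrip(".").split(".")
        return client_ip.split(".")[:len(prefix)] == prefix
    try:
        # Full address, network/netmask pair, or CIDR specification.
        network = ipaddress.ip_network(spec, strict=False)
        return ipaddress.ip_address(client_ip) in network
    except ValueError:
        # Anything else is treated as a (partial) domain name.
        name = client_name.lower()
        domain = spec.lower().lstrip(".")
        return name == domain or name.endswith("." + domain)


def access_allowed(order, allow, deny, client_ip, client_name=""):
    """Combine allow and deny lists according to the order setting."""
    allowed = any(host_matches(s, client_ip, client_name) for s in allow)
    denied = any(host_matches(s, client_ip, client_name) for s in deny)
    if order == "deny,allow":
        # Deny is applied first, allow can override it, default is access.
        return allowed or not denied
    # Assumed allow,deny behaviour: deny overrides allow, default is refusal.
    return allowed and not denied


if __name__ == "__main__":
    same = ipaddress.ip_network("10.1.0.0/16") == ipaddress.ip_network("10.1.0.0/255.255.0.0")
    print("CIDR and netmask forms agree:", same)
    print(access_allowed("deny,allow", ["10.1.0.0/16"], ["all"], "10.1.2.3"))
```

With deny from all and allow from 10.1.0.0/16 under order deny,allow, only clients inside that network get in, which is the usual way of restricting a directory to one subnet.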
allow from env
    allow from env=variablename ...
    directory, .htaccess
The allow from env directive controls access by the existence of a named environment variable. For instance:
    BrowserMatch ^KnockKnock/2.0 let_me_in
    <Directory /docroot>
    order deny,allow
    deny from all
    allow from env=let_me_in
    </Directory>
Access by a browser called KnockKnock v2.0 sets an environment variable let_me_in, which in turn triggers allow from.
deny from
    deny from host host ...
    directory, .htaccess
The deny from directive controls access by host. The argument
End of preview. The rest of this section cannot be viewed here.
- DBM Files on Unix
- Preview: Although searching a file of usernames and passwords works perfectly well, it is apt to be rather slow once the list gets up to a couple hundred entries. To deal with this, Apache provides a better way of handling large lists by turning them into a database. You need one (not both!) of the modules that appear in the Config file as follows:
    #Module db_auth_module mod_auth_db.o
    Module dbm_auth_module mod_auth_dbm.o
Bear in mind that they correspond to different directives: AuthDBMUserFile or AuthDBUserFile. A Perl script to manage both types of database, dbmmanage, is supplied with Apache in .../src/support. To decide which type to use, you need to discover the capabilities of your Unix. Explore these by going to the command prompt and typing first:
    % man db
and then:
    % man dbm
Whichever method produces a manpage is the one you should use. You can also use a SQL database, employing MySQL or a third-party package to manage it.
Once you have decided which method to use, edit the Config file to include the appropriate module, and then type:
    % ./Configure
and:
    % make
We now have to create a database of our users: bill, ben, sonia, and daphne. Go to .../apache/src/support, find the utility dbmmanage, and copy it into /usr/local/bin or something similar to put it on your path. This utility may be distributed without execute permission set, so, before attempting to run it, we may need to change the permissions:
    % chmod +x dbmmanage
You may find, when you first try to run dbmmanage, that it complains rather puzzlingly that some unnamed file can't be found. Since dbmmanage is a Perl script, this is probably Perl, a text-handling language, and if you have not installed it, you should. It may also be necessary to change the first line of dbmmanage:
    #!/usr/bin/perl5
to the correct path for Perl, if it is installed somewhere else.
If you provoke it with dbmmanage -?, you get:
    Usage: dbmmanage [enc] dbname command [username [pw [group[,group] [comment]]]]
    where enc is -d for crypt encryption (default except on Win32, Netware)
                 -m for MD5 encryption (default on Win32, Netware)
                 -s for SHA1 encryption
                 -p for plaintext
    command is one of: add|adduser|check|delete|import|update|view
    pw of . for update command retains the old password
    pw of -- (or blank) for update command prompts for the password
    groups or comment of . (or blank) for update command retains old values
    groups or comment of -- for update command clears the existing value
    groups or comment of -- for add and adduser commands is the empty value
    takes the following arguments:
    dbmmanage [enc] dbname command [username [pw [group[,group] [comment]]]]
    'enc' sets the encryption method:
        -d for crypt (default except Win32, Netware)
        -m for MD5 (default on Win32, Netware)
        -s for SHA1
        -p for plaintext
End of preview. The rest of this section cannot be viewed here.
- Digest Authentication
- Preview: A halfway house between complete encryption and none at all is digest authentication. The idea is that a one-way hash, or digest, is calculated from a password and various other bits of information. Rather than sending the lightly encoded password, as is done in basic authentication, the digest is sent. At the other end, the same function is calculated: if the numbers are not identical, something is wrong, and in this case, since all other factors should be the same, the "something" must be the password.
Digest authentication is applied in Apache to improve the security of passwords. MD5 is a cryptographic hash function written by Ronald Rivest and distributed free by RSA Data Security; with its help, the client and server use the hash of the password and other stuff. The point of this is that although many passwords lead to the same hash value, there is only a very small chance that a wrong password will give the right hash value, if the hash function is intelligently chosen; it is also very difficult to construct a password leading to the same hash value (which is why these are sometimes referred to as one-way hashes). The advantage of using the hash value is that the password itself is not sent to the server, so it isn't visible to the Bad Guys. Just to make things more tiresome for them, the digest adds a few other things into the mix: the URI, the method, and a nonce. A nonce is simply a number chosen by the server and told to the client, usually different each time. It ensures that the digest is different each time and protects against replay attacks. The digest function looks like this:
MD5(MD5(<password>)+":"+<nonce>+":"+MD5(<method>+":"+<uri>))
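The formula can be evaluated directly in Python. The sketch below follows the simplified form given in the text, where the inner hash covers only the password, and also shows the variant from RFC 2617 as we understand it, in which the first inner hash covers the username, realm, and password together. The function names, the nonce values, and the /index.html URI are hypothetical; darkness is the realm used in this chapter's examples.

```python
import hashlib


def md5_hex(text):
    """Return the MD5 digest of `text` as a lowercase hexadecimal string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def simplified_digest(password, nonce, method, uri):
    """Digest exactly as written in the formula above."""
    return md5_hex(md5_hex(password) + ":" + nonce + ":" + md5_hex(method + ":" + uri))


def rfc2617_digest(username, realm, password, nonce, method, uri):
    """Assumed RFC 2617 form (without qop): the first hash also covers
    the username and the realm."""
    ha1 = md5_hex(f"{username}:{realm}:{password}")
    ha2 = md5_hex(f"{method}:{uri}")
    return md5_hex(f"{ha1}:{nonce}:{ha2}")


if __name__ == "__main__":
    # Two different nonces give two different digests for the same password.
    print(simplified_digest("theft", "nonce-1", "GET", "/index.html"))
    print(simplified_digest("theft", "nonce-2", "GET", "/index.html"))
    print(rfc2617_digest("bill", "darkness", "theft", "nonce-1", "GET", "/index.html"))
```

Since the nonce changes the result, a digest captured from one exchange is of no use for a request issued under a different nonce, which is the replay protection described above.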
MD5 digest authentication can be invoked with the following line:
    AuthType Digest
This plugs a nasty hole in the Internet's security. As we saw earlier (and almost unbelievably) the authentication procedures discussed up to now send the user's password in barely encoded text across the Web. A Bad Guy who intercepts the Internet traffic then knows the user's password. This is a Bad Thing.
End of preview. The rest of this section cannot be viewed here.
- Anonymous Access
- Preview: It sometimes happens that even though you have passwords controlling the access to certain things on your site, you also want to allow guests to come and sample the site's joys, probably a reduced set of joys, mediated by the username passed on by the client's browser. The Apache module mod_auth_anon.c allows you to do this.
We have to say that the whole enterprise seems rather silly. If you want security at all on any part of your site, you need to use SSL. If you then want to make some of the material accessible to everyone, you can give them a different URL or a link from a reception page. However, it seems that some people want to do this to capture visitors' email addresses (using a long-standing convention for anonymous access), and if that is what you want, and if your users' browsers are configured to provide that information, then here's how.
The module should be compiled in automatically; check by looking at Configuration or by running httpd -l. If it wasn't compiled in, you will probably get this unnerving error message:
    Invalid command Anonymous
when you try to exercise the Anonymous directive. The Config file in .../site.anon/conf/httpd.conf is as follows:
    User webuser
    Group webgroup
    ServerName www.butterthlies.com
    IdentityCheck on
    NameVirtualHost 192.168.123.2
    <VirtualHost www.butterthlies.com>
    ServerAdmin sales@butterthlies.com
    DocumentRoot /usr/www/APACHE3/site.anon/htdocs/customers
    ServerName www.butterthlies.com
    ErrorLog /usr/www/APACHE3/site.anon/logs/customers/error_log
    TransferLog /usr/www/APACHE3/site.anon/logs/access_log
    ScriptAlias /cgi-bin /usr/www/APACHE3/cgi-bin
    </VirtualHost>
    <VirtualHost sales.butterthlies.com>
    ServerAdmin sales_mgr@butterthlies.com
    DocumentRoot /usr/www/APACHE3/site.anon/htdocs/salesmen
    ServerName sales.butterthlies.com
    ErrorLog /usr/www/APACHE3/site.anon/logs/error_log
    TransferLog /usr/www/APACHE3/site.anon/logs/salesmen/access_log
    ScriptAlias /cgi-bin /usr/www/APACHE3/cgi-bin
    <Directory /usr/www/APACHE3/site.anon/htdocs/salesmen>
    AuthType Basic
    AuthName darkness
    AuthUserFile /usr/www/APACHE3/ok_users/sales
    AuthGroupFile /usr/www/APACHE3/ok_users/groups
    require valid-user
    Anonymous guest anonymous air-head
    Anonymous_NoUserID on
    </Directory>
    </VirtualHost>
End of preview. The rest of this section cannot be viewed here.
- Experiments
- Preview: Run ./go. Exit from your browser on the client machine, and reload it to make sure it does password checking properly (you will probably need to do this every time you make a change throughout this exercise). If you access the salespeople's site again with the user ID guest, anonymous, or air-head and any password you like (fff or 23 or rubbish), you will get access. It seems rather silly, but you must give a password of some sort.
Set:
    Anonymous_NoUserID on
This time you can leave both the ID and password fields empty. If you enter a valid username (bill, ben, sonia, or daphne), you must follow through with a valid password.
Set:
    Anonymous_NoUserID off
    Anonymous_VerifyEmail on
    Anonymous_LogEmail on
The effect here is that the user ID has to look something like an email address, with (according to the documentation) at least one "@" and one ".". However, we found that one "." or one "@" would do. Email is logged in the error log, not the access log as you might expect.
Set:
    Anonymous_VerifyEmail off
    Anonymous_LogEmail off
    Anonymous_Authoritative on
The effect here is that if an access attempt fails, it is not now passed on to the other methods. Up to now we have always been able to enter as bill, password theft, but no more. Change the Anonymous section to look like this:
    Anonymous_Authoritative off
    Anonymous_MustGiveEmail on
Finally:
    Anonymous guest anonymous air-head
    Anonymous_NoUserID off
    Anonymous_VerifyEmail off
    Anonymous_Authoritative off
    Anonymous_LogEmail on
    Anonymous_MustGiveEmail on
The documentation says that Anonymous_MustGiveEmail forces the user to give some sort of password. In fact, it seems to have the same effect as Anonymous_VerifyEmail: a "." or "@" will do.
In the first edition of this book we said that if you wrote your httpd.conf file as shown earlier, but also created .../conf/access.conf containing directives as innocuous as:
    <Directory /usr/www/APACHE3/site.anon/htdocs/salesmen>
    </Directory>
security in the salespeople's site would disappear. This bug seems to have been fixed in Apache v1.3.
End of preview. The rest of this section cannot be viewed here.
- Automatic User Information
- Preview: This is all great fun, but we are trying to run a business here. Our salespeople are logging in because they want to place orders, and we ought to be able to detect who they are so we can send the goods to them automatically. This can be done by looking at the environment variable REMOTE_USER, which will be set to the current username. Just for the sake of completeness, we should note another directive here.
The IdentityCheck directive causes the server to attempt to identify the client's user by querying the identd daemon of the client host. (See RFC 1413 for details, but the short explanation is that identd will, when given a socket number, reveal which user created that socket, that is, the username of the client on his home machine.)
    IdentityCheck [on|off]
If successful, the user ID is logged in the access log. However, as the Apache manual austerely remarks, you should "not trust this information in any way except for rudimentary usage tracking." Furthermore (or perhaps, furtherless), this extra logging slows Apache down, and many machines do not run an identd daemon, or if they do, they prevent external access to it. Even if the client's machine is running identd, the information it provides is entirely under the control of the remote machine. Many providers find that it is not worth the trouble to use IdentityCheck.
End of preview. The rest of this section cannot be viewed here.
- Using .htaccess Files
- Preview: We experimented with putting configuration directives in a file called .../htdocs/.htaccess rather than in httpd.conf. It worked, but how do you decide whether to do things this way rather than the other?
The point of the .htaccess mechanism is that you can change configuration directives without having to restart the server. This is especially valuable on a site where a lot of people maintain their own home pages but are not authorized to bring the server down or, indeed, to modify its Config files. The drawback to the .htaccess method is that the files are parsed for each access to the server, rather than just once at startup, so there is a substantial performance penalty.
The httpd1.conf file (from .../site.htaccess) contains the following:
    User webuser
    Group webgroup
    ServerName www.butterthlies.com
    AccessFileName .myaccess
    ServerAdmin sales@butterthlies.com
    DocumentRoot /usr/www/APACHE3/site.htaccess/htdocs/salesmen
    ErrorLog /usr/www/APACHE3/site.htaccess/logs/error_log
    TransferLog /usr/www/APACHE3/site.htaccess/logs/access_log
    ServerName sales.butterthlies.com
Access control, as specified by AccessFileName, is now in .../htdocs/salesmen/.myaccess:
    AuthType Basic
    AuthName darkness
    AuthUserFile /usr/www/APACHE3/ok_users/sales
    AuthGroupFile /usr/www/APACHE3/ok_users/groups
    require group cleaners
If you run the site with./go 1and access http://sales.butterthlies.com /, you are asked for an ID and a password in the usual way. You had better be daphne or sonia if you want to get in, because only members of the group cleaners are allowed.You can then edit ... /htdocs/salesmen/.myaccess torequire group directorsinstead. Without reloading Apache, you now have to be bill or ben.AccessFileNamegives authority to the files specified. If a directory is given, authority is given to all files in it and its subdirectories.AccessFileName filename, filename|direcory and subdirectories ... Server config, virtual hostInclude the following line in httpd.conf:AccessFileName .myaccess1, myaccess2 ...
Restart Apache (since theEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Overrides
- We can do more with overrides than speed up Apache. This mechanism allows the webmaster to exert finer control over what is done in .htaccess files. The key directive is AllowOverride.
This directive tells Apache which directives in an .htaccess file can override earlier directives.
    AllowOverride override1 override2 ...
    Directory
The list of AllowOverride overrides is as follows:
- AuthConfig: Allows individual settings of AuthDBMGroupFile, AuthDBMUserFile, AuthGroupFile, AuthName, AuthType, AuthUserFile, and require
- FileInfo: Allows AddType, AddEncoding, AddLanguage, AddCharset, AddHandler, RemoveHandler, LanguagePriority, ErrorDocument, DefaultType, Action, Redirect, RedirectMatch, RedirectTemp, RedirectPermanent, PassEnv, SetEnv, UnsetEnv, Header, RewriteEngine, RewriteOptions, RewriteBase, RewriteCond, RewriteRule, CookieTracking, and CookieName
- Indexes: Allows FancyIndexing, AddIcon, AddDescription (see Chapter 7)
- Limit: Can limit access based on hostname or IP number
- Options: Allows the use of the Options directive (see Chapter 13)
- All: All of the previous
- None: None of the previous
You might ask: if None switches multiple searches off, which of these options switches it on? The answer is any of them, or the complete absence of AllowOverride. In other words, it is on by default.
To illustrate how this works, look at .../site.htaccess/httpd3.conf, which is httpd2.conf with the authentication directives on the salespeople's directory back in again. The Config file wants cleaners; the .myaccess file wants directors. If we now put the authorization directives, favoring cleaners, back into the Config file:
    User webuser
    Group webgroup
    ServerName www.butterthlies.com
    AccessFileName .myaccess
    ServerAdmin sales@butterthlies.com
    DocumentRoot /usr/www/APACHE3/site.htaccess/htdocs/salesmen
    ErrorLog /usr/www/APACHE3/site.htaccess/logs/error_log
    TransferLog /usr/www/APACHE3/site.htaccess/logs/access_log
    ServerName sales.butterthlies.com
    #AllowOverride None
    AuthType Basic
    AuthName darkness
    AuthUserFile /usr/www/APACHE3/ok_users/sales
    AuthGroupFile /usr/www/APACHE3/ok_users/groups
    require group cleaners
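As a minimal sketch of how AllowOverride is usually scoped, the fragment below confines .myaccess files in the salespeople's directory to authentication directives. The choice of AuthConfig and its placement inside a <Directory> block are assumptions made for illustration; this is not one of the sample site's Config files.

```apache
# Hypothetical fragment, not a listing from the sample site.
AccessFileName .myaccess

<Directory /usr/www/APACHE3/site.htaccess/htdocs/salesmen>
    # Permit only the AuthConfig class of directives (AuthType,
    # AuthName, AuthUserFile, AuthGroupFile, require) in .myaccess
    # files found in this directory and below it.
    AllowOverride AuthConfig
</Directory>
```

With something like this in place, a require group directors line in .myaccess should still be honored, while a directive from another override class, such as Options, would be rejected.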
- Chapter 6: Content Description and Modification
- Apache has the ability to tune the information it returns to the abilities of the client, and even to improve the client's efforts. Currently, this affects:
- The choice of MIME type returned. An image might be the very old-fashioned bitmap, the old-fashioned .gif, the more modern and smaller .jpg, or the extremely up-to-date .png. Once the type is indicated, Apache's reactions can be extended and controlled with a number of directives.
- The language of the returned file.
- Updates to the returned file.
- The spelling of the client's requests.
Apache v2 also offers a new mechanism, filters (Section 6.6), which is described at the end of this chapter.
- MIME Types
- MIME stands for Multipurpose Internet Mail Extensions, a standard developed by the Internet Engineering Task Force for email but then repurposed for the Web. Apache uses mod_mime.c, compiled in by default, to determine the type of a file from its extension. MIME types are more sophisticated than file extensions, providing a category (like "text," "image," or "application"), as well as a more specific identifier within that category. In addition to specifying the type of the file, MIME permits the specification of additional information, like the encoding used to represent characters.
The "type" of a file that is sent is indicated by a header near the beginning of the data. For instance:
    content-type: text/html
indicates that what follows is to be treated as HTML, though it may also be treated as text. If the type were "image/jpeg", the browser would need to use a completely different bit of code to render the data.
This header is inserted automatically by Apache based on the MIME type and is absorbed by the browser, so you do not see it if you right-click in a browser window and select "View Source" (MSIE) or similar. Notwithstanding, it is an essential element of a web page.
The list of MIME types that Apache already knows about is distributed in the file .../conf/mime.types or can be found at http://www.isi.edu/in-notes/iana/assignments/media-types/media-types. You can edit it to include extra types, or you can use the directives discussed in this chapter. The default location for the file is .../<site>/conf, but it may be more convenient to keep it elsewhere, in which case you would use the directive TypesConfig.
Changing the encoding of a file with one of these directives does not change the value of the Last-Modified header, so cached copies with the old label may linger after you make such changes. (Servers often send a Last-Modified header containing the date and time the content was last changed, so that the browser can use cached material at the other end if it is still fresh.) Files can have more than one extension, and their order normally doesn't matter. If the extension ...
- Content Negotiation
- There may be different ways to handle the data that Apache returns, and there are two equivalent ways of implementing this functionality. The multiviews method is simpler (and more limited) than the *.var method, so we shall start with it. The Config file (from ... /site.multiview) looks like this:
    User webuser
    Group webgroup
    ServerName www.butterthlies.com
    DocumentRoot /usr/www/APACHE3/site.multiview/htdocs
    ScriptAlias /cgi-bin /usr/www/APACHE3/cgi-bin
    AddLanguage it .it
    AddLanguage en .en
    AddLanguage ko .ko
    LanguagePriority it en ko
    <Directory /usr/www/APACHE3/site.multiview/htdocs>
    Options +MultiViews
    </Directory>
For historical reasons, you have to say:
    Options +MultiViews
even though you might reasonably think that Options All would cover the case. The general idea is that whenever you want to offer variations of a file (e.g., JPG, GIF, or bitmap for images, or different languages for text), multiviews will handle it. Apache v2 offers a relevant directive. MultiviewsMatch permits three different behaviors for mod_negotiation's Multiviews feature.
    MultiviewsMatch [NegotiatedOnly] [Handlers] [Filters] [Any]
    server config, virtual host, directory, .htaccess
    Compatibility: only available in Apache 2.0.26 and later.
Multiviews allows a request for a file, e.g., index.html, to match any negotiated extensions following the base request, e.g., index.html.en, index.html.fr, or index.html.gz.
The NegotiatedOnly option provides that every extension following the base name must correlate to a recognized mod_mime extension for content negotiation, e.g., Charset, Content-Type, Language, or Encoding. This is the strictest implementation with the fewest unexpected side effects, and it's the default behavior.
To include extensions associated with Handlers and/or Filters, set the MultiviewsMatch directive to either Handlers, Filters, or both option keywords. If all other factors are equal, the smallest file will be served, e.g., in deciding between index.html.cgi of 500 bytes and index.html.pl of 1,000 bytes, the .cgi file would win in this example. Users of ...
- Language Negotiation
- The same useful functionality also applies to language. To demonstrate this, we need to make up .html scripts in different languages. Well, we won't bother with actual different languages; we'll just edit the scripts to say, for example:
    <h1>Italian Version</h1>
and edit the English version so that it includes a new line:
    <h1>English Version</h1>
Then we give each file an appropriate extension:
- index.html.en for English
- index.html.it for Italian
- index.html.ko for Korean
Apache recognizes language variants: en-US is seen as a version of general English, en, which seems reasonable. You can also offer documents that serve more than one language. If you had a "franglais" version, you could serve it to both English speakers and Francophones by naming it frangdoc.en.fr. Of course, in real life you would have to go to substantially more trouble, what with translators and special keyboards and all. Also, the Italian version of the index would need to point to Italian versions of the catalogs. But in the fantasy world of Butterthlies, Inc., it's all so simple.
The Italian version of our index would be index.html.it. By default, Apache looks for a file called index.html.<something>. If it has a language extension, like index.html.it, it will find the index file, happily add the language extension, and then serve up what the browser prefers. If, however, you call the index file index.it.html, Apache will still look for, and fail to find, index.html.<something>. If index.html.en is present, that will be served up. If index.en.html is there, then Apache gives up and serves up a list of all the files. The moral is, if you want to deal with index filenames in either order (index.it.html alongside index.html.en), you need the directive:
    DirectoryIndex index
to make Apache look for a file called index.<something> rather than the default index.html.<something>.
To give Apache the idea, we need the corresponding lines in the httpd1.conf file:
    AddLanguage it .it
    AddLanguage en .en
    AddLanguage ko .ko
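Pulling the pieces of this section together, a configuration along the following lines might serve index files named in either order. It is only a sketch: it assumes the directory layout of the earlier multiviews example, and the LanguagePriority order is an arbitrary choice.

```apache
# Sketch only: combines directives already shown in this chapter.
AddLanguage it .it
AddLanguage en .en
AddLanguage ko .ko

# Tie-breaker used when the browser expresses no usable preference
# (the order chosen here is an assumption).
LanguagePriority en it ko

# Look for index.<something> rather than index.html.<something>.
DirectoryIndex index

<Directory /usr/www/APACHE3/site.multiview/htdocs>
    Options +MultiViews
</Directory>
```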
- Type Maps
- In the last section, we looked at multiviews as a way of providing language and image negotiation. The other way to achieve the same effects in the current release of Apache, as well as more lavish effects later (probably to negotiate browser plug-ins), is to use type maps, also known as *.var files. Multiviews works by scrambling together a plain vanilla type map; now you have the chance to set it up just as you want it. The Config file in .../site.typemap/conf/httpd1.conf is as follows:
    User webuser
    Group webgroup
    ServerName www.butterthlies.com
    DocumentRoot /usr/www/APACHE3/site.typemap/htdocs
    AddHandler type-map var
    DirectoryIndex index.var
One should write, as seen in this file:
    AddHandler type-map var
Having set that, we can sensibly say:
    DirectoryIndex index.var
to set up a set of language-specific indexes.
What this means, in plainer English, is that the DirectoryIndex line overrides the default index file index.html. If you also want index.html to be used as an alternative, you would have to specify it, but you probably don't, because you are trying to do something more elaborate here. In this case there are several versions of the index (index.en.html, index.it.html, and index.ko.html), so Apache looks for index.var for an explanation.
Look at ... /site.typemap/htdocs. We want to offer language-specific versions of the index.html file and alternatives to the generalized images bath, hen, tree, and bench, so we create two files, index.var and bench.var (we will only bother with one of the images, since the others are the same).
This is index.var:
    # It seems that this URI _must_ be the filename minus the extension...
    URI: index; vary="language"

    URI: index.en.html
    # Seems we _must_ have the Content-type or it doesn't work...
    Content-type: text/html
    Content-language: en

    URI: index.it.html
    Content-type: text/html
    Content-language: it
This is bench.var:
    URI: bench; vary="type"

    URI: bench.jpg
    Content-type: image/jpeg; qs=0.8 level=3

    URI: bench.gif
    Content-type: image/gif; qs=0.5 level=1
- Browsers and HTTP 1.1
- Like any other human creation, the Web fills up with rubbish. The webmaster cannot assume that all clients will be using up-to-date browsers: all the old, useless versions are out there waiting to make a mess of your best-laid plans.
In 1996, the weekly Internet magazine devoted to Apache affairs, Apache Week (Issue 25), had this to say about the impact of the then-upcoming HTTP 1.1:
For negotiation to work, browsers must send the correct request information. For human languages, browsers should let the user pick what language or languages they are interested in. Recent beta versions of Netscape let the user select one or more languages (see the Netscape Options, General Preferences, Languages section).
For content-types, the browser should send a list of types it can accept. For example, "text/html, text/plain, image/jpeg, image/gif." Most browsers also add the catch-all type of "*/*" to indicate that they can accept any content type. The server treats this entry with lower priority than a direct match.
Unfortunately, the */* type is sometimes used instead of listing explicitly acceptable types. For example, if the Adobe Acrobat Reader plug-in is installed into Netscape, Netscape should add application/pdf to its acceptable content types. This would let the server transparently send the most appropriate content type (PDF files to suitable browsers, else HTML). Netscape does not send the content types it can accept, instead relying on the */* catch-all. This makes transparent content-negotiation impossible.
Although time has passed, the situation has probably not changed very much. In addition, most browsers do not indicate a preference for particular types. This should be done by adding a preference factor (q) to the content type. For example, a browser that accepts Acrobat files might prefer them to HTML, so it could send an accept-type list that includes:
    Accept: text/html; q=0.7, application/pdf; q=0.8
When the server handles the request, it combines this information with its source quality information (if any) to pick the "best" content type to return.
- Filters
- Apache v2 introduced a new mechanism called a "Filter", together with a reworking of Multiviews. The documentation says:
A filter is a process which is applied to data that is sent or received by the server. Data sent by clients to the server is processed by input filters while data sent by the server to the client is processed by output filters. Multiple filters can be applied to the data, and the order of the filters can be explicitly specified.
Filters are used internally by Apache to perform functions such as chunking and byte-range request handling. In addition, modules can provide filters which are selectable using run-time configuration directives. The set of filters which apply to data can be manipulated with the SetInputFilter and SetOutputFilter directives.
The only configurable filter currently included with the Apache distribution is the INCLUDES filter which is provided by mod_include to process output for Server Side Includes. There is also an experimental module called mod_ext_filter which allows for external programs to be defined as filters.
There is a demonstration filter that changes text to uppercase. In .../site.filter/htdocs we have two files, 1.txt and 1.html, which have the same contents:
HULLO WORLD FROM site.filter
The Config file is as follows:
    User webuser
    Group webgroup
    Listen 80
    ServerName my586
    AddOutputFilter CaseFilter html
    DocumentRoot /usr/www/APACHE3/site.filter/htdocs
If we visit the site, we are offered a directory. If we choose 1.txt, we see the contents as shown earlier. If we choose 1.html, we find it has been through the filter and is now all uppercase:
    HULLO WORLD FROM SITE.FILTER
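CaseFilter is a demonstration module, so it may not be present in a given build. The same AddOutputFilter mechanism can be sketched with the INCLUDES filter that the documentation quoted above mentions. The .shtml extension is an assumption (it is the conventional choice), and mod_include has to be available for this to work.

```apache
# Sketch: pass files ending in .shtml through mod_include's INCLUDES
# output filter before they are sent to the client.
<Directory /usr/www/APACHE3/site.filter/htdocs>
    # Server Side Includes must be permitted for this directory.
    Options +Includes
    # Label the processed output as HTML.
    AddType text/html .shtml
    # Attach the filter to the extension, as CaseFilter was attached to html.
    AddOutputFilter INCLUDES .shtml
</Directory>
```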
The Directives are as follows:
AddInputFilter
    AddInputFilter filter[;filter...] extension [extension ...]
    directory, files, location, .htaccess
    AddInputFilter is only available in Apache 2.0.26 and later.
AddInputFilter maps the filename extensions extension to the filter or filters that will process client requests and POST input when they are received by the server. This is in addition to any filters defined elsewhere, including the ...
- Chapter 7: Indexing
- As we saw back on site.first (see Chapter 3), if there is no index.html file in ... /htdocs or DirectoryIndex directive, Apache concocts an index called "Index of /", where "/" means the DocumentRoot directory. For many purposes this will, no doubt, be enough. But since this jury-rigged index is the first thing a client sees, you may want to do more.
- Making Better Indexes in Apache
- There is a wide range of possibilities; some are demonstrated at ... /site.fancyindex /httpd1.conf:
    User webuser
    Group webgroup
    ServerName www.butterthlies.com
    DocumentRoot /usr/www/APACHE3/site.fancyindex/htdocs
    <Directory /usr/www/APACHE3/site.fancyindex/htdocs>
    IndexOptions FancyIndexing
    AddDescription "One of our wonderful catalogs" catalog_summer.html / catalog_autumn.html
    IndexIgnore *.jpg
    IndexIgnore ..
    IndexIgnore icons HEADER README
    AddIconByType (CAT,icons/bomb.gif) text/*
    DefaultIcon icons/burst.gif
    </Directory>
When you type ./go 1 on the server and access http://www.butterthlies.com/ on the browser, you should see a rather fancy display:
    Index of /
    Name                        Last Modified      Size  Description
    --------------------------------------------------------------------
    <bomb>catalog_autumn.html   23-Jul-1998 09:11  1k    One of our wonderful catalogs
    <bomb>catalog_summer.html   25-Jul-1998 10:31  1k    One of our wonderful catalogs
    <burst>index.html.ok        23-Jul-1998 09:11  1k
    --------------------------------------------------------------------
In the previous listing, <bomb> and <burst> stand in for standard graphic icons Apache has at its disposal. How does all this work? As you can see from the httpd.conf file, this smart formatting is displayed directory by directory. The key directive is IndexOptions.
IndexOptions
    IndexOptions option [option] ... (Apache 1.3.2 and earlier)
    IndexOptions [+|-]option [[+|-]option] ... (Apache 1.3.3 and later)
    Server config, virtual host, directory, .htaccess
This directive is somewhat complicated, and its syntax varies drastically depending on your version of Apache. The +/- syntax and merging of multiple IndexOptions directives are only available with Apache 1.3.3 and later; the FoldersFirst and DescriptionWidth options are only available with Apache 1.3.10 and later; the TrackModified option is only available with Apache 1.3.15 and later.
The IndexOptions directive specifies the behavior of the directory indexing. option can be one of the following:
    DescriptionWidth=[n | *] (Apache 1.3.10 and later)
- Making Our Own Indexes
- In the last section, we looked at Apache's indexing facilities. So far we have not been very adventurous with our own indexing of the document root directory. We replaced Apache's adequate directory listing with a custom-made .html file: index.html (see Chapter 3).
We can improve on index.html with the DirectoryIndex command. This command specifies a list of possible index files to be used in order.
The DirectoryIndex directive sets the list of resources to look for when the client requests an index of the directory by specifying a / at the end of the directory name.
    DirectoryIndex local-url local-url ...
    Default: index.html
    Server config, virtual host, directory, .htaccess
local-url is the URL of a document on the server relative to the requested directory; it is usually the name of a file in the directory. Several URLs may be given, in which case the server will return the first one that it finds. If none of the resources exists and the Indexes option is set, the server will generate its own listing of the directory. For example, if this is the specification:
    DirectoryIndex index.html
then a request for http://myserver/docs/ would return http://myserver/docs/index.html if it exists; if it did not exist, the request would list the directory, provided indexing was allowed. Note that the documents do not need to be relative to the directory:
    DirectoryIndex index.html index.txt /cgi-bin/index.pl
This would cause the CGI script /cgi-bin/index.pl to be executed if neither index.html nor index.txt existed in a directory.
A common technique for getting a CGI script to run immediately when a site is accessed is to declare it as the DirectoryIndex:
    DirectoryIndex /cgi-bin/my_start_script
If this is to work, redirection to cgi-bin must have been arranged using ScriptAlias or ScriptAliasMatch higher up in the Config file.
The Config file from ... /site.ownindex is as follows:
    User webuser
    Group webgroup
    ServerName www.butterthlies.com
    DocumentRoot /usr/www/APACHE3/site.ownindex/htdocs
    AddHandler cgi-script cgi
    Options ExecCGI indexes
    <Directory /usr/www/APACHE3/site.ownindex/htdocs/d1>
    DirectoryIndex hullo.cgi index.html goodbye
    </Directory>
    <Directory /usr/www/APACHE3/site.ownindex/htdocs/d2>
    DirectoryIndex index.html goodbye
    </Directory>
    <Directory /usr/www/APACHE3/site.ownindex/htdocs/d3>
    DirectoryIndex goodbye
    </Directory>
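The dependency on ScriptAlias mentioned above can be sketched as follows. The cgi-bin path is borrowed from the earlier multiviews example and my_start_script is the placeholder name used in the text; both are assumptions about a particular layout rather than part of site.ownindex.

```apache
# Map the URL prefix /cgi-bin/ onto a directory of CGI scripts.
# This has to be in effect for the DirectoryIndex line below to work.
ScriptAlias /cgi-bin/ /usr/www/APACHE3/cgi-bin/

# Try a static index first, then fall back to the script.
DirectoryIndex index.html /cgi-bin/my_start_script
```

Listing index.html first is a design choice: a directory that has its own static index keeps it, and only directories without one run the script.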
- Imagemaps
- We have experimented with various sorts of indexing. Bearing in mind that words are going out of fashion in many circles, we may want to present an index as some sort of picture. In some circumstances, two dimensions may work much better than one; selecting places from a map, for instance, is a natural example. The objective here is to let the client user click on images or areas of images and to deduce from the position of the cursor at the time of the click what she wants to do next.
Recently, browsers have improved in capability, and client-side mapping (built into the returned HTML document) is becoming more popular. If you want to use server-side image maps, however, Apache provides support. The httpd.conf in ... /site.imap is as follows:
    User webuser
    Group webgroup
    ServerName www.butterthlies.com
    DocumentRoot /usr/www/APACHE3/site.imap/htdocs
    AddHandler imap-file map
    ImapBase map
    ImapMenu Formatted
The three lines of note are the last. AddHandler sets up ImageMap handling using files with the extension .map. When you access the site you see the following:
    Index of /
    Parent Directory
    bench.jpg
    bench.map
    bench.map.bak
    default.html
    left.html
    right.html
    sides.html
    things
This index could be made simpler and more elegant by using some of the directives mentioned earlier. In the interest of keeping the Config file simple, we leave this as an exercise for the reader.
Click on sides.html to see the action. The picture of the bench is presented: if you click on the left you see this:
    Index of /things
    Parent Directory
    1
    2
    3
If you click on the righthand side, you see:
    you like to sit on the right
If you click outside one of the defined areas (as in ... /htdocs/sides.html), you see:
    You're clicking in the wrong place
The document we serve up is ... /htdocs/sides.html:
    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN">
    <html>
    <head>
    <title>Index to Butterthlies Catalogues</title>
    </head>
    <body>
    <h1>Welcome to Butterthlies Inc</h1>
    <h2>Which Side of the Bench?</h2>
    <p>Tell us on which side of the bench you like to sit
    </p>
    <hr>
    <p>
    <p align=center>
    <a href="bench.map">
    <img ismap src="bench.jpg" alt="A picture of a bench">
    </a>
    <p align=center>
    Click on the side you prefer
    </body>
    </html>
- Image Map Directives
- The three image map directives let you specify how Apache handles server-side image maps.
ImapBase
    ImapBase [map|referer|URL]
    Default: http://servername
    Server config, virtual host, directory, .htaccess
This directive sets the base URL for the ImageMap, as follows:
- map: The URL of the ImageMap itself.
- referer: The URL of the referring document. If this is unknown, http://servername/ is used.
- URL: The specified URL.
If this directive is absent, the map base defaults to http://servername/, which is the same as the DocumentRoot directory.
ImapMenu
    ImapMenu [none|formatted|semiformatted|unformatted]
    Server config, virtual host, directory, .htaccess
    Default: formatted
This directive applies if mapping fails or if the browser is incapable of displaying images. If the site is accessed using a text-based browser such as Lynx, a menu is displayed showing the possibilities in the .map file:
    MENU FOR /BENCH.MAP
    --------------------------------------
    things
    right.html
This is formatted according to the argument given to ImapMenu. The previous effect is produced by formatted. The manual explains the options as follows:
- formatted: A formatted menu is the simplest menu. Comments in the ImageMap file are ignored. A level-one header is printed, then a horizontal rule, and then the links, each on a separate line. The menu has a consistent, plain look close to that of a directory listing.
- semiformatted: In the semiformatted menu, comments are printed where they occur in the ImageMap file. Blank lines are turned into HTML breaks. No header or horizontal rule is printed, but otherwise the menu is the same as a formatted menu.
- unformatted: Comments are printed; blank lines are ignored. Nothing is printed that does not appear in the ...
- Chapter 8: Redirection
- Few things are ever in exactly the right place at the right time, and this is as true of most web servers as of anything else. Alias and Redirect allow requests to be shunted about your filesystem or around the Web. Although in a perfect world it should never be necessary to do this, in practice it is often useful to move HTML files around on the server (or even to a different server) without having to change all the links in the HTML document. A more legitimate use (of Alias, at least) is to rationalize directories spread around the system. For example, they may be maintained by different users and may even be held on remotely mounted filesystems. But Alias can make them appear to be grouped in a more logical way.
A related directive, ScriptAlias, allows you to run CGI scripts, discussed in Chapter 16. You have a choice: everything that ScriptAlias does, and much more, can be done by the new Rewrite directive (described later in this chapter), but at a cost of some real programming effort.
ScriptAlias is relatively simple to use, but it is also a good example of Apache's modularity being a little less modular than we might like. Although ScriptAlias is defined in mod_alias.c in the Apache source code, it needs mod_cgi.c (or any module that does CGI) to function; it does, after all, run CGI scripts. mod_alias.c is compiled into Apache by default.
Some care is necessary in arranging the order of all these directives in the Config file. Generally, the narrower choices should come first, with the "catch-all" versions at the bottom. Be prepared to move them around (restarting Apache each time, of course) until you get the effect you want.
Our base httpd1.conf file on ... /site.alias, to which we will add some directives, contains the following:
    User webuser
    Group webgroup
    NameVirtualHost 192.168.123.2
    <VirtualHost www.butterthlies.com>
    ServerName www.butterthlies.com
    DocumentRoot /usr/www/APACHE3/site.alias/htdocs/customers
    ErrorLog /usr/www/APACHE3/site.alias/logs/error_log
    TransferLog /usr/www/APACHE3/site.alias/logs/access_log
    </VirtualHost>
    <VirtualHost sales.butterthlies.com>
    DocumentRoot /usr/www/APACHE3/site.alias/htdocs/salesmen
    ServerName sales.butterthlies.com
    ErrorLog /usr/www/APACHE3/site.alias/logs/error_log
    TransferLog /usr/www/APACHE3/site.alias/logs/access_log
    </VirtualHost>
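The advice about ordering can be illustrated with two overlapping Alias lines. The URL prefixes and target directories below are hypothetical and are not part of site.alias; the point is only that mod_alias applies the first directive that matches, so the narrower prefix has to be listed first.

```apache
# Hypothetical paths. The more specific alias comes first...
Alias /docs/manuals /usr/www/APACHE3/manuals

# ...and the catch-all for everything else under /docs comes last.
Alias /docs /usr/www/APACHE3/docs
```

If the two lines were swapped, /docs would match every such request first and the /docs/manuals alias would never be used.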
End of preview. The rest of this section is not viewable here.
- Alias
- Preview: One of the most useful directives is Alias, which lets you store documents elsewhere. We can demonstrate this simply by creating a new directory, /usr/www/APACHE3/somewhere_else, and putting in it a file lost.txt, which has this message in it:

    I am somewhere else

httpd2.conf has an extra line:

    ...
    Alias /somewhere_else /usr/www/APACHE3/somewhere_else
    ...

Stop Apache and run ./go 2. From the browser, access http://www.butterthlies.com/somewhere_else/. We see the following:

    Index of /somewhere_else
    . Parent Directory
    . lost.txt

If we click on Parent Directory, we arrive at the DocumentRoot for this server, /usr/www/APACHE3/site.alias/htdocs/customers, not, as might be expected, at /usr/www/APACHE3. This is because Parent Directory really means "parent URL," which is http://www.butterthlies.com/ in this case.

What sometimes puzzles people (even those who know about it but have temporarily forgotten) is that if you go to http://www.butterthlies.com/ and there's no ready-made index, you don't see somewhere_else listed.

Note that you do not want to write:

    Alias /somewhere_else/ /usr/www/APACHE3/somewhere_else

The trailing / on the alias will prevent things working. To understand this, imagine that you start with a web server that has a subdirectory called fred in its DocumentRoot. That is, there's a directory called /www/docs/fred, and the Config file says:

    DocumentRoot /www/docs

The URL http://your.webserver.com/fred fails because there is no file called fred. However, the request is redirected by Apache to http://your.webserver.com/fred/, which is then handled by looking for the directory index of /fred.

So, if you have a web page that says:

    <a href="/fred">Take a look at fred</a>

it will work. When you click on "Take a look at fred," you get redirected, and your browser looks for:

    http://your.webserver.com/fred/

as its URL, and all is well.

One day, you move fred to /some/where/else. You alter your Config file:

    Alias /fred/ /some/where/else

or, equally ill-advisedly:

    Alias /fred/ /some/where/else/

You put the trailing / on the aliases because you wanted to refer to a directory. But either will fail. Why?
End of preview. The rest of this section is not viewable here.
- Rewrite
- Preview: The preceding section described the Alias module and its allies. Everything these directives can do, and more, can be done instead by mod_rewrite.c, an extremely compendious module that is almost a complete software product in its own right. But for simple tasks Alias and friends are much easier to use.

The documentation is thorough, and the reader is referred to http://www.engelschall.com/pw/apache/rewriteguide/ for any serious work. You should also look at http://www.apache.org/docs/mod/mod_rewrite.html. This section is intended for orientation only.

Rewrite takes a rewriting pattern and applies it to the URL. If it matches, a rewriting substitution is applied to the URL. The patterns are regular expressions, familiar to us all in their simplest form: for example, mod.*\.c, which matches any module filename. The complete science of regular expressions is somewhat extensive, and the reader is referred to ... /src/regex/regex.7, a manpage that can be read with nroff -man regex.7 (on FreeBSD, at least). Regular expressions are also described in the POSIX specification and in Jeffrey Friedl's Mastering Regular Expressions (O'Reilly, 2002).

It might well be worth using Perl to practice with regular expressions before using them in earnest. To make complicated expressions work, it is almost essential to build them up from simple ones, testing each change as you go. Even the most expert find that convoluted regular expressions often do not work the first time.

The essence of regular expressions is that a number of special characters can be used to match parts of incoming URLs. The substitutions available in mod_rewrite can include mapping functions that take bits of the incoming URL and look them up in databases or even apply programs to them. The rules can be applied repetitively and recursively to the evolving URL. It is possible (as the documentation says) to create "rewriting loops, rewriting breaks, chained rules, pseudo if-then-else constructs, forced redirects, forced MIME-types, forced proxy module throughput." The functionality is so extensive that it is probably impossible to master it in the abstract. When and if you have a problem of this sort, it looks as if
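As a first orientation, a single rule might look like the sketch below. The URL prefixes /oldstuff and /newstuff are hypothetical; the example assumes mod_rewrite is compiled in and that the lines sit in the server config or a <VirtualHost> block, where the pattern is matched against the full URL path.

```apache
# Switch the rewriting engine on; it is off by default.
RewriteEngine on

# Pattern:      ^/oldstuff/(.*)$  matches any URL path under /oldstuff/
#               and captures the remainder in $1.
# Substitution: /newstuff/$1      reuses the captured remainder.
# Flag [R]:     send the client a redirect rather than rewriting internally.
RewriteRule ^/oldstuff/(.*)$ /newstuff/$1 [R]
```

A request for /oldstuff/catalog.html would be redirected to /newstuff/catalog.html; requests that do not match the pattern pass through untouched.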
End of preview. The rest of this section is not viewable here.
- Speling
- Preview: A useful module, mod_speling, has been added to the distribution. It corrects miscapitalizations, and many omitted, transposed, or mistyped characters in URLs corresponding to files or directories, by comparing the input with the filesystem. Note that it does not correct misspelled usernames.

The CheckSpelling directive turns spell checking on and off.

    CheckSpelling [on|off]
    Anywhere
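A minimal sketch of switching the check on for one part of a site, assuming mod_speling has been built into the server. The directory shown is borrowed from the site.alias example earlier and is only illustrative.

```apache
# Turn spelling correction on for the customers' documents only.
<Directory /usr/www/APACHE3/site.alias/htdocs/customers>
CheckSpelling on
</Directory>
```

Limiting the directive to one directory keeps the extra filesystem comparisons away from the rest of the site.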
End of preview. The rest of this section is not viewable here.
- Chapter 9: Proxying
- Preview: There are a few good reasons why you should not connect a busy web site straight to the Web:
- To get better performance by caching popular pages and distributing other requests among a number of servers.
- To improve security by giving the Bad Guys another stretch of defended ground to crawl over.
- To give local users, protected by a firewall, access to the great Web outside, as discussed in Chapter 11.
The answer is to use a proxy server, which can be either Apache itself or a specialized product like Squid.
End of preview. The rest of this section is not viewable here.
- Security
- Preview: An important concern on the Web is keeping the Bad Guys out of your network (see Chapter 11). One established technique is to keep the network hidden behind a firewall; this works well, but as soon as you do it, it also means that everyone on the same network suddenly finds that their view of the Net has disappeared (rather like people living near Miami Beach before and after the building boom). This becomes an urgent issue at Butterthlies, Inc., as competition heats up and naughty-minded Bad Guys keep trying to break our security and get in. We install a firewall and, anticipating the instant outcries from the marketing animals who need to get out on the Web and surf for prey, we also install a proxy server to get them out there.

So, in addition to the Apache that serves clients visiting our sites and is protected by the firewall, we need a copy of Apache to act as a proxy server to let us, in our turn, access other sites out on the Web. Without the proxy server, those inside are safe but blind.
End of preview. The rest of this section is not viewable here.
- Proxy Directives
- Preview: We are not concerned here with firewalls, so we take them for granted. The interesting thing is how we configure the proxy Apache to make life with a firewall tolerable to those behind it.

site.proxy has three subdirectories: cache, proxy, real. The Config file from ... /site.proxy/proxy is as follows:

    User webuser
    Group webgroup
    ServerName www.butterthlies.com
    Port 8000
    ProxyRequests on
    CacheRoot /usr/www/APACHE3/site.proxy/cache
    CacheSize 1000

The points to notice are as follows:
- On this site we use ServerName www.butterthlies.com.
- The Port number is set to 8000 so we don't collide with the real web server running on the same machine.
- We turn ProxyRequests on and provide a directory for the cache, which we will discuss later in this chapter.
- CacheRoot is set up in a special directory.
- CacheSize is set to 1000 kilobytes.
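Because ProxyRequests on turns this server into a proxy for anyone who can reach port 8000, it is usually worth limiting who may use it. The sketch below is an illustrative addition, not part of the site.proxy configuration: it uses the Apache 1.3 idiom of a <Directory proxy:*> block and assumes the local users live on the 192.168.123 network used in this book's examples.

```apache
# Apply access control to every proxied request (Apache 1.3 idiom).
<Directory proxy:*>
order deny,allow
deny from all
# Partial IP address: any host on the 192.168.123 network.
allow from 192.168.123
</Directory>
```

With order deny,allow the deny line is evaluated first and the allow line then lets the local network back in, so outsiders are refused while those behind the firewall can still get out.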
AllowCONNECT

    AllowCONNECT port [port] ...
    Default: AllowCONNECT 443 563
    Server config, virtual host
    Compatibility: AllowCONNECT is only available in Apache 1.3.2 and later.

The AllowCONNECT directive specifies a list of port numbers to which the proxy CONNECT method may connect. Today's browsers use this method when an https connection is requested and proxy tunneling over http is in effect.

By default, only the default https port (443) and the default snews port (563) are enabled. Use the AllowCONNECT directive to override this default and allow connections to the listed ports only.

ProxyRequests

    ProxyRequests [on|off]
    Default: off
    Server config

This directive turns proxy serving on. Even if ProxyRequests is off, ProxyPass directives are still honored.

ProxyRemote

    ProxyRemote match remote-server
    Server config

This directive defines remote proxies to this proxy (that is, proxies that should be used for some requests instead of being satisfied directly). match is either the name of a URL scheme that the remote server supports, a partial URL for which the remote server should be used, or
End of preview. The rest of this section is not viewable here.
- Apparent Bug
- Preview: When a server is set up as a proxy, then requests of the form:

    GET http://someone.else.com/ HTTP/1.0

are accepted and proxied to the appropriate web server. By default, Apache does not proxy, but it can appear that it is prepared to: requests like the previous one will be accepted and handled by the default configuration. Apache assumes that someone.else.com is a virtual host on the current machine. People occasionally think this is a bug, but it is, in fact, correct behavior. Note that pages served will be the same as those that would be served for any real unknown virtual host on the same machine, so this does not pose a security risk.
End of preview. The rest of this section is not viewable here.
- Performance
- Preview: The proxy server's performance can be improved by caching incoming pages so that the next time one is called for, it can be served straight up without having to waste time going over the Web. We can do the same thing for outgoing pages, particularly pages generated on the fly by CGI scripts and database accesses (bearing in mind that this can lead to stale content and is not invariably desirable).

Another reason for using a proxy server is to cache data from the Web to save the bandwidth of the world's clogged telephone systems and therefore to improve access time on our server. Note, however, that in practice it often saves bandwidth at the expense of increased access times.

The directive CacheRoot, cunningly inserted in the Config file shown earlier, and the provision of a properly permissioned cache directory allow us to show this happening. We start by providing the directory ... /site.proxy/cache, and Apache then improves on it with some sort of directory structure like ... /site.proxy/cache/d/o/j/gfqbZ@49rZiy6LOCw.

The file gfqbZ@49rZiy6LOCw contains the following:

    320994B6 32098D95 3209956C 00000000 0000001E
    X-URL: http://192.168.124.1/message
    HTTP/1.0 200 OK
    Date: Thu, 08 Aug 1996 07:18:14 GMT
    Server: Apache/1.1.1
    Content-length: 30
    Last-modified: Thu, 08 Aug 1996 06:47:49 GMT

    I am a web site far out there

Next time someone wants to access http://192.168.124.1/message, the proxy server does not have to lug bytes over the Web; it can just go and look it up.

There are a number of housekeeping directives that help with caching.

CacheRoot

    CacheRoot directory
    Default: none
    Server config, virtual host

This directive sets the directory to contain cache files; it must be writable by Apache.

CacheSize

    CacheSize size_in_kilobytes
    Default: 5
    Server config, virtual host

This directive sets the size of the cache area in kilobytes. More may be stored temporarily, but garbage collection reduces it to less than the set number.
End of preview. The rest of this section is not viewable here.
- Setup
- Preview: The cache directory for the proxy server has to be set up rather carefully with owner webuser and group webgroup, since it will be accessed by that insignificant person (see Chapter 2).

You now have to tell your browser that you are going to be accessing the Web via a proxy. For example, in Netscape click on Edit → Preferences → Advanced → Proxies tab → Manual Proxy Configuration. Click on View, and in the HTTP box enter the IP address of our proxy, which is on the same network, 192.168.123, as our copy of Netscape:

    192.168.123.4

Enter 8000 in the Port box.

For Microsoft Internet Explorer, select View → Options → Connection tab, check the Proxy Server checkbox, then click the Settings button, and set up the HTTP proxy as described previously. That is all there is to setting up a real proxy server.

You might want to set up a simulation to watch it in action, as we did, before you do the real thing. However, it is not that easy to simulate a proxy server on one desktop, and when we have simulated it, the elements play different roles from those they have supported in demonstrations so far. We end up with four elements:
- Netscape running on a Windows 95 machine. Normally this is a person out there on the Web trying to get at our sales site; now, it simulates a Butterthlies member trying to get out.
- An imaginary firewall.
- A copy of Apache (site: ... /site.proxy/proxy) running on the FreeBSD machine as a proxy server to the Butterthlies site.
- Another copy of Apache, also running on FreeBSD (site: ... /site.proxy/real) that simulates another web site "out there" that we are trying to access. We have to imagine that the illimitable wastes of the Web separate it from us.

The configuration in ... /site.proxy/proxy is as shown earlier. Since the proxy server is running on a machine notionally on the other side of the Web from the machine running ... /site.proxy/real, we need to put it on another port, traditionally 8000.

The configuration file in ... /proxy/real is:

    User webuser
    Group webgroup
    ServerName www.faraway.com
    Listen www.faraway.com:80
    DocumentRoot /usr/www/APACHE3/site.proxy/real/htdocs
End of preview. The rest of this section is not viewable here.
- Chapter 10: Logging
- Preview: A good maxim of war is "know your enemy," and the same advice applies to business. You need to know your customers or, on a web site, your visitors. Everything you can know about them is in the Environment variables (discussed in Chapter 16) that Apache gets from the incoming request. Apache's logging directives, which are explained in this chapter, extract whichever elements of this data you want and write them to log files.

However, this is often not very useful data in itself. For instance, you may well want to track the repeated visits of individual customers as revealed by their cookie trail. This means writing rather tricky CGI scripts to read in great slabs of log file, break them into huge, multilevel arrays, and search the arrays to track the data you want.
End of preview. The rest of this section is not viewable here.
- Logging by Script and Database
- Preview: If your site uses a database manager, you could sidestep this cumbersome procedure by writing scripts on the fly to log everything you want to know about your visitors, reading data about them from the environment variables, and recording their choices as they work through the site. Depending on your needs, it can be much easier to log the data directly than to mine it out of the log files. For instance, one of the authors (PL) has a medical encyclopedia web site (www.Medic-Planet.com). Simple Perl scripts write database records to keep track of the following:
- How often each article has been read
- How visitors got to it
- How often search engine spiders visit and who they are
- How often visitors click through the many links on the site and where they go

Having stored this useful information in the database manager, it is then not hard to write a script, reachable only by the site management over an SSL connection (see Chapter 11), that generates HTML reports with totals and statistics that illuminate marketing problems.
End of preview. The rest of this section is not viewable here.
- Apache's Logging Facilities
- Preview: Apache offers a wide range of options for controlling the format of the log files. In line with current thinking, older methods (RefererLog, AgentLog, and CookieLog) have now been replaced by the config_log_module. To illustrate this, we have taken ... /site.authent and copied it to ... /site.logging so that we can play with the logs:

    User webuser
    Group webgroup
    ServerName www.butterthlies.com
    IdentityCheck on
    NameVirtualHost 192.168.123.2

    <VirtualHost www.butterthlies.com>
    LogFormat "customers: host %h, logname %l, user %u, time %t, request %r, status %s,bytes %b,"
    CookieLog logs/cookies
    ServerAdmin sales@butterthlies.com
    DocumentRoot /usr/www/APACHE3/site.logging/htdocs/customers
    ServerName www.butterthlies.com
    ErrorLog /usr/www/APACHE3/site.logging/logs/customers/error_log
    TransferLog /usr/www/APACHE3/site.logging/logs/customers/access_log
    ScriptAlias /cgi_bin /usr/www/APACHE3/cgi_bin
    </VirtualHost>

    <VirtualHost sales.butterthlies.com>
    LogFormat "sales: agent %{httpd_user_agent}i, cookie: %{http_Cookie}i, referer: %{Referer}o, host %!200h, logname %!200l, user %u, time %t, request %r, status %s,bytes %b,"
    CookieLog logs/cookies
    ServerAdmin sales_mgr@butterthlies.com
    DocumentRoot /usr/www/APACHE3/site.logging/htdocs/salesmen
    ServerName sales.butterthlies.com
    ErrorLog /usr/www/APACHE3/site.logging/logs/salesmen/error_log
    TransferLog /usr/www/APACHE3/site.logging/logs/salesmen/access_log
    ScriptAlias /cgi_bin /usr/www/APACHE3/cgi_bin
    <Directory /usr/www/APACHE3/site.logging/htdocs/salesmen>
    AuthType Basic
    AuthName darkness
    AuthUserFile /usr/www/APACHE3/ok_users/sales
    AuthGroupFile /usr/www/APACHE3/ok_users/groups
    require valid-user
    </Directory>
    <Directory /usr/www/APACHE3/cgi_bin>
    AuthType Basic
    AuthName darkness
    AuthUserFile /usr/www/APACHE3/ok_users/sales
    AuthGroupFile /usr/www/APACHE3/ok_users/groups
    #AuthDBMUserFile /usr/www/APACHE3/ok_dbm/sales
    #AuthDBMGroupFile /usr/www/APACHE3/ok_dbm/groups
    require valid-user
    </Directory>
    </VirtualHost>

There are a number of directives.

ErrorLog

    ErrorLog filename
End of preview. The rest of this section is not viewable here.
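For comparison with the custom formats above, here is a sketch of the conventional arrangement: a named format plus CustomLog, both provided by mod_log_config. The nickname "combined" and the log path are illustrative choices, and the request headers are written out under their real names (Referer and User-Agent) with the i suffix that selects incoming headers.

```apache
# Define a named format: Common Log Format plus referer and browser.
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined

# Write one line per request to the given file using that format.
CustomLog /usr/www/APACHE3/site.logging/logs/access_log combined
```

Naming the format keeps the long format string in one place, so several CustomLog lines (one per virtual host, say) can share it.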
- Configuration Logging
- Preview: Apache is able to report to a client a great deal of what is happening to it internally. The necessary module is contained in the mod_info.c file, which should be included at build time. It provides a comprehensive overview of the server configuration, including all installed modules and directives in the configuration files. This module is not compiled into the server by default. To enable it, either load the corresponding module if you are running Win32 or Unix with DSO support enabled, or add the following line to the server build Config file and rebuild the server:
AddModule modules/standard/mod_info.o
It should also be noted that if mod_info is compiled into the server, its handler capability is available in all configuration files, including per-directory files (e.g., .htaccess). This may have security-related ramifications for your site.

To demonstrate how this facility can be applied to any site, the Config file on .../site.info is the .../site.authent file slightly modified:

    User webuser
    Group webgroup
    ServerName www.butterthlies.com
    NameVirtualHost 192.168.123.2
    LogLevel debug

    <VirtualHost www.butterthlies.com>
    #CookieLog logs/cookies
    AddModuleInfo mod_setenvif.c "This is what I've added to mod_setenvif"
    ServerAdmin sales@butterthlies.com
    DocumentRoot /usr/www/APACHE3/site.info/htdocs/customers
    ServerName www.butterthlies.com
    ErrorLog /usr/www/APACHE3/site.info/logs/error_log
    TransferLog /usr/www/APACHE3/site.info/logs/customers/access_log
    ScriptAlias /cgi-bin /usr/www/APACHE3/cgi-bin
    <Location /server-info>
    SetHandler server-info
    </Location>
    </VirtualHost>

    <VirtualHost sales.butterthlies.com>
    CookieLog logs/cookies
    ServerAdmin sales_mgr@butterthlies.com
    DocumentRoot /usr/www/APACHE3/site.info/htdocs/salesmen
    ServerName sales.butterthlies.com
    ErrorLog /usr/www/APACHE3/site.info/logs/error_log
    TransferLog /usr/www/APACHE3/site.info/logs/salesmen/access_log
    ScriptAlias /cgi-bin /usr/www/APACHE3/cgi-bin
    <Directory /usr/www/APACHE3/site.info/htdocs/salesmen>
    AuthType Basic
    #AuthType Digest
    AuthName darkness
    AuthUserFile /usr/www/APACHE3/ok_users/sales
    AuthGroupFile /usr/www/APACHE3/ok_users/groups
    #AuthDBMUserFile /usr/www/APACHE3/ok_dbm/sales
    #AuthDBMGroupFile /usr/www/APACHE3/ok_dbm/groups
    #AuthDigestFile /usr/www/APACHE3/ok_digest/sales
    require valid-user
    satisfy any
    order deny,allow
    allow from 192.168.123.1
    deny from all
    #require user daphne bill
    #require group cleaners
    #require group directors
    </Directory>
    <Directory /usr/www/APACHE3/cgi-bin>
    AuthType Basic
    AuthName darkness
    AuthUserFile /usr/www/APACHE3/ok_users/sales
    AuthGroupFile /usr/www/APACHE3/ok_users/groups
    #AuthDBMUserFile /usr/www/APACHE3/ok_dbm/sales
    #AuthDBMGroupFile /usr/www/APACHE3/ok_dbm/groups
    require valid-user
    </Directory>
    </VirtualHost>
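Since the server-info page reveals the whole configuration, one precaution is to limit who can fetch it. The following sketch is not part of the site.info Config file; it borrows the access-control directives already used in that file for the salesmen directory and assumes 192.168.123.1 is the only client that should see the page.

```apache
<Location /server-info>
SetHandler server-info
# Deny directives are evaluated first, then allow: only the listed
# address gets through.
order deny,allow
deny from all
allow from 192.168.123.1
</Location>
```

A partial address such as 192.168.123 could be used instead to admit the whole local network.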
End of preview. The rest of this section is not viewable here.
- Status
- Preview: In a similar way, Apache can be persuaded to cough up comprehensive diagnostic information by including and invoking the module mod_status:

    AddModule modules/standard/mod_status.o

This produces invaluable information for the webmaster of a busy site, enabling her to track down problems before they become disasters. However, since this is really our own business, we don't want the unwashed mob out on the Web jostling to see our secrets. To protect the information, we therefore restrict it to a whole or partial IP address that describes our own network and no one else's.

For this exercise, which includes info as previously, the httpd.conf file in ... /site.status should look like this:

    User webuser
    Group webgroup
    ServerName www.butterthlies.com
    DocumentRoot /usr/www/APACHE3/site.status/htdocs
    ExtendedStatus on

    <Location /status>
    order deny,allow
    allow from 192.168.123.1
    deny from all
    SetHandler server-status
    </Location>

    <Location /info>
    order deny,allow
    allow from 192.168.123.1
    deny from all
    SetHandler server-info
    </Location>

The allow from directive keeps our laundry private. Remember the way order works: the last entry has the last word. Notice also the use of SetHandler, which sets a handler for all requests to a directory, instead of AddHandler, which specifies a handler for particular file extensions. If you then access www.butterthlies.com/status, you get this response:

    Apache Server Status for www.butterthlies.com
    Server Version: Apache/1.3.14 (Unix)
    Server Built: Feb 13 2001 15:20:23
    Current Time: Tuesday, 13-Feb-2001 16:03:30 GMT
    Restart Time: Tuesday, 13-Feb-2001 16:01:49 GMT
    Parent Server Generation: 0
    Server uptime: 1 minute 41 seconds
    Total accesses: 21 - Total Traffic: 49 kB
    CPU Usage: u.0703125 s.015625 cu0 cs0 - .0851% CPU load
    .208 requests/sec - 496 B/second - 2389 B/request
    1 requests currently being processed, 5 idle servers

    _W____..........................................................
    ................................................................
    ................................................................
    ................................................................

    Scoreboard Key:
    "_" Waiting for Connection, "S" Starting up, "R" Reading Request,
    "W" Sending Reply, "K" Keepalive (read), "D" DNS Lookup,
    "L" Logging, "G" Gracefully finishing, "." Open slot with no current process

    Srv PID  Acc      M CPU  SS Req Conn Child Slot Client        VHost                Request
    0-0 2434 0/1/1    _ 0.01 93 5   0.0  0.00  0.00 192.168.123.1 www.butterthlies.com GET /status HTTP/1.1
    1-0 2435 20/20/20 W 0.08 1  0   47.1 0.05  0.05 192.168.123.1 www.butterthlies.com GET /status?refresh=2 HTTP/1.1

    Srv    Child Server number - generation
    PID    OS process ID
    Acc    Number of accesses this connection / this child / this slot
    M      Mode of operation
    CPU    CPU usage, number of seconds
    SS     Seconds since beginning of most recent request
    Req    Milliseconds required to process most recent request
    Conn   Kilobytes transferred this connection
    Child  Megabytes transferred this child
    Slot   Total megabytes transferred this slot
End of preview. The rest of this section is not viewable here.
- Chapter 11: Security
- Preview: The operation of a web server raises several security issues. Here we look at them in general terms; later on, we will discuss the necessary code in detail.

We are no more anxious to have unauthorized people in our computer than to have unauthorized people in our house. In the ordinary way, a desktop PC is pretty secure. An intruder would have to get physically into your house or office to get at the information in it or to damage it. However, once you connect to a public telephone network through a modem, cable modem, or wireless network, it's as if you moved your house to a street with 50 million close neighbors (not all of them desirable), tore your front door off its hinges, and went out leaving the lights on and your children in bed.

A complete discussion of computer security would fill a library. However, the meat of the business is as follows. We want to make it impossible for strangers to copy, alter, or erase any of our data. We want to prevent strangers from running any unapproved programs on our machine. Just as important, we want to prevent our friends and legitimate users from making silly mistakes that may have consequences as serious as deliberate vandalism. For instance, they can execute the command:

    rm -f -r *

and delete all their own files and subdirectories, but they won't be able to execute this dramatic action in anyone else's area. One hopes no one would be as silly as that, but subtler mistakes can be as damaging.

As far as the system designer is concerned, there is not a lot of difference between villainy and willful ignorance. Both must be guarded against.

We look at basic security as it applies to a system with a number of terminals that might range from 2 to 10,000, and then we see how it can be applied to a web server. We assume that a serious operating system such as Unix is running.

We do not include Win32 in this chapter, even though Apache now runs on it, because it is our opinion that if you care about security you should not be using Win32. That is not to say that Win32 has no security, but it is poorly documented, understood by very few people, and constantly undermined by bugs and dubious practices (such as advocating ActiveX downloads from the Web).
End of preview. The rest of this section is not viewable here.
- Internal and External Users
- Preview: As we have said, most serious operating systems, including Unix, provide security by limiting the ability of each user to perform certain operations. The exact details are unimportant, but when we apply this principle to a web server, we clearly have to decide who the users of the web server are with respect to the security of our network sheltering behind it. When considering a web server's security, we must recognize that there are essentially two kinds of users: internal and external.

The internal users are those within the organization that owns the server (or, at least, the users the owners wish to update server content); the external ones inhabit the rest of the Internet. Of course, there are many levels of granularity below this one, but here we are trying to capture the difference between users who are supposed to use the HTTP server only to browse pages (the external users) and users who may be permitted greater access to the web server (the internal users).

We need to consider security for both of these groups, but the external users are more worrisome and have to be more strictly controlled. It is not that the internal users are necessarily nicer people or less likely to get up to mischief. In some ways, they are more likely to create trouble, having motive and knowledge, but, to put it bluntly, we know (mostly) who signs their paychecks and where they live. The external users are usually beyond our vengeance.

In essence, by connecting to the Internet, we allow anyone in the world to become an external user and type anything they like on our server's keyboard. This is an alarming thought: we want to allow them to do a very small range of safe things and to make sure that they cannot do anything outside that range. This desire has a couple of implications:
- External users should be able to access only those files and programs we have specified, and no others.
- The server should not be vulnerable to sneaky attacks, like asking for a page with a 1 MB name (the Bad Guy hopes that a name that long might overflow a fixed-length buffer and trash the stack) or with funny characters (like !, #, or /) included in the page name that might cause part of it to be construed as a command by the server's operating system, and so on. These scenarios can be avoided only by careful programming. Apache's approach to the first problem is to avoid using fixed-size buffers for anything but fixed-size data; it sounds simple, but really it costs a lot of painstaking work. The other problems are dealt with case by case, sometimes after a security breach has been identified, but most often just by careful thought on the part of Apache's coders.
End of preview. The rest of this section cannot be viewed here.
- Binary Signatures, Virtual Cash
- Preview: In the long term, we imagine that one of the most important uses of cryptography will be providing virtual money or binary cash; from another point of view, this could mean making digital signatures, and therefore electronic checks, possible.
At first sight, this seems impossible. The authority to issue documents such as checks is proved by a signature. Simple as it is, and apparently open to fraud, the system does actually work on paper. We might transfer it literally to the Web by scanning an image of a person's signature and sending that to validate her documents. However, whatever security was locked to the paper signature has now evaporated. A forger simply has to copy the bit pattern that makes up the image, store it, and attach it to any of his purchases to start free shopping.
The way to write a digital signature is to perform some action on data provided by the other party that only you could have performed, thereby proving you are who you say. We will now look at what this action might be.
The ideas of public key (PK) encryption are pretty well known by now, so we will just skim over the salient points. You have two keys: one (your public key) that encrypts messages and one (your private key) that decrypts messages encrypted with your public key (and vice versa). Unlike conventional encryption and decryption, you can encrypt with either your private or public key and decrypt with the other.
You give the public key to anyone who asks and keep your private key secret. Because the keys for encryption and decryption are not the same, the system is also called asymmetric key encryption.
So the "action" mentioned earlier, to prove you are who you say you are, would be to encrypt some piece of text using your private decryption key. Anyone can then decrypt it using your public key. If it decrypts to meaningful text, it came from you, otherwise not.
For instance, let's apply the technology to a simple matter of the heart. You subscribe to a lonely hearts newsgroup where people describe their attractions and their willingness to engage with persons of complementary romantic desires. The person you fancy publishes his or her public key at the bottom of the message describing his or her attractions. You reply:
End of preview. The rest of this section cannot be viewed here.
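The "encrypt with your private key, let anyone decrypt with your public key" idea described in this preview can be illustrated with textbook RSA on tiny numbers. This is only a sketch of the arithmetic: the primes (61 and 53), the exponent 17, and the function names `sign` and `verify` are assumptions chosen for illustration. Real systems use keys thousands of bits long, padding schemes, and a vetted cryptographic library.

```python
# Toy RSA signature: illustrative only, far too small to be secure.
P, Q = 61, 53                # hypothetical tiny primes
N = P * Q                    # public modulus
PHI = (P - 1) * (Q - 1)
E = 17                       # public exponent
D = pow(E, -1, PHI)          # private exponent (modular inverse, Python 3.8+)


def sign(message: int) -> int:
    """'Encrypt' a number smaller than N with the private key (D, N)."""
    return pow(message, D, N)


def verify(message: int, signature: int) -> bool:
    """Anyone holding the public key (E, N) can check the signature."""
    return pow(signature, E, N) == message


if __name__ == "__main__":
    challenge = 1234                          # data supplied by the other party
    sig = sign(challenge)
    print(verify(challenge, sig))             # genuine signature
    print(verify(challenge, (sig + 1) % N))   # tampered signature
```

Only the holder of `D` can produce a value that the public key turns back into the challenge, which is the property the text relies on.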
- Certificates
- Preview: "No man is an island," John Donne reminds us. We do not practice cryptography on our own: there would be little point. Even in the simple situation of the spy and his spymaster, it is important to be sure you are actually talking to the correct person. Many counter-intelligence operations depend on capturing the spy and replacing him at the encrypting station with one of their own people to feed the enemy with twaddle. This can be annoying and dangerous for the spymaster, so he often teaches his spies little tricks that he hopes the captors will overlook and so betray themselves.
In the larger cryptographic world of the Web, the problem is as acute. When we order a pack of cards from www.butterthlies.com, we want to be sure the company accepting our money really is that celebrated card publisher and not some interloper; similarly, Butterthlies, Inc., wants to be sure that we are who we say we are and that we have some sort of credit account that will pay for their splendid offerings. The problems are solved to some extent by the idea of a certificate. A certificate is an electronic document signed (i.e., having a secure hash of it encrypted using a private key, which can therefore be checked with the public key) by some respectable person or company called a certification authority (CA). It contains the holder's public key plus information about her: name, email address, company, and so on (see Chapter 11, later in this chapter). You get this document by filling in a certificate request form issued by some CA; after you have crossed their palm with silver and they have applied whatever level of verification they deem appropriate (which may be no more than telephoning the number you have given them to see if "you" answer the phone), they send you back the data file.
In the future, the certification authority itself may hold a certificate from some higher-up CA, and so on, back to a CA that is so august and immensely respectable that it can sign its own certificate. (In the absence of a corporeal deity, some human has to do this.) This certificate is known as a
End of preview. The rest of this section cannot be viewed here.
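The chain of certification authorities described in this preview can be pictured as a walk from a certificate to its issuer, repeated until a self-signed certificate is reached. The sketch below models only the shape of that chain: the `Certificate` class, the `chain_to_root` function, and the example CA names are hypothetical, and no signature is actually checked.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Certificate:
    holder: str   # who the certificate describes
    issuer: str   # the CA that signed it


def chain_to_root(holder: str, store: Dict[str, Certificate]) -> List[str]:
    """Follow issuer links until a self-signed certificate is found."""
    chain: List[str] = []
    current = holder
    while True:
        if current in chain:
            raise ValueError("certificate loop detected")
        cert = store.get(current)
        if cert is None:
            raise KeyError(f"no certificate held for {current!r}")
        chain.append(current)
        if cert.issuer == cert.holder:   # self-signed: top of the chain
            return chain
        current = cert.issuer


# Hypothetical store: a site certificate, an intermediate CA, and a root CA.
STORE = {
    "www.butterthlies.com": Certificate("www.butterthlies.com", "Example CA"),
    "Example CA": Certificate("Example CA", "Example Root CA"),
    "Example Root CA": Certificate("Example Root CA", "Example Root CA"),
}

if __name__ == "__main__":
    print(chain_to_root("www.butterthlies.com", STORE))
```

A real verifier would also check each signature with the issuer's public key and confirm that the root is one it already trusts.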
- Firewalls
- Preview: It is well known that the Web is populated by mean and unscrupulous people who want to mess up your site. Many conservative citizens think that a firewall is the way to stop them. The purpose of a firewall is to prevent the Internet from connecting to arbitrary machines or services on your own LAN/WAN. Another purpose, depending on your environment, may be to stop users on your LAN from roaming freely around the Internet.
The term firewall does not mean anything standard. There are lots of ways to achieve the objectives just stated. Two extremes are presented in this section, and there are lots of possibilities in between. This is a big subject: here we are only trying to alert the webmaster to the problems that exist and to sketch some of the ways to solve them. For more information on this subject, see Building Internet Firewalls, by D. Brent Chapman and Elizabeth D. Zwicky (O'Reilly, 2000).
Packet filtering is the simplest firewall technique. In essence, you restrict packets that come in from the Internet to safe ports. Packet-filter firewalls are usually implemented using the filtering built into your Internet router. This means that no access is given to ports below 1024 except for certain specified ones connecting to safe services, such as SMTP, NNTP, DNS, FTP, and HTTP. The benefit is that access is denied to potentially dangerous services, such as the following:
- finger
- Gives a list of logged-in users, and in the process tells the Bad Guys half of what they need to log in themselves.
- exec
- Allows the Bad Guy to run programs remotely.
- TFTP
- An almost completely security-free file-transfer protocol. The possibilities are horrendous!
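The packet-filter rule described above (block ports below 1024 except for a few named services) can be sketched as a simple decision function. This models only the decision; a real filter is configured in the router's own rule language. The table of allowed ports and the name `allow_inbound` are assumptions for illustration, using the standard well-known port numbers for the services the text names.

```python
# Well-known port numbers for the "safe" services named in the text.
SAFE_LOW_PORTS = {
    21: "FTP",
    25: "SMTP",
    53: "DNS",
    80: "HTTP",
    119: "NNTP",
}


def allow_inbound(port: int) -> bool:
    """Return True if an inbound packet to this port passes the filter."""
    if not 0 < port < 65536:
        return False              # not a valid TCP/UDP port number
    if port >= 1024:
        return True               # the rule only restricts ports below 1024
    return port in SAFE_LOW_PORTS


if __name__ == "__main__":
    # finger (79), exec (512), and TFTP (69) are refused; HTTP (80) passes.
    for port in (79, 512, 69, 80):
        print(port, allow_inbound(port))
```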
The advantages of packet filtering are that it's quick and easy. But there are at least two disadvantages:
- Even the standard services can have bugs allowing access. Once a single machine is breached, the whole of your network is wide open. The horribly complex program
End of preview. The rest of this section cannot be viewed here.
- Legal Issues
- Preview: In earlier editions of this book, legal issues to do with security filled a good deal of space. Happily, things are now a great deal simpler. The U.S. Government has dropped its unenforceable objections to strong cryptography. The French Government, which had outlawed cryptography of any sort in France, has now adopted a more practical stance and tolerates it. Most other countries in the world seem to have no strong opinions except for the British Government, which has introduced a law making it an offence not to decrypt a message when ordered to by a Judge and making ISPs responsible for providing "back-door" access to their clients' communications. Dire results are predicted from this Act, but at the time of writing nothing of interest had happened.
One difficulty with trying to criminalize the use of encrypted files is that they cannot be positively identified. An encrypted message may be hidden in an obvious nonsense file, but it may also be hidden in unimportant bits in a picture or a piece of music or something like that. (This is called steganography.) Conversely, a nonsense file may be an encrypted message, but it may also be a corrupt ordinary file or a proprietary data file whose format is not published. There seems to be no reliable way of distinguishing between the possibilities except by producing a decode. And the only person who can do that is the "criminal," who is not likely to put himself in jeopardy.
On the patent front things have also improved. The RSA patent (which, because it concerned software, was only valid in the U.S.) divided the world into two incompatible blocks. However, it expired in the year 2000, and so removed another legal hurdle to the easy exchange of cryptographic methods.
End of preview. The rest of this section cannot be viewed here.
- Secure Sockets Layer (SSL)
- Preview: Apache 1.3 has never had SSL shipped with the standard source, which is mostly a legacy of U.S. export laws. The Apache Software Foundation decided, while 2.0 was being written, to incorporate SSL in the future, and so 2.0 now has SSL built in out-of-the-box. Unfortunately, our preferred solution for Apache 1.3, Apache-SSL, is rather different from Apache 2.0's native solution, mod_ssl, so we have a section for each.
End of preview. The rest of this section cannot be viewed here.
- Apache's Security Precautions
- Preview: Apache addresses these problems as follows:
- When Apache starts, it connects to the network and creates numerous copies of itself. These copies immediately shift identity to that of a safer user, in the case of our examples, the feeble webusers of webgroup (see Chapter 2). Only the original process retains the superuser identity, but only the new processes service network requests. The original process never handles the network; it simply oversees the operation of the child processes, starting new ones as needed and killing off excess ones as network load decreases.
- Output to shells is carefully tested for dangerous characters, but this only half solves the problem. The writers of CGI scripts (see Chapter 13) must be careful to avoid the pitfalls too.
For example, consider the simple shell script:
    #!/bin/sh
    cat /somedir/$1
You can imagine using something like this to show the user a file related to an item she picked off a menu, for example. Unfortunately, it has a number of faults. The most obvious one is that causing $1 to be "../etc/passwd" will result in the server displaying /etc/passwd! Suppose you fix that (which experience has shown to be nontrivial in itself); then there's another problem lurking: if $1 is "xx /etc/passwd", then /somedir/xx and /etc/passwd would both be displayed. As you can see, both care and imagination are required to be completely secure. Unfortunately, there is no hard-and-fast formula, though generally speaking, confirming that script inputs contain only the desired characters (we advise sticking strictly to alphanumeric) is a very good starting point.
Internal users present their own problems. The main one is that they want to write CGI scripts to go with their pages. In a typical installation, the client, dressed as Apache (webuser of webgroup), does not have high enough permissions to run those scripts in any useful way. This can be solved with suEXEC (see Section 16.6).
The object of what follows is to make a version of Apache 1.3.X that handles the HTTPS (HTTP over SSL) protocol. Currently, this is only available in Unix versions, and given the many concerns that exist over the security of Win32, there seems little point in trying to implement SSL in the Win32 version of Apache.
End of preview. The rest of this section cannot be viewed here.
- SSL Directives
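Returning to the `cat /somedir/$1` script discussed under Apache's Security Precautions: the advice to accept only alphanumeric script inputs can be sketched as a whitelist check made before any path is built. This is a minimal illustration, not a complete defense. The names `BASE_DIR`, `SAFE_NAME`, and `safe_path` are hypothetical, and `/somedir` is taken from the shell example.

```python
import re
from pathlib import PurePosixPath

BASE_DIR = PurePosixPath("/somedir")       # directory from the shell example
SAFE_NAME = re.compile(r"[A-Za-z0-9]+")    # strictly alphanumeric


def safe_path(name: str) -> PurePosixPath:
    """Return the path to serve, or raise ValueError for suspect input."""
    # fullmatch rejects anything with dots, slashes, spaces, or newlines.
    if not SAFE_NAME.fullmatch(name):
        raise ValueError(f"rejected input: {name!r}")
    return BASE_DIR / name


if __name__ == "__main__":
    print(safe_path("xx"))
    for bad in ("../etc/passwd", "xx /etc/passwd"):
        try:
            safe_path(bad)
        except ValueError as err:
            print(err)
```

Whitelisting the characters you expect tends to be more robust than trying to list every dangerous one, which is the point the text makes about care and imagination.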
- Preview: Apache-SSL's directives for Apache v1.3 follow, with the new ones introduced by v2 after that. Then there is a small section at the end of the chapter concerning cipher suites.
SSLDisable
SSLDisable
Server config, virtual host
Not available in Apache v2
This directive disables SSL. This directive is useful if you wish to run both secure and nonsecure hosts on the same server. Conversely, SSL can be enabled with SSLEnable. We suggest that you use this directive at the start of the file before virtual hosting is specified.
SSLEnable
SSLEnable
Server config, virtual host
Not available in Apache v2
This directive enables SSL. SSL is enabled by default, but if you've used SSLDisable in the main server, you can enable SSL again for virtual hosts using this directive.
SSLRequireSSL
SSLRequireSSL
Server config, .htaccess, virtual host, directory
Apache v1.3, v2
This directive requires SSL. This can be used in <Directory> sections (and elsewhere) to protect against inadvertently disabling SSL. If SSL is not in use when this directive applies, access will be refused. This is a useful belt-and-suspenders measure for critical information.
SSLDenySSL
SSLDenySSL
Server config, .htaccess, virtual host, directory
Not available in Apache v2
The obverse of SSLRequireSSL, this directive denies access if SSL is active. You might want to do this to maintain the server's performance. In a complicated Config file, a section might inadvertently have SSL enabled and would slow things down: this directive would solve the problem, in a crude way.
SSLCacheServerPath
SSLCacheServerPath filename
Server config
Not available in Apache v2
This directive specifies the path to the global cache server, gcache. It can be absolute or relative to the server root.
SSLCacheServerRunDir
SSLCacheServerRunDir directory
Server config
Not available in Apache v2
This directive sets the directory in which gcache runs, so that it can produce core dumps during debugging.
End of preview. The rest of this section cannot be viewed here.
- Cipher Suites
- Preview: The SSL protocol does not restrict clients and servers to a single encryption brew for the secure exchange of information. There are a number of possible cryptographic ingredients, but as in any cookpot, some ingredients go better together than others. The seriously interested can refer to Bruce Schneier's Applied Cryptography (John Wiley & Sons, 1995), in conjunction with the SSL specification (from http://www.netscape.com/). The list of cipher suites is in the OpenSSL software at ... /ssl/ssl.h. The macro names give a better idea of what is meant than the text strings.
SSLRequiredCiphers
SSLRequiredCiphers cipher-list
Server config, virtual host
Not available in Apache v2
This directive specifies a colon-separated list of cipher suites, used by OpenSSL to limit what the client end can do. Possible suites are listed in Table 11-3. This is a per-server option. For example:
    SSLRequiredCiphers RC4-MD5:RC4-SHA:IDEA-CBC-MD5:DES-CBC3-SHA
Table 11-3: Cipher suites for Apache v1.3
OpenSSL name | Config name | Keysize | Encrypted-Keysize
SSL3_TXT_RSA_IDEA_128_SHA | IDEA-CBC-SHA | 128 | 128
SSL3_TXT_RSA_NULL_MD5 | NULL-MD5 | 0 | 0
SSL3_TXT_RSA_NULL_SHA
End of preview. The rest of this section cannot be viewed here.
- Security in Real Life
- Preview: The problems of security are complex and severe enough that those who know about it reasonably say that people who do not understand it should not mess with it. This is the position of one of us (BL). The other (PL) sees things more from the point of view of the ordinary webmaster who wants to get his wares before the public. Security of the web site is merely one of many problems that have to be solved.
It is rather as if you had to take a PhD in combustion technology before you could safely buy and operate a motor car. The motor industry was like that around 1900; it has moved on since then.
In earlier editions we rather cravenly ducked the practical questions, referring the reader to other authorities. However, we feel now that things have settled down enough that a section on what the professionals call "cookbook security" would be helpful. We would not suggest that you read this and then set up an online bank. However, if your security concerns are simply to keep casual hackers and possible business rivals out of the back room, then this may well be good enough.
Most of us need a good lock on the front door, and over the years we have learned how to choose and fit such a lock. Sadly this level of awareness has not yet developed on the Web. In this section we deal with a good, ordinary door lock; the reactive letter box is left to a later stage.
The first problem in security is to know with whom you are dealing. The client's concerns about the site's identity ("Am I sending my money to the real MegaBank or a crew of clowns in Bogota?") should be settled by a server certificate as described earlier.
You, as the webmaster, may well want to be sure that the person who logs on as one of your valued clients really is that person and not a cunning clown.
Without any extra effort, SSL encrypts both your data and your Basic Authentication passwords (see Chapter 5) as they travel over the Web. This is a big step forward in security. Bad Guys trying to snoop on our traffic should be somewhat discouraged. But we rely on a password to prove that it isn't a Bad Guy at the client end. We can improve on that with Client Certificates.
End of preview. The rest of this section cannot be viewed here.
- Future Directions
- Preview: One of the fundamental problems with computer and network security is that we are trying to bolt it onto systems that were not really designed for the purpose. Although Unix doesn't do a bad job, a vastly better one is clearly possible. We thought we'd mention a few things that we think might improve matters in the future.
The first one we should mention is the NSA's Security Enhanced Linux. This is a version of Linux that allows very fine-grained access control to various resources, including files, interprocess communication and so forth. One of its attractions is that you don't have to change your way of working completely to improve your security. Find out more at http://www.nsa.gov/selinux/.
EROS is the Extremely Reliable Operating System. It uses things called capabilities (not to be confused with POSIX capabilities, which are something else entirely) to give even more fine-grained control over absolutely everything. We think that EROS is a very promising system that may one day be used widely for high-assurance systems. At the moment, unfortunately, it is still very much experimental, though we expect to use it seriously soon. The downside of capability systems is that they require you to think rather differently about your programming, though not so differently that we believe it is a serious barrier. A bigger barrier is that it is almost impossible to port existing code to exploit EROS' capabilities properly, but even so, using them in conjunction with existing code is likely to prove of considerable benefit. Read more at http://www.eros-os.org/.
E is a rather fascinating beast. It is essentially a language designed to allow you to use capabilities in an intuitive way and also to make them work in a distributed system. It has many remarkable properties, but probably the best way to find out about it is to read "E in a Walnut", which can be found, along with E, at http://www.erights.org/.
End of preview. The rest of this section cannot be viewed here.
- Chapter 12: Running a Big Web Site
- Preview: In this chapter we try to bring together the major issues that should concern the webmaster in charge of a big site. Of course, the bigger the site, the more diverse the issues that have to be thought about, so we do not at all claim to cover every possible problem. What follows is a bare minimum, most of which just refers to topics that have already been covered elsewhere in this book.
End of preview. The rest of this section cannot be viewed here.
- Machine Setup
- Preview: Each machine should be set up with the following:
- The current, stable versions of the operating system and all the supporting software, such as Apache, database manager, scripting language, etc. It is obviously essential that all machines on the site should be running the same versions of all these products.
- Currently working TCP/IP layer with all up-to-date patches.
- The correct time: since elements of the HTTP protocol use the time of day, it is worth using Unix's xntpd (http://www.eecis.udel.edu/~ntp/), Win32's ntpdate (http://www.eecis.udel.edu/~ntp/ntp_spool/html/ntpdate.html), or Tardis (http://www.kaska.demon.co.uk) to make sure your machines keep accurate time.
End of preview. The rest of this section cannot be viewed here.
- Server Security
- Preview: There are many changing aspects to securing a server, but the following points should get you started. All of these need to be checked regularly and by someone other than the normal sys admin. Two sets of eyes find more problems, and an independent and knowledgeable review ensures trust.
The root password on your server is the linchpin of your security. Do not let people write it on the wall over their monitors or otherwise expose it.
File security is a fundamental aspect of web server security. These are rules to follow for file positions and ownership:
- Files should not be owned by the user(s) that services (http, ftpd, sendmail...) run as — each service should have its own user. Ideally, ownership of files and services should be as finely divided as possible — for instance, the user that the Apache daemon runs as should probably be different from the user that owns its configuration files — this prevents the server from changing its own configuration even if someone does manage to subvert it. Each service should also have its own user, to increase the difficulty of attacks that use multiple servers. (With different users, it is likely that files dropped off using one server can't be accessed from another, for example). Qmail, a secure mail server, for instance, uses no less than six different users for different parts of its service, and its configuration files are owned by yet another user, usually root.
- Services shouldn't share file trees.
- Don't put executable files in the web tree — that is, on or below Apache's DocumentRoot.
- Don't put service control files in the web tree or ftp tree or anywhere else that can be accessed remotely.
- Ideally, run each service on a different machine.
These are rules to follow for file permissions:
- If files are owned by someone else, you have to grant read permissions to the group that includes the relevant service. Similarly, you have to grant execute permissions to compiled binaries. Compiled binaries don't need read permissions, but shell scripts do. Always try to grant the most restrictive permissions possible, so don't grant write permission to the server for configuration files, for instance.
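The permission rule above can be checked mechanically. The sketch below assumes the arrangement the text describes: the file is owned by some other user and the service reads it through group (or world) permissions, so a group or world write bit means the server could modify the file. The function name `server_can_write` is hypothetical.

```python
import stat


def server_can_write(mode: int) -> bool:
    """True if group or others may write to a file with this mode.

    Assumes the service is not the file's owner and reaches the file
    only through group or world permissions.
    """
    return bool(stat.S_IMODE(mode) & (stat.S_IWGRP | stat.S_IWOTH))


if __name__ == "__main__":
    # 0o640: owner read/write, group read only (the restrictive choice).
    # 0o664: group may also write, which the rule advises against.
    for mode in (0o640, 0o664):
        print(oct(mode), server_can_write(mode))
```

In practice the mode would come from something like `os.stat(path).st_mode` for each configuration file being audited.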
End of preview. The rest of this section cannot be viewed here.
- Managing a Big Site
- Preview: A major problem in managing a big site is that it is always in flux. The person in charge therefore has to manage a constant flow of new material from the development machines, through the beta test systems, to the live site. This process can be very complicated and he will need as much help from automation as he can get.
The development hardware has to address two issues: the functionality of the code (running on any machine) and the interaction of the different machines on the live site.
The development of the code (by one or several programmers) will benefit enormously from using a version control system like CVS (see http://www.cvshome.org/). CVS allows you to download files from the archive, work on them, and upload them again. The changes are logged and a note is broadcast to everyone else in the project. At any time you can go back to any earlier version of a file. You can also create "branches": temporary diversions from the main development that run in parallel.
CVS can operate through a secure shell so that developers can share code securely over the Internet. We used it to control the writing of this edition of this book. It is also used to manage the development of Apache itself, and, in fact, most free software.
The network of development machines needs to resemble the network of live machines so that load balancing and other intersystem activities can be verified. It is possible to simulate multiple machines by running multiple services on one machine. However, this can miss accidental dependencies that arise, so it is not a good idea for the beta test stage.
The beta test site should be separate from the development machines. It should be a replica of the real site in every sense (though perhaps scaled down; e.g., if the live site is 10 load-balanced machines, the beta test site might only have 2), so that all the different ways that networked computers can interfere with each other can have full rein. It should be set up by the sys admins but tested by a very special sort of person: not a programmer, but someone who understands both computing and end users. Like a test pilot, she should be capable of making the crassest mistakes while noting exactly what she did and what happened next.
End of preview. The rest of this section cannot be viewed here.
- Supporting Software
- Preview: Besides Apache, there are two big chunks of supporting software you will need: a scripting language and a database manager. We cover languages fairly extensively in Chapter 13, Chapter 15, Chapter 16, and Chapter 17. There are also some smaller items.
The computing world divides into two camps: the sort-of-free camp and the definitely expensive camp. If you are reading this, you probably already use or intend to use Apache and you will therefore be in the sort-of-free camp. This camp offers free software under a variety of licences (see later) plus, in varying degrees, commercial support. Nowadays, all DBMs (database managers) use the SQL model, so a good book on this topic is essential. Most of the scripting languages now have more or less standardized interfaces to the leading DBMs. When working with a database manager, the programmer often has a choice between using functions in the DBM or the language. For instance, MySQL has powerful date-formatting routines that will return a date and time from the database served up to your taste. This could equally be done in Perl, though at a cost in labor. It is worth exploring the programming language hidden inside a DBM.
These are the significant freeware database managers:
- MySQL (http://www.mysql.com)
- MySQL is said to be a "lighter weight" DBM. However, we have found it to be very reliable, fast, and easy to use. It follows what one might call the "European" programming style, in which the features most people will want to use are brought to the fore and made easy, while more sophisticated features are accessible if you need them. The "American" style seems to range all the package's features with equal prominence, so that the user has to be aware of what he does not want to use, as well as what he does.
- PostgreSQL (http://www.postgresql.org)
- PostgreSQL is said to be a more sophisticated, "proper" database. However, it did not, at the time of writing, offer outer joins and a few other useful features. It is also annoyingly literal about the case of table and field names, but requires quotation marks to actually pay attention to them.
End of preview. The rest of this section cannot be viewed here.
- Scalability
- Preview: Moving a web site from one machine serving a few test requests to an industrial-strength site capable of serving the full flood of web demand may not be a simple matter.
A busy site will have performance issues, which boil down to the question: "Are we serving the maximum number of customers at the minimum cost?"
Section 12.5.1.1: Tools
You can see how resources are being used under Unix from the utilities top, vmstat, swapinfo, iostat, and their friends. (See Essential System Administration, by Aeleen Frisch [O'Reilly, 2002].)
Section 12.5.1.2: Apache's mod_info
mod_info can be used to monitor and diagnose processes that deal with HTTPD. See Chapter 10.
Section 12.5.1.3: Bandwidth
Your own hardware may be working wonderfully while being strangled by bandwidth limitations between you and the Web backbone. You should be able to make rough estimates of the bandwidth you need by multiplying the number of transactions per second by the number of bytes transferred (making allowance for the substantial HTTP headers that go with each web page). Having done that, check what is actually happening by using a utility like ipfm from http://www.via.ecp.fr/~tibob/ipfm/:
    HOST              IN       OUT      TOTAL
    host1.domain.com  12345    6666684  6679029
    host2.domain.com  1232314  12345    1244659
    host3.domain.com  6645632  123      6645755
    ...
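The rough bandwidth estimate described above (transactions per second times bytes per transaction, with an allowance for headers) can be written out as a short calculation. The function name and the default header allowance of 500 bytes are assumptions for illustration; measure your own traffic to choose realistic figures.

```python
def required_bandwidth_bps(transactions_per_second: float,
                           body_bytes: int,
                           header_bytes: int = 500) -> float:
    """Rough bandwidth estimate in bits per second.

    header_bytes is a hypothetical allowance for the HTTP headers
    that accompany each response.
    """
    bytes_per_transaction = body_bytes + header_bytes
    return transactions_per_second * bytes_per_transaction * 8


if __name__ == "__main__":
    # 50 transactions per second, 20,000-byte pages, 500 bytes of headers.
    bps = required_bandwidth_bps(50, 20_000, 500)
    print(f"{bps / 1_000_000:.1f} Mbit/s")
```

Comparing a figure like this with the totals that a traffic monitor reports shows whether the link, rather than the server, is the bottleneck.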
Instead of ipfm, you can use cricket (http://cricket.sourceforge.net/) to produce pretty graphs.
Section 12.5.1.4: Load balancing
mod_backhand is free software for load balancing, covered later in this chapter. For expensive software, look for ServerIron, BigIP, and LocalDirector on the Web.
Section 12.5.1.5: Image server, text server
The amount of RAM at your disposal limits the number of copies of Apache (as httpd or httpsd) that you can run, and that limits the number of simultaneous clients you can serve. You can reduce the size of some of the httpd instances by having a cutdown version for images, PDF files, or text while running a big version for scripts.
What normally makes the difference in size is the necessity to load a scripting language such as Perl or PHP into
End of preview. The rest of this section cannot be viewed here.
- Load Balancing
- Preview: This section deals with the problems of running a high-volume web site on a number of physical servers. These problems are roughly:
- Connecting the servers together.
- Tuning individual servers to get the best out of the hardware and Apache.
- Spreading the load among a number of servers with mod_backhand.
- Spreading your data over the servers with Splash so that failure of one database machine does not crash the whole site.
- Collecting log files in one place with rsync (see http://www.rsync.org/), if you choose not to do your logging in the database.
The simplest and, in many ways, the best way to deal with an underpowered web site is to throw hardware at it. PCs are the cheapest way to buy MegaFlops, and TCP/IP connects them together nicely. All that's needed to make a server farm is something to balance the load around the PCs, keeping them all evenly up to the collar, like a well-driven team of horses.
There are expensive solutions: Cisco's LocalDirector, LinuxDirector, ServerIrons, and a host of others.
The cheap solution is mod_backhand, distributed under the same license as Apache. It originated in the Center for Networking and Distributed Systems at Johns Hopkins University.
Its function is to keep track of the resources of individual machines running Apache and connected in a cluster. It then diverts incoming requests to the machines with the largest available resources. There is a small overhead in the redirection, but overall, the cluster works much better.
In the simplest arrangement, a single server has the site's IP number and farms the requests out to the other servers, which are set up identically (apart from IP addresses) and with identical mod_backhand directives. The machines communicate with each other (once a second, by default, but this can be changed), exchanging information on the resources each currently has available. On the basis of this information, the machine that catches a request can forward it to the machine best able to deal with it. Naturally, there is a computing cost to this, but it is small and predictable.
mod_backhand
End of preview. The rest of this section is not available here.
- Chapter 13: Building Applications
- Preview: Things are going so well here at Butterthlies, Inc. that we are hard put to keep up with the flood of demand. Everyone, even the cat, is hard at work typing in orders that arrive incessantly by mail and telephone.
Then someone has a brainstorm: "Hey," she cries, "let's use the Internet to take the orders!" The essence of her scheme is simplicity itself. Instead of letting customers read our catalog pages on the Web and then, drunk with excitement, phone in their orders, we provide them with a form they can fill out on their screens. At our end we get a chunk of data back from the Web, which we then pass to a script or program we have written. This brings us into the world of scripting, where the web site can take a much more active role in interacting with users. These tools make Apache a foundation for building applications, not just publishing web pages.
End of preview. The rest of this section is not available here.
- Web Sites as Applications
- Preview: While many sites act as simple repositories, providing users with a collection of files they can retrieve and navigate through with hyperlinks, web sites are capable of much more sophisticated interactions. Sites can collect information from users through forms, customize their appearance and their contents to reflect the interests of particular users, or let users interact with a wide variety of information sources. Sites can also serve as hosts for services provided not to browsers but to other computers, as "web services" become a more common part of computing.
Apache provides a solid foundation for applications, using its core web server to manage HTTP transactions and a wide variety of modules and interfaces to connect those transactions to programs. Developers can create logic that manages a much more complex flow of information than just reading pages; they can use the development environment of their choice, as well as Apache services for HTTP, security, and other web-specific aspects of application design. Everything from simple inclusion of changing information to sophisticated integration of different environments and applications is possible.
In publishing a site, we've been focusing on only one method of the HTTP protocol, GET. Apache's basic handling of GET is more than adequate for sites that just need to publish information from files, but HTTP (and Apache) can support a much wider range of options. Developers who want to create interactive sites will have to write some programs to supply the basic logic. However, many useful tasks are simple to create, and Apache is quite capable of supporting much more complex applications, including applications that connect to databases or other information sources.
Every HTTP request must specify a method. This tells the server how to handle the incoming data. For a complete account, see the HTTP 1.1 specification (http://www.w3.org/Protocols/rfc2616/rfc2616.html). Briefly, however, the methods are as follows:
- GET: Returns the data asked for. To save network traffic, a "conditional
End of preview. The rest of this section is not available here.
- Providing Application Logic
- Preview: While you could write Apache modules that provide the logic for your applications, most developers find it much easier to use scripting languages and integrate them with Apache using modules others have already written. Ultimately, all any computer language can do is to make the CPU compare, add, subtract, multiply, and divide bytes. An important point about scripting languages is that they should run without modification on as many platforms as possible, so that your site can move from machine to machine. On the other hand, if you are a beginner and know someone who can help with one particular language, then that one might be the best choice. We devote a chapter to installing support for each of the major languages and run over the main possibilities here.
The discussion of computer languages is made rather difficult by the fact that human beings fall into two classes: those who love some particular language and those who don't. Naturally, the people who discuss languages fall into the first class; many of the people who read books like this in the hope of doing something useful with a computer tend more towards the second. The authors regard computer languages as a necessary evil. Languages all have their quirks, ranging from the mildly amusing to pleasures comparable to gargling battery acid. We would like enthusiasts for each of these languages to know that our comments on the others have reduced those enthusiasts to fury as well.
Server-side includes are more of a means of avoiding scripting languages than a proper scripting language. If your needs are very limited, you may also find that the basic functionality this tool provides can solve a number of content issues, and it may also prove useful in combination with other approaches. Server-side includes are covered in Chapter 14.
Another approach to the problem of orchestrating HTML with CGI scripts, databases, and Apache is PHP. Someone who is completely new to programming of any sort might do best to start with PHP, which extends HTML (and one has to learn HTML anyway).
Instead of writing CGI scripts in a language like Perl or Java, which then run in interaction with Apache and generate HTML pages to be sent to the client, PHP's strategy is to embed itself into the HTML. The author then writes HTML with embedded commands, which are interpreted by the PHP package as the page is served up. For instance, you could include the line:
End of preview. The rest of this section is not available here.
- XML, XSLT, and Web Applications
- Preview: Extensible Markup Language (XML) has taken off in the last few years as a generic format for storing information. XML looks much like HTML, with a similar combination of elements and attributes for marking up text, but it lets developers create their own vocabularies. Some XML is shared directly over the Web; some XML is used by web services applications; and some XML is used as a foundation for web sites that need to present information in multiple forms. Serving XML documents is just like serving any other files in Apache, requiring only putting the files up and setting a MIME type identifier for them. Web services generally require the installation of modules specific to a particular web-service protocol, which then act as a gateway between the web server and application logic elsewhere on the computer.
The last option, using XML as a foundation for information the Apache server needs to be able to present in multiple forms, is growing more common and fits well in more typical web-server applications. In this case, XML typically provides a format for storing information separate from its presentation details. When the Apache server gets a request for a particular file, say in HTML, it passes it to a tool that deals with the XML. That tool typically loads the XML document, generates a file in the format requested, and passes it back to Apache, which then transmits it to the user. (The XML processor may pull the file from a cache if the file has been requested previously.) If a site is only serving up HTML files, all this extra work is probably unnecessary, but sites that provide HTML, PDF, WML (Wireless Markup Language), and plain-text versions of the same content will likely find this approach very useful. Even sites that offer multiple HTML renditions of the same information may find this approach easier than managing multiple files.
Most commonly, the transformation between the original XML document and the result the user wants is defined using Extensible Stylesheet Language Transformations (XSLT). Developers use XSLT to create templates that define the production of result documents from original XML documents, and these templates can generally be applied to many originals to produce many results.
End of preview. The rest of this section is not available here.
- Chapter 14: Server-Side Includes
- Preview: Server-side includes trigger further actions whose output, if any, may then be placed inline into served documents or affect subsequent includes. The same results could be achieved by CGI scripts (either shell scripts or specially written C programs), but server-side includes often achieve these results with a lot less effort. There are, however, some security problems. The range of possible actions is immense, so we will just give basic illustrations of each command in a number of text files in .../site.ssi/htdocs.
The Config file, .../conf/httpd1.conf, is as follows:
    User webuser
    Group webgroup
    ServerName www.butterthlies.com
    DocumentRoot /usr/www/APACHE3/site.ssi/htdocs
    ScriptAlias /cgi-bin /usr/www/APACHE3/cgi-bin
    AddHandler server-parsed shtml
    Options +Includes
Run it by executing:
    ./go 1
shtml is the normal extension for HTML documents with server-side includes in them and is found as the extension to the relevant files in .../htdocs. We could just as well use brian or dog_run, as long as it appears the same in the file with the relevant command and in the configuration file. Using html can be useful (for instance, you can easily implement site-wide headers and footers), but it does mean that every HTML page gets parsed by the SSI engine. On busy systems, this could reduce performance.
Bear in mind that HTML generated by a CGI script does not get put through the SSI processor, so it's no good including the markup listed in this chapter in a CGI script.
Options Includes turns on processing of SSIs. As usual, look in the error_log if things don't work. The error messages passed to the client are necessarily uninformative since they are probably being read three continents away, where nothing useful can be done about them.
The trick of SSI is to insert special strings into our documents, which then get picked up by Apache on their way through, tested against reference strings using =, !=, <, <=, >, and >=, and then replaced by dynamically written messages. As we will see, the strings have a deliberately unusual form so they won't get confused with more routine stuff. This is the syntax of a command:
End of preview. The rest of this section is not available here.
- File Size
- Preview: The fsize command allows you to report the size of a file inside a document. The file size.shtml is as follows:
    <!--#config errmsg="Bungled again!"-->
    <!--#config sizefmt="bytes"-->
    The size of this file is <!--#fsize file="size.shtml"--> bytes.
    The size of another_file is <!--#fsize file="another_file"--> bytes.
The first line provides an error message. The second line means that the size of any files is reported in bytes printed as a number, for instance, 89. Changing bytes to abbrev gets the size in kilobytes, printed as 1k. The third line prints the size of size.shtml itself; the fourth line prints the size of another_file. config commands must appear above commands that might want to use them.
You can replace the word file= in this script, and in those which follow, with virtual=, which gives a %-encoded URL path relative to the document root. If it does not begin with a slash, it is taken to be relative to the current document.
If you play with this stuff, you find that Apache is strict about the syntax. For instance, trailing spaces cause an error because valid filenames don't have them:
    The size of this file is <!--#fsize file="size.shtml "--> bytes.
    The size of this file is Bungled again! bytes.
If we had not used the errmsg command, we would see the following:
    ...[an error occurred while processing this directive]...
End of preview. The rest of this section is not available here.
- File Modification Time
- Preview: The last modification time of a file can be reported with flastmod. This lets the client know how fresh the data is that you are offering. The format of the output is controlled by the timefmt attribute of the config element. The default rules for timefmt are the same as for the C-library function strftime( ), except that the year is now shown in four-digit format to cope with the Year 2000 problem. Win32 Apache is soon to be modified to make it work in the same way as the Unix version. Win32 users who do not have access to Unix C manuals can consult the FreeBSD documentation at http://www.freebsd.org, for example:
    % man strftime
(We have not included it here because it may well vary from system to system.)
The file time.shtml gives an example:
    <!--#config errmsg="Bungled again!"-->
    <!--#config timefmt="%A %B %C, the %jth day of the year, %s seconds since the Epoch"-->
    The mod time of this file is <!--#flastmod virtual="size.shtml"-->
    The mod time of another_file is <!--#flastmod virtual="another_file"-->
This produces a response such as the following:
    The mod time of this file is Tuesday August 19, the 240th day of the year, 841162166 seconds since the Epoch
    The mod time of another_file is Tuesday August 19, the 240th day of the year, 841162166 seconds since the Epoch
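Since timefmt follows strftime, the format codes can be tried outside Apache. The sketch below uses Python's time.strftime, which is an assumption made for convenience (it wraps the same C-library function, but it is not what mod_include runs), and it reuses the timestamp shown in the sample response above. In standard strftime, %C is the century, which would explain the 19 in that response; the day of the month is %d. Likewise %S is the seconds field of the time, while seconds since the Epoch is %s, an extension found on BSD and GNU systems.

```python
import time

# Modification time taken from the sample response above (seconds since the Epoch).
mtime = 841162166

# Interpret the timestamp in UTC so the result does not depend on the local time zone.
stamp = time.gmtime(mtime)

# %A weekday name, %B month name, %d day of the month, %j day of the year.
text = time.strftime("%A %B %d, day %j of the year", stamp)
print(text)  # Tuesday August 27, day 240 of the year
```

The weekday and month names depend on the process locale; the output shown assumes the default C locale.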
End of preview. The rest of this section is not available here.
- Includes
- Preview: We can include one file in another with the include command:
    <!--#config errmsg="Bungled again!"-->
    This is some text in which we want to include text from another file:
    << <!--#include virtual="another_file"--> >>
    That was it.
This produces the following response:
    This is some text in which we want to include text from another file:
    << This is the stuff in 'another_file'. >>
    That was it.
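The substitution shown above can be imitated in a few lines to make the mechanism concrete. This is a toy sketch, not how mod_include is implemented: the function name expand_includes, the narrow regular expression, and the dictionary standing in for the document tree are all assumptions for illustration.

```python
import re

# Matches <!--#include virtual="..."--> only; deliberately narrow, for illustration.
INCLUDE_RE = re.compile(r'<!--#include\s+virtual="([^"]*)"\s*-->')

DEFAULT_ERRMSG = "[an error occurred while processing this directive]"


def expand_includes(text, files, errmsg=DEFAULT_ERRMSG):
    """Replace each include element with the named entry from `files`.

    `files` is a dict standing in for the document tree (an assumption, so
    the sketch needs no real files); unknown names yield `errmsg`.
    """
    def substitute(match):
        return files.get(match.group(1), errmsg)

    return INCLUDE_RE.sub(substitute, text)


files = {"another_file": "This is the stuff in 'another_file'."}
page = '<< <!--#include virtual="another_file"--> >>'
print(expand_includes(page, files))
```

Passing errmsg="Bungled again!" plays the part of the config errmsg element: a name missing from files is replaced by that message instead of the default.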
End of preview. The rest of this section is not available here.
- Execute CGI
- Preview: We can have a CGI script executed without having to bother with AddHandler, SetHandler, or ExecCGI. The file exec.shtml contains the following:
    <!--#config errmsg="Bungled again!"-->
    We're now going to execute 'cmd="ls -l"'':
    << <!--#exec cmd="ls -l"--> >>
    and now /usr/www/APACHE3/cgi-bin/mycgi.cgi:
    << <!--#exec cgi="/cgi-bin/mycgi.cgi"--> >>
    and now the 'virtual' option:
    << <!--#include virtual="/cgi-bin/mycgi.cgi"--> >>
    That was it.
There are two attributes available to exec: cgi and cmd. The difference is that cgi needs a URL (in this case /cgi-bin/mycgi.cgi, set up by the ScriptAlias line in the Config file) and is protected by suEXEC if configured, whereas cmd will execute anything.
There is a third way of executing a file, namely, through the virtual attribute to the include command. When we select exec.shtml from the browser, we get this result:
    We're now going to execute 'cmd="ls -l"'':
    << total 24
    -rw-rw-r-- 1 414 xten 39 Oct 8 08:33 another_file
    -rw-rw-r-- 1 414 xten 106 Nov 11 1997 echo.shtml
    -rw-rw-r-- 1 414 xten 295 Oct 8 10:52 exec.shtml
    -rw-rw-r-- 1 414 xten 174 Nov 11 1997 include.shtml
    -rw-rw-r-- 1 414 xten 206 Nov 11 1997 size.shtml
    -rw-rw-r-- 1 414 xten 269 Nov 11 1997 time.shtml
    >>
    and now /usr/www/APACHE3/cgi-bin/mycgi.cgi:
    << Have a nice day >>
    and now the 'virtual' option:
    << Have a nice day >>
    That was it.
A prudent webmaster should view the cmd and cgi options with grave suspicion, since they let writers of SSIs give both themselves and outsiders dangerous access. However, if he uses Options +IncludesNOEXEC in conf/httpd2.conf, stops Apache, and restarts with ./go 2, the problem goes away:
    We're now going to execute 'cmd="ls -l"'':
    << Bungled again! >>
    and now /usr/www/APACHE3/cgi-bin/mycgi.cgi:
    << Bungled again! >>
    and now the 'virtual' option:
    << Have a nice day >>
    That was it.
Now, nothing can be executed through an SSI that couldn't be executed directly through a browser, with all the control that this implies for the webmaster. (You might think that
End of preview. The rest of this section is not available here.
- Echo
- Preview: Finally, we can echo a limited number of environment variables: DATE_GMT, DATE_LOCAL, DOCUMENT_NAME, DOCUMENT_URI, and LAST_MODIFIED. The file echo.shtml is as follows:
    Echoing the Document_URI <!--#echo var="DOCUMENT_URI"-->
    Echoing the DATE_GMT <!--#echo var="DATE_GMT"-->
and produces the response:
    Echoing the Document_URI /echo.shtml
    Echoing the DATE_GMT Saturday, 17-Aug-96 07:50:31
End of preview. The rest of this section is not available here.
- Apache v2: SSI Filters
- Preview: Apache v2, with its filter mechanism, introduced some new SSI directives:
SSIEndTag
    SSIEndTag tag
    Default: SSIEndTag "-->"
    Context: Server config, virtual host
This directive changes the string that mod_include looks for to mark the end of an include element.
Example
    SSIEndTag "%>"
See also SSIStartTag.
SSIErrorMsg
    SSIErrorMsg message
    Default: SSIErrorMsg "[an error occurred while processing this directive]"
    Context: Server config, virtual host, directory, .htaccess
The SSIErrorMsg directive changes the error message displayed when mod_include encounters an error. For production servers you may consider changing the default error message to "<!-- Error -->" so that the message is not presented to the user. This directive has the same effect as the <!--#config errmsg="message" --> element.
Example
    SSIErrorMsg "<!-- Error -->"
SSIStartTag
    SSIStartTag message
    Default: SSIStartTag "<!--"
    Context: Server config, virtual host
This directive changes the string that mod_include looks for to mark an include element to process. You may want to use this option if you have two servers parsing the output of a file, each processing different commands (possibly at different times).
Example
    SSIStartTag "<%"
This example, in conjunction with a matching SSIEndTag, will allow you to use SSI directives as shown in the following example (SSI directives with alternate start and end tags):
    <%#printenv %>
See also SSIEndTag.
SSITimeFormat
    SSITimeFormat formatstring
    Default: SSITimeFormat "%A, %d-%b-%Y %H:%M:%S %Z"
    Context: Server config, virtual host, directory, .htaccess
This directive changes the format in which date strings are displayed when echoing DATE environment variables. The formatstring is as in strftime(3) from the C standard library.
This directive has the same effect as the <!--#config timefmt="formatstring" --> element.
Example
    SSITimeFormat "%R, %B %d, %Y"
The previous directive would cause times to be displayed in the format "22:26, June 14, 2002".
SSIUndefinedEcho
    SSIUndefinedEcho tag
    Default: SSIUndefinedEcho "<!-- undef -->"
    Context: Server config, virtual host
End of preview. The rest of this section is not available here.
- Chapter 15: PHP
- Preview: PHP (a recursive acronym for PHP: Hypertext Preprocessor) is one of the easiest ways to get started building web applications. PHP uses a template strategy, embedding its instructions in HTML documents, making it easy to integrate logic with existing HTML frameworks. PHP does all this neatly and ingeniously. No doubt it has its dusty corners, but the normal cycle of HTML form → client data → database → returned data should be straightforward.
PHP was created with web use explicitly in mind, which has eased a number of issues that trip up other environments. The simple syntax is based on C with some Perl, making it approachable to a wide variety of developers. PHP is relatively new, but it is also focused and small, which reduces the amount of churn.
There do seem to be an unusual number of security alerts about PHP. Versions prior to 4.2.2 have a serious hole allowing an intruder to execute an arbitrary script with the permissions of the web server. This could be alarming, but if you have followed our advice about webuser and webgroup, it will not be much of a problem.
You might think that since your CGI scripts are, in effect, part of the HTML you send to clients, the Bad Guys might thereby learn more than they should. PHP is not as silly as that and strips its code before sending the pages out onto the Web.
End of preview. The rest of this section is not available here.
- Installing PHP
- Preview: Installing PHP proved to be very simple for us. We went to http://www.php.net, selected downloads, and got the latest release. This produced the usual 2 MB gzipped tar file.
When the software was unpacked, we dutifully read the INSTALL file. It offered two builds: one to produce a dynamic Apache module (DSO), which we didn't want, since we try to keep away from DSOs for production sites. Anyway, if you use PHP at all, you will want it permanently installed.
So we chose the static version and put the software in /usr/src/php/php-4.0.1pl2 (of course, the numbers will be different when you do it). Assuming that you have the Apache sources, have compiled Apache, and are using MySQL, we then ran:
    ./configure --with-mysql --with-apache=../../apache/apache_1.3.9 --enable-track-vars
    make
    make install
We now moved to the Apache directory and ran:
    ./configure --prefix=/www/APACHE3 --activate-module=src/modules/php4/libphp4.a
    make
This produced a new httpd, which we copied to /usr/local/sbin/httpd.php4. It is then possible to configure PHP by editing the file /usr/local/lib/php.ini. This is a fairly substantial file that arrives set up with the default configuration and so needs no immediate attention. But it would be worth reading it through and reviewing it from time to time as you get more familiar with PHP, since its comments and directives contain useful hints on ways to extend the installation. For instance, Windows DLLs and Unix DSOs can be loaded dynamically from scripts. There are sections within the file to configure the logging and to cope with interfaces to various database engines and interfaces: ODBC, MySQL, mSQL, Sybase-CT, Informix, MSSQL.
All that remains is to edit the Config file (see site.php):
    User webuser
    Group webgroup
    ServerName www.butterthlies.com
    DocumentRoot /usr/www/APACHE3/site.php/htdocs
    AddType application/x-httpd-php .php
This was a very simple test file in .../htdocs:
    <HTML><HEAD>PHP Test</HEAD><BODY>
    This is a test of PHP<BR>
    <?phpinfo()?>
    </BODY></HTML>
This is the magic line:
    <?phpinfo()?>
When run, this produces a spectacular page of nicely formatted PHP environment data.
End of preview. The rest of this section is not available here.
- Site.php
- Preview: By way of illustration, we produced a little package to allow a client to search a database of people (see Chapter 13). PHP syntax is not hard and the manual is at http://www.php.net/manual/en/ref.mysql.php.
The database has two fields: xname and sname.
The first page is called index.html so it gets run automatically and is a standard HTML form:
    <HTML>
    <HEAD>
    <TITLE>PHP Test</TITLE>
    </HEAD>
    <BODY>
    <form action="lookup.php" method="post">
    Look for people. Enter a first name:<BR><BR>
    First name: <input name="xname" type="text" size=20><BR>
    <input type=submit value="Go">
    </form>
    </BODY>
    </HTML>
In the action attribute of the form element, we tell the returning form to run lookup.php. This contains the PHP script, with its interface to MySQL.
The script is as follows:
    <HTML>
    <HEAD>
    <TITLE>PHP Test: lookup</TITLE>
    </HEAD>
    <BODY>
    Lookup:
    <?php print "You want people called $xname"?><BR>
    We have:
    <?php
    /* connect */
    mysql_connect("127.0.0.1","webserv","");
    mysql_select_db("people");
    /* retrieve */
    $query = "select xname,sname from people where xname='$xname'";
    $result = mysql_query($query);
    /* print */
    while(list($xname,$sname)=mysql_fetch_row($result))
    {
        print "<p>$xname, $sname</p>";
    }
    mysql_free_result($result);
    ?>
    </BODY>
    </HTML>
The PHP code comes between the <?php and ?> tags. Comments are enclosed by /* and */, just as with C.
The standard steps have to be taken:
- Connect to MySQL (on a real site, you would want to arrange a persistent connection to avoid the overhead of reconnecting for each query)
- Invoke a particular database (here, people)
- Construct a database query:
    select xname,sname from people where xname='$xname'
- Invoke the query and store the result in a variable, $result
- Dissect $result to reveal the various records that have satisfied the query
- Print the returned data, line by line
- Free $result to make its memory available for reuse
And we see on the screen:
    Lookup: You want people called jane
    We have:
    Jane, Smith
    Jane, Jones
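The same sequence of steps (connect, query, walk the result, free it) can be sketched in a self-contained form. This is not the book's code: it swaps MySQL for Python's built-in sqlite3 module with an in-memory database so that it runs anywhere, and the sample rows and the lookup function are made up for illustration. It also passes the search term as a bound parameter instead of pasting it into the query string, a common precaution when the value arrives from a web form.

```python
import sqlite3


def lookup(conn, xname):
    """Return (xname, sname) rows whose first name matches, ignoring case."""
    cur = conn.execute(
        "SELECT xname, sname FROM people "
        "WHERE xname = ? COLLATE NOCASE ORDER BY rowid",
        (xname,),  # bound as data, never spliced into the SQL text
    )
    rows = cur.fetchall()
    cur.close()  # release the result set, much as mysql_free_result does
    return rows


# Throwaway in-memory database with made-up sample rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (xname TEXT, sname TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("Jane", "Smith"), ("Jane", "Jones"), ("John", "Doe")],
)

rows = lookup(conn, "jane")
for first, last in rows:
    print(f"{first}, {last}")

# A quote in the input is matched literally and cannot alter the query.
injected = lookup(conn, "x' OR '1'='1")
conn.close()
```

COLLATE NOCASE is used here only so that "jane" finds "Jane", as in the screen output above; whether a real MySQL table compares case-insensitively depends on its collation.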
The content of the variable $query is exactly what you would type into MySQL. A point worth remembering is that while the query:
End of preview. The rest of this section is not available here.
- Chapter 16: CGI and Perl
- Preview: The Common Gateway Interface (CGI) is one of the oldest tools for connecting web sites to program logic, and it's still a common starting point. CGI provides a standard interface between the web server and applications, making it easier to write applications without having to build them directly into the server. Developers have been writing CGI scripts since the early days of the NCSA server, and Apache continues to support this popular and well-understood (if inefficient) mechanism for connecting HTTP requests to programs. While CGI scripts can be written in a variety of languages, the dominant language for CGI work has pretty much always been Perl. This chapter will explore CGI's capabilities, explain its integration with Apache, and provide a demonstration in Perl.
End of preview. The rest of this section is not available here.
- The World of CGI
- InhaltsvorschauVery few serious sites nowadays can do without scripts in one way or another. If you want to interact with your visitors — even as simply as "Hello John Doe, thanks for visiting us again" (done by checking his cookie (as described later in this chapter) against a database of names), you need to write some code. If you want to do any kind of business with him, you can hardly avoid it. If you want to serve up the contents of a database — the stock of a shop or the articles of an encyclopedia — a script might be a useful way to do it. Scripts are typically, though not always, interpreted, and they are generally an easier approach to gluing pieces together than the write and compile cycle of more formal programs.Writing scripts brings together a number of different packages and web skills whose documentation is sometimes hard to find. Until all of it works, none of it works; so we thought it might be useful to run through the basic elements here and to point readers at sources of further knowledge.What is a script? If you're not a programmer, it can all be rather puzzling. A script is a set of instructions to do something, which are executed by the computer. To demonstrate what happens, get your computer to show its command-line prompt, start up a word processor, and type:

      #! /bin/sh
      echo "have a nice day"

  Save this as fred, and make it executable by doing:

      chmod +x fred

  Run it with the following:

      ./fred

  On Windows, the equivalent is a batch file:

      @echo off
      echo "have a nice day"

  The odd first line turns off command-line echoing (to see what this means, omit it). Save this as the file fred.bat, and run it by typing fred.

  In both cases we get the cheering message have a nice day. If you have never written a program before, you have now. It may seem one thing to write a program that you can execute on your own screen; it's quite another to write a program that will do something useful for your clients on the Web. However, we will leap the gap.

  A script that is going to be useful on the Web must be executed by Apache. There are two considerations here:

  [End of preview]
- Telling Apache About the Script
- Preview: Since we have two different techniques here, we have two Config files: .../conf/httpd1.conf and .../conf/httpd2.conf. The script go takes the argument 1 or 2.

  You need to do either of the following:

  Use ScriptAlias in your host's Config file, pointing to a safe location outside your web space. This makes for better security because the Bad Guys cannot read your scripts and analyze them for holes. "Security by obscurity" is not a sound policy on its own, but it does no harm when added to more vigorous precautions.

  To steer incoming demands for the script to the right place (.../cgi-bin), we need to edit our .../site.cgi/conf/httpd1.conf file so it looks something like this:

      User webuser
      Group webgroup
      ServerName www.butterthlies.com
      #for scripts in ../cgi-bin
      ScriptAlias /cgi-bin /usr/www/APACHE3/cgi-bin
      DirectoryIndex /cgi-bin/script_html

  You would probably want to proceed in this way, that is, putting the script in the cgi-bin directory (which is not in /usr/www/APACHE3/site.cgi/htdocs), if you were offering a web site to the outside world and wanted to maximize your security. Run Apache to use this script with the following:

      ./go 1
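
  The preview does not show mycgi.cgi itself. As a rough sketch, assuming a plain-text reply is all that is wanted, a minimal script saved under that name in cgi-bin might look like the following. The function name print_response and the message text are our own inventions, not the book's.

```shell
#!/bin/sh
# Hypothetical contents for mycgi.cgi; the book's own script is not shown here.

# Print a minimal CGI response: one header line, a blank line, then the body.
print_response() {
    echo "Content-Type: text/plain"
    echo ""
    echo "Have a nice day"
}

print_response
```

  Like fred earlier, the file has to be made executable with chmod +x before Apache can run it.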

  You would access this script by browsing to http://www.butterthlies.com/cgi-bin/mycgi.cgi.

  The other method is to put scripts in among the HTML files. You should only do this if you trust the authors of the site to write safe scripts (or not write them at all) since security is much reduced. Generally speaking, it is safer to use a separate directory for scripts, as explained previously. First, it means that people writing HTML can't accidentally or deliberately cause security breaches by including executable code in the web tree. Second, it makes life harder for the Bad Guys: often it is necessary to allow fairly wide access to the nonexecutable part of the tree, but more careful control can be exercised on the CGI directories.

  We would not suggest you do this unless you absolutely have to. But regardless of these good intentions, we put mycgi.cgi in .../site.cgi/htdocs

  [End of preview]
- Setting Environment Variables
- Preview: When a script is called, it receives a lot of environment variables, as we have seen. It may be that you want to invent and pass some of your own. There are two directives to do this: SetEnv and PassEnv.

  SetEnv

      SetEnv variable value
      Server config, virtual hosts

  This directive sets an environment variable that is then passed to CGI scripts. We can create our own environment variables and give them values. For instance, we might have several virtual hosts on the same machine that use the same script. To distinguish which virtual host called the script (in a more abstract way than using the HTTP_HOST environment variable), we could make up our own environment variable VHOST:

      <VirtualHost host1>
      SetEnv VHOST customers
      ...
      </VirtualHost>
      <VirtualHost host2>
      SetEnv VHOST salesmen
      ...
      </VirtualHost>
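
  A script shared by both hosts can then branch on VHOST. The following is only a sketch: the greeting texts and the function name greeting_for are invented for illustration, and a real script would do something more useful in each branch.

```shell
#!/bin/sh
# Sketch of a CGI script shared by both virtual hosts.
# VHOST is the variable set by SetEnv; the greetings are made up.

# Pick a greeting based on the value passed in (normally $VHOST).
greeting_for() {
    case "$1" in
        customers) echo "Welcome, valued customer" ;;
        salesmen)  echo "Welcome, sales team" ;;
        *)         echo "Welcome" ;;   # VHOST unset or unrecognized
    esac
}

echo "Content-Type: text/plain"
echo ""
greeting_for "$VHOST"
```

  The default branch matters: if the script is ever reached through a host that does not set VHOST, it still produces a valid response.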

  UnsetEnv

      UnsetEnv variable variable ...
      Server config, virtual hosts

  This directive takes a list of environment variables and removes them.

  PassEnv

      PassEnv

  This directive passes an environment variable to CGI scripts from the environment that was in force when Apache was started. The script might need to know the operating system, so you could use the following:

      PassEnv OSTYPE

  This variation assumes that your operating system sets OSTYPE, which is by no means a foregone conclusion.

  [End of preview]
- Cookies
- Preview: In the modern world of fawningly friendly e-retailing, cookies play an essential role in allowing web sites to recognize previous users and to greet them like long-lost, rich, childless uncles. Cookies offer the webmaster a way of remembering her visitors. The cookie is a bit of text, often containing a unique ID number, that is contained in the HTTP header. You can get Apache to concoct and send it automatically, but it is not very hard to do it yourself, and then you have more control over what is happening. You can also get Perl modules to help: CGI.pm and CGI::Cookie. But, as before, we think it is better to start as close as you can to the raw material.

  The client's browser keeps a list of cookies and web sites. When the user goes back to a web site, the browser will automatically return the cookie, provided it hasn't expired. If a cookie does not arrive in the header, you, as webmaster, might like to assume that this is a first visit. If there is a cookie, you can tie up the site name and ID number in the cookie with any data you stored the last time someone visited you from that browser. For instance, when we visit Amazon, a cozy message appears: "Welcome back Peter (or Ben) Laurie." The Amazon system recognizes the cookie that came with our HTTP request, which our browser looked up from the one Amazon sent us last time we visited.

  A cookie is a text string. Its minimum content is Name=Value, and these can be anything you like, except semicolon, comma, or whitespace. If you absolutely must have these characters, use URL encoding (described earlier as "&" = "%26", etc.). A useful sort of cookie would be something like this:
Butterthlies=8335562231
  Butterthlies identifies the web site that issued it, which is necessary on a server that hosts many sites. 8335562231 is the ID number assigned to this visitor on his last visit. To prevent hackers upsetting your dignity by inventing cookies that turn out to belong to other customers, you need to generate a rather large random number from an unguessable seed, or protect them cryptographically.

  These are other possible fields in a cookie:

  [End of preview]
- Script Directives
- Preview: Apache has five directives dealing with CGI scripts.

  ScriptAlias

      ScriptAlias URLpath CGIpath
      Server config, virtual host

  The ScriptAlias directive does two things. It sets Apache up to execute CGI scripts, and it converts requests for URLs starting with URLpath to execution of the script in CGIpath. For example:

      ScriptAlias /bin /usr/local/apache/cgi-bin

  An incoming URL like www.butterthlies.com/bin/fred will run the script /usr/local/apache/cgi-bin/fred. Note that CGIpath must be an absolute path, starting at /.

  A very useful feature of ScriptAlias is that the incoming URL can be loaded with fake subdirectories. Thus, the incoming URL www.butterthlies.com/bin/fred/purchase/learjet will run .../fred as before, but will also make the text purchase/learjet available to fred in the environment variable PATH_INFO. In this way you can write a single script to handle a multitude of different requests. You just need to inspect PATH_INFO at the top of the script and dispatch the requests to different subroutines.

  ScriptAliasMatch

      ScriptAliasMatch regex directory
      Server config, virtual host

  This directive is equivalent to ScriptAlias but makes use of standard regular expressions instead of simple prefix matching. The supplied regular expression is matched against the URL; if it matches, the server will substitute any parenthesized matches into the given string and use the result as a filename. For example, to activate any script in /cgi-bin, one might use the following:

      ScriptAliasMatch /cgi-bin/(.*) /usr/local/apache/cgi-bin/$1

  If the user is sent by a link to http://www.butterthlies.com/cgi-bin/script3, "/cgi-bin/" matches against /cgi-bin/. We then have to match script3 against .*, which works, because "." means any character and "*" means any number of whatever matches ".". The parentheses around .* tell Apache to store whatever matched .* in the variable $1. (If some other pattern followed, also surrounded by parentheses, that would be stored in $2.) In the second part of the line, ScriptAliasMatch

  [End of preview]
- suEXEC on Unix
- Preview: The vulnerability of servers running scripts is a continual source of concern to the Apache Group. Unix systems provide a special method of running CGIs that gives much better security via a wrapper. A wrapper is a program that wraps around another program to change the way it operates. Usually this is done by changing its environment in some way; in this case, it makes sure it runs as if it had been invoked by an appropriate user. The basic security problem is that any program or script run by Apache has the same permissions as Apache itself. Of course, these permissions are not those of the superuser, but even so, Apache tends to have permissions powerful enough to impair the moral development of a clever hacker if he could get his hands on them. Also, in environments where there are many users who can write scripts independently of each other, it is a good idea to insulate them from each other's bugs, as much as is possible.

  suEXEC reduces this risk by changing the permissions given to a program or script launched by Apache. To use it, you should understand the Unix concepts of user and group execute permissions on files and directories. suEXEC is executed whenever an HTTP request is made for a script or program that has ownership or group-membership permissions different from those of Apache itself, which will normally be those appropriate to webuser of webgroup.

  The documentation says that suEXEC is quite deliberately complicated so that "it will only be installed by users determined to use it." However, we found it no more difficult than Apache itself to install, so you should not be deterred from using what may prove to be a very valuable defense. If you are interested, please consult the documentation and be guided by it. What we have written in this section is intended only to help and encourage, not to replace the words of wisdom. See http://httpd.apache.org/docs/suexec.html.

  To install suEXEC to run with the demonstration site site.suexec, go to the support subdirectory below the location of your Apache source code. Edit

  [End of preview]
- Handlers
- Preview: A handler is a piece of code built into Apache that performs certain actions when a file with a particular MIME or handler type is called. For example, a file with the handler type cgi-script needs to be executed as a CGI script. This is illustrated in .../site.filter.

  Apache has a number of handlers built in, and others can be added with the Action directive (see the next section). The built-in handlers are as follows:

  - send-as-is: Sends the file as is, with HTTP headers (mod_asis).
  - cgi-script: Executes the file (mod_cgi). Note that Options ExecCGI must also be set.
  - imap-file: Uses the file as an imagemap (mod_imap).
  - server-info: Gets the server's configuration (mod_info).
  - server-status: Gets the server's current status (mod_status).
  - server-parsed: Parses server-side includes (mod_include). Note that Options Includes must also be set.
  - type-map: Parses the file as a type map file for content negotiation (mod_negotiation).
  - isapi-isa (Win32 only): Causes ISA DLLs placed in the document root directory to be loaded when their URLs are accessed. Options ExecCGI must be active in the directory that contains the ISA. Check the Apache documentation, since this feature is under development (mod_isapi).

  The corresponding directives follow.

  AddHandler

      AddHandler handler-name extension1 extension2 ...
      Server config, virtual host, directory, .htaccess

  AddHandler wakes up an existing handler and maps the filename extensions extension1, etc., to handler-name. You might specify the following in your Config file:

  [End of preview]
- Actions
- Preview: A related notion to that of handlers is actions (nothing to do with the HTML form "Action" discussed earlier). An action passes specified files through a named CGI script before they are served up. Apache v2 has the somewhat related "Filter" mechanism.

      Action type cgi_script
      Server config, virtual host, directory, .htaccess

  The cgi_script is applied to any file of MIME or handler type matching type whenever it is requested. This mechanism can be used in a number of ways. For instance, it can be handy to put certain files through a filter before they are served up on the Web. As a simple example, suppose we wanted to keep all our .html files in compressed format to save space and to decompress them on the fly as they are retrieved. Apache happily does this. We make site.filter a copy of site.first, except that the httpd.conf file is as follows:

      User webuser
      Group webgroup
      ServerName localhost
      DocumentRoot /usr/www/APACHE3/site.filter/htdocs
      ScriptAlias /cgi-bin /usr/www/APACHE3/cgi-bin
      AccessConfig /dev/null
      ResourceConfig /dev/null
      AddHandler peter-zipped-html zhtml
      Action peter-zipped-html /cgi-bin/unziphtml
      <Directory /usr/www/APACHE3/site.filter/htdocs>
      DirectoryIndex index.zhtml
      </Directory>

  The points to notice are that:

  - AddHandler sets up a new handler with a name we invented, peter-zipped-html, and associates a file extension with it: zhtml (notice the absence of the period).
  - Action sets up a filter. For instance, the line

        Action peter-zipped-html /cgi-bin/unziphtml

    means "apply the CGI script unziphtml to anything with the handler name peter-zipped-html."

  The CGI script .../cgi-bin/unziphtml contains the following:

      #!/bin/sh
      # Send the header, a blank line, then the decompressed page.
      echo "Content-Type: text/html"
      echo
      gzip -S .zhtml -d -c "$PATH_TRANSLATED"

  This applies gzip with the following flags:

  - -S: Sets the file extension as .zhtml
  - -d: Uncompresses the file
  - -c: Outputs the results to the standard output so they get sent to the client, rather than decompressing in place
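
  To check the decompression step without going through Apache, one can compress a page by hand and feed its path to the same gzip command. This is a sketch under assumptions: the temporary directory, the sample page, and setting PATH_TRANSLATED ourselves are all stand-ins for what Apache would supply.

```shell
#!/bin/sh
# Sketch: exercise the gzip flags by hand, outside Apache.
# The working directory and page content are made up for this test.

workdir=$(mktemp -d)
printf '<html><body>hello</body></html>\n' > "$workdir/index.html"

# Compressing with -S .zhtml appends that suffix: index.html.zhtml
gzip -S .zhtml "$workdir/index.html"
mv "$workdir/index.html.zhtml" "$workdir/index.zhtml"

# Apache would normally set PATH_TRANSLATED to the requested file's path.
PATH_TRANSLATED="$workdir/index.zhtml"

# The same gzip invocation the filter script uses, quoted here so that
# paths containing spaces survive.
unzipped=$(gzip -S .zhtml -d -c "$PATH_TRANSLATED")
echo "$unzipped"

rm -rf "$workdir"
```

  Because -c writes to standard output, the compressed file on disk is left untouched, which is what lets the same page be served repeatedly.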

  [End of preview]
- Browsers
- Preview: One complication of the Web is that people are free to choose their own browsers, and not all browsers work alike or even nearly alike. They vary enormously in their capabilities. Some browsers display images; others won't. Some that display images won't display frames, tables, Java, and so on.

  You can try to circumvent this problem by asking the customer to go to different parts of your script ("Click here to see the frames version"), but in real life people often do not know what their browser will and won't do. A lot of them will not even understand what question you are asking. To get around this problem, Apache can detect the browser type and set environment variables so that your CGI scripts can detect the type and act accordingly.

  SetEnvIf and SetEnvIfNoCase

      SetEnvIf attribute regex envar[=value] [..]
      SetEnvIfNoCase attribute regex envar[=value] [..]
      Server config, virtual host, directory, .htaccess (from v 1.3.14)

  The attribute can be one of the HTTP request header fields, such as Host, User-Agent, Referer, and/or one of the following:

  - Remote_Host: The client's hostname, if available
  - Remote_Addr: The client's IP address
  - Remote_User: The client's authenticated username, if available
  - Request_Method: GET, POST, etc.
  - Request_URI: The part of the URL following the scheme and host

  The NoCase version works the same except that regular-expression matching is evaluated without regard to letter case.

  BrowserMatch and BrowserMatchNoCase

      BrowserMatch regex env1[=value1] env2[=value2] ...
      BrowserMatchNoCase regex env1[=value1] env2[=value2] ...
      Server config, virtual host, directory, .htaccess (from Apache v 1.3.14)

  regex is a regular expression matched against the client's

  [End of preview]
- Chapter 17: mod_perl
- Preview: Perl does some very useful things and provides such huge resources in the CPAN library (http://cpan.org) that it will clearly be with us for a long time yet as a way of writing scripts to run behind Apache. While Perl is powerful, CGI is not a particularly efficient means of connecting Perl to Apache. CGI's big disadvantage is that each time a script is invoked, Apache has to load the Perl interpreter and then it has to load the script. This is a heavy and pointless overhead on a busy site, and it would obviously be much easier if Perl stayed loaded in memory, together with the scripts, to be invoked each time they were needed. This is what mod_perl does by modifying Apache.

  This modification is definitely popular: according to Netcraft surveys in mid-2000, mod_perl was the third most popular add-on to Apache (after FrontPage and PHP), serving more than a million URLs on over 120,000 different IP numbers (http://perl.apache.org/outstanding/stats/netcraft.html).

  The reason that this chapter is more than a couple of pages long is that Perl does not sit easily in a web server. It was originally designed as a better shell script to run standalone under Unix. It developed, over time, into a full-blown programming language. However, because the original Perl was not designed for this kind of work, various things have to happen. To illustrate them, we will start with a simple Perl script that runs under Apache's mod_cgi and then modify it to run under mod_perl. (We assume that the reader is familiar enough with Perl to write a simple script, and understands the ideas of Perl modules, use(), require(), and the BEGIN and END blocks.)

  On site.mod_perl we have two subdirectories: mod_cgi and mod_perl. In mod_cgi we present a simple script-driven site that runs a home page that has a link to another page.

  The Config file is as follows:

      User webuser
      Group webuser
      ServerName www.butterthlies.com
      DocumentRoot /usr/www/APACHE3/APACHE3/site.mod_perl/mod_cgi/htdocs
      TransferLog /usr/www/APACHE3/APACHE3/site.mod_perl/mod_cgi/logs/access_log
      LogLevel debug
      ScriptAlias /bin /usr/www/APACHE3/APACHE3/site.mod_perl/cgi-bin
      ScriptAliasMatch /AA(.*) /usr/www/APACHE3/APACHE3/site.mod_perl/cgi-bin/AA$1
      DirectoryIndex /bin/home.pl

  [End of preview]
- How mod_perl Works
- Preview: The principle of mod_perl is simple enough: Perl is loaded into Apache when it starts up, which makes for very big Apache child processes. This saves the time that would be spent loading and unloading the Perl interpreter but calls for a lot more RAM.

  If you use Apache::PerlRun, you get a half-way environment where Perl is kept in memory but scripts are loaded each time they are run. Most CGI scripts will work right away in this environment.

  If you go whole hog and use Apache::Registry, your scripts will be loaded at startup too, thus saving the overhead of loading and unloading them. If your scripts use a database manager, you can also keep an open connection to the DBM, and so save time there as well (see later). Good as this is for execution speed, there is a drawback, in that your scripts now all run as subroutines below a hidden main program. The problem with this, and it can be a killer if you get it wrong, is that global variables are initialized only when Apache starts up. More of this follows.

  The problems of mod_perl, which are not that serious, almost all stem from the fact that all your separate scripts now run as a single script in a rather odd environment.

  However, because Apache and Perl are now rather intimately blended, there is a corresponding fuzziness about the interface between them. Rather surprisingly, we can now include Perl scripts in the Apache Config file, though we will not go to such extreme lengths here.

  Since things are more complicated, there are more things to go wrong and greater need for careful testing. The error_log is going to be your best friend. Make sure that correct line numbers are enabled when you compile mod_perl, and you may want to use Carp at runtime to get fuller error messages.

  [End of preview]
- mod_perl Documentation
- Preview: Before doing anything, it would be sensible to cast a glance at the documentation: what are we getting? What can we do with it? What are the pitfalls?

  In line with the maturity (or bloat) of the Apache project, there is a stunning amount of this material at http://perl.apache.org/#docs. We started off by downloading The mod_perl Guide by Stas Bekman at http://perl.apache.org/guide. There must be more than 500 pages, many of which are applicable only to very specialized situations. Obviously we cannot transcribe or usefully compress this amount of material into a few pages here. Be aware that it exists and if you have problems, look there first and thoroughly: you may very well find an answer.

  [End of preview]
- Installing mod_perl — The Simple Way
- Preview: We assume, to begin with, that you are running on some sort of Unix machine, you have downloaded the Apache sources, built Apache, and that now you are going to add mod_perl.

  The first thing to do is to get the mod_perl sources. Go to http://apache.org. In the list of links to the left of the screen you should see "mod_perl": select it. This takes you to http://perl.apache.org, the home page of the Apache/Perl Integration Project.

  The first step is to select "Download," which then offers you a number of ways of getting to the executables. The simplest is to download from http://perl.apache.org/dist (linked as this site), but there are many alternatives. When we did it, the gzipped tar on offer was mod_perl-1.24.tar.gz; no doubt the numbers will have moved on by the time this is in print. This gives you about 600 KB of file that you get onto your Unix machine as best you can.

  It is worth saving it in a directory near your Apache, because this slightly simplifies the business of building and installing it later on. We keep all this stuff in /usr/src/mod_perl, near where the Apache sources were already stored. We created a directory for mod_perl, moved the downloaded file into it, unzipped it with gunzip <filename>, and extracted the files with tar xvf <filename> so we have: /usr/src/apache/mod_perl/mod_perl-1.24, and not very far away: /usr/src/apache/apache_1.3.26.

  Go into /usr/src/apache/mod_perl/mod_perl-1.24, and read INSTALL. The simple way of installing the package offers no surprises:

      perl Makefile.PL
      make
      make test
      make install

  For some reason, we found we had to repeat the whole process two or three times before it all went smoothly without error messages. So if you get obscure complaints, go back to the top and try again before beginning to scream.

  Some clever things happen, culminating in a recompile of Apache. This works because the mod_perl makefile looks for the most recent Apache source in a neighboring directory. If you want to take this route, make sure that the right version is in the right place. If the installation process cannot find an Apache source directory, it will ask you where to look. This process generates a new httpd in /usr/src/apache/apache_1.3.26/src, which needs to be copied to wherever you keep your executables (in our case, /usr/local/bin).

  [End of preview]
- Modifying Your Scripts to Run Under mod_perl
- Preview: Many scripts that will run under mod_cgi will run under mod_perl using Apache::PerlRun in the Config file. This in itself speeds things up because Perl does not have to reload for each call; scripts that have been tidied up or written especially will run even better under Apache::Registry.

  You may want to experiment with different Config files and scripts. If you are running under Apache::Registry, you will have to restart Apache to reload the script.

  [End of preview]
- Global Variables
- Preview: The biggest single "gotcha" for scripts running under Apache::Registry is caused by global variables. The mod_cgi environment is rather kind to the slack programmer. Your scripts, which tend to be short and simple, get loaded, run, and then thrown away. Perl rather considerately initializes all variables to undef at startup, so one tends to forget about the dangers they represent.

  Unhappily, under mod_perl and Apache::Registry, scripts effectively run as subroutines. Global variables get initialized at startup as usual, but not again, so if you don't explicitly initialize them at each call, they will carry forward whatever value they had after the last call. What makes these bugs more puzzling is that as the Apache child processes start, each one of them has its variables set to 0. The errant behavior will not begin to show until a child process is used a second time, and maybe not even then.

  There are several lines of attack:

  - Do away with every global variable that isn't absolutely necessary
  - Make sure that every global variable that survives is initialized
  - Put your code into modules as subroutines and call it from the main script (for some reason global variables in the module will be initialized)

  To illustrate this tiresome behavior we created a new directory /usr/www/APACHE3/APACHE3/site.mod_perl/mod_perl and copied everything across into it from .../mod_cgi. The startup file go was now:

      httpd.perl -d /usr/www/APACHE3/APACHE3/site.mod_perl/mod_perl

  The Config file is as follows:

      User webuser
      Group webuser
      ServerName www.butterthlies.com
      LogLevel debug
      DocumentRoot /usr/www/APACHE3/APACHE3/site.mod_perl/mod_cgi/htdocs
      TransferLog /usr/www/APACHE3/APACHE3/site.mod_perl/logs/access_log
      ErrorLog /usr/www/APACHE3/APACHE3/site.mod_perl/logs/error_log
      LogLevel debug
      #change to AliasMatch from ScriptAliasMatch
      AliasMatch /(.*) /usr/www/APACHE3/APACHE3/site.mod_perl/cgi-bin/$1
      DirectoryIndex /bin/home
      Alias /bin /usr/www/APACHE3/APACHE3/site.mod_perl/cgi-bin
      SetHandler perl-script
      PerlHandler Apache::Registry
      #PerlHandler Apache::PerlRun

  [End of preview]
- Strict Pregame
- Preview: It is extremely important to:

      use strict;

  under mod_perl, to detect unsafe Perl constructs.

  [End of preview]
- Loading Changes
- Preview: Under mod_cgi and mod_perl Apache::PerlRun you simply have to edit a script and save it to start it working. Under mod_perl and Apache::Registry, the changes will not take effect until you restart Apache or reload your scripts. Stas Bekman (http://perl.apache.org/guide/config.html) gives some very elaborate ways of doing this, including a method of rewriting your Config file via an HTML form. We feel that although this sort of trick may amaze and delight your friends, it may please your enemies even more, who will find there new and exciting ways of penetrating your security. We see nothing wrong with restarting Apache with the script stop_go. It will give anyone who is logged on to your site a surprise:

      kill -USR1 `cat logs/httpd.pid`

  This reloads Perl, loads the scripts afresh, and reinitializes all variables.

  [End of preview]
- Opening and Closing Files
- Preview: Another consequence of scripts remaining permanently loaded is that opened files are not automatically closed when a script terminates, because it doesn't terminate until Apache is shut down. It is important therefore that every opened file should be explicitly closed; failure to do this will eat up memory and file handles. However, it is not good enough just to use close() conscientiously because something may go wrong in the script, causing it to exit without executing the close() statement. The cure is to use the IO module. This has the effect that the file handle is closed when the block it is in goes out of scope:

      use IO;
      ...
      my $fh=IO::File->new("name") or die $!;
      $fh->print($text);
      #or
      $stuff=<$fh>;
      # $fh closes automatically

  Alternatively:

      use Symbol;
      ...
      my $fh=Symbol::gensym;
      open $fh, $filename or die $!;
      ....
      #automatic close

  Under Perl 5.6.0 this is enough:

      open my $fh, $filename or die $!;
      ...
      # automatic close

  [End of preview]
- Configuring Apache to Use mod_perl
- Preview: Bearing all this in mind, we can now set up the Config file neatly. In line with convention, we rename .../cgi-bin to .../perl. We can then put most of the Perl stuff neatly in a <Location> block:

      User webuser
      Group webuser
      ServerName www.butterthlies.com
      DocumentRoot /usr/www/APACHE3/APACHE3/site.mod_perl/mod_cgi/htdocs
      TransferLog /usr/www/APACHE3/APACHE3/site.mod_perl/logs/access_log
      ErrorLog /usr/www/APACHE3/APACHE3/site.mod_perl/logs/error_log
      #change this before production!
      LogLevel debug
      AliasMatch /perl(.*) /usr/www/APACHE3/APACHE3/site.mod_perl/perl/$1
      Alias /perl /usr/www/APACHE3/APACHE3/site.mod_perl/perl
      DirectoryIndex /perl/home
      PerlTaintCheck On
      PerlWarn On
      <Location /perl>
      SetHandler perl-script
      PerlHandler Apache::Registry
      #PerlHandler Apache::PerlRun
      Options ExecCGI
      PerlSendHeader On
      </Location>

  Remember to reduce the Debug level before using this in earnest! Note that the two directives:

      PerlTaintCheck On
      PerlWarn On

  won't go into the <Location> block because they are executed when Perl loads.

  A quick web site is well on the way to being a good web site. It is probably worth taking a little trouble to speed up your scripts; but bear in mind that most elapsed time on the Web is spent by clients looking at their browser screens, trying to work out what they're about.

  We discuss the larger problems of speeding up whole sites in Chapter 12. Here we offer a few tips on making scripts run faster in less space. The faster they run, the more clients you can serve in sequence; the less space they run in, the more copies you can run and the more clients you can serve simultaneously. However, if your site attracts so many people it is still bogging down, you can surely afford to throw more hardware at it. If you can't, why are you bothering?

  Users of FreeBSD might like to look at http://www.freebsd.org/cgi/man.cgi?query=tuning for some basic suggestions.

  The search for perfect optimization can get into subtle and time-consuming byways that are very dependent on the details of how your scripts work. A good reason not to spend too much time on optimizing your code is that the small change you make tomorrow to fix a maintenance problem will probably throw the hard-won optimizations all out of whack.

  [End of preview]
- Chapter 18: mod_jserv and Tomcat
- InhaltsvorschauSince the advent of the Servlets API, Java developers have been able to work behind a web server interface. For reasons of price, convenience, and ready availability, Apache has long been a popular choice for Java developers, holding its own in a programming world otherwise largely dominated by commercial tools.The Apache-approved method for adding Java support to Apache is to use Tomcat. This is an open source version of the Java servlet engine that installs itself into Apache. The interpreter is always available, without being loaded at each call, to run your scripts. The old way to run Java with Apache was via JServ — which is now (again, in theory) obsolete on its own. JServ and Tomcat are both Java applications that talk to Apache via an Apache module (mod_jserv for JServ and mod_jk for Tomcat), using a socket to get from Apache to the JVM.In practice, we had considerable difficulty with Tomcat. Since mod_jserv is still maintained and is not (all that) difficult to install, Java enthusiasts might like to try it. We will describe JServ first and then Tomcat. For more on Servlet development in general, see Jason Hunter's Java Servlet Programming (O'Reilly, 2001).
End of preview. The rest of this section is not available here.
- mod_jserv
- Preview
Windows users should get the self-installing .exe distribution from http://java.apache.org/.
Download the gzipped tar file from http://java.apache.org/, and unpack it in a suitable place (we put it in /usr/src/mod_jserv). The README file says:
"Apache JServ is a 100% pure Java servlet engine designed to implement the Sun Java Servlet API 2.0 specifications and add Java Servlet capabilities to the Apache HTTP Server."
For this installation to work, you must have:
- Apache 1.3.9 or later.
- But not Apache v2, which does not support mod_jserv.
- A fully compliant Java 1.1 Runtime Environment
- We decided to install the full Java Development Kit (which we needed anyway for Tomcat; see later on). We went to the FreeBSD site and downloaded the 1.1.8 JDK from ftp://ftp.FreeBSD.org/pub/FreeBSD/ports/local-distfiles/nate/JDK1.1/jdk1.1.8_ELF.V1999-11-9.tar.gz. If you are adventurous, 1.2 is available from http://www.freebsd.org/java/dists/12.html. When you have it, see Section 18.2.1 for what to do next. If you are using a different operating system from any of those mentioned, you will have to find the necessary package for yourself.
- The Java servlet development kit (JSDK)
- A range of versions is available at http://java.sun.com/products/servlet/download.html. As is usual with anything to do with Java, a certain amount of confusion is evident. The words "Java Servlet Development Kit" or "JSDK" are hard to find on this page, and when found they seem to refer to the very oldest versions rather than the newer ones that are called "Java Servlet." However, we felt that older is probably better in the fast-moving but erratic world of Java, and we downloaded v2.0 from http://java.sun.com/products/servlet/archive.html. This offered both Windows and "Unix (Solaris and others)" code, with the reassuring note: "The Unix download is labeled as being for Solaris but contains no Solaris specific code." The tar file arrived with a .Z extension, signifying that it needs to be expanded with the Unix utility uncompress. There is a FreeBSD JSDK available at
End of preview. The rest of this section is not available here.
- Tomcat
- Preview
Tomcat, part of the Jakarta Project, is the modern version of JServ and is able to act as a server in its own right. But we feel that it will be a long time catching up with Apache and that it would not be a sensible choice as the standalone server for a serious web site.
The home URL for the Jakarta project is http://jakarta.apache.org/, where we are told:
"The goal of the Jakarta Project is to provide commercial-quality server solutions based on the Java Platform that are developed in an open and cooperative fashion."
At the time of writing, Tomcat 4.0 was incompatible with Apache's mod-cgi, and in any case requires Java 1.2, which is less widely available than Java 1.1, so we decided to concentrate on Tomcat 3.2.
In the authors' experience, installing anything to do with Java is a very tiresome process, and this was no exception. The assumption seems to be that Java is so fascinating that proper explanations are unnecessary: devotees will immerse themselves in the holy stream and all will become clear after many days beneath the surface. This is probably because explanations are expensive and large commercial interests are involved. It contrasts strongly with the Apache site or the Perl CPAN network, both of which are maintained by unpaid enthusiasts and usually, in our experience, are easy to understand and work immaculately.
First, you need a Java Development Kit (JDK). We downloaded jdk1.1.8 for FreeBSD from http://java.sun.com and installed it. Another source is ftp://ftp.FreeBSD.org/pub/FreeBSD/ports/local-distfiles/nate/JDK1.1/jdk1.1.8_ELF.V1999-11-9.tar.gz. Installation is simple: you just unzip the tarball and then extract the files. If you read the README without paying close attention, you may get the impression that you need to unzip the src.zip file. You do not, unless you want to read the source code of the Java components. And, of course, you absolutely must not unzip classes.zip.
An essential step that may not be very clear from the documentation is to include the JDK, at ..../usr/src/java/jdk1.1.8/bin on your path, to set the environment variable
End of preview. The rest of this section is not available here.
- Connecting Tomcat to Apache
- Preview
The basic document here is .../doc/tomcat-apache-howto.html. It starts with the discouraging observation:
"Since the Tomcat source tree is constantly changing, the information herein may be out of date. The only definitive reference at this point is the source code."
As we have noted earlier, this may make you think that Tomcat is more suited to people who prefer the journey to the destination. You will also want to look at http://jakarta.apache.org/tomcat/tomcat-3.2-doc/uguide/tomcat_ug.html, though the two documents seem to disagree on various points.
The Tomcat interface in Apache is mod_jk. The first job is to get, compile, and install it into Apache. When we downloaded Tomcat earlier, we were getting Java, which is platform independent, and therefore the binaries would do. mod_jk is needed in source form and is distributed with the source version of Tomcat, so we went back to http://jakarta.apache.org/builds/jakarta-tomcat/release/v3.3a/src/ and downloaded jakarta-tomcat-3.3a-src.tar.gz. Things are looking up: when we first tried this, some months before, the tar files for the Tomcat binaries and sources had the same name. When you unpacked one, it obliterated the other.
Before starting, it is important that Apache has been compiled correctly, or this won't work at all. First, it must have been built using configure in the top directory, rather than src/Configure. Second, it must have shared object support enabled; that is, it should have been configured with at least one shared module enabled. An easy way to do this is to use:
    ./configure --enable-shared=example
Note that if you have previously configured Apache and are running a version prior to 1.3.24, you'll need to remove src/support/apxs to force a rebuild, or things will mysteriously fail. Once built, Apache should then be installed with this:
    make install
Once this has been done, we can proceed.
Having unpacked the sources, we went down to the .../src directory. The documentation is in ..../jakarta-tomcat-3.3a-src/src/doc/mod_jk-howto.html. Set the environment variable
End of preview. The rest of this section is not available here.
- Chapter 19: XML and Cocoon
- Preview
So far we have talked about different ways of writing scripts, worrying more about the logic they contain than their content. Working with XML and Cocoon takes a rather different tack, defining transformation pathways from a generic XML format to destination formats, typically HTML but possibly in other formats. Using this approach, a single set of documents can be used to generate a variety of different representations appropriate to different devices or situations.
End of preview. The rest of this section is not available here.
- XML
- Preview
Like HTML, Extensible Markup Language (XML) uses markup (elements, attributes, comments, etc.) to identify content within a document. Unlike HTML, XML lets developers create their own vocabularies to describe that content, encouraging a much greater separation of content from presentation. When we wrote this page, we put the chapter title at the top right hand corner of a blank page: "XML and Cocoon." Then we started on the text:
"So far we have talked about different ways of writing scripts, worrying more about the logic they contain than their content..."
If you put this book down open and come back to it tomorrow, a glance at the top of the page reminds you of the subject of this chapter, and a glance at the top of the paragraph reminds you where we have got to in that chapter.
It is not necessary to explain what these typographic page elements are telling you because we have all been reading books for years in a civilization that has had cheap printing and widespread literacy for half a millennium, so we don't even think about the conventions that have developed.
Putting the right message in the right sort of type in the right place on the page in order to convey the right meaning to the reader was originally a specialized technical job done by the book editor and the printer.
Now, computing is changing all that. We typeset our own manuscripts with the help of publishing packages. We publish our own books without the help of trained editors. We don't have to bother with the book format: we publish our own web pages by the billion, often without recourse to any standards of layout, intelligibility, or even sanity. Since computer data has no inherent format to tell us what it means, there is, and has been for a long time, an urgent need for some sort of markup language to tell us what we are looking at.
A start was made on solving the problem many decades ago with the Standard Generalized Markup Language (SGML). This evolved informally for a long time and then was accepted by the International Organization for Standardization (ISO) in 1986. SGML has been taken up in a number of industries and used to define more specific tag languages: ATA-2100 for aircraft maintenance manuals, PCIS in the semiconductor industry, DocBook for software documentation in the computer industry.
End of preview. The rest of this section is not available here.
- XML and Perl
- Preview
Before you embark seriously on Cocoon, you might like to look at the FAQs (http://xml.apache.org/cocoon/faqs.html#faq-noant). This will give you some notion of the substantial size, complexity, and tentative condition of the intellectual arena in which you will operate.
If you don't feel quite up to embarking on the Java adventure (which seems to one of us (PL) comparable with trying to walk a straight line from New York to the South Pole), but you still need to get to grips with XML, there are a large number of Perl packages on CPAN (http://search.cpan.org/search?mode=module&query=xml), which might produce useful results much faster. The interface between Perl and Apache is covered in Chapter 16 and Chapter 17. Another option, also hosted by the XML Apache Project, is AxKit (http://axkit.org), a Perl package for transforming and presenting information stored in XML.
End of preview. The rest of this section is not available here.
- Cocoon
- Preview
Go to http://xml.apache.org/cocoon/index.html for an introduction to Cocoon and a link to the download page. You will see that a number of mysterious entities are mentioned: Xerces, Xalan, FOP, Xang, SOAP. These are all subsidiary packages that are used to make up Cocoon. What you need of them is included with the Cocoon download and is guaranteed to work, even though they may not be the latest releases. This makes the file rather large, but saves problems with inconsistent versions.
If you are running Apache on a platform where support for JDK 1.2 is either missing or difficult, you may still find it useful to run an older version of Cocoon. The following section documents Cocoon 1.8 installation with JServ, as well as the more recent Cocoon 2.0.3, which uses Tomcat. Both source and binary versions are available for multiple platforms.
End of preview. The rest of this section is not available here.
- Cocoon 1.8 and JServ
- Preview
Go to http://xml.apache.org/cocoon/index.html for an introduction to Cocoon and a link to the download page. You will see that a number of mysterious entities are mentioned: Xerces, Xalan, FOP, Xang, SOAP. These are all subsidiary packages that are used to make up Cocoon. What you need of them is included with the Cocoon download and is guaranteed to work, even though they may not be the latest releases. This makes the file rather large, but saves problems with inconsistent versions.
If you are running Win32, download the zipped executable; if Unix, then download the sources. We got Cocoon-1.8.tar.gz, which was flagged as the latest distribution.
As usual, read the README file. It tells you that the documentation is in the .../docs subdirectory as .html files. What it might mention, but did not, is that these files are formatted using fixed-width tables for a wide screen and, if you want hardcopy, don't print out well. They are not easy to read either, so more flexible versions, suitable for reading and printing, are in the .../docs.printer subdirectory. There is a snag, which appeared later: the printable files are completely different from the screen files and omit a crucial piece of information. Still, as the reader will have gathered, this is normal stuff in the world of Java.
What follows is a minimum version of the installation process.
It seemed sensible to read install.html. Since Cocoon is a Java servlet, albeit rather a large one, you need a Java virtual machine, v1.1 or better. We had v1.1.8. If you have v1.2 or better, you need to treat the file <jdk_home>/lib/tools.jar, which contains the Java compiler, as a Cocoon component and include it in your classpath. This meant editing .login again (see Chapter 18) to include:
    setenv CLASSPATH "/usr/src/java/jdk1.1.8/lib/tools.jar:."
We have to make Cocoon and all its bits visible to JServ by editing the file: usr/local/bin/etc/jserv.properties. The Cocoon documentation suggests that you add the lines:
    wrapper.classpath=/usr/local/java/jdk1.1.8/lib/classes.zip
    wrapper.classpath=/usr/src/cocoon/bin/cocoon.jar
    wrapper.classpath=/usr/src/cocoon/lib/xerces_1_2.jar
    wrapper.classpath=/usr/src/cocoon/lib/xalan_1_2_D02.jar
    wrapper.classpath=/usr/src/cocoon/lib/fop_0_13_0.jar
End of preview. The rest of this section is not available here.
- Cocoon 2.0.3 and Tomcat
- Preview
Cocoon 2.0.3 is pretty completely self-contained. The collection of classes in Cocoon and Tomcat has been tuned to avoid any conflicts, and installing Cocoon on an existing Tomcat installation involves adding one file to Tomcat and adding some directives to httpd.conf. As Java installations go, this one is quite friendly.
Unless you have a strong need to customize Cocoon directly, by far the easiest way to install Cocoon is to download the binary distribution, in this case from http://xml.apache.org/dist/cocoon/. Installing Cocoon on Tomcat 3.3 or 4.0 (with the exception of 4.03, for which you should read the docs about some CLASSPATH issues) requires unzipping the distribution file, copying the cocoon.war file into the /webapps directory of the Tomcat installation, and restarting Tomcat. When Tomcat restarts, it will find the new file, expand it into a cocoon directory, and configure itself to support Cocoon. (Once this is done, you can delete the cocoon.war file.)
If you've left Tomcat running its independent server, you can test whether Cocoon is running by firing up a browser and visiting http://localhost:8080/cocoon on your server. You should see the welcome screen for Cocoon. To move beyond using Tomcat by itself (which is fairly slow, though useful for testing), you have two options, depending on which Apache module you use to connect the Apache server to Tomcat.
The older (but in some ways more capable) option is to use mod_jk, as described in Chapter 18. If you are using mod_jk, you can connect the Cocoon examples to Apache quite simply by adding the directive:
    JkMount /cocoon/* ajp12
to your httpd.conf file and restarting Apache. mod_jk is designed to support general integration of Java Servlets and Java Server Pages with Apache and provides finer-grained control over how Apache calls on these facilities. mod_jk also provides support for Apache's load-balancing facilities.
The newer approach uses mod_webapp, a module that seems more focused on simple connections between the Apache server and particular applications. mod_webapp comes with Tomcat 4.0 and higher, and you can find binary and RPM releases as well as source at
End of preview. The rest of this section is not available here.
- Testing Cocoon
- Preview
While the Cocoon examples are a welcome way to see that the installation process has gone smoothly, you'll most likely want to get your own documents into the system. Unlike the other application-building tools covered in the last few chapters, most uses of Cocoon start with publishing information rather than interacting with users. The following demonstration provides a first step toward publishing your own information, though you'll need a book on XSLT to learn how to make the most of this.
We'll start with a simple XML document containing a test phrase:
    <?xml version="1.0"?>
    <phrase>
    testing, testing, 1... 2... 3...
    </phrase>
Save this as test.xml in the main Cocoon directory. Next, we'll need an XSLT stylesheet, stored as test2html.xsl in the main Cocoon directory, to transform that "phrase" document into an HTML document:
    <?xml version="1.0"?>
    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:template match="phrase">
        <html>
          <head><title><xsl:value-of select="." /></title></head>
          <body><h1><xsl:value-of select="." /></h1></body>
        </html>
      </xsl:template>
    </xsl:stylesheet>
This stylesheet creates an HTML document when it encounters the phrase element and uses the contents of the phrase element (referenced by <xsl:value-of select="."/>, which returns the contents of the current context) to fill in the title of the HTML document, as well as a header in body content. What appeared once in the XML document will appear twice in the HTML result.
We now have the pieces that Cocoon can use to generate HTML, but we still need to tell Cocoon that these parts have a purpose. Cocoon uses a site map, stored in the XML file sitemap.xmap, to manage all of its processing. Processing is defined using pipelines, which can be sophisticated combinations of stylesheets and code, but which in our case need to provide a home for an XML document and its XSLT transformation. By adding one map:pipeline element to the end of the map:pipelines element, we can add our test to the list of pipelines Cocoon will run.
    <map:pipeline>
      <map:match pattern="test" />
      <map:generate src="test.xml" />
      <map:transform src="test2html.xsl" />
      <map:serialize />
    </map:pipeline>
End of preview. The rest of this section is not available here.
- Chapter 20: The Apache API
- Preview
Apache provides an Application Programming Interface (API) to modules to insulate them from the mechanics of the HTTP protocol and from each other. In this chapter, we explore the main concepts of the API and provide a detailed listing of the functions available to the module author.
In previous editions of this book, we described the Apache 1.x API. As you know, things have moved on since then, and Apache 2.x is upon us. The facilities in 2.x include some radical and exciting improvements over 1.x, and furthermore, 1.x has been frozen, apart from maintenance. So we decided that, unlike the rest of the book, we would document only the new API. (Appendix A provides some coverage of the 1.x API.)
Also, in previous editions, we had an API reference section. Because Apache 2.0 has substantially improved API documentation of its own, and because the API is still moving around as we write, we have decided to concentrate on the concepts and examples and refer you to the Web for the API reference. Part of the work we have done while writing this chapter is to help ensure that the online documentation does actually cover all the important APIs.
In this chapter, we will cover the important concepts needed to understand the API and point you to appropriate documentation. In the next chapter, we will illustrate the use of the API through a variety of example modules.
End of preview. The rest of this section is not available here.
- Documentation
- Preview
In Apache 2.0 the Apache Group has gone to great lengths to try to document the API properly. Included in the headers is text that can be used to generate online documentation. Currently it expects to be processed by doxygen, a system similar to javadoc, only designed for use with C and C++. Doxygen can be found at http://www.stack.nl/~dimitri/doxygen/. Doxygen produces a variety of formats, but the only one we actively support is HTML. This format can be made simply by typing:
    make dox
in the top Apache directory. The older target "docs" attempts to use scandoc instead of doxygen, but it doesn't work very well.
We do not reproduce information available in the online documentation here, but rather try to present a broader picture. We did consider including a copy of the documentation in the book, but decided against it because it is still changing quite frequently, and anyway it works much better as HTML documents than printed text.
End of preview. The rest of this section is not available here.
- APR
- Preview
APR is the Apache Portable Runtime. This is a new library, used extensively in 2.0, that abstracts all the system-dependent parts of Apache. This includes file handling, sockets, pipes, threads, locking mechanisms (including file locking, interprocess locking, and interthread locking), and anything else that may vary according to platform.
Although APR is designed to fulfill Apache's needs, it is an entirely independent standalone library with its own development team. It can also be used in other projects that have nothing to do with Apache.
End of preview. The rest of this section is not available here.
- Pools
- Preview
One of the most important things to understand about the Apache API is the idea of a pool. This is a grouped collection of resources (e.g., file handles, memory, child programs, sockets, pipes, and so on) that are released when the pool is destroyed. Almost all resources used within Apache reside in pools, and their use should only be avoided after careful thought.
An interesting feature of pool resources is that many of them can be released only by destroying the pool. Pools may contain subpools, and subpools may contain subsubpools, and so on. When a pool is destroyed, all its subpools are destroyed with it.
Naturally enough, Apache creates a pool at startup, from which all other pools are derived. Configuration information is held in this pool (so it is destroyed and created anew when the server is restarted with a kill). The next level of pool is created for each connection Apache receives and is destroyed at the end of the connection. Since a connection can span several requests, a new pool is created (and destroyed) for each request. In the process of handling a request, various modules create their own pools, and some also create subrequests, which are pushed through the API machinery as if they were real requests. Each of these pools can be accessed through the corresponding structures (e.g., the connect structure, the request structure, and so on).
With this in mind, we can more clearly state when you should not use a pool: when the lifetime of the resource in question does not match the lifetime of a pool. If you need temporary storage (or files, etc.), you can create a subpool of an appropriate pool (the request pool is the most likely candidate) and destroy it when you are done, so lifetimes that are shorter than the pool's are easily handled. The only example we could think of where there was no appropriate pool in Apache 1.3 was the code for handling listeners (copy_listeners( ) and close_unused_listeners( ) in http_main.c), which had a lifetime longer than the topmost pool! However, the introduction in 2.x of pluggable process models has changed this: there is now an appropriate pool, the process pool, which lives in
End of preview. The rest of this section is not available here.
- Per-Server Configuration
- Preview
Since a single instance of Apache may be called on to handle a request for any of the configured virtual hosts (or the main host), a structure is defined that holds the information related to each host. This structure, server_rec, is defined in include/httpd.h:
    struct server_rec {
        /** The process this server is running in */
        process_rec *process;
        /** The next server in the list */
        server_rec *next;

        /** The name of the server */
        const char *defn_name;
        /** The line of the config file that the server was defined on */
        unsigned defn_line_number;

        /* Contact information */

        /** The admin's contact information */
        char *server_admin;
        /** The server hostname */
        char *server_hostname;
        /** for redirects, etc. */
        apr_port_t port;

        /* Log files --- note that transfer log is now in the modules... */

        /** The name of the error log */
        char *error_fname;
        /** A file descriptor that references the error log */
        apr_file_t *error_log;
        /** The log level for this server */
        int loglevel;

        /* Module-specific configuration for server, and defaults... */

        /** true if this is the virtual server */
        int is_virtual;
        /** Config vector containing pointers to modules' per-server config
         *  structures. */
        struct ap_conf_vector_t *module_config;
        /** MIME type info, etc., before we start checking per-directory info */
        struct ap_conf_vector_t *lookup_defaults;

        /* Transaction handling */

        /** I haven't got a clue */
        server_addr_rec *addrs;
        /** Timeout, in seconds, before we give up */
        int timeout;
        /** Seconds we'll wait for another request */
        int keep_alive_timeout;
        /** Maximum requests per connection */
        int keep_alive_max;
        /** Use persistent connections? */
        int keep_alive;

        /** Pathname for ServerPath */
        const char *path;
        /** Length of path */
        int pathlen;

        /** Normal names for ServerAlias servers */
        apr_array_header_t *names;
        /** Wildcarded names for ServerAlias servers */
        apr_array_header_t *wild_names;

        /** limit on size of the HTTP request line */
        int limit_req_line;
        /** limit on size of any request header field */
        int limit_req_fieldsize;
        /** limit on number of request header fields */
        int limit_req_fields;
    };
End of preview. The rest of this section is not available here.
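The next member of server_rec suggests that the per-host records form a simple linked list. As a rough illustration only, the sketch below walks such a list looking for a hostname. The type sketch_server and the function sketch_find_host are hypothetical names invented here, the structure is a cut-down stand-in rather than the real server_rec, and Apache's own virtual-host matching is considerably more involved than this.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Cut-down, hypothetical stand-in for server_rec: only the members
 * this sketch needs, with the same names as in the real structure. */
typedef struct sketch_server sketch_server;
struct sketch_server {
    sketch_server *next;          /* the next server in the list */
    const char *server_hostname;  /* the server hostname */
    int is_virtual;               /* nonzero for a virtual server */
};

/* Walk the list from head and return the first record whose hostname
 * matches exactly. If nothing matches, fall back to the head of the
 * list, which this sketch assumes is the main server. */
static const sketch_server *sketch_find_host(const sketch_server *head,
                                             const char *host)
{
    const sketch_server *s;

    assert(head != NULL && host != NULL);

    for (s = head; s != NULL; s = s->next) {
        if (s->server_hostname != NULL
            && strcmp(s->server_hostname, host) == 0) {
            return s;
        }
    }
    return head;
}
```

A caller would build the list with the main server first and each virtual host chained through next, then pass the hostname taken from the request. The exact, case-sensitive comparison is a simplification chosen for the sketch.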
- Per-Directory Configuration
- Preview
It is also possible for modules to be configured on a per-directory, per-URL, or per-file basis. Again, each module optionally creates its own per-directory configuration (the same structure is used for all three cases). This configuration is made available to modules either directly (during configuration) or indirectly (once the server is running), through the request_rec structure, which is detailed in the next section.
Note that the module doesn't care how the configuration has been set up in terms of servers, directories, URLs, or file matches: the core of the server works out the appropriate configuration for the current request before modules are called by merging the appropriate set of configurations.
The method differs from per-server configuration, so here's an example, taken this time from the standard module, modules/metadata/mod_expires.c:
    typedef struct {
        int active;
        char *expiresdefault;
        apr_table_t *expiresbytype;
    } expires_dir_config;
First we have a per-directory configuration structure:
    static void *create_dir_expires_config(apr_pool_t *p, char *dummy)
    {
        expires_dir_config *new =
            (expires_dir_config *) apr_pcalloc(p, sizeof(expires_dir_config));
        new->active = ACTIVE_DONTCARE;
        new->expiresdefault = "";
        new->expiresbytype = apr_table_make(p, 4);
        return (void *) new;
    }
This is the function that creates it, which will be linked from the module structure, as usual.
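Since the core merges the applicable per-directory configurations before modules are called, it may help to see what a merge step could look like for a structure shaped like expires_dir_config. What follows is a minimal sketch under stated assumptions, not the module's actual merge function: sketch_dir_config and sketch_merge are hypothetical names, the numeric values given to the ACTIVE_ constants are assumed, and the table member is left out so that the example needs no APR headers.

```c
#include <assert.h>
#include <stddef.h>

/* Tri-state flag. The names follow the module; the numeric values
 * are assumptions made for this sketch. */
enum { ACTIVE_OFF = 0, ACTIVE_ON = 1, ACTIVE_DONTCARE = 2 };

/* Hypothetical, simplified stand-in for expires_dir_config
 * (the apr_table_t member is omitted). */
typedef struct {
    int active;
    const char *expiresdefault;
} sketch_dir_config;

/* Combine a parent configuration (base) with a more specific one (add).
 * A value left at its creation-time default in add means "not set
 * here", so the parent's value is inherited. */
static sketch_dir_config sketch_merge(const sketch_dir_config *base,
                                      const sketch_dir_config *add)
{
    sketch_dir_config merged;

    assert(base != NULL && add != NULL);

    merged.active = (add->active == ACTIVE_DONTCARE)
                        ? base->active
                        : add->active;
    merged.expiresdefault = (add->expiresdefault[0] != '\0')
                                ? add->expiresdefault
                                : base->expiresdefault;
    return merged;
}
```

The point of the third state is that a directory which never mentions the flag can be told apart from one that explicitly turns it off, so only an explicit setting overrides the parent.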
Note that the active member is set to a default that can't be set by directives; this is used later on in the merging function.
    static const char *set_expiresactive(cmd_parms *cmd, void *in_dir_config,
                                         int arg)
    {
        expires_dir_config *dir_config = in_dir_config;

        /* if we're here at all it's because someone explicitly
         * set the active flag
         */
        dir_config->active = ACTIVE_ON;
        if (arg == 0) {
            dir_config->active = ACTIVE_OFF;
        };
        return NULL;
    }

    static const char *set_expiresbytype(cmd_parms *cmd, void *in_dir_config,
                                         const char *mime, const char *code)
    {
        expires_dir_config *dir_config = in_dir_config;
        char *response, *real_code;

        if ((response = check_code(cmd->pool, code, &real_code)) == NULL) {
            apr_table_setn(dir_config->expiresbytype, mime, real_code);
            return NULL;
        };
        return apr_pstrcat(cmd->pool,
                           "'ExpiresByType ", mime, " ", code, "': ",
                           response, NULL);
    }

    static const char *set_expiresdefault(cmd_parms *cmd, void *in_dir_config,
                                          const char *code)
    {
        expires_dir_config * dir_config = in_dir_config;
        char *response, *real_code;

        if ((response = check_code(cmd->pool, code, &real_code)) == NULL) {
            dir_config->expiresdefault = real_code;
            return NULL;
        };
        return apr_pstrcat(cmd->pool,
                           "'ExpiresDefault ", code, "': ", response, NULL);
    }

    static const command_rec expires_cmds[] =
    {
        AP_INIT_FLAG("ExpiresActive", set_expiresactive, NULL, DIR_CMD_PERMS,
                     "Limited to 'on' or 'off'"),
        AP_INIT_TAKE2("ExpiresBytype", set_expiresbytype, NULL, DIR_CMD_PERMS,
                      "a MIME type followed by an expiry date code"),
        AP_INIT_TAKE1("ExpiresDefault", set_expiresdefault, NULL, DIR_CMD_PERMS,
                      "an expiry date code"),
        {NULL}
    };
End of preview. The rest of this section is not available here.
- Per-Request Information
- The core ensures that the right information is available to the modules at the right time. It does so by matching requests to the appropriate virtual server and directory information before invoking the various functions in the modules. This, and other information, is packaged in a
request_recstructure, defined in httpd.h:/** A structure that represents the current request */ struct request_rec { /** The pool associated with the request */ apr_pool_t *pool; /** The connection over which this connection has been read */ conn_rec *connection; /** The virtual host this request is for */ server_rec *server; /** If we wind up getting redirected, pointer to the request we * redirected to. */ request_rec *next; /** If this is an internal redirect, pointer to where we redirected * *from*. */ request_rec *prev; /** If this is a sub_request (see request.h) pointer back to the * main request. */ request_rec *main; /* Info about the request itself... we begin with stuff that only * protocol.c should ever touch... */ /** First line of request, so we can log it */ char *the_request; /** HTTP/0.9, "simple" request */ int assbackwards; /** A proxy request (calculated during post_read_request/translate_name) * possible values PROXYREQ_NONE, PROXYREQ_PROXY, PROXYREQ_REVERSE */ int proxyreq; /** HEAD request, as opposed to GET */ int header_only; /** Protocol, as given to us, or HTTP/0.9 */ char *protocol; /** Number version of protocol; 1.1 = 1001 */ int proto_num; /** Host, as set by full URI or Host: */ const char *hostname; /** When the request started */ apr_time_t request_time; /** Status line, if set by script */ const char *status_line; /** In any case */ int status; /* Request method, two ways; also, protocol, etc.. Outside of protocol.c, * look, but don't touch. */ /** GET, HEAD, POST, etc. */ const char *method; /** M_GET, M_POST, etc. */ int method_number; /** * allowed is a bitvector of the allowed methods. * * A handler must ensure that the request method is one that * it is capable of handling. Generally modules should DECLINE * any request methods they do not handle. Prior to aborting the * handler like this the handler should set r->allowed to the list * of methods that it is willing to handle. 
This bitvector is used * to construct the "Allow:" header required for OPTIONS requests, * and HTTP_METHOD_NOT_ALLOWED and HTTP_NOT_IMPLEMENTED status codes. * * Since the default_handler deals with OPTIONS, all modules can * usually decline to deal with OPTIONS. TRACE is always allowed, * modules don't need to set it explicitly. * * Since the default_handler will always handle a GET, a * module which does *not* implement GET should probably return * HTTP_METHOD_NOT_ALLOWED. Unfortunately this means that a Script GET * handler can't be installed by mod_actions. */ int allowed; /** Array of extension methods */ apr_array_header_t *allowed_xmethods; /** List of allowed methods */ ap_method_list_t *allowed_methods; /** byte count in stream is for body */ int sent_bodyct; /** body byte count, for easy access */ long bytes_sent; /** Time the resource was last modified */ apr_time_t mtime; /* HTTP/1.1 connection-level features */ /** sending chunked transfer-coding */ int chunked; /** multipart/byteranges boundary */ const char *boundary; /** The Range: header */ const char *range; /** The "real" content length */ apr_off_t clength; /** bytes left to read */ apr_size_t remaining; /** bytes that have been read */ long read_length; /** how the request body should be read */ int read_body; /** reading chunked transfer-coding */ int read_chunked; /** is client waiting for a 100 response? */ unsigned expecting_100; /* MIME header environments, in and out. Also, an array containing * environment variables to be passed to subprocesses, so people can * write modules to add to that environment. * * The difference between headers_out and err_headers_out is that the * latter are printed even on error, and persist across internal redirects * (so the headers printed for ErrorDocument handlers will have them). * * The 'notes' apr_table_t is for notes from one module to another, with no * other set purpose in mind... 
*/ /** MIME header environment from the request */ apr_table_t *headers_in; /** MIME header environment for the response */ apr_table_t *headers_out; /** MIME header environment for the response, printed even on errors and * persist across internal redirects */ apr_table_t *err_headers_out; /** Array of environment variables to be used for sub processes */ apr_table_t *subprocess_env; /** Notes from one module to another */ apr_table_t *notes; /* content_type, handler, content_encoding, content_language, and all * content_languages MUST be lowercased strings. They may be pointers * to static strings; they should not be modified in place. */ /** The content-type for the current request */ const char *content_type; /* Break these out --- we dispatch on 'em */ /** The handler string that we use to call a handler function */ const char *handler; /* What we *really* dispatch on */ /** How to encode the data */ const char *content_encoding; /** for back-compat. only -- do not use */ const char *content_language; /** array of (char*) representing the content languages */ apr_array_header_t *content_languages; /** variant list validator (if negotiated) */ char *vlist_validator; /** If an authentication check was made, this gets set to the user name. */ char *user; /** If an authentication check was made, this gets set to the auth type. */ char *ap_auth_type; /** This response is non-cache-able */ int no_cache; /** There is no local copy of this response */ int no_local_copy; /* What object is being requested (either directly, or via include * or content-negotiation mapping). */ /** the uri without any parsing performed */ char *unparsed_uri; /** the path portion of the URI */ char *uri; /** The filename on disk that this response corresponds to */ char *filename; /** The path_info for this request if there is any. 
*/ char *path_info; /** QUERY_ARGS, if any */ char *args; /** ST_MODE set to zero if no such file */ apr_finfo_t finfo; /** components of uri, dismantled */ apr_uri_components parsed_uri; /* Various other config info which may change with .htaccess files * These are config vectors, with one void* pointer for each module * (the thing pointed to being the module's business). */ /** Options set in config files, etc. */ struct ap_conf_vector_t *per_dir_config; /** Notes on *this* request */ struct ap_conf_vector_t *request_config; /** * a linked list of the configuration directives in the .htaccess files * accessed by this request. * N.B. always add to the head of the list, _never_ to the end. * that way, a sub request's list can (temporarily) point to a parent's list */ const struct htaccess_result *htaccess; /** A list of output filters to be used for this request */ struct ap_filter_t *output_filters; /** A list of input filters to be used for this request */ struct ap_filter_t *input_filters; /** A flag to determine if the eos bucket has been sent yet */ int eos_sent; /* Things placed at the end of the record to avoid breaking binary * compatibility. It would be nice to remember to reorder the entire * record to improve 64bit alignment the next time we need to break * binary compatibility for some other reason. */ };Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Access to Configuration and Request Information
- All this sounds horribly complicated, and, to be honest, it is. But unless you plan to mess around with the guts of Apache (which this book does not encourage you to do), all you really need to know is that these structures exist and that your module can access them at the appropriate moments. Each function exported by a module gets access to the appropriate structure to enable it to function. The appropriate structure depends on the function, of course, but it is typically either a server_rec, the module's per-directory configuration structure (or two), or a request_rec. As we saw earlier, if you have a server_rec, you can get access to your per-server configuration, and if you have a request_rec, you can get access to both your per-server and your per-directory configurations.
- Hooks, Optional Hooks, and Optional Functions
- In Apache 1.x, modules hooked into the appropriate "phases" of the main server by putting functions into appropriate slots in the module structure. This process is known as "hooking." This has been revised in Apache 2.0: instead, a single function is called at startup in each module, and this registers the functions that need to be called. The registration process also permits the module to specify how it should be ordered relative to other modules for each hook. (In Apache 1.x this was only possible for all hooks in a module at once, not individually, and it had to be done in the configuration file rather than by the module itself.)

This approach has various advantages. First, the list of hooks can be extended arbitrarily without causing each module structure to have a huge unwieldy list of NULL entries. Second, optional modules can export their own hooks, which are only invoked when the module is present but can be registered regardless, and this can be done without modification of the core code.

Another feature of hooks that we think is pretty cool is that, although they are dynamic, they are still typesafe; that is, the compiler will complain if the type of the function registered for a hook doesn't match the hook (and each hook can use a different type of function). They are also extremely efficient.

So, what exactly is a hook? It's a point at which a module can request to be called. Each hook specifies a function prototype, and each module can specify one function (or more than one in 2.0) that gets called at the appropriate moment. When the moment arrives, the provider of the hook calls the registered functions in order. A hook function can return "declined," "ok," or an error, and a hook runs in one of two ways: either every function is called until one returns an error (if one does, of course), or functions are called until one returns either an error or "ok." A slight complication in Apache 2.0 is that because each hook can define its own return type, it must also define how "ok," "declined," and errors are returned (in 1.x, the return type was fixed, so this was easier).
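The two ways of running a hook can be sketched in a few lines of standalone C. This is only an illustration of the idea, not the Apache hook machinery: the names hook_list, hook_register, hook_run_all, hook_run_first, and the two example functions are hypothetical, and the real server generates its hook code with macros. The values of OK and DECLINED match those Apache uses for hooks that return an int.

```c
#include <assert.h>
#include <stddef.h>

/* Return conventions, mirroring the values Apache uses for int hooks. */
#define OK        0
#define DECLINED -1

#define MAX_HOOKS 8

/* Hypothetical hook prototype: every registered function must match it,
 * so the compiler rejects a function of the wrong type. */
typedef int (*check_fn)(const char *uri);

typedef struct {
    check_fn fns[MAX_HOOKS];
    size_t count;
} hook_list;

/* Register a function; functions are later called in registration order. */
static void hook_register(hook_list *h, check_fn fn)
{
    assert(h->count < MAX_HOOKS);
    h->fns[h->count++] = fn;
}

/* "Run all": call every function, stopping only on an error
 * (any value other than OK or DECLINED). */
static int hook_run_all(const hook_list *h, const char *uri)
{
    for (size_t i = 0; i < h->count; i++) {
        int rv = h->fns[i](uri);
        if (rv != OK && rv != DECLINED)
            return rv;
    }
    return OK;
}

/* "Run first": stop at the first function that does not decline,
 * whether it returned OK or an error. */
static int hook_run_first(const hook_list *h, const char *uri)
{
    for (size_t i = 0; i < h->count; i++) {
        int rv = h->fns[i](uri);
        if (rv != DECLINED)
            return rv;
    }
    return DECLINED;
}

/* Two example hook functions (illustrative only). */
static int decline_everything(const char *uri)
{
    (void)uri;
    return DECLINED;
}

static int forbid_private(const char *uri)
{
    /* Treat any URI starting with "/p" as forbidden (status 403). */
    return (uri[0] == '/' && uri[1] == 'p') ? 403 : OK;
}
```

A caller would register both functions on one hook_list and then pick the runner that matches the hook's semantics: hook_run_all suits phases where every module gets a say, hook_run_first suits phases where one module takes responsibility.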
- Filters, Buckets, and Bucket Brigades
- A new feature of Apache 2.0 is the ability to create filters, as described in Chapter 6. These are modules (or parts of modules) that modify the output or input of other modules in some way. Over the course of Apache's development, it has often been said that these could only be done in a threaded server, because then you can make the process look just like reading and writing files. Early attempts to do it without threading met the argument that the required "inside out" model would be too hard for most module writers to handle. So, when Apache 2.0 came along with threading as a standard feature, there was much rejoicing. But wait! Unfortunately, even in 2.0, there are platforms that don't handle threading and process models that don't use it even if the platform supports it. So, we were back at square one. But, strangely, a new confidence in the ability of module writers meant that people suddenly believed that they could handle the "inside out" programming model. And so, bucket brigades were born.

The general concept is that each "layer" in the filter stack can talk to the next layer up (or down, depending on whether it is an input filter or an output filter) and deal with the I/O between them by handing up (or down) "bucket brigades," each of which is a list of "buckets." Each bucket can contain some data, which should be dealt with in order by the filter, which, in turn, generates new bucket brigades and buckets.

Of course, there is an obvious asymmetry between input filters and output filters. Despite its obviousness, it takes a bit of getting used to when writing filters. An output filter is called with a bucket brigade and told "here, deal with the contents of this." In turn, it creates new bucket brigades and hands them on to the downstream filters. In contrast, an input filter gets asked "could you please fill this brigade?" and must, in turn, call lower-level filters to seed the input.

Of course, there are special cases for the ends of brigades: the "bottom" end will actually receive or send data (often through a special bucket) and the "top" end will consume or generate data without any higher (for output) or lower (for input) filter feeding it.
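The output side of this model can be sketched in standalone C. Everything here is a simplified assumption made for illustration: bucket, brigade, filter, brigade_append, brigade_destroy, upper_filter, and sink_filter are hypothetical names, memory comes from malloc rather than pools, and none of it is the APR bucket API. The sketch shows an output filter that is handed a brigade, builds a new one, and passes it to the next filter down, with a "bottom" filter that stands in for actually sending the data.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* A bucket holds one chunk of data; a brigade is an ordered list of them. */
typedef struct bucket {
    char *data;
    size_t len;
    struct bucket *next;
} bucket;

typedef struct {
    bucket *head;
    bucket *tail;
} brigade;

/* Append a copy of data to the end of the brigade; 0 on success. */
static int brigade_append(brigade *bb, const char *data, size_t len)
{
    bucket *b = malloc(sizeof *b);
    if (b == NULL)
        return -1;
    b->data = malloc(len ? len : 1);
    if (b->data == NULL) {
        free(b);
        return -1;
    }
    memcpy(b->data, data, len);
    b->len = len;
    b->next = NULL;
    if (bb->tail)
        bb->tail->next = b;
    else
        bb->head = b;
    bb->tail = b;
    return 0;
}

/* Free every bucket and leave the brigade empty. */
static void brigade_destroy(brigade *bb)
{
    bucket *b = bb->head;
    while (b) {
        bucket *next = b->next;
        free(b->data);
        free(b);
        b = next;
    }
    bb->head = bb->tail = NULL;
}

/* One layer in the output filter stack. */
typedef struct filter {
    int (*fn)(struct filter *self, brigade *in);
    struct filter *next;  /* downstream filter; NULL at the bottom */
    brigade *sink;        /* used only by the bottom filter */
} filter;

/* Bottom of the stack: "sends" data by collecting it in self->sink. */
static int sink_filter(filter *self, brigade *in)
{
    for (bucket *b = in->head; b; b = b->next)
        if (brigade_append(self->sink, b->data, b->len) != 0)
            return -1;
    return 0;
}

/* Output filter: uppercases each bucket, in order, into a new brigade
 * and hands that brigade to the downstream filter. */
static int upper_filter(filter *self, brigade *in)
{
    brigade out = { NULL, NULL };
    int rv = 0;

    assert(self->next != NULL);
    for (bucket *b = in->head; b && rv == 0; b = b->next) {
        rv = brigade_append(&out, b->data, b->len);
        if (rv == 0)
            for (size_t i = 0; i < out.tail->len; i++)
                out.tail->data[i] =
                    (char)toupper((unsigned char)out.tail->data[i]);
    }
    if (rv == 0)
        rv = self->next->fn(self->next, &out);
    brigade_destroy(&out);
    return rv;
}
```

An input filter would invert the call direction: it would be handed an empty brigade and would call the filter below it to obtain data before transforming it.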
- Modules
- Almost everything in this chapter has been illustrated by a module implementing some kind of functionality. But how do modules fit into Apache? In fact, almost all of the work is done in the module itself; the only extra step is to add the module to the config.m4 file in its directory, which gets incorporated into the configure script. The lines for two of the modules illustrated earlier are:

APACHE_MODULE(optional_fn_import, example optional function importer, , , no)
APACHE_MODULE(optional_fn_export, example optional function exporter, , , no)

The two modules can be enabled with the --enable-optional-fn-export and --enable-optional-fn-import flags to configure. Of course, the whole point is that you can enable either, both, or neither, and they will always work correctly.

The complete list of arguments for APACHE_MODULE() is:

APACHE_MODULE(name, helptext[, objects[, structname[, default[, config]]]])

where:
- name: This is the name of the module, which normally matches the source filename (i.e., it is mod_name.c).
- helptext: This is the text displayed when configure is run with --help as an argument.
- objects: If this is present, it overrides the default object file of mod_name.o.
- structname: The module structure is called name_module by default, but if this is present, it overrides it.
- default: If present, this determines when the module is included. If set to yes, the module is always included unless explicitly disabled. If no, the module is never included unless explicitly enabled. If most, then it is not enabled unless --enable-most is specified. If absent or all, then it is only enabled when --enable-all is specified.
- Chapter 21: Writing Apache Modules
- One of the great things about Apache is that if you don't like what it does, you can change it. Now, this is actually true for any package with source code available, but Apache makes this easier. It has a generalized interface to modules that extends the functionality of the base package. In fact, when you download Apache, you get far more than just the base package, which is barely capable of serving files at all. You get all the modules the Apache Group considers vital to a web server. You also get modules that are useful enough to most people to be worth the effort of the Group to maintain them. In this chapter, we explore the intricacies of programming modules for Apache. We expect you to be thoroughly conversant with C and Unix (or Win32), because we are not going to explain anything about them. Refer to Chapter 20 or your Unix/Win32 manuals for information about functions used in the examples. We start out by explaining how to write a module for both Apache 1.3 and 2.0. We also explain how to port a 1.3 module to Apache v2.0.
- Overview
- Perhaps the most important part of an Apache module is the module structure. This is defined in http_config.h, so all modules should start (apart from copyright notices, etc.) with the following lines:

#include "httpd.h"
#include "http_config.h"

Note that httpd.h is required for all Apache source code.

What is the module structure for? Simple: it provides the glue between the Apache core and the module's code. It contains pointers (to functions, lists, and so on) that are used by components of the core at the correct moments. The core knows about the various module structures because they are listed in modules.c, which is generated by the Configure script from the Configuration file.

Traditionally, each module ends with its module structure. Here is a particularly trivial example, from mod_asis.c (1.3):

module asis_module = {
    STANDARD_MODULE_STUFF,
    NULL,           /* initializer */
    NULL,           /* create per-directory config structure */
    NULL,           /* merge per-directory config structures */
    NULL,           /* create per-server config structure */
    NULL,           /* merge per-server config structures */
    NULL,           /* command table */
    asis_handlers,  /* handlers */
    NULL,           /* translate_handler */
    NULL,           /* check_user_id */
    NULL,           /* check auth */
    NULL,           /* check access */
    NULL,           /* type_checker */
    NULL,           /* prerun fixups */
    NULL,           /* logger */
    NULL,           /* header parser */
    NULL,           /* child_init */
    NULL,           /* child_exit */
    NULL            /* post read request */
};

The first entry, STANDARD_MODULE_STUFF, must appear in all module structures. It initializes some structure elements that the core uses to manage modules. Currently, these are the API version number, the index of the module in various vectors, the name of the module (actually, its filename), and a pointer to the next...
- Status Codes
- The HTTP 1.1 standard defines many status codes that can be returned as a response to a request. Most of the functions involved in processing a request return OK, DECLINED, or a status code. DECLINED generally means that the module is not interested in processing the request; OK means it did process it, or that it is happy for the request to proceed, depending on which function was called. Generally, a status code is simply returned to the user agent, together with any headers defined in the request structure's headers_out table. At the time of writing, the status codes predefined in httpd.h were as follows:

#define HTTP_CONTINUE                      100
#define HTTP_SWITCHING_PROTOCOLS           101
#define HTTP_OK                            200
#define HTTP_CREATED                       201
#define HTTP_ACCEPTED                      202
#define HTTP_NON_AUTHORITATIVE             203
#define HTTP_NO_CONTENT                    204
#define HTTP_RESET_CONTENT                 205
#define HTTP_PARTIAL_CONTENT               206
#define HTTP_MULTIPLE_CHOICES              300
#define HTTP_MOVED_PERMANENTLY             301
#define HTTP_MOVED_TEMPORARILY             302
#define HTTP_SEE_OTHER                     303
#define HTTP_NOT_MODIFIED                  304
#define HTTP_USE_PROXY                     305
#define HTTP_BAD_REQUEST                   400
#define HTTP_UNAUTHORIZED                  401
#define HTTP_PAYMENT_REQUIRED              402
#define HTTP_FORBIDDEN                     403
#define HTTP_NOT_FOUND                     404
#define HTTP_METHOD_NOT_ALLOWED            405
#define HTTP_NOT_ACCEPTABLE                406
#define HTTP_PROXY_AUTHENTICATION_REQUIRED 407
#define HTTP_REQUEST_TIME_OUT              408
#define HTTP_CONFLICT                      409
#define HTTP_GONE                          410
#define HTTP_LENGTH_REQUIRED               411
#define HTTP_PRECONDITION_FAILED           412
#define HTTP_REQUEST_ENTITY_TOO_LARGE      413
#define HTTP_REQUEST_URI_TOO_LARGE         414
#define HTTP_UNSUPPORTED_MEDIA_TYPE        415
#define HTTP_INTERNAL_SERVER_ERROR         500
#define HTTP_NOT_IMPLEMENTED               501
#define HTTP_BAD_GATEWAY                   502
#define HTTP_SERVICE_UNAVAILABLE           503
#define HTTP_GATEWAY_TIME_OUT              504
#define HTTP_VERSION_NOT_SUPPORTED         505
#define HTTP_VARIANT_ALSO_VARIES           506
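To make the three kinds of return value concrete, here is a minimal standalone sketch of a handler-style function. It does not use the Apache headers: mini_request is a hypothetical, cut-down stand-in for the request structure, sketch_handler and the handler name "sketch" are invented for illustration, and the constants are redefined locally with the same values Apache uses.

```c
#include <assert.h>
#include <string.h>

/* Same values as Apache's own definitions; repeated here so the
 * sketch compiles without httpd.h. */
#define OK        0
#define DECLINED -1
#define HTTP_NOT_FOUND          404
#define HTTP_METHOD_NOT_ALLOWED 405

/* Hypothetical stand-in for the few request fields the sketch needs. */
typedef struct {
    const char *handler;   /* handler name chosen for this request */
    const char *method;    /* "GET", "POST", ... */
    int file_exists;       /* nonzero if the mapped file was found */
} mini_request;

static int sketch_handler(const mini_request *r)
{
    assert(r != NULL && r->method != NULL);

    /* Not our request: let another module deal with it. */
    if (r->handler == NULL || strcmp(r->handler, "sketch") != 0)
        return DECLINED;

    /* Ours, but a method we do not implement: return a status code. */
    if (strcmp(r->method, "GET") != 0)
        return HTTP_METHOD_NOT_ALLOWED;

    /* Ours, but nothing to serve. */
    if (!r->file_exists)
        return HTTP_NOT_FOUND;

    /* Request processed successfully. */
    return OK;
}
```

The ordering matters: declining comes first, so that a module never returns an error for a request that was never meant for it.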
- The Module Structure
- Now we will look in detail at each entry in the module structure. We examine the entries in the order in which they are used, which is not the order in which they appear in the structure, and we also show how they are used in the standard Apache modules. We will also note the differences between versions 1.3 and 2.0 of Apache as we go along.

Create Per-Server Config Structure

void *module_create_svr_config(pool *pPool, server_rec *pServer)

This function creates the per-server configuration structure for the module. It is called once for the main server and once per virtual host. It allocates and initializes the memory for the per-server configuration and returns a pointer to it. pServer points to the server_rec for the current server. See Example 21-1 (1.3) for an excerpt from mod_cgi.c.

Example 21-1. mod_cgi.c

#define DEFAULT_LOGBYTES 10385760
#define DEFAULT_BUFBYTES 1024

typedef struct {
    char *logname;
    long logbytes;
    int bufbytes;
} cgi_server_conf;

static void *create_cgi_config(pool *p, server_rec *s)
{
    cgi_server_conf *c =
        (cgi_server_conf *) ap_pcalloc(p, sizeof(cgi_server_conf));
    c->logname = NULL;
    c->logbytes = DEFAULT_LOGBYTES;
    c->bufbytes = DEFAULT_BUFBYTES;
    return c;
}

All this code does is allocate and initialize a copy of cgi_server_conf, which gets filled in during configuration.

The only changes for 2.0 in this are that pool becomes apr_pool_t and ap_pcalloc() becomes apr_pcalloc().

Create Per-Directory Config Structure

void *module_create_dir_config(pool *pPool, char *szDir)

This function is called once per module, with szDir set to NULL, when the main host's configuration is initialized and again for each <Directory>, <Location>, or <Files> section in the Config files containing a directive from this module, with szDir set to the directory. Any per-directory directives found outside <Directory>, <Location>, or <Files> sections end up in the NULL configuration. It is also called when...
- A Complete Example
- We spent some time trying to think of an example of a module that uses all the available hooks. At the same time, we spent considerable effort tracking through the innards of Apache to find out what happened when. Then we suddenly thought of writing a module to show what happened when. And, presto, mod_reveal.c was born. This is not a module you'd want to include in a live Apache without modification, since it prints stuff to the standard error output (which ends up in the error log, for the most part). But rather than obscure the main functionality by including code to switch the monitoring on and off, we thought it best to keep it simple. Besides, even in this form the module is very useful; it's presented and explained in this section.

The module implements two commands, RevealServerTag and RevealTag. RevealServerTag names a server section and is stored in the per-server configuration. RevealTag names a directory (or location or file) section and is stored in the per-directory configuration. When per-server or per-directory configurations are merged, the resulting configuration is tagged with a combination of the tags of the two merged sections. The module also implements a handler, which generates HTML with interesting information about a URL.

No self-respecting module starts without a copyright notice:

/* Reveal the order in which things are done. Copyright (C) 1996, 1998 Ben Laurie */

Note that the included http_protocol.h is only needed for the request handler; the other two are required by almost all modules:

#include "httpd.h"
#include "http_config.h"
#include "http_protocol.h"
#include "http_request.h" [2.0]
#include "apr_strings.h" [2.0]
#include "http_connection.h" [2.0]
#include "http_log.h" [2.0]
#include "http_core.h" [2.0]
#include "scoreboard.h" [2.0]
#include <unistd.h> [2.0]

The per-directory configuration structure is:

typedef struct {
    char *szDir;
    char *szTag;
} SPerDir;

And the per-server configuration structure is:

typedef struct {
    char *szServer;
    char *szTag;
} SPerServer;

There is an unavoidable circular reference in most modules; the...
- General Hints
- Apache 2.0 may well be multithreaded (depending on the MPM in use), and, of course, the Win32 version always is. If you want your module to stand the test of time, you should avoid global variables, if at all possible. If not possible, put some thought into how they will be used by a multithreaded server. Don't forget that you can use the notes table in the request record to store any per-request data you may need to pass between hooks.

Never use a fixed-length buffer. Many of the security holes found in Internet software have fixed-length buffers at their root. The pool mechanism provides a rich set of tools you can use to avoid the need for fixed-length buffers.

Remember that your module is just one of a random set an Apache user may configure into his server. Don't rely on anything that may be peculiar to your own setup. And don't do anything that might interfere with other modules (a tall order, we know, but do your best!).
- Porting to Apache 2.0
- In addition to the earlier discussion on how to write a module from scratch for Apache 2.0, which is broadly the same as for 1.x, we'll show how to port one.

First of all, it is probably easiest to compile the module using apxs (although we are not keen on this approach, it is definitely the easiest, sadly). You'll need to have configured Apache like this:

./configure --enable-so

Then compiling mod_reveal is easy:

apxs -c mod_reveal.c

This will, once it's working, yield .libs/mod_reveal.so (use the -i option, and apxs will obligingly install it in /usr/local/apache2/lib). However, compiling the Apache 1.x version of mod_reveal produces a large number of errors (note that you might save yourself some agony by adding -Wc,-Wall and -Wc,-Werror to the command line). The first problem is that some headers have been split up and moved around. So, we had to add:

#include "http_request.h"

to get the definition for server_rec.

Also, many data structures and functions in Apache 1.3 had names that could cause conflict with other libraries. So, they have all been prefixed in an attempt to make them unique. The prefixes are ap_, apr_, and apu_, depending on whether they belong to Apache, APR, or APR-util. If they are data structures, they typically have also had _t appended. So, pool has become apr_pool_t. Many functions have also moved from ap_ to apr_; for example, ap_pstrcat() has become apr_pstrcat() and now needs the header apr_strings.h.

Functions that didn't take pool arguments now do. For example:

ap_add_version_component("Reveal/0.0");

becomes:

ap_add_version_component(pPool,"Reveal/0.0");

The command structure is now typesafe and uses special macros for each type of command, depending on the number of parameters it takes. For example:

static command_rec aCommands[]=
{
    { "RevealTag", RevealTag, NULL, ACCESS_CONF|OR_ALL, TAKE1,
      "a tag for this section"},
    { "RevealServerTag", RevealServerTag, NULL, RSRC_CONF, TAKE1,
      "a tag for this server" },
    { NULL }
};

becomes:

static command_rec aCommands[]=
{
    AP_INIT_TAKE1("RevealTag", RevealTag, NULL, ACCESS_CONF|OR_ALL,
                  "a tag for this section"),
    AP_INIT_TAKE1("RevealServerTag", RevealServerTag, NULL, RSRC_CONF,
                  "a tag for this server" ),
    { NULL }
};
- Appendix A: The Apache 1.x API
- Apache 1.x provides an Application Programming Interface (API) to modules to insulate them from the mechanics of the HTTP protocol and from each other. In this appendix, we explore the main concepts of the API and provide a detailed listing of the functions available to the module author targeting Apache 1.x.
Section A.1: Pools
Section A.2: Per-Server Configuration
Section A.3: Per-Directory Configuration
Section A.4: Per-Request Information
Section A.5: Access to Configuration and Request Information
Section A.6: Functions
- Pools
- The most important thing to understand about the Apache API is the idea of a pool. This is a grouped collection of resources (i.e., file handles, memory, child programs, sockets, pipes, and so on) that are released when the pool is destroyed. Almost all resources used within Apache reside in pools, and their use should only be avoided with careful thought.

An interesting feature of pool resources is that many of them can be released only by destroying the pool. Pools may contain subpools, and subpools may contain subsubpools, and so on. When a pool is destroyed, all its subpools are destroyed with it.

Naturally enough, Apache creates a pool at startup, from which all other pools are derived. Configuration information is held in this pool (so it is destroyed and created anew when the server is restarted with a kill). The next level of pool is created for each connection Apache receives and is destroyed at the end of the connection. Since a connection can span several requests, a new pool is created (and destroyed) for each request. In the process of handling a request, various modules create their own pools, and some also create subrequests, which are pushed through the API machinery as if they were real requests. Each of these pools can be accessed through the corresponding structures (i.e., the connect structure, the request structure, and so on).

With this in mind, we can more clearly state when you should not use a pool: when the lifetime of the resource in question does not match the lifetime of a pool. If you need temporary storage (or files, etc.), you can create a subpool of a convenient pool (the request pool is the most likely candidate) and destroy it when you are done, so having a lifetime that is shorter than the pool's is not normally a good enough excuse. The only example we can think of where there is no appropriate pool is the code for handling listeners (copy_listeners() and close_unused_listeners() in http_main.c), which have a lifetime longer than the topmost pool!

There are a number of advantages to this approach, the most obvious being that modules can use resources without having to worry about when and how to release them. This is particularly useful when Apache handles an error condition. It simply bails out, destroying the pool associated with the erroneous request, confident that everything will be neatly cleaned up. Since each instance of Apache may handle many requests, this functionality is vital to the reliability of the server. Unsurprisingly, pools come into almost every aspect of Apache's API, as we shall see in this chapter. They are defined in...
- Per-Server Configuration
- Content preview: Since a single instance of Apache may be called on to handle a request for any of the configured virtual hosts (or the main host), a structure is defined that holds the information related to each host. This structure, server_rec, is defined in httpd.h:

      struct server_rec {
          server_rec *next;

          /* Description of where the definition came from */
          const char *defn_name;
          unsigned defn_line_number;

          /* Full locations of server config info */
          char *srm_confname;
          char *access_confname;

          /* Contact information */
          char *server_admin;
          char *server_hostname;
          unsigned short port;        /* For redirects, etc. */

          /* Log files --- note that transfer log is now in the modules... */
          char *error_fname;
          FILE *error_log;
          int loglevel;

          /* Module-specific configuration for server, and defaults... */
          int is_virtual;             /* True if this is the virtual server */
          void *module_config;        /* Config vector containing pointers to
                                       * modules' per-server config structures. */
          void *lookup_defaults;      /* MIME type info, etc., before we start
                                       * checking per-directory info. */

          /* Transaction handling */
          server_addr_rec *addrs;
          int timeout;                /* Timeout, in seconds, before we give up */
          int keep_alive_timeout;     /* Seconds we'll wait for another request */
          int keep_alive_max;         /* Maximum requests per connection */
          int keep_alive;             /* Use persistent connections? */
          int send_buffer_size;       /* Size of TCP send buffer (in bytes) */

          char *path;                 /* Pathname for ServerPath */
          int pathlen;                /* Length of path */

          char *names;                /* Normal names for ServerAlias servers */
          array_header *wild_names;   /* Wildcarded names for ServerAlias servers */

          uid_t server_uid;           /* Effective user ID when calling exec wrapper */
          gid_t server_gid;           /* Effective group ID when calling exec wrapper */
      };

  End of content preview. The rest of this section is not viewable here.
- Per-Directory Configuration
- Content preview: It is also possible for modules to be configured on a per-directory, per-URL, or per-file basis. Again, each module optionally creates its own per-directory configuration (the same structure is used for all three cases). This configuration is made available to modules either directly (during configuration) or indirectly (once the server is running, through the request_rec structure, detailed in the next section).

  End of content preview. The rest of this section is not viewable here.
- Per-Request Information
- Content preview: The core ensures that the right information is available to the modules at the right time by matching requests to the appropriate virtual server and directory information before invoking the various functions in the modules. This, and other information, is packaged in a request_rec structure, defined in httpd.h:

      struct request_rec {
          ap_pool *pool;
          conn_rec *connection;
          server_rec *server;

          request_rec *next;          /* If we wind up getting redirected,
                                       * pointer to the request we redirected to. */
          request_rec *prev;          /* If this is an internal redirect,
                                       * pointer to where we redirected *from*. */
          request_rec *main;          /* If this is a subrequest (see request.h),
                                       * pointer back to the main request. */

          /* Info about the request itself... we begin with stuff that only
           * protocol.c should ever touch...
           */
          char *the_request;          /* First line of request, so we can log it */
          int assbackwards;           /* HTTP/0.9, "simple" request */
          int proxyreq;               /* A proxy request (calculated during
                                       * post_read_request or translate_name) */
          int header_only;            /* HEAD request, as opposed to GET */
          char *protocol;             /* Protocol, as given to us, or HTTP/0.9 */
          int proto_num;              /* Number version of protocol; 1.1 = 1001 */
          const char *hostname;       /* Host, as set by full URI or Host: */
          time_t request_time;        /* When the request started */
          char *status_line;          /* Status line, if set by script */
          int status;                 /* In any case */

          /* Request method, two ways; also, protocol, etc. Outside of protocol.c,
           * look, but don't touch.
           */
          char *method;               /* GET, HEAD, POST, etc. */
          int method_number;          /* M_GET, M_POST, etc. */

          /*
           * allowed is a bitvector of the allowed methods.
           *
           * A handler must ensure that the request method is one that
           * it is capable of handling. Generally modules should DECLINE
           * any request methods they do not handle. Prior to aborting the
           * handler like this, the handler should set r->allowed to the list
           * of methods that it is willing to handle. This bitvector is used
           * to construct the "Allow:" header required for OPTIONS requests,
           * and METHOD_NOT_ALLOWED and NOT_IMPLEMENTED status codes.
           *
           * Since the default_handler deals with OPTIONS, all modules can
           * usually decline to deal with OPTIONS. TRACE is always allowed;
           * modules don't need to set it explicitly.
           *
           * Since the default_handler will always handle a GET, a
           * module which does *not* implement GET should probably return
           * METHOD_NOT_ALLOWED. Unfortunately, this means that a Script GET
           * handler can't be installed by mod_actions.
           */
          int allowed;                /* Allowed methods - for 405, OPTIONS, etc. */

          int sent_bodyct;            /* Byte count in stream is for body */
          long bytes_sent;            /* Body byte count, for easy access */
          time_t mtime;               /* Time the resource was last modified */

          /* HTTP/1.1 connection-level features */
          int chunked;                /* Sending chunked transfer-coding */
          int byterange;              /* Number of byte ranges */
          char *boundary;             /* Multipart/byteranges boundary */
          const char *range;          /* The Range: header */
          long clength;               /* The "real" content length */
          long remaining;             /* Bytes left to read */
          long read_length;           /* Bytes that have been read */
          int read_body;              /* How the request body should be read */
          int read_chunked;           /* Reading chunked transfer-coding */

          /* MIME header environments, in and out. Also, an array containing
           * environment variables to be passed to subprocesses, so people can
           * write modules to add to that environment.
           *
           * The difference between headers_out and err_headers_out is that the
           * latter are printed even on error and persist across internal redirects
           * (so the headers printed for ErrorDocument handlers will have them).
           *
           * The 'notes' table is for notes from one module to another, with no
           * other set purpose in mind...
           */
          table *headers_in;
          table *headers_out;
          table *err_headers_out;
          table *subprocess_env;
          table *notes;

          /* content_type, handler, content_encoding, content_language, and all
           * content_languages MUST be lowercased strings. They may be pointers
           * to static strings; they should not be modified in place.
           */
          char *content_type;         /* Break these out --- we dispatch on 'em */
          char *handler;              /* What we *really* dispatch on */
          char *content_encoding;
          char *content_language;
          array_header *content_languages;    /* Array of (char*) */
          int no_cache;
          int no_local_copy;

          /* What object is being requested (either directly, or via include
           * or content-negotiation mapping).
           */
          char *unparsed_uri;         /* The URI without any parsing performed */
          char *uri;                  /* The path portion of the URI */
          char *filename;
          char *path_info;
          char *args;                 /* QUERY_ARGS, if any */
          struct stat finfo;          /* ST_MODE set to zero if no such file */
          uri_components parsed_uri;  /* Components of URI, dismantled */

          /* Various other config info, which may change with .htaccess files.
           * These are config vectors, with one void* pointer for each module
           * (the thing pointed to being the module's business).
           */
          void *per_dir_config;       /* Options set in config files, etc. */
          void *request_config;       /* Notes on *this* request */

          /*
           * A linked list of the configuration directives in the .htaccess files
           * accessed by this request.
           * N.B. Always add to the head of the list, _never_ to the end.
           * That way, a subrequest's list can (temporarily) point to a parent's
           * list.
           */
          const struct htaccess_result *htaccess;
      };

  End of content preview. The rest of this section is not viewable here.
- Access to Configuration and Request Information
- Content preview: All this sounds horribly complicated, and, to be honest, it is. But unless you plan to mess around with the guts of Apache (which this book does not encourage you to do), all you really need to know is that these structures exist and that your module can get access to them at the appropriate moments. Each function exported by a module gets access to the appropriate structure to enable it to function. The appropriate structure depends on the function, of course, but it is always either a server_rec, the module's per-directory configuration structure (or two), or a request_rec. As we saw earlier, if you have a server_rec, you can get access to your per-server configuration, and if you have a request_rec, you can get access to both your per-server and your per-directory configurations.

  End of content preview. The rest of this section is not viewable here.
- Functions
- Content preview: Now that we have covered the main structures used by modules, we can detail the functions available to use and manipulate those structures.

  ap_make_sub_pool: create a subpool
      pool *ap_make_sub_pool(pool *p)
  Creates a subpool within a pool. The subpool is destroyed automatically when the pool p is destroyed, but can also be destroyed earlier with ap_destroy_pool or cleared with ap_clear_pool. Returns the new pool.

  ap_clear_pool: clear a pool without destroying it
      void ap_clear_pool(pool *p)
  Clears a pool, destroying all its subpools with ap_destroy_pool and running cleanups. This leaves the pool itself empty but intact, and therefore available for reuse.

  ap_destroy_pool: destroy a pool and all its contents
      void ap_destroy_pool(pool *p)
  Destroys a pool, running cleanup methods for the contents and also destroying all subpools. The subpools are destroyed before the pool's cleanups are run.

  ap_bytes_in_pool: report the size of a pool
      long ap_bytes_in_pool(pool *p)
  Returns the number of bytes currently allocated to a pool.

  ap_bytes_in_free_blocks: report the total size of free blocks in the pool system
      long ap_bytes_in_free_blocks(void)
  Returns the number of bytes currently in free blocks for all pools.

  ap_palloc: allocate memory within a pool
      void *ap_palloc(pool *p, int size)
  Allocates memory of at least size bytes. The memory is destroyed when the pool is destroyed. Returns a pointer to the new block of memory.

  ap_pcalloc: allocate and clear memory within a pool
      void *ap_pcalloc(pool *p, int size)
  Allocates memory of at least size bytes. The memory is initialized to zero. The memory is destroyed when the pool is destroyed. Returns a pointer to the new block of memory.

  ap_pstrdup: duplicate a string in a pool
      char *ap_pstrdup(pool *p, const char *s)
  Duplicates a string within a pool. The memory is destroyed when the pool is destroyed. If s is NULL, the return value is NULL; otherwise, it is a pointer to the new copy of the string.

  ap_pstrndup: duplicate a string in a pool with limited length

  End of content preview. The rest of this section is not viewable here.
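The pool behavior described in the function entries above (subpools are destroyed with their parent, cleanups run when a pool is cleared or destroyed, and memory lives exactly as long as its pool) can be illustrated with a small toy model. This is only a sketch under stated assumptions: it is not Apache's implementation and does not use Apache's headers, every toy_* name is hypothetical, each allocation is a separate malloc block for simplicity, and error handling is minimal.

```c
/* Toy model of pool semantics. NOT Apache's code; all toy_* names are hypothetical. */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef union toy_block {           /* header placed in front of each allocation */
    union toy_block *next;
    max_align_t align;              /* keeps the payload suitably aligned (C11) */
} toy_block;

typedef struct toy_cleanup {
    struct toy_cleanup *next;
    void (*fn)(void *);
    void *data;
} toy_cleanup;

typedef struct toy_pool {
    struct toy_pool *parent, *first_child, *sibling;
    toy_block *blocks;              /* memory owned by this pool */
    toy_cleanup *cleanups;          /* callbacks run on clear/destroy */
} toy_pool;

/* Create a pool; pass NULL as parent for a top-level pool. */
toy_pool *toy_make_sub_pool(toy_pool *parent)
{
    toy_pool *p = calloc(1, sizeof *p);
    if (p == NULL) return NULL;
    p->parent = parent;
    if (parent != NULL) {           /* link into the parent's child list */
        p->sibling = parent->first_child;
        parent->first_child = p;
    }
    return p;
}

/* Allocate memory that is released when the pool is cleared or destroyed. */
void *toy_palloc(toy_pool *p, size_t size)
{
    toy_block *b = malloc(sizeof *b + size);
    if (b == NULL) return NULL;
    b->next = p->blocks;
    p->blocks = b;
    return b + 1;                   /* payload starts right after the header */
}

/* Duplicate a string in a pool; a NULL input yields NULL. */
char *toy_pstrdup(toy_pool *p, const char *s)
{
    if (s == NULL) return NULL;
    size_t len = strlen(s) + 1;
    char *copy = toy_palloc(p, len);
    if (copy != NULL) memcpy(copy, s, len);
    return copy;
}

/* Register a callback to run when the pool is cleared or destroyed. */
int toy_register_cleanup(toy_pool *p, void (*fn)(void *), void *data)
{
    toy_cleanup *c = malloc(sizeof *c);
    if (c == NULL) return -1;
    c->fn = fn;
    c->data = data;
    c->next = p->cleanups;
    p->cleanups = c;
    return 0;
}

void toy_destroy_pool(toy_pool *p);

/* Empty a pool but keep it usable: subpools first, then cleanups, then memory. */
void toy_clear_pool(toy_pool *p)
{
    while (p->first_child != NULL)
        toy_destroy_pool(p->first_child);   /* each child unlinks itself */
    while (p->cleanups != NULL) {
        toy_cleanup *c = p->cleanups;
        p->cleanups = c->next;
        c->fn(c->data);
        free(c);
    }
    while (p->blocks != NULL) {
        toy_block *b = p->blocks;
        p->blocks = b->next;
        free(b);
    }
}

/* Destroy a pool and everything in it, then detach it from its parent. */
void toy_destroy_pool(toy_pool *p)
{
    toy_clear_pool(p);
    if (p->parent != NULL) {
        toy_pool **link = &p->parent->first_child;
        while (*link != p) link = &(*link)->sibling;
        *link = p->sibling;
    }
    free(p);
}

/* Example cleanup: increments the int that data points to. */
void toy_bump(void *data) { ++*(int *)data; }
```

A caller would typically create a top-level pool with toy_make_sub_pool(NULL), create a subpool for a shorter-lived task, allocate from the subpool freely, and then make a single toy_destroy_pool call on whichever pool matches the lifetime that has ended. The point of the design is that no individual allocation needs its own matching free.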
