JETZT ONLINE BESTELLEN
Add to Cart
Perl Cookbook

Second Edition August 2003
ISBN 978-0-596-00313-5
964 Seiten
EUR41.00

Weitere Informationen zu diesem Buch

Inhaltsverzeichnis | Kolophon | Rezensionen |


Inhaltsverzeichnis

	
Chapter 1: Strings
Inhaltsvorschau
He multiplieth words without knowledge.
—Job 35:16
Many programming languages force you to work at an uncomfortably low level. You think in lines, but your language wants you to deal with pointers. You think in strings, but it wants you to deal with bytes. Such a language can drive you to distraction. Don't despair; Perl isn't a low-level language, so lines and strings are easy to handle.
Perl was designed for easy but powerful text manipulation. In fact, Perl can manipulate text in so many ways that they can't all be described in one chapter. Check out other chapters for recipes on text processing. In particular, see Chapter 6 and Chapter 8, which discuss interesting techniques not covered here.
Perl's fundamental unit for working with data is the scalar, that is, single values stored in single (scalar) variables. Scalar variables hold strings, numbers, and references. Array and hash variables hold lists or associations of scalars, respectively. References are used for referring to values indirectly, not unlike pointers in low-level languages. Numbers are usually stored in your machine's double-precision floating-point notation. Strings in Perl may be of any length, within the limits of your machine's virtual memory, and can hold any arbitrary data you care to put there—even binary data containing null bytes.
A string in Perl is not an array of characters—nor of bytes, for that matter. You cannot use array subscripting on a string to address one of its characters; use substr for that. Like all data types in Perl, strings grow on demand. Space is reclaimed by Perl's garbage collection system when no longer used, typically when the variables have gone out of scope or when the expression in which they were used has been evaluated. In other words, memory management is already taken care of, so you don't have to worry about it.
A scalar value is either defined or undefined. If defined, it may hold a string, number, or reference. The only undefined value is
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Introduction
Inhaltsvorschau
Many programming languages force you to work at an uncomfortably low level. You think in lines, but your language wants you to deal with pointers. You think in strings, but it wants you to deal with bytes. Such a language can drive you to distraction. Don't despair; Perl isn't a low-level language, so lines and strings are easy to handle.
Perl was designed for easy but powerful text manipulation. In fact, Perl can manipulate text in so many ways that they can't all be described in one chapter. Check out other chapters for recipes on text processing. In particular, see Chapter 6 and Chapter 8, which discuss interesting techniques not covered here.
Perl's fundamental unit for working with data is the scalar, that is, single values stored in single (scalar) variables. Scalar variables hold strings, numbers, and references. Array and hash variables hold lists or associations of scalars, respectively. References are used for referring to values indirectly, not unlike pointers in low-level languages. Numbers are usually stored in your machine's double-precision floating-point notation. Strings in Perl may be of any length, within the limits of your machine's virtual memory, and can hold any arbitrary data you care to put there—even binary data containing null bytes.
A string in Perl is not an array of characters—nor of bytes, for that matter. You cannot use array subscripting on a string to address one of its characters; use substr for that. Like all data types in Perl, strings grow on demand. Space is reclaimed by Perl's garbage collection system when no longer used, typically when the variables have gone out of scope or when the expression in which they were used has been evaluated. In other words, memory management is already taken care of, so you don't have to worry about it.
A scalar value is either defined or undefined. If defined, it may hold a string, number, or reference. The only undefined value is undef. All other values are defined, even numeric and the empty string. Definedness is not the same as Boolean truth, though; to check whether a value is defined, use the
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Accessing Substrings
Inhaltsvorschau
You want to access or modify just a portion of a string, not the whole thing. For instance, you've read a fixed-width record and want to extract individual fields.
The substr function lets you read from and write to specific portions of the string.
$value = substr($string, $offset, $count);

$value = substr($string, $offset);



substr($string, $offset, $count) = $newstring;

substr($string, $offset, $count, $newstring);  # same as previous

substr($string, $offset)         = $newtail;
The unpack function gives only read access, but is faster when you have many substrings to extract.
# get a 5-byte string, skip 3 bytes,

# then grab two 8-byte strings, then the rest;

# (NB: only works on ASCII data, not Unicode)

($leading, $s1, $s2, $trailing) =

    unpack("A5 x3 A8 A8 A*", $data);



# split at 5-byte boundaries

@fivers = unpack("A5" x (length($string)/5), $string);



# chop string into individual single-byte characters

@chars  = unpack("A1" x length($string), $string);
Strings are a basic data type; they aren't arrays of a basic data type. Instead of using array subscripting to access individual characters as you sometimes do in other programming languages, in Perl you use functions like unpack or substr to access individual characters or a portion of the string.
The offset argument to substr indicates the start of the substring you're interested in, counting from the front if positive and from the end if negative. If the offset is 0, the substring starts at the beginning. The count argument is the length of the substring.
$string = "This is what you have";

#         +012345678901234567890  Indexing forwards  (left to right)

#          109876543210987654321- Indexing backwards (right to left)

#           note that 0 means 10 or 20, etc. above



$first  = substr($string, 0, 1);  # "T"

$start  = substr($string, 5, 2);  # "is"

$rest   = substr($string, 13);    # "you have"

$last   = substr($string, -1);    # "e"

$end    = substr($string, -4);    # "have"

$piece  = substr($string, -8, 3); # "you"
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Establishing a Default Value
Inhaltsvorschau
You would like to supply a default value to a scalar variable, but only if it doesn't already have one. It often happens that you want a hardcoded default value for a variable that can be overridden from the command line or through an environment variable.
Use the || or ||= operator, which work on both strings and numbers:
# use $b if $b is true, else $c

$a = $b || $c;



# set $x to $y unless $x is already true

$x ||= $y;
If 0, "0", and "" are valid values for your variables, use defined instead:
# use $b if $b is defined, else $c

$a = defined($b) ? $b : $c;



# the "new" defined-or operator from future perl

use v5.9;

$a = $b // $c;
The big difference between the two techniques (defined and ||) is what they test: definedness versus truth. Three defined values are still false in the world of Perl: 0, "0", and "". If your variable already held one of those, and you wanted to keep that value, a || wouldn't work. You'd have to use the more elaborate three-way test with defined instead. It's often convenient to arrange for your program to care about only true or false values, not defined or undefined ones.
Rather than being restricted in its return values to a mere 1 or 0 as in most other languages, Perl's || operator has a much more interesting property: it returns its first operand (the lefthand side) if that operand is true; otherwise it returns its second operand. The && operator also returns the last evaluated expression, but is less often used for this property. These operators don't care whether their operands are strings, numbers, or references—any scalar will do. They just return the first one that makes the whole expression true or false. This doesn't affect the Boolean sense of the return value, but it does make the operators' return values more useful.
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Exchanging Values Without Using Temporary Variables
Inhaltsvorschau
You want to exchange the values of two scalar variables, but don't want to use a temporary variable.
Use list assignment to reorder the variables.
($VAR1, $VAR2) = ($VAR2, $VAR1);
Most programming languages require an intermediate step when swapping two variables' values:
$temp    = $a;

$a       = $b;

$b       = $temp;
Not so in Perl. It tracks both sides of the assignment, guaranteeing that you don't accidentally clobber any of your values. This eliminates the temporary variable:
$a       = "alpha";

$b       = "omega";

($a, $b) = ($b, $a);        # the first shall be last -- and versa vice
You can even exchange more than two variables at once:
($alpha, $beta, $production) = qw(January March August);

# move beta       to alpha,

# move production to beta,

# move alpha      to production

($alpha, $beta, $production) = ($beta, $production, $alpha);
When this code finishes, $alpha, $beta, and $production have the values "March", "August", and "January".
The section on "List value constructors" in perldata(1) and on "List Values and Arrays" in Chapter 2 of Programming Perl
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Converting Between Characters and Values
Inhaltsvorschau
You want to print the number represented by a given character, or you want to print a character given a number.
Use ord to convert a character to a number, or use chr to convert a number to its corresponding character:
$num  = ord($char);

$char = chr($num);
The %c format used in printf and sprintf also converts a number to a character:
$char = sprintf("%c", $num);                # slower than chr($num)

printf("Number %d is character %c\n", $num, $num);

Number 101 is character e

            
A C* template used with pack and unpack can quickly convert many 8-bit bytes; similarly, use U* for Unicode characters.
@bytes = unpack("C*", $string);

$string = pack("C*", @bytes);



$unistr = pack("U4",0x24b6,0x24b7,0x24b8,0x24b9);

@unichars = unpack("U*", $unistr);
Unlike low-level, typeless languages such as assembler, Perl doesn't treat characters and numbers interchangeably; it treats strings and numbers interchangeably. That means you can't just assign characters and numbers back and forth. Perl provides Pascal's chr and ord to convert between a character and its corresponding ordinal value:
$value     = ord("e");    # now 101

$character = chr(101);    # now "e"
If you already have a character, it's really represented as a string of length one, so just print it out directly using print or the %s format in printf and sprintf. The %c format forces printf or sprintf to convert a number into a character; it's not used for printing a character that's already in character format (that is, a string).
printf("Number %d is character %c\n", 101, 101);
The pack, unpack, chr, and ord functions are all faster than
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Using Named Unicode Characters
Inhaltsvorschau
You want to use Unicode names for fancy characters in your code without worrying about their code points.
Place a use charnames at the top of your file, then freely insert "\N{ CHARSPEC}" escapes into your string literals.
The use charnames pragma lets you use symbolic names for Unicode characters. These are compile-time constants that you access with the \N{ CHARSPEC} double-quoted string sequence. Several subpragmas are supported. The :full subpragma grants access to the full range of character names, but you have to write them out in full, exactly as they occur in the Unicode character database, including the loud, all-capitals notation. The :short subpragma gives convenient shortcuts. Any import without a colon tag is taken to be a script name, giving case-sensitive shortcuts for those scripts.
use charnames ':full';

print "\N{GREEK CAPITAL LETTER DELTA} is called delta.\n";



Δ is called delta



use charnames ':short';

print "\N{greek:Delta} is an upper-case delta.\n";



Δ is an upper-case delta



use charnames qw(cyrillic greek);

print "\N{Sigma} and \N{sigma} are Greek sigmas.\n";

print "\N{Be} and \N{be} are Cyrillic bes.\n";



Σ

               and

               σ

               are Greek sigmas

               Б 

               and 

               б 

               are Cyrillic bes

            
Two functions, charnames::viacode and charnames::vianame, can translate between numeric code points and the long names. The Unicode documents use the notation U+XXXX to indicate the Unicode character whose code point is XXXX, so we'll use that here in our output.
use charnames qw(:full);

for $code (0xC4, 0x394) { 

    printf "Character U+%04X (%s) is named %s\n",

        $code, chr($code), charnames::viacode($code);

}



Character U+00C4 (Ä) is named LATIN CAPITAL LETTER A WITH DIAERESIS
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Processing a String One Character at a Time
Inhaltsvorschau
You want to process a string one character at a time.
Use split with a null pattern to break up the string into individual characters, or use unpack if you just want the characters' values:
@array = split(//, $string);      # each element a single character

@array = unpack("U*", $string);   # each element a code point (number)
Or extract each character in turn with a loop:
while (/(.)/g) {         # . is never a newline here

        # $1 has character, ord($1) its number

    }
As we said before, Perl's fundamental unit is the string, not the character. Needing to process anything a character at a time is rare. Usually some kind of higher-level Perl operation, like pattern matching, solves the problem more handily. See, for example, Recipe 7.14, where a set of substitutions is used to find command-line arguments.
Splitting on a pattern that matches the empty string returns a list of individual characters in the string. This is a convenient feature when done intentionally, but it's easy to do unintentionally. For instance, /X*/ matches all possible strings, including the empty string. Odds are you will find others when you don't mean to.
Here's an example that prints the characters used in the string "an apple a day", sorted in ascending order:
%seen = ( );

$string = "an apple a day";

foreach $char (split //, $string) {

    $seen{$char}++;

}

print "unique chars are: ", sort(keys %seen), "\n";

unique chars are:  adelnpy

            
These split and unpack solutions give an array of characters to work with. If you don't want an array, use a pattern match with the /g flag in a while loop, extracting one character at a time:
%seen = ( );

$string = "an apple a day";

while ($string =~ /(.)/g) {

    $seen{$1}++;

}

print "unique chars are: ", sort(keys %seen), "\n";

Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Reversing a String by Word or Character
Inhaltsvorschau
You want to reverse the words or characters of a string.
Use the reverse function in scalar context for flipping characters:
$revchars = reverse($string);
To flip words, use reverse in list context with split and join:
$revwords = join(" ", reverse split(" ", $string));
The reverse function is two different functions in one. Called in scalar context, it joins together its arguments and returns that string in reverse order. Called in list context, it returns its arguments in the opposite order. When using reverse for its character-flipping behavior, use scalar to force scalar context unless it's entirely obvious.
$gnirts   = reverse($string);       # reverse letters in $string



@sdrow    = reverse(@words);        # reverse elements in @words



$confused = reverse(@words);        # reverse letters in join("", @words)
Here's an example of reversing words in a string. Using a single space, " ", as the pattern to split is a special case. It causes split to use contiguous whitespace as the separator and also discard leading null fields, just like awk. Normally, split discards only trailing null fields.
# reverse word order

$string = 'Yoda said, "can you see this?"';

@allwords    = split(" ", $string);

$revwords    = join(" ", reverse @allwords);

print $revwords, "\n";

this?" see you "can said, Yoda

            
We could remove the temporary array @allwords and do it on one line:
$revwords = join(" ", reverse split(" ", $string));
Multiple whitespace in $string becomes a single space in $revwords. If you want to preserve whitespace, use this:
$revwords = join("", reverse split(/(\s+)/, $string));
One use of reverse is to test whether a word is a palindrome (a word that reads the same backward or forward):
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Treating Unicode Combined Characters as Single Characters
Inhaltsvorschau
You have a Unicode string that contains combining characters, and you'd like to treat each of these sequences as a single logical character.
Process them using \X in a regular expression.
$string = "fac\x{0327}ade";         # "façade"

$string =~ /fa.ade/;                # fails

$string =~ /fa\Xade/;               # succeeds



@chars = split(//, $string);        # 7 letters in @chars

@chars = $string =~ /(.)/g;         # same thing

@chars = $string =~ /(\X)/g;        # 6 "letters" in @chars
In Unicode, you can combine a base character with one or more non-spacing characters following it; these are usually diacritics, such as accent marks, cedillas, and tildas. Due to the presence of precombined characters, for the most part to accommodate legacy character systems, there can be two or more ways of writing the same thing.
For example, the word "façade" can be written with one character between the two a's, "\x{E7}", a character right out of Latin1 (ISO 8859-1). These characters might be encoded into a two-byte sequence under the UTF-8 encoding that Perl uses internally, but those two bytes still only count as one single character. That works just fine.
There's a thornier issue. Another way to write U+00E7 is with two different code points: a regular "c" followed by "\x{0327}". Code point U+0327 is a non-spacing combining character that means to go back and put a cedilla underneath the preceding base character.
There are times when you want Perl to treat each combined character sequence as one logical character. But because they're distinct code points, Perl's character-related operations treat non-spacing combining characters as separate characters, including substr, length, and regular expression metacharacters, such as in /./ or /[^abc]/.
In a regular expression, the
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Canonicalizing Strings with Unicode Combined Characters
Inhaltsvorschau
You have two strings that look the same when you print them out, but they don't test as string equal and sometimes even have different lengths. How can you get Perl to consider them the same strings?
When you have otherwise equivalent strings, at least some of which contain Unicode combining character sequences, instead of comparing them directly, compare the results of running them through the NFD( ) function from the Unicode::Normalize module.
use Unicode::Normalize;

$s1 = "fa\x{E7}ade";                

$s2 = "fac\x{0327}ade";                

if (NFD($s1) eq NFD($s2)) { print "Yup!\n" }
The same character sequence can sometimes be specified in multiple ways. Sometimes this is because of legacy encodings, such as the letters from Latin1 that contain diacritical marks. These can be specified directly with a single character (like U+00E7, LATIN SMALL LETTER C WITH CEDILLA) or indirectly via the base character (like U+0063, LATIN SMALL LETTER C) followed by a combining character (U+0327, COMBINING CEDILLA).
Another possibility is that you have two or more marks following a base character, but the order of those marks varies in your data. Imagine you wanted the letter "c" to have both a cedilla and a caron on top of it in order to print a ̌. That could be specified in any of these ways:
$string = v231.780;

#   LATIN SMALL LETTER C WITH CEDILLA

#   COMBINING CARON



$string = v99.807.780;

#         LATIN SMALL LETTER C

#         COMBINING CARON

#         COMBINING CEDILLA



$string = v99.780.807

#         LATIN SMALL LETTER C

#         COMBINING CEDILLA

#         COMBINING CARON
The normalization functions rearrange those into a reliable ordering. Several are provided, including NFD( ) for canonical decomposition and NFC( ) for canonical decomposition followed by canonical composition. No matter which of these three ways you used to specify your
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Treating a Unicode String as Octets
Inhaltsvorschau
You have a Unicode string but want Perl to treat it as octets (e.g., to calculate its length or for purposes of I/O).
The use bytes pragma makes all Perl operations in its lexical scope treat the string as a group of octets. Use it when your code is calling Perl's character-aware functions directly:
$ff = "\x{FB00}";             # ff ligature

$chars = length($ff);         # length is one character

{

  use bytes;                  # force byte semantics

  $octets = length($ff);      # length is two octets

}

$chars = length($ff);         # back to character semantics
Alternatively, the Encode module lets you convert a Unicode string to a string of octets, and back again. Use it when the character-aware code isn't in your lexical scope:
use Encode qw(encode_utf8);



sub somefunc;                 # defined elsewhere



$ff = "\x{FB00}";             # ff ligature

$ff_oct = encode_utf8($ff);   # convert to octets



$chars = somefunc($ff);       # work with character string

$octets = somefunc($ff_oct);  # work with octet string
As explained in this chapter's Introduction, Perl knows about two types of string: those made of simple uninterpreted octets, and those made of Unicode characters whose UTF-8 representation may require more than one octet. Each individual string has a flag associated with it, identifying the string as either UTF-8 or octets. Perl's I/O and string operations (such as length) check this flag and give character or octet semantics accordingly.
Sometimes you need to work with bytes and not characters. For example, many protocols have a Content-Length header that specifies the size of the body of a message in octets. You can't simply use Perl's length function to calculate the size, because if the string you're calling length on is marked as UTF-8, you'll get the size in characters.
The use bytes
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Expanding and Compressing Tabs
Inhaltsvorschau
You want to convert tabs in a string to the appropriate number of spaces, or vice versa. Converting spaces into tabs can be used to reduce file size when the file has many consecutive spaces. Converting tabs into spaces may be required when producing output for devices that don't understand tabs or think them at different positions than you do.
Either use a rather funny looking substitution:
while ($string =~ s/\t+/' ' x (length($&) * 8 - length($`) % 8)/e) {

    # spin in empty loop until substitution finally fails

}
or use the standard Text::Tabs module:
use Text::Tabs;

@expanded_lines  = expand(@lines_with_tabs);

@tabulated_lines = unexpand(@lines_without_tabs);
Assuming tab stops are set every N positions (where N is customarily eight), it's easy to convert them into spaces. The standard textbook method does not use the Text::Tabs module but suffers slightly from being difficult to understand. Also, it uses the $` variable, whose very mention currently slows down every pattern match in the program. This is explained in Special Variables in Chapter 6. You could use this algorithm to make a filter to expand its input's tabstops to eight spaces each:
while (<>) {

    1 while s/\t+/' ' x (length($&) * 8 - length($`) % 8)/e;

    print;

}
To avoid $`, you could use a slightly more complicated alternative that uses the numbered variables for explicit capture; this one expands tabstops to four each instead of eight:
1 while s/^(.*?)(\t+)/$1 . ' ' x (length($2) * 4 - length($1) % 4)/e;
Another approach is to use the offsets directly from the @+ and @- arrays. This also expands to four-space positions:
1 while s/\t+/' ' x (($+[0] - $-[0]) * 4 - $-[0] % 4)/e;
If you're looking at all of these 1 while loops and wondering why they couldn't have been written as part of a simple
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Expanding Variables in User Input
Inhaltsvorschau
You've read a string with an embedded variable reference, such as:
You owe $debt to me.
Now you want to replace $debt in the string with its value.
Use a substitution with symbolic references if the variables are all globals:
$text =~ s/\$(\w+)/${$1}/g;
But use a double /ee if they might be lexical (my) variables:
$text =~ s/(\$\w+)/$1/gee;
The first technique is basically to find what looks like a variable name, then use symbolic dereferencing to interpolate its contents. If $1 contains the string somevar, ${$1} will be whatever $somevar contains. This won't work if the use strict 'refs' pragma is in effect because that bans symbolic dereferencing.
Here's an example:
our ($rows, $cols);

no strict 'refs';                   # for ${$1}/g below

my $text;



($rows, $cols) = (24, 80);

$text = q(I am $rows high and $cols long);  # like single quotes!

$text =~ s/\$(\w+)/${$1}/g;

print $text;

I am 24 high and 80 long

            
You may have seen the /e substitution modifier used to evaluate the replacement as code rather than as a string. It's designed for situations where you don't know the exact replacement value, but you do know how to calculate it. For example, doubling every whole number in a string:
$text = "I am 17 years old";

$text =~ s/(\d+)/2 * $1/eg;
When Perl is compiling your program and sees a /e on a substitute, it compiles the code in the replacement block along with the rest of your program, long before the substitution actually happens. When a substitution is made, $1 is replaced with the string that matched. The code to evaluate would then be something like:
2 * 17
If we tried saying:
$text = 'I am $AGE years old';      # note single quotes

$text =~ s/(\$\w+)/$1/eg;           # WRONG
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Controlling Case
Inhaltsvorschau
A string in uppercase needs converting to lowercase, or vice versa.
Use the lc and uc functions or the \L and \U string escapes.
$big = uc($little);             # "bo peep" -> "BO PEEP"

$little = lc($big);             # "JOHN"    -> "john"

$big = "\U$little";             # "bo peep" -> "BO PEEP"

$little = "\L$big";             # "JOHN"    -> "john"
To alter just one character, use the lcfirst and ucfirst functions or the \l and \u string escapes.
$big = "\u$little";             # "bo"      -> "Bo"

$little = "\l$big";             # "BoPeep"  -> "boPeep"
The functions and string escapes look different, but both do the same thing. You can set the case of either just the first character or the whole string. You can even do both at once to force uppercase (actually, titlecase; see later explanation) on initial characters and lowercase on the rest.
$beast   = "dromedary";

# capitalize various parts of $beast

$capit   = ucfirst($beast);         # Dromedary

$capit   = "\u\L$beast";            # (same)

$capall  = uc($beast);              # DROMEDARY

$capall  = "\U$beast";              # (same)

$caprest = lcfirst(uc($beast));     # dROMEDARY

$caprest = "\l\U$beast";            # (same)
These capitalization-changing escapes are commonly used to make a string's case consistent:
# titlecase each word's first character, lowercase the rest

$text = "thIS is a loNG liNE";

$text =~ s/(\w+)/\u\L$1/g;

print $text;

This Is A Long Line

            
You can also use these for case-insensitive comparison:
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Properly Capitalizing a Title or Headline
Inhaltsvorschau
You have a string representing a headline, the title of book, or some other work that needs proper capitalization.
Use a variant of this tc( ) titlecasing function:
INIT {

    our %nocap;

    for (qw(

            a an the

            and but or

            as at but by for from in into of off on onto per to with

        ))

    {

        $nocap{$_}++;

    }

}



sub tc {

    local $_ = shift;



    # put into lowercase if on stop list, else titlecase

    s/(\pL[\pL']*)/$nocap{$1} ? lc($1) : ucfirst(lc($1))/ge;



    s/^(\pL[\pL']*) /\u\L$1/x;  # first  word guaranteed to cap

    s/ (\pL[\pL']*)$/\u\L$1/x;  # last word guaranteed to cap



    # treat parenthesized portion as a complete title

    s/\( (\pL[\pL']*) /(\u\L$1/x;

    s/(\pL[\pL']*) \) /\u\L$1)/x;



    # capitalize first word following colon or semi-colon

    s/ ( [:;] \s+ ) (\pL[\pL']* ) /$1\u\L$2/x;



    return $_;

}
The rules for correctly capitalizing a headline or title in English are more complex than simply capitalizing the first letter of each word. If that's all you need to do, something like this should suffice:
s/(\w+\S*\w*)/\u\L$1/g;
Most style guides tell you that the first and last words in the title should always be capitalized, along with every other word that's not an article, the particle "to" in an infinitive construct, a coordinating conjunction, or a preposition.
Here's a demo, this time demonstrating the distinguishing property of titlecase. Assume the tc function is as defined in the Solution.
# with apologies (or kudos) to Stephen Brust, PJF,

# and to JRRT, as always.

@data = (

            "the enchantress of \x{01F3}ur mountain",

    "meeting the enchantress of \x{01F3}ur mountain",

    "the lord of the rings: the fellowship of the ring",

);



$mask = "%-20s: %s\n";



sub tc_lame {

    local $_ = shift;

    s/(\w+\S*\w*)/\u\L$1/g;

    return $_;

}



for $datum (@data) { 

    printf $mask, "ALL CAPITALS",       uc($datum);

    printf $mask, "no capitals",        lc($datum);

    printf $mask, "simple titlecase",   tc_lame($datum);

    printf $mask, "better titlecase",   tc($datum);

    print "\n";

}



Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Interpolating Functions and Expressions Within Strings
Inhaltsvorschau
You want a function call or expression to expand within a string. This lets you construct more complex templates than with simple scalar variable interpolation.
Break up your expression into distinct concatenated pieces:
$answer = $var1 . func( ) . $var2;   # scalar only
Or use the slightly sneaky @{[ LIST EXPR ]} or ${ \(SCALAR EXPR ) } expansions:
$answer = "STRING @{[ LIST EXPR ]} MORE STRING";

$answer = "STRING ${\( SCALAR EXPR )} MORE STRING";
This code shows both techniques. The first line shows concatenation; the second shows the expansion trick:
$phrase = "I have " . ($n + 1) . " guanacos.";

$phrase = "I have ${\($n + 1)} guanacos.";
The first technique builds the final string by concatenating smaller strings, avoiding interpolation but achieving the same end. Because print effectively concatenates its entire argument list, if we were going to print $phrase, we could have just said:
print "I have ",  $n + 1, " guanacos.\n";
When you absolutely must have interpolation, you need the punctuation-riddled interpolation from the Solution. Only @, $, and \ are special within double quotes and most backquotes. (As with m// and s///, the qx( ) synonym is not subject to double-quote expansion if its delimiter is single quotes! $home = qx'echo home is $HOME'; would get the shell $HOME variable, not one in Perl.) So, the only way to force arbitrary expressions to expand is by expanding a ${ } or @{ } whose block contains a reference.
In the example:
$phrase = "I have ${\( count_em( ) )} guanacos.";
the function call within the parentheses is not in scalar context; it is still in list context. The following overrules that:
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Indenting Here Documents
Inhaltsvorschau
When using the multiline quoting mechanism called a here document, the text must be flush against the margin, which looks out of place in the code. You would like to indent the here document text in the code, but not have the indentation appear in the final string value.
Use a s/// operator to strip out leading whitespace.
# all in one

($var = << HERE_TARGET) =~ s/^\s+//gm;

    your text

    goes here

HERE_TARGET



# or with two steps

$var = << HERE_TARGET;

    your text

    goes here

HERE_TARGET

$var =~ s/^\s+//gm;
The substitution is straightforward. It removes leading whitespace from the text of the here document. The /m modifier lets the ^ character match at the start of each line in the string, and the /g modifier makes the pattern-matching engine repeat the substitution as often as it can (i.e., for every line in the here document).
($definition = << 'FINIS') =~ s/^\s+//gm;

    The five varieties of camelids

    are the familiar camel, his friends

    the llama and the alpaca, and the

    rather less well-known guanaco

    and vicuña.

FINIS
Be warned: all patterns in this recipe use \s, meaning one whitespace character, which will also match newlines. This means they will remove any blank lines in your here document. If you don't want this, replace \s with [^\S\n] in the patterns.
The substitution uses the property that the result of an assignment can be used as the lefthand side of =~. This lets us do it all in one line, but works only when assigning to a variable. When you're using the here document directly, it would be considered a constant value, and you wouldn't be able to modify it. In fact, you can't change a here document's value unless you first put it into a variable.
Not to worry, though, because there's an easy way around this, particularly if you're going to do this a lot in the program. Just write a subroutine:
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Reformatting Paragraphs
Inhaltsvorschau
Your string is too big to fit the screen, and you want to break it up into lines of words, without splitting a word between lines. For instance, a style correction script might read a text file a paragraph at a time, replacing bad phrases with good ones. Replacing a phrase like utilizes the inherent functionality of with uses will change the length of lines, so it must somehow reformat the paragraphs when they're output.
Use the standard Text::Wrap module to put line breaks at the right place:
use Text::Wrap;

@output = wrap($leadtab, $nexttab, @para);
Or use the more discerning CPAN module, Text::Autoformat, instead:
use Text::Autoformat;

$formatted = autoformat $rawtext;
The Text::Wrap module provides the wrap function, shown in Example 1-3, which takes a list of lines and reformats them into a paragraph with no line more than $Text::Wrap::columns characters long. We set $columns to 20, ensuring that no line will be longer than 20 characters. We pass wrap two arguments before the list of lines: the first is the indent for the first line of output, the second the indent for every subsequent line.
Example 1-3. wrapdemo
  #!/usr/bin/perl -w

  # wrapdemo - show how Text::Wrap works

  @input = ("Folding and splicing is the work of an editor,",

            "not a mere collection of silicon",

            "and",

            "mobile electrons!");

  use Text::Wrap qw($columns &wrap);

  $columns = 20;

  print "0123456789" x 2, "\n";

  print wrap("    ", "  ", @input), "\n";
The result of this program is:

               01234567890123456789

               Folding and

               splicing is the

               work of an

               editor, not a

               mere collection

               of silicon and
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Escaping Characters
Inhaltsvorschau
You need to output a string with certain characters (quotes, commas, etc.) escaped. For instance, you're producing a format string for sprintf and want to convert literal % signs into %%.
Use a substitution to backslash or double each character to be escaped:
# backslash

$var =~ s/([CHARLIST])/\\$1/g;



# double

$var =~ s/([CHARLIST])/$1$1/g;
$var is the variable to be altered. The CHARLIST is a list of characters to escape and can contain backslash escapes like \t and \n. If you just have one character to escape, omit the brackets:
$string =~ s/%/%%/g;
The following code lets you do escaping when preparing strings to submit to the shell. (In practice, you would need to escape more than just ' and " to make any arbitrary string safe for the shell. Getting the list of characters right is so hard, and the risks if you get it wrong are so great, that you're better off using the list form of system and exec to run programs, shown in Recipe 16.2. They avoid the shell altogether.)
$string = q(Mom said, "Don't do that.");

$string =~ s/(['"])/\\$1/g;
We had to use two backslashes in the replacement because the replacement section of a substitution is read as a double-quoted string, and to get one backslash, you need to write two. Here's a similar example for VMS DCL, where you need to double every quote to get one through:
$string = q(Mom said, "Don't do that.");

$string =~ s/(['"])/$1$1/g;
Microsoft command interpreters are harder to work with. In Windows, COMMAND.COM recognizes double quotes but not single ones, disregards backquotes for running commands, and requires a backslash to make a double quote into a literal. Any of the many free or commercial Unix-like shell environments available for Windows will work just fine, though.
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Trimming Blanks from the Ends of a String
Inhaltsvorschau
You have read a string that may have leading or trailing whitespace, and you want to remove it.
Use a pair of pattern substitutions to get rid of them:
$string =~ s/^\s+//;

$string =~ s/\s+$//;
Or write a function that returns the new value:
$string = trim($string);

@many   = trim(@many);



sub trim {

    my @out = @_;

    for (@out) {

        s/^\s+//;          # trim left

        s/\s+$//;          # trim right

    }

    return @out =  = 1 

              ? $out[0]   # only one to return

              : @out;     # or many

}
This problem has various solutions, but this one is the most efficient for the common case. This function returns new versions of the strings passed in to it with their leading and trailing whitespace removed. It works on both single strings and lists.
To remove the last character from the string, use the chop function. Be careful not to confuse this with the similar but different chomp function, which removes the last part of the string contained within that variable if and only if it is contained in the $/ variable, "\n" by default. These are often used to remove the trailing newline from input:
# print what's typed, but surrounded by > < symbols

while (<STDIN>) {

    chomp;

    print ">$_<\n";

}
This function can be embellished in any of several ways.
First, what should you do if several strings are passed in, but the return context demands a single scalar? As written, the function given in the Solution does a somewhat silly thing: it (inadvertently) returns a scalar representing the number of strings passed in. This isn't very useful. You could issue a warning or raise an exception. You could also squash the list of return values together.
For strings with spans of extra whitespace at points other than their ends, you could have your function collapse any remaining stretch of whitespace characters in the interior of the string down to a single space each by adding this line as the new last line of the loop:
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Parsing Comma-Separated Data
Inhaltsvorschau
You have a data file containing comma-separated values that you need to read, but these data fields may have quoted commas or escaped quotes in them. Most spreadsheets and database programs use comma-separated values as a common interchange format.
If your data file follows normal Unix quoting and escaping conventions, where quotes within a field are backslash-escaped "like \"this\"", use the standard Text::ParseWords and this simple code:
use Text::ParseWords;

sub parse_csv0 {

    return quotewords("," => 0, $_[0]);

}
However, if quotes within a field are doubled "like ""this""", you could use the following procedure from Mastering Regular Expressions, Second Edition:
sub parse_csv1 {

    my $text = shift;      # record containing comma-separated values

    my @fields  = ( );



    while ($text =~ m{

        # Either some non-quote/non-comma text:

        ( [^"',] + )



         # ...or...

         | 



        # ...a double-quoted field: (with "" allowed inside)



        " # field's opening quote; don't save this

         (   now a field is either

          (?:     [^"]    # non-quotes or

              |

                  ""      # adjacent quote pairs

           ) *  # any number

         )

        " # field's closing quote; unsaved



    }gx)

    {

      if (defined $1) {

          $field = $1;

      } else {

          ($field = $2) =~ s/""/"/g;

      }

      push @fields, $field;

    }

    return @fields;

}
Or use the CPAN Text:CSV module:
use Text::CSV;

sub parse_csv1 {

    my $line = shift;

    my $csv = Text::CSV->new( );              

    return $csv->parse($line) && $csv->fields( );           

}
Or use the CPAN Tie::CSV_File module:
tie @data, "Tie::CSV_File", "data.csv";



for ($i = 0; $i < @data; $i++) {

    printf "Row %d (Line %d) is %s\n", $i, $i+1, "@{$data[$i]}";

    for ($j = 0; $j < @{$data[$i]}; $j++) {

        print "Column $j is <$data[$i][$j]>\n";

    } 

}
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Constant Variables
Inhaltsvorschau
You want a variable whose value cannot be modified once set.
If you don't need it to be a scalar variable that can interpolate, the use constant pragma will work:
use constant AVOGADRO => 6.02252e23;



printf "You need %g of those for guac\n", AVOGADRO;
If it does have to be a variable, assign to the typeglob a reference to a literal string or number, then use the scalar variable:
*AVOGADRO = \6.02252e23;

print "You need $AVOGADRO of those for guac\n";
But the most foolproof way is via a small tie class whose STORE method raises an exception:
package Tie::Constvar;

use Carp;

sub TIESCALAR {

    my ($class, $initval) = @_;

    my $var = $initval;

    return bless \$var => $class;

}

sub FETCH {

    my $selfref = shift;

    return $$selfref;

}

sub STORE {

    confess "Meddle not with the constants of the universe";

}
The use constant pragma is the easiest to use, but has a few drawbacks. The biggest one is that it doesn't give you a variable that you can expand in double-quoted strings. Another is that it isn't scoped; it puts a subroutine of that name into the package namespace.
The way the pragma really works is to create a subroutine of that name that takes no arguments and always returns the same value (or values if a list is provided). That means it goes into the current package's namespace and isn't scoped. You could do the same thing yourself this way:
sub AVOGADRO( ) { 6.02252e23 }
If you wanted it scoped to the current block, you could make a temporary subroutine by assigning an anonymous subroutine to the typeglob of that name:
use subs qw(AVOGADRO);

local *AVOGADRO = sub ( ) { 6.02252e23 };
But that's pretty magical, so you should comment the code if you don't plan to use the pragma.
If instead of assigning to the typeglob a reference to a subroutine, you assign to it a reference to a constant scalar, then you'll be able to use the variable of that name. That's the second technique given in the Solution. Its disadvantage is that typeglobs are available only for package variables, not for lexicals created via
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Soundex Matching
Inhaltsvorschau
You have two English surnames and want to know whether they sound somewhat similar, regardless of spelling. This would let you offer users a "fuzzy search" of names in a telephone book to catch "Smith" and "Smythe" and others within the set, such as "Smite" and "Smote".
Use the standard Text::Soundex module:
use Text::Soundex;

$CODE  = soundex($STRING);

@CODES = soundex(@LIST);
Or use the CPAN module Text::Metaphone:
use Text::Metaphone;

$phoned_words = Metaphone('Schwern');
The soundex algorithm hashes words (particularly English surnames) into a small space using a simple model that approximates an English speaker's pronunciation of the words. Roughly speaking, each word is reduced to a four-character string. The first character is an uppercase letter; the remaining three are digits. By comparing the soundex values of two strings, we can guess whether they sound similar.
The following program prompts for a name and looks for similarly sounding names from the password file. This same approach works on any database with names, so you could key the database on the soundex values if you wanted to. Such a key wouldn't be unique, of course.
use Text::Soundex;

use User::pwent;



print "Lookup user: ";

chomp($user =<STDIN>);

exit unless defined $user;

$name_code = soundex($user);



while ($uent = getpwent( )) {

    ($firstname, $lastname) = $uent->gecos =~ /(\w+)[^,]*\b(\w+)/;



    if ($name_code eq soundex($uent->name) ||

        $name_code eq soundex($lastname)   ||

        $name_code eq soundex($firstname)  )

    {

        printf "%s: %s %s\n", $uent->name, $firstname, $lastname;

    }

}
The Text::Metaphone module from CPAN addresses the same problem in a different and better way. The soundex function returns a letter and a three-digit code that maps just the beginning of the input string, whereas
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Program: fixstyle
Inhaltsvorschau
Imagine you have a table with both old and new strings, such as the following:
Old words
New words
bonnet
hood
rubber
eraser
lorry
truck
trousers
pants
The program in Example 1-4 is a filter that changes all occurrences of each element in the first set to the corresponding element in the second set.
When called without filename arguments, the program is a simple filter. If filenames are supplied on the command line, an in-place edit writes the changes to the files, with the original versions saved in a file with a ".orig" extension. See Recipe 7.16 for a description. A -v command-line option writes notification of each change to standard error.
The table of original strings and their replacements is stored below _ _END_ _ in the main program, as described in Recipe 7.12. Each pair of strings is converted into carefully escaped substitutions and accumulated into the $code variable like the popgrep2 program in Recipe 6.10.
A -t check to test for an interactive run check tells whether we're expecting to read from the keyboard if no arguments are supplied. That way if users forget to give an argument, they aren't wondering why the program appears to be hung.
Example 1-4. fixstyle
    #!/usr/bin/perl -w

    # fixstyle - switch first set of <DATA> strings to second set

    #   usage: $0 [-v] [files ...]

    use strict;

    my $verbose = (@ARGV && $ARGV[0] eq '-v' && shift);

    if (@ARGV) {

      $^I = ".orig";          # preserve old files

    } else {

      warn "$0: Reading from stdin\n" if -t STDIN;

    }

    my $code = "while (<>) {\n";

    # read in config, build up code to eval

    while (<DATA>) {

      chomp;

      my ($in, $out) = split /\s*=>\s*/;

      next unless $in && $out;

      $code .= "s{\\Q$in\\E}{$out}g";

      $code .= "&& printf STDERR qq($in => $out at \$ARGV line \$.\\n)"

                                                          if $verbose;

      $code .= ";\n";

    }

    $code .= "print;\n}\n";

    eval "{ $code } 1" || die;

    _ _END_ _

    analysed        => analyzed

    built-in        => builtin

    chastized       => chastised

    commandline     => command-line

    de-allocate     => deallocate

    dropin          => drop-in

    hardcode        => hard-code

    meta-data       => metadata

    multicharacter  => multi-character

    multiway        => multi-way

    non-empty       => nonempty

    non-profit      => nonprofit

    non-trappable   => nontrappable

    pre-define      => predefine

    preextend       => pre-extend

    re-compiling    => recompiling

    reenter         => re-enter

    turnkey         => turn-key
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Program: psgrep
Inhaltsvorschau
Many programs, including ps, netstat, lsof, ls -l, find -ls, and tcpdump, can produce more output than can be conveniently summarized. Logfiles also often grow too long to be easily viewed. You could send these through a filter like grep to pick out only certain lines, but regular expressions and complex logic don't mix well; just look at the hoops we jump through in Recipe 6.18.
What we'd really like is to make full queries on the program output or logfile. For example, to ask ps something like, "Show me all processes that exceed 10K in size but which aren't running as the superuser" or "Which commands are running on pseudo-ttys?"
The psgrep program does this—and infinitely more—because the specified selection criteria are not mere regular expressions; they're full Perl code. Each criterion is applied in turn to every line of output. Only lines matching all arguments are output. The following is a list of things to find and how to find them.
Lines containing "sh" at the end of a word:
% psgrep '/sh\b/'
Processes whose command names end in "sh":
% psgrep 'command =~ /sh$/'
Processes running with a user ID below 10:
% psgrep 'uid < 10'
Login shells with active ttys:
% psgrep 'command =~ /^-/' 'tty ne "?"'
Processes running on pseudo-ttys:
% psgrep 'tty =~ /^[p-t]/'
Non-superuser processes running detached:
% psgrep 'uid && tty eq "?"'
Huge processes that aren't owned by the superuser:
% psgrep 'size > 10 * 2**10' 'uid != 0'
The last call to psgrep produced the following output when run on our system. As one might expect, only netscape and its spawn qualified.

            FLAGS   UID   PID  PPID PRI  NI   SIZE   RSS WCHAN     STA TTY TIME COMMAND

            0   101  9751     1   0   0  14932  9652 do_select S   p1  0:25 netscape

            100000   101  9752  9751   0   0  10636   812 do_select S   p1  0:00 (dns helper)
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Chapter 2: Numbers
Inhaltsvorschau
Anyone who considers arithmetical methods of producing random digits is, of course, in a state of sin.
—John von Neumann (1951)
Numbers, the most basic data type of almost any programming language, can be surprisingly tricky. Random numbers, numbers with decimal points, series of numbers, and conversion between strings and numbers all pose trouble.
Perl works hard to make life easy for you, and the facilities it provides for manipulating numbers are no exception to that rule. If you treat a scalar value as a number, Perl converts it to one. This means that when you read ages from a file, extract digits from a string, or acquire numbers from any of the other myriad textual sources that Real Life pushes your way, you don't need to jump through the hoops created by other languages' cumbersome requirements to turn an ASCII string into a number.
Perl tries its best to interpret a string as a number when you use it as one (such as in a mathematical expression), but it has no direct way of reporting that a string doesn't represent a valid number. Perl quietly converts non-numeric strings to zero, and it will stop converting the string once it reaches a non-numeric character—so "A7" is still 0, and "7A" is just 7. (Note, however, that the -w flag will warn of such improper conversions.) Sometimes, such as when validating input, you need to know whether a string represents a valid number. We show you how in Recipe 2.1.
Recipe 2.15 shows how to get a number from strings containing hexadecimal, octal, or binary representations of numbers such as "0xff", "0377", and "0b10110". Perl automatically converts numeric literals of these non-decimal bases that occur in your program code (so $a = 3 + 0xff will set $a to 258) but not data read by that program (you can't read "ff" or even "0xff" into $b and then say $a = 3
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Introduction
Inhaltsvorschau
Numbers, the most basic data type of almost any programming language, can be surprisingly tricky. Random numbers, numbers with decimal points, series of numbers, and conversion between strings and numbers all pose trouble.
Perl works hard to make life easy for you, and the facilities it provides for manipulating numbers are no exception to that rule. If you treat a scalar value as a number, Perl converts it to one. This means that when you read ages from a file, extract digits from a string, or acquire numbers from any of the other myriad textual sources that Real Life pushes your way, you don't need to jump through the hoops created by other languages' cumbersome requirements to turn an ASCII string into a number.
Perl tries its best to interpret a string as a number when you use it as one (such as in a mathematical expression), but it has no direct way of reporting that a string doesn't represent a valid number. Perl quietly converts non-numeric strings to zero, and it will stop converting the string once it reaches a non-numeric character—so "A7" is still 0, and "7A" is just 7. (Note, however, that the -w flag will warn of such improper conversions.) Sometimes, such as when validating input, you need to know whether a string represents a valid number. We show you how in Recipe 2.1.
Recipe 2.15 shows how to get a number from strings containing hexadecimal, octal, or binary representations of numbers such as "0xff", "0377", and "0b10110". Perl automatically converts numeric literals of these non-decimal bases that occur in your program code (so $a = 3 + 0xff will set $a to 258) but not data read by that program (you can't read "ff" or even "0xff" into $b and then say $a = 3 + $b to make $a become 258).
As if integers weren't giving us enough grief, floating-point numbers can cause even more headaches. Internally, a computer represents numbers with decimal points as floating-point numbers in binary format. Floating-point numbers are not the same as real numbers; they are an approximation of real numbers, with limited precision. Although infinitely many real numbers exist, you only have finite space to represent them, usually about 64 bits or so. You have to cut corners to fit them all in.
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Checking Whether a String Is a Valid Number
Inhaltsvorschau
You want to check whether a string represents a valid number. This is a common problem when validating input, as in CGI scripts, configuration files, and command-line arguments.
Compare it against a regular expression that matches the kinds of numbers you're interested in:
if ($string =~ /PATTERN/) {

    # is a number

} else {

    # is not

}
Or use the patterns provided by the CPAN module Regexp::Common:
if ($string =~ m{^$RE{num}{real}$}) {

    # is a real number

} else {

    # is not

}
This problem gets to the heart of what we mean by a number. Even things that sound simple, like integer, make you think hard about what you will accept; for example, "Is a leading + for positive numbers optional, mandatory, or forbidden?" The many ways that floating-point numbers can be represented could overheat your brain.
Decide what you will and will not accept. Then, construct a regular expression to match those things alone. Here are some precooked solutions (the Cookbook's equivalent of just-add-water meals) for most common cases:
warn "has nondigits"        if     /\D/;

warn "not a natural number" unless /^\d+$/;             # rejects -3

warn "not an integer"       unless /^-?\d+$/;           # rejects +3

warn "not an integer"       unless /^[+-]?\d+$/;

warn "not a decimal number" unless /^-?\d+\.?\d*$/;     # rejects .2

warn "not a decimal number" unless /^-?(?:\d+(?:\.\d*)?|\.\d+)$/;

warn "not a C float"

       unless /^([+-]?)(?=\d|\.\d)\d*(\.\d*)?([Ee]([+-]?\d+))?$/;
These lines do not catch the IEEE notations of "Infinity" and "NaN", but unless you're worried that IEEE committee members will stop by your workplace and beat you over the head with copies of the relevant standards documents, you can probably forget about these strange forms.
If your number has leading or trailing whitespace, those patterns won't work. Either add the appropriate logic directly, or call the
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Rounding Floating-Point Numbers
Inhaltsvorschau
You want to round a floating-point value to a certain number of decimal places. This problem arises from the same inaccuracies in representation that make testing for equality difficult (see Recipe 2.3), as well as in situations where you must reduce the precision of your answers for readability.
Use the Perl function sprintf, or printf if you're just trying to produce output:
# round off to two places

$rounded = sprintf("%.2f", $unrounded);
Or you can use other rounding functions described in the Discussion.
Whether visible or not, rounding of some sort is virtually unavoidable when working with floating-point numbers. Carefully defined standards (namely, IEEE 754, the standard for binary floating-point arithmetic) coupled with reasonable defaults within Perl often manage to eliminate or at least hide these round-off errors.
In fact, Perl's implicit rounding on output is usually good enough so that it rarely surprises. It's almost always best to leave the numbers unrounded until output, and then, if you don't like Perl's default rounding, use printf or sprintf yourself with a format that makes the rounding explicit. The %f, %e, and %g formats all let you specify how many decimal places to round their argument to. Here's an example showing how all three behave; in each case, we're asking for a field that's 12 spaces wide, but with a precision of no more than four digits to the right of the decimal place.
for $n ( 0.0000001, 10.1, 10.00001, 100000.1 ) {

    printf "%12.4e %12.4f %12.4g\n", $n, $n, $n;

}
This produces the following output:
1.0000e-07       0.0000        1e-07

1.0100e+01      10.1000         10.1

1.0000e+01      10.0000           10

1.0000e+05  100000.1000        1e+05
If that were all there were to the matter, rounding would be pretty easy. You'd just pick your favorite output format and be done with it.
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Comparing Floating-Point Numbers
Inhaltsvorschau
Floating-point arithmetic isn't exact. You want to compare two floating-point numbers and know whether they're equal when carried out to a certain number of decimal places. Most of the time, this is the way you should compare floating-point numbers for equality.
Use sprintf to format the numbers to a certain number of decimal places, then compare the resulting strings:
# equal(NUM1, NUM2, PRECISION) : returns true if NUM1 and NUM2 are

# equal to PRECISION number of decimal places

sub equal {

    my ($A, $B, $dp) = @_;

    return sprintf("%.${dp}g", $A) eq sprintf("%.${dp}g", $B);

  }
Alternatively, store the numbers as integers by assuming the decimal place.
You need the equal routine because computers' floating-point representations are just approximations of most real numbers, as we discussed in the Introduction to this chapter. Perl's normal printing routines display numbers rounded to 15 decimal places or so, but its numeric tests don't round. So sometimes you can print out numbers that look the same (after rounding) but do not test the same (without rounding).
This problem is especially noticeable in a loop, where round-off error can silently accumulate. For example, you'd think that you could start a variable out at zero, add one-tenth to it ten times, and end up with one. Well, you can't, because a base-2 computer can't exactly represent one-tenth. For example:
for ($num = $i = 0; $i < 10; $i++) { $num += 0.1 }

if ($num != 1) {

    printf "Strange, $num is not 1; it's %.45f\n", $num;

}
prints out:

               Strange, 1 is not 1; it's 0.999999999999999888977697537484345957636833191

            
The $num is interpolated into the double-quoted string using a default conversion format of "%.15g" (on most systems), so it looks like 1. But internally, it really isn't. If you had checked only to a few decimal places, for example, five:
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Operating on a Series of Integers
Inhaltsvorschau
You want to perform an operation on all integers between X and Y, such as when you're working on a contiguous section of an array or wherever you want to process all numbers within a range.
Use a for loop, or .. in conjunction with a foreach loop:
foreach ($X .. $Y) {

    # $_ is set to every integer from X to Y, inclusive

}



foreach $i ($X .. $Y) {

    # $i is set to every integer from X to Y, inclusive

}



for ($i = $X; $i <= $Y; $i++) {

    # $i is set to every integer from X to Y, inclusive

}



for ($i = $X; $i <= $Y; $i += 7) {

    # $i is set to every integer from X to Y, stepsize = 7

}
The first two approaches use a foreach loop in conjunction with the $X .. $Y construct, which creates a list of integers between $X and $Y. Now, if you were just assigning that range to an array, this would use up a lot of memory whenever $X and $Y were far apart. But in a foreach loop, Perl notices this and doesn't waste time or memory allocating a temporary list. When iterating over consecutive integers, the foreach loop will run faster than the equivalent for loop.
Another difference between the two constructs is that the foreach loop implicitly localizes the loop variable to the body of the loop, but the for loop does not. That means that after the for loop finishes, the loop variable will contain the value it held upon the final iteration. But in the case of the foreach loop, that value will be inaccessible, and the variable will hold whatever it held—if anything—prior to entering the loop. You can, however, use a lexically scoped variable as the loop variable:
foreach my $i ($X .. $Y)       { ... }

for (my $i=$X; $i <= $Y; $i++) { ... }
The following code shows each technique. Here we just print the numbers we generate:
print "Infancy is: ";

foreach (0 .. 2) {

    print "$_ ";

}

print "\n";



print "Toddling is: ";

foreach $i (3 .. 4) {

    print "$i ";

}

print "\n";



print "Childhood is: ";

for ($i = 5; $i <= 12; $i++) {

    print "$i ";

}

print "\n";



Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Working with Roman Numerals
Inhaltsvorschau
You want to convert between regular numbers and Roman numerals. You need to do this with items in outlines, page numbers on a preface, and copyrights for movie credits.
Use the Roman module from CPAN:
use Roman;

$roman = roman($arabic);                        # convert to roman numerals

$arabic = arabic($roman) if isroman($roman);    # convert from roman numerals
The Roman module provides both Roman and roman for converting Arabic ("normal") numbers to their Roman equivalents. Roman produces uppercase letters, whereas roman gives lowercase ones.
The module only deals with Roman numbers from 1 to 3999, inclusive. The Romans didn't represent negative numbers or zero, and 5000 (which 4000 is represented in terms of) uses a symbol outside the ASCII character set.
use Roman;

$roman_fifteen = roman(15);                         # "xv"

print "Roman for fifteen is $roman_fifteen\n";

$arabic_fifteen = arabic($roman_fifteen);

print "Converted back, $roman_fifteen is $arabic_fifteen\n";



Roman for fifteen is xv

               Converted back, xv is 15

            
Or to print the current year:
use Time::localtime;

use Roman;

printf "The year is now %s\n", Roman(1900 + localtime->year);



The year is now MMIII

            
Now, if you happen to have Unicode fonts available, you'll find that code points U+2160 through U+2183 represent Roman numerals, including those beyond the typical ASCII values.
use charnames ":full";

print "2003 is", "\N{ROMAN NUMERAL ONE THOUSAND}" x 2, "\N{ROMAN NUMERAL THREE}\n";
2003 is
However, the Roman module doesn't yet have an option to use those characters.
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Generating Random Numbers
Inhaltsvorschau
You want to make random numbers in a given range, inclusive, such as when you randomly pick an array index, simulate rolling a die in a game of chance, or generate a random password.
Use Perl's rand function:
$random = int( rand( $Y-$X+1 ) ) + $X;
This code generates and prints a random integer between 25 and 75, inclusive:
$random = int( rand(51)) + 25;

print "$random\n";
The rand function returns a fractional number, from (and including) 0 up to (but not including) its argument. We give it an argument of 51 to get a number that can be 0 or more, but never 51 or more. We take the integer portion of this to get a number from 0 to 50, inclusive (50.99999.... will be turned into 50 by int). We then add 25 to it to get a number from 25 to 75, inclusive.
A common application of this is the random selection of an element from an array:
$elt = $array[ rand @array ];
That's just like saying:
$elt = $array[ int( rand(0+@array) ) ];
Because rand is prototyped to take just one argument, it implicitly imposes scalar context on that argument, which, on a named array, is the number of elements in that array. The function then returns a floating-point number smaller than its argument and greater than or equal to zero. A floating-point number used as an array subscript implicitly undergoes integer truncation (rounding toward zero), producing in the end an evenly distributed, randomly selected array element to assign to $elt.
Generating a random password from a sequence of characters is similarly easy:
@chars = ( "A" .. "Z", "a" .. "z", 0 .. 9, qw(! @ $ % ^ & *) );

$password = join("", @chars[ map { rand @chars } ( 1 .. 8 ) ]);
We use map to generate eight random indices into @chars, extract the corresponding characters with a slice, and join them together to form the random password. This isn't a
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Generating Repeatable Random Number Sequences
Inhaltsvorschau
Every time you run your program, you get a different sequence of (pseudo-)random numbers. But you want a reproducible sequence, useful when running a simulation, so you need Perl to produce the same set of random numbers each time.
Use Perl's srand function:
srand EXPR;   # use a constant here for repeated sequences
Making random numbers is hard. The best that computers can do, without special hardware, is generate "pseudo-random" numbers, which are evenly distributed in their range of values. These are generated using a mathematical formula, which means that given the same seed (starting point), two programs will produce identical pseudo-random numbers.
The srand function creates a new seed for the pseudo-random number generator. If given an argument, it uses that number as the seed. If no argument is given, srand uses a value that's reasonably difficult to guess as the seed.
If you call rand without first calling srand yourself, Perl calls srand for you, choosing a "good" seed. This way, every time you run your program you'll get a different set of random numbers. Ancient versions of Perl did not call srand, so the same program always produced the same sequence of pseudo-random numbers every time the program was run. Certain sorts of programs don't want a different set of random numbers each time; they want the same set. When you need that behavior, call srand yourself, supplying it with a particular seed:
srand( 42 );  # pick any fixed starting point
Don't call srand more than once in a program, because if you do, you'll start the sequence again from that point. Unless, of course, that's what you want.
Just because Perl tries to use a good default seed does not necessarily guarantee that the numbers generated are cryptographically secure against the most intrepid crackers. Textbooks on cryptography are usually good sources of cryptographically secure random number generators.
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Making Numbers Even More Random
Inhaltsvorschau
You want to generate numbers that are more random than Perl's random numbers. Limitations of your C library's random number generator seeds can sometimes cause problems. The sequence of pseudo-random numbers may repeat too soon for some applications.
Use a different random number generator, such as those provided by the Math::Random and Math::TrulyRandom modules from CPAN:
use Math::TrulyRandom;

$random = truly_random_value( );



use Math::Random;

$random = random_uniform( );
The Perl build process tries to find the best C-library routine to use for generating pseudo-random numbers, looking at rand(3), random(3), and drand48(3). (This can be changed manually at build time, however.) The standard library functions are getting pretty good, but some ancient implementations of the rand function return only 16-bit random numbers or have other algorithmic weaknesses, and may therefore not be sufficiently random for your purposes.
The Math::TrulyRandom module uses inadequacies of your system's timers to generate the random numbers. This takes a while, so it isn't useful for generating a lot of random numbers.
The Math::Random module uses the randlib library to generate random numbers. It also includes a wide range of related functions for generating random numbers according to specific distributions, such as binomial, poisson, and exponential.
The srand and rand functions in perlfunc(1) and Chapter 29 of Programming Perl; Recipe 2.6 and Recipe 2.7; the documentation for the CPAN modules Math::Random and Math::TrulyRandom
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Generating Biased Random Numbers
Inhaltsvorschau
You want to pick a random value where the probabilities of the values are not equal (the distribution is not even). You might be trying to randomly select a banner to display on a web page, given a set of relative weights saying how often each banner is to be displayed. Alternatively, you might want to simulate behavior according to a normal distribution (the bell curve).
If you want a random value distributed according to a specific function—e.g., the Gaussian (Normal) distribution—consult a statistics textbook to find the appropriate function or algorithm. This subroutine generates random numbers that are normally distributed, with a standard deviation of 1 and a mean of 0:
sub gaussian_rand {

    my ($u1, $u2);  # uniformly distributed random numbers

    my $w;          # variance, then a weight

    my ($g1, $g2);  # gaussian-distributed numbers



    do {

        $u1 = 2 * rand( ) - 1;

        $u2 = 2 * rand( ) - 1;

        $w = $u1*$u1 + $u2*$u2;

    } while ($w >= 1 || $w =  = 0);



    $w = sqrt( (-2 * log($w))  / $w );

    $g2 = $u1 * $w;

    $g1 = $u2 * $w;

    # return both if wanted, else just one

    return wantarray ? ($g1, $g2) : $g1;

}
If you have a list of weights and values you want to randomly pick from, follow this two-step process: first, turn the weights into a probability distribution with weight_to_dist, and then use the distribution to randomly pick a value with weighted_rand:
# weight_to_dist: takes a hash mapping key to weight and returns

# a hash mapping key to probability

sub weight_to_dist {

    my %weights = @_;

    my %dist    = ( );

    my $total   = 0;

    my ($key, $weight);

    local $_;



    foreach (values %weights) {

        $total += $_;

    }



    while ( ($key, $weight) = each %weights ) {

        $dist{$key} = $weight/$total;

    }



    return %dist;

}



# weighted_rand: takes a hash mapping key to probability, and

# returns the corresponding element

sub weighted_rand {

    my %dist = @_;

    my ($key, $weight);



    while (1) {                     # to avoid floating point inaccuracies

        my $rand = rand;

        while ( ($key, $weight) = each %dist ) {

            return $key if ($rand -= $weight) < 0;

        }

    }

}
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Doing Trigonometry in Degrees, Not Radians
Inhaltsvorschau
You want your trigonometry routines to operate in degrees instead of Perl's native radians.
Convert between radians and degrees (2π radians equals 360 degrees):
use constant PI => (4 * atan2 (1, 1));



sub deg2rad {

    my $degrees = shift;

    return ($degrees / 180) * PI;

}



sub rad2deg {

    my $radians = shift;

    return ($radians / PI) * 180;

}
Alternatively, use the standard Math::Trig module:
use Math::Trig;



$radians = deg2rad($degrees);

$degrees = rad2deg($radians);
If you're doing a lot of trigonometry, look into using either the standard Math::Trig or POSIX modules. They provide many more trigonometric functions than are defined in the Perl core. Otherwise, the first solution will define the rad2deg and deg2rad functions. The value of π isn't built directly into Perl, but you can calculate it to as much precision as your floating-point hardware provides. In the Solution, the PI function is a constant created with use constant. Instead of having to remember that π is 3.14159265358979 or so, we use the built-in function call, resolved at compile time, which, besides sparing us from memorizing a long string of digits, is also guaranteed to provide as much accuracy as the platform supports.
If you're looking for the sine in degrees, use this:
# deg2rad and rad2deg defined either as above or from Math::Trig

sub degree_sine {

    my $degrees = shift;

    my $radians = deg2rad($degrees);

    my $result = sin($radians);



    return $result;

}
The sin, cos, and atan2 functions in perlfunc(1) and Chapter 29 of Programming Perl; the documentation for the standard POSIX and Math::Trig modules (also in Chapter 32 of Programming Perl)
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Calculating More Trigonometric Functions
Inhaltsvorschau
You want to calculate values for trigonometric functions like sine, tangent, or arc-cosine.
Perl provides only sin, cos, and atan2 as standard functions. From these, you can derive tan and all other trig functions (if you're intimately familiar with esoteric trig identities):
sub tan {

    my $theta = shift;



    return sin($theta)/cos($theta);

}
The POSIX module provides a wider range of trig functions:
use POSIX;



$y = acos(3.7);
The standard Math::Trig module provides a complete set of functions and supports operations on or resulting in complex numbers:
use Math::Trig;



$y = acos(3.7);
The tan function will cause a division-by-zero exception when $theta is π/2, 3π/2, and so on, because the cosine is 0 for these values. Similarly, tan and many other functions from Math::Trig may generate the same error. To trap these, use eval:
eval {

    $y = tan($pi/2);

} or return undef;
The sin, cos, and atan2 functions in perlfunc(1) and Chapter 29 of Programming Perl; the documentation for the standard Math::Trig module; we talk about trigonometry in the context of imaginary numbers in Recipe 2.14; we talk about the use of eval to catch exceptions in Recipe 10.12
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Taking Logarithms
Inhaltsvorschau
You want to take a logarithm in various bases.
For logarithms to base e, use the built-in log :
$log_e = log(VALUE);
For logarithms to base 10, use the POSIX module's log10 function:
use POSIX qw(log10);

$log_10 = log10(VALUE);
For other bases, use the mathematical identity:
where x is the number whose logarithm you want, n is the desired base, and e is the natural logarithm base.
sub log_base {

    my ($base, $value) = @_;

    return log($value)/log($base);

}
The log_base function lets you take logarithms to any base. If you know the base you'll want in advance, it's more efficient to cache the log of the base instead of recalculating it every time.
# log_base as defined earlier

$answer = log_base(10, 10_000);

print "log10(10,000) = $answer\n";

log10(10,000) = 4

            
The Math::Complex module does the caching for you via its logn( ) routine, so you can write:
use Math::Complex;

printf "log2(1024) = %lf\n", logn(1024, 2); # watch out for argument order!

log2(1024) = 10.000000

            
even though no complex number is involved here. This is not very efficient, but there are plans to rewrite Math::Complex in C for speed.
The log function in perlfunc(1) and Chapter 29 of Programming Perl; the documentation for the standard POSIX and Math::Complex modules (also in Chapter 32 of Programming Perl)
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Multiplying Matrices
Inhaltsvorschau
You want to multiply a pair of two-dimensional arrays. Mathematicians and engineers often need this.
Use the PDL modules, available from CPAN. PDL is the Perl Data Language—modules that give fast access to compact matrix and mathematical functions:
use PDL;

# $a and $b are both pdl objects

$c = $a x $b;
Alternatively, apply the matrix multiplication algorithm to your two-dimensional array:
sub mmult {

    my ($m1,$m2) = @_;

    my ($m1rows,$m1cols) = matdim($m1);

    my ($m2rows,$m2cols) = matdim($m2);



    unless ($m1cols =  = $m2rows) {  # raise exception

        die "IndexError: matrices don't match: $m1cols != $m2rows";

    }



    my $result = [  ];

    my ($i, $j, $k);



    for $i (range($m1rows)) {

        for $j (range($m2cols)) {

            for $k (range($m1cols)) {

                $result->[$i][$j] += $m1->[$i][$k] * $m2->[$k][$j];

            }

        }

    }

    return $result;

}



sub range { 0 .. ($_[0] - 1) }



sub veclen {

    my $ary_ref = $_[0];

    my $type = ref $ary_ref;

    if ($type ne "ARRAY") { die "$type is bad array ref for $ary_ref" }

    return scalar(@$ary_ref);

}



sub matdim {

    my $matrix = $_[0];

    my $rows = veclen($matrix);

    my $cols = veclen($matrix->[0]);

    return ($rows, $cols);

}
If you have the PDL library installed, you can use its lightning-fast manipulation of numbers. This requires far less memory and CPU than Perl's array manipulation. When using PDL objects, many numeric operators (such as + and *) are overloaded and work on an element-by-element basis (e.g., * is the so-called scalar multiplication operator). To get true matrix multiplication, use the overloaded x operator.
use PDL;



$a = pdl [

    [ 3, 2, 3 ],

    [ 5, 9, 8 ],

];



$b = pdl [

    [ 4, 7 ],

    [ 9, 3 ],

    [ 8, 1 ],

];



$c = $a x $b;  # x overload
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Using Complex Numbers
Inhaltsvorschau
Your application must manipulate complex numbers, as are often needed in engineering, science, and mathematics.
Either keep track of the real and imaginary components yourself:
# $c = $a * $b manually

$c_real = ( $a_real * $b_real ) - ( $a_imaginary * $b_imaginary );

$c_imaginary = ( $a_real * $b_imaginary ) + ( $b_real * $a_imaginary );
or use the Math::Complex module (part of the standard Perl distribution):
# $c = $a * $b using Math::Complex

use Math::Complex;

$c = $a * $b;
Here's how you'd manually multiply 3+5i and 2-2i:
$a_real = 3; $a_imaginary = 5;              # 3 + 5i;

$b_real = 2; $b_imaginary = -2;             # 2 - 2i;

$c_real = ( $a_real * $b_real ) - ( $a_imaginary * $b_imaginary );

$c_imaginary = ( $a_real * $b_imaginary ) + ( $b_real * $a_imaginary );

print "c = ${c_real}+${c_imaginary}i\n";



c = 16+4i

            
and with Math::Complex:
use Math::Complex;

$a = Math::Complex->new(3,5);               # or Math::Complex->new(3,5);

$b = Math::Complex->new(2,-2);

$c = $a * $b;

print "c = $c\n";



c = 16+4i

            
You may create complex numbers via the cplx constructor or via the exported constant i:
use Math::Complex;

$c = cplx(3,5) * cplx(2,-2);                # easier on the eye

$d = 3 + 4*i;                               # 3 + 4i

printf "sqrt($d) = %s\n", sqrt($d);



sqrt(3+4i) = 2+i

            
The Math::Trig module uses the Math::Complex module internally because some functions can break out from the real axis into the complex plane—for example, the inverse sine of 2.
The documentation for the standard Math::Complex module (also in Chapter 32 of Programming Perl)
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Converting Binary, Octal, and Hexadecimal Numbers
Inhaltsvorschau
You want to convert a string (e.g., "0b10110", "0x55", or "0755") containing a binary, octal, or hexadecimal number to the correct number.
Perl understands numbers specified in binary (base-2), octal (base-8), and hexadecimal (base-16) notation only when they occur as literals in your programs. If they come in as data—such as by reading from files or environment variables, or when supplied as command-line arguments—no automatic conversion takes place.
Use Perl's hex function if you have a hexadecimal string like "2e" or "0x2e":
$number = hex($hexadecimal);         # hexadecimal only ("2e" becomes 47)
Use the oct function if you have a hexadecimal string like "0x2e", an octal string like "047", or a binary string like "0b101110":
$number = oct($hexadecimal);         # "0x2e" becomes 47

$number = oct($octal);               # "057"  becomes 47

$number = oct($binary);              # "0b101110" becomes 47
The oct function converts octal numbers with or without the leading "0"; for example, "0350" or "350". Despite its name, oct does more than convert octal numbers: it also converts hexadecimal ("0x350") numbers if they have a leading "0x" and binary ("0b101010") numbers if they have a leading "0b". The hex function converts only hexadecimal numbers, with or without a leading "0x": "0x255", "3A", "ff", or "deadbeef". (Letters may be in upper- or lowercase.)
Here's an example that accepts an integer in decimal, binary, octal, or hex, and prints that integer in all four bases. It uses the oct function to convert the data from binary, octal, and hexadecimal if the input begins with a 0. It then uses printf to convert into all four bases as needed.
print "Gimme an integer in decimal, binary, octal, or hex: ";

$num = <STDIN>;

chomp $num;

exit unless defined $num;

$num = oct($num) if $num =~ /^0/; # catches 077 0b10 0x20

printf "%d %#x %#o %#b\n", ($num) x 4;
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Putting Commas in Numbers
Inhaltsvorschau
You want to output a number with commas in the right places. People like to see long numbers broken up in this way, especially in reports.
Reverse the string so you can use backtracking to avoid substitution in the fractional part of the number. Then use a regular expression to find where you need commas, and substitute them in. Finally, reverse the string back.
sub commify {

    my $text = reverse $_[0];

    $text =~ s/(\d\d\d)(?=\d)(?!\d*\.)/$1,/g;

    return scalar reverse $text;

}
It's a lot easier in regular expressions to work from the front than from the back. With this in mind, we reverse the string and make a minor change to the algorithm that repeatedly inserts commas three digits from the end. When all insertions are done, we reverse the final string and return it. Because reverse is sensitive to its implicit return context, we force it to scalar context.
This function can easily be adjusted to accommodate the use of periods instead of commas, as are used in many countries.
Here's an example of commify in action:
# more reasonable web counter :-)

use Math::TrulyRandom;

$hits = truly_random_value( );       # negative hits!

$output = "Your web page received $hits accesses last month.\n";

print commify($output);

Your web page received -1,740,525,205 accesses last month

            
perllocale(1); the reverse function in perlfunc(1) and Chapter 29 of Programming Perl; the section "Adding Commas to a Number with Lookaround" in Chapter 2 of Mastering Regular Expressions, Second Edition
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Printing Correct Plurals
Inhaltsvorschau
You're printing something like "It took $time hours", but "It took 1 hours" is ungrammatical. You would like to get it right.
Use printf and the conditional operator (X ? Y : Z) to alter the noun or verb:
printf "It took %d hour%s\n", $time, $time =  = 1 ? "" : "s";



printf "%d hour%s %s enough.\n", $time, 

        $time =  = 1 ? ""   : "s",

        $time =  = 1 ? "is" : "are";
Or use the Lingua::EN::Inflect module from CPAN, as described in the following Discussion.
The only reason inane messages like "1 file(s) updated" appear is because their authors are too lazy to bother checking whether the count is 1 or not.
If your noun changes by more than an "-s", you'll need to change the printf accordingly:
printf "It took %d centur%s", $time, $time =  = 1 ? "y" : "ies";
This is good for simple cases, but you'll tire of writing it. This leads you to write funny functions like this:
sub noun_plural {

    local $_ = shift;

    # order really matters here!

    s/ss$/sses/                             ||

    s/([psc]h)$/${1}es/                     ||

    s/z$/zes/                               ||

    s/ff$/ffs/                              ||

    s/f$/ves/                               ||

    s/ey$/eys/                              ||

    s/y$/ies/                               ||

    s/ix$/ices/                             ||

    s/([sx])$/$1es/                         ||

    s/$/s/                                  ||

                die "can't get here";

    return $_;

}

*verb_singular = \&noun_plural;   # make function alias
As you find more exceptions, your function will become increasingly convoluted. When you need to handle such morphological changes, turn to the flexible solution provided by the Lingua::EN::Inflect module from CPAN.
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Program: Calculating Prime Factors
Inhaltsvorschau
The following program takes one or more integer arguments and determines the prime factors. It uses Perl's native numeric representation, unless those numbers use floating-point representation and thus lose accuracy. Otherwise (or if the program's -b switch is used), it uses the standard Math::BigInt library, thus allowing for huge numbers. However, it only loads this library if necessary. That's why we use require and import instead of use, which would unconditionally load the library at compile time instead of conditionally at runtime. This is not an efficient way to crack the huge integers used for cryptographic purposes.
Call the program with a list of numbers, and it will show you the prime factors of those numbers:
% bigfact 8 9 96 2178

8          2**3

            9          3**2

            96         2**5 3

            2178       2 3**2 11**2

         
You can give it very large numbers:
% bigfact 239322000000000000000000

+239322000000000000000000 2**19 3 5**18 +39887



% bigfact 25000000000000000000000000

+25000000000000000000000000 2**24 5**26

         
The program is shown in Example 2-2.
Example 2-2. bigfact
  #!/usr/bin/perl

  # bigfact - calculate prime factors

  use strict;

  use integer;

  

  our ($opt_b, $opt_d);

  use Getopt::Std;

  

  @ARGV && getopts('bd')        or die "usage: $0 [-b] number ...";

  

  load_biglib( ) if $opt_b;

  

  ARG: foreach my $orig ( @ARGV ) {

      my ($n, %factors, $factor);

      $n = $opt_b ? Math::BigInt->new($orig) : $orig;

      if ($n + 0 ne $n) { # don't use -w for this

          printf STDERR "bigfact: %s would become %s\n", $n, $n+0 if $opt_d;

          load_biglib( );

          $n = Math::BigInt->new($orig);

      }

      printf "%-10s ", $n;

  

      # Here $sqi will be the square of $i. We will take advantage

      # of the fact that ($i + 1) ** 2 =  = $i ** 2 + 2 * $i + 1.

      for (my ($i, $sqi) = (2, 4); $sqi <= $n; $sqi += 2 * $i ++ + 1) {

          while ($n % $i =  = 0) {

              $n /= $i;

              print STDERR "<$i>" if $opt_d;

              $factors {$i} ++;

          }

      }

  

      if ($n != 1 && $n != $orig) { $factors{$n}++ }

      if (! %factors) {

          print "PRIME\n";

          next ARG;

      }

      for $factor ( sort { $a <=> $b } keys %factors ) {

          print "$factor";

          if ($factors{$factor} > 1) {

              print "**$factors{$factor}";

          }

          print " ";

      }

      print "\n";

  }

  

  # this simulates a use, but at runtime

  sub load_biglib {

      require Math::BigInt;

      Math::BigInt->import( );          #immaterial?

  }
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Chapter 3: Dates and Times
Inhaltsvorschau
It is inappropriate to require that a time represented as seconds since the Epoch precisely represent the number of seconds between the referenced time and the Epoch.
—IEEE Std 1003.1b-1993 (POSIX) Section B.2.2.2
Times and dates are important things to be able to manipulate. "How many users logged in last month?", "How many seconds should I sleep if I want to wake up at midday?", and "Has this user's password expired yet?" are common questions whose answers involve surprisingly non-obvious manipulations.
Perl represents points in time as intervals, measuring seconds past a point in time called the Epoch. On Unix and many other systems, the Epoch was 00:00 Jan 1, 1970, UTC (Universal Corrected Time).
When we talk about dates and times, we often interchange two different concepts: points in time (dates and times) and intervals between points in time (weeks, months, days, etc.). Epoch seconds represent intervals and points in the same units, so you can do basic arithmetic on them.
However, people are not used to working with Epoch seconds. We are more used to dealing with individual year, month, day, hour, minute, and second values. Furthermore, the month can be represented by its full name or its abbreviation. The day can precede or follow the month. Because of the difficulty of performing calculations with a variety of formats, we typically convert human-supplied strings or lists to Epoch seconds, calculate, and then convert back to strings or lists for output.
Epoch seconds are an absolute number of seconds, so they don't take into account time zones or daylight saving times. When converting to or from distinct values, always consider whether the time represented is UTC or local. Use different conversion functions depending on whether you need to convert from UTC to local time or vice versa.
Perl's
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Introduction
Inhaltsvorschau
Times and dates are important things to be able to manipulate. "How many users logged in last month?", "How many seconds should I sleep if I want to wake up at midday?", and "Has this user's password expired yet?" are common questions whose answers involve surprisingly non-obvious manipulations.
Perl represents points in time as intervals, measuring seconds past a point in time called the Epoch. On Unix and many other systems, the Epoch was 00:00 Jan 1, 1970, UTC (Universal Corrected Time).
When we talk about dates and times, we often interchange two different concepts: points in time (dates and times) and intervals between points in time (weeks, months, days, etc.). Epoch seconds represent intervals and points in the same units, so you can do basic arithmetic on them.
However, people are not used to working with Epoch seconds. We are more used to dealing with individual year, month, day, hour, minute, and second values. Furthermore, the month can be represented by its full name or its abbreviation. The day can precede or follow the month. Because of the difficulty of performing calculations with a variety of formats, we typically convert human-supplied strings or lists to Epoch seconds, calculate, and then convert back to strings or lists for output.
Epoch seconds are an absolute number of seconds, so they don't take into account time zones or daylight saving times. When converting to or from distinct values, always consider whether the time represented is UTC or local. Use different conversion functions depending on whether you need to convert from UTC to local time or vice versa.
Perl's time function returns the number of seconds that have passed since the Epoch—more or less. POSIX requires that time not include leap seconds, a peculiar practice of adjusting the world's clock by a second here and there to account for the slowing down of the Earth's rotation due to tidal angular-momentum dissipation. (See the
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Finding Today's Date
Inhaltsvorschau
You need to find the year, month, and day values for today's date.
Use localtime, which returns values for the current date and time if given no arguments. You can either use localtime and extract the information you want from the list it returns:
($DAY, $MONTH, $YEAR) = (localtime)[3,4,5];
or use Time::localtime, which overrides localtime to return a Time::tm object:
use Time::localtime;

$tm = localtime;

($DAY, $MONTH, $YEAR) = ($tm->mday, $tm->mon, $tm->year);
Here's how you'd print the current date as "YYYY MM DD", using the non-overridden localtime:
($day, $month, $year) = (localtime)[3,4,5];

printf("The current date is %04d %02d %02d\n", $year+1900, $month+1, $day);

The current date is 2003 03 06

            
To extract the fields we want from the list returned by localtime, we take a list slice. We could also have written it as:
($day, $month, $year) = (localtime)[3..5];
This is how we'd print the current date as "YYYY-MM-DD" (in approved ISO 8601 fashion), using Time::localtime:
use Time::localtime;

$tm = localtime;

printf("The current date is %04d-%02d-%02d\n", $tm->year+1900, 

    ($tm->mon)+1, $tm->mday);

The current date is 2003-03-06

            
The object interface might look out of place in a short program. However, when you do a lot of work with the distinct values, accessing them by name makes code much easier to understand.
A more obfuscated way that does not involve temporary variables is:
printf("The current date is %04d-%02d-%02d\n",

       sub {($_[5]+1900, $_[4]+1, $_[3])}->(localtime));
There is also strftime from the POSIX module discussed in Recipe 3.8:
use POSIX qw(strftime);

print strftime "%Y-%m-%d\n", localtime;
The gmtime function works just as
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Converting DMYHMS to Epoch Seconds
Inhaltsvorschau
You want to convert a date, a time, or both with distinct values for day, month, year, etc. to Epoch seconds.
Use the timelocal or timegm functions in the standard Time::Local module, depending on whether the date and time is in the current time zone or in UTC.
use Time::Local;

$TIME = timelocal($sec, $min, $hours, $mday, $mon, $year);

$TIME = timegm($sec, $min, $hours, $mday, $mon, $year);
The built-in function localtime converts an Epoch seconds value to distinct DMYHMS values; the timelocal subroutine from the standard Time::Local module converts distinct DMYHMS values to an Epoch seconds value. Here's an example that shows how to find Epoch seconds for a time in the current day. It gets the day, month, and year values from localtime:
# $hours, $minutes, and $seconds represent a time today,

# in the current time zone

use Time::Local;

$time = timelocal($seconds, $minutes, $hours, (localtime)[3,4,5]);
If you're passing month and year values to timelocal, it expects values with the same range as those which localtime returns. Namely, months start at 0, and years have 1900 subtracted from them.
The timelocal function assumes the DMYHMS values represent a time in the current time zone. Time::Local also exports a timegm subroutine that assumes the DMYHMS values represent a time in the UTC time zone. Unfortunately, there is no convenient way to convert from a time zone other than the current local time zone or UTC. The best you can do is convert to UTC and add or subtract the time zone offset in seconds.
This code illustrates both the use of timegm and how to adjust the ranges of months and years:
# $day is day in month (1-31)

# $month is month in year (1-12)

# $year is four-digit year e.g., 1967

# $hours, $minutes and $seconds represent UTC (GMT) time 

use Time::Local;

$time = timegm($seconds, $minutes, $hours, $day, $month-1, $year-1900);
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Converting Epoch Seconds to DMYHMS
Inhaltsvorschau
You have a date and time in Epoch seconds, and you want to calculate individual DMYHMS values from it.
Use the localtime or gmtime functions, depending on whether you want the date and time in UTC or your local time zone.
($seconds, $minutes, $hours, $day_of_month, $month, $year,

    $wday, $yday, $isdst) = localtime($time);
The standard Time::timelocal and Time::gmtime modules override the localtime and gmtime functions to provide named access to the individual values.
use Time::localtime;        # or Time::gmtime

$tm = localtime($TIME);     # or gmtime($TIME)

$seconds = $tm->sec;

# ...
The localtime and gmtime functions return strange year and month values; the year has 1900 subtracted from it, and 0 is the month value for January. Be sure to correct the base values for year and month, as this example does:
($seconds, $minutes, $hours, $day_of_month, $month, $year,

    $wday, $yday, $isdst) = localtime($time);

printf("Dateline: %02d:%02d:%02d-%04d/%02d/%02d\n",

    $hours, $minutes, $seconds, $year+1900, $month+1,

    $day_of_month);
We could have used the Time::localtime module to avoid the temporary variables:
use Time::localtime;

$tm = localtime($time);

printf("Dateline: %02d:%02d:%02d-%04d/%02d/%02d\n",

    $tm->hour, $tm->min, $tm->sec, $tm->year+1900,

    $tm->mon+1, $tm->mday);
The localtime function in perlfunc(1) and Chapter 29 of Programming Perl; the documentation for the standard Time::localtime and Time::gmtime modules; convert in the other direction using Recipe 3.3
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Adding to or Subtracting from a Date
Inhaltsvorschau
You have a date and time and want to find the date and time of some period in the future or past.
Simply add or subtract Epoch seconds:
$when = $now + $difference;

$then = $now - $difference;
If you have distinct DMYHMS values, use the CPAN Date::Calc module. If you're doing arithmetic with days only, use Add_Delta_Days ($offset is a positive or negative integral number of days):
use Date::Calc qw(Add_Delta_Days);

($y2, $m2, $d2) = Add_Delta_Days($y, $m, $d, $offset);
If you are concerned with hours, minutes, and seconds (in other words, times as well as dates), use Add_Delta_DHMS:
use Date::Calc qw(Add_Delta_DHMS);

($year2, $month2, $day2, $h2, $m2, $s2) = 

    Add_Delta_DHMS( $year, $month, $day, $hour, $minute, $second,

                $days_offset, $hour_offset, $minute_offset, $second_offset );
Calculating with Epoch seconds is easiest, disregarding the effort to get dates and times into and out of Epoch seconds. This code shows how to calculate an offset (55 days, 2 hours, 17 minutes, and 5 seconds, in this case) from a given base date and time:
$birthtime = 96176750;                  # 18/Jan/1973, 3:45:50 am

$interval = 5 +                         # 5 seconds

            17 * 60 +                   # 17 minutes

            2  * 60 * 60 +              # 2 hours

            55 * 60 * 60 * 24;          # and 55 days

$then = $birthtime + $interval;

print "Then is ", scalar(localtime($then)), "\n";

Then is Wed Mar 14 06:02:55 1973

            
We could have used Date::Calc's Add_Delta_DHMS function and avoided the conversion to and from Epoch seconds:
use Date::Calc qw(Add_Delta_DHMS);

($year, $month, $day, $hh, $mm, $ss) = Add_Delta_DHMS(

    1973, 1, 18, 3, 45, 50, # 18/Jan/1973, 3:45:50 am

             55, 2, 17, 5); # 55 days, 2 hrs, 17 min, 5 sec

print "To be precise: $hh:$mm:$ss, $month/$day/$year\n";

Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Difference of Two Dates
Inhaltsvorschau
You need to find the number of days between two dates or times.
If your dates are in Epoch seconds and fall in the range Fri Dec 13 20:45:52 1901 to Tue Jan 19 03:14:07 2038 (inclusive), subtract one from the other and convert the seconds to days:
$seconds = $recent - $earlier;
If you have distinct DMYMHS values or are worried about the range limitations of Epoch seconds, use the Date::Calc module from CPAN. It can calculate the difference between dates:
use Date::Calc qw(Delta_Days);

$days = Delta_Days( $year1, $month1, $day1, $year2, $month2, $day2);
It also calculates the difference between a pair of dates and times:
use Date::Calc qw(Delta_DHMS);

($days, $hours, $minutes, $seconds) =

  Delta_DHMS( $year1, $month1, $day1, $hour1, $minute1, $seconds1,  # earlier

              $year2, $month2, $day2, $hour2, $minute2, $seconds2); # later
One problem with Epoch seconds is how to convert the large integers back to forms that people can read. The following example shows one way of converting an Epoch seconds value back to its component numbers of weeks, days, hours, minutes, and seconds:
$bree = 361535725;          # 16 Jun 1981, 4:35:25

$nat  =  96201950;          # 18 Jan 1973, 3:45:50



$difference = $bree - $nat;

print "There were $difference seconds between Nat and Bree\n";

There were 265333775 seconds between Nat and Bree



$seconds    =  $difference % 60;

$difference = ($difference - $seconds) / 60;

$minutes    =  $difference % 60;

$difference = ($difference - $minutes) / 60;

$hours      =  $difference % 24;

$difference = ($difference - $hours)   / 24;

$days       =  $difference % 7;

$weeks      = ($difference - $days)    /  7;



print "($weeks weeks, $days days, $hours:$minutes:$seconds)\n";

(438 weeks, 4 days, 23:49:35)

            
Date::Calc's functions can ease these calculations. The Delta_Days
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Day in a Week/Month/Year or Week Number
Inhaltsvorschau
You have a date, either in Epoch seconds or as distinct year, month, etc. values. You want to find out what week of the year, day of the week, day of the month, or day of the year that the date falls on.
If you have Epoch seconds, the day of the year, day of the month, and day of the week are returned by localtime. The week of the year is easily calculated from the day of the year (but see the following discussion, as standards differ).
($MONTHDAY, $WEEKDAY, $YEARDAY) = (localtime $DATE)[3,6,7];

$WEEKNUM = int($YEARDAY / 7) + 1;
If you have distinct DMYHMS values, you can either convert them to Epoch seconds values as in Recipe 3.2 and then use the previous solution, or else use the Day_of_Week, Week_Number, and Day_of_Year functions from the CPAN module Date::Calc:
use Date::Calc qw(Day_of_Week Week_Number Day_of_Year);

# you have $year, $month, and $day

# $day is day of month, by definition.

$wday = Day_of_Week($year, $month, $day);

$wnum = Week_Number($year, $month, $day);

$dnum = Day_of_Year($year, $month, $day);
The Day_of_Week, Week_Number, and Day_of_Year functions all expect years that haven't had 1900 subtracted from them and months where January is 1, not 0. The return value from Day_of_Week can be 1 through 7 (corresponding to Monday through Sunday) or 0 in case of an error (an invalid date, for example).
use Date::Calc qw(Day_of_Week Week_Number Day_of_Week_to_Text);



$year  = 1981;

$month = 6;         # (June)

$day   = 16;



$wday = Day_of_Week($year, $month, $day);

print "$month/$day/$year was a ", Day_of_Week_to_Text($wday), "\n";

## see comment above



$wnum = Week_Number($year, $month, $day);

print "in the $wnum week.\n";

6/16/1981 was a Tuesday

               in week number 25

            
The governing standard bodies of particular countries may have rules about when the first week of the year starts. For example, in Norway the first week must have at least 4 days in it (and weeks start on Mondays). If January 1 falls on a week with 3 or fewer days, it is counted as week 52 (or 53) of the previous year. In America, the first Monday of the year is usually the start of the first workweek. Given such rules, you may have to write your own algorithm, or at least look at the
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Parsing Dates and Times from Strings
Inhaltsvorschau
You read in a date or time specification in an arbitrary format but need to parse that string into distinct year, month, etc. values.
If your date is already numeric, or in a rigid and easily parsed format, use a regular expression (and possibly a hash mapping month names to numbers) to extract individual day, month, and year values, and then use the standard Time::Local module's timelocal and timegm functions to turn that into an Epoch seconds value.
use Time::Local;

# $date is "2003-02-13" (YYYY-MM-DD form).

($yyyy, $mm, $dd) = ($date =~ /(\d+)-(\d+)-(\d+)/);

# calculate epoch seconds at midnight on that day in this timezone

$epoch_seconds = timelocal(0, 0, 0, $dd, $mm-1, $yyyy);
For a more flexible solution, use the ParseDate function provided by the CPAN module Date::Manip, and then use UnixDate to extract the individual values.
use Date::Manip qw(ParseDate UnixDate);

$date = ParseDate($STRING);

if (!$date) {

    # bad date

} else {

    @VALUES = UnixDate($date, @FORMATS);

}
The flexible ParseDate function accepts many formats. It even converts strings such as "today", "2 weeks ago Friday", "2nd Sunday in 1996", and "last Sunday in December", plus it understands the date and time format used in mail and news headers. It returns the decoded date in its own format: a string of the form "YYYYMMDDHH:MM:SS". You could compare two such strings to compare the dates they represent, but arithmetic is difficult. We therefore use the UnixDate function to extract the year, month, and day values in a preferred format.
UnixDate takes a date (as returned by ParseDate) and a list of formats. It applies each format to the string and returns the result. A format is a string describing one or more elements of the date and time and the way that the elements are to be formatted. For example,
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Printing a Date
Inhaltsvorschau
You need to print a date and time shown in Epoch seconds format in human-readable form.
Call localtime or gmtime in scalar context, which takes an Epoch seconds value and returns a string of the form Tue July 22 05:15:20 2003:
$STRING = localtime($EPOCH_SECONDS);
Alternatively, the strftime function in the standard POSIX module supports a more customizable output format and takes individual DMYHMS values:
use POSIX qw(strftime);

$STRING = strftime($FORMAT, $SECONDS, $MINUTES, $HOUR,

                   $DAY_OF_MONTH, $MONTH, $YEAR, $WEEKDAY,

                   $YEARDAY, $DST);
The CPAN module Date::Manip has a UnixDate routine that works like a specialized form sprintf designed to handle dates. Pass it a Date::Manip date value. Using Date::Manip in lieu of POSIX::strftime has the advantage of not requiring a POSIX-compliant system.
use Date::Manip qw(UnixDate);

$STRING = UnixDate($DATE, $FORMAT);
The simplest solution is built into Perl already: the localtime function. In scalar context, it returns the string formatted in a particular way:

               Wed July 16 23:58:36 2003

            
This makes for simple code, although it restricts the format of the string:
use Time::Local;

$time = timelocal(50, 45, 3, 18, 0, 73);

print "Scalar localtime gives: ", scalar(localtime($time)), "\n";

Scalar localtime gives: Thu Jan 18 03:45:50 1973

            
Of course, localtime requires the date and time in Epoch seconds. The POSIX::strftime function takes individual DMYMHS values plus a format and returns a string. The format is similar to a printf format: % directives specify fields in the output string. A full list of these directives is available in your system's documentation for
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
High-Resolution Timers
Inhaltsvorschau
You need to measure time with a finer granularity than the full seconds that time returns.
The Time::HiRes module, which is included standard starting with the v5.8 release of Perl, encapsulates this functionality for most systems:
use Time::HiRes qw(gettimeofday);

$t0 = gettimeofday( );    

## do your operation here

$t1 = gettimeofday( );

$elapsed = $t1 - $t0;

# $elapsed is a floating point value, representing number

# of seconds between $t0 and $t1
Here's some code that uses Time::HiRes to time how long the user takes to press the Return key:
use Time::HiRes qw(gettimeofday);

print "Press return when ready: ";

$before = gettimeofday( );

$line = <STDIN>;

$elapsed = gettimeofday( ) - $before;

print "You took $elapsed seconds.\n";



Press return when ready

               You took 0.228149 seconds

            
The module's gettimeofday function returns a two-element list representing seconds and microseconds when called in list context, or a single floating-point number combining the two when called in scalar context. You can also import its time function to replace the standard core version by that name; this always acts like scalar gettimeofday.
The module also provides usleep and ualarm functions, which are alternate versions of the standard Perl sleep and alarm functions that understand granularities of microseconds instead of just seconds. They take arguments in microseconds; alternatively, you can import the module's sleep and alarm functions, which take floating-point arguments in seconds, to replace the standard versions, which take integer arguments in seconds. For access to your system's low-level itimer routines (if you have them), setitimer and getitimer are also provided.
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Short Sleeps
Inhaltsvorschau
You need to sleep for less than a second.
Use the select( ) function, if your system supports it:
select(undef, undef, undef, $time_to_sleep);
Some systems don't support a four-argument select. The Time::HiRes module provides a sleep function that takes a floating-point number of seconds:
use Time::HiRes qw(sleep);

sleep($time_to_sleep);
Here's an example of select. It's a simpler version of the program in Recipe 1.6. Think of it as your very own 300-baud terminal.
while (<>) {

    select(undef, undef, undef, 0.25);

    print;

}
Using Time::HiRes, we'd write it as:
use Time::HiRes qw(sleep);

while (<>) {

    sleep(0.25);

    print;

}
The documentation for the CPAN modules Time::HiRes and Benchmark; the sleep and select functions in perlfunc(1) and Chapter 29 of Programming Perl; we use the select function for short sleeps in the slowcat program in Recipe 1.6
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Program: hopdelta
Inhaltsvorschau
Have you ever wondered why it took so long for someone's mail to get to you? With postal mail, you can't trace how long each intervening post office let your letter gather dust in their back office. But with electronic mail, you can. The message carries in its header Received: lines showing when each intervening mail transport agent along the way got the message.
The dates in the headers are hard to read. You have to read them backwards, bottom to top. They are written in many varied formats, depending on the whim of each transport agent. Worst of all, each date is written in its own local time zone. It's hard to eyeball "Tue, 26 May 1998 23:57:38 -0400" and "Wed, 27 May 1998 05:04:03 +0100" and realize these two dates are only 6 minutes and 25 seconds apart.
The ParseDate and DateCalc functions in the Date::Manip module from CPAN can help this:
use Date::Manip qw(ParseDate DateCalc);

$d1 = ParseDate("Sun, 09 Mar 2003 23:57:38 -0400");

$d2 = ParseDate("Mon, 10 Mar 2003 05:04:03 +0100");

print DateCalc($d1, $d2);

+0:0:0:0:0:6:25

         
That's a nice format for a program to read, but it's still not what the casual reader wants to see. The hopdelta program, shown in Example 3-1, takes a mailer header and tries to analyze the deltas (difference) between each hop (mail stop). Its output is shown in the local time zone.
Example 3-1. hopdelta
  #!/usr/bin/perl

  # hopdelta - feed mail header, produce lines

  #            showing delay at each hop.

  use strict;

  use Date::Manip qw (ParseDate UnixDate);

  

  # print header; this should really use format/write due to

  # printf complexities

  printf "%-20.20s %-20.20s %-20.20s   %s\n", 

          "Sender", "Recipient", "Time", "Delta";

  

  $/ = '';                # paragraph mode

  $_ = <>;                # read header

  s/\n\s+/ /g;            # join continuation lines

  

  # calculate when and where this started

  my($start_from) = /^From.*\@([^\s>]*)/m;

  my($start_date) = /^Date:\s+(.*)/m;

  my $then = getdate($start_date);

  printf "%-20.20s %-20.20s %s\n", 'Start', $start_from, fmtdate($then);

  

  my $prevfrom = $start_from;

  

  # now process the headers lines from the bottom up

  for (reverse split(/\n/)) {

      my ($delta, $now, $from, $by, $when);

      next unless /^Received:/;

      s/\bon (.*?) (id.*)/; $1/s;         # qmail header, I think

      unless (($when) = /;\s+(.*)$/) {    # where the date falls

          warn "bad received line: $_";

          next;

      }

      ($from) = /from\s+(\S+)/;           

      ($from) = /\((.*?)\)/ unless $from; # some put it here

      $from =~ s/\)$//;                   # someone was too greedy

      ($by)   = /by\s+(\S+\.\S+)/;        # who sent it on this hop

  

      # now random mungings to get their string parsable

      for ($when) {

          s/ (for|via) .*$//;

          s/([+-]\d\d\d\d) \(\S+\)/$1/;

          s/id \S+;\s*//;

      }

      next unless $now = getdate($when);          # convert to Epoch

      $delta = $now - $then;

  

      printf "%-20.20s %-20.20s %s  ", $from, $by, fmtdate($now);

      $prevfrom = $by;

      puttime($delta);

      $then = $now;               

  }

  

  exit;

  

  # convert random date strings into Epoch seconds

  sub getdate {

      my $string     =  shift;

      $string        =~ s/\s+\(.*\)\s*$//;        # remove nonstd tz

      my $date       =  ParseDate($string);

      my $epoch_secs =  UnixDate($date,"%s");

      return $epoch_secs;

  }

  

  # convert Epoch seconds into a particular date string

  sub fmtdate {

      my $epoch = shift;

      my($sec,$min,$hour,$mday,$mon,$year) = localtime($epoch);

      return sprintf "%02d:%02d:%02d %04d/%02d/%02d",

                      $hour, $min, $sec,

                      $year + 1900, $mon + 1, $mday,

  }

  

  # take seconds and print in pleasant-to-read format

  sub puttime {

      my($seconds) = shift;

      my($days, $hours, $minutes);

  

      $days    = pull_count($seconds, 24 * 60 * 60);

      $hours   = pull_count($seconds, 60 * 60);

      $minutes = pull_count($seconds, 60);

  

      put_field('s', $seconds);

      put_field('m', $minutes);

      put_field('h', $hours);

      put_field('d', $days);

  

      print "\n";

  }

  

  # usage: $count = pull_count(seconds, amount)

  # remove from seconds the amount quantity, altering caller's version.

  # return the integral number of those amounts so removed.

  sub pull_count {

      my($answer) = int($_[0] / $_[1]);

      $_[0] -= $answer * $_[1];

      return $answer;

  }

  

  # usage: put_field(char, number)

  # output number field in 3-place decimal format, with trailing char

  # suppress output unless char is 's' for seconds

  sub put_field {

      my ($char, $number) = @_;

      printf " %3d%s", $number, $char if $number || $char eq 's';

  }

  

  =end

  

Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Chapter 4: Arrays
Inhaltsvorschau
Works of art, in my opinion, are the only objects in the material universe to possess internal order, and that is why, though I don't believe that only art matters, I do believe in Art for Art's sake.
—E.M. Forster
If you are asked about the contents of your pockets, or the names of the first three Greek letters, or how to get to the highway, you recite a list: you name one thing after another in a particular order. Lists are part of your conception of the world. With Perl's powerful list- and array-handling primitives, you can translate this world view directly into code.
In this chapter, we'll use the terms list and array as the Perl language thinks of them. Take ("alpha", "beta", "gamma"); that's a list of the names of the first three Greek letters, in order. To store that list into a variable, use an array, as in @greeks = ("alpha", "beta", "gamma"). Both are ordered groups of scalar values; the difference is that an array is a named variable, one whose array length can be directly changed, whereas a list is a more ephemeral notion. You might think of an array as a variable and a list as the values it contains.
This distinction may seem arbitrary, but operations that modify the length of these groupings (like push and pop) require a proper array and not merely a list. Think of the difference between $a and 4. You can say $a++ but not 4++. Likewise, you can say pop(@a) but not pop (1,2,3).
The most important thing to glean from this is that Perl's lists and arrays are both ordered groupings of scalars. Operators and functions that work on lists or arrays are designed to provide faster or more convenient access to the elements than manual access would provide. Since few actually deal with modifying the array's length, you can usually use arrays and lists interchangeably.
You can't use nested parentheses to create a list of lists. If you try that in Perl, your lists get
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Introduction
Inhaltsvorschau
If you are asked about the contents of your pockets, or the names of the first three Greek letters, or how to get to the highway, you recite a list: you name one thing after another in a particular order. Lists are part of your conception of the world. With Perl's powerful list- and array-handling primitives, you can translate this world view directly into code.
In this chapter, we'll use the terms list and array as the Perl language thinks of them. Take ("alpha", "beta", "gamma"); that's a list of the names of the first three Greek letters, in order. To store that list into a variable, use an array, as in @greeks = ("alpha", "beta", "gamma"). Both are ordered groups of scalar values; the difference is that an array is a named variable, one whose array length can be directly changed, whereas a list is a more ephemeral notion. You might think of an array as a variable and a list as the values it contains.
This distinction may seem arbitrary, but operations that modify the length of these groupings (like push and pop) require a proper array and not merely a list. Think of the difference between $a and 4. You can say $a++ but not 4++. Likewise, you can say pop(@a) but not pop (1,2,3).
The most important thing to glean from this is that Perl's lists and arrays are both ordered groupings of scalars. Operators and functions that work on lists or arrays are designed to provide faster or more convenient access to the elements than manual access would provide. Since few actually deal with modifying the array's length, you can usually use arrays and lists interchangeably.
You can't use nested parentheses to create a list of lists. If you try that in Perl, your lists get flattened, meaning that both these lines are equivalent:
@nested = ("this", "that", "the", "other");

@nested = ("this", "that", ("the", "other"));
Why doesn't Perl (usefully) just support nested lists directly? Although partially for historical reasons, this easily allows for operations (like
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Specifying a List in Your Program
Inhaltsvorschau
You want to include a list in your program. This is how you initialize arrays.
You can write out a comma-separated list of elements:
@a = ("quick", "brown", "fox");
If you have a lot of single-word elements, use the qw( ) operator:
@a = qw(Meddle not in the affairs of wizards.);
If you have a lot of multiword elements, use a here document and extract lines:
@lines = (<< "END_OF_HERE_DOC" =~ /^\s*(.+)/gm);

    I sit beside the fire and think

    of all that I have seen,

    of meadow-flowers and butterflies 

    and summers that have been;

END_OF_HERE_DOC
The first technique is the one most commonly used, often because only small arrays are normally initialized as program literals. Initializing a large array would fill your program with values and make it hard to read, so such arrays either tend to be initialized in a separate library file (see Chapter 12), or else have their values read in from a file:
@bigarray = ( );

open(FH, "<", "myinfo")   or die "Couldn't open myinfo: $!";

while (<FH>) {

    chomp;

    push(@bigarray, $_);

}

close(FH);
The second technique uses qw( ), one of several pseudo-functions in Perl used for quoting without having to resort to actual quotation marks. This one splits its string argument on whitespace to produce a list of words, where "words" in this instance means strings that don't contain any whitespace. The initial argument is not subject to interpolation of variables or (most) backslash escape sequences.
@banner = ('Costs', 'only', '$4.95');

@banner = qw(Costs only $4.95);

@banner = split(' ', 'Costs only $4.95');
You can use qw( ) only when each whitespace-separated argument is to be a distinct element in the return list. Be careful not to give Columbus four ships instead of three:
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Printing a List with Commas
Inhaltsvorschau
You'd like to print out a list containing an unknown number of elements, placing an "and" before the last element and commas between each element if there are more than two.
Use this function, which returns the formatted string:
sub commify_series {

    (@_ =  = 0) ? ''                                      :

    (@_ =  = 1) ? $_[0]                                   :

    (@_ =  = 2) ? join(" and ", @_)                       :

                join(", ", @_[0 .. ($#_-1)], "and $_[-1]");

}
It often looks odd to print out arrays:
@array = ("red", "yellow", "green");

print "I have ", @array, " marbles.\n";

print "I have @array marbles.\n";

I have redyellowgreen marbles

               I have red yellow green marbles

            
What you really want it to say is, "I have red, yellow, and green marbles". The function given in the solution generates strings in that format. The word "and" is placed between the last two list elements. If there are more than two elements in the list, a comma is placed between every element.
Example 4-1 gives a complete demonstration of the function, with one addition: if any element in the list already contains a comma, a semicolon is used for the separator character instead.
Example 4-1. commify_series
  #!/usr/bin/perl -w

  # commify_series - show proper comma insertion in list output

  

  # @lists is an array of (references to anonymous) arrays

  @lists = (

      [ 'just one thing' ],

      [ qw(Mutt Jeff) ],

      [ qw(Peter Paul Mary) ],

      [ 'To our parents', 'Mother Theresa', 'God' ],

      [ 'pastrami', 'ham and cheese', 'peanut butter and jelly', 'tuna' ],

      [ 'recycle tired, old phrases', 'ponder big, happy thoughts' ],

      [ 'recycle tired, old phrases', 

        'ponder big, happy thoughts', 

        'sleep and dream peacefully' ],

      );

  

  foreach $aref (@lists) {

      print "The list is: " . commify_series(@$aref) . ".\n";

  } 

  # demo for single list

  @list = qw(one two three);

  print "The last list is: " . commify_series(@list) . ".\n";

  

  sub commify_series {

      my $sepchar = grep(/,/ => @_) ? ";" : ",";

      (@_ =  = 0) ? ''                                      :

      (@_ =  = 1) ? $_[0]                                   :

      (@_ =  = 2) ? join(" and ", @_)                       :

                  join("$sepchar ", @_[0 .. ($#_-1)], "and $_[-1]");

  }
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Changing Array Size
Inhaltsvorschau
You want to enlarge or truncate an array. For example, you might truncate an array of employees that's already sorted by salary to list the five highest-paid employees. Or, if you know how big your array will get and that it will grow piecemeal, it's more efficient to grab memory for it in one step by enlarging just once than to keep pushing values onto the end.
Assign to $#ARRAY :
# grow or shrink @ARRAY

$#ARRAY = $NEW_LAST_ELEMENT_INDEX_NUMBER;
Assigning to an element past the end automatically extends the array:
$ARRAY[$NEW_LAST_ELEMENT_INDEX_NUMBER] = $VALUE;
$#ARRAY is the number of the last valid index in @ARRAY. If we assign it a number smaller than its current value, we truncate the array. Truncated elements are lost forever. If we assign $#ARRAY a number larger than its current value, the array grows. New elements have the undefined value.
$#ARRAY is not @ARRAY, though. Although $#ARRAY is the last valid index in the array, @ARRAY (in scalar context, as when treated as a number) is the number of elements. $#ARRAY is one less than @ARRAY because array indices start at 0.
Here's some code that uses both. We have to say scalar @array in the print because Perl gives list context to (most) functions' arguments, but we want @array in scalar context.
sub what_about_that_array {

    print "The array now has ", scalar(@people), " elements.\n";

    print "The index of the last element is $#people.\n";

    print "Element #3 is `$people[3]'.\n";

}



@people = qw(Crosby Stills Nash Young);

what_about_that_array( );
prints:

               The array now has 4 elements

               The index of the last element is 3

               Element #3 is `Young'

            
whereas:
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Implementing a Sparse Array
Inhaltsvorschau
An array with large, unoccupied expanses between occupied elements wastes memory. How do you reduce that overhead?
Use a hash instead of an array.
If you assign to the millionth element of an array, Perl allocates a million and one slots to store scalars. Only the last element contains interesting data, leaving earlier ones each set to undef at a cost of four (or more) bytes per unoccupied slot.
In recent versions of Perl, if you grow an array by assigning either past the end or directly to $#ARRAY, you can distinguish these implicit undefs from those that would result from assigning undef there by using exists instead of defined, just as you would with a hash.
$#foo = 5;

@bar = ( (undef) x 5 ) ;



printf "foo element 3 is%s defined\n",

        defined $foo[3] ? "" : "n't";

printf "foo element 3 does%s exist\n",

        exists $foo[3] ? "" : "n't";

printf "bar element 3 is%s defined\n",

        defined $bar[3] ? "" : "n't";

printf "bar element 3 does%s exist\n",

        exists $bar[3] ? "" : "n't";



foo element 3 isn't defined

               foo element 3 doesn't exist

               bar element 3 isn't defined

               bar element 3 does exist

            
However, you still waste a lot of space. That's because Perl's array implementation reserves a contiguous vector, one for each element up to the highest occupied position.
$real_array[ 1_000_000 ] = 1;       # costs 4+ megabytes
A hash works differently: you pay only for what you really use, not for unoccupied positions. Although a hash element costs somewhat more than an array element because you need to store both the value and its key, with sparse arrays, the savings can be astonishing.
$fake_array{ 1_000_000 } = 1;       # costs 28 bytes
What's the trade-off? Because a hash's keys aren't ordered, a little more work is needed to sort the numeric keys so you can handle their values in the same order as you would if they were stored as a real array. With an array, you'd just do this to process elements in index order:
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Iterating Over an Array
Inhaltsvorschau
You want to repeat a procedure for every element in a list.
Often you use an array to collect information you're interested in; for instance, login names of users who have exceeded their disk quota. When you finish collecting the information, you want to process it by doing something with every element in the array. In the disk quota example, you might send each user a stern mail message.
Use a foreach loop:
foreach $item (LIST) {

    # do something with $item

}
Let's say we've used @bad_users to compile a list of users who are over their allotted disk quotas. To call some complain subroutine for each user, we'd use:
foreach $user (@bad_users) {

        complain($user);

}
Rarely is this recipe so simply applied. Instead, we often use functions to generate the list:
foreach $var (sort keys %ENV) {

    print "$var=$ENV{$var}\n";

}
Here we're using sort and keys to build a sorted list of environment variable names. If you use the list more than once, you'll obviously keep it around by saving in an array. But for one-shot processing, it's often tidier to process the list directly.
Not only can we add complexity to this formula by building up the list in the foreach, we can also add complexity by doing more work inside the code block. A common application of foreach is to gather information on every element of a list and then, based on that information, decide whether to do something. For instance, returning to the disk quota example:
foreach $user (@all_users) {

    $disk_space = get_usage($user);     # find out how much disk space in use

    if ($disk_space > $MAX_QUOTA) {     # if it's more than we want ...

        complain($user);                # ... then object vociferously

    }

}
More complicated program flow is possible. The code can call last to jump out of the loop, next
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Iterating Over an Array by Reference
Inhaltsvorschau
You have a reference to an array, and you want to use a loop to work with the array's elements.
Use foreach or for to loop over the dereferenced array:
# iterate over elements of array in $ARRAYREF

foreach $item (@$ARRAYREF) {

    # do something with $item

}



for ($i = 0; $i <= $#$ARRAYREF; $i++) {

    # do something with $ARRAYREF->[$i]

}
The solutions assume you have a scalar variable containing the array reference. This lets you do things like this:
@fruits = ( "Apple", "Blackberry" );

$fruit_ref = \@fruits;

foreach $fruit (@$fruit_ref) {

    print "$fruit tastes good in a pie.\n";

}

Apple tastes good in a pie

               Blackberry tastes good in a pie

            
We could have rewritten the foreach loop as a for loop like this:
for ($i=0; $i <= $#$fruit_ref; $i++) {

    print "$fruit_ref->[$i] tastes good in a pie.\n";

}
Frequently, though, the array reference is the result of a more complex expression. Use the @{ EXPR } notation to turn the result of the expression back into an array:
$namelist{felines} = \@rogue_cats;

foreach $cat ( @{ $namelist{felines} } ) {

    print "$cat purrs hypnotically..\n";

}

print "--More--\nYou are controlled.\n";
Again, we can replace the foreach with a for loop:
for ($i=0; $i <= $#{ $namelist{felines} }; $i++) {

    print "$namelist{felines}[$i] purrs hypnotically.\n";

}
perlref(1) and perllol(1); Chapter 8 of Programming Perl; Recipe 11.1; Recipe 4.5
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Extracting Unique Elements from a List
Inhaltsvorschau
You want to eliminate duplicate values from a list, such as when you build the list from a file or from the output of another command. This recipe is equally applicable to removing duplicates as they occur in input and to removing duplicates from an array you've already populated.
Use a hash to record which items have been seen, then keys to extract them. You can use Perl's idea of truth to shorten and speed up your code.

Section 4.7.2.1: Straightforward

%seen = ( );

@uniq = ( );

foreach $item (@list) {

    unless ($seen{$item}) {

        # if we get here, we have not seen it before

        $seen{$item} = 1;

        push(@uniq, $item);

    }

}

Section 4.7.2.2: Faster

%seen = ( );

foreach $item (@list) {

    push(@uniq, $item) unless $seen{$item}++;

}

Section 4.7.2.3: Similar but with user function

%seen = ( );

foreach $item (@list) {

    some_func($item) unless $seen{$item}++;

}

Section 4.7.2.4: Faster but different

%seen = ( );

foreach $item (@list) {

    $seen{$item}++;

}

@uniq = keys %seen;

Section 4.7.2.5: Faster and even more different

%seen = ( );

@uniq = grep { ! $seen{$_} ++ } @list;
The question at the heart of the matter is "Have I seen this element before?" Hashes are ideally suited to such lookups. The first technique (Recipe 4.7.2.1) builds up the array of unique values as we go along, using a hash to record whether something is already in the array.
The second technique (Recipe 4.7.2.2) is the most natural way to write this sort of thing in Perl. It creates a new entry in the hash every time it sees an element that hasn't been seen before, using the
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Finding Elements in One Array but Not Another
Inhaltsvorschau
You want to find elements that are in one array but not another.
You want to find elements in @A that aren't in @B. Build a hash of the keys of @B to use as a lookup table. Then check each element in @A to see whether it is in @B.

Section 4.8.2.1: Straightforward implementation

# assume @A and @B are already loaded

%seen = ( );                    # lookup table to test membership of B

@aonly = ( );                   # answer



# build lookup table

foreach $item (@B) { $seen{$item} = 1 }



# find only elements in @A and not in @B

foreach $item (@A) {

    unless ($seen{$item}) {

        # it's not in %seen, so add to @aonly

        push(@aonly, $item);

    }

}

Section 4.8.2.2: More idiomatic version

my %seen;     # lookup table

my @aonly;    # answer



# build lookup table

@seen{@B} = ( );



foreach $item (@A) {

    push(@aonly, $item) unless exists $seen{$item};

}

Section 4.8.2.3: Loopless version

my @A = ...;

my @B = ...;



my %seen;

@seen {@A} = ( );

delete @seen {@B};



my @aonly = keys %seen;
As with nearly any problem in Perl that asks whether a scalar is in one list or another, this one uses a hash. First, process @B so that the %seen hash records each element from @B by setting its value to 1. Then process @A one element at a time, checking whether that particular element had been in @B by consulting the %seen hash.
The given code retains duplicate elements in @A. This can be fixed easily by adding the elements of @A to %seen as they are processed:
foreach $item (@A) {

    push(@aonly, $item) unless $seen{$item};

    $seen{$item} = 1;                       # mark as seen

}
The first two solutions differ mainly in how they build the hash. The first iterates through
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Computing Union, Intersection, or Difference of Unique Lists
Inhaltsvorschau
You have a pair of lists, each holding unduplicated items. You'd like to find out which items are in both lists (intersection), one but not the other (difference), or either (union).
The following solutions need the listed initializations:
@a = (1, 3, 5, 6, 7, 8);

@b = (2, 3, 5, 7, 9);



@union = @isect = @diff = ( );

%union = %isect = ( );

%count = ( );

Section 4.9.2.1: Simple solution for union and intersection

foreach $e (@a) { $union{$e} = 1 }



foreach $e (@b) {

    if ( $union{$e} ) { $isect{$e} = 1 }

    $union{$e} = 1;

}

@union = keys %union;

@isect = keys %isect;

Section 4.9.2.2: More idiomatic version

foreach $e (@a, @b) { $union{$e}++ && $isect{$e}++ }



@union = keys %union;

@isect = keys %isect;

Section 4.9.2.3: Union, intersection, and symmetric difference

foreach $e (@a, @b) { $count{$e}++ }



@union = keys %count;

foreach $e (keys %count) {

    if ($count{$e} =  = 2) {

        push @isect, $e;

    } else {

        push @diff, $e;

    }

}

Section 4.9.2.4: Indirect solution

@isect = @diff = @union = ( );



foreach $e (@a, @b) { $count{$e}++ }



@union = keys %count;

foreach $e (keys %count) {

    push @{ $count{$e} =  = 2 ? \@isect : \@diff }, $e;

}
The first solution most directly computes the union and intersection of two lists, neither containing duplicates. Two hashes are used to record whether a particular item goes in the union or the intersection. We put every element of the first array in the union hash, giving it a true value. Then, processing each element of the second array, we check whether that element is already present in the union. If it is, we put it in the intersection as well. In any event, it goes into the union. When we're done, we extract the keys of both the union and intersection hashes. The values aren't needed.
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Appending One Array to Another
Inhaltsvorschau
You want to join two arrays by appending all elements of one to the other.
Use push:
# push

push(@ARRAY1, @ARRAY2);
The push function is optimized for appending a list to the end of an array. You can take advantage of Perl's list flattening to join two arrays, but this results in significantly more copying than push:
@ARRAY1 = (@ARRAY1, @ARRAY2);
Here's an example of push in action:
@members = ("Time", "Flies");

@initiates = ("An", "Arrow");

push(@members, @initiates);

# @members is now ("Time", "Flies", "An", "Arrow")
To insert the elements of one array into the middle of another, use the splice function:
splice(@members, 2, 0, "Like", @initiates);

print "@members\n";

splice(@members, 0, 1, "Fruit");

splice(@members, -2, 2, "A", "Banana");

print "@members\n";
This is the output:

               Time Flies Like An Arrow

               Fruit Flies Like A Banana

            
The splice and push functions in perlfunc(1) and Chapter 29 of Programming Perl; the "List Values and Arrays" section of Chapter 2 of Programming Perl; the "List Value Constructors" section of perldata(1)
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Reversing an Array
Inhaltsvorschau
You want to reverse an array.
Use the reverse function:
# reverse @ARRAY into @REVERSED

@REVERSED = reverse @ARRAY;
Or process with a foreach loop on a reversed list:
foreach $element (reverse @ARRAY) {

    # do something with $element

}
Or use a for loop, starting with the index of the last element and working your way down:
for ($i = $#ARRAY; $i >= 0; $i--) {

    # do something with $ARRAY[$i]

}
Called in list context, the reverse function reverses elements of its argument list. You can save a copy of that reversed list into an array, or just use foreach to walk through it directly if that's all you need. The for loop processes the array elements in reverse order by using explicit indices. If you don't need a reversed copy of the array, the for loop can save memory and time on very large arrays.
If you're using reverse to reverse a list that you just sorted, you should have sorted it in the correct order to begin with. For example:
# two-step: sort then reverse

@ascending = sort { $a cmp $b } @users;

@descending = reverse @ascending;



# one-step: sort with reverse comparison

@descending = sort { $b cmp $a } @users;
The reverse function in perlfunc(1) and Chapter 29 of Programming Perl; we use reverse in Recipe 1.7
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Processing Multiple Elements of an Array
Inhaltsvorschau
You want to pop or shift multiple elements at a time.
Use splice:
# remove $N elements from front of @ARRAY (shift $N)

@FRONT = splice(@ARRAY, 0, $N);



# remove $N elements from the end of the array (pop $N)

@END = splice(@ARRAY, -$N);
The splice function allows you to add elements, delete elements, or both, at any point in an array, not just at the ends. All other operations that modify an array's length can also be written as a splice:
Direct method
Splice equivalent
push(@a, $x, $y)
splice(@a, @a, 0, $x, $y)
pop(@a)
splice(@a, -1)
shift(@a)
splice(@a, 0, 1)
unshift(@a, $x, $y)
splice(@a, 0, 0, $x, $y)
$a[$x] = $y
splice(@a, $x, 1, $y)
(@a, @a = ( ))
splice(@a)
Unlike pop and unshift, though, which always delete and return just one element at a time—and from the ends only—splice lets you specify the number of elements. This leads to code like the examples in the Solution.
It's often convenient to wrap these splices as functions:
sub shift2 (\@) {

    return splice(@{$_[0]}, 0, 2);

}



sub pop2 (\@) {

    return splice(@{$_[0]}, -2);

}
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Finding the First List Element That Passes a Test
Inhaltsvorschau
You want the first element in the list (or its index) that passes a test. Alternatively, you want to know whether any element passes the test. The test can be simple identity ("Is this element in the list?") or more complex ("I have a list of Employee objects, sorted from highest salary to lowest. Which manager has the highest salary?"). Simple cases normally require only the value of the element, but when the array itself will be altered, you probably need to know the index number of the first matching element.
To find a matching value, use foreach to loop over every element, and call last as soon as you find a match:
my ($match, $found, $item);

foreach $item (@array) {

    if (CRITERION) {

        $match = $item;  # must save

        $found = 1;

        last;

    }

}

if ($found) {

    ## do something with $match

} else {

    ## unfound

}
To find a matching index, use for to loop a variable over every array index, and call last as soon as you find a match:
my ($i, $match_idx);

for ($i = 0; $i < @array; $i++) {

    if (CRITERION) {

        $match_idx = $i;    # save the index

        last;

    }

}



if (defined $match_idx) {

    ## found in $array[$match_idx]

} else {

    ## unfound

}
The List::Util module, shipped standard with Perl as of v5.8 but available on CPAN for earlier versions, provides an even easier approach:
use List::Util qw(first);

$match = first { CRITERION } @list
Lacking (until recently) a built-in mechanism to do this, we must write our own code to go through the list and test each element. We use foreach and for, and call last to ensure that we stop as soon as we find a match. Before we use last to stop looking, though, we save the value or index.
A common approach is to try to use grep here. But grep always tests all elements and finds all matches, so it's inefficient if you want only the first match. However,
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Finding All Elements in an Array Matching Certain Criteria
Inhaltsvorschau
From a list, you want only the elements that match certain criteria.
This notion of extracting a subset of a larger list is common. It's how you find all engineers in a list of employees, all users in the "staff" group, or all the filenames you're interested in.
Use grep to apply a condition to all elements in the list and return only those for which the condition was true:
@MATCHING = grep { TEST ($_) } @LIST;
This could also be accomplished with a foreach loop:
@matching = ( );

foreach (@list) {

    push(@matching, $_) if TEST ($_);

}
The Perl grep function is shorthand for all that looping and mucking about. It's not really like the Unix grep command; it doesn't have options to return line numbers or to negate the test, and it isn't limited to regular-expression tests. For example, to filter out just the large numbers from an array or to find out which keys in a hash have very large values:
@bigs = grep { $_ > 1_000_000 } @nums;

@pigs = grep { $users{$_} > 1e7 } keys %users;
Here's something that sets @matching to lines from the who command that start with "gnat ":
@matching = grep { /^gnat / } `who`;
Here's another example:
@engineers = grep { $_->position( ) eq "Engineer" } @employees;
It extracts only those objects from the array @employees whose position method returns the string Engineer.
You could have even more complex tests in a grep:
@secondary_assistance = grep { $_->income >= 26_000 &&

                               $_->income <  30_000 }

                        @applicants;
But at that point you may decide it would be more legible to write a proper loop instead.
The "For Loops," "Foreach Loops," and "Loop Control" sections of
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Sorting an Array Numerically
Inhaltsvorschau
You want to sort a list of numbers, but Perl's sort (by default) sorts in ASCII order.
Use Perl's sort function and the <=> numerical comparison operator:
@sorted = sort { $a <=> $b } @unsorted;
The sort function takes an optional code block, which lets you replace the default alphabetic comparison with your own subroutine. This comparison function is called each time sort has to compare two values. The values to compare are loaded into the special package variables $a and $b, which are automatically localized.
The comparison function should return a negative number if $a ought to appear before $b in the output list, 0 if they're the same and their order doesn't matter, or a positive number if $a ought to appear after $b. Perl has two operators that behave this way: <=> for sorting numbers in ascending numeric order, and cmp for sorting strings in ascending alphabetic order. By default, sort uses cmp-style comparisons.
Here's code that sorts the list of PIDs in @pids, lets the user select one, then sends it a TERM signal followed by a KILL signal. We use a code block that compares $a to $b with <=> to sort numerically:
# @pids is an unsorted array of process IDs

foreach my $pid (sort { $a <=> $b } @pids) {

    print "$pid\n";

}

print "Select a process ID to kill:\n";

chomp ($pid = <>);

die "Exiting ... \n" unless $pid && $pid =~ /^\d+$/;

kill('TERM',$pid);

sleep 2;

kill('KILL',$pid);
If you use $a <=> $b or $a cmp $b, the list will be sorted in ascending order. For a descending sort, all we have to do is swap $a and $b in the sort subroutine:
@descending = sort { $b <=> $a } @unsorted;
Comparison routines must be consistent; that is, they should always return the same answer when called with the same values. Inconsistent comparison routines lead to infinite loops or core dumps, especially in older releases of Perl.
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Sorting a List by Computable Field
Inhaltsvorschau
You want to sort a list by something more complex than a simple string or numeric comparison.
This is common when working with objects ("sort by the employee's salary") or complex data structures ("sort by the third element in the array that this is a reference to"). It's also applicable when you want to sort by more than one key; for instance, sorting by birthday and then by name when multiple people share the same birthday.
Use the customizable comparison routine in sort:
@ordered = sort { compare( ) } @unordered;
You can speed this up by precomputing the field.
@precomputed = map { [compute( ),$_] } @unordered;

@ordered_precomputed = sort { $a->[0] <=> $b->[0] } @precomputed;

@ordered = map { $_->[1] } @ordered_precomputed;
And, finally, you can combine the three steps:
@ordered = map { $_->[1] }

           sort { $a->[0] <=> $b->[0] }

           map { [compute( ), $_] }

           @unordered;
The use of a comparison routine was explained in Recipe 4.15. As well as using built-in operators like <=>, you can construct more complex tests:
@ordered = sort { $a->name cmp $b->name } @employees;
You often see sort used like this in part of a foreach loop:
foreach $employee (sort { $a->name cmp $b->name } @employees) {

    print $employee->name, " earns \$", $employee->salary, "\n";

}
If you're going to do a lot of work with elements in a particular order, it's more efficient to sort once and work from that:
@sorted_employees = sort { $a->name cmp $b->name } @employees;

foreach $employee (@sorted_employees) {

    print $employee->name, " earns \$", $employee->salary, "\n";

}

# load %bonus

foreach $employee (@sorted_employees) {

    if ( $bonus{ $employee->ssn } ) {

      print $employee->name, " got a bonus!\n";

    }

}
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Implementing a Circular List
Inhaltsvorschau
You want to create and manipulate a circular list.
Use unshift and pop (or push and shift) on a normal array.
unshift(@circular, pop(@circular));  # the last shall be first

push(@circular, shift(@circular));   # and vice versa
Circular lists are commonly used to repeatedly process things in order; for example, connections to a server. The code shown previously isn't a true computer science circular list, with pointers and true circularity. Instead, the operations provide for moving the last element to the first position, and vice versa.
sub grab_and_rotate ( \@ ) {

    my $listref = shift;

    my $element = $listref->[0];

    push(@$listref, shift @$listref);

    return $element;

}



@processes = ( 1, 2, 3, 4, 5 );

while (1) {

    $process = grab_and_rotate(@processes);

    print "Handling process $process\n";

    sleep 1;

}
The unshift and push functions in perlfunc(1) and Chapter 29 of Programming Perl; Recipe 13.13
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Randomizing an Array
Inhaltsvorschau
You want to randomly shuffle the elements of an array. The obvious application is writing a card game, where you must shuffle a deck of cards, but it is equally applicable to any situation where you want to treat elements of an array in a random order.
Use the shuffle function from the standard List::Util module, which returns the elements of its input list in a random order.
use List::Util qw(shuffle);

@array = shuffle(@array);
Shuffling is a surprisingly tricky process. It's easy to write a bad shuffle:
sub naive_shuffle {                             # DON'T DO THIS

    for (my $i = 0; $i < @_; $i++) {

        my $j = int rand @_;                    # pick random element

        ($_[$i], $_[$j]) = ($_[$j], $_[$i]);    # swap 'em

    }

}
This algorithm is biased; the list's possible permutations don't all have the same probability of being generated. The proof of this is simple: take the case where we're passed a three-element list. We generate three random numbers, each of which can have three possible values, yielding 27 possible outcomes. There are only six permutations of the three-element list, though. Because 27 isn't evenly divisible by 6, some outcomes are more likely than others.
The List::Util module's shuffle function avoids this bias to produce a more randomly shuffled result.
If all you want to do is pick one random element from the array, use:
$value = $array[ int(rand(@array)) ];
The rand function in perlfunc(1) and Chapter 29 of Programming Perl; for more on random numbers, see Recipe 2.6, Recipe 2.7, and Recipe 2.8; Recipe 4.20 provides another way to select a random permutation
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Program: words
Inhaltsvorschau
Have you ever wondered how programs like ls generate columns of sorted output that you read down the columns instead of across the rows? For example:
awk      cp       ed       login    mount    rmdir    sum

basename csh      egrep    ls       mt       sed      sync

cat      date     fgrep    mail     mv       sh       tar

chgrp    dd       grep     mkdir    ps       sort     touch

chmod    df       kill     mknod    pwd      stty     vi

chown    echo     ln       more     rm       su
Example 4-2 does this.
Example 4-2. words
  #!/usr/bin/perl -w

  # words - gather lines, present in columns

  

  use strict;

  

  my ($item, $cols, $rows, $maxlen);

  my ($xpixel, $ypixel, $mask, @data);

  

  getwinsize( );

  

  # first gather up every line of input,

  # remembering the longest line length seen

  $maxlen = 1;        

  while (<>) {

      my $mylen;

      s/\s+$//;

      $maxlen = $mylen if (($mylen = length) > $maxlen);

      push(@data, $_);

  }

  

  $maxlen += 1;               # to make extra space

  

  # determine boundaries of screen

  $cols = int($cols / $maxlen) || 1;

  $rows = int(($#data+$cols) / $cols);

  

  # pre-create mask for faster computation

  $mask = sprintf("%%-%ds ", $maxlen-1);

  

  # subroutine to check whether at last item on line

  sub EOL { ($item+1) % $cols =  = 0 }  

  

  # now process each item, picking out proper piece for this position

  for ($item = 0; $item < $rows * $cols; $item++) {

      my $target =  ($item % $cols) * $rows + int($item/$cols);

      my $piece = sprintf($mask, $target < @data ? $data[$target] : "");

      $piece =~ s/\s+$// if EOL( );  # don't blank-pad to EOL

      print $piece;

      print "\n" if EOL( );

  }

  

  # finish up if needed

  print "\n" if EOL( );

  

  # not portable -- linux only

  sub getwinsize {

      my $winsize = "\0" x 8;

      my $TIOCGWINSZ = 0x40087468;

      if (ioctl(STDOUT, $TIOCGWINSZ, $winsize)) {

          ($rows, $cols, $xpixel, $ypixel) = unpack('S4', $winsize);

      } else {

          $cols = 80;

      }

  }
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Program: permute
Inhaltsvorschau
Have you ever wanted to generate all possible permutations of an array or to execute some code for every possible permutation? For example:
% echo man bites dog | permute

dog bites man

            bites dog man

            dog man bites

            man dog bites

            bites man dog

            man bites dog

         
The number of permutations of a set is the factorial of the size of the set. This number grows extremely fast, so you don't want to run it on many permutations:
Set Size            Permutations

1                   1

2                   2

3                   6

4                   24

5                   120

6                   720

7                   5040

8                   40320

9                   362880

10                  3628800

11                  39916800

12                  479001600

13                  6227020800

14                  87178291200

15                  1307674368000
Doing something for each alternative takes a correspondingly large amount of time. In fact, factorial algorithms exceed the number of particles in the universe with very small inputs. The factorial of 500 is greater than ten raised to the thousandth power!
use Math::BigInt;

sub factorial {

    my $n = shift;

    my $s = 1;

    $s *= $n-- while $n > 0;

    return $s;

}

print factorial(Math::BigInt->new("500"));

+1220136... (1035 digits total)

         
The two solutions that follow differ in the order of the permutations they return.
The solution in Example 4-3 uses a classic list permutation algorithm used by Lisp hackers. It's relatively straightforward but makes unnecessary copies. It's also hardwired to do nothing but print out its permutations.
Example 4-3. tsc-permute
  #!/usr/bin/perl -n

  # tsc_permute: permute each word of input

  permute([split], [  ]);

  sub permute {

      my @items = @{ $_[0] };

      my @perms = @{ $_[1] };

      unless (@items) {

          print "@perms\n";

      } else {

          my (@newitems,@newperms,$i);

          foreach $i (0 .. $#items) {

              @newitems = @items;

              @newperms = @perms;

              unshift(@newperms, splice(@newitems, $i, 1));

              permute( \@newitems, \@newperms);

          }

      }

  }
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Chapter 5: Hashes
Inhaltsvorschau
Doing linear scans over an associative array is like trying to club someone to death with a loaded Uzi.
—Larry Wall
People and parts of computer programs interact in all sorts of ways. Single scalar variables are like hermits, living a solitary existence whose only meaning comes from within the individual. Arrays are like cults, where multitudes marshal themselves under the name of a charismatic leader. In the middle lies the comfortable, intimate ground of the one-to-one relationship that is the hash. (Older documentation for Perl often called hashes associative arrays, but that's a mouthful. Other languages that support similar constructs sometimes use different terms for them; you may hear about hash tables, tables, dictionaries, mappings, or even alists, depending on the language.)
Unfortunately, this isn't a relationship of equals. The relationship encoded in a hash is that of the genitive case or the possessive, like the word "of " in English, or like "'s". We could encode that the boss of Nat is Tim. Hashes only give convenient ways to access values for Nat's boss; you can't ask whose boss Tim is. Finding the answer to that question is a recipe in this chapter.
Fortunately, hashes have their own special benefits, just like relationships. Hashes are a built-in data type in Perl. Their use reduces many complex algorithms to simple variable accesses. They are also fast and convenient to build indices and quick lookup tables.
Only use the % when referring to the hash as a whole, such as %boss. When referring to the value associated with a particular key, that's a single scalar value, so a $ is called for—just as when referring to one element of an array, you also use a $. This means that "the boss of Nat" would be written as $boss{"Nat"}. We can assign "Tim" to that:
$boss{"Nat"} = "Tim";
It's time to put a name to these notions. The relationship embodied in a hash is a good thing to use for its name. In the previous example you see a dollar sign, which might surprise you since this is a hash, not a scalar. But we're setting a single scalar value in that hash, so use a dollar sign. Where a lone scalar has
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Introduction
Inhaltsvorschau
People and parts of computer programs interact in all sorts of ways. Single scalar variables are like hermits, living a solitary existence whose only meaning comes from within the individual. Arrays are like cults, where multitudes marshal themselves under the name of a charismatic leader. In the middle lies the comfortable, intimate ground of the one-to-one relationship that is the hash. (Older documentation for Perl often called hashes associative arrays, but that's a mouthful. Other languages that support similar constructs sometimes use different terms for them; you may hear about hash tables, tables, dictionaries, mappings, or even alists, depending on the language.)
Unfortunately, this isn't a relationship of equals. The relationship encoded in a hash is that of the genitive case or the possessive, like the word "of " in English, or like "'s". We could encode that the boss of Nat is Tim. Hashes only give convenient ways to access values for Nat's boss; you can't ask whose boss Tim is. Finding the answer to that question is a recipe in this chapter.
Fortunately, hashes have their own special benefits, just like relationships. Hashes are a built-in data type in Perl. Their use reduces many complex algorithms to simple variable accesses. They are also fast and convenient to build indices and quick lookup tables.
Only use the % when referring to the hash as a whole, such as %boss. When referring to the value associated with a particular key, that's a single scalar value, so a $ is called for—just as when referring to one element of an array, you also use a $. This means that "the boss of Nat" would be written as $boss{"Nat"}. We can assign "Tim" to that:
$boss{"Nat"} = "Tim";
It's time to put a name to these notions. The relationship embodied in a hash is a good thing to use for its name. In the previous example you see a dollar sign, which might surprise you since this is a hash, not a scalar. But we're setting a single scalar value in that hash, so use a dollar sign. Where a lone scalar has
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Adding an Element to a Hash
Inhaltsvorschau
You need to add an entry to a hash.
Simply assign to the hash key:
$HASH{$KEY} = $VALUE;
Putting something into a hash is straightforward. In languages that don't provide the hash as an intrinsic data type, you have to worry about overflows, resizing, and collisions in your hash table. In Perl, all that is taken care of for you with a simple assignment. If that entry was already occupied (had a previous value), memory for that value is automatically freed, just as when assigning to a simple scalar.
# %food_color defined per the introduction

$food_color{Raspberry} = "pink";

print "Known foods:\n";

foreach $food (keys %food_color) {

    print "$food\n";

}



Known foods

               Banana

               Apple

               Raspberry

               Carrot

               Lemon

            
If you don't want to overwrite an existing value, but somehow have one key reference multiple values, see Recipe 5.8 and Recipe 11.2.
The "List Value Constructors" section of perldata(1); the "List Values and Arrays" section of Chapter 2 of Programming Perl; Recipe 5.2
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Testing for the Presence of a Key in a Hash
Inhaltsvorschau
You need to know whether a hash has a particular key, regardless of whatever value may be associated with that key.
Use the exists function.
# does %HASH have a value for $KEY ?

if (exists($HASH{$KEY})) {

    # it exists

} else {

    # it doesn't

}
This code uses exists to check whether a key is in the %food_color hash:
# %food_color per the introduction

foreach $name ("Banana", "Martini") {

    if (exists $food_color{$name}) {

        print "$name is a food.\n";

    } else {

        print "$name is a drink.\n";

    }

}



Banana is a food

               Martini is a drink

            
The exists function tests whether a key is in the hash. It doesn't test whether the value corresponding to that key is defined, nor whether the value is true or false. We may be splitting hairs, but problems caused by confusing existence, definedness, and truth can multiply like rabbits. Take this code:
%age = ( );

$age{"Toddler"}  = 3;

$age{"Unborn"}   = 0;

$age{"Phantasm"} = undef;



foreach $thing ("Toddler", "Unborn", "Phantasm", "Relic") {

    print "$thing: ";

    print "Exists "  if exists  $age{$thing};

    print "Defined " if defined $age{$thing};

    print "True "    if         $age{$thing};

    print "\n";

}



Toddler: Exists Defined True

               Unborn: Exists Defined

               Phantasm: Exists

               Relic

            
$age{"Toddler"} passes the existence, definedness, and truth tests. It exists because we gave "Toddler" a value in the hash; it's defined because that value isn't undef; and it's true because the value isn't one of Perl's false values.
$age{"Unborn"} passes only the existence and definedness tests. It exists because we gave "Unborn" a value in the hash, and it's defined because that value isn't undef. It isn't true, however, because 0 is one of Perl's false values.
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Creating a Hash with Immutable Keys or Values
Inhaltsvorschau
You'd like to have a hash whose keys or values can't be altered once set.
Use the appropriate functions from the standard Hash::Util module.
use Hash::Util qw{ lock_keys  unlock_keys

                   lock_value unlock_value

                   lock_hash  unlock_hash  };
To restrict access to keys already in the hash, so no new keys can be introduced:
lock_keys(%hash);                # restrict to current keys

lock_keys(%hash, @klist);        # restrict to keys from @klist
To forbid deletion of the key or modification of its value:
lock_value(%hash, $key);
To make all keys and their values read-only:
lock_hash(%hash);
Suppose you're using a hash to implement a record (or an object) with some pre-determined set of keys, such as "NAME", "RANK", and "SERNO". You'd like to consider it an error to access any keys besides the ones initially in the hash, such as "ANME", a typo. Because Perl always creates hash elements on demand, this wouldn't be caught the way it would if you misspelled a variable name while under the use strict pragma.
The Hash::Util module's lock_keys function takes care of this for you. Once a hash is marked as having locked keys, you can't use any other keys than those. The keys need not yet be in the hash, and they may still be deleted if they are. But no new keys may be used.
Access to the values in those locked keys is not restricted by lock_keys. However, you may use the lock_value function to render a value in a hash read-only. That hash can also have its keys locked, but doesn't need to if the goal is just to have one or more values marked read-only.
If you want to lock down the entire hash, thereby restricting both its keys and its values, the lock_hash function will do.
The documentation for the Hash::Util module
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Deleting from a Hash
Inhaltsvorschau
You want to remove an entry from a hash so that it doesn't show up with keys, values, or each. If you were using a hash to associate salaries with employees, and an employee resigned, you'd want to remove their entry from the hash.
Use the delete function:
# remove $KEY and its value from %HASH

delete($HASH{$KEY});
Sometimes people mistakenly try to use undef to remove an entry from a hash. undef $hash{$key} and $hash{$key} = undef both make %hash have an entry with key $key and value undef.
The delete function is the only way to remove a specific entry from a hash. Once you've deleted a key, it no longer shows up in a keys list or an each iteration, and exists will return false for that key.
This demonstrates the difference between undef and delete:
# %food_color as per Introduction

sub print_foods {

    my @foods = keys %food_color;

    my $food;



    print "Keys: @foods\n";

    print "Values: ";



    foreach $food (@foods) {

        my $color = $food_color{$food};



        if (defined $color) {

            print "$color ";

        } else {

            print "(undef) ";

        }

    }

    print "\n";

}



print "Initially:\n";

print_foods( );



print "\nWith Banana undef\n";

undef $food_color{"Banana"};

print_foods( );



print "\nWith Banana deleted\n";

delete $food_color{"Banana"};

print_foods( );



Initially

               Keys: Banana Apple Carrot Lemon

               Values: yellow red orange yellow



               With Banana undef

               Keys: Banana Apple Carrot Lemon

               Values: (undef) red orange yellow



               With Banana deleted

               Keys: Apple Carrot Lemon

               Values: red orange yellow

            
As you see, if we set $food_color{"Banana"} to undef, "Banana" still shows up as a key in the hash. The entry is still there; we only succeeded in making the value
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Traversing a Hash
Inhaltsvorschau
You want to perform an action on each entry (i.e., each key-value pair) in a hash.
Use each with a while loop:
while(($key, $value) = each(%HASH)) {

    # do something with $key and $value

}
Or use keys with a foreach loop, unless the hash is potentially very large:
foreach $key (keys %HASH) {

    $value = $HASH{$key};

    # do something with $key and $value

}
Here's a simple example, iterating through the %food_color hash from the introduction:
# %food_color per the introduction

while(($food, $color) = each(%food_color)) {

    print "$food is $color.\n";

}

Banana is yellow

               Apple is red

               Carrot is orange

               Lemon is yellow



foreach $food (keys %food_color) {

    my $color = $food_color{$food};

    print "$food is $color.\n";

}

Banana is yellow

               Apple is red

               Carrot is orange

               Lemon is yellow

            
We didn't really need the $color variable in the foreach example, because we use it only once. Instead, we could have written:
print "$food is $food_color{$food}.\n"
Every time each is called on the same hash, it returns the "next" key-value pair. We say "next" because the pairs are returned in the order the underlying lookup structure imposes on them, which appears to be no order at all. When each runs out of hash elements, it returns the empty list ( ), whose assignment tests false and terminates the while loop.
The foreach example uses keys, which constructs an entire list containing every key from the hash before the loop even begins executing. The advantage to using each is that it gets the keys and values one pair at a time. If the hash contains many keys, not having to preconstruct a complete list of them can save substantial memory. The each
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Printing a Hash
Inhaltsvorschau
You want to print a hash, but neither print "%hash" nor print %hash does what you want; the first is a literal, while the second just has the keys and values all scrunched together.
One of several approaches is to iterate over every key-value pair in the hash using Recipe 5.5 and print them:
while ( ($k,$v) = each %hash ) {

    print "$k => $v\n";

}
Or use map to generate a list of strings:
print map { "$_ => $hash{$_}\n" } keys %hash;
Or use the interpolation trick from Recipe 1.15 to interpolate the hash as a list:
print "@{[ %hash ]}\n";
Or use a temporary array variable to hold the hash, and then print that:
{

    my @temp = %hash;

    print "@temp";

}
The methods differ in the degree that their output is customizable (in order and formatting) and in their efficiency.
The first method, iterating over the hash, is flexible and space-efficient. You can format the output as you like it, and it requires only two scalar variables: the current key and value. You can print the hash in key order (at the cost of building a list of sorted keys) if you use a foreach loop:
foreach $k (sort keys %hash) {

    print "$k => $hash{$k}\n";

}
The map function is just as flexible. You can still process the list in any order by sorting the keys. You can customize the output to your heart's content. But it builds up a list of strings like "KEY =>VALUE\n" to pass to print.
The last two methods are interpolation tricks. By treating the hash as a list, you can't predict or control the output order of key-value pairs. Furthermore, the output will consist of a list of keys and values, each separated by whatever string that $" happens to hold. You can't put newlines between pairs or "=>" within them, as we could with the other methods.
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Retrieving from a Hash in Insertion Order
Inhaltsvorschau
The keys and each functions traverse the hash elements in a strange order, and you want them in the order in which you inserted them.
Use the Tie::IxHash module.
use Tie::IxHash;

tie %HASH, "Tie::IxHash";

# manipulate %HASH

@keys = keys %HASH;         # @keys is in insertion order
Tie::IxHash makes keys, each, and values return the hash elements in the order they were added. This often removes the need to preprocess the hash keys with a complex sort comparison or maintain a distinct array containing the keys in the order they were inserted into the hash.
Tie::IxHash also provides an object-oriented interface to splice, push, pop, shift, unshift, keys, values, and delete, among others.
Here's an example, showing both keys and each:
# initialize

use Tie::IxHash;



tie %food_color, "Tie::IxHash";

$food_color{"Banana"} = "Yellow";

$food_color{"Apple"}  = "Green";

$food_color{"Lemon"}  = "Yellow";



print "In insertion order, the foods are:\n";

foreach $food (keys %food_color) {

    print "  $food\n";

}



print "Still in insertion order, the foods' colors are:\n";

while (( $food, $color ) = each %food_color ) {

    print "$food is colored $color.\n";

}



In insertion order, the foods are

               Banana

               Apple

               Lemon

               Still in insertion order, the foods' colors are

               Banana is colored Yellow

               Apple is colored Green

               Lemon is colored Yellow

            
The documentation for the CPAN module Tie::IxHash; Recipe 13.5
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Hashes with Multiple Values per Key
Inhaltsvorschau
You want to store more than one value for each key.
Store an array reference in $hash{$key}, then put the values into the referenced array.
You can store only scalar values in a hash. References, however, are scalars. This solves the problem of storing multiple values for one key by making $hash{$key} a reference to an array containing values for $key. The normal hash operations—insertion, deletion, iteration, and testing for existence—can now be written in terms of array operations like push, splice, and foreach.
This code shows simple insertion into the hash. It processes the output of who(1) on Unix machines and outputs a terse listing of users and the ttys they're logged in on:
%ttys = ( );



open(WHO, "who|")                   or die "can't open who: $!";

while (<WHO>) {

    ($user, $tty) = split;

    push( @{$ttys{$user}}, $tty );

}



foreach $user (sort keys %ttys) {

    print "$user: @{$ttys{$user}}\n";

}
The heart of the code is the push line, the multivalued version of $ttys{$user} = $tty. The first time through, that hash value is undefined, so Perl automatically allocates a new anonymous hash and stores its reference in that value so that the push can succeed. This is called autovivification, and is explained more in Chapter 11.
We interpolate all the tty names in the print line with @{$ttys{$user}}. We'd loop over the anonymous array if, for instance, we wanted to print the owner of each tty:
foreach $user (sort keys %ttys) {

    print "$user: ", scalar( @{$ttys{$user}} ), " ttys.\n";

    foreach $tty (sort @{$ttys{$user}}) {

        @stat = stat("/dev/$tty");

        $user = @stat ? ( getpwuid($stat[4]) )[0] : "(not available)";

        print "\t$tty (owned by $user)\n";

    }

}
The exists function can have two meanings: "Is there at least one value for this key?" and "Does this value exist for this key?" Implementing the second approach requires searching the array for the value. The
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Inverting a Hash
Inhaltsvorschau
Hashes map keys to values. You have a hash and a value whose corresponding key you want to find.
Use reverse to create an inverted hash whose values are the original hash's keys and vice versa.
# %LOOKUP maps keys to values

%REVERSE = reverse %LOOKUP;
This technique uses the list equivalence of hashes mentioned in the introduction. In list context, reverse treats %LOOKUP as a list and reverses the order of its elements. The significant property of a hash treated as a list is that the list elements come in associated pairs: the first element is the key; the second, the value. When you reverse such a list, the first element is now the value, and the second the key. Treating this list as a hash results in a hash whose values are the keys of the original hash and vice versa.
Here's an example:
%surname = ( "Mickey" => "Mantle", "Babe" => "Ruth" );

%first_name = reverse %surname;

print $first_name{"Mantle"}, "\n";

Mickey

            
When we treat %surname as a list, it becomes:
("Mickey", "Mantle", "Babe", "Ruth")
(or maybe ("Babe", "Ruth", "Mickey", "Mantle") because we can't predict the order). Reversing this list gives us:
("Ruth", "Babe", "Mantle", "Mickey")
When we treat this list as a hash, it becomes:
("Ruth" => "Babe", "Mantle" => "Mickey")
Now instead of turning first names into surnames, it turns surnames into first names.
Example 5-2 is a program called foodfind. If you give it a food name, it'll tell you the color of that food. If you give it a color, it'll tell you a food of that color.
Example 5-2. foodfind
  #!/usr/bin/perl -w

  # foodfind - find match for food or color

  $given = shift @ARGV or die "usage: foodfind food_or_color\n";

  %color = (

             "Apple"  => "red",

             "Banana" => "yellow",

             "Lemon"  => "yellow",

             "Carrot" => "orange"

           );

  %food = reverse %color;

  if (exists $color{$given}) {

      print "$given is a food with color $color{$given}.\n";

  }

  if (exists $food{$given}) {

      print "$food{$given} is a food with color $given.\n";

  }
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Sorting a Hash
Inhaltsvorschau
You need to work with the elements of a hash in a particular order.
Use keys to get a list of keys, then sort them based on the ordering you want:
# %hash is the hash to sort

@keys = sort { criterion( ) } (keys %hash);

foreach $key (@keys) {

    $value = $hash{$key};

    # do something with $key, $value

}
Even though you can't directly maintain a hash in a specific order (unless you use the Tie::IxHash module mentioned in Recipe 5.7), you can access its entries in any order.
This technique offers many variations on the same basic mechanism: you extract the keys, reorder them using the sort function, and then process the entries in the new order. All the sorting tricks shown in Chapter 4 can be used here. Let's look at some applications.
The following code simply uses sort to order the keys alphabetically:
foreach $food (sort keys %food_color) {

    print "$food is $food_color{$food}.\n";

}
This sorts the keys by their associated values:
foreach $food (sort { $food_color{$a} cmp $food_color{$b} }

                keys %food_color)

{

    print "$food is $food_color{$food}.\n";

}
This sorts by length of the values:
@foods = sort {

          length($food_color{$a}) <=> length($food_color{$b})

         } keys %food_color;

foreach $food (@foods) {

    print "$food is $food_color{$food}.\n";

}
The sort and keys functions in perlfunc(1) and in Chapter 29 of Programming Perl; Recipe 5.7; we discuss sorting lists in Recipe 4.16
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Merging Hashes
Inhaltsvorschau
You need to make a new hash with the entries of two existing hashes.
Treat them as lists, and join them as you would lists.
%merged = (%A, %B);
To save memory, loop over the hashes' elements and build a new hash that way:
%merged = ( );

while ( ($k,$v) = each(%A) ) {

    $merged{$k} = $v;

}

while ( ($k,$v) = each(%B) ) {

    $merged{$k} = $v;

}
The first method, like the earlier recipe on inverting a hash, uses the hash-list equivalence explained in the introduction. (%A, %B) evaluates to a list of paired keys and values. When we assign it to %merged, Perl turns that list of pairs back into a hash.
Here's an example of that technique:
# %food_color as per the introduction

%drink_color = ( Galliano  => "yellow",

                 "Mai Tai" => "blue" );



%ingested_color = (%drink_color, %food_color);
Keys in both input hashes appear just once in the output hash. If a food and a drink shared the same name, for instance, then the last one seen by the first merging technique would be the one that showed up in the resultant hash.
This style of direct assignment, as in the first example, is easier to read and write, but requires a lot of memory if the hashes are large. That's because Perl has to unroll both hashes into a temporary list before the assignment to the merged hash is done. Step-by-step merging using each, as in the second technique, spares you that cost and lets you decide what to do with duplicate keys.
The first example could be rewritten to use the each technique:
# %food_color per the introduction, then

%drink_color = ( Galliano  => "yellow",

                 "Mai Tai" => "blue" );



%substance_color = ( );

while (($k, $v) = each %food_color) {

    $substance_color{$k} = $v;

}

while (($k, $v) = each %drink_color) {

    $substance_color{$k} = $v;

}
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Finding Common or Different Keys in Two Hashes
Inhaltsvorschau
You need to find keys in one hash that are or are not present in another hash.
Use keys to loop through the keys of one hash, checking whether each key is also in the other hash.

Section 5.12.2.1: Find common keys

my @common = ( );

foreach (keys %hash1) {

        push(@common, $_) if exists $hash2{$_};

}

# @common now contains common keys

Section 5.12.2.2: Find keys from one hash that aren't in both

my @this_not_that = ( );

foreach (keys %hash1) {

        push(@this_not_that, $_) unless exists $hash2{$_};

}
Because we're finding common or different keys of the hashes, we can apply our earlier array recipes for finding common or different elements to arrays of the hashes' keys. For an explanation, see Recipe 4.9.
This code uses the difference technique to find non-citrus foods:
# %food_color per the introduction



# %citrus_color is a hash mapping citrus food name to its color.

%citrus_color = ( Lemon  => "yellow",

                  Orange => "orange",

                  Lime   => "green" );



# build up a list of non-citrus foods

@non_citrus = ( );



foreach (keys %food_color) {

    push (@non_citrus, $_) unless $citrus_color{$_};

}
The "Hashes" section of Chapter 2 of Programming Perl; the each function in perlfunc(1) and in Chapter 29 of Programming Perl
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Hashing References
Inhaltsvorschau
When you use keys on a hash whose keys are references, the references that keys returns no longer work. This situation often arises when you want to cross-reference two different hashes.
Use Tie::RefHash:
use Tie::RefHash;

tie %hash, "Tie::RefHash";

# you may now use references as the keys to %hash
Hash keys are automatically "stringified," that is, treated as though they appeared between double quotes. With numbers or strings, nothing is lost. This isn't so with references, though.
Stringified references look like these:

               Class::Somewhere=HASH(0x72048)

               ARRAY(0x72048)

            
A stringified reference can't be dereferenced, because it is just a string and no longer a reference. This means you can't use references as the keys to a hash without losing their "magic."
Hand-rolled solutions to this problem involve maintaining a distinct hash whose keys are stringified references and whose values are the actual references. This is what Tie::RefHash does. We'll use IO objects for filehandles here to show you that even such strange references can index a hash tied with Tie::RefHash.
Here's an example:
use Tie::RefHash;

use IO::File;



tie %name, "Tie::RefHash";

foreach $filename ("/etc/termcap", "/vmunix", "/bin/cat") {

    $fh = IO::File->new("< $filename") or next;

    $name{$fh} = $filename;

}

print "open files: ", join(", ", values %name), "\n";

foreach $file (keys %name) {

    seek($file, 0, 2);      # seek to the end

    printf("%s is %d bytes long.\n", $name{$file}, tell($file));

}
If you're storing objects as the keys to a hash, though, you almost always should be storing a unique attribute of the object (e.g., name or ID number) instead.
The documentation for the standard Tie::RefHash module; the "Warning" section of
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Presizing a Hash
Inhaltsvorschau
You want to preallocate memory for a hash to speed up your program so Perl won't have to incrementally allocate memory each time a new entry is added to the hash. Often you know the final size of a hash before you start building it up, and it's possible to use this information to speed up your program.
Assign the number of key-value pairs your hash will have to keys %HASH.
# presize %hash to $num

keys(%hash) = $num;
This feature may or may not improve your performance. Perl already shares keys between hashes, so if you already have a hash with "Apple" as a key, Perl won't need to allocate memory for another copy of "Apple" when you add an entry whose key is "Apple" to another hash.
# will have 512 users in %users

keys(%users) = 512;
Perl's internal data structures require the number of keys to be a power of 2. If we had said:
keys(%users) = 1000;
Perl would have internally allocated 1024 "buckets" for the hash. Keys and buckets aren't always one to one. You get the best performance when they are, but the distribution of keys to buckets is dependent on your keys and Perl's (immutable) hash algorithm.
The keys function in perlfunc(1) and Chapter 29 of Programming Perl; Recipe 4.3
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Finding the Most Common Anything
Inhaltsvorschau
You have an aggregate data structure, such as an array or a hash. You want to know how often each element in the array (or value in the hash) occurs. For instance, if your array contains web server transactions, you might want to find the most commonly requested file. If your hash maps usernames to number of logins, you want to find the most common number of logins.
Use a hash to count how many times each element, key, or value appears:
%count = ( );

foreach $element (@ARRAY) {

    $count{$element}++;

}
Any time you want to count how often different things appear, you should probably be using a hash. The foreach adds one to $count{$element} for every occurrence of $element.
Recipe 4.7 and Recipe 4.8
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Representing Relationships Between Data
Inhaltsvorschau
You want to represent relationships between elements of data—for instance, the mother of relationship in a family tree or parent process for a process table. This is closely related to representing tables in relational databases (tables represent relationships between information) and to representing computer science graph structures (edges represent relationships between nodes).
Use a hash to represent the relationship.
Here's part of the family tree from the Bible:
%father = ( 'Cain'      => 'Adam',

            'Abel'      => 'Adam',

            'Seth'      => 'Adam',

            'Enoch'     => 'Cain',

            'Irad'      => 'Enoch',

            'Mehujael'  => 'Irad',

            'Methusael' => 'Mehujael',

            'Lamech'    => 'Methusael',

            'Jabal'     => 'Lamech',

            'Jubal'     => 'Lamech',

            'Tubalcain' => 'Lamech',

            'Enos'      => 'Seth' );
This lets us, for instance, easily trace a person's lineage:
while (<>) {

    chomp;

    do {

        print "$_ ";        # print the current name

        $_ = $father{$_};   # set $_ to $_'s father

    } while defined;        # until we run out of fathers

    print "\n";

}
We can already ask questions like "Who begat Seth?" by checking the %father hash. By inverting this hash, we invert the relationship. This lets us use Recipe 5.9 to answer questions like "Whom did Lamech beget?"
while ( ($k,$v) = each %father ) {

    push( @{ $children{$v} }, $k );

}



$" = ', ';                  # separate output with commas

while (<>) {

    chomp;

    if ($children{$_}) {

        @children = @{$children{$_}};

    } else {

        @children = "nobody";

    }

    print "$_ begat @children.\n";

}
Hashes can also represent relationships such as the C language #includes. A includes B if A contains #include
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Program: dutree
Inhaltsvorschau
The dutree program, shown in Example 5-3, turns the output of du:
% du pcb

19      pcb/fix

            20      pcb/rev/maybe/yes

            10      pcb/rev/maybe/not

            705     pcb/rev/maybe

            54      pcb/rev/web

            1371    pcb/rev

            3       pcb/pending/mine

            1016    pcb/pending

            2412    pcb

         
into sorted, indented output:

            2412 pcb

   |    1371 rev

   |       |    705 maybe

               |       |      |      675 

               |       |      |       20 yes

               |       |      |       10 not

               |       |    612 

               |       |     54 web

   |    1016 pending

   |       |        1013 

               |       |           3 mine

   |      19 fix

   |       6 

         
The arguments you give dutree are passed through to du. That way you could call dutree in any of these ways, or maybe more if your du supports other options:
% dutree

% dutree /usr

% dutree -a

% dutree -a /bin
The %Dirsize hash maintains the mapping of names to sizes. For example, $Dirsize{"pcb"} contains 2412 in this sample run. We'll use that hash both for output and for sorting each directory's subdirectories by size.
%Kids is more interesting. For any given path PATH, $Kids{ PATH} contains a (reference to an) array of names of subdirectories of this one. The "pcb" entry contains a reference to an anonymous array containing "fix", "rev", and "pending". The "rev" entry contains "maybe" and "web". The "maybe" entry contains "yes" and "not", which do not have their own entries because they are end nodes in the tree.
The output function is passed the start of the tree—the last line read in from the output of du. First it prints that directory and its size. Then the function sorts the directory's children (if any) so that those with the most disk usage float to the top. Finally,
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Chapter 6: Pattern Matching
Inhaltsvorschau
[Art is] pattern informed by sensibility.
—Sir Herbert Read, The Meaning of Art
Most modern programming languages offer primitive pattern-matching tools, usually through an extra library. In contrast, Perl's patterns are integrated directly into the language core. Perl's pattern matching boasts features not found elsewhere, features that encourage a whole different way of looking at data. Just as chess players see patterns in the board positions that their pieces control, Perl adepts look at data in terms of patterns. These patterns, expressed in the intensely symbolic notation of regular expressions, provide access to powerful algorithms normally available only to scholars of computer science.
"If this pattern matching thing is so powerful and so fantastic," you may be asking, "why don't you have a hundred different recipes on regular expressions in this chapter?" Regular expressions are the natural solution to many problems involving numbers, strings, dates, web documents, mail addresses, and almost everything else in this book; we use pattern matching over 100 times in other chapters. This chapter mostly presents recipes in which pattern matching forms part of the questions, not just part of the answers.
Perl's extensive and integrated support for regular expressions means that you not only have features available that you won't find in any other language, but you have new ways of using them, too. Programmers new to Perl often look for functions like these:
match( $string, $pattern );

subst( $string, $pattern, $replacement );
but matching and substituting are such common tasks that they merit their own notation:
$meadow =~ m/sheep/;   # True if $meadow contains "sheep"

$meadow !~ m/sheep/;   # True if $meadow doesn't contain "sheep"

$meadow =~ s/old/new/; # Replace "old" with "new" in $meadow
Pattern matching isn't like direct string comparison, even at its simplest level; it's more like string searching with mutant wildcards on steroids. Without anchors, the position where the match occurs can float freely throughout the string. Any of the following lines would also be matched by the expression
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Introduction
Inhaltsvorschau
Most modern programming languages offer primitive pattern-matching tools, usually through an extra library. In contrast, Perl's patterns are integrated directly into the language core. Perl's pattern matching boasts features not found elsewhere, features that encourage a whole different way of looking at data. Just as chess players see patterns in the board positions that their pieces control, Perl adepts look at data in terms of patterns. These patterns, expressed in the intensely symbolic notation of regular expressions, provide access to powerful algorithms normally available only to scholars of computer science.
"If this pattern matching thing is so powerful and so fantastic," you may be asking, "why don't you have a hundred different recipes on regular expressions in this chapter?" Regular expressions are the natural solution to many problems involving numbers, strings, dates, web documents, mail addresses, and almost everything else in this book; we use pattern matching over 100 times in other chapters. This chapter mostly presents recipes in which pattern matching forms part of the questions, not just part of the answers.
Perl's extensive and integrated support for regular expressions means that you not only have features available that you won't find in any other language, but you have new ways of using them, too. Programmers new to Perl often look for functions like these:
match( $string, $pattern );

subst( $string, $pattern, $replacement );
but matching and substituting are such common tasks that they merit their own notation:
$meadow =~ m/sheep/;   # True if $meadow contains "sheep"

$meadow !~ m/sheep/;   # True if $meadow doesn't contain "sheep"

$meadow =~ s/old/new/; # Replace "old" with "new" in $meadow
Pattern matching isn't like direct string comparison, even at its simplest level; it's more like string searching with mutant wildcards on steroids. Without anchors, the position where the match occurs can float freely throughout the string. Any of the following lines would also be matched by the expression
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Copying and Substituting Simultaneously
Inhaltsvorschau
You're tired of using two separate statements with redundant information, one to copy and another to substitute.
Instead of:
$dst = $src;

$dst =~ s/this/that/;
use:
($dst = $src) =~ s/this/that/;
Sometimes you wish you could run a search and replace on a copy of a string, but you don't care to write this in two separate steps. You don't have to, because you can apply the regex operation to the result of the copy operation.
For example:
# strip to basename

($progname = $0)        =~ s!^.*/!!;



# Make All Words Title-Cased

($capword  = $word)     =~ s/(\w+)/\u\L$1/g;



# /usr/man/man3/foo.1 changes to /usr/man/cat3/foo.1

($catpage  = $manpage)  =~ s/man(?=\d)/cat/;
You can even use this technique on an entire array:
@bindirs = qw( /usr/bin /bin /usr/local/bin );

for (@libdirs = @bindirs) { s/bin/lib/ }

print "@libdirs\n";

/usr/lib /lib /usr/local/lib

            
Because of precedence, parentheses are required when combining an assignment if you wish to change the result in the leftmost variable. The result of a substitution is its success: either "" for failure, or an integer number of times the substitution was done. Contrast this with the preceding examples where the parentheses surround the assignment itself. For example:
($a =  $b) =~ s/x/y/g;      # 1: copy $b and then change $a

 $a = ($b  =~ s/x/y/g);     # 2: change $b, count goes in $a

 $a =  $b  =~ s/x/y/g;      # 3: same as 2
The "Variables" section of Chapter 2 of Programming Perl, and the "Assignment Operators" section of perlop(1) and Chapter 3 of Programming Perl
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Matching Letters
Inhaltsvorschau
You want to see whether a string contains only alphabetic characters.
The obvious character class for matching regular letters isn't good enough in the general case:
if ($var =~ /^[A-Za-z]+$/) {

    # it is purely alphabetic

}
because it doesn't pay attention to letters with diacritics or characters from other writing systems. The best solution is to use Unicode properties:
if ($var =~ /^\p{Alphabetic}+$/) {   # or just /^\pL+$/

    print "var is purely alphabetic\n";

}
On older releases of Perl that don't support Unicode, your only real option was to use either a negated character class:
if ($var =~ /^[^\W\d_]+$/) {

    print "var is purely alphabetic\n";

}
or, if supported, POSIX character classes:
if ($var =~ /^[[:alpha:]]+$/) {

    print "var is purely alphabetic\n";

}
But these don't work for non-ASCII letters unless you use locale and the system you're running on actually supports POSIX locales.
Apart from Unicode properties or POSIX character classes, Perl can't directly express "something alphabetic" independent of locale, so we have to be more clever. The \w regular expression notation matches one alphabetic, numeric, or underscore character—hereafter known as an "alphanumunder" for short. Therefore, \W is one character that is not one of those. The negated character class [^\W\d_] specifies a character that must be neither a non-alphanumunder, a digit, nor an underscore. That leaves nothing but alphabetics, which is what we were looking for.
Here's how you'd use this in a program:
use locale;

use POSIX 'locale_h';



# the following locale string might be different on your system

unless (setlocale(LC_ALL, "fr_CA.ISO8859-1")) {

    die "couldn't set locale to French Canadian\n";

}



while (<DATA>) {

    chomp;

    if (/^[^\W\d_]+$/) {

        print "$_: alphabetic\n";

    } else {

        print "$_: line noise\n";

    }

}



_ _END_ _

silly

façade

coöperate

niño

Renée

Molière

hæmoglobin

naïve

tschüß

random!stuff#here
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Matching Words
Inhaltsvorschau
You want to pick out words from a string.
Think hard about what you want a word to be and what separates one word from the next, and then write a regular expression that encodes your decisions. For example:
/\S+/               # as many non-whitespace characters as possible

/[A-Za-z'-]+/       # as many letters, apostrophes, and hyphens
Because words vary between applications, languages, and input streams, Perl does not have built-in definitions of words. You must make them from character classes and quantifiers yourself, as we did previously. The second pattern is an attempt to recognize "shepherd's" and "sheep-shearing" each as single words.
Most approaches have limitations because of the vagaries of written language. For instance, although the second pattern successfully identifies "spank'd" and "counter-clockwise" as words, it also pulls the "rd" out of "23rd Psalm". To be more precise when pulling words out from a string, specify the characters surrounding the word. Normally, this should be a word boundary, not whitespace:
/\b([A-Za-z]+)\b/            # usually best

/\s([A-Za-z]+)\s/            # fails at ends or w/ punctuation
Although Perl provides \w, which matches a character that is part of a valid Perl identifier, Perl identifiers are rarely what you think of as words, since we mean a string of alphanumerics and underscores, but not colons or quotes. Because it's defined in terms of \w, \b may surprise you if you expect to match an English word boundary (or, even worse, a Mongolian word boundary).
\b and \B can still be useful. For example, /\Bis\B/ matches the string "is" within a word only, not at the edges. And while "thistle" would be found, "vis-à-vis" wouldn't.
The treatment of \b, \w, and \s in perlre
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Commenting Regular Expressions
Inhaltsvorschau
You want to make your complex regular expressions understandable and maintainable.
You have several techniques at your disposal: electing alternate delimiters to avoid so many backslashes, placing comments outside the pattern or inside it using the /x modifier, and building up patterns piecemeal in named variables.
The piece of sample code in Example 6-1 uses the first couple techniques, and its initial comment describes the overall intent of the regular expression. For simple patterns, this may be all that is needed. More complex patterns, as in the example, require more documentation.
Example 6-1. resname
  #!/usr/bin/perl -p

  # resname - change all "foo.bar.com" style names in the input stream

  # into "foo.bar.com [204.148.40.9]" (or whatever) instead

  use Socket;                 # load inet_addr

  s{

      (                       # capture the hostname in $1

          (?:                 # these parens for grouping only

              (?! [-_]  )     # lookahead for neither underscore nor dash

              [\w-] +         # hostname component

              \.              # and the domain dot

          ) +                 # now repeat that whole thing a bunch of times

          [A-Za-z]            # next must be a letter

          [\w-] +             # now trailing domain part

      )                       # end of $1 capture

  }{                          # replace with this:

      "$1 " .                 # the original bit, plus a space

             ( ($addr = gethostbyname($1))   # if we get an addr

              ? "[" . inet_ntoa($addr) . "]" #        format it

              : "[???]"                      # else mark dubious

             )

  }gex;               # /g for global

                      # /e for execute

                      # /x for nice formatting
For aesthetics, the example uses alternate delimiters. When you split your match or substitution over multiple lines, using matching braces aids readability. A more common use of alternate delimiters is for patterns and replacements that themselves contain slashes, such as in
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Finding the Nth Occurrence of a Match
Inhaltsvorschau
You want to find the Nth match in a string, not just the first one. For example, you'd like to find the word preceding the third occurrence of "fish":

               One fish two fish red fish blue fish

            
Use the /g modifier in a while loop, keeping count of matches:
$WANT = 3;

$count = 0;

while (/(\w+)\s+fish\b/gi) {

    if (++$count =  = $WANT) {

        print "The third fish is a $1 one.\n";

        # Warning: don't `last' out of this loop

    }

}

The third fish is a red one.

            
Or use a repetition count and repeated pattern like this:
/(?:\w+\s+fish\s+){2}(\w+)\s+fish/i;
As explained in this chapter's Introduction, using the /g modifier in scalar context creates something of a progressive match, useful in while loops. This is commonly used to count the number of times a pattern matches in a string:
# simple way with while loop

$count = 0;

while ($string =~ /PAT/g) {

    $count++;               # or whatever you'd like to do here

}



# same thing with trailing while

$count = 0;

$count++ while $string =~ /PAT/g;



# or with for loop

for ($count = 0; $string =~ /PAT/g; $count++) { }



# Similar, but this time count overlapping matches

$count++ while $string =~ /(?=PAT)/g;
To find the Nth match, it's easiest to keep your own counter. When you reach the appropriate N, do whatever you care to. A similar technique could be used to find every Nth match by checking for multiples of N using the modulus operator. For example, (++$count % 3) = = 0 would be used to find every third match.
If this is too much bother, you can always extract all matches and then hunt for the ones you'd like.
$pond  = 'One fish two fish red fish blue fish';



# using a temporary

@colors = ($pond =~ /(\w+)\s+fish\b/gi);      # get all matches

$color  = $colors[2];                         # then the one we want



# or without a temporary array

$color = ( $pond =~ /(\w+)\s+fish\b/gi )[2];  # just grab element 3



print "The third fish in the pond is $color.\n";

Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Matching Within Multiple Lines
Inhaltsvorschau
You want to use regular expressions on a string containing more than one logical line, but the special characters . (any character but newline), ^ (start of string), and $ (end of string) don't seem to work for you. This might happen if you're reading in multiline records or the whole file at once.
Use /m, /s, or both as pattern modifiers. /s allows . to match a newline (normally it doesn't). If the target string has more than one line in it, /foo.*bar/s could match a "foo" on one line and a "bar" on a following line. This doesn't affect dots in character classes like [#%.], since they are literal periods anyway.
The /m modifier allows ^ and $ to match immediately before and after an embedded newline, respectively. /^=head[1-7]/m would match that pattern not just at the beginning of the record, but anywhere right after a newline as well.
A common, brute-force approach to parsing documents where newlines are not significant is to read the file one paragraph at a time (or sometimes even the entire file as one string) and then extract tokens one by one. If the pattern involves dot, such as .+ or .*?, and must match across newlines, you need to do something special to make dot match a newline; ordinarily, it does not. When you've read more than one line into a string, you'll probably prefer to have ^ and $ match beginning- and end-of-line, not just beginning- and end-of-string.
The difference between /m and /s is important: /m allows ^ and $ to match next to an embedded newline, whereas /s allows . to match newlines. You can even use them together—they're not mutually exclusive.
Example 6-2 creates a simplistic filter to strip HTML tags out of each file in @ARGV and then send those results to STDOUT. First we undefine the record separator so each read operation fetches one entire file. (There could be more than one file, because
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Reading Records with a Separator
Inhaltsvorschau
You want to read records separated by a pattern, but Perl doesn't allow its input record separator variable to be a regular expression.
Many problems, most obviously those involving parsing complex file formats, become simpler when you can extract records separated by different strings.
Read the whole file and use split:
undef $/;

@chunks = split(/pattern/, <FILEHANDLE>);
Perl's official record separator, the $/ variable, must be a fixed string, not a pattern. To sidestep this limitation, undefine the input record separator entirely so that the next readline operation reads the rest of the file. This is sometimes called slurp mode, because it slurps in the whole file as one big string. Then split that huge string using the record separating pattern as the first argument.
Here's an example where the input stream is a text file that includes lines consisting of ".Se", ".Ch", and ".Ss", which are special codes in the troff macro set that this book was developed under. These strings are the separators, and we want to find text that falls between them.
# .Ch, .Se and .Ss divide chunks of STDIN

{

    local $/ = undef;

    @chunks = split(/^\.(Ch|Se|Ss)$/m, <>);

}

print "I read ", scalar(@chunks), " chunks.\n";
We create a localized version of $/ so its previous value is restored once the block finishes. By using split with parentheses in the pattern, captured separators are also returned. This way data elements in the return list alternate with elements containing "Se", "Ch", or "Ss".
If you don't want separators returned, but still need parentheses, use non-capturing parentheses in the pattern: /^\.(?:Ch|Se|Ss)$/m.
To split before a pattern but include the pattern in the return, use a lookahead assertion: /^(?=\.(?:Ch|Se|Ss))/m. That way each chunk except the first starts with the pattern.
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Extracting a Range of Lines
Inhaltsvorschau
You want to extract all lines from a starting pattern through an ending pattern or from a starting line number up to an ending line number.
A common example of this is extracting the first 10 lines of a file (line numbers 1 to 10) or just the body of a mail message (everything past the blank line).
Use the operators .. or ... with patterns or line numbers.
The .. operator will test the right operand on the same iteration that the left operand flips the operator into the true state.
while (<>) {

    if (/BEGIN PATTERN/ .. /END PATTERN/) {

        # line falls between BEGIN and END in the

        # text, inclusive.

    }

}



while (<>) {

    if (FIRST_LINE_NUM .. LAST_LINE_NUM) {

        # operate only between first and last line, inclusive.

    }

}
But the ... operator waits until the next iteration to check the right operand.
while (<>) {

    if (/BEGIN PATTERN/ ... /END PATTERN/) {

        # line is between BEGIN and END on different lines

    }

}



while (<>) {

    if (FIRST_LINE_NUM ... LAST_LINE_NUM) {

        # operate only between first and last line, not inclusive

    }

}
The range operators, .. and ..., are probably the least understood of Perl's myriad operators. They were designed to allow easy extraction of ranges of lines without forcing the programmer to retain explicit state information. Used in scalar context, such as in the test of if and while statements, these operators return a true or false value that's partially dependent on what they last returned. The expression left_operand .. right_operand returns false until left_operand is true, but once that test has been met, it stops evaluating left_operand and keeps returning true until right_operand becomes true, after which it restarts the cycle. Put another way, the first operand turns on the construct as soon as it returns a true value, whereas the second one turns it off as soon as
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Matching Shell Globs as Regular Expressions
Inhaltsvorschau
You want to allow users to specify matches using traditional shell wildcards, not full Perl regular expressions. Wildcards are easier to type than full regular expressions for simple cases.
Use the following subroutine to convert four shell wildcard characters into their equivalent regular expression; all other characters are quoted to render them literals.
sub glob2pat {

    my $globstr = shift;

    my %patmap = (

        '*' => '.*',

        '?' => '.',

        '[' => '[',

        ']' => ']',

    );

    $globstr =~ s{(.)} { $patmap{$1} || "\Q$1" }ge;

    return '^' . $globstr . '$';

}
A Perl regex pattern is not the same as a shell wildcard pattern. The shell's *.* is not a valid regular expression. Its meaning as a pattern would be /^.*\..*$/s, which is admittedly much less fun to type.
The function given in the Solution makes these conversions for you, following the standard wildcard rules used by the glob built-in. Table 6-1Table 6-1 shows equivalent wildcard patterns in the shell and in Perl.
Table 6-2: Shell globs and equivalent Perl wildcard patterns
Shell
Perl
list.?
^list\..$
project.*
^project\..*$
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Speeding Up Interpolated Matches
Inhaltsvorschau
You want your function or program to take one or more regular expressions as arguments, but doing so seems to run slower than using literals.
To overcome this bottleneck, if you have only one pattern whose value won't change during the entire run of a program, store it in a string and use /$pattern/o:
while ($line = <>) {

    if ($line =~ /$pattern/o) {

        # do something

    }

}
However, that won't work for more than one pattern. Precompile the pattern strings using the qr// operator, then match each result against each of the targets:
@pats = map { qr/$_/ } @strings;

while ($line = <>) {

    for $pat (@pats) {

        if ($line =~ /$pat/) {

            # do something;

        }

    }

}
When Perl compiles a program, it converts patterns into an internal form. This conversion occurs at compile time for patterns without variables, but at runtime for those that do. Interpolating variables into patterns, as in /$pattern/, can slow your program down—sometimes substantially. This is particularly noticeable when $pattern changes often.
The /o modifier locks in the values from variables interpolated into the pattern. That is, variables are interpolated only once: the first time the match is run. Because Perl ignores any later changes to those variables, make sure to use it only on unchanging variables.
Using /o on patterns without interpolated variables doesn't hurt, but it also doesn't help. The /o modifier is also of no help when you have an unknown number of regular expressions and need to check one or more strings against all of these patterns, since you need to vary the patterns' contents. Nor is it of any use when the interpolated variable is a function argument, since each call to the function gives the variable a new value.
Example 6-4 is an example of the slow but straightforward technique for matching many patterns against many lines. The array
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Testing for a Valid Pattern
Inhaltsvorschau
You want to let users enter their own patterns, but an invalid one would abort your program the first time you tried to use it.
Test the pattern in an eval { } construct first, matching against some dummy string. If $@ is not set, no exception occurred, so you know the pattern successfully compiled as a valid regular expression. Here is a loop that continues prompting until the user supplies a valid pattern:
do {

    print "Pattern? ";

    chomp($pat = <>);

    eval { "" =~ /$pat/ };

    warn "INVALID PATTERN $@" if $@;

} while $@;
Here's a standalone subroutine that verifies whether a pattern is valid:
sub is_valid_pattern {

    my $pat = shift;

    eval { "" =~ /$pat/ };

    return $@ ? 0 : 1;

}
Another way to write that is like this:
sub is_valid_pattern {

    my $pat = shift;

    return eval { "" =~ /$pat/; 1 } || 0;

}
This version doesn't need to use $@, because if the pattern match executes without exception, the next statement with just a 1 is reached and returned. Otherwise it's skipped, so just a 0 is returned.
There's no limit to the number of invalid, uncompilable patterns. The user could mistakenly enter "<I\s*[^">, "*** GET RICH ***", or "+5-i". If you blindly use the proffered pattern in your program, it raises an exception, normally a fatal event.
The tiny program in Example 6-6 demonstrates this.
Example 6-6. paragrep
  #!/usr/bin/perl

  # paragrep - trivial paragraph grepper

  die "usage: $0 pat [files]\n" unless @ARGV;

  $/ = '';

  $pat = shift;

  eval { "" =~ /$pat/; 1 }      or die "$0: Bad pattern $pat: $@\n";

  while (<>) {

      print "$ARGV $.: $_" if /$pat/o;

  }
That /o means to interpolate variables once only, even if their contents later change.
You could encapsulate this in a function call that returns 1 if the block completes and 0 if not, as shown in the Solution. The simpler
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Honoring Locale Settings in Regular Expressions
Inhaltsvorschau
You want to translate case when in a different locale, or you want to make \w match letters with diacritics, such as José or déjà vu.
For example, let's say you're given half a gigabyte of text written in German and told to index it. You want to extract words (with \w+) and convert them to lowercase (with lc or \L), but the normal versions of \w and lc neither match the German words nor change the case of accented letters.
Perl's regular-expression and text-manipulation routines have hooks to the POSIX locale setting. Under the use locale pragma, accented characters are taken care of—assuming a reasonable LC_CTYPE specification and system support for the same.
use locale;
By default, \w+ and case-mapping functions operate on upper- and lowercase letters, digits, and underscores. This works only for the simplest of English words, failing even on many common imports. The use locale directive redefines what a "word character" means.
In Example 6-7 you see the difference in output between having selected the English ("en") locale and the German ("de") one.
Example 6-7. localeg
  #!/usr/bin/perl -w

  # localeg - demonstrate locale effects

  use locale;

  use POSIX 'locale_h';

  $name = "andreas k\xF6nig";

  @locale{qw(German English)} = qw(de_DE.ISO_8859-1 us-ascii);

  setlocale(LC_CTYPE, $locale{English})

    or die "Invalid locale $locale{English}";

  @english_names = ( );

  while ($name =~ /\b(\w+)\b/g) {

          push(@english_names, ucfirst($1));

  }

  setlocale(LC_CTYPE, $locale{German})

    or die "Invalid locale $locale{German}";

  @german_names = ( );

  while ($name =~ /\b(\w+)\b/g) {

          push(@german_names, ucfirst($1));

  }

  print "English names: @english_names\n";

  print "German names:  @german_names\n";

Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Approximate Matching
Inhaltsvorschau
You want to match fuzzily, that is, allowing for a margin of error, where the string doesn't quite match the pattern. Whenever you want to be forgiving of misspellings in user input, you want fuzzy matching.
Use the String::Approx module, available from CPAN:
use String::Approx qw(amatch);



if (amatch("PATTERN", @list)) {

    # matched

}



@matches = amatch("PATTERN", @list);
String::Approx calculates the difference between the pattern and each string in the list. If less than a certain number—by default, 10 percent of the pattern length—of one-character insertions, deletions, or substitutions are required to make the string fit the pattern, it still matches. In scalar context, amatch returns the number of successful matches. In list context, it returns the strings matched.
use String::Approx qw(amatch);

open(DICT, "/usr/dict/words")               or die "Can't open dict: $!";

while(<DICT>) {

    print if amatch("balast");

}



ballast

               balustrade

               blast

               blastula

               sandblast

            
Options passed to amatch control case-sensitivity and the permitted number of insertions, deletions, or substitutions. These are fully described in the String::Approx documentation.
The module's matching function seems to run between 10 and 40 times slower than Perl's built-in pattern matching. So use String::Approx only if you're after a fuzziness in your matching that Perl's patterns can't provide.
The documentation for the CPAN module String::Approx; Recipe 1.22
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Matching from Where the Last Pattern Left Off
Inhaltsvorschau
You want to match again in the same string, starting from where the last match left off. This is a useful approach to take when repeatedly extracting data in chunks from a string.
Use a combination of the /g and /c match modifiers, the \G pattern anchor, and the pos function.
The /g modifier on a pattern match makes the matching engine keep track of the position in the string where it finished matching. If the next match also uses /g on that string, the engine starts looking for a match from this remembered position. This lets you, for example, use a while loop to progressively extract repeated occurrences of a match. Here we find all non-negative integers:
while (/(\d+)/g) {

    print "Found number $1\n";

}
Within a pattern, \G means the end of the previous match. For example, if you had a number stored in a string with leading blanks, you could change each leading blank into the digit zero this way:
$n = "   49 here";

$n =~ s/\G /0/g;

print $n;

00049 here

            
You can also make good use of \G in a while loop. Here we use \G to parse a comma-separated list of numbers (e.g., "3,4,5,9,120"):
while (/\G,?(\d+)/g) {

    print "Found number $1\n";

}
By default, when your match fails (when we run out of numbers in the examples, for instance) the remembered position is reset to the start. If you don't want this to happen, perhaps because you want to continue matching from that position but with a different pattern, use the modifier /c with /g:
$_ = "The year 1752 lost 10 days on the 3rd of September";



while (/(\d+)/gc) {

    print "Found number $1\n";

}

# the /c above left pos at end of final match



if (/\G(\S+)/g) {

    print "Found $1 right after the last number.\n";

}



Found number 1752

               
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Greedy and Non-Greedy Matches
Inhaltsvorschau
You have a pattern with a greedy quantifier like *, +, ?, or { }, and you want to stop it from being greedy.
A classic example is the naïve substitution to remove tags from HTML. Although it looks appealing, s#<TT>.*</TT>##gsi deletes everything from the first open TT tag through the last closing one. This would turn "Even <TT>vi</TT> can edit <TT>troff</TT> effectively." into "Even effectively", completely changing the meaning of the sentence!
Replace the offending greedy quantifier with the corresponding non-greedy version. That is, change *, +, ?, and { } into *?, +?, ??, and { }?, respectively.
Perl has two sets of quantifiers: the maximal ones—*, +, ?, and { }—and the minimal ones—*?, +?, ??, and { }?. Less formally, these two sorts of quantifiers are often referred to as greedy and non-greedy (or sometimes lazy), respectively. For instance, given the string "Perl is a Swiss Army Chainsaw!", the pattern /(r.*s)/ matches "rl is a Swiss Army Chains", whereas /(r.*?s)/ matches "rl is".
With maximal quantifiers, when you ask to match a variable number of times, such as zero or more times for * or one or more times for +, the matching engine prefers the "or more" portion of that description. Thus /foo.*bar/ matches the first "foo" through the last "bar" in the string, rather than only through the next "bar" as some might expect. That's because the greedy .* first expands to the rest of the string, but since that wouldn't leave any characters for "bar" to match, the engine backs up one character at a time until it finds "bar".
To make any repetition operator match minimally instead of maximally, add an extra
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Detecting Doubled Words
Inhaltsvorschau
You want to check for doubled words in a document.
Use backreferences in your pattern.
Parentheses in a pattern make the matching engine remember what text that portion of the pattern matched. Later in the pattern, refer to the actual string that matched with \1 (indicating the string matched by the first set of parentheses), \2 (for the string matched by the second set of parentheses), and so on. Don't use $1 within a regex, because it would be a variable interpolated before the match began. The pattern /([A-Z])\1/ matches a capital letter followed not just by any capital letter, but by whichever one was just matched (i.e., captured by the first set of parentheses in that pattern).
The next sample code reads its input files by paragraph, with the definition of paragraph following Perl's notion of a paragraph—a chunk of text terminated by two or more contiguous newlines. Within each paragraph, the code finds all doubled words. It ignores case and can match across newlines.
Here we use /x to embed whitespace and comments to improve readability. The /i permits both instances of "is" in the sentence "Is is this ok?" to match, even though they differ in case. We use /g in a while loop to keep finding doubled words until we run out of text.
$/ = '';                      # paragrep mode

while (<>) {

    while ( m{

                \b            # start at a word boundary (begin letters)

                (\S+)         # find chunk of non-whitespace

                \b            # until another word boundary (end letters)

                (

                    \s+       # separated by some whitespace

                    \1        # and that very same chunk again

                    \b        # until another word boundary

                ) +           # one or more sets of those

             }xig

         )

    {

        print "dup word '$1' at paragraph $.\n";

    }

}
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Matching Nested Patterns
Inhaltsvorschau
You want to match a nested set of enclosing delimiters, such as the arguments to a function call.
Use match-time pattern interpolation, recursively:
my $np;

$np = qr{

           \(

           (?:

              (?> [^( )]+ )    # Non-capture group w/o backtracking

            |

              (??{ $np })     # Group with matching parens

           )*

           \)

        }x;
Or use the Text::Balanced module's extract_bracketed function.
The $(??{ CODE }) construct runs the code and interpolates the string that the code returns right back into the pattern. A simple, non-recursive example that matches palindromes demonstrates this:
if ($word =~ /^(\w+)\w?(??{reverse $1})$/ ) {

    print "$word is a palindrome.\n";

}
Consider a word like "reviver", which this pattern correctly reports as a palindrome. The $1 variable contains "rev" partway through the match. The optional word character following catches the "i". Then the code reverse $1 runs and produces "ver", and that result is interpolated into the pattern.
For matching something balanced, you need to recurse, which is a bit tricker. A compiled pattern that uses (??{ CODE }) can refer to itself. The pattern given in the Solution matches a set of nested parentheses, however deep they may go. Given the value of $np in that pattern, you could use it like this to match a function call:
$text = "myfunfun(1,(2*(3+4)),5)";

$funpat = qr/\w+$np/;   # $np as above

$text =~ /^$funpat$/;   # Matches!
You'll find many CPAN modules that help with matching (parsing) nested strings. The Regexp::Common module supplies canned patterns that match many of the tricker strings. For example:
use Regexp::Common;

$text = "myfunfun(1,(2*(3+4)),5)";

if ($text =~ /(\w+\s*$RE{balanced}{-parens=>'( )'})/o) {

  print "Got function call: $1\n";

}
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Expressing AND, OR, and NOT in a Single Pattern
Inhaltsvorschau
You have an existing program that accepts a pattern as an argument or as input. It doesn't allow you to add extra logic, like case-insensitive options, ANDs, or NOTs. So you need to write a single pattern that matches either of two different patterns (the "or" case) or both of two patterns (the "and" case), or that reverses the sense of the match ("not").
This situation arises often in configuration files, web forms, or command-line arguments. Imagine there's a program that does this:
chomp($pattern = <CONFIG_FH>);

if ( $data =~ /$pattern/ ) { ..... }
As the maintainer of CONFIG_FH, you need to convey Booleans through to the program using one configuration parameter.
True if either /ALPHA/ or /BETA/ matches, like /ALPHA/ || /BETA/:
/ALPHA|BETA/

/(?:ALPHA)|(?:BETA)/  # works no matter what in both
True if both /ALPHA/ and /BETA/ match, but may overlap, meaning "BETALPHA" should be okay, like /ALPHA/ && /BETA/:
/^(?=.*ALPHA)BETA/s
True if both /ALPHA/ and /BETA/ match, but may not overlap, meaning that "BETALPHA" should fail:
/ALPHA.*BETA|BETA.*ALPHA/s
True if pattern /PAT/ does not match, like $var !~ /PAT/:
/^(?:(?!PAT).)*$/s
True if pattern BAD does not match, but pattern GOOD does:
/(?=^(?:(?!BAD).)*$)GOOD/s
(You can't actually count on being able to place the /s modifier there after the trailing slash, but we'll show how to include it in the pattern itself at the end of the Discussion.)
When in a normal program you want to know whether something doesn't match, use one of:
    if (!($string =~ /pattern/)) { something( ) }   # ugly

    if (  $string !~ /pattern/)  { something( ) }   # preferred

unless (  $string =~ /pattern/)  { something( ) }   # sometimes clearer
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Matching a Valid Mail Address
Inhaltsvorschau
You want to find a pattern to verify the validity of a supplied mail address.
Because you cannot do real-time validation of deliverable mail addresses, no single, succinct pattern will solve this problem. You must pick from several available compromise approaches.
Our best advice for verifying a person's mail address is to have them enter their address twice, just as you would when changing a password. This usually weeds out typos. If both entries match, send mail to that address with a personal message such as:
Dear someuser@host.com,



Please confirm the mail address you gave us on Sun Jun 29

10:29:01 MDT 2003 by replying to this message.  Include the

string "Rumpelstiltskin" in that reply, but spelled in reverse;

that is, start with "Nik...".  Once this is done, your confirmed

address will be entered into our records.
If you get back a message where they've followed your directions, you can be reasonably assured that it's real.
A related strategy that's less open to forgery is to give them a personal identification number (PIN). Record the address and PIN (preferably a random one) for later processing. In the mail you send, ask them to include the PIN in their reply. In case your email bounces, or the message is included via a vacation script, ask them to mail back the PIN slightly altered, such as with the characters reversed, one added or subtracted to each digit, etc.
Most common patterns used for address verification or validation fail in various and sometimes subtle ways. For example, the address this&that@somewhere.com is valid and quite possibly deliverable, but most patterns that allegedly match valid mail addresses fail to let that one pass.
1 while $addr =~ s/\([^( )]*\)//g;
You could use the 6598-byte pattern given on the last page of the first edition of
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Matching Abbreviations
Inhaltsvorschau
Suppose you had a list of commands, such as "send", "abort", "list", and "edit". The user types one in, but you don't want to make them type out the whole thing.
Use the following technique if all strings start with different characters, or to arrange matches so one takes precedence over another, as "SEND" has precedence over "STOP" here:
chomp($answer = <>);

if    ("SEND"  =~ /^\Q$answer/i) { print "Action is send\n"  }

elsif ("STOP"  =~ /^\Q$answer/i) { print "Action is stop\n"  }

elsif ("ABORT" =~ /^\Q$answer/i) { print "Action is abort\n" }

elsif ("LIST"  =~ /^\Q$answer/i) { print "Action is list\n"  }

elsif ("EDIT"  =~ /^\Q$answer/i) { print "Action is edit\n"  }
Or use the Text::Abbrev module:
use Text::Abbrev;

$href = abbrev qw(send abort list edit);

for (print "Action: "; <>; print "Action: ") {

    chomp;

    my $action = $href->{ lc($_) };

    print "Action is $action\n";

}
The first technique exchanges the typical operand order of a match. Normally you have a variable on the left side of the match and a known pattern on the right side. We might try to decide which action the user wanted us to take by saying $answer =~ /^ABORT/i, which is true if $answer begins with the string "ABORT". It matches regardless of whether $answer has anything after "ABORT", so "ABORT LATER" would still match. Handling abbreviations generally requires quite a bit of ugliness: $answer =~ /^A(B(O(R(T)?)?)?)?$/i.
Compare the classic variable =~ /pattern/ with "ABORT" =~ /^\Q$answer/i. The \Q escapes characters that would otherwise be treated specially: that way your program won't blow up if the user enters an invalid pattern. When the user enters something like "ab", the expanded match becomes "ABORT" =~
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Program: urlify
Inhaltsvorschau
This program puts HTML links around URLs in files. It doesn't work on all possible URLs, but does hit the most common ones. It tries to avoid including end-of-sentence punctuation in the marked-up URL.
It is a typical Perl filter, so it can be fed input from a pipe:
% gunzip -c ~/mail/archive.gz | urlify > archive.urlified
or by supplying files on the command line:
% urlify ~/mail/*.inbox > ~/allmail.urlified
The program is shown in Example 6-10.
Example 6-10. urlify
  #!/usr/bin/perl

  # urlify - wrap HTML links around URL-like constructs

  $protos = '(http|telnet|gopher|file|wais|ftp)';

  $ltrs   = '\w';

  $gunk   = ';/#~:.?+=&%@!\-';

  $punc   = '.:?\-';

  $any    = "${ltrs}${gunk}${punc}";

  while (<>) {

      s{

        \b                    # start at word boundary

        (                     # begin $1  {

         $protos   :          # need resource and a colon

         [$any] +?            # followed by on or more

                              #  of any valid character, but

                              #  be conservative and take only

                              #  what you need to....

        )                     # end   $1  }

        (?=                   # look-ahead non-consumptive assertion

         [$punc]*             # either 0 or more punctuation

         [^$any]              #   followed by a non-url char

         |                    # or else

         $                    #   then end of the string

      )

     }{<A HREF="$1">$1</A>}igox;

    print;

  }
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Program: tcgrep
Inhaltsvorschau
This program is a Perl rewrite of the Unix grep program. Although it runs slower than C versions (especially the GNU greps), it offers many more features.
The first and perhaps most important feature is that it runs anywhere Perl does. Other enhancements are that it can ignore anything that's not a plain text file, automatically expand compressed or gzipped files, recurse down directories, search complete paragraphs or user-defined records, look in younger files before older ones, and add underlining or highlighting of matches. It also supports the -c option to indicate a count of matching records, as well as -C for a count of matching patterns when there could be more than one per record.
This program uses gzcat or zcat to decompress compressed files, so this feature is unavailable on systems without these programs and systems that can't run external programs (such as old Macs).
Run the program with no arguments for a usage message (see the usage subroutine in the following code). The following example recursively and case-insensitively greps every file in ~/mail for mail messages from someone called "kate", reporting filenames that contained matches:
% tcgrep -ril '^From: .*kate' ~/mail
The program is shown in Example 6-11.
Example 6-11. tcgrep
  #!/usr/bin/perl -w

  # tcgrep: tom christiansen's rewrite of grep

  # v1.0: Thu Sep 30 16:24:43 MDT 1993

  # v1.1: Fri Oct  1 08:33:43 MDT 1993

  # v1.2: Fri Jul 26 13:37:02 CDT 1996

  # v1.3: Sat Aug 30 14:21:47 CDT 1997

  # v1.4: Mon May 18 16:17:48 EDT 1998

  use strict;

                                    # globals

  our ($Me, $Errors, $Grand_Total, $Mult, %Compress, $Matches);

  my ($matcher, $opt);              # matcher - anon. sub to check for matches

                                    # opt - ref to hash w/ command-line options

  init( );                           # initialize globals

  ($opt, $matcher) = parse_args( );  # get command line options and patterns

  matchfile($opt, $matcher, @ARGV); # process files

  exit(2) if $Errors;

  exit(0) if $Grand_Total;

  exit(1);

  ###################################

  sub init {

      ($Me = $0) =~ s!.*/!!;        # get basename of program, "tcgrep"

      $Errors = $Grand_Total = 0;   # initialize global counters

      $Mult = "";                   # flag for multiple files in @ARGV

      $| = 1;                       # autoflush output

      %Compress = (                 # file extensions and program names

          z  => 'gzcat',            # for uncompressing

          gz => 'gzcat',

          Z  => 'zcat',

      );

  }

  ###################################

  sub usage {

          die << EOF

  usage: $Me [flags] [files]

  Standard grep options:

          i   case insensitive

          n   number lines

          c   give count of lines matching

          C   ditto, but >1 match per line possible

          w   word boundaries only

          s   silent mode

          x   exact matches only

          v   invert search sense (lines that DON'T match)

          h   hide filenames

          e   expression (for exprs beginning with -)

          f   file with expressions

          l   list filenames matching

  Specials:

          1   1 match per file

          H   highlight matches

          u   underline matches

          r   recursive on directories or dot if none

          t   process directories in 'ls -t' order

          p   paragraph mode (default: line mode)

          P   ditto, but specify separator, e.g. -P '%%\\n'

          a   all files, not just plain text files

          q   quiet about failed file and dir opens

          T   trace files as opened

  May use a TCGREP environment variable to set default options.

  EOF

  }

  ###################################

  sub parse_args {

      use Getopt::Std;

      my ($optstring, $zeros, $nulls, %opt, $pattern, @patterns, $match_code);

      my ($SO, $SE);

      if (my $opts = $ENV{TCGREP}) {   # get envariable TCGREP

          $opts =~ s/^([^\-])/-$1/;    # add leading - if missing

          unshift(@ARGV, $opts);       # add TCGREP opt string to @ARGV

      }

      $optstring = "incCwsxvhe:f:l1HurtpP:aqT";

      $zeros = 'inCwxvhelut';       # options to init to 0 

      $nulls = 'pP';                # options to init to "" 

      @opt{ split //, $zeros } = ( 0 )  x length($zeros);

      @opt{ split //, $nulls } = ( '' ) x length($nulls);

      getopts($optstring, \%opt)              or usage( );

      # handle option "-f patfile", for list of patterns

      if ($opt{f}) {                

          open(PATFILE, $opt{f})          or die "$Me: Can't open '$opt{f}': $!";

          # make sure each pattern in file is valid

          while ($pattern = <PATFILE>) {

              chomp $pattern;

              eval { 'foo' =~ /$pattern/, 1 } or

                  die "$Me: $opt{f}:$.: bad pattern: $@";

              push @patterns, $pattern;

          }

          close PATFILE;

      }

      else {                        # make sure pattern is valid

          $pattern = $opt{e} || shift(@ARGV) || usage( );

          eval { 'foo' =~ /$pattern/; 1 } or

              die "$Me: bad pattern: $@";

          @patterns = ($pattern);

      }

      # option -H is for highlight, option -u is for underline

      if ($opt{H} || $opt{u}) {     

          my $term = $ENV{TERM} || 'vt100';

          my $terminal;

          # eval{  } only to trap potential exceptions in function calls

          eval {                    # try to look up escapes for stand-out

              require POSIX;        # or underline via Term::Cap

              use Term::Cap;

              my $termios = POSIX::Termios->new( );

              $termios->getattr;

              my $ospeed = $termios->getospeed;

              $terminal = Tgetent Term::Cap { TERM=>undef, OSPEED=>$ospeed }

          };

          unless ($@) {             # if successful, get escapes for either

              local $^W = 0;        # stand-out (-H) or underlined (-u)

              ($SO, $SE) = $opt{H}

                  ? ($terminal->Tputs('so'), $terminal->Tputs('se'))

                  : ($terminal->Tputs('us'), $terminal->Tputs('ue'));

          }

          else {                    # if use of Term::Cap fails,

              ($SO, $SE) = $opt{H}  # use tput command to get escapes

                  ? (`tput -T $term smso`, `tput -T $term rmso`)

                  : (`tput -T $term smul`, `tput -T $term rmul`)

          }

      }

      # option -i makes all pattern case insensitive

      if ($opt{i}) {

          @patterns = map {"(?i)$_"} @patterns;

      }

      # option -p or -P is paragraph mode, so add /m

      if ($opt{p} || $opt{P}) {

          @patterns = map {"(?m)$_"} @patterns;

      }

      # option -p is standard paragraph mode

      $opt{p}   && ($/ = '');

      # option -p is user-defined paragraph mode

      $opt{P}   && ($/ = eval(qq("$opt{P}")));     # for -P '%%\n'

      # option -w is at word boundary only (XXX: is this always optimal?)

      $opt{w}   && (@patterns = map {'\b' . $_ . '\b'} @patterns);

      # option -x is for whole lines only

      $opt{'x'} && (@patterns = map {"^$_\$"} @patterns);

      # determine whether to emit file name in front of each match

      if (@ARGV) {

          $Mult = 1 if ($opt{r} || (@ARGV > 1) || -d $ARGV[0]) && !$opt{h};

      }

      # if just listing filenames, stop after first match

      $opt{1}   += $opt{l};                   # that's a one and an ell

      # this way only need look for -H

      $opt{H}   += $opt{u};

      # if we're doing a complete count, where doing some counting

      $opt{c}   += $opt{C};

      # if we're counting, keep track of status

      $opt{'s'} += $opt{c};

      # stop at first match if checking status but not counting

      $opt{1}   += $opt{'s'} && !$opt{c};     # that's a one

      # default args are cwd if recursive, stdin otherwise

      @ARGV = ($opt{r} ? '.' : '-') unless @ARGV;

      # we're recursive even w/o -r if all args are directories

      $opt{r} = 1 if !$opt{r} && grep(-d, @ARGV) =  = @ARGV;

      ######

      # now the hard part: build of the matching function as text to eval

      #

      $match_code  = '';

      $match_code .= 'study;' if @patterns > 5; # might speed things up a bit

      foreach (@patterns) { s(/)(\\/)g }

      # add the stand-out and end-stand-out sequences for highlight mode

      if ($opt{H}) {

          foreach $pattern (@patterns) {

              $match_code .= "\$Matches += s/($pattern)/${SO}\$1${SE}/g;";

          }

      }

      # option -v means to count a line if it *doesn't* match

      elsif ($opt{v}) {

          foreach $pattern (@patterns) {

              $match_code .= "\$Matches += !/$pattern/;";

          }

      }

      # do full count, multiple hits per line

      elsif ($opt{C}) {

          foreach $pattern (@patterns) {

              $match_code .= "\$Matches++ while /$pattern/g;";

          }

      }

      else {

          foreach $pattern (@patterns) {

              $match_code .= "\$Matches++ if /$pattern/;";

          }

      }

      # now compile as a closure, and grab function pointer

      $matcher = eval "sub { $match_code }";

      die if $@;

      return (\%opt, $matcher);

  }

  ###################################

  sub matchfile {

      $opt = shift;                 # reference to option hash

      $matcher = shift;             # reference to matching sub

      my ($file, @list, $total, $name);

      local($_);

      $total = 0;

  FILE: while (defined ($file = shift(@_))) {

          if (-d $file) {

              if (-l $file && @ARGV != 1) {

                  warn "$Me: \"$file\" is a symlink to a directory\n"

                      if $opt->{T};

                  next FILE;

              }

              if (!$opt->{r}) {

                  warn "$Me: \"$file\" is a directory, but no -r given\n"

                      if $opt->{T};

                  next FILE;

              }

              unless (opendir(DIR, $file)) {

                  unless ($opt->{'q'}) {

                      warn "$Me: can't opendir $file: $!\n";

                      $Errors++;

                  }

                  next FILE;

              }

              @list = ( );

              for (readdir(DIR)) {                # skip cwd and parent dir

                  push(@list, "$file/$_") unless /^\.{1,2}$/;

              }

              closedir(DIR);

              # option -t is sort by age, youngest first

              # use algorithm from Recipe 4.XXX, Sorting a List by Computable Field 

              if ($opt->{t}) {

                  @list = map  { $_->[0] }

                          sort { $a->[1] <=> $b->[1] } 

                          map  { [ $_, -M $_ ] } @list;

              }

              else {

                  @list = sort @list;

              }

              matchfile($opt, $matcher, @list);    # process files

              next FILE;

          }

          # avoid annoying situation of grep wanting to read from keyboard

          # but user not realizing this 

          if ($file eq '-') {

              warn "$Me: reading from stdin\n" if -t STDIN && !$opt->{'q'};

              $name = '<STDIN>';

          }

          else {

              $name = $file;

              unless (-e $file) {

                  warn qq($Me: file "$file" does not exist\n) unless $opt->{'q'};

                  $Errors++;

                  next FILE;

              }

              unless (-f $file || $opt->{a}) {

                  warn qq($Me: skipping non-plain file "$file"\n) if $opt->{T};

                  next FILE;

              }

              # could use File::Spec instead

              my ($ext) = $file =~ /\.([^.]+)$/;

              # check whether it's an extension whose contents we know 

              # how to convert to plain text via a filter program

              if (defined($ext) && exists($Compress{$ext})) {

                  $file = "$Compress{$ext} < $file |";

              }

              elsif (! (-T $file  || $opt->{a})) {

                  warn qq($Me: skipping binary file "$file"\n) if $opt->{T};

                  next FILE;

              }

          }

          warn "$Me: checking $file\n" if $opt->{T};

          unless (open(FILE, $file)) {

              unless ($opt->{'q'}) {

                  warn "$Me: $file: $!\n";

                  $Errors++;

              }

              next FILE;

          }

          $total = 0;

          $Matches = 0;

  LINE:  while (<FILE>) {

              $Matches = 0;

              ##############

              &{$matcher}( );        # do it! (check for matches)

              ##############

              next LINE unless $Matches;

              $total += $Matches;

              if ($opt->{p} || $opt->{P}) {

                s/\n{2,}$/\n/ if $opt->{p};

                chomp         if $opt->{P};

              }

              print("$name\n"), next FILE if $opt->{l};

              # The following commented out block is the 

              # expanded/legible version of the statement

              # that immediately follows it.  This is one 

              # of the few times we sacrifice readability

              # for execution speed: we carefully arrange 

              # that print( ) be called just once, not four times,

              # and we don't resort to a braced block either.

              # (note that $Mult must be "" not 0 for this to work)

              ########

              ## unless ($opt->{'s'}) {

              ##    print "$name:"                 if $Mult;

              ##    print "$.:"                    if $opt{n};

              ##    print;

              ##    print (('-' x 20) . "\n")         if $opt->{p} || $opt->{P};

              ## } 

              ########

              $opt->{'s'} || print $Mult && "$name:",

                  $opt->{n} ? "$.:" : "",

                  $_,

                  ($opt->{p} || $opt->{P}) && ('-' x 20) . "\n";

              next FILE if $opt->{1};                 # that's a one

          }

      }

      continue {

          # again, next block equivalent to line following

          #######

          ## if ($opt->{c}) {

          ##         print $name if $Mult;

          ##         print "$total\n";

          ## } 

          #######

          print $Mult && "$name:", "$total\n" if $opt->{c};

      }

      $Grand_Total += $total;

  }
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Regular Expression Grab Bag
Inhaltsvorschau
We have found these regular expressions useful or interesting:
Swap first two words
s/(\S+)(\s+)(\S+)/$3$2$1/
Keyword = Value
m/^(\w+)\s*=\s*(.*?)\s*$/             # keyword is $1, value is $2
Line of at least 80 characters
m/.{80,}/

length( ) >= 80        # ok, not a regex
MM/DD/YY HH:MM:SS
m|(\d+)/(\d+)/(\d+) (\d+):(\d+):(\d+)|
Changing directories
s(/usr/bin)(/usr/local/bin)g
Expanding %7E (hex) escapes
s/%([0-9A-Fa-f][0-9A-Fa-f])/chr(hex($1))/ge
Deleting C comments (imperfectly)
s{

    /*                    # Match the opening delimiter

    .*?                   # Match a minimal number of characters

    */                    # Match the closing delimiter

}{  }gsx;
Removing leading and trailing whitespace
s/^\s+//;

s/\s+$//;
Turning \ followed by n into a real newline
s/\\n/\n/g;
Removing package portion of fully qualified symbols
s/^.*:://
Dotted quads (most IP addresses)
# XXX: fails on legal IPs 127.1 and 2130706433.

m{

        ^  ( \d | [01]?\d\d | 2[0-4]\d | 25[0-5] )

       \.  ( \d | [01]?\d\d | 2[0-4]\d | 25[0-5] )

       \.  ( \d | [01]?\d\d | 2[0-4]\d | 25[0-5] )

       \.  ( \d | [01]?\d\d | 2[0-4]\d | 25[0-5] )

        $

    }x
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Chapter 7: File Access
Inhaltsvorschau
I the heir of all ages, in the foremost files of time.
—Alfred, Lord Tennyson, Locksley Hall
Nothing is more central to data processing than the file. As with everything else in Perl, easy things are easy and hard things are possible. Common tasks (opening files, reading data, writing data) use simple I/O functions and operators, whereas fancier functions do hard things like non-blocking I/O and file locking.
This chapter deals with the mechanics of file access: opening a file, telling subroutines which files to work with, locking files, and so on. Chapter 8 deals with techniques for working with the contents of a file: reading, writing, shuffling lines, and other operations you can do once you have access to the file.
Here's Perl code for printing all lines from the file /usr/local/widgets/data that contain the word "blue":
open(INPUT, "<", "/acme/widgets/data")

    or die "Couldn't open /acme/widgets/data for reading: $!\n";

while (<INPUT>) {

    print if /blue/;

}

close(INPUT);
Central to file access in Perl is the filehandle, like INPUT in the previous code example. Filehandles are symbols inside your Perl program that you associate with an external file, usually using the open function. Whenever your program performs an input or output operation, it provides that operation with an internal filehandle, not an external filename. It's the job of open to make that association, and of close to break it. Actually, any of several functions can be used to open files, and handles can refer to entities beyond mere files on disk; see Recipe 7.1 for details.
While users think of open files in terms of those files' names, Perl programs do so using their filehandles. But as far as the operating system itself is concerned, an open file is nothing more than a file descriptor, which is a small, non-negative integer. The fileno function divulges the system file descriptor of its filehandle argument. Filehandles are enough for most file operations, but for when they aren't, Recipe 7.9 turns a system file descriptor into a filehandle you can use from Perl.
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Introduction
Inhaltsvorschau
Nothing is more central to data processing than the file. As with everything else in Perl, easy things are easy and hard things are possible. Common tasks (opening files, reading data, writing data) use simple I/O functions and operators, whereas fancier functions do hard things like non-blocking I/O and file locking.
This chapter deals with the mechanics of file access: opening a file, telling subroutines which files to work with, locking files, and so on. Chapter 8 deals with techniques for working with the contents of a file: reading, writing, shuffling lines, and other operations you can do once you have access to the file.
Here's Perl code for printing all lines from the file /usr/local/widgets/data that contain the word "blue":
open(INPUT, "<", "/acme/widgets/data")

    or die "Couldn't open /acme/widgets/data for reading: $!\n";

while (<INPUT>) {

    print if /blue/;

}

close(INPUT);
Central to file access in Perl is the filehandle, like INPUT in the previous code example. Filehandles are symbols inside your Perl program that you associate with an external file, usually using the open function. Whenever your program performs an input or output operation, it provides that operation with an internal filehandle, not an external filename. It's the job of open to make that association, and of close to break it. Actually, any of several functions can be used to open files, and handles can refer to entities beyond mere files on disk; see Recipe 7.1 for details.
While users think of open files in terms of those files' names, Perl programs do so using their filehandles. But as far as the operating system itself is concerned, an open file is nothing more than a file descriptor, which is a small, non-negative integer. The fileno function divulges the system file descriptor of its filehandle argument. Filehandles are enough for most file operations, but for when they aren't, Recipe 7.9 turns a system file descriptor into a filehandle you can use from Perl.
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Opening a File
Inhaltsvorschau
You want to read or write a file from Perl.
Use open with two arguments for convenience or with three arguments for precision. Use sysopen for access to low-level features.
The open function takes arguments specifying the internal filehandle to open, the external filename, and some indication of how to open it (the access mode). Called with only two arguments, the second comprises both path and mode:
open(SOURCE, "< $path")

    or die "Couldn't open $path for reading: $!\n";



open(SINK, "> $path")

    or die "Couldn't open $path for writing: $!\n";
When called with three (or more) arguments, the mode is split out from the path, lest there be any ambiguity between one and the other:
open(SOURCE, "<", $path)

    or die "Couldn't open $path for reading: $!\n";



open(SINK, ">", $path)

    or die "Couldn't open $path for writing: $!\n";
The sysopen function takes either three or four arguments: filehandle, filename, file-access flags, plus an optional permissions value. The flags argument is a number constructed from constants provided by the Fcntl module:
use Fcntl;



sysopen(SOURCE, $path, O_RDONLY)

    or die "Couldn't open $path for reading: $!\n";



sysopen(SINK, $path, O_WRONLY, 0600)

    or die "Couldn't open $path for writing: $!\n";
If you pass open or sysopen a scalar variable that's undefined, Perl fills in that variable with a new, anonymous filehandle.
open(my $fh, "<", $path)

    or die "Couldn't open $path for reading: $!\n";
All input and output goes through filehandles, regardless of whether filehandles are mentioned. Filehandles aren't exclusively connected to regular files in the filesystem—they're also used to communicate with other programs (see Chapter 16) and for network communication (see Chapter 17). The open function can also be used to manipulate file descriptors, as discussed in Recipe 7.9.
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Opening Files with Unusual Filenames
Inhaltsvorschau
You want to open a file with a funny filename, such as "-", or one that starts with <, >, or |; has leading or trailing whitespace; or ends with |. You don't want these to trigger open's do-what-I-mean behavior, since in this case, that's not what you mean.
When open is called with three arguments, not two, placing the mode in the second argument:
open(HANDLE, "<", $filename)          or die "cannot open $filename : $!\n";
Or simply use sysopen:
sysopen(HANDLE, $filename, O_RDONLY)   or die "cannot open $filename: $!\n";
When open is called with three arguments, the access mode and the filename are kept separate. But when called with only two arguments, open has to extract the access mode and the filename from a single string. If your filename begins with the same characters used to specify an access mode, open could easily do something unexpected. Imagine the following code:
$filename = shift @ARGV;

open(INPUT, $filename)               or die "Couldn't open $filename : $!\n";
If the user gave ">/etc/passwd" as the filename on the command line, this code would attempt to open /etc/passwd for writing. We can try to give an explicit mode, say for writing:
open(OUTPUT, ">$filename")

    or die "Couldn't open $filename for writing: $!\n";
but even this would let the user give a filename of ">data", and the code would append to the file data instead of erasing the old contents.
The easiest solution is to pass three arguments to open, where the second argument is the mode and the third the path. Now there can be neither confusion nor subterfuge.
open(OUTPUT, ">", $filename)

    or die "Couldn't open $filename for writing: $!\n";
Another solution is sysopen, which also takes the mode and filename as distinct arguments:
use Fcntl;                          # for file constants



sysopen(OUTPUT, $filename, O_WRONLY|O_TRUNC)

    or die "Can't open $filename for writing: $!\n";
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Expanding Tildes in Filenames
Inhaltsvorschau
You want to open filenames like ~username/blah or ~/.mailrc, but open doesn't interpret the tilde to mean a home directory.
Either use the glob function:
open(FH, glob("~joebob/somefile")) || die "Couldn't open file: $!";
or expand the filename manually with a substitution:
$filename =~ s{ ^ ~ ( [^/]* ) }

              { $1

                    ? (getpwnam($1))[7]

                    : ( $ENV{HOME} || $ENV{LOGDIR}

                         || (getpwuid($<))[7]

                       )

}ex;
There is a useful convention, begun with the Unix csh shell and propagated widely by web addresses of the form http://www.example.com/~user/, that ~ in a filename represents a user's home directory. Thus:
    ~              # current user's home directory

    ~/blah         # file blah in current user's home directory

    ~user          # a particular user's home directory

    ~user/blah     # file blah in a particular user's home directory
Unfortunately, Perl's open function does not expand wildcards, including tildes. As of the v5.6 release, Perl internally uses the File::Glob module when you use the glob operator. So all you need to do is glob the result first.
open(MAILRC, "<", "~/.mailrc")           # WRONG: tilde is a shell thing

    or die "can't open ~/.mailrc: $!";



open(MAILRC, "<", glob("~/.mailrc"))     # so expand tilde first

    or die "can't open ~/.mailrc: $!";
The alternative solution, the substitution, uses /e to evaluate the replacement as Perl code. If a username follows the tilde, it's stored in $1, which getpwnam uses to extract the user's home directory out of the return list. This directory becomes the replacement string. If the tilde was not followed by a username, substitute in either the current HOME environment variable or the
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Making Perl Report Filenames in Error Messages
Inhaltsvorschau
Your program works with files, but Perl's errors and warnings only report the last used filehandle, not the name of the file.
Use the filename as the filehandle:
open($path, "<", $path)

    or die "Couldn't open $path for reading : $!\n";
Ordinarily, error messages say:

               Argument "3\n" isn't numeric in multiply at tallyweb line 16, <LOG> chunk 17

            
The filehandle LOG doesn't help much because you don't know which file the handle was connected to. By using the filename itself as indirect filehandle, Perl produces more informative errors and warnings:

               Argument "3\n" isn't numeric in multiply at tallyweb

               line 16, </usr/local/data/mylog3.dat> chunk 17

            
Unfortunately, this doesn't work with strict refs turned on because the variable $path doesn't really have a filehandle in it, only a string that sometimes behaves like one. The chunk number mentioned in warnings and error messages is the current value of the $. variable.
Recipe 7.1; the open function in perlfunc(1) and Chapter 29 of Programming Perl
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Storing Filehandles into Variables
Inhaltsvorschau
You want to use a filehandle like a normal variable so you can pass it to or return it from a function, store it in a data structure, and so on.
The easiest way to get a filehandle into a variable is to have open put it there for you:
open(my $fh, "<", $filename)      or die "$0: can't open $filename: $!";
To store named filehandles into a variable or pass them into or out of a function, use typeglob notation (*FH):
$variable = *FILEHANDLE;        # save in variable

subroutine(*FILEHANDLE);        # or pass directly



sub subroutine {

    my $fh = shift;

    print $fh "Hello, filehandle!\n";

}
If you pass an undefined scalar variable as the first argument to open, Perl allocates an anonymous typeglob and stores a reference to that typeglob in that scalar, effectively creating filehandles on demand. Like all other references, autovivified filehandles are subject to garbage collection, so this code doesn't leak a filehandle:
{

    open(my $fh, "< /etc/motd") or die;

    local $/;          # slurp mode

    $text = <$fh>;

}
When Perl reaches the end of the block, $fh goes out of scope. As explained earlier in the Introduction, because that variable contained the last reference to the anonymous filehandle created by open, the variable is garbage collected and the filehandle implicitly closed.
Autovivified filehandles, being anonymous and already held in variables, don't help you to understand how to pass named filehandles as function parameters or store them in variables, including elements of arrays or hashes. By named filehandles, we mean those of the form FH, including all predefined handles, such as STDIN and ARGV. So let's look at what FH is and how to extract a scalar value from it to use for all of those things.
Named filehandles used in:
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Writing a Subroutine That Takes Filehandles as Built-ins Do
Inhaltsvorschau
You can pass a bareword filehandle to Perl functions like eof, and you'd like to write similar subroutines of your own.
Use the Symbol module's qualify_to_ref in conjunction with a * prototype on the subroutine:
use Symbol qw(qualify_to_ref);



sub my_eof (*) {

  my $handle = shift;

  $handle = qualify_to_ref($handle, caller( ));

  # use $handle

}
The * prototype tells Perl that the function expects a bareword filehandle as its argument. This lets you call the function like so:
my_eof(HANDLE);
This works even when use strict 'subs' is in effect. The function receives a string as its argument, though. To safely use the argument as a filehandle, you need the Symbol module to turn it into a reference to a typeglob. And since typeglob refs can be used wherever you'd use named filehandles, store that reference in a scalar variable and use the variable as an indirect filehandle within your subroutine.
If you pass in a filehandle that is already a reference to a typeglob, like those autovivified by open, Perl and qualify_to_ref still do the right thing:
open(my $fh, "<", $filename) or die;

my_eof($fh);
This technique is used in Recipe 7.23.
The documentation for the standard module Symbol (also in Chapter 32 of Programming Perl); the "Prototypes" section in the perlsub(1) manpage (also in Chapter 6 of Programming Perl); Recipe 7.23
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Caching Open Output Filehandles
Inhaltsvorschau
You need more output files open simultaneously than your system allows.
Use the standard FileCache module:
use FileCache;

cacheout ($path);         # each time you use a filehandle

print $path "output";
FileCache's cacheout function lets you work with more output files than your operating system lets you have open at any one time. If you use it to open an existing file that FileCache is seeing for the first time, the file is truncated to length zero, no questions asked. However, in its opening and closing of files in the background, cacheout tracks files it has opened before and does not overwrite them, but appends to them instead. This does not create directories for you, so if you give it /usr/local/dates/merino.ewe to open but the directory /usr/local/dates doesn't exist, cacheout will die.
The cacheout function checks the value of the C-level constant NOFILE from the standard system include file sys/param.h to determine how many concurrently open files are allowed on your system. This value can be incorrect on some systems and even missing on a few (for instance, on those where the maximum number of open file descriptors is a process resource limit that can be set with the limit or ulimit commands). If cacheout can't get a value for NOFILE, set $FileCache::cacheout_maxopen to be four less than the correct value, or choose a reasonable number by trial and error.
Example 7-1 splits an xferlog file (created by most FTP servers nowadays) into separate files, each named after the authenticated user. Fields in xferlog files are space-separated, with the fourth field from the last holding the authenticated username.
Example 7-1. splitwulog
  #!/usr/bin/perl

  # splitwulog - split wuftpd log by authenticated user

  use FileCache;

  $outdir = "/var/log/ftp/by-user";

  while (<>) {

      unless (defined ($user = (split)[-4])) {

         warn "Invalid line: $.\n";

         next;

      }

      $path = "$outdir/$user";

      cacheout $path;

      print $path $_;

  

  

  

  

  }
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Printing to Many Filehandles Simultaneously
Inhaltsvorschau
You need to output the same data to several different filehandles.
If you want to do it without forking, write a foreach loop that iterates across the filehandles:
foreach $filehandle (@FILEHANDLES) {

    print $filehandle $stuff_to_print;

}
If you don't mind forking, open a filehandle that's a pipe to a tee program:
open(MANY, "| tee file1 file2 file3 > /dev/null")   or die $!;

print MANY "data\n"                                 or die $!;

close(MANY)                                         or die $!;
If you don't have a tee program handy, use the IO::Tee module from CPAN:
use IO::Tee;

$tee = IO::Tee->new(@FILEHANDLES);

print $tee $stuff_to_print;
A filehandle sends output to one file or program only. To duplicate output to several places, you must call print multiple times or make a filehandle connected to a program like tee, which distributes its input elsewhere. If you use the first option, it's probably easiest to put the filehandles in a list or array and loop through them (see Recipe 7.5):
for $fh (*FH1, *FH2, *FH3)      { print $fh "whatever\n" }
However, if your system supports the tee(1) program, or if you've installed the Perl version from Recipe 8.25, you may open a pipe to tee and let it do the work of copying the file to several places. Remember that tee normally also copies its output to STDOUT, so you must redirect tee's standard output to /dev/null if you don't want an extra copy:
open (FH, "| tee file1 file2 file3 >/dev/null");

print FH "whatever\n";
You could even redirect your own STDOUT to the tee process, and then you're able to use a regular print directly:
# make STDOUT go to three files, plus original STDOUT

open (STDOUT, "| tee file1 file2 file3") or die "Teeing off: $!\n";

print "whatever\n"                       or die "Writing: $!\n";

close(STDOUT)                            or die "Closing: $!\n";
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Opening and Closing File Descriptors by Number
Inhaltsvorschau
You know which file descriptors you'd like to do I/O on, but Perl requires filehandles, not descriptor numbers.
To open the file descriptor, supply open with "<&=" or "<&" as the part of the file access mode, combined with a directional arrow:
open(FH, "<&=", $FDNUM)       # open FH to the descriptor itself

open(FH, "<&",  $FDNUM);      # open FH to a copy of the descriptor
Or use the IO::Handle module's new_from_fd class method:
use IO::Handle;

$fh = IO::Handle->new_from_fd($FDNUM, "r");
To close a file descriptor by number, either use the POSIX::close function or open it first as shown previously.
Occasionally you have a file descriptor but no filehandle. Perl's I/O system uses filehandles instead of file descriptors, so you have to make a new filehandle from an already open file descriptor. The "<&", ">&", and "+<&" access modes to open do this for reading, writing, and updating, respectively. Adding an equals sign to these—making them "<&=", ">&=", and "+<&="—is more parsimonious of file descriptors and nearly always what you want. That's because the underlying implementation of Perl's open statement uses only a C-level fdopen(3) function from the C library, not a dup2(2) syscall that calls the kernel.
The new_from_fd IO::Handle object method is equivalent to:
use IO::Handle;



$fh = new IO::Handle;

$fh->fdopen($FDNUM, "r");            # open fd 3 for reading
Here's how you'd open file descriptors that the MH mail system feeds its child processes. It identifies them in the environment variable MHCONTEXTFD:
$fd = $ENV{MHCONTEXTFD};

open(MHCONTEXT, "<&=", $fd)   or die "couldn't fdopen $fd: $!";

# after processing

close(MHCONTEXT)              or die "couldn't close context file: $!";
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Copying Filehandles
Inhaltsvorschau
You want a copy of a filehandle.
To create an alias for a named filehandle, say:
*ALIAS = *ORIGINAL;
Use open with the & file access mode to create an independent copy of the file descriptor for that filehandle:
open(OUTCOPY, ">&STDOUT")        or die "Couldn't dup STDOUT: $!";

open(INCOPY,  "<&STDIN")         or die "Couldn't dup STDIN : $!";
Use open with the &= mode to create an alias for that filehandle or file descriptor:
open(OUTALIAS, ">&=STDOUT")      or die "Couldn't alias STDOUT: $!";

open(INALIAS,  "<&=STDIN")       or die "Couldn't alias STDIN : $!";

open(BYNUMBER, ">&=5")           or die "Couldn't alias file descriptor 5: $!";
With other types of filehandles (typeglobs, objects), use the same technique with a three-argument open:
open(my $copy, "<&",  $original) or die "Couldn't alias original: $!";

open(my $copy, "<&=", $original) or die "Couldn't alias original: $!";
If you create an alias for a filehandle with typeglobs, only one Perl I/O object is still being accessed. If you close one of these aliased filehandles, the I/O object is closed. Any further attempt to use a copy of that filehandle fails, silently by default or, if you have warnings enabled, with the warning "print on closed filehandle". When alternating access through aliased filehandles, writes work as you'd expect because there are no duplicated stdio data structures to get out of sync.
If you create a copy of a file descriptor with open(COPY, ">&HANDLE"), you're really calling the dup(2) syscall. You get two independent file descriptors whose file position, locks, and flags are shared, but which have independent stdio buffers. Closing one filehandle doesn't affect its copy. Simultaneously accessing the file through both filehandles is a recipe for disaster. Instead, this technique is normally used to save and restore
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Creating Temporary Files
Inhaltsvorschau
You need to create a temporary file and have it deleted automatically when your program exits. For instance, if you needed a temporary configuration file to feed a program you're about to launch, you'd need a name for that file so you could pass that filename along to the utility program. In other cases, you may want a temporary file to write to and read from, but don't need a filename for it.
Use the tempfile function from the File::Temp module:
use File::Temp qw/ tempdir /;

$fh = tempfile( );  # just the handle
perhaps in conjunction with a temporary directory:
use File::Temp qw/ tempdir /;

# or specify a directory

$dir = tempdir( CLEANUP => 1 );

($fh, $filename) = tempfile( DIR => $dir );



$template = "myprogtempXXXXXX"; # trailing Xs are changed

($fh, $filename) = tempfile( $template, DIR => $dir);

($fh, $filename) = tempfile( $template, SUFFIX => ".data");
The File::Temp module's functions are the best way to make temporary files. For one thing, they're extremely easy to use. For another, they're more portable than direct calls to the operating system. But perhaps of greatest importance is the care they take in security matters both various and subtle, especially those involving race conditions.
Although this module provides a handful of slightly different functions for creating a temporary file, most are there simply to support legacy interfaces; few users will need more than the basic tempfile( ) function. This function safely and atomically creates and opens a brand new, empty file in read-write mode. In scalar context, it returns a filehandle to that temporary file; in list context, it returns the handle and pathname of the temporary file:
use File::Temp qw(tempfile);



# just the handle

$fh = tempfile( );



# handle and filename

($fh, $filename) = tempfile( );
The tempfile function optionally accepts an argument containing a template and then named arguments in pairs. Named arguments specify such things as the directory to use instead of the current directory, that a specific file extension should be used, and on systems that support such a thing, whether the tempfile should be immediately unlinked before its handle is returned. (Files whose names have already been deleted from the filesystem are especially difficult for the guys with the black hats to find.) Any trailing X characters in the template are replaced by random characters in the final filename. You might use this feature if you need a temporary file with a specific extension.
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Storing a File Inside Your Program Text
Inhaltsvorschau
You have data that you want to bundle with your program and treat as though it were in a file, but you don't want it to be in a different file.
Use the _ _DATA_ _ or _ _END_ _ tokens after your program code to mark the start of a data block, which can be read inside your program or module from the DATA filehandle.
Use _ _DATA_ _ within a module:
while ( <DATA>) {

    # process the line

}

_ _DATA_ _

# your data goes here
Similarly, use _ _END_ _ within the main program file:
while (<main::DATA>) {

    # process the line

}

_ _END_ _

# your data goes here
The _ _DATA_ _ and _ _END_ _ symbols tell the Perl compiler there's nothing more for it to do in the current file. They represent the logical end for code in a module or a program before the physical end-of-file.
Text after _ _DATA_ _ or _ _END_ _ can be read through the per-package DATA filehandle. For example, take the hypothetical module Primes. Text after _ _DATA_ _ in Primes.pm can be read from the Primes::DATA filehandle.
_ _END_ _ behaves as a synonym for _ _DATA_ _ in the main package. Any text occurring after an _ _END_ _ token in a module is completely inaccessible.
This lets you write self-contained programs instead of keeping data in separate files. Often this is used for documentation. Sometimes it's configuration data or old test data that the program was originally developed with, left lying about should it ever be needed again.
Another trick is to use DATA to find out the current program's or module's size or last modification date. On most systems, the $0 variable will contain the full pathname to your running script. On systems where $0 is not correct, you could try the DATA filehandle instead. This can be used to pull in the size, modification date, etc. Put a special token
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Storing Multiple Files in the DATA Area
Inhaltsvorschau
You've figured out how to use _ _END_ _ or _ _DATA_ _ to store a virtual file in your source code, but you now want multiple virtual files in one source file.
Use the Inline::Files module from CPAN. Carefully.
use Inline::Files;



while (<SETUP>) {

  # ...

}



while (<EXECUTION>) {

  # ...

}



_ _SETUP_ _

everything for the SETUP filehandle goes here

_ _EXECUTION_ _

everything for the EXECUTION filehandle goes here
One limitation with the _ _DATA_ _ setup is that you can have only one embedded data file per physical file. The CPAN module Inline::Files cleverly circumvents this restriction by providing logical embedded files. It's used like this:
use Inline::Files;



#

#  All your code for the file goes here first, then...

#



_ _ALPHA_ _

This is the data in the first virtual file, ALPHA.



_ _BETA_ _

This is the data in the next virtual file, BETA.



_ _OMEGA_ _

This is the data in yet another virtual file, OMEGA.



_ _ALPHA_ _

This is more data in the second part of virtual file, ALPHA.
The code is expected to read from filehandles whose names correspond to the double-underbarred symbols: here ALPHA, BETA, and OMEGA. You may have more than one section by the same name in the same program, and differently named sections needn't be read in any particular order. These handles work much like the ARGV handle does. For one thing, they're implicitly opened on first usage. For example, using the following code in the designated spot in the preceding code example:
while (<OMEGA>) {

   print "omega data: $_";

}



while (<ALPHA>) {

   print "alpha data: $_";

}
would produce this:
omega data: This is the data in yet another virtual file, OMEGA.

omega data:

alpha data: This is the data in the first virtual file, ALPHA.

alpha data:

alpha data: This is more data in the second part of virtual file, ALPHA.

alpha data:
Also like the ARGV handle, while reading from a particular handle, the list of available virtual files is in the array by that name, and the currently opened virtual file is in the scalar by that name. There's also a hash by that name that holds various bits of status information about that set of virtual files, including the current file, line number, and byte offset. If we used the Perl debugger on this program and dumped out the variables, it might show this:
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Writing a Unix-Style Filter Program
Inhaltsvorschau
You want to write a program that takes a list of filenames on the command line and reads from STDIN if no filenames were given. You'd like the user to be able to give the file "-" to indicate STDIN or "someprogram |" to indicate the output of another program. You might want your program to modify the files in place or to produce output based on its input.
Read lines with <>:
while (<>) {

    # do something with the line

}
When you say:
while (<>) {

    # ...

 }
Perl translates this into:
unshift(@ARGV, "-") unless @ARGV;

while ($ARGV = shift @ARGV) {

    unless (open(ARGV, $ARGV)) {

        warn "Can't open $ARGV: $!\n";

        next;

    }

    while (defined($_ = <ARGV>)) {

        # ...

    }

}
You can access ARGV and $ARGV inside the loop to read more from the filehandle or to find the filename currently being processed. Let's look at how this works.

Section 7.14.3.1: Behavior

If the user supplies no arguments, Perl sets @ARGV to a single string, "-". This is shorthand for STDIN when opened for reading and STDOUT when opened for writing. It's also what lets the user of your program specify "-" as a filename on the command line to read from STDIN.
Next, the file-processing loop removes one argument at a time from @ARGV and copies the filename into the global variable $ARGV. If the file cannot be opened, Perl goes on to the next one. Otherwise, it processes a line at a time. When the file runs out, the loop goes back and opens the next one, repeating the process until @ARGV is exhausted.
The open statement didn't say open(ARGV, "<", $ARGV). There's no extra less-than sign supplied. This allows for interesting effects, like passing the string "gzip -dc
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Modifying a File in Place with a Temporary File
Inhaltsvorschau
You need to update a file in place, and you can use a temporary file.
Read from the original file, write changes to a temporary file, and then rename the temporary back to the original:
open(OLD, "<", $old)        or die "can't open $old: $!";

open(NEW, ">", $new)        or die "can't open $new: $!";

while (<OLD>) {

    # change $_, then...

    print NEW $_            or die "can't write $new: $!";

}

close(OLD)                  or die "can't close $old: $!";

close(NEW)                  or die "can't close $new: $!";

rename($old, "$old.orig")   or die "can't rename $old to $old.orig: $!";

rename($new, $old)          or die "can't rename $new to $old: $!";
This is the best way to update a file "in place."
This technique uses little memory compared to the approach that doesn't use a temporary file. It has the added advantages of giving you a backup file and being easier and safer to program.
You can make the same changes to the file using this technique that you can with the version that uses no temporary file. For instance, to insert lines at line 20, say:
while (<OLD>) {

    if ($. =  = 20) {

        print NEW "Extra line 1\n";

        print NEW "Extra line 2\n";

    }

    print NEW $_;

}
To delete lines 20 through 30, say:
while (<OLD>) {

    next if 20 .. 30;

    print NEW $_;

}
Note that rename won't work across filesystems, so you should create your temporary file in the same directory as the file being modified.
The truly paranoid programmer would lock the file during the update. The tricky part is that you have to open the file for writing without destroying its contents before you can get a lock to modify it. Recipe 7.18 shows how to do this.
Recipe 7.1; Recipe 7.16; Recipe 7.17; Recipe 7.18
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Modifying a File in Place with the -i Switch
Inhaltsvorschau
You need to modify a file in place from the command line, and you're too lazy for the file manipulation of Recipe 7.15.
Use the -i and -p switches to Perl. Write your program on the command line:
% perl -i.orig -p -e 'FILTER COMMAND' file1 file2 file3 ...
or use the switches in programs:
#!/usr/bin/perl -i.orig -p

# filter commands go here
The -i command-line switch modifies each file in place. It creates a temporary file as in the previous recipe, but Perl takes care of the tedious file manipulation for you. Use it with -p (explained in Recipe 7.14) to turn:
while (<>) {

    if ($ARGV ne $oldargv) {           # are we at the next file?

        rename($ARGV, $ARGV . ".orig");

        open(ARGVOUT, ">", $ARGV);     # plus error check

        select(ARGVOUT);

        $oldargv = $ARGV;

    }

    s/DATE/localtime/e;

}

continue{

    print;

}

select (STDOUT);                      # restore default output
into:
% perl -pi.orig -e 's/DATE/localtime/e'
The -i switch takes care of making a backup (say -i instead of -i.orig to discard the original file contents instead of backing them up), and -p makes Perl loop over filenames given on the command line (or STDIN if no files were given).
The preceding one-liner would turn a file containing the following:
Dear Sir/Madam/Ravenous Beast,

    As of DATE, our records show your account

is overdue.  Please settle by the end of the month.

Yours in cheerful usury,

    --A. Moneylender
into:
Dear Sir/Madam/Ravenous Beast,

    As of Sat Apr 25 12:28:33 1998, our records show your account

is overdue.  Please settle by the end of the month.

Yours in cheerful usury,

    --A. Moneylender
This switch makes in-place translators a lot easier to write and to read. For instance, this changes isolated instances of "
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Modifying a File in Place Without a Temporary File
Inhaltsvorschau
You need to insert, delete, or change one or more lines in a file, and you don't want to (or can't) use a temporary file.
Open the file in update mode ("+<"), read the whole file into an array of lines, change the array, then rewrite the file and truncate it to its current seek pointer.
open(FH, "+<", $FILE)               or die "Opening: $!";

@ARRAY = <FH>;

# change ARRAY here

seek(FH,0,0)                        or die "Seeking: $!";

print FH @ARRAY                     or die "Printing: $!";

truncate(FH,tell(FH))               or die "Truncating: $!";

close(FH)                           or die "Closing: $!";
As explained in this chapter's Introduction, the operating system treats files as unstructured streams of bytes. This makes it impossible to insert, modify, or change bits of the file in place. (Except for the special case of fixed-record-length files, discussed in Recipe 8.13.) You can use a temporary file to hold the changed output, or you can read the entire file into memory, change it, and write it back out again.
Reading everything into memory is fine for small files, but doesn't scale well. Trying it on your 800 MB web server log files will either deplete your virtual memory or thrash your machine's VM system. For small files, though, this works:
open(F, "+<", $infile)      or die "can't read $infile: $!";

$out = "";

while (<F>) {

    s/DATE/localtime/eg;

    $out .= $_;

}

seek(F, 0, 0)               or die "can't seek to start of $infile: $!";

print F $out                or die "can't print to $infile: $!";

truncate(F, tell(F))        or die "can't truncate $infile: $!";

close(F)                    or die "can't close $infile: $!";
For other examples of things you can do in-place, look at the recipes in Chapter 8.
This approach is only for the truly determined. It's harder to write, takes more memory (potentially a
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Locking a File
Inhaltsvorschau
Many processes need to update the same file simultaneously.
Have all processes honor advisory locking by using flock:
use Fcntl qw(:flock);               # for the LOCK_* constants

open(FH, "+<", $path)               or die "can't open $path: $!";

flock(FH, LOCK_EX)                  or die "can't flock $path: $!";

# update file, then...

close(FH)                           or die "can't close $path: $!";
Operating systems vary greatly in the type and reliability of locking techniques available. Perl tries hard to give you something that works, even if your operating system uses its own underlying technique. The flock function takes two arguments: a filehandle and a number representing what to do with the lock on that filehandle. The numbers are normally represented by names, such as LOCK_EX, which you can get from the Fcntl or IO::File modules.
Locks come in two varieties: shared (LOCK_SH) and exclusive (LOCK_EX). Despite what you might infer by "exclusive," processes aren't required to obey locks on files. Another way of saying this is that flock implements advisory locking. It allows processes to let the operating system suspend would-be writers of a file until any readers are finished with it.
Flocking files is like putting up a stoplight at an intersection. It works only if people pay attention to whether the light is red or green—or yellow for a shared lock. The red light doesn't stop traffic; it merely signals that traffic should stop. A desperate, ignorant, or rude person will still go flying through the intersection no matter what the light says. Likewise, flock only blocks out other flockers—not all processes trying to do I/O. Unless everyone is polite, accidents can (and will) happen.
The polite process customarily indicates its intent to read from the file by requesting a
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Flushing Output
Inhaltsvorschau
When printing to a filehandle, output doesn't appear immediately. This is a problem in CGI scripts running on some programmer-hostile web servers where, if the web server sees warnings from Perl before it sees the (buffered) output of your script, it sends the browser an uninformative 500 Server Error. These buffering problems also arise with concurrent access to files by multiple programs and when talking with devices or sockets.
Disable buffering by setting the per-filehandle variable $| to a true value, customarily 1:
$old_fh = select(OUTPUT_HANDLE);

$| = 1;

select($old_fh);
Or, if you don't mind the expense of loading an IO module, disable buffering by invoking the autoflush method:
use IO::Handle;

OUTPUT_HANDLE->autoflush(1);
This works with indirect filehandles as well:
use IO::Handle;

$fh->autoflush(1);
In most stdio implementations, buffering varies with the type of output device. Disk files are block buffered, often with a buffer size of more than 2K. Pipes and sockets are often buffered with a buffer size between ½K and 2K. Serial devices, including terminals, modems, mice, and joysticks, are normally line-buffered; stdio sends the entire line out only when it gets the newline.
Perl's print function does not directly support truly unbuffered output, i.e., a physical write for each individual character. Instead, it supports command buffering, in which one physical write is made after every separate output command. This isn't as hard on your system as no buffering at all, and it still gets the output where you want it, when you want it.
Control output buffering through the $| special variable. Enable command buffering on output handles by setting it to a true value. This does not affect input handles at all; see Recipe 15.6 and Recipe 15.8 for unbuffered input. Set this variable to a false value to use default stdio buffering. Example 7-7 illustrates the difference.
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Doing Non-Blocking I/O
Inhaltsvorschau
You want to read from or write to a filehandle without the system blocking your process until the program, file, socket, or device at the other end is ready. This is desired less often of regular files than of special files.
Open the file with sysopen, specifying the O_NONBLOCK option:
use Fcntl;

sysopen(MODEM, "/dev/cua0", O_NONBLOCK|O_RDWR)

    or die "Can't open modem: $!\n";
If you already have an open filehandle, invoke the blocking method from IO::Handle with an argument of 0:
use IO::Handle;

MODEM->blocking(0);  # assume MODEM already opened
Or use the low-level fcntl function:
use Fcntl;



$flags = "";

fcntl(HANDLE, F_GETFL, $flags)

    or die "Couldn't get flags for HANDLE : $!\n";

$flags |= O_NONBLOCK;

fcntl(HANDLE, F_SETFL, $flags)

    or die "Couldn't set flags for HANDLE: $!\n";
On a disk file, when no more data can be read because you're at the end of the file, the input operation returns immediately. But suppose the filehandle in question were the user's keyboard or a network connection. In those cases, simply because there's no data there right now doesn't mean there never will be, so the input function normally doesn't return until it gets data. Sometimes, though, you don't want to wait; you want to grab whatever's there and carry on with whatever you were doing.
Once a filehandle has been set for non-blocking I/O, the sysread or syswrite calls that would otherwise block will instead return undef and set $! to EAGAIN:
use Errno;



$rv = syswrite(HANDLE, $buffer, length $buffer);

if (!defined($rv) && $!{EAGAIN}) {

    # would block

} elsif ($rv != length $buffer) {

    # incomplete write

} else {

    # successfully wrote

}



$rv = sysread(HANDLE, $buffer, $BUFSIZ);

if (!defined($rv) && $!{EAGAIN}) {

    # would block

} else {

    # successfully read $rv bytes from HANDLE

}
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Determining the Number of Unread Bytes
Inhaltsvorschau
You want to know how many unread bytes are available for reading from a filehandle.
Use the FIONREAD ioctl call:
$size = pack("L", 0);

ioctl(FH, $FIONREAD, $size)     or die "Couldn't call ioctl: $!\n";

$size = unpack("L", $size);



# $size bytes can be read
Make sure the input filehandle is unbuffered (because you've used an I/O layer like :unix on it), or use only sysread.
The Perl ioctl function is a direct interface to the operating system's ioctl(2) system call. If your system doesn't have the FIONREAD request or the ioctl(2) call, you can't use this recipe. FIONREAD and the other ioctl(2) requests are numeric values normally found lurking in C include files.
Perl's h2ph tool tries to convert C include files to Perl code, which can be required. FIONREAD ends up defined as a function in the sys/ioctl.ph file:
require "sys/ioctl.ph";



$size = pack("L", 0);

ioctl(FH, FIONREAD( ), $size)    or die "Couldn't call ioctl: $!\n";

$size = unpack("L", $size);
If h2ph wasn't installed or doesn't work for you, you can manually grep the include files:
% grep FIONREAD /usr/include/*/*

/usr/include/asm/ioctls.h:#define FIONREAD      0x541B

            
If you install Inline::C from CPAN, you can write a C subroutine to obtain the constant for you:
use Inline C;



$FIONREAD = get_FIONREAD( );

# ...



_ _END_ _

_ _C_ _

#include <sys/ioctl.h>



int get_FIONREAD( ) {

  return FIONREAD;

}
If all else fails, write a small C program using the editor of champions:
% cat > fionread.c

#include <sys/ioctl.h>

main( ) {

    printf("%#08x\n", FIONREAD);

}

^D

% cc -o fionread fionread.c

% ./fionread

0x4004667f

            
Then hardcode it, leaving porting as an exercise to your successor.
$FIONREAD = 0x4004667f;         # XXX: opsys dependent



$size = pack("L", 0);

ioctl(FH, $FIONREAD, $size)     or die "Couldn't call ioctl: $!\n";

$size = unpack("L", $size);
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Reading from Many Filehandles Without Blocking
Inhaltsvorschau
You want to learn whether input is available to be read, rather than blocking until there's input the way <FH> does. This is useful when reading from pipes, sockets, devices, and other programs.
Use select with a timeout value of 0 seconds if you're comfortable with manipulating bit-vectors representing file descriptor sets:
$rin = "";

# repeat next line for all filehandles to poll

vec($rin, fileno(FH1), 1) = 1;

vec($rin, fileno(FH2), 1) = 1;

vec($rin, fileno(FH3), 1) = 1;



$nfound = select($rout=$rin, undef, undef, 0);

if ($nfound) {

  # input waiting on one or more of those 3 filehandles

  if (vec($rout,fileno(FH1),1)) {

      # do something with FH1

  }

  if (vec($rout,fileno(FH2),1)) {

      # do something with FH2

  }

  if (vec($rout,fileno(FH3),1)) {

      # do something with FH3

  }

}
The IO::Select module provides an abstraction layer to hide bit-vector operations:
use IO::Select;



$select = IO::Select->new( );

# repeat next line for all filehandles to poll

$select->add(*FILEHANDLE);

if (@ready = $select->can_read(0)) {

    # input waiting on the filehandles in @ready

}
The select function is really two functions in one. If you call it with one argument, you change the current default output filehandle (see Recipe 7.19). If you call it with four arguments, it tells you which filehandles have input waiting or are ready to receive output. This recipe deals only with four-argument select.
The first three arguments to select are strings containing bit-vectors. Each bit-vector represents a set of file descriptors to inspect for pending input, pending output, and pending expedited data (like out-of-band or urgent data on a socket), respectively. The final argument is the timeout—how long select should spend waiting for status to change. A timeout value of 0 indicates a poll. Timeout can also be a floating-point number of seconds, or
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Reading an Entire Line Without Blocking
Inhaltsvorschau
You need to read a line of data from a handle that select says is ready for reading, but you can't use Perl's normal <FH> operation (readline) in conjunction with select because <FH> may buffer extra data and select doesn't know about those buffers.
Use the following sysreadline function, like this:
$line = sysreadline(SOME_HANDLE);
In case only a partial line has been sent, include a number of seconds to wait:
$line = sysreadline(SOME_HANDLE, TIMEOUT);
Here's the function to do that:
use IO::Handle;

use IO::Select;

use Symbol qw(qualify_to_ref);



sub sysreadline(*;$) {

    my($handle, $timeout) = @_;

    $handle = qualify_to_ref($handle, caller( ));

    my $infinitely_patient = (@_ =  = 1 || $timeout < 0);

    my $start_time = time( );

    my $selector = IO::Select->new( );

    $selector->add($handle);

    my $line = "";

SLEEP:

    until (at_eol($line)) {

        unless ($infinitely_patient) {

            return $line if time( ) > ($start_time + $timeout);

        }

        # sleep only 1 second before checking again

        next SLEEP unless $selector->can_read(1.0);

INPUT_READY:

        while ($selector->can_read(0.0)) {

            my $was_blocking = $handle->blocking(0);

CHAR:       while (sysread($handle, my $nextbyte, 1)) {

                $line .= $nextbyte;

                last CHAR if $nextbyte eq "\n";

            }

            $handle->blocking($was_blocking);

            # if incomplete line, keep trying

            next SLEEP unless at_eol($line);

            last INPUT_READY;

        }

    }

    return $line;

}

sub at_eol($) { $_[0] =~ /\n\z/ }
As described in Recipe 7.22, to determine whether the operating system has data on a particular handle for your process to read, you can use either Perl's built-in select function or the can_read method from the standard IO::Select module.
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Program: netlock
Inhaltsvorschau
When locking files, we recommend that you use flock when possible. However, on some systems, flock's locking strategy is not reliable. For example, perhaps the person who built Perl on your system configured flock to use a version of file locking that didn't even try to work over the Net, or you're on the increasingly rare system where no flock emulation exists at all.
The following program and module provide a basic implementation of a file locking mechanism. Unlike a normal flock, with this module you lock file names, not file descriptors.
Thus, you can use it to lock directories, domain sockets, and other non-regular files. You can even lock files that don't exist yet. It uses a directory created at the same level in the directory structure as the locked file, so you must be able to write to the enclosing directory of the file you wish to lock. A sentinel file within the lock directory contains the owner of the lock. This is also useful with Recipe 7.15 because you can lock the filename even though the file that has that name changes.
The nflock function takes one or two arguments. The first is the pathname to lock; the second is the optional amount of time to wait for the lock. The function returns true if the lock is granted, returns false if the timeout expired, and raises an exception should various improbable events occur, such as being unable to write the directory.
Set the $File::LockDir::Debug variable to true to make the module emit messages if it stalls waiting for a lock. If you forget to free a lock and try to exit the program, the module will remove them for you. This won't happen if your program is sent a signal it doesn't trap.
Example 7-9 shows a driver program to demonstrate the File::LockDir module.
Example 7-9. drivelock
  #!/usr/bin/perl -w

  # drivelock - demo File::LockDir module

  use strict;

  use File::LockDir;

  $SIG{INT} = sub { die "outta here\n" };

  $File::LockDir::Debug = 1;

  my $path = shift                            or die "usage: $0 <path>\n";

  unless (nflock($path, 2)) {

      die "couldn't lock $path in 2 seconds\n";

  }

  sleep 100;

  nunflock($path);
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Program: lockarea
Inhaltsvorschau
Perl's flock function only locks complete files, not regions of the file. Although fcntl supports locking of a file's regions, this is difficult to access from Perl, largely because no one has written an XS module that portably packs up the necessary structure.
The program in Example 7-11 implements fcntl, but only for the three architectures it already knows about: SunOS, BSD, and Linux. If you're running something else, you'll have to figure out the layout of the flock structure. We did this by eyeballing the C-language sys/fcntl.h #include file—and running the c2ph program to figure out alignment and typing. This program, while included with Perl, only works on systems with a strong Berkeley heritage, like those listed above. As with Unix—or Perl itself—you don't have to use c2ph, but it sure makes life easier if you can.
The struct_flock function in the lockarea program packs and unpacks in the proper format for the current architectures by consulting the $^O variable, which contains your current operating system name. There is no struct_flock function declaration. It's just aliased to the architecture-specific version. Function aliasing is discussed in Recipe 10.14.
The lockarea program opens a temporary file, clobbering any existing contents and writing a screenful (80 by 23) of blanks. Each line is the same length.
The program then forks one or more times and lets the child processes try to update the file at the same time. The first argument, N, is the number of times to fork to produce 2 ** N processes. So lockarea 1 makes two children, lockarea 2 makes four, lockarea 3 makes eight, lockarea 4 makes sixteen, etc. The more kids, the more contention for the locks.
Each process picks a random line in the file, locks that line only, and then updates it. It writes its process ID into the line, prepended with a count of how many times the line has been updated:
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Chapter 8: File Contents
Inhaltsvorschau
The most brilliant decision in all of Unix was the choice of a single character for the newline sequence.
—Mike O'Dell, only half jokingly
Before the Unix Revolution, every kind of data source and destination was inherently different. Getting two programs merely to understand each other required heavy wizardry and the occasional sacrifice of a virgin stack of punch cards to an itinerant mainframe repairman. This computational Tower of Babel made programmers dream of quitting the field to take up a less painful hobby, like autoflagellation.
These days, such cruel and unusual programming is largely behind us. Modern operating systems work hard to provide the illusion that I/O devices, network connections, process control information, other programs, the system console, and even users' terminals are all abstract streams of bytes called files. This lets you easily write programs that don't care where their input came from or where their output goes.
Because programs read and write streams of simple text, every program can communicate with every other program. It is difficult to overstate the power and elegance of this approach. No longer dependent upon troglodyte gnomes with secret tomes of JCL (or COM) incantations, users can now create custom tools from smaller ones by using simple command-line I/O redirection, pipelines, and backticks.
Treating files as unstructured byte streams necessarily governs what you can do with them. You can read and write sequential, fixed-size blocks of data at any location in the file, increasing its size if you write past the current end. Perl uses an I/O library that emulates C's stdio(3) to implement reading and writing of variable-length records like lines, paragraphs, and words.
What can't you do to an unstructured file? Because you can't insert or delete bytes anywhere but at end-of-file, you can't easily change the length of, insert, or delete records. An exception is the last record, which you can delete by truncating the file to the end of the previous record. For other modifications, you need to use a temporary file or work with a copy of the file in memory. If you need to do this a lot, a database system may be a better solution than a raw file (see Chapter 14). Standard with Perl as of v5.8 is the Tie::File module, which offers an array interface to files of records. We use it in Recipe 8.4.
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Introduction
Inhaltsvorschau
Before the Unix Revolution, every kind of data source and destination was inherently different. Getting two programs merely to understand each other required heavy wizardry and the occasional sacrifice of a virgin stack of punch cards to an itinerant mainframe repairman. This computational Tower of Babel made programmers dream of quitting the field to take up a less painful hobby, like autoflagellation.
These days, such cruel and unusual programming is largely behind us. Modern operating systems work hard to provide the illusion that I/O devices, network connections, process control information, other programs, the system console, and even users' terminals are all abstract streams of bytes called files. This lets you easily write programs that don't care where their input came from or where their output goes.
Because programs read and write streams of simple text, every program can communicate with every other program. It is difficult to overstate the power and elegance of this approach. No longer dependent upon troglodyte gnomes with secret tomes of JCL (or COM) incantations, users can now create custom tools from smaller ones by using simple command-line I/O redirection, pipelines, and backticks.
Treating files as unstructured byte streams necessarily governs what you can do with them. You can read and write sequential, fixed-size blocks of data at any location in the file, increasing its size if you write past the current end. Perl uses an I/O library that emulates C's stdio(3) to implement reading and writing of variable-length records like lines, paragraphs, and words.
What can't you do to an unstructured file? Because you can't insert or delete bytes anywhere but at end-of-file, you can't easily change the length of, insert, or delete records. An exception is the last record, which you can delete by truncating the file to the end of the previous record. For other modifications, you need to use a temporary file or work with a copy of the file in memory. If you need to do this a lot, a database system may be a better solution than a raw file (see Chapter 14). Standard with Perl as of v5.8 is the Tie::File module, which offers an array interface to files of records. We use it in Recipe 8.4.
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Reading Lines with Continuation Characters
Inhaltsvorschau
You have a file with long lines split over two or more lines, with backslashes to indicate that a continuation line follows. You want to rejoin those split lines. Makefiles, shell scripts, and many other scripting or configuration languages let you break a long line into several shorter ones in this fashion.
Build up the complete lines one at a time until reaching one without a backslash:
while (defined($line = <FH>) ) {

    chomp $line;

    if ($line =~ s/\\$//) {

        $line .= <FH>;

        redo unless eof(FH);

    }

    # process full record in $line here

}
Here's an example input file:
DISTFILES = $(DIST_COMMON) $(SOURCES) $(HEADERS) \

        $(TEXINFOS) $(INFOS) $(MANS) $(DATA)

DEP_DISTFILES = $(DIST_COMMON) $(SOURCES) $(HEADERS) \

        $(TEXINFOS) $(INFO_DEPS) $(MANS) $(DATA) \

        $(EXTRA_DIST)
You'd like to process that file a record at a time with the escaped newlines ignored. The first record would then be the first two lines, the second record the next three lines, etc.
Here's how the algorithm works. The while loop reads lines one at a time. The substitution operator s/// tries to remove a trailing backslash. If the substitution fails, we've found a line without a backslash at the end. Otherwise, read another record, concatenate it onto the accumulating $line variable, and use redo to jump back to just inside the opening brace of the while loop. This lands us back on the chomp.
A frequent problem with files intended to be in this format arises when unnoticed spaces or tabs follow the backslash before the newline. The substitution that found continuation lines would be more forgiving if written this way:
if ($line =~ s/\\\s*$//) { 

    # as before

}
Unfortunately, even if your program is forgiving, surely others will not be. Remember to be liberal in what you accept, but conservative in what you produce.
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Counting Lines (or Paragraphs or Records) in a File
Inhaltsvorschau
You need to compute the number of lines in a file.
Many systems have a wc program to count lines in a file:
$count = `wc -l < $file`;

die "wc failed: $?" if $?;

chomp($count);
You could also open the file and read line-by-line until the end, counting lines as you go:
open(FILE, "<", $file) or die "can't open $file: $!";

$count++ while <FILE>;

# $count now holds the number of lines read
Here's the fastest solution, assuming your line terminator really is "\n":
$count += tr/\n/\n/ while sysread(FILE, $_, 2 ** 20);
Although you can use -s $file to determine the file size in bytes, you generally cannot use it to derive a line count. See the Introduction in Chapter 9 for more on -s.
If you can't or don't want to call another program to do your dirty work, you can emulate wc by opening up and reading the file yourself:
open(FILE, "<", $file) or die "can't open $file: $!";

$count++ while <FILE>;

# $count now holds the number of lines read
Another way of writing this is:
open(FILE, "<", $file) or die "can't open $file: $!";

for ($count=0; <FILE>; $count++) { }
If you're not reading from any other files, you don't need the $count variable in this case. The special variable $. holds the number of lines read since a filehandle was last explicitly closed:
1 while <FILE>;

$count = $.;
This reads in all records in the file, then discards them.
To count paragraphs, set the global input record separator variable $/ to the empty string ("") before reading to make the input operator (<FH>) read a paragraph at a time.
$/ = "";            # enable paragraph mode for all reads

open(FILE, "<", $file) or die "can't open $file: $!";

1 while <FILE>;

$para_count = $.;
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Processing Every Word in a File
Inhaltsvorschau
You need to do something to every word in a file, similar to the foreach function of csh.
Either split each line on whitespace:
while (<>) {

    for $chunk (split) {

        # do something with $chunk

    }

}
or use the m//g operator to pull out one chunk at a time:
while (<>) {

    while ( /(\w[\w'-]*)/g ) {

        # do something with $1

    }

}
Decide what you mean by "word." Sometimes you want anything but whitespace, sometimes you want only program identifiers, and sometimes you want English words. Your definition governs which regular expression to use.
The preceding two approaches work differently. Patterns are used in the first approach to decide what is not a word. In the second, they're used to decide what is a word.
With these techniques, it's easy to make a word frequency counter. Use a hash to store how many times each word has been seen:
# Make a word frequency count

%seen = ( );

while (<>) {

    while ( /(\w[\w'-]*)/g ) {

        $seen{lc $1}++;

    }

}



# output hash in a descending numeric sort of its values

foreach $word ( sort { $seen{$b} <=> $seen{$a} } keys %seen) {

    printf "%5d %s\n", $seen{$word}, $word;

}
To make the example program count line frequency instead of word frequency, omit the second while loop and use $seen{lc $_}++ instead:
# Line frequency count

%seen = ( );

while (<>) {

    $seen{lc $_}++;

}

foreach $line ( sort { $seen{$b} <=> $seen{$a} } keys %seen ) {

    printf "%5d %s", $seen{$line}, $line;

}
Odd things that may need to be considered as words include "M.I.T.", "Micro$oft", "o'clock", "49ers", "street-wise", "and/or", "&", "c/o", "St.", "Tschüß", and "Niño". Bear this in mind when you choose a pattern to match. The last two require you to place a use locale in your program and then use
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Reading a File Backward by Line or Paragraph
Inhaltsvorschau
You want to process each line or paragraph of a text file in reverse.
Read all lines into an array, then process that array from the end to the start:
@lines = <FILE>;

while ($line = pop @lines) {

    # do something with $line

}
Or store an array of lines in reverse order:
@lines = reverse <FILE>;

foreach $line (@lines) {

    # do something with $line

}
Or use the Tie::File module (standard as of v5.8):
use Tie::File;

tie(@lines, "Tie::File", $FILENAME, mode => 0)

    or die "Can't tie $FILENAME: $!";

$max_lines = $#lines;

for ($i = $max_lines; $i; $i--) {

    # do something with $lines[$i], eg line number them:

    printf "%5d  %s\n", $i+1, $lines[$i],

}
The limitations of file access mentioned in this chapter's Introduction prevent reading a line at a time starting from the end. You must read the lines into memory, then process them in reverse order. This requires at least as much available memory as the size of the file, unless you use tricks like Tie::File does.
The first technique moves through the array of lines in reverse order. This destructively processes the array, popping an element off the end of the array each time through the loop. We could do it non-destructively with:
for ($i = $#lines; $i != -1; $i--) {

    $line = $lines[$i];

}
The second approach generates an array of lines already in reverse order. This array can then be processed non-destructively. We get the reversed lines because the assignment to @lines confers list context on the return from reverse, and reverse confers list context on its argument of <FILE>, which returns a list of all lines in the file.
These approaches are easily extended to paragraphs just by changing $/:
# this enclosing block keeps local $/ temporary

{           

    local $/ = "";

    @paragraphs = reverse <FILE>;

}



foreach $paragraph (@paragraphs) {

    # do something

}
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Trailing a Growing File
Inhaltsvorschau
You want to read from a continually growing file, but the read fails when you reach the current end-of-file.
Read until end-of-file. Sleep, clear the EOF flag, and read some more. Repeat until interrupted. To clear the EOF flag, either use seek:
for (;;) {

    while (<FH>) { .... }

    sleep $SOMETIME;

    seek(FH, 0, 1);

}
or use the IO::Handle module's clearerr method:
use IO::Handle;



for (;;) {

    while (<FH>) { .... }

    sleep $SOMETIME;

    FH->clearerr( );

}
When you read until end-of-file, an internal flag is set that prevents further reading. The most direct way to clear this flag is the clearerr method, if supported: it's in the IO::Handle modules.
$naptime = 1;



use IO::Handle;

open (LOGFILE, "/tmp/logfile") or die "can't open /tmp/logfile: $!";

for (;;) {

    while (<LOGFILE>) { print }     # or appropriate processing

    sleep $naptime;

    LOGFILE->clearerr( );            # clear stdio error flag

}
Because Perl v5.8 ships with its own stdio implementation, that simple approach should almost always work. On the rare system where it doesn't work, you may need to use seek. The seek code given in the Solution tries to move zero bytes from the current position, which nearly always works. It doesn't change the current position, but it should clear the end-of-file condition on the handle so that the next <LOGFILE> operation picks up new data.
If that still doesn't work, perhaps because it relies on features of your I/O implementation, you may need to use the following seek code, which remembers the old file position explicitly and returns there directly.
for (;;) {

    for ($curpos = tell(LOGFILE); <LOGFILE>; $curpos = tell(LOGFILE)) {

        # process $_ here

    }

    sleep $naptime;

    seek(LOGFILE, $curpos, 0);  # seek to where we had been

}
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Picking a Random Line from a File
Inhaltsvorschau
You want to return a random line from a file.
Use rand and $. (the current line number) to decide which line to print:
srand;

rand($.) < 1 && ($line = $_) while <>;

# $line is the random line
This is a beautiful example of a solution that may not be obvious. We read every line in the file but don't have to store them all in memory. This is great for large files. Each line has a 1 in N (where N is the number of lines read so far) chance of being selected.
Here's a replacement for fortune using this algorithm:
$/ = "%%\n";

@ARGV = ("/usr/share/games/fortunes") unless @ARGV;

srand;

rand($.) < 1 && ($adage = $_) while <>;

print $adage;
If you know line offsets (for instance, you've created an index) and the number of lines, you can randomly select a line and jump to its offset in the file, but you usually don't have such an index.
Here's a more rigorous explanation of how the algorithm works. The function call rand ($.) picks a random number between 0 and the current line number. Therefore, you have a one in N chance, that is, , of keeping the Nth line. Therefore you've a 100% chance of keeping the first line, a 50% chance of keeping the second, a 33% chance of keeping the third, and so on. The question is whether this is fair for all N, where N is any positive integer.
First, some concrete examples, then abstract ones.
Obviously, a file with one line (N=1) is fair: you always keep the first line because = 100%, making it fair for files of 1 line. For a file with two lines, N=2. You always keep the first line; then when reaching the second line, you have a 50% chance of keeping it. Thus, both lines have an equal chance of being selected, which shows that N=2 is fair. For a file with three lines, N=3. You have a one-third chance, 33%, of keeping that third line. That leaves a two-thirds chance of retaining one of the first two out of the three lines. But we've already shown that for those first two lines there's a 50-50 chance of selecting either one. 50 percent of two-thirds is one-third. Thus, you have a one-third chance of selecting each of the three lines of the file.
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Randomizing All Lines
Inhaltsvorschau
You want to copy a file and randomly reorder its lines.
Read all lines into an array, shuffle the array using List::Util's shuffle function, and write the shuffled lines back out:
use List::Util qw(shuffle);



while (<INPUT>) {

    push(@lines, $_);

}

@lines = shuffle(@lines);

foreach (@reordered) {

    print OUTPUT $_;

}
The easiest approach is to read all lines into memory and shuffle them there. Because you don't know where lines start in the file, you can't just shuffle a list of line numbers and then extract lines in the order they'll appear in the shuffled file. Even if you did know the byte offsets of the start of each line, it would probably still be slower because you'd be seeking around in the file instead of sequentially reading it from start to finish.
If you have a version of Perl older than v5.8, you can download the List::Util module from CPAN.
The documentation for the standard List::Util module; Recipe 2.6; Recipe 2.7; Recipe 4.18
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Reading a Particular Line in a File
Inhaltsvorschau
You want to extract a single line from a file.
The simplest solution is to read the lines until you get to the one you want:
# looking for line number $DESIRED_LINE_NUMBER

$. = 0;

do { $LINE = <HANDLE> } until $. =  = $DESIRED_LINE_NUMBER || eof;
If you are going to be doing this a lot and the file fits into memory, read the file into an array:
@lines = <HANDLE>;

$LINE = $lines[$DESIRED_LINE_NUMBER];
The standard (as of v5.8) Tie::File ties an array to a file, one line per array element:
use Tie::File;

use Fcntl;



tie(@lines, Tie::File, $FILE, mode => O_RDWR)

  or die "Cannot tie file $FILE: $!\n";

$line = $lines[$sought - 1];
If you have the DB_File module, its DB_RECNO access method ties an array to a file, one line per array element:
use DB_File;

use Fcntl;



$tie = tie(@lines, DB_File, $FILE, O_RDWR, 0666, $DB_RECNO) or die 

    "Cannot open file $FILE: $!\n";

# extract it

$line = $lines[$sought - 1];
Each strategy has different features, useful in different circumstances. The linear access approach is easy to write and best for short files. The Tie::File module gives good performance, regardless of the size of the file or which line you're reading (and is pure Perl, so doesn't require any external libraries). The DB_File mechanism has some initial overhead, but later accesses are faster than with linear access, so use it for long files that are accessed more than once and are accessed out of order.
It is important to know whether you're counting lines from 0 or 1. The $. variable is 1 after the first line is read, so count from 1 when using linear access. The index mechanism uses many offsets, so count from 0. Tie::File and DB_File treat the file's records as an array indexed from 0, so count lines from 0.
Here are three different implementations of the same program,
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Processing Variable-Length Text Fields
Inhaltsvorschau
You want to extract variable-length fields from your input.
Use split with a pattern matching the field separators.
# given $RECORD with field separated by a pattern,

# extract a list of fields

@FIELDS = split(/PATTERN/, $RECORD);
The split function takes up to three arguments: PATTERN, EXPRESSION, and LIMIT. The LIMIT parameter is the maximum number of fields to split into. (If the input contains more fields, they are returned unsplit in the final list element.) If LIMIT is omitted, all fields (except any final empty ones) are returned. EXPRESSION gives the string value to split. If EXPRESSION is omitted, $_ is split. PATTERN is a pattern matching the field separator. If PATTERN is omitted, contiguous stretches of whitespace are used as the field separator and leading empty fields are silently discarded.
If your input field separator isn't a fixed string, you might want split to return the field separators as well as the data by using parentheses in PATTERN to save the field separators. For instance:
split(/([+-])/, "3+5-2");
returns the values:
(3, "+", 5, "-", 2)
To split colon-separated records in the style of the /etc/passwd file, use:
@fields = split(/:/, $RECORD);
The classic application of split is whitespace-separated records:
@fields = split(/\s+/, $RECORD);
If $RECORD started with whitespace, this last use of split would have put an empty string into the first element of @fields because split would consider the record to have an initial empty field. If you didn't want this, you could use this special form of split:
@fields = split(" ", $RECORD);
This behaves like split with a pattern of /\s+/, but ignores leading whitespace.
When the record separator can appear in the record, you have a problem. The usual solution is to escape occurrences of the record separator in records by prefixing them with a backslash. See Recipe 1.18.
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Removing the Last Line of a File
Inhaltsvorschau
You'd like to remove the last line from a file.
Use the standard (as of v5.8) Tie::File module and delete the last element from the tied array:
use Tie::File;



tie @lines, Tie::File, $file        or die "can't update $file: $!";

delete $lines[-1];
The Tie::File solution is the most efficient solution, at least for large files, because it doesn't have to read through the entire file to find the last line and doesn't read the entire file into memory. It is, however, considerably slower for small files than code you could implement yourself by hand. That doesn't mean you shouldn't use Tie::File; it just means you've optimized for programmer time instead of for computer time.
If you don't have Tie::File and can't install it from CPAN, read the file a line at a time and keep track of the byte address of the last line you've seen. When you've exhausted the file, truncate to the last address you saved:
open (FH, "+<", $file)              or die "can't update $file: $!";

while (<FH>) {

    $addr = tell(FH) unless eof(FH);

}

truncate(FH, $addr)                 or die "can't truncate $file: $!";
Remembering the offset is more efficient than reading the whole file into memory because it holds only one given line at a time. Although you still have to grope your way through the whole file, you can use this technique on files larger than available memory.
The documentation for the standard Tie::File module; the truncate and tell functions in perlfunc(1) and in Chapter 29 of Programming Perl; your system's open(2) and fopen(3) manpages; Recipe 8.18
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Processing Binary Files
Inhaltsvorschau
You want to read 8-bit binary data as 8-bit binary data, i.e., neither as characters in a particular encoding nor as a text file with any newline or end-of-file conversions that your I/O library might want to do.
Use the binmode function on the filehandle:
binmode(HANDLE);
The binmode function lets you specify new I/O layers for a filehandle. The default layer to specify is :raw, which removes any layers that would interfere with binary data. The Solution is thus equivalent to:
binmode(HANDLE, ":raw");
except that explicitly specifying :raw only works on Perl 5.8 and later. The one-argument form of binmode works on all versions of Perl.
Because Perl makes :crlf the default if you are on an operating system that needs it, you should rarely (if ever) need to specify :crlf in your program. Furthermore, it's generally not wise to add or remove the :crlf layer once you've begun reading from the file, as there may be data already read into buffers that you can't unread. You can, however, safely change the :encoding(...) layer midstream (when parsing XML, for example).
You should get into the habit of calling binmode when you open a binary file. This will make your program portable to systems that might (un)helpfully translate bytes in your binary file into something unusable.
You may specify the I/O layers when you open a filehandle, rather than using binmode after the fact:
open(FH, "< :raw", $filename);       # binary mode
Specify the default set of layers for all subsequently opened input and output filehandles with the open pragma:
use open IN => ":raw";      # binary files
The PerlIO(3) manpage; the open and binmode functions in perlfunc(1) and in Chapter 29 of
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Using Random-Access I/O
Inhaltsvorschau
You have to read a binary record from the middle of a large file but don't want to read a record at a time to get there.
Once you know the record's size, multiply it by the record number to get the byte address, and then seek to that byte address and read the record:
$ADDRESS = $RECSIZE * $RECNO;

seek(FH, $ADDRESS, 0) or die "seek:$!";

read(FH, $BUFFER, $RECSIZE);
The Solution assumes the first record has a RECNO of 0. If you're counting from one, use:
$ADDRESS = $RECSIZE * ($RECNO-1);
This is best applied to binary data. Applying it to text files assumes you have a constant character width and constant line length. This rules out most Unicode encodings, any kind of Windows text file, and any text file where lines can have different lengths.
The seek function in perlfunc(1) and in Chapter 29 of Programming Perl; Recipe 8.13
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Updating a Random-Access File
Inhaltsvorschau
You want to read an old record from a binary file, change its values, and write back the record.
After reading the old record, pack up the updated values, seek to the previous address, and write it back.
use Fcntl;                          # for SEEK_SET and SEEK_CUR



$ADDRESS = $RECSIZE * $RECNO;

seek(FH, $ADDRESS, SEEK_SET)        or die "Seeking: $!";

read(FH, $BUFFER, $RECSIZE) =  = $RECSIZE

                                    or die "Reading: $!";

@FIELDS = unpack($FORMAT, $BUFFER);

# update fields, then

$BUFFER = pack($FORMAT, @FIELDS);

seek(FH, -$RECSIZE, SEEK_CUR)       or die "Seeking: $!";

print FH $BUFFER;

close FH                            or die "Closing: $!";
You don't have to use anything fancier than print in Perl to output a record. Remember that the opposite of read is not write but print, although oddly enough, the opposite of sysread is syswrite.
The example program shown in Example 8-4, weekearly, takes one argument: the user whose record you want to backdate by a week. (Of course, in practice, you wouldn't really want to (nor be able to!) mess with the system accounting files.) This program requires write access to the file to be updated, since it opens the file in update mode. After fetching and altering the record, it packs it up again, skips backward in the file one record, and writes it out.
Example 8-4. weekearly
  #!/usr/bin/perl -w

  # weekearly -- set someone's login date back a week

  use User::pwent;

  use IO::Seekable;

  

  $typedef = "L A12 A16";         # linux fmt; sunos is "L A8 A16"

  $sizeof  = length(pack($typedef, ( )));

  $user    = shift(@ARGV) || $ENV{USER} || $ENV{LOGNAME};

  

  $address = getpwnam($user)->uid * $sizeof;

  

  open (LASTLOG, "+<:raw", "/var/log/lastlog")

      or die "can't update /var/log/lastlog: $!";

  seek(LASTLOG, $address, SEEK_SET)

      or die "seek failed: $!";

  read(LASTLOG, $buffer, $sizeof) =  = $sizeof

      or die "read failed: $!";

  

  ($time, $line, $host) = unpack($typedef, $buffer);

  $time  -= 24 * 7 * 60 * 60;         # back-date a week

  $buffer = pack($typedef, $time, $line, $time);

  

  seek(LASTLOG, -$sizeof, SEEK_CUR)   # backup one record

      or die "seek failed: $!";

  print LASTLOG $record;

      

  close(LASTLOG)

      or die "close failed: $!";
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Reading a String from a Binary File
Inhaltsvorschau
You want to read a NUL-terminated string from a file, starting at a particular address.
Ensure you're working with a binary file, set $/ to an ASCII NUL, and read the string with <>:
binmode(FH);                        # binary mode

$old_rs = $/;                       # save old $/

$/ = "\0";                          # ASCII 0: NUL

seek(FH, $addr, SEEK_SET)           or die "Seek error: $!\n";

$string = <FH>;                     # read string

chomp $string;                      # remove NUL

$/ = $old_rs;                       # restore old $/
You can use local to save and restore $/:
{

    local $/ = "\0";

    # ...

}                           # $/ is automatically restored
The example program shown in Example 8-5, bgets, accepts a filename and one or more byte addresses as arguments. Decimal, octal, or hexadecimal addresses may be specified. For each address, the program reads and prints the null- or EOF-terminated string at that position.
Example 8-5. bgets
  #!/usr/bin/perl -w

  # bgets - get a string from an address in a binary file

  use IO::Seekable;

  use open IO => ":raw";              # binary mode on all opened handles

  ($file, @addrs) = @ARGV             or die "usage: $0 file addr ...";

  open(FH, $file)                     or die "cannot open $file: $!";

  $/ = "\000";

  foreach $addr (@addrs) {

      $addr = oct $addr if $addr =~ /^0/;

      seek(FH, $addr, SEEK_SET)

          or die "can't seek to $addr in $file: $!";

      printf qq{%#x %#o %d "%s"\n}, $addr, $addr, $addr, scalar <>;

  }
Example 8-6 is a simple implementation of the Unix strings program.
Example 8-6. strings
  #!/usr/bin/perl -w

  # strings - pull strings out of a binary file

  $/ = "\0";

  use open IO => ":raw";

  while (<>) {

      while (/([\040-\176\s]{4,})/g) {

          print $1, "\n";

      }

  }
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Reading Fixed-Length Records
Inhaltsvorschau
You want to read a file whose records have a fixed length.
Use read and unpack:
# $RECORDSIZE is the length of a record, in bytes.

# $TEMPLATE is the unpack template for the record

# FILE is the file to read from

# @FIELDS is an array, one element per field



until ( eof(FILE) ) {

    read(FILE, $record, $RECORDSIZE) =  = $RECORDSIZE

        or die "short read\n";

    @FIELDS = unpack($TEMPLATE, $record);

}
Because the file in question is not a text file, you can't use <FH> or IO::Handle's getline method to read records. Instead, you must simply read a particular number of bytes into a variable. This variable contains one record's data, which you decode using unpack with the appropriate format.
For binary data, the catch is determining that format. When reading data written by a C program, this can mean peeking at C include files or manpages describing the structure layout, and this requires knowledge of C. It also requires that you become unnaturally chummy with your C compiler, because otherwise it's hard to predict field padding and alignment (such as the x2 in the format used in Recipe 8.24). If you're lucky enough to be on a Berkeley Unix system or a system supporting gcc, then you may be able to use the c2ph tool distributed with Perl to cajole your C compiler into helping you with this.
The tailwtmp program at the end of this chapter uses the format described in utmp(5) under Linux, and works on its /var/log/wtmp and /var/run/utmp files. Once you commit to working in binary format, machine dependencies creep in fast. It probably won't work unaltered on your system, but the procedure is still illustrative. Here is the relevant layout from the C include file on Linux:
#define UT_LINESIZE           12

#define UT_NAMESIZE           8

#define UT_HOSTSIZE           16



struct utmp {                       /* here are the pack template codes */

    short ut_type;                  /* s for short, must be padded      */

    pid_t ut_pid;                   /* i for integer                    */

    char ut_line[UT_LINESIZE];      /* A12 for 12-char string           */

    char ut_id[2];                  /* A2, but need x2 for alignment    */

    time_t ut_time;                 /* l for long                       */

    char ut_user[UT_NAMESIZE];      /* A8 for 8-char string             */

    char ut_host[UT_HOSTSIZE];      /* A16 for 16-char string           */

    long ut_addr;                   /* l for long                       */

};
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Reading Configuration Files
Inhaltsvorschau
You want to allow users of your program to change its behavior through configuration files.
Either process a file in trivial VAR=VALUE format, setting a hash key-value pair for each setting:
while (<CONFIG>) {

    chomp;                  # no newline

    s/#.*//;                # no comments

    s/^\s+//;               # no leading white

    s/\s+$//;               # no trailing white

    next unless length;     # anything left?

    my ($var, $value) = split(/\s*=\s*/, $_, 2);

    $User_Preferences{$var} = $value;

}
or better yet, treat the config file as full Perl code:
do "$ENV{HOME}/.progrc";
The first solution lets you read config files in a trivial format like this (comments and empty lines are allowed):
# set class C net

NETMASK = 255.255.255.0

MTU     = 296



DEVICE  = cua1

RATE    = 115200

MODE    = adaptive
After you're done, you can pull in a setting by using something like $User_Preferences{"RATE"} to find the value 115200. If you wanted the config file to set the global variable by that name, instead of assigning to the hash, use this:
no strict "refs";

$$var = $value;
and the $RATE variable would contain 115200.
The second solution uses do to pull in raw Perl code directly. When used with an expression instead of a block, do interprets the expression as a filename. This is nearly identical to using require, but without risk of taking a fatal exception. In the second format, the config file would look like:
# set class C net

$NETMASK = "255.255.255.0";

$MTU     = 0x128;

# Brent, please turn on the modem

$DEVICE  = "cua1";

$RATE    = 115_200;

$MODE    = "adaptive";
If you don't see the point of having extra punctuation and live code, consider this: you can have all of Perl at your disposal. You can now add arbitrary logic and tests to your simple assignments:
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Testing a File for Trustworthiness
Inhaltsvorschau
You want to read from a file, perhaps because it has configuration information. You want to use the file only if it can't be written to (or perhaps not even be read from) by anyone else than its owner.
Use the stat function to retrieve ownership and file permissions information. You can use the built-in version, which returns a list:
( $dev, $ino, $mode, $nlink, 

  $uid, $gid, $rdev, $size, 

  $atime, $mtime, $ctime, 

  $blksize, $blocks )       = stat($filename)

        or die "no $filename: $!";



$mode &= 07777;             # discard file type info
Or you can use the by-name interface:
use File::stat;



$info = stat($filename)     or die "no $filename: $!";

if ($info->uid =  = 0) {

    print "Superuser owns $filename\n";

} 

if ($info->atime > $info->mtime) {

    print "$filename has been read since it was written.\n";

}
Usually you trust users to set file permissions as they wish. If they want others to read their files, or even to write to them, that's their business. Applications such as editors, mailers, and shells are often more discerning, though, refusing to evaluate code in configuration files if anyone but the owner can write to them. This helps avoid Trojan horse attacks. Security-minded programs such as ftp and ssh may even reject config files that can be read by anyone but their owner.
If the file is writable by someone other than the owner or is owned by someone other than the current user or the superuser, it shouldn't be trusted. To figure out file ownership and permissions, the stat function is used. The following function returns true if the file is deemed safe and false otherwise. If the stat fails, undef is returned.
use File::stat;



sub is_safe {

    my $path = shift;

    my $info = stat($path);

    return unless $info;



    # owner neither superuser nor me 

    # the real uid is in stored in the $< variable

    if (($info->uid != 0) && ($info->uid != $<)) {

        return 0;

    }



    # check whether group or other can write file.

    # use 066 to detect either reading or writing

    if ($info->mode & 022) {   # someone else can write this

        return 0 unless -d _;  # non-directories aren't safe

            # but directories with the sticky bit (01000) are

        return 0 unless $info->mode & 01000;        

    }

    return 1;

}
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Treating a File as an Array
Inhaltsvorschau
Your file contains a list of lines or records, and you'd like to be able to use Perl's powerful array operations to access and manipulate the file.
Use the Tie::File module, standard with v5.8 of Perl:
use Tie::File;

use Fcntl;



tie @data, Tie::File, $FILENAME or die "Can't tie to $filename : $!\n";

# use array operations on @data to work with the file
The Tie::File module makes a file appear to be an array, one record per element. You can then fetch and assign to elements of the array, use array functions like push and splice, use negative indices, or reverse it, and in every instance you're really working with the data on disk.
If you don't specify how Tie::File should open the file, it is opened for read and write access and created if it doesn't exist. To specify a particular access mode (see Recipe 7.1), pass the Fcntl mode with the mode parameter when you tie. For example:
use Fcntl;

tie(@data, Tie::File, $filename, mode => O_RDONLY)

  or die "Can't open $filename for reading: $!\n";
When you alter the array, the file is rewritten on disk. For example, if you change the length of an element, all records later in the file must be copied to make the change. Take this code:
foreach (@data) {

  s/Perl Cookbook/Perl Cookbook (2nd edition)/g;

}
That's close because you change the length of record 0, forcing a copy of records 1..N. Then you change the length of record 1, forcing a copy of records 2..N. It's better to defer the update until all changes have been made and then have Tie::File update the file in one single write. To do this, call a method on the object behind the tied array:
(tied @data)->defer;                     # defer updates

foreach (@data) {

  s/Perl Cookbook/Perl Cookbook (2nd edition)/g;

}

(tied @data)->flush;
Exactly how much rewriting to defer is governed by how much memory you let Tie::File use, because the only way to keep track of changes without updating the file is to store those changes in memory. The Tie::File manpage shows how to change options for memory use.
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Setting the Default I/O Layers
Inhaltsvorschau
You want to ensure all files opened by your program use a particular set of I/O layers. For example, you know that every file will contain UTF-8 data.
Use the open pragma:
use open IO => ":raw:utf8";
You can easily specify I/O layers when you open a filehandle directly, but that doesn't help you when the filehandle is opened by someone else's code (possibly even the Perl core). The open pragma lets you specify a default set of layers for every open that doesn't specify its own layers.
The open module also offers separate IN and OUT control for input and output handles. For example, to read bytes and emit UTF-8:
use open "IN" => ":bytes", "OUT" => ":utf8";
The :std option tells open to apply the input and output layers to STDIN and STDOUT/STDERR. For example, the following code makes input handles read Greek (ISO 8859-7) and output handles write in the UTF-8 Unicode encoding. Then it applies the same layers to STDIN, STDOUT, and STDERR:
use open "IN" => ":encoding(Greek)",     # reading Greek

         "OUT" => ":utf8",               # writing 8-bit data in Unicode UTF-8,

         ":std";                         # STDIN is Greek,
The documentation for the standard open pragma; Recipe 8.12 and Recipe 8.19
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Reading or Writing Unicode from a Filehandle
Inhaltsvorschau
You have a file containing text in a particular encoding and when you read data from that into a Perl string, Perl treats it as a series of 8-bit bytes. You'd like to work with characters instead of bytes because your encoding characters can take more than one byte. Also, if Perl doesn't know about your encoding, it may fail to identify certain characters as letters. Similarly, you may want to output text in a particular encoding.
Use I/O layers to tell Perl that data from that filehandle is in a particular encoding.
open(my $ifh, "<:encoding(ENCODING_NAME)", $filename);

open(my $ofh, ">:encoding(ENCODING_NAME)", $filename);
Perl's text manipulation functions handle UTF-8 strings just as well as they do 8-bit data—they just need to know what type of data they're working with. Each string in Perl is internally marked as either UTF-8 or 8-bit data. The encoding(...) layer converts data between variable external encodings and the internal UTF-8 within Perl. This is done by way of the Encode module.
In the section on Unicode Support in Perl back in the Introduction to Chapter 1, we explained how under Unicode, every different character had a different code point (i.e., a different number) associated with it. Assigning all characters unique code points solves many problems. No longer does the same number, like 0xC4, represent one character under one character repertoire (e.g., a LATIN CAPITAL LETTER A WITH DIAERESIS under ISO-8859-1) and a different character in another repertoire (e.g., a GREEK CAPITAL LETTER DELTA under ISO-8859-7).
This neatly solves many problems, but still leaves one important issue: the precise format used in memory or disk for each code point. If most code points fit in 8 bits, it would seem wasteful to use, say, a full 32 bits for each character. But if every character is the same size as every other character, the code is easier to write and may be faster to execute.
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Converting Microsoft Text Files into Unicode
Inhaltsvorschau
You have a text file written on a Microsoft computer that looks like garbage when displayed. How do you fix this?
Set the encoding layer appropriately when reading to convert this into Unicode:
binmode(IFH, ":encoding(cp1252)")

    || die "can't binmode to cp1252 encoding: $!";
Suppose someone sends you a file in cp1252 format, Microsoft's default in-house 8-bit character set. Files in this format can be annoying to read—while they might claim to be Latin1, they are not, and if you look at them with Latin1 fonts loaded, you'll get garbage on your screen. A simple solution is as follows:
open(MSMESS, "< :crlf :encoding(cp1252)", $inputfile)

    || die "can't open $inputfile: $!";
Now data read from that handle will be automatically converted into Unicode when you read it in. It will also be processed in CRLF mode, which is needed on systems that don't use that sequence to indicate end of line.
You probably won't be able to write out this text as Latin1. That's because cp1252 includes characters that don't exist in Latin1. You'll have to leave it in Unicode, and displaying Unicode properly may not be as easy as you wish, because finding tools to work with Unicode is something of a quest in its own right. Most web browsers support ISO 10646 fonts; that is, Unicode fonts (see http://www.cl.cam.ac.uk/~mgk25/ucs-fonts.html). Whether your text editor does is a different matter, although both emacs and vi (actually, vim, not nvi) have mechanisms for handling Unicode. The authors used the following xterm(1) command to look at text:
xterm -n unicode -u8 -fn -misc-fixed-medium-r-normal--20-200-75-75-c-100-iso10646-1
But many open questions still exist, such as cutting and pasting of Unicode data between windows.
The
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Comparing the Contents of Two Files
Inhaltsvorschau
You have two files and want to see whether they're the same or different.
Use the standard File::Compare module with filenames, typeglobs, or any indirect filehandles:
use File::Compare;



if (compare($FILENAME_1, $FILENAME_2) =  = 0) {

  # they're equal

}



if (compare(*FH1, *FH2) =  = 0) {

  # they're equal

}



if (compare($fh1, $fh2) =  = 0) {

  # they're equal

}
The File::Compare module (standard as of v5.8 and available on CPAN if you have an earlier version of Perl) compares two files for equality. The compare function, exported by default, returns 0 when the files are equal, 1 when they differ, and -1 when any error occurs during reading.
To compare more than two filehandles, simply loop, comparing two at a time:
# ensure all filehandles in @fh hold the same data

foreach $fh (@fh[1..$#fh]) {

  if (compare($fh[0], $fh)) {

    # $fh differs

  }

}
If you want details of exactly how two files differ, use the Text::Diff module from CPAN:
use Text::Diff;



$diff = diff(*FH1, *FH2);

$diff = diff($FILENAME_1, $FILENAME_2, { STYLE => "Context" });
In addition to filehandles, diff can also take filenames, strings, and even arrays of records. Pass a hash of options as the third argument. The STYLE option controls the type of output returned; it can be "Unified" (the default), "Context", or "OldStyle". You can even write your own class for custom diff formats.
The value returned by diff is a string similar to the output of the diff(1) program. This string is in valid diff format, suitable for feeding into patch(1). Although Text::Diff will not always produce the same output as GNU diff, byte for byte, its diffs are nevertheless correct.
The documentation for the standard File::Compare module; the documentation for the CPAN module Text::Diff; the
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Pretending a String Is a File
Inhaltsvorschau
You have data in string, but would like to treat it as a file. For example, you have a subroutine that expects a filehandle as an argument, but you would like that subroutine to work directly on the data in your string instead. Additionally, you don't want to write the data to a temporary file.
Use the scalar I/O in Perl v5.8:
open($fh, "+<", \$string);   # read and write contents of $string
Perl's I/O layers include support for input and output from a scalar. When you read a record with <$fh>, you are reading the next line from $string. When you write a record with print, you change $string. You can pass $fh to a function that expects a filehandle, and that subroutine need never know that it's really working with data in a string.
Perl respects the various access modes in open for strings, so you can specify that the strings be opened as read-only, with truncation, in append mode, and so on:
open($fh, "<",  \$string);   # read only

open($fh, ">",  \$string);   # write only, discard original contents

open($fh, "+>", \$string);   # read and write, discard original contents

open($fh, "+<", \$string);   # read and write, preserve original contents
These handles behave in all respects like regular filehandles, so all I/O functions work, such as seek, truncate, sysread, and friends.
The open function in perlfunc(1) and in Chapter 29 of Programming Perl; Recipe 8.12 and Recipe 8.19
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Program: tailwtmp
Inhaltsvorschau
Every time a user logs into or out of a Unix system, a record is added to the wtmp file. You can't use the normal tail program on it, because the file is in binary format. The tailwtmp program in Example 8-7 knows the format of the binary file and shows every new record as it appears. You'll have to adjust the pack format for your own system.
Example 8-7. tailwtmp
  #!/usr/bin/perl -w

  # tailwtmp - watch for logins and logouts; 

  # uses linux utmp structure, from utmp(5)

  $typedef = "s x2 i A12 A4 l A8 A16 l";

  $sizeof = length pack($typedef, ( ) );

  use IO::File;

  open(WTMP, "< :raw", "/var/log/wtmp") or die "can't open /var/log/wtmp: $!";

  seek(WTMP, 0, SEEK_END);

  for (;;) { 

          while (read(WTMP, $buffer, $sizeof) =  = $sizeof) {

          ($type, $pid,  $line, $id, $time, $user, $host, $addr) 

              = unpack($typedef, $buffer);

          next unless $user && ord($user) && $time;

          printf "%1d %-8s %-12s %2s %-24s %-16s %5d %08x\n", 

              $type,$user,$line,$id,scalar(localtime($time)), 

              $host,$pid,$addr;

      } 

      for ($size = -s WTMP; $size =  = -s WTMP; sleep 1) {  }

      WTMP->clearerr( );

  }
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Program: tctee
Inhaltsvorschau
Not all systems support the classic tee program for splitting output pipes to multiple destinations. This command sends the output from someprog to /tmp/output and to the mail pipe beyond:
% someprog | tee /tmp/output | Mail -s "check this" user@host.org
This program helps not only users who aren't on Unix systems and don't have a regular tee; it also helps those who are, because it offers features not found on other versions of tee.
The four flag arguments are -i to ignore interrupts, -a to append to output files, -u for unbuffered output, and -n to omit copying the output on to standard out.
Because this program uses Perl's magic open, you can specify pipes as well as files.
% someprog | tctee f1 "|cat -n" f2 ">>f3"
That sends the output from someprog to the files f1 and f2, appends it to f3, sends a copy to the program cat -n, and also produces the stream on standard output.
The program in Example 8-8 is one of many venerable Perl programs written nearly a decade ago that still runs perfectly well. If written from scratch now, we'd probably use strict, warnings, and ten to thirty thousand lines of modules. But if it ain't broke . . .
Example 8-8. tctee
  #!/usr/bin/perl

  # tctee - clone that groks process tees 

  # perl3 compatible, or better.

  

  while ($ARGV[0] =~ /^-(.+)/ && (shift, ($_ = $1), 1)) {

      next if /^$/;

      s/i// && (++$ignore_ints, redo);

      s/a// && (++$append,      redo);

      s/u// && (++$unbuffer,    redo);

      s/n// && (++$nostdout,    redo);

      die "usage $0 [-aiun] [filenames] ...\n";

  }

  

  if ($ignore_ints) {

      for $sig ("INT", "TERM", "HUP", "QUIT") { $SIG{$sig} = "IGNORE"; }

  }

  

  $SIG{"PIPE"} = "PLUMBER";

  $mode = $append ? ">>" : ">";

  $fh = "FH000";

  

  unless ($nostdout) {

      %fh = ("STDOUT", "standard output"); # always go to stdout

  }

  

  $| = 1 if $unbuffer;

  

  for (@ARGV) {

      if (!open($fh, (/^[^>|]/ && $mode) . $_)) {

          warn "$0: cannot open $_: $!\n"; # like sun's; i prefer die

          $status++;

          next;

      }

      select((select($fh), $| = 1)[0]) if $unbuffer;

      $fh{$fh++} = $_;

  }

  

  while (<STDIN>) {

      for $fh (keys %fh) {

          print $fh $_;

      }

  }

  

  for $fh (keys %fh) {

      next if close($fh) || !defined $fh{$fh};

      warn "$0: couldnt close $fh{$fh}: $!\n";

      $status++;

  }

  

  exit $status;

  

  sub PLUMBER {

      warn "$0: pipe to \"$fh{$fh}\" broke!\n";

      $status++;

      delete $fh{$fh};

  }
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Program: laston
Inhaltsvorschau
When you log in to a Unix system, it tells you when you last logged in. That information is stored in a binary file called lastlog. Each user has their own record; UID 8 is at record 8, UID 239 at record 239, and so on. To find out when a given user last logged in, convert their login name to a number, seek to their record in that file, read, and unpack. Doing so with shell tools is hard, but with the laston program, it's easy. Here's an example:
% laston gnat

gnat  UID 314 at Mon May 25 08:32:52 2003 on ttyp0 from below.perl.com

         
The program in Example 8-9 is much newer than the tctee program in Example 8-8, but it's less portable. It uses the Linux binary layout of the lastlog file. You'll have to change this for other systems.
Example 8-9. laston
  #!/usr/bin/perl -w

  # laston - find out when given user last logged on

  use User::pwent;

  use IO::Seekable qw(SEEK_SET);

  

  open (LASTLOG, "< :raw", "/var/log/lastlog")

      or die "can't open /var/log/lastlog: $!";

  

  $typedef = "L A12 A16";  # linux fmt; sunos is "L A8 A16"

  $sizeof  = length(pack($typedef, ( )));

  

  for $user (@ARGV) {

      $U = ($user =~ /^\d+$/) ? getpwuid($user) : getpwnam($user);

      unless ($U) { warn "no such uid $user\n"; next; }

      seek(LASTLOG, $U->uid * $sizeof, SEEK_SET) or die "seek failed: $!";

      read(LASTLOG, $buffer, $sizeof) =  = $sizeof        or next;

      ($time, $line, $host) = unpack($typedef, $buffer);

      printf "%-8s UID %5d %s%s%s\n", $U->name, $U->uid,

            $time ? ("at " . localtime($time)) : "never logged in",

            $line && " on $line", 

            $host && " from $host";

  

  

  

  }
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Program: Flat File Indexes
Inhaltsvorschau
It sometimes happens that you need to jump directly to a particular line number in a file, but the lines vary in length, so you can't use Recipe 8.12. Although you could start at the beginning of the file and read every line, this is inefficient if you're making multiple queries.
The solution is to build an index of fixed-width records, one per line. Each record contains the offset in the data file of the corresponding line. The subroutine in Example 8-10 takes the data file and a filehandle to send the index to. It reads a record at a time and prints the current offset in the file to the index, packed into a big-ending unsigned 32-bit integer; see the documentation for the pack function in perlfunc(1) for alternative storage types.
Example 8-10. build_index
  # usage: build_index(*DATA_HANDLE, *INDEX_HANDLE)

  sub build_index {

      my $data_file  = shift;

      my $index_file = shift;

      my $offset     = 0;

  

      while (<$data_file>) {

          print $index_file pack("N", $offset);

          $offset = tell($data_file);

      }

  }
Once you have an index, it becomes easy to read a particular line from the data file. Jump to that record in the index, read the offset, and jump to that position in the data file. The next line you read will be the one you want. Example 8-11 returns the line, given the line number and the index and data file handles.
Example 8-11. line_with_index
  # usage: line_with_index(*DATA_HANDLE, *INDEX_HANDLE, $LINE_NUMBER)

  # returns line or undef if LINE_NUMBER was out of range

  sub line_with_index {

      my $data_file   = shift;

      my $index_file  = shift;

      my $line_number = shift;

      my $size;               # size of an index entry

      my $i_offset;           # offset into the index of the entry

      my $entry;              # index entry

      my $d_offset;           # offset into the data file

      $size = length(pack("N", 0));

      $i_offset = $size * ($line_number-1);

      seek($index_file, $i_offset, 0) or return;

      read($index_file, $entry, $size);

      $d_offset = unpack("N", $entry);

      seek($data_file, $d_offset, 0);

      return scalar(<$data_file>);

  }
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Chapter 9: Directories
Inhaltsvorschau
Unix has its weak points, but its file system is not one of them.
—Chris Torek
To fully understand directories, you need to be acquainted with the underlying mechanics. The following explanation is slanted toward the Unix filesystem, for whose syscalls and behavior Perl's directory access routines were designed, but it is applicable to some degree to most other platforms.
A filesystem consists of two parts: a set of data blocks where the contents of files and directories are kept, and an index to those blocks. Each entity in the filesystem has an entry in the index, be it a plain file, a directory, a link, or a special file like those in /dev. Each entry in the index is called an inode (short for index node). Since the index is a flat index, inodes are addressed by number.
A directory is a specially formatted file, whose inode entry marks it as a directory. A directory's data blocks contain a set of pairs. Each pair consists of the name of something in that directory and the inode number of that thing. The data blocks for /usr/bin might contain:
Name
Inode
bc
17
du
29
nvi
8
pine
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Introduction
Inhaltsvorschau
To fully understand directories, you need to be acquainted with the underlying mechanics. The following explanation is slanted toward the Unix filesystem, for whose syscalls and behavior Perl's directory access routines were designed, but it is applicable to some degree to most other platforms.
A filesystem consists of two parts: a set of data blocks where the contents of files and directories are kept, and an index to those blocks. Each entity in the filesystem has an entry in the index, be it a plain file, a directory, a link, or a special file like those in /dev. Each entry in the index is called an inode (short for index node). Since the index is a flat index, inodes are addressed by number.
A directory is a specially formatted file, whose inode entry marks it as a directory. A directory's data blocks contain a set of pairs. Each pair consists of the name of something in that directory and the inode number of that thing. The data blocks for /usr/bin might contain:
Name
Inode
bc
17
du
29
nvi
8
pine
55
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Getting and Setting Timestamps
Inhaltsvorschau
You need to retrieve or alter when a file was last modified (written or changed) or accessed (read).
Use stat to get those times and utime to set them. Both functions are built into Perl:
($READTIME, $WRITETIME) = (stat($filename))[8,9];



utime($NEWREADTIME, $NEWWRITETIME, $filename);
As explained in the Introduction, three different times are associated with an inode in the traditional Unix filesystem. Of these, any user can set the atime and mtime with utime, assuming the user has write access to the parent directory of the file. There is effectively no way to change the ctime. This example shows how to call utime:
$SECONDS_PER_DAY = 60 * 60 * 24;

($atime, $mtime) = (stat($file))[8,9];

$atime -= 7 * $SECONDS_PER_DAY;

$mtime -= 7 * $SECONDS_PER_DAY;



utime($atime, $mtime, $file)

    or die "couldn't backdate $file by a week w/ utime: $!";
You must call utime with both atime and mtime values. If you want to change only one, you must call stat first to get the other:
$mtime = (stat $file)[9];

utime(time, $mtime, $file);
This is easier to understand if you use File::stat:
use File::stat;

utime(time, stat($file)->mtime, $file);
Use utime to make it appear as though you never touched a file at all (beyond its ctime being updated). For example, to edit a file, use the program in Example 9-1.
Example 9-1. uvi
  #!/usr/bin/perl -w

  # uvi - vi a file without changing its access times

  

  $file = shift or die "usage: uvi filename\n";

  ($atime, $mtime) = (stat($file))[8,9];

  system($ENV{EDITOR} || "vi", $file);

  utime($atime, $mtime, $file)

      or die "couldn't restore $file to orig times: $!";
The stat and utime functions in perlfunc(1) and in Chapter 29 of
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Deleting a File
Inhaltsvorschau
You want to delete a file. Perl's delete function isn't what you want.
Use Perl's unlink function:
unlink($FILENAME)                 or die "Can't delete $FILENAME: $!\n";

unlink(@FILENAMES) =  = @FILENAMES    or die "Couldn't unlink all of @FILENAMES: $!\n";
The unlink function takes its name from the Unix syscall. Perl's unlink takes a list of filenames and returns the number of filenames successfully deleted. This return value can then be tested with || or or:
unlink($file) or die "Can't unlink $file: $!";
unlink doesn't report which filenames it couldn't delete, only how many it deleted. Here's one way to test for successful deletion of many files and report the number deleted:
unless (($count = unlink(@filelist)) =  = @filelist) {

    warn "could only delete $count of "

            . (@filelist) . " files";

}
A foreach over @filelist would permit individual error messages.
Under Unix, deleting a file from a directory requires write access to the directory, not to the file, because it's the directory you're changing. Under some circumstances, you could remove a file you couldn't write to or write to a file you couldn't remove.
If you delete a file that some process still has open, the operating system removes the directory entry but doesn't free up data blocks until all processes have closed the file. This is how the tmpfile function in File::Temp works (see Recipe 7.11).
The unlink function in perlfunc(1) and in Chapter 29 of Programming Perl; your system's unlink(2) manpage; Recipe 7.11
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Copying or Moving a File
Inhaltsvorschau
You need to copy a file, but Perl has no built-in copy function.
Use the copy function from the standard File::Copy module:
use File::Copy;

copy($oldfile, $newfile);
You can do it by hand:
open(IN,  "<", $oldfile)                    or die "can't open $oldfile: $!";

open(OUT, ">", $newfile)                    or die "can't open $newfile: $!";



$blksize = (stat IN)[11] || 16384;          # preferred block size?

while (1) {

    $len = sysread IN, $buf, $blksize);

    if (!defined $len) {

        next if $! =~ /^Interrupted/;       # ^Z and fg on EINTR

        die "System read error: $!\n";

    }

    last unless $len;



    $offset = 0;

    while ($len) {          # Handle partial writes.

        defined($written = syswrite OUT, $buf, $len, $offset)

            or die "System write error: $!\n";

        $len    -= $written;

        $offset += $written;

    };

}



close(IN);

close(OUT);
or you can call your system's copy program:
system("cp $oldfile $newfile");       # unix

system("copy $oldfile $newfile");     # dos, vms
The File::Copy module provides copy and move functions. These are more convenient than resorting to low-level I/O calls and more portable than calling system. This version of move works across file-system boundaries; the standard Perl built-in rename (usually) does not.
use File::Copy;



copy("datafile.dat", "datafile.bak")

    or die "copy failed: $!";



move("datafile.dat", "datafile.new")

    or die "move failed: $!";
Because these functions return only a simple success status, you can't easily tell which file prevented the copy or move from working. Copying the files manually lets you pinpoint which files didn't copy, but it fills your program with complex sysreads and syswrites.
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Recognizing Two Names for the Same File
Inhaltsvorschau
You want to determine whether two filenames in a list correspond to the same file on disk (because of hard and soft links, two filenames can refer to a single file). You might do this to make sure that you don't change a file you've already worked with.
Maintain a hash, keyed by the device and inode number of the files you've seen. The values are the names of the files:
%seen = ( );



sub do_my_thing {

    my $filename = shift;

    my ($dev, $ino) = stat $filename;



    unless ($seen{$dev, $ino}++) {

        # do something with $filename because we haven't

        # seen it before

    }

}
A key in %seen is made by combining the device number ($dev) and inode number ($ino) of each file. Files that are the same will have the same device and inode numbers, so they will have the same key.
If you want to maintain a list of all files of the same name, instead of counting the number of times seen, save the name of the file in an anonymous array.
foreach $filename (@files) {

    ($dev, $ino) = stat $filename;

    push( @{ $seen{$dev,$ino} }, $filename);

}



foreach $devino (sort keys %seen) {

    ($dev, $ino) = split(/$;/o, $devino);

    if (@{$seen{$devino}} > 1) {

        # @{$seen{$devino}} is a list of filenames for the same file

    }

}
The $; variable contains the separator string using the old multidimensional associative array emulation syntax, $hash{$x,$y,$z}. It's still a one-dimensional hash, but it has composite keys. The key is really join($; => $x, $y, $z). The split separates them again. Although you'd normally just use a real multilevel hash directly, here there's no need, and it's cheaper not to.
The $; ($SUBSEP) variable in perlvar(1), and in the "Special Variables" section of Chapter 28 of Programming Perl; the stat function in
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Processing All Files in a Directory
Inhaltsvorschau
You want to do something to each file in a particular directory.
Use opendir to open the directory and readdir to retrieve every filename:
opendir(DIR, $dirname) or die "can't opendir $dirname: $!";

while (defined($file = readdir(DIR))) {

    # do something with "$dirname/$file"

}

closedir(DIR);
The opendir, readdir, and closedir functions operate on directories as open, <>, and close operate on files. Both use handles, but the directory handles used by opendir and friends are different from the filehandles used by open and friends. In particular, you can't use <> on a directory handle.
In scalar context, readdir returns the next filename in the directory until it reaches the end of the directory, when it returns undef. In list context it returns the rest of the filenames in the directory or an empty list if there were no files left. As explained in this chapter's Introduction, the filenames returned by readdir do not include the directory name. When you work with the filenames returned by readdir, you must either move to the right directory first or prepend the directory to the filename.
This shows one way of prepending:
$dir = "/usr/local/bin";

print "Text files in $dir are:\n";

opendir(BIN, $dir) or die "Can't open $dir: $!";

while( $file = readdir BIN) {

    print "$file\n" if -T "$dir/$file";

}

closedir(BIN);
The readdir function will return the special directories "." (the directory itself) and ".." (the parent of the directory). Most people skip those files with code like:
while ( defined ($file = readdir BIN) ) {

    next if $file =~ /^\.\.?$/;     # skip . and ..

    # ...

}
Like filehandles, bareword directory handles are per-package constructs. You can use the local *DIRHANDLE syntax to get a new bareword directory handle. Alternatively, pass an undefined scalar as the first argument to
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Globbing, or Getting a List of Filenames Matching a Pattern
Inhaltsvorschau
You want to get a list of filenames similar to those produced by MS-DOS's *.* and Unix's *.h. This is called globbing, and the filename wildcard expression is called a glob, or occasionally a fileglob to distinguish it from a typeglob.
Perl provides globbing with the semantics of the Unix C shell through the glob keyword and <>:
@list = <*.c>;

@list = glob("*.c");
You can also use readdir to extract the filenames manually:
opendir(DIR, $path);

@files = grep { /\.c$/ } readdir(DIR);

closedir(DIR);
In versions of Perl before v5.6, Perl's built-in glob and <WILDCARD> notation (not to be confused with <FILEHANDLE>) ran an external program (often the csh shell) to get the list of filenames. This led to globbing being tarred with security and performance concerns. As of v5.6, Perl uses the File::Glob module to glob files, which solves the security and performance problems of the old implementation. Globs have C shell semantics on non-Unix systems to encourage portability. In particular, glob syntax isn't regular expression syntax—glob uses ? to mean "any single character" and * to mean "zero or more characters," so glob("f?o*") matches flo and flood but not fo.
For complex rules about which filenames you want, roll your own selection mechanism using readdir and regular expressions.
At its simplest, an opendir solution uses grep to filter the list returned by readdir:
@files = grep { /\.[ch]$/i } readdir(DH);
As always, the filenames returned don't include the directory. When you use the filename, prepend the directory name to get the full pathname:
opendir(DH, $dir)        or die "Couldn't open $dir for reading: $!";



@files = ( );

while( defined ($file = readdir(DH)) ) {

    next unless /\.[ch]$/i;



    my $filename = "$dir/$file";

    push(@files, $filename) if -T $filename;

}
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Processing All Files in a Directory Recursively
Inhaltsvorschau
You want to do something to each file and subdirectory in a particular directory.
Use the standard File::Find module.
use File::Find;

sub process_file {

    # do whatever;

}

find(\&process_file, @DIRLIST);
File::Find provides a convenient way to process a directory recursively. It does the directory scans and recursion for you. All you do is pass find a code reference and a list of directories. For each file in those directories, recursively, find calls your function.
Before calling your function, find by default changes to the directory being visited, whose path relative to the starting directory is stored in the $File::Find::dir variable. $_ is set to the basename of the file being visited, and the full path of that file can be found in $File::Find::name. Your code can set $File::Find::prune to true to tell find not to descend into the directory just seen.
This simple example demonstrates File::Find. We give find an anonymous subroutine that prints the name of each file visited and adds a / to the names of directories:
@ARGV = qw(.) unless @ARGV;

use File::Find;

find sub { print $File::Find::name, -d && "/", "\n" }, @ARGV;
The -d file test operator returns the empty string '' if it fails, making the && return that, too. But if -d succeeds, the && returns "/", which is then printed.
The following program prints the total bytes occupied by everything in a directory, including subdirectories. It gives find an anonymous subroutine to keep a running sum of the sizes of each file it visits. That includes all inode types, including the sizes of directories and symbolic links, not just regular files. Once the find function returns, the accumulated sum is displayed.
use File::Find;

@ARGV = (".") unless @ARGV;

my $sum = 0;

find sub { $sum += -s }, @ARGV;

print "@ARGV contains $sum bytes\n";
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Removing a Directory and Its Contents
Inhaltsvorschau
You want to remove a directory tree recursively without using rm -r.
Use the finddepth function from File::Find, shown in Example 9-3.
Example 9-3. rmtree1
  #!/usr/bin/perl

  # rmtree1 - remove whole directory trees like rm -r

  use File::Find;

  die "usage: $0 dir ..\n" unless @ARGV;

  find {

      bydepth   => 1,

      no_chdir  => 1,

      wanted    => sub { 

          if (!-l && -d _) {

              rmdir     or warn "couldn't rmdir directory $_: $!";

          } else {

              unlink    or warn "couldn't unlink file $_: $!";

          }

      }

  } => @ARGV;
Or use rmtree from File::Path, as shown in Example 9-4.
Example 9-4. rmtree2
  #!/usr/bin/perl

  # rmtree2 - remove whole directory trees like rm -r

  use File::Path;

  die "usage: $0 dir ..\n" unless @ARGV;

  foreach $dir (@ARGV) {

      rmtree($dir);

  }
These programs remove an entire directory tree. Use with extreme caution!
The File::Find module supports an alternate interface in which find's first argument is a hash reference containing options and their settings. The bydepth option is the same as calling finddepth instead of find. This is guaranteed to visit all files beneath a directory before the directory itself, just what we need to remove a directory and its contents. The no_chdir option stops find from descending into directories during processing; under this option, $_ is the same as $File::Find::name. Finally, the wanted option takes a code reference, our old wanted( ) function.
We use two different functions, rmdir and unlink; both default to $_ if no argument is provided. The unlink function deletes only files, and
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Renaming Files
Inhaltsvorschau
You have many files whose names you want to change.
Use a foreach loop and the rename function:
foreach $file (@NAMES) {

    my $newname = $file;

    # change $newname

    rename($file, $newname) or  

        warn "Couldn't rename $file to $newname: $!\n";

}
This is straightforward. rename takes two arguments. The first is the filename to change, and the second is its new name. Perl's rename is a frontend to the operating system's rename syscall, which typically won't rename files across filesystem boundaries.
A small change turns this into a generic rename script, such as the one by Larry Wall shown in Example 9-5.
Example 9-5. rename
  #!/usr/bin/perl -w

  # rename - Larry's filename fixer

  $op = shift or die "Usage: rename expr [files]\n";

  chomp(@ARGV = <STDIN>) unless @ARGV;

  for (@ARGV) {

      $was = $_;

      eval $op;

      die $@ if $@;

      rename($was,$_) unless $was eq $_;

  }
This script's first argument is Perl code that alters the filename (stored in $_) to reflect how you want the file renamed. It can do this because it uses an eval to do the hard work. It also skips rename calls when the filename is untouched. This lets you simply use wildcards like rename EXPR * instead of making long lists of filenames.
Here are five examples of calling the rename program from the shell:
% rename 's/\.orig$//'  *.orig

% rename "tr/A-Z/a-z/ unless /^Make/"  *

% rename '$_ .= ".bad"'  *.f

% rename 'print "$_: "; s/foo/bar/ if <STDIN> =~ /^y/i'  *

% find /tmp -name "*~" -print | rename 's/^(.+)~$/.#$1/'
The first shell command removes a trailing ".orig" from each filename.
The second converts uppercase to lowercase. Because a translation is used rather than the
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Splitting a Filename into Its Component Parts
Inhaltsvorschau
You want to extract a filename, its enclosing directory, or the extension(s) from a string that contains a full pathname.
Use routines from the standard File::Basename module.
use File::Basename;



$base = basename($path);

$dir  = dirname($path);

($base, $dir, $ext) = fileparse($path);
The standard File::Basename module contains routines to split up a filename. dirname and basename supply the directory and filename portions, respectively:
$path = "/usr/lib/libc.a";

$file = basename($path);    

$dir  = dirname($path);     



print "dir is $dir, file is $file\n";

# dir is /usr/lib, file is libc.a
The fileparse function can extract the extension. Pass fileparse the path to decipher and a regular expression that matches the extension. You must supply a pattern because an extension isn't necessarily dot-separated. Consider ".tar.gz": is the extension ".tar", ".gz", or ".tar.gz"? By specifying the pattern, you control which you get.
$path = "/usr/lib/libc.a";

($name,$dir,$ext) = fileparse($path,'\..*');



print "dir is $dir, name is $name, extension is $ext\n";

# dir is /usr/lib/, name is libc, extension is .a
By default, these routines parse pathnames using your operating system's normal conventions for directory separators by consulting the $^O ($OSNAME) variable, which holds a string identifying the platform you're running on. That value was determined when Perl was built and installed. You can change the default by calling the fileparse_set_fstype routine. This alters the behavior of subsequent calls to the File::Basename functions:
fileparse_set_fstype("MacOS");

$path = "Hard%20Drive:System%20Folder:README.txt";

($name,$dir,$ext) = fileparse($path,'\..*');



print "dir is $dir, name is $name, extension is $ext\n";

# dir is Hard%20Drive:System%20Folder, name is README, extension is .txt
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Working with Symbolic File Permissions Instead of Octal Values
Inhaltsvorschau
You want to print, inspect, or change permissions on a file or directory, but you don't want to specify the permissions in octal (e.g., 0644, 0755). You want to print permissions as ls(1) shows them (e.g., -rwx-r-xr-x) and specify permissions changes in the way that chmod(1) does (e.g., g-w to remove write access for the group).
Use the CPAN module Stat::lsMode to convert numeric permissions to a string:
use Stat::lsMode;



$lsmode = file_mode($pathname);
Use the CPAN module File::chmod to manipulate symbolic permissions:
use File::chmod;



chmod("g=rw,o=-w", @files);    # group can read/write, others can't write

chmod("-rwxr-xr--", @files);   # ls-style permissions
The Stat::lsMode module provides functions for generating ls-style permissions strings. The file_mode function takes a pathname and returns a permissions string. This string is false if the pathname doesn't exist or Perl can't stat it. If all goes well, you get a string like "drwxr-x---" for a directory or "-rwxr-x----" for a file. For more fine-grained control, Stat::lsMode offers format_mode, which takes a numeric permissions value and returns the 10-character ls-style string.
Notice the leading d and - in those strings. This indicates the type of file whose permissions you're inspecting: - means regular file, d means directory, l means symbolic link, and so on. The format_perms function from Stat::lsMode does the same job as format_mode, but it returns a nine-character string, which does not have the type indicator. For example:
use Stat::lsMode;

print file_mode("/etc"), "\n";

print format_mode((stat "/etc")[2]), "\n";

drwxr-xr-x

               r-xr-xr-x

            
The File::chmod module gives you a chmod that accepts these nine-character permissions strings:
use File::chmod;

chmod("rwxr-xr-x", @files);
These strings are three clusters of three characters. The three clusters represent what the user, group, and others can do to the file (respectively). The three characters represent
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Program: symirror
Inhaltsvorschau
The program in Example 9-6 recursively duplicates a directory tree, making a shadow forest full of symlinks pointing back at the real files.
Example 9-6. symirror

---

#!/usr/bin/perl 

# symirror - build spectral forest of symlinks



use warnings;

use strict;



use Cwd         qw(realpath);

use File::Find  qw(find);



die "usage: $0 realdir mirrordir" unless @ARGV == 2;



our $SRC = realpath $ARGV[0];

our $DST = realpath $ARGV[1];



my $oldmask = umask 077;        # in case was insanely uncreatable

chdir $SRC                      or die "can't chdir $SRC: $!";

unless (-d $DST) {

    mkdir($DST, 0700)           or die "can't mkdir $DST: $!";

}

find { 

    wanted      => \&shadow,

    postprocess => \&fixmode,

} => ".";

umask $oldmask;



sub shadow {

    (my $name = $File::Find::name) =~ s!^\./!!;     # correct name

    return if $name eq ".";

    if (-d) { # make a real dir; we'll copy mode later

        mkdir("$DST/$name", 0700)       

                        or die "can't mkdir $DST/$name: $!";

    } else {  # all else gets symlinked                      

        symlink("$SRC/$name", "$DST/$name")

                        or die "can't symlink $SRC/$name to $DST/$name: $!";

    }

}



sub fixmode {

    my $dir = $File::Find::dir;

    my $mode = (stat("$SRC/$dir"))[2] & 07777;

    chmod($mode, "$DST/$dir") 

        or die "can't set mode on $DST/$dir: $!";

}
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Program: lst
Inhaltsvorschau
Have you ever wondered what the newest or biggest files within a directory are? The standard ls program has options for listing out directories sorted in time order (the -t flag) and for recursing into subdirectories (the -R flag). However, it pauses at each directory to display the sorted contents of just that directory. It doesn't descend through all subdirectories first and then sort everything it found.
The following lst program does that. Here's an example using its -l flag to get a long listing:
% lst -l /etc

12695 0600      1     root    wheel      512 Fri May 29 10:42:41 1998

            /etc/ssh_random_seed

            12640 0644      1     root    wheel    10104 Mon May 25  7:39:19 1998

            /etc/ld.so.cache

            12626 0664      1     root    wheel    12288 Sun May 24 19:23:08 1998

            /etc/psdevtab

            12304 0644      1     root     root      237 Sun May 24 13:59:33 1998

            /etc/exports

            12309 0644      1     root     root     3386 Sun May 24 13:24:33 1998

            /etc/inetd.conf

            12399 0644      1     root     root    30205 Sun May 24 10:08:37 1998

            /etc/sendmail.cf

            18774 0644      1     gnat  perldoc     2199 Sun May 24  9:35:57 1998

            /etc/X11/XMetroconfig

            12636 0644      1     root    wheel      290 Sun May 24  9:05:40 1998

            /etc/mtab

            12627 0640      1     root     root        0 Sun May 24  8:24:31 1998

            /etc/wtmplock

            12310 0644      1     root  tchrist       65 Sun May 24  8:23:04 1998

            /etc/issue

....
/etc/X11/XMetroconfig showed up in the middle of the listing for /etc because it wasn't just for /etc, but for everything within that directory, recursively.
Other supported options include sorting on read time instead of write time using -u and sorting on size rather than time with -s. The -i flag takes the list of filenames from standard input instead of recursing with
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Chapter 10: Subroutines
Inhaltsvorschau
Composing mortals with immortal fire.
—W. H. Auden, "Three Songs for St Cecilia's Day"
To avoid the dangerous practice of copying and pasting code, larger programs reuse chunks of code as subroutines and functions. We'll use the terms subroutine and function interchangeably because Perl doesn't distinguish between the two. Even object-oriented methods are just subroutines that are called using a special syntax, described in Chapter 13.
A subroutine is declared with the sub keyword. Here's a simple subroutine definition:
sub hello { 

    $greeted++;          # global variable 

    print "hi there!\n";

}
The typical way of calling that subroutine is:
hello( );                 # call subroutine hello with no arguments/parameters
Because Perl compiles your program before executing it, it doesn't matter where subroutines are declared. Definitions don't have to be in the same file as your main program. They can be pulled in from other files using the do, require, or use operators, as described in Chapter 12. They can even be created on the fly using eval or AUTOLOAD, or generated using closures, which can act as function templates.
If you are familiar with other programming languages, several characteristics of Perl's functions may surprise you if you're unprepared for them. Most recipes in this chapter illustrate how to be aware of—and to take advantage of—these properties.
  • Perl functions have no formal, named parameters, but this is not necessarily a bad thing. See Recipe 10.1 and Recipe 10.7.
  • All variables are global unless declared otherwise. See Recipe 10.2, Recipe 10.3, and Recipe 10.13 for details.
  • Passing or returning more than one array or hash normally causes them to lose their separate identities. See Recipe 10.5, Recipe 10.8, Recipe 10.9, and Recipe 10.11 to avoid this.
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Introduction
Inhaltsvorschau
To avoid the dangerous practice of copying and pasting code, larger programs reuse chunks of code as subroutines and functions. We'll use the terms subroutine and function interchangeably because Perl doesn't distinguish between the two. Even object-oriented methods are just subroutines that are called using a special syntax, described in Chapter 13.
A subroutine is declared with the sub keyword. Here's a simple subroutine definition:
sub hello { 

    $greeted++;          # global variable 

    print "hi there!\n";

}
The typical way of calling that subroutine is:
hello( );                 # call subroutine hello with no arguments/parameters
Because Perl compiles your program before executing it, it doesn't matter where subroutines are declared. Definitions don't have to be in the same file as your main program. They can be pulled in from other files using the do, require, or use operators, as described in Chapter 12. They can even be created on the fly using eval or AUTOLOAD, or generated using closures, which can act as function templates.
If you are familiar with other programming languages, several characteristics of Perl's functions may surprise you if you're unprepared for them. Most recipes in this chapter illustrate how to be aware of—and to take advantage of—these properties.
  • Perl functions have no formal, named parameters, but this is not necessarily a bad thing. See Recipe 10.1 and Recipe 10.7.
  • All variables are global unless declared otherwise. See Recipe 10.2, Recipe 10.3, and Recipe 10.13 for details.
  • Passing or returning more than one array or hash normally causes them to lose their separate identities. See Recipe 10.5, Recipe 10.8, Recipe 10.9, and Recipe 10.11 to avoid this.
  • A function can know in which context it was called, how many arguments it was called with, and even which other function called it. See Recipe 10.4 and Recipe 10.6 to find out how.
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Accessing Subroutine Arguments
Inhaltsvorschau
You have written a function that takes arguments supplied by its caller, and you need to access those arguments.
The special array @_ holds the values passed in as the function's arguments. Thus, the first argument to the function is in $_[0], the second in $_[1], and so on. The number of arguments is simply scalar(@_).
For example:
sub hypotenuse {

    return sqrt( ($_[0] ** 2) + ($_[1] ** 2) );

}



$diag = hypotenuse(3,4);  # $diag is 5
Most subroutines start by copying arguments into named private variables for safer and more convenient access:
sub hypotenuse {

    my ($side1, $side2) = @_;

    return sqrt( ($side1 ** 2) + ($side2 ** 2) );

}
It's been said that programming has only three nice numbers: zero, one, and however many you please. Perl's subroutine mechanism was designed to facilitate writing functions with as many—or as few—elements in the parameter and return lists as you wish. All incoming parameters appear as separate scalar values in the special array @_, which is automatically local to each function (see Recipe 10.13). To return a value or values from a subroutine, use the return statement with arguments. If there is no return statement, the return value is the result of the last evaluated expression.
Here are some sample calls to the hypotenuse function defined in the Solution:
print hypotenuse(3, 4), "\n";               # prints 5



@a = (3, 4);

print hypotenuse(@a), "\n";                 # prints 5
If you look at the arguments used in the second call to hypotenuse, it might appear that only one argument was passed: the array @a. This isn't what happens—the elements of @a are copied into the @_ array separately. Similarly, if you called a function with (@a, @b), you'd be giving it all arguments from both arrays. This is the same principle of flattened lists at work as in:
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Making Variables Private to a Function
Inhaltsvorschau
Your subroutine needs temporary variables. You shouldn't use global variables, because another subroutine might also use the same variables.
Use my to declare a variable private to a region of your program:
sub somefunc {

    my $variable;                 # $variable is invisible outside somefunc( )

    my ($another, @an_array, %a_hash);     # declaring many variables at once



    # ...

}
The my operator confines a variable to a particular region of code in which it can be used and accessed. Outside that region, it can't be accessed. This region is called its scope.
Variables declared with my have lexical scope, meaning that they exist only within a specific textual region of code. For instance, the scope of $variable in the Solution is the function it was defined in, somefunc. The variable is created when somefunc is entered, and it is destroyed when the function returns. The variable can be accessed only from inside the function, not from outside.
A lexical scope is usually a block of code with braces around it, such as those defining the body of the somefunc subroutine or those marking the code blocks of if, while, for, foreach, and eval. An entire source file and the string argument to eval are each a lexical scope; think of them as blocks with invisible braces delimiting their confines. Because a lexical scope is most often found as a brace-delimited block, when discussing lexical variables we sometimes say that they are visible only in their block, but what we really mean is that they're visible only in their scope.
The code that can legally access a my variable is determined statically at compile time and never changes, and so lexical scoping is sometimes referred to as static scoping, especially when in contrast to dynamic scoping, a topic we'll cover in Recipe 10.13.
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Creating Persistent Private Variables
Inhaltsvorschau
You want a variable to retain its value between calls to a subroutine but not be visible outside that routine. For instance, you'd like your function to keep track of how many times it was called.
Wrap the function in another block, then declare my variables in that block's scope rather than in the function's:
{

    my $variable;

    sub mysub {

        # ... accessing $variable

    }

}
If the variables require initialization, make that block an INIT so the variable is guaranteed to be set before the main program starts running:
INIT {

    my $variable = 1;                       # initial value

    sub othersub {                          

      # ... accessing $variable

    }

}
Unlike local variables in C or C++, Perl's lexical variables don't necessarily get recycled just because their scope has exited. If something more permanent is still aware of the lexical, it will stick around. In this code, mysub uses $variable, so Perl doesn't reclaim the variable when the block around the definition of mysub ends.
Here's how to write a counter:
{

    my $counter;

    sub next_counter { return ++$counter }

}
Each time next_counter is called, it increments and returns the $counter variable. The first time next_counter is called, $counter is undefined, so it behaves as though it were 0 for the ++. The variable is not part of next_counter's scope, but rather part of the block surrounding it. No code from outside can change $counter except by calling next_counter.
Generally, you should use an INIT for the extra scope. Otherwise, you could call the function before its variables were initialized.
INIT {

    my $counter = 42;

    sub next_counter { return ++$counter }

    sub prev_counter { return --$counter }

}
This technique creates the Perl equivalent of C's static variables. Actually, it's a little better: rather than being limited to just one function, both functions share their private variable.
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Determining Current Function Name
Inhaltsvorschau
You want to determine the name of the currently running function. This is useful for creating error messages that don't need to be changed if you copy and paste the subroutine code.
Use the caller function:
$this_function = (caller(0))[3];
Code can always determine the current source line number via the special symbol _ _LINE_ _, the current file via _ _FILE_ _, and the current package via _ _PACKAGE_ _. But no such symbol for the current subroutine name exists, let alone the name for the subroutine that called this one.
The built-in function caller handles all of these. In scalar context it returns the calling function's package name, but in list context it returns much more. You can also pass it a number indicating how many frames (nested subroutine calls) back you'd like information about: 0 is your own function, 1 is your caller, and so on.
Here's the full syntax, where $i is how far back you're interested in:
($package, $filename, $line, $subr, $has_args, $wantarray 

#   0         1         2       3       4          5

 $evaltext, $is_require, $hints, $bitmask

#   6         7            8       9

)= caller($i);
Here's what each of those return values means:
$package
The package in which the code was compiled.
$filename
The name of the file in which the code was compiled, reporting -e if launched from that command-line switch, or - if the script was read from standard input.
$line
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Passing Arrays and Hashes by Reference
Inhaltsvorschau
You want to pass a function more than one array or hash and have each remain distinct. For example, you want to put the algorithm from Recipe 4.8 into a subroutine. This subroutine must then be called with two arrays that remain distinct.
Pass arrays and hashes by reference, using the backslash operator:
array_diff( \@array1, \@array2 );
See Chapter 11 for more about manipulation of references. Here's a subroutine that expects array references, along with code to call it correctly:
@a = (1, 2);

@b = (5, 8);

@c = add_vecpair( \@a, \@b );

print "@c\n";

6 10



sub add_vecpair {                      # assumes both vectors the same length

    my ($x, $y) = @_;                  # copy in the array references

    my @result;



    for (my $i=0; $i < @$x; $i++) {

      $result[$i] = $x->[$i] + $y->[$i];

    }



    return @result;

}
A potential problem with this function is that it doesn't verify the number and types of arguments passed into it. You could check explicitly this way:
unless (@_ =  = 2 && ref($x) eq 'ARRAY' && ref($y) eq 'ARRAY') {

    die "usage: add_vecpair ARRAYREF1 ARRAYREF2";

}
If all you plan to do is die on error (see Recipe 10.12), you can sometimes omit this check, since dereferencing the wrong kind of reference triggers an exception anyway. However, good defensive programming style encourages argument validation for all functions.
The sections on "Passing References" and on "Prototypes" in Chapter 6 of Programming Perl and on "Pass by Reference" in perlsub(1); Recipe 10.11; Chapter 11; Chapter 8 of Programming Perl
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Detecting Return Context
Inhaltsvorschau
You want to know in which context your function was called. This lets one function do different things, depending on how its return value or values are used, just like many of Perl's built-in functions.
Use the wantarray( ) function, which has three possible return values, depending on how the current function was called:
if (wantarray( )) {

    # list context

}

elsif (defined wantarray( )) {

    # scalar context

}

else {

    # void context

}
Many built-in functions act differently when called in scalar context than they do when called in list context. A user-defined function can learn which context it was called in by checking wantarray. List context is indicated by a true return value. If wantarray returns a value that is false but defined, then the function's return value will be used in scalar context. If wantarray returns undef, your function isn't being asked to provide any value at all.
if (wantarray( )) {

    print "In list context\n";

    return @many_things;

} elsif (defined wantarray( )) {

    print "In scalar context\n";

    return $one_thing;

} else {

    print "In void context\n";

    return;  # nothing

}



mysub( );                    # void context



$a = mysub( );               # scalar context

if (mysub( )) {  }           # scalar context



@a = mysub( );               # list context

print mysub( );              # list context
The return and wantarray functions in Chapter 29 of Programming Perl and in perlfunc(1)
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Passing by Named Parameter
Inhaltsvorschau
You want to make a function with many parameters that are easy to call so that programmers remember what the arguments do, rather than having to memorize their order.
Name each parameter in the call:
thefunc(INCREMENT => "20s", START => "+5m", FINISH => "+30m");

thefunc(START => "+5m", FINISH => "+30m");

thefunc(FINISH => "+30m");

thefunc(START => "+5m", INCREMENT => "15s");
Then in the subroutine, create a hash loaded up with default values plus the array of named pairs.
sub thefunc {

    my %args = ( 

        INCREMENT   => '10s', 

        FINISH      => 0, 

        START       => 0, 

        @_,         # argument pair list goes here

    );

    if ($args{INCREMENT}  =~ /m$/ ) { ... }

}
Functions whose arguments require a particular order work well for short argument lists, but as the number of parameters increases, it's awkward to make some optional or have default values. You can only leave out trailing arguments, never initial ones.
A more flexible approach allows the caller to supply arguments using name-value pairs. The first element of each pair is the argument name; the second, its value. This makes for self-documenting code because you can see the parameters' intended meanings without having to read the full function definition. Even better, programmers using your function no longer have to remember argument order, and they can leave unspecified any extraneous, unused arguments.
This works by having the function declare a private hash variable to hold the default parameter values. Put the current arguments, @_, after the default values, so the actual arguments override the defaults because of the order of the values in the assignment.
A common variation on this is to preface the parameter name with a hyphen, intended to evoke the feel of command-line parameters:
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Skipping Selected Return Values
Inhaltsvorschau
You have a function that returns many values, but you only care about some of them. The stat function is a classic example: you often want only one value from its long return list (mode, for instance).
Either assign to a list that has undef in some positions:
($a, undef, $c) = func( );
or else take a slice of the return list, selecting only what you want:
($a, $c) = (func( ))[0,2];
Using dummy temporary variables is wasteful; plus it feels artificial and awkward:
($dev,$ino,$DUMMY,$DUMMY,$uid) = stat($filename);
A nicer style is to use undef instead of dummy variables to discard a value:
($dev,$ino,undef,undef,$uid)   = stat($filename);
Or you can take a slice, picking up just the values you care about:
($dev,$ino,$uid,$gid)   = (stat($filename))[0,1,4,5];
If you want to put an expression into list context and discard all of its return values (calling it simply for side effects), you can assign this to the empty list:
( ) = some_function( );
This last strategy is rather like a list version of the scalar operator—it calls the function in list context, even in a place it wouldn't otherwise do so. You can get just a count of return values this way:
$count = ( ) = some_function( );
or you can call it in list context and make sure it returns some non-zero number of items (which you immediately discard) like this:
if (( ) = some_function( ))  { .... }
If you hadn't assigned to the empty list, the Boolean context of the if test would have called the function in scalar context.
The section on "List Values and Arrays" in Chapter 2 of Programming Perl and perlsub(1); Recipe 3.1
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Returning More Than One Array or Hash
Inhaltsvorschau
You want a function to return more than one array or hash, but the return list flattens into just one long list of scalars.
Return references to the hashes or arrays:
($array_ref, $hash_ref) = somefunc( );



sub somefunc {

    my @array;

    my %hash;



    # ...



    return ( \@array, \%hash );

}
Just as all arguments collapse into one flat list of scalars, return values do, too. Functions that want to return multiple, distinct arrays or hashes need to return those by reference, and the caller must be prepared to receive references. If a function wants to return three separate hashes, for example, it should use one of the following:
sub fn { 

    .....

    return (\%a, \%b, \%c); # or                           

    return \(%a,  %b,  %c); # same thing

}
The caller must expect a list of hash references returned by the function. It cannot just assign to three hashes.
(%h0, %h1, %h2)  = fn( );    # WRONG!

@array_of_hashes = fn( );    # eg: $array_of_hashes[2]{"keystring"}

($r0, $r1, $r2)  = fn( );    # eg: $r2->{"keystring"}
The general discussions on references in Chapter 11, and in Chapter 8 of Programming Perl; Recipe 10.5
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Returning Failure
Inhaltsvorschau
You want to return a value indicating that your function failed.
Use a bare return statement without any argument, which returns undef in scalar context and the empty list ( ) in list context.
return;
A return without an argument means:
sub empty_retval {

    return ( wantarray ? ( ) : undef );

}
You can't use just return undef, because in list context you will get a list of one value: undef. If your caller says:
if (@a = yourfunc( )) { ... }
then the "error" condition will be perceived as true because @a will be assigned (undef) and then evaluated in scalar context. This yields 1, the number of elements assigned to @a, which is true. You could use the wantarray function to see what context you were called in, but a bare return is a clear and tidy solution that always works:
unless ($a = sfunc( )) { die "sfunc failed" }

unless (@a = afunc( )) { die "afunc failed" }

unless (%a = hfunc( )) { die "hfunc failed" }
Some of Perl's built-in functions have a peculiar return value. Both fcntl and ioctl have the curious habit of returning the string "0 but true" in some circumstances. (This magic string is conveniently exempt from nagging warnings about improper numerical conversions.) This has the advantage of letting you write code like this:
ioctl(....) or die "can't ioctl: $!";
That way, code doesn't have to check for a defined zero as distinct from the undefined value, as it would for the read or glob functions. "0 but true" is zero when used numerically. It's rare that this kind of return value is needed. A more common (and spectacular) way to indicate failure in a function is to raise an exception, as described in Recipe 10.12.
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Prototyping Functions
Inhaltsvorschau
You want to use function prototypes so the compiler can check your argument types.
Perl has something of a prototype facility, but it isn't what you're thinking. Perl's function prototypes are more like a context coercion used to write functions that behave like some Perl built-ins, such as push and pop.
Manually checking the validity of a function's arguments can't happen until runtime. If you make sure the function is declared before it is used, you can tickle the compiler into using a very limited form of prototype checking. But don't confuse Perl's function prototypes with those found in any other language.
A Perl function prototype is zero or more spaces, backslashes, or type characters enclosed in parentheses after the subroutine definition or name. A backslashed type symbol means that the argument is passed by reference, and the argument in that position must start with that type character.
A prototype can impose context on the prototyped function's arguments. This is done when Perl compiles your program. But this does not always mean that Perl checks the number or type of arguments; since a scalar prototype is like inserting a scalar in front of just one argument, sometimes an implicit conversion occurs instead. For example, if Perl sees func(3, 5) for a function prototyped as sub func ($), it will stop with a compile-time error. But if it sees func(@array) with the same prototype, it will merely put @array into scalar context instead of complaining that you passed an array, but it wanted a scalar.
This is so important that it bears repeating: don't use Perl prototypes expecting the compiler to check type and number of arguments for you. It does a little bit of that, sometimes, but mostly it's about helping you type less, and sometimes to emulate the calling and parsing conventions of built-in functions.
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Handling Exceptions
Inhaltsvorschau
How do you safely call a function that might raise an exception? How do you create a function that raises an exception?
Sometimes you encounter a problem so exceptional that merely returning an error isn't strong enough, because the caller could unintentionally ignore the error. Use die STRING from your function to trigger an exception:
die "some message";         # raise exception
The caller can wrap the function call in an eval to intercept that exception, then consult the special variable $@ to see what happened:
eval { func( ) };

if ($@) {

    warn "func raised an exception: $@";

}
Raising exceptions is not a facility to be used lightly. Most functions should return an error using a bare return statement. Wrapping every call in an exception trap is tedious and unsightly, removing the appeal of using exceptions in the first place.
But, on rare occasions, failure in a function should cause the entire program to abort. Rather than calling the irrecoverable exit function, you should call die instead, which at least gives the programmer the chance to cope. If no exception handler has been installed via eval, then the program aborts at that point.
To detect this, wrap the call to the function with a block eval. The $@ variable will be set to the offending exception if one occurred; otherwise, it will be false.
eval { $val = func( ) };

warn "func blew up: $@" if $@;
Any eval catches all exceptions, not just specific ones. Usually you should propagate unexpected exceptions to an enclosing handler. For example, suppose your function raised an exception containing the string "Full moon!". You could safely trap that exception while letting others through by inspecting the $@ variable. Calling die without an argument uses the contents of
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Saving Global Values
Inhaltsvorschau
You need to temporarily save away the value of a global variable.
Use the local operator to save a previous global value, automatically restoring it when the current block exits:
our $age = 18;          # declare and set global variable

if (CONDITION) {

    local $age = 23;

    func( );         # sees temporary value of 23

} # Perl restores the old value at block exit
Despite its name, Perl's local operator does not create a local variable. That's what my does. Instead, local merely preserves an existing value for the duration of its enclosing block. Hindsight shows that if local had been called save_value instead, much confusion could have been avoided.
Three places where you must use local instead of my are:
  1. You need to give a global variable a temporary value, especially $_.
  2. You need to create a local file or directory handle or a local function.
  3. You want to temporarily change just one element of an array or hash.

Section 10.13.3.1: Using local( ) for temporary values for globals

The first situation is more apt to happen with predefined, built-in variables than with user variables. Often these are variables that Perl consults for hints for its high-level operations. In particular, any function that uses $_, implicitly or explicitly, should certainly have a local $_. This is annoyingly easy to forget to do. See Recipe 13.15 for one solution to this.
Another common target for local is the $/ variable, a global that implicitly affects the behavior of the readline operator used in <FH> operations.
$para = get_paragraph(*FH);        # pass filehandle glob 

$para = get_paragraph(*FH);        # pass filehandle by glob reference

$para = get_paragraph(*IO{FH});    # pass filehandle by IO reference

sub get_paragraph {

    my $fh = shift;  

    local $/ = '';        

    my $paragraph = <$fh>;

    chomp($paragraph);

    return $paragraph;

}
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Redefining a Function
Inhaltsvorschau
You want to temporarily or permanently redefine a function, but functions can't be assigned to.
To redefine a function, assign a new code reference to the typeglob of the name of that function. Use local if you want this redefinition to be temporary.
undef &grow;                       # silence -w complaints of redefinition

*grow = \&expand;           

grow( );                          # calls expand( )



{

    local *grow = \&shrink;        # only until this block exists

    grow( );                      # calls shrink( )

}
Unlike a variable (but like named filehandles, directory handles, and formats), a named function cannot be directly assigned to. It's just a name and doesn't vary. You can manipulate it almost as though it were a variable, because you can directly manipulate the runtime symbol table using typeglobs like *foo to produce interesting aliasing effects.
Assigning a reference to a typeglob changes what is accessed the next time a symbol of the referent's type is needed. This is what the Exporter does when you import a function or variable from one package into another. Since this is direct manipulation of the package symbol table, it works only on package variables (globals), not lexicals.
*one::var = \%two::Table;   # make %one::var alias for %two::Table

*one::big = \&two::small;   # make &one::big alias for &two::small
A typeglob is one of those things you can only use local on, not my. If you do use local, the aliasing effect is then limited to the duration of the current block.
local *fred = \&barney;     # temporarily alias &fred to &barney
If the value assigned to a typeglob is not a reference but itself another typeglob, then all types by that name are aliased. The types aliased in a full typeglob assignment are scalar, array, hash, function, filehandle, directory handle, and format. That means that assigning
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Trapping Undefined Function Calls with AUTOLOAD
Inhaltsvorschau
You want to intercept calls to undefined functions so you can handle them gracefully.
Declare a function called AUTOLOAD for the package whose undefined function calls you'd like to trap. While running, that package's $AUTOLOAD variable contains the name of the undefined function being called.
Another strategy for creating similar functions is to use a proxy function. If you call an undefined function, instead of automatically raising an exception, you can trap the call. If the function's package has a function named AUTOLOAD, then this function is called in its place, with the special package global $AUTOLOAD set to the package-qualified function name. The AUTOLOAD subroutine can then do whatever that function would do.
sub AUTOLOAD {

    my $color = our $AUTOLOAD;

    $color =~ s/.*:://;

    return "<FONT COLOR='$color'>@_</FONT>";

} 

#note: sub chartreuse isn't defined.

print chartreuse("stuff");
When the nonexistent main::chartreuse function is called, rather than raising an exception, main::AUTOLOAD is called with the same arguments as you passed chartreuse. The package variable $AUTOLOAD would contain the string main::chartreuse because that's the function it's proxying.
The technique using typeglob assignments shown in Recipe 10.14 is faster and more flexible than using AUTOLOAD. It's faster because you don't have to run the copy and substitute. It's more flexible because it lets you do this:
{

    local *yellow = \&violet;  

    local (*red, *green) = (\&green, \&red);

    print_stuff( );

}
While print_stuff( ) is running, including from within any functions it calls, anything printed in yellow will come out violet, and the red and green texts will exchange colors.
Aliasing subroutines like this won't handle calls to undefined subroutines. AUTOLOAD does.
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Nesting Subroutines
Inhaltsvorschau
You want subroutines to nest, such that one subroutine is visible and callable only from another. When you try the obvious approach of nesting sub FOO { sub BAR { } ... }, Perl gives warnings about variables that will not stay shared.
Instead of making the inner functions normal subroutines, make them closures and temporarily assign their references to the typeglob of the right name to create a localized function.
If you use nested subroutines in other programming languages with their own private variables, you'll have to work at it a bit in Perl. The intuitive coding of this kind of thing gives the warning "will not stay shared." For example, this won't work:
sub outer {

    my $x = $_[0] + 35;

    sub inner { return $x * 19 }   # WRONG

    return $x + inner( );

}
The following is a workaround:
sub outer {

    my $x = $_[0] + 35;

    local *inner = sub { return $x * 19 };

    return $x + inner( );

}
Now inner( ) can be called only from within outer( ) because of the temporary assignments of the closure. Once called, it has normal access to the lexical variable $x from the scope of outer( ).
This essentially creates a function local to another function, something not directly supported in Perl; however, the programming isn't always clear.
The sections on "Symbol Tables" in Chapter 10 in Programming Perl and in perlmod(1); the sections on "Closures" and "Symbol Table References" in Chapter 8 of Programming Perl and the discussion of closures in perlref(1); Recipe 10.13; Recipe 11.4
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Writing a Switch Statement
Inhaltsvorschau
You want to write a multiway branch statement, much as you can in C using its switch statement or in the shell using case—but Perl seems to support neither.
Use the Switch module, standard as of the v5.8 release of Perl.
use Switch;

switch ($value) {

    case 17         { print "number 17"       }

    case "snipe"    { print "a snipe"         }

    case /[a-f]+/i  { print "pattern matched" }

    case [1..10,42] { print "in the list"     }

    case (@array)   { print "in the array"    }

    case (%hash)    { print "in the hash"     }

    else            { print "no case applies" }

}
The Switch module extends Perl's basic syntax by providing a powerful and flexible switch construct. In fact, it's so powerful and flexible that instead of a complete description of how it works, we'll instead provide examples of some common uses. For the full story, make sure to consult the documentation that accompanies the module.
A switch takes an argument and a mandatory block, within which can occur any number of cases. Each of those cases also takes an argument and a mandatory block. The arguments to each case can vary in type, allowing (among many other things) any or all of string, numeric, or regex comparisons against the switch's value. When the case is an array or hash (or reference to the same), the case matches if the switch value corresponds to any of the array elements or hash keys. If no case matches, a trailing else block will be executed.
Unlike certain languages' multiway branching constructs, here once a valid case is found and its block executed, control transfers out of the enclosing switch. In other words, there's no implied fall-through behavior the way there is in C. This is considered desirable because even the best of programmers will occasionally forget about fall-through.
However, this is Perl, so you can have your cake and eat it, too. Just use a
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Program: Sorting Your Mail
Inhaltsvorschau
The program in Example 10-1 sorts a mailbox by subject by reading input a paragraph at a time, looking for one with a "From" at the start of a line. When it finds one, it searches for the subject, strips it of any "Re: " marks, and stores its lowercased version in the @sub array. Meanwhile, the messages themselves are stored in a corresponding @msgs array. The $msgno variable keeps track of the message number.
Example 10-1. bysub1
  #!/usr/bin/perl 

  # bysub1 - simple sort by subject

  my(@msgs, @sub);

  my $msgno = -1;

  $/ = '';                    # paragraph reads

  while (<>) {

      if (/^From/m) {

          /^Subject:\s*(?:Re:\s*)*(.*)/mi;

          $sub[++$msgno] = lc($1) || '';

      }

      $msgs[$msgno] .= $_;

  } 

  for my $i (sort { $sub[$a] cmp $sub[$b] || $a <=> $b } (0 .. $#msgs)) {

      print $msgs[$i];

  }
That sort is only sorting array indices. If the subjects are the same, cmp returns 0, so the second part of the || is taken, which compares the message numbers in the order they originally appeared.
If sort were fed a list like (0,1,2,3), that list would get sorted into a different permutation, perhaps (2,1,3,0). We iterate across them with a for loop to print out each message.
Example 10-2 shows how an awk programmer might code this program, using the -00 switch to read paragraphs instead of lines.
Example 10-2. bysub2
  #!/usr/bin/perl -n00

  # bysub2 - awkish sort-by-subject

  INIT { $msgno = -1 }

  $sub[++$msgno] = (/^Subject:\s*(?:Re:\s*)*(.*)/mi)[0] if /^From/m;

  $msg[$msgno] .= $_;

  END { print @msg[ sort { $sub[$a] cmp $sub[$b] || $a <=> $b } (0 .. $#msg) ] }
Perl programmers have used parallel arrays like this since Perl 1. Keeping each message in a hash is a more elegant solution, though. We'll sort on each field in the hash, by making an anonymous hash as described in Chapter 11.
Example 10-3 is a program similar in spirit to Example 10-1 and Example 10-2.
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Chapter 11: References and Records
Inhaltsvorschau
With as little a web as this will I ensnare as great a fly as Cassio.
—Shakespeare, Othello, Act II, scene i
Perl provides three fundamental data types: scalars, arrays, and hashes. It's certainly possible to write many programs without complex records, but most programs need something more sophisticated than simple variables and lists.
Perl's three built-in types combine with references to produce arbitrarily complex and powerful data structures. Selecting the proper data structure and algorithm can make the difference between an elegant program that does its job quickly and an ungainly concoction that's glacially slow to execute and consumes system resources voraciously.
The first part of this chapter shows how to create and use plain references. The second part shows how to create higher-order data structures out of references.
To grasp the concept of references, you must first understand how Perl stores values in variables. Each defined variable has associated with it a name and the address of a chunk of memory. This idea of storing addresses is fundamental to references because a reference is a value that holds the location of another value. The scalar value that contains the memory address is called a reference. Whatever value lives at that memory address is called its referent. See Figure 11-1.
Figure 11-1: Reference and referent
The referent could be any built-in type (scalar, array, hash, ref, code, or glob) or a user-defined type based on one of the built-ins.
Referents in Perl are typed. This means, for example, that you can't treat a reference to an array as though it were a reference to a hash. Attempting to do so raises a runtime exception. No mechanism for type casting exists in Perl. This is considered a feature.
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Introduction
Inhaltsvorschau
Perl provides three fundamental data types: scalars, arrays, and hashes. It's certainly possible to write many programs without complex records, but most programs need something more sophisticated than simple variables and lists.
Perl's three built-in types combine with references to produce arbitrarily complex and powerful data structures. Selecting the proper data structure and algorithm can make the difference between an elegant program that does its job quickly and an ungainly concoction that's glacially slow to execute and consumes system resources voraciously.
The first part of this chapter shows how to create and use plain references. The second part shows how to create higher-order data structures out of references.
To grasp the concept of references, you must first understand how Perl stores values in variables. Each defined variable has associated with it a name and the address of a chunk of memory. This idea of storing addresses is fundamental to references because a reference is a value that holds the location of another value. The scalar value that contains the memory address is called a reference. Whatever value lives at that memory address is called its referent. See Figure 11-1.
Figure 11-1: Reference and referent
The referent could be any built-in type (scalar, array, hash, ref, code, or glob) or a user-defined type based on one of the built-ins.
Referents in Perl are typed. This means, for example, that you can't treat a reference to an array as though it were a reference to a hash. Attempting to do so raises a runtime exception. No mechanism for type casting exists in Perl. This is considered a feature.
So far, it may look as though a reference were little more than a raw address with strong typing. But it's far more than that. Perl takes care of automatic memory allocation and deallocation (garbage collection) for references, just as it does for everything else. Every chunk of memory in Perl has a
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Taking References to Arrays
Inhaltsvorschau
You need to manipulate an array by reference.
To get a reference to an array:
$aref               = \@array;

$anon_array         = [1, 3, 5, 7, 9];

$anon_copy          = [ @array ];

@$implicit_creation = (2, 4, 6, 8, 10);
To deference an array reference, precede it with an at sign (@):
push(@$anon_array, 11);
Or use a pointer arrow plus a bracketed subscript for a particular element:
$two = $implicit_creation->[0];
To get the last index number by reference, or the number of items in that referenced array:
$last_idx  = $#$aref;

$num_items = @$aref;
Or defensively embracing and forcing context:
$last_idx  = $#{ $aref };

$num_items = scalar @{ $aref };
Here are array references in action:
# check whether $someref contains a simple array reference

if (ref($someref) ne "ARRAY") {

    die "Expected an array reference, not $someref\n";

}



print "@{$array_ref}\n";        # print original data



@order = sort @{ $array_ref };  # sort it



push @{ $array_ref }, $item;    # append new element to orig array
If you can't decide whether to use a reference to a named array or to create a new one, here's a simplistic guideline that will prove right more often than not. Only take a reference to an existing array to return the reference out of scope, thereby creating an anonymous array, or to pass the array by reference to a function. For virtually all other cases, use [@array] to create a new array reference with a copy of the old values.
Automatic reference counting and the backslash operator make a powerful combination:
sub array_ref {

    my @array;

    return \@array;

}



$aref1 = array_ref( );

$aref2 = array_ref( );
Each time array_ref is called, the function allocates a new piece of memory for
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Making Hashes of Arrays
Inhaltsvorschau
For each key in a hash, only one scalar value is allowed, but you'd like to use one key to store and retrieve multiple values. That is, you'd like the value to produce a list.
Use references to arrays as the hash values. Use push to append:
push(@{ $hash{"KEYNAME"} }, "new value");
Then, dereference the value as an array reference when printing out the hash:
foreach $string (keys %hash) {

    print "$string: @{$hash{$string}}\n";

}
You can only store scalar values in a hash. References, however, are scalars. This solves the problem of storing multiple values for one key by making $hash{$key} a reference to an array containing the values for $key. Normal hash operations acting on individual scalar values (insertion, deletion, iteration, and testing for existence) are now written with array operations acting on lists of values (like push, splice, and foreach).
Here's how to give a key many values:
$hash{"a key"} = [ 3, 4, 5 ];       # anonymous array
Once you have a key with many values, here's how to use them:
@values = @{ $hash{"a key"} };
To append a new value to the array of values associated with a particular key, use push:
push @{ $hash{"a key"} }, $value;
One common application of such data structures is inverting a hash that may have several keys with the same associated value. When inverted, you end up with a hash that has many values for the same key. This is addressed in Recipe 5.9.
Be warned that this:
@residents = @{ $phone2name{$number} };
causes a runtime exception under use strict because you're dereferencing an undefined reference where autovivification won't occur. You must do this instead:
@residents = exists( $phone2name{$number} )

                       ? @{ $phone2name{$number} }

                       : ( );
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Taking References to Hashes
Inhaltsvorschau
You need to manipulate a hash by reference. This might be because it was passed into a function that way or because it's part of a larger data structure.
To get a hash reference:
$href = \%hash;

$anon_hash = { "key1" => "value1", "key2" => "value2", ... };

$anon_hash_copy = { %hash };
To dereference a hash reference:
%hash  = %$href;

$value = $href->{$key};

@slice = @$href{$key1, $key2, $key3};  # note: no arrow!

@keys  = keys %$href;
To check whether something is a hash reference:
if (ref($someref) ne "HASH") {

    die "Expected a hash reference, not $someref\n";

}
This example prints out all keys and values from two predefined hashes:
foreach $href ( \%ENV, \%INC ) {       # OR: for $href ( \(%ENV,%INC) ) {

    foreach $key ( keys %$href ) {

        print "$key => $href->{$key}\n";

    }

}
Access slices of hashes by reference as you'd access slices of arrays by reference. For example:
@values = @$hash_ref{"key1", "key2", "key3"};



for $val (@$hash_ref{"key1", "key2", "key3"}) {

    $val += 7;   # add 7 to each value in hash slice

}
The Introductionin Chapter 5; Chapter 8 of Programming Perl; perlref(1); Recipe 11.9
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Taking References to Functions
Inhaltsvorschau
You need to manipulate a subroutine by reference. This might happen if you need to create a signal handler, a Tk callback, or a hash of function pointers.
To get a code reference:
$cref = \&func;

$cref = sub { ... };
To call a code reference:
@returned = $cref->(@arguments);

@returned = &$cref(@arguments);
If the name of a function is func, you can produce a reference to it by prefixing its name with \&. You can also dynamically allocate anonymous functions using the sub { } notation. These code references can be stored just like any other reference.
It is possible to store the name of a function in a variable, such as:
$funcname = "thefunc";

&$funcname( );
but that's not a very good solution for several reasons. First, it uses symbolic references, not real (hard) references, and so is forbidden under the use strict "refs" pragma. Symbolic references to variables are usually a bad idea, since they can't access lexical variables, only globals, and aren't reference counted. Second, as written it doesn't include package information, so if executed in a different package, it would try to call the wrong function. Finally, in the odd case that the function were redefined at some point, the symbolic reference would get whatever the current definition for the function was, whereas the hard reference would still access the old definition.
Instead of placing the name of the function in the variable, use the backslash operator to create a reference to the function. This is the normal way to store a function in a variable or pass along to another function. You can mix and match references to named functions with references to unnamed ones:
my %commands = (

    "happy" => \&joy,

    "sad"   => \&sullen,

    "done"  => sub { die "See ya!" },

    "mad"   => \&angry,

);



print "How are you? ";

chomp($string = <STDIN>);

if ($commands{$string}) {

    $commands{$string}->( );

} else {

    print "No such command: $string\n";

}
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Taking References to Scalars
Inhaltsvorschau
You want to create and manipulate a reference to a scalar value.
To create a reference to a scalar variable, use the backslash operator:
$scalar_ref = \$scalar;       # get reference to named scalar
To create a reference to an anonymous scalar value (a value that isn't in a variable), assign to a dereferenced undefined variable:
undef $anon_scalar_ref;

$$anon_scalar_ref = 15;
This creates a reference to a constant scalar:
$anon_scalar_ref = \15;
Use ${...} to dereference:
print ${ $scalar_ref };       # dereference it

${ $scalar_ref } .= "string"; # alter referent's value
If you want to create many new anonymous scalars, use a subroutine that returns a reference to a lexical variable out of scope, as explained in this chapter's Introduction:
sub new_anon_scalar {

    my $temp;

    return \$temp;

}
Dereference a scalar reference by prefacing it with $ to get at its contents:
$sref = new_anon_scalar( );

$$sref = 3;

print "Three = $$sref\n";

@array_of_srefs = ( new_anon_scalar( ), new_anon_scalar( ) );

${ $array[0] } = 6.02e23;

${ $array[1] } = "avocado";

print "\@array contains: ", join(", ", map { $$_ } @array ), "\n";
Notice we put braces around $array[0] and $array[1]. If we tried to say $$array[0], the tight binding of dereferencing would turn it into $array->[0]. It would treat $array as an array reference and return the element at index zero.
Here are other examples where it is safe to omit the braces:
$var        = `uptime`;     # $var holds text

$vref       = \$var;        # $vref "points to" $var

if ($$vref =~ /load/) {  }    # look at $var, indirectly

chomp $$vref;               # alter $var, indirectly
As mentioned in the Introduction, you may use the ref built-in to inspect a reference for its referent's type. Calling
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Creating Arrays of Scalar References
Inhaltsvorschau
You want to create and manipulate an array of references to scalars. This arises when you pass variables by reference to a function so the function can change their values.
To create an array, either backslash each scalar in the list to store in the array:
@array_of_scalar_refs = ( \$a, \$b );
or simply backslash the entire list, taking advantage of the backslash operator's distributive property:
@array_of_scalar_refs = \( $a, $b );
To get or set the value of an element of the list, use ${ ... }:
${ $array_of_scalar_refs[1] } = 12;         # $b = 12
In the following examples, @array is a simple array containing references to scalars (an array of references is not a reference to an array). To access the original data indirectly, braces are mandatory.
($a, $b, $c, $d) = (1 .. 4);        # initialize

@array =  (\$a, \$b, \$c, \$d);     # refs to each scalar

@array = \( $a,  $b,  $c,  $d);     # same thing!

@array = map { \my $anon } 0 .. 3;  # allocate 4 anon scalar refs



${ $array[2] } += 9;                # $c now 12



${ $array[ $#array ] } *= 5;        # $d now 20

${ $array[-1] }        *= 5;        # same; $d now 100



$tmp   = $array[-1];                # using temporary

$$tmp *= 5;                         # $d now 500
The two assignments to @array are equivalent—the backslash operator is distributive across a list. So preceding a list (including a slice or a function's return list, but not an array) with a backslash is the same as applying a backslash to everything in that list. The ensuing code changes the values of the variables whose references were stored in the array.
Here's how to deal with such an array without explicit indexing:
use Math::Trig qw(pi);              # load the constant pi

foreach $sref (@array) {            # prepare to change $a,$b,$c,$d

    ($$sref **= 3) *= (4/3 * pi);   # replace with spherical volumes

}
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Using Closures Instead of Objects
Inhaltsvorschau
You want records with private state, behavior, and identity, but you don't want to learn object-oriented programming to accomplish this.
Write a function that returns (by reference) a hash of code references. These code references are closures created in the same scope, so when they execute, they'll share bindings to the same private variables.
Because a closure is a binding of code and data, it can implement what might be thought of as an object.
Here's an example that creates and returns a hash of anonymous functions. mkcounter takes an argument of a seed counter and returns a reference to a hash of code references that you can use to manipulate the counter indirectly.
$c1 = mkcounter(20);

$c2 = mkcounter(77);



printf "next c1: %d\n", $c1->{NEXT}->( );  # 21

printf "next c2: %d\n", $c2->{NEXT}->( );  # 78

printf "next c1: %d\n", $c1->{NEXT}->( );  # 22

printf "last c1: %d\n", $c1->{PREV}->( );  # 21

printf "old  c2: %d\n", $c2->{RESET}->( ); # 77
The code values in the hash references in $c1 and $c2 maintain their own separate state. Here's how to set that up:
sub mkcounter {

    my $count  = shift;

    my $start  = $count;

    my $bundle = {

        "NEXT"   => sub { return ++$count  },

        "PREV"   => sub { return --$count  },

        "GET"    => sub { return $count    },

        "SET"    => sub { $count = shift   },

        "BUMP"   => sub { $count += shift  },

        "RESET"  => sub { $count = $start  },

    };

    $bundle->{"LAST"} = $bundle->{"PREV"};

    return $bundle;

}
Because the lexical variables used by the closures in the $bundle hash reference are returned by the function, they are not deallocated. The next time mkcounter is called, the closures get a different set of variable bindings for the same code. Because no one outside those closures can access these two variables, this assures true privacy.
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Creating References to Methods
Inhaltsvorschau
You want to store a reference to a method.
Create a closure that makes the proper method call on the appropriate object.
When you ask for a reference to a method, you're asking for more than just a raw function pointer. You also need to record which object the method needs to be called upon as the object contains the data the method will work with. The best way to do this is using a closure. Assuming $obj is lexically scoped, you can say:
$mref = sub { $obj->meth(@_) };

# later...

$mref->("args", "go", "here");
Even when $obj goes out of scope, the closure stored in $mref has captured it. Later when it's called indirectly, the correct object is used for the method call.
Be aware that the notation:
$sref = \$obj->meth;
doesn't do what you probably expected. It first calls the method on that object and gives you either a reference to the return value or a reference to the last of the return values if the method returns a list.
The can method from the UNIVERSAL base class, while appealing, is also unlikely to produce what you want.
$cref = $obj->can("meth");
This produces a code ref to the appropriate method (should one be found), but one that carries no object information. Think of it as a raw function pointer. The information about the object is lost. That's why you need a closure to capture both the object state as well as the method to call.
The discussion on methods in the Introduction to Chapter 13; the section on "Closures" in Chapter 8 of Programming Perl; Recipe 11.7; Recipe 13.8
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Constructing Records
Inhaltsvorschau
You want to create a record data type.
Use a reference to an anonymous hash.
Suppose you wanted to create a data type that contained various data fields. The easiest way is to use an anonymous hash. For example, here's how to initialize and use that record:
$record = {

    NAME   => "Jason",

    EMPNO  => 132,

    TITLE  => "deputy peon",

    AGE    => 23,

    SALARY => 37_000,

    PALS   => [ "Norbert", "Rhys", "Phineas"],

};



printf "I am %s, and my pals are %s.\n",

    $record->{NAME},

    join(", ", @{$record->{PALS}});
Just having one of these records isn't much fun—you'd like to build larger structures. For example, you might want to create a %byname hash that you could initialize and use this way:
# store record

$byname{ $record->{NAME} } = $record;



# later on, look up by name

if ($rp = $byname{"Aron"}) {        # false if missing

    printf "Aron is employee %d.\n", $rp->{EMPNO};

}



# give jason a new pal

push @{$byname{"Jason"}->{PALS}}, "Theodore";

printf "Jason now has %d pals\n", scalar @{$byname{"Jason"}->{PALS}};
That makes %byname a hash of hashes because its values are hash references. Looking up employees by name would be easy using such a structure. If we find a value in the hash, we store a reference to the record in a temporary variable, $rp, which we then use to get any field we want.
We can use our existing hash tools to manipulate %byname. For instance, we could use the each iterator to loop through it in an arbitrary order:
# Go through all records

while (($name, $record) = each %byname) {

    printf "%s is employee number %d\n", $name, $record->{EMPNO};

}
What about looking employees up by employee number? Just build and use another data structure, an array of hashes called @employees. If your employee numbers aren't consecutive (for instance, they jump from 1 to 159997) an array would be a bad choice. Instead, you should use a hash mapping employee number to record. For consecutive employee numbers, use an array:
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Reading and Writing Hash Records to Text Files
Inhaltsvorschau
You want to read or write hash records stored in text files.
Use a simple file format with one field per line:
FieldName: Value
and separate records with blank lines.
If you have an array of records that you'd like to store into and retrieve from a text file, you can use a simple format based on mail headers. The format's simplicity requires that the keys have neither colons nor newlines, and the values not have newlines.
This code writes them out:
foreach $record (@Array_of_Records) {

    for $key (sort keys %$record) {

        print "$key: $record->{$key}\n";

    }

    print "\n";

}
Reading them in is easy, too.
$/ = "";                # paragraph read mode

while (<>) {

    my @fields = split /^([^:]+):\s*/m;

    shift @fields;      # for leading null field

    push(@Array_of_Records, { map /(.*)/, @fields });

}
The split acts upon $_, its default second argument, which contains a full paragraph. The pattern looks for start of line (not just start of record, thanks to the /m) followed by one or more non-colons, followed by a colon and optional whitespace. When split's pattern contains parentheses, these are returned along with the values. The return values placed in @fields are in key-value order, with a leading null field we shift off. The braces in the call to push produce a reference to a new anonymous hash, which we copy @fields into. Since that array was stored in order of the needed key-value pairing, this makes for well-ordered hash contents.
All you're doing is reading and writing a plain text file, so you can use related recipes for additional components. You could use Recipe 7.18 to ensure that you have clean, concurrent access; Recipe 1.18 to store colons and newlines in keys and values; and Recipe 11.3 to store more complex structures.
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Printing Data Structures
Inhaltsvorschau
You want to print out a data structure.
If the output's legibility and layout are important, write your own custom printing routine.
If you are in the Perl debugger, use the x command:
DB<1> $reference = [ { "foo" => "bar" }, 3, sub { print "hello, world\n" } ];

DB<2> x $reference

  

               0  ARRAY(0x1d033c)

               0  HASH(0x7b390)

               foo' = 'bar

               1  3

               2  CODE(0x21e3e4)

               -> &main::_ _ANON_ _[(eval 15)[/usr/local/...perl5db.pl:17]:2]

               in (eval 15)[/usr/local/.../perl5db.pl:17]:2-2

            
From within your own programs, use the Dumper function from the standard module Data::Dumper:
use Data::Dumper;

print Dumper($reference);
Or if you'd like output formatted in the same style as the Debugger uses:
use Dumpvalue;

Dumpvalue->new->dumpValue($reference);
Sometimes you'll want to make a dedicated function for your data structure that delivers a particular output format, but often this is overkill. If you're running under the Perl debugger, the x and X commands provide nice pretty-printing. The x command is more useful because it works on both global and lexical variables, whereas X works only on globals. Pass x a reference to the data structure you want to print.
DB<3> x @INC

  

               0  ARRAY(0x807d0a8)

               0  '/home/tchrist/perllib

               1  '/usr/lib/perl5/i686-linux/5.00403

               2  '/usr/lib/perl5

               3  '/usr/lib/perl5/site_perl/i686-linux

               4  '/usr/lib/perl5/site_perl

               5  '.

            
The standard Dumpvalue module provides the Debugger's output formatting using an object-oriented interface. Here's an example:
use Dumpvalue;

Dumpvalue->new->dumpvars("main", "INC");



@INC = (

               
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Copying Data Structures
Inhaltsvorschau
You need to copy a complex data structure.
Use the dclone function from the standard Storable module:
use Storable;



$r2 = dclone($r1);
Two types of "copy" are sometimes confused. A surface copy (also known as shallow copy) simply copies references without creating copies of the data behind them:
@original = ( \@a, \@b, \@c );

@surface = @original;
A deep copy creates an entirely new structure with no overlapping references. This copies references to one layer deep:
@deep = map { [ @$_ ] } @original;
If @a, @b, and @c themselves contain references, the preceding map is no longer adequate. Writing your own code to deep-copy structures is laborious and rapidly becomes tiresome.
The Storable module provides a function called dclone that recursively copies its argument:
use Storable qw(dclone);

$r2 = dclone($r1);
This only works on references or blessed objects of type SCALAR, ARRAY, HASH, or CODE; references of type GLOB, IO, and the more esoteric types are not supported. The safeFreeze function from the FreezeThaw module supports even these types when used in the same address space by using a reference cache that could interfere with garbage collection and object destructors under some circumstances.
Because dclone takes and returns references, you must add extra punctuation if you have a hash or arrays to copy:
%newhash = %{ dclone(\%oldhash) };
The documentation for the standard Storable and Data::Dumper modules, and for the FreezeThaw CPAN module.
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Storing Data Structures to Disk
Inhaltsvorschau
You want to save your large, complex data structure to disk so you don't have to reconstruct it from scratch each time your program runs.
Use the Storable module's store and retrieve functions:
use Storable;

store(\%hash, "filename");



# later on...

$href = retrieve("filename");        # by ref

%hash = %{ retrieve("filename") };   # direct to hash
The Storable module uses C functions and a binary format to walk Perl's internal data structures and lay out its data. It's more efficient than a pure Perl and string-based approach, but it's also more fragile.
The store and retrieve functions expect binary data using the machine's own byte-ordering. This means files created with these functions cannot be shared across different architectures. nstore does the same job store does, but keeps data in canonical (network) byte order, at a slight speed cost:
use Storable qw(nstore);

nstore(\%hash, "filename");

# later ...

$href = retrieve("filename");
No matter whether store or nstore was used, you need to call the same retrieve routine to restore the objects in memory. The producer must commit to portability, but the consumer doesn't have to. Code need only be changed in one place when the producer has a change of heart; the code thus offers a consistent interface for the consumer, who does not need to know or care.
The store and nstore functions don't lock the files they work on. If you're worried about concurrent access, open the file yourself, lock it using Recipe 7.18, and then use store_fd or its slower but machine-independent version nstore_fd.
Here's code to save a hash to a file, with locking. We don't open with the O_TRUNC flag because we have to wait to get the lock before we can clobber the file.
use Storable qw(nstore_fd);

use Fcntl qw(:DEFAULT :flock);

sysopen(DF, "/tmp/datafile", O_RDWR|O_CREAT, 0666)

        or die "can't open /tmp/datafile: $!";

flock(DF, LOCK_EX)                   or die "can't lock /tmp/datafile: $!";

nstore_fd(\%hash, *DF)

    or die "can't store hash\n";

truncate(DF, tell(DF));

close(DF);
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Transparently Persistent Data Structures
Inhaltsvorschau
You have a complex data structure that you want to persist outside your program.
Use MLDBM and either (preferably) DB_File or else GDBM_File:
use MLDBM qw(DB_File);

use Fcntl;



tie(%hash, "MLDBM", "testfile.db", O_CREAT|O_RDWR, 0666)

    or die "can't open tie to testfile.db: $!";



# ... act on %hash



untie %hash;
A hash with 100,000 items in it would undoubtably take considerable time to build. Storing this to disk, either slowly by hand or quickly with Storable, is still an expensive operation in memory and computation.
The DBM modules solve this by tying hashes to disk database files. Rather than reading the whole structure in at once, they only pull in what they need, when they need it. To the user, it looks like a hash that persists across program invocations.
Unfortunately, the values in this persistent hash must be plain strings. You cannot readily use a database file as a backing store for a hash of hashes, a hash of arrays, and so on—just for a hash of strings.
However, the MLDBM module from CPAN allows you to store references in a database. It uses Data::Dumper to stringify these references for external storage:
use MLDBM qw(DB_File);

use Fcntl;

tie(%hash, "MLDBM", "testfile.db", O_CREAT|O_RDWR, 0666)

    or die "can't open tie to testfile.db: $!";
Now you can use %hash to fetch or store complex records from disk. The only drawback is that you can't access the references piecemeal. You have to pull in the reference from the database, work with it, and then store it back.
# this doesn't work!

$hash{"some key"}[4] = "fred";



# RIGHT

$aref = $hash{"some key"};

$aref->[4] = "fred";

$hash{"some key"} = $aref;
Recipe 11.13
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Coping with Circular Data Structures Using Weak References
Inhaltsvorschau
You have an inherently self-referential data structure, so Perl's reference-based garbage collection system won't notice when that structure is no longer being used. You want to prevent your program from leaking memory.
Convert all internal references within the data structure into weak references so they don't increment the reference count.
Perl's memory management system relies on an underlying reference count to know when to reclaim memory. In practice, this works fairly well except for one particular situation: when a variable directly or indirectly points at itself. Consider:
{

    my ($a, $b); 

    ($a, $b) = \($b, $a);   # same as (\$b, \$a);

}
The two underlying scalars that $a and $b represent each start out with a reference count of one apiece in the first line of the block. In the second line, those scalars are each initialized to contain a reference to the other variable; $a points to $b and vice versa. Saving a reference increments the underlying reference count on the scalars, so now both refcounts are set to two. As the block exits and those lexical variables become unreachable (by name), both refcounts are decremented by one, leaving one in each—forever. Since the refcounts can never reach zero, memory used by those two underlying scalars will never be reclaimed. You'll leak two scalars every time that block executes; if it's a loop or a subroutine, you could eventually run out of memory.
The standard Devel::Peek module's Dump function shows you underlying reference counts, plus a whole lot more. This code:
use Devel::Peek;

$a = 42;

$b = \$a;

Dump $a;
produces this output:
SV = IV(0xd7cc4) at 0xd72b8

  REFCNT = 2

  FLAGS = (IOK,pIOK)

  IV = 42
The important thing to notice there is that the refcount is two. That's because the scalar can be reached two ways: once via the variable named
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Program: Outlines
Inhaltsvorschau
Outlines are a simple (and thus popular) way of structuring data. The hierarchy of detail implied by an outline maps naturally to our top-down way of thinking about the world. The only problem is that it's not obvious how to represent outlined data as a Perl data structure.
Take, for example, this simple outline of some musical genres:
Alternative

.Punk

..Emo

..Folk Punk

.Goth

..Goth Rock

..Glam Goth

Country

.Old Time

.Bluegrass

.Big Hats

Rock

.80s

..Big Hair

..New Wave

.60s

..British

..American
Here we use a period to indicate a subgroup. There are many different formats in which that outline could be output. For example, you might write the genres out in full:
Alternative

Alternative - Punk

Alternative - Punk - Emo

Alternative - Punk - Folk Punk

Alternative - Goth

...
You might number the sections:
1 Alternative

1.1 Punk

1.1.1 Emo

1.1.2 Folk Punk

1.2 Goth

...
or alphabetize:
Alternative

Alternative - Goth

Alternative - Goth - Glam Goth

Alternative - Goth - Goth Rock

Alternative - Punk

Alternative - Punk - Emo

...
or show inheritance:
Alternative

Punk - Alternative

Emo - Punk - Alternative

Folk Punk - Punk - Alternative

Goth - Alternative

Goth Rock - Goth - Alternative

...
These transformations are all much easier than it might seem. The trick is to represent the levels of the hierarchy as elements in an array. For example, you'd represent the third entry in the sample outline as:
@array = ("Alternative", "Goth", "Glam Goth");
Now reformatting the entry is trivial. There's an elegant way to parse the input file to get this array representation:
while (<FH>) {

  chomp;

  $tag[$in = s/\G\.//g] = $_;

  # do something with @tag[0..$in]

}
The substitution deletes leading periods from the current entry, returning how many it deleted. This number indicates the indentation level of the current entry.
Alphabetizing is now simple using the Unix sort program:
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Program: Binary Trees
Inhaltsvorschau
Because Perl's built-in data types are already powerful, high-level, dynamic data types in their own right, most code can use what's already provided. If you just want quick lookups, you nearly always want to use a simple hash. As Larry has said, "The trick is to use Perl's strengths rather than its weaknesses."
However, hashes provide no inherent ordering. To traverse the hash in a particular order, you must first extract its keys and then sort them. If you find yourself doing so many times, performance will suffer, but probably not enough to justify the time required to craft a fancy algorithm.
A tree structure provides ordered traversals. How do you write a tree in Perl? First, you grab one of your favorite textbooks on data structures; the authors recommend Cormen et al., as mentioned in Other Books in the Preface. Using an anonymous hash to represent each node in the tree, translate the algorithms in the book into Perl. This is usually much more straightforward than you would imagine.
The program code in Example 11-1 demonstrates an ordered binary tree implementation using anonymous hashes. Each node has three fields: a left child, a right child, and a value. The crucial property of an ordered binary tree is that at every node, all left children have values that are less than the current node value, and all right children have values that are greater.
The main program does three things. First, it creates a tree with 20 random nodes. Then it shows the in-order, pre-order, and post-order traversals of that tree. Finally, it allows the user to enter a key and reports whether that key is in the tree.
The insert function takes advantage of Perl's implicit pass-by-reference behavior on scalars to initialize an empty tree when asked to insert into an empty node. The assignment of the new node back to $_[0] alters the value in its caller.
Although this data structure takes much more memory than a simple hash and the lookups are slower, the ordered traversals themselves are faster.
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Chapter 12: Packages, Libraries, and Modules
Inhaltsvorschau
Like all those possessing a library, Aurelian was aware that he was guilty of not knowing his in its entirety.
—Jorge Luis Borges, The Theologians
Imagine that you have two separate programs, both of which work fine by themselves, and you decide to make a third program that combines the best features from the first two. You copy both programs into a new file or cut and paste selected pieces. You find that the two programs had variables and functions with the same names that should remain separate. For example, both might have an init function or a global $count variable. When merged into one program, these separate parts would interfere with each other.
The solution to this problem is packages. Perl uses packages to partition the global namespace. The package is the basis for both traditional modules and object-oriented classes. Just as directories contain files, packages contain identifiers. Every global identifier (variables, functions, file and directory handles, and formats) has two parts: its package name and the identifier proper. These two pieces are separated from one another with a double colon. For example, the variable $CGI::needs_binmode is a global variable named $needs_binmode, which resides in package CGI.
Where the filesystem uses slashes to separate the directory from the filename, Perl uses a double colon. $Names::startup is the variable named $startup in the package Names, whereas $Dates::startup is the $startup variable in package Dates. Saying $startup by itself without a package name means the global variable $startup in the current package. (This assumes that no lexical $startup variable is currently visible. Lexical variables are explained in Chapter 10.) When looking at an unqualified variable name, a lexical takes precedence over a global. Lexicals live in scopes; globals live in packages. If you really want the global instead, you need to fully qualify it.
package is a compile-time declaration that sets the default package prefix for unqualified global identifiers, much as
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Introduction
Inhaltsvorschau
Imagine that you have two separate programs, both of which work fine by themselves, and you decide to make a third program that combines the best features from the first two. You copy both programs into a new file or cut and paste selected pieces. You find that the two programs had variables and functions with the same names that should remain separate. For example, both might have an init function or a global $count variable. When merged into one program, these separate parts would interfere with each other.
The solution to this problem is packages. Perl uses packages to partition the global namespace. The package is the basis for both traditional modules and object-oriented classes. Just as directories contain files, packages contain identifiers. Every global identifier (variables, functions, file and directory handles, and formats) has two parts: its package name and the identifier proper. These two pieces are separated from one another with a double colon. For example, the variable $CGI::needs_binmode is a global variable named $needs_binmode, which resides in package CGI.
Where the filesystem uses slashes to separate the directory from the filename, Perl uses a double colon. $Names::startup is the variable named $startup in the package Names, whereas $Dates::startup is the $startup variable in package Dates. Saying $startup by itself without a package name means the global variable $startup in the current package. (This assumes that no lexical $startup variable is currently visible. Lexical variables are explained in Chapter 10.) When looking at an unqualified variable name, a lexical takes precedence over a global. Lexicals live in scopes; globals live in packages. If you really want the global instead, you need to fully qualify it.
package is a compile-time declaration that sets the default package prefix for unqualified global identifiers, much as chdir sets the default directory prefix for relative pathnames. This effect lasts until the end of the current scope (a brace-enclosed block, file, or
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Defining a Module's Interface
Inhaltsvorschau
You want the standard Exporter module to define the external interface to your module.
In module file YourModule.pm, place the following code. Fill in the ellipses as explained in the Discussion section.
package YourModule;

use strict;

our (@ISA, @EXPORT, @EXPORT_OK, %EXPORT_TAGS, $VERSION);



use Exporter;

$VERSION = 1.00;              # Or higher

@ISA = qw(Exporter);



@EXPORT      = qw(...);       # Symbols to autoexport (:DEFAULT tag)

@EXPORT_OK   = qw(...);       # Symbols to export on request

%EXPORT_TAGS = (              # Define names for sets of symbols

    TAG1 => [...],

    TAG2 => [...],

    ...

);



########################

# your code goes here

########################



1;                            # this should be your last line
In other files where you want to use YourModule, choose one of these lines:
use YourModule;               # Import default symbols into my package

use YourModule qw(...);       # Import listed symbols into my package

use YourModule ( );            # Do not import any symbols

use YourModule qw(:TAG1);     # Import whole tag set
The standard Exporter module handles the module's external interface. Although you could define your own import method for your package, almost no one does this.
When someone says use YourModule, this does a require "YourModule.pm" statement followed a YourModule->import( ) method call, both during compile time. The import method inherited from the Exporter package looks for global variables in your package to govern its behavior. Because they must be package globals, we've declared them with our to satisfy use strict. These variables are:
$VERSION
When a module is loaded, a minimal required version number can be supplied. If the version isn't at least this high, the
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Trapping Errors in require or use
Inhaltsvorschau
You need to load in a module that might not be present on your system. This normally results in a fatal exception. You want to detect and trap these failures.
Wrap the require or use in an eval, and wrap the eval in a BEGIN block:
# no import

BEGIN {

    unless (eval "require $mod; 1") {

        warn "couldn't require $mod: $@";

    }

}



# imports into current package

BEGIN {

    unless (eval "use $mod; 1") {

        warn "couldn't use $mod: $@";

    }

}
You usually want a program to fail if it tries to load a module that is missing or doesn't compile. Sometimes, though, you'd like to recover from that error, perhaps trying an alternative module instead. As with any other exception, you insulate yourself from compilation errors with an eval.
You don't want to use eval { BLOCK }, because this traps only runtime exceptions, and use is a compile-time event. Instead, you must use eval "string" to catch compile-time problems as well. Remember, require on a bareword has a slightly different meaning than require on a variable. It adds a ".pm" and translates double-colons into your operating system's path separators, canonically / (as in URLs), but sometimes \, :, or even . on some systems.
If you need to try several modules in succession, stopping at the first one that works, you could do something like this:
BEGIN {

    my($found, @DBs, $mod);

    $found = 0;

    @DBs = qw(Giant::Eenie Giant::Meanie Mouse::Mynie Moe);

    for $mod (@DBs) {

        if (eval "require $mod") {

            $mod->import( );         # if needed

            $found = 1;

            last;

        }

    }

    die "None of @DBs loaded" unless $found;

}
We wrap the eval in a BEGIN block to ensure the module-loading happens at compile time instead of runtime.
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Delaying use Until Runtime
Inhaltsvorschau
You have a module that you don't need to load each time the program runs, or whose inclusion you wish to delay until after the program starts up.
Either break up the use into its separate require and import components, or else employ the use autouse pragma.
Programs that check their arguments and abort with a usage message on error have no reason to load modules they never use. This delays the inevitable and annoys users. But those use statements happen during compilation, not execution, as explained in the Introduction.
Here, an effective strategy is to place argument checking in a BEGIN block before loading the modules. The following is the start of a program that checks to make sure it was called with exactly two arguments, which must be whole numbers, before going on to load the modules it will need:
BEGIN {

    unless (@ARGV =  = 2 && (2 =  = grep {/^\d+$/} @ARGV)) {

        die "usage: $0 num1 num2\n";

    }

}

use Some::Module;

use More::Modules;
A related situation arises in programs that don't always use the same set of modules every time they're run. For example, the factors program from Chapter 2 needs the infinite precision arithmetic library only when the -b command-line flag is supplied. A use statement would be pointless within a conditional because it's evaluated at compile time, long before the if can be checked. So we use a require instead:
if ($opt_b) {

    require Math::BigInt;

}
Because Math::BigInt is an object-oriented module instead of a traditional one, no import was needed. If you have an import list, specify it with a qw( ) construct as you would with use. For example, rather than this:
use Fcntl qw(O_EXCL O_CREAT O_RDWR);
you might say this instead:
require Fcntl;

Fcntl->import(qw(O_EXCL O_CREAT O_RDWR));
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Making Variables Private to a Module
Inhaltsvorschau
You want to make a variable private to a package.
You can't. But you can make them private to the file that the module sits in, which usually suffices.
Remember that a package is just a way of grouping variables and functions together, conferring no privacy. Anything in a package is by definition global and accessible from anywhere. Packages only group; they don't hide.
For privacy, only lexical variables will do. A module is implemented in a Module.pm file, with all its globals in the package named Module. Because that whole file is by definition a scope and lexicals are private to a scope, creating file-scoped lexicals is effectively the same thing as a module-private variable.
If you alternate packages within a scope, though, you may be surprised that the scope's lexicals are still visible throughout that scope. That's because a package statement only sets a different prefix for a global identifier; it does not end the current scope, not does it begin a new one.
package Alpha;

my $aa = 10;

   $x = "azure";



package Beta;

my $bb = 20;

   $x = "blue";



package main;

print "$aa, $bb, $x, $Alpha::x, $Beta::x\n";

10, 20, , azure, blue

            
Was that the output you expected? The two lexicals, $aa and $bb, are still in scope because we haven't left the current block, file, or eval. You might think of globals and lexicals as existing in separate dimensions, forever unrelated to each other. Package statements have nothing to do with lexicals. By setting the current prefix, the first global variable $x is really $Alpha::x, whereas the second $x is now $Beta::x because of the intervening package statement changing the default prefix. Package identifiers, if fully qualified, can be accessed from anywhere, as we've done in the print statement.
So, packages can't have privacy—but modules can because they're in a file, which is always its own scope. Here's a simple module, placed in the file
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Making Functions Private to a Module
Inhaltsvorschau
You want to make a function private to a package.
You can't. But you can make a private variable and store a reference to an anonymous function in it.
# this is the file SomeModule.pm

package Some_Module;



my $secret_function = sub {

    # your code here

};



sub regular_function {

    # now call your "private" function via the code ref

    $secret_function->(ARG1, ARG2);

}
Even a function that isn't exported can still be accessed by anyone, anywhere if they qualify that function's name with its package. That's because function names are always in the package symbol table, which is globally accessible.
By creating a lexical variable at the file scope, code in that module file below the point of declaration has full access to that variable. Code in other files will not, because those scopes are unrelated. The subroutine created via sub { .... } is anonymous, so there's no name in the symbol table for anyone outside to find. Not even other code in the module can call the function by name, since it doesn't have one, but that code can use the lexical variable to dereference the code reference indirectly.
$secret_function->(ARGS);                # infix deref form

&$secret_function(ARGS);                 # prefix deref form
Curiously, if you really wanted to, you could give this anonymous function a temporary name. Using the technique outlined in Recipe 10.16, assign the code reference to a localized typeglob, like this:
sub module_function {

    local *secret = $secret_function;

    Other_Package::func1( );

    secret(ARG1, ARG2);

    Yet_Another_Package::func2( );

}
Now for the duration of module_function, your previously secret function can be called using a direct function call; no indirection required. However, code outside the module can also find that function. In the example, it doesn't matter whether
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Determining the Caller's Package
Inhaltsvorschau
You need to find out the current or calling package.
To find the current package:
$this_pack = _ _PACKAGE_ _;
To find the caller's package:
$that_pack = caller( );
The _ _PACKAGE_ _ symbol returns the package that the code is currently being compiled into. This doesn't interpolate into double-quoted strings:
print "I am in package _ _PACKAGE_ _\n";              # WRONG!

I am in package _ _PACKAGE_ _

            
Needing to figure out the caller's package arose more often in older code that received as input a string of code to be evaluated, or a filehandle, format, or directory handle name. Consider a call to a hypothetical runit function:
package Alpha;

runit('$line = <TEMP>');



package Beta;

sub runit {

    my $codestr = shift;

    eval $codestr;

    die if $@;

}
Because runit was compiled in a different package than was currently executing, when the eval runs, it acts as though it were passed $Beta::line and Beta::TEMP. The old workaround was to include your caller's package first:
package Beta;

sub runit {

    my $codestr = shift;

    my $hispack = caller;

    eval "package $hispack; $codestr";

    die if $@;

}
That approach works only when $line is a global variable. If it's lexical, that won't help at all. Instead, arrange for runit to accept a reference to a subroutine:
package Alpha;

runit( sub { $line = <TEMP> } );



package Beta;

sub runit {

    my $coderef = shift;

    &$coderef( );

}
This not only works with lexicals, but has the added benefit of checking the code's syntax at compile time, which is a major win.
If all that's being passed in is a filehandle, it's more portable to use the Symbol::qualify function. This function takes a name and package to qualify the name into. If the name needs qualification, it fixes it; otherwise, it's left alone. But that's considerably less efficient than a
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Automating Module Cleanup
Inhaltsvorschau
You need to create module setup code and cleanup code that gets called automatically, without user intervention.
For setup code, put executable statements outside subroutine definitions in the module file. For cleanup code, use an END subroutine in that module.
In some languages, the programmer must remember to call module initialization code before accessing any of that module's regular functions. Similarly, when the program is done, the programmer may have to call module-specific finalization code.
Not so in Perl. For per-module initialization code, executable statements outside of any subroutines in your module suffice. When the module is loaded in, that code runs right then and there. The user never has to remember to do this, because it's done automatically.
Now, why would you want automatic cleanup code? It depends on the module. You might want to write a shutdown message to a logfile, tell a database server to commit any pending state, refresh a screen, or return the tty to its original state.
Suppose you want a module to log quietly whenever a program using it starts up or finishes. Add code in an END subroutine to run after your program finishes:
$Logfile = "/tmp/mylog" unless defined $Logfile;

open(LF, ">>", $Logfile)

    or die "can't append to $Logfile: $!";

select(((select(LF), $|=1))[0]);  # unbuffer LF

logmsg("startup");



sub logmsg {

    my $now = scalar gmtime;

    print LF "$0 $$ $now: @_\n"

        or die "write to $Logfile failed: $!";

}



END {

    logmsg("shutdown");

    close(LF)                     

        or die "close $Logfile failed: $!";

}
The first part of code, outside any subroutine declaration, is executed at module load time. The module user doesn't have to do anything special to make this happen. Someone might be unpleasantly surprised, however, if the file couldn't be accessed, since the
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Keeping Your Own Module Directory
Inhaltsvorschau
You don't want to install your own personal modules in the standard per-system extension library.
You have several choices: use Perl's -I command line switch; set your PERL5LIB environment variable; or employ the use lib pragma, possibly in conjunction with the FindBin module.
The @INC array contains a list of directories to consult when do, require, or use pulls in code from another file. You can print these out easily from the command line:
% perl -e 'printf "%d %s\n", $i++, $_ for @INC'

0 /usr/local/lib/perl5/5.8.0/OpenBSD.i386-openbsd

               1 /usr/local/lib/perl5/5.8.0

               2 /usr/local/lib/perl5/site_perl/5.8.0/OpenBSD.i386-openbsd

               3 /usr/local/lib/perl5/site_perl/5.8.0

               4 /usr/local/lib/perl5/site_perl/5.6.0

               5 /usr/local/lib/perl5/site_perl/5.00554

               6 /usr/local/lib/perl5/site_perl/5.005

               7 /usr/local/lib/perl5/site_perl

               8 

            
The first two directories, elements 0 and 1 of @INC, are respectively the standard architecture-dependent and architecture-independent directories, which all standard libraries, modules, and pragmas will go into. You have two of them because some modules contain information or formatting that makes sense only on that particular architecture. For example, the Config module contains information that cannot be shared across several architectures, so it goes in the 0th array element. Modules that include compiled C components, such as Socket.so, are also placed there. Most modules, however, go in the platform-independent directory in the 1st element.
The next pair, elements 2 and 3, fulfills roles analogous to elements and 1, but on a site-specific basis. Suppose you have a module that didn't come with Perl, such as one from CPAN or that you wrote yourself. When you or (more likely) your system administrator installs this module, its components go into one of the site-specific directories. You are encouraged to use these for any modules that your entire site should be able to access conveniently.
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Preparing a Module for Distribution
Inhaltsvorschau
You want to prepare your module in standard distribution format so you can easily send your module to a friend. Better yet, you plan to contribute your module to CPAN so everyone can use it.
It's best to start with Perl's standard h2xs tool. Let's say you want to make a Planets module or an Astronomy::Orbits module. You'd type:
% h2xs -XA -n Planets

% h2xs -XA -n Astronomy::Orbits
These commands make subdirectories called ./Planets/ and ./Astronomy/Orbits/, respectively, where you will find all the components you need to get you started. The -n flag names the module you want to make, -X suppresses creation of XS (external subroutine) components, and -A means the module won't use the AutoLoader.
Writing modules is easy—once you know how. Writing a proper module is like filling out a legal contract: it's full of places to initial, sign, and date exactly right. If you miss any, it's not valid. Instead of hiring a contract lawyer, you can get a quick start on writing modules using the h2xs program. This tool gives you a skeletal module file with the right parts filled in, and it also gives you the other files needed to correctly install your module and its documentation or to bundle up for contributing to CPAN or sending off to a friend.
h2xs is something of a misnomer because XS is Perl's external subroutine interface for linking with C or C ++. But the h2xs tool is also extremely convenient for preparing a distribution even when you aren't using the XS interface.
Let's look at the module file that h2xs has made. Because the module is called Astronomy::Orbits, the user specifies not use Orbits but rather use Astronomy::Orbits. Therefore an extra Astronomy subdirectory is made, under which an Orbits subdirectory is placed. Here is the first and perhaps most important line of Orbit.pm:
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Speeding Module Loading with SelfLoader
Inhaltsvorschau
You'd like to load a very large module quickly.
Use the SelfLoader module:
require Exporter;

require SelfLoader;

@ISA = qw(Exporter SelfLoader);

#

# other initialization or declarations here

#

_ _DATA_ _

sub abc { .... }

sub def { .... }
When you load a module using require or use, the entire module file must be read and compiled (into internal parse trees, not into byte code or native machine code) right then. For very large modules, this annoying delay is unnecessary if you need only a few functions from a particular file.
To address this problem, the SelfLoader module delays compilation of each subroutine until that subroutine is actually called. SelfLoader is easy to use: just place your module's subroutines underneath the _ _DATA_ _ marker so the compiler will ignore them, use a require to pull in the SelfLoader, and include SelfLoader in the module's @ISA array. That's all there is to it. When your module is loaded, the SelfLoader creates stub functions for all routines below _ _DATA_ _. The first time a function gets called, the stub replaces itself by first compiling the real function and then calling it.
There is one significant restriction on modules that employ the SelfLoader (or the AutoLoader for that matter, described in Recipe 12.11). SelfLoaded or AutoLoaded subroutines have no access to lexical variables in the file whose _ _DATA_ _ block they are in because they are compiled via eval in an imported AUTOLOAD block. Such dynamically generated subroutines are therefore compiled in the scope of SelfLoader's or AutoLoader's AUTOLOAD.
Whether the SelfLoader helps or hinders performance depends on how many subroutines the module has, how large they are, and whether they are all called over the lifetime of the program or not.
You should initially develop and test your module without SelfLoader. Commenting out the
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Speeding Up Module Loading with Autoloader
Inhaltsvorschau
You want to use the AutoLoader module.
The easiest solution is to use the h2xs facility to create a directory and all the files you need. Here we assume you have your own directory, ~/perllib/, which contains your personal library modules.
% h2xs -Xn Sample

% cd Sample

% perl Makefile.PL LIB=~/perllib

% (edit Sample.pm)

% make install
The AutoLoader addresses the same performance issues as the SelfLoader. It also provides stub functions that get replaced by real ones the first time they're called. But instead of looking for functions all in the same file, hidden below a _ _DATA_ _ marker, the AutoLoader expects to find the real definition for each function in its own file. If your Sample.pm module had two functions, foo and bar, then the AutoLoader would expect to find them in Sample/auto/foo.al and Sample/auto/bar.al, respectively. Modules employing the AutoLoader load faster than those using the SelfLoader, but at the cost of extra files, disk space, and complexity.
This setup sounds complicated. If you were doing it manually, it probably would be. Fortunately, h2xs helps out tremendously here. Besides creating a module directory with templates for your Sample.pm file and other files you need, it also generates a Makefile that uses the AutoSplit module to break your module's functions into little files, one function per file. The make install rule installs these so they will be found automatically. All you have to do is put the module functions down below an _ _END_ _ line (rather than a _ _DATA_ _ line as in SelfLoader) that h2xs already created.
As with the SelfLoader, it's easier to develop and test your module without the AutoLoader. Just comment out the _ _END_ _ line while developing it.
The same restrictions about invisibility of file lexicals that apply to modules using the SelfLoader also apply when using the AutoLoader, so using file lexicals to maintain private state doesn't work. If state is becoming that complex and significant an issue, consider writing an object module instead of a traditional one.
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Overriding Built-in Functions
Inhaltsvorschau
You want to replace a standard, built-in function with your own version.
Import that function from another module into your own namespace.
Suppose you want to give a function of your own the same name as one of Perl's core built-ins. If you write:
sub time { "it's howdy doody time" }

print time( );
then you won't get your function called—you'll still get Perl's original, built-in version. You could use an explicit ampersand to call the function:
print &time( );
because that always gets your function, never the built-in. But then you forego any prototype checking and context coercion on the function's arguments. However, there is a way to override that.
Many (but not all) of Perl's built-in functions may be overridden. This is not something to be attempted lightly, but it is possible. You might do this, for example, if you are running on a platform that doesn't support the function that you'd like to emulate. Or, you might want to add your own wrapper around the built-in.
Not all reserved words have the same status. Those that return a negative number in the C-language keyword( ) function in the toke.c file in your Perl source kit may be overridden. Keywords that cannot be overridden as of v5.8.1 are defined, delete, do, else, elsif, eval, exists, for, foreach, format, glob, goto, grep, if, last, local, m, map, my, next, no, our, package, pos, print, printf, prototype, q, qq, qr, qw, qx, redo, require, return, s, scalar, sort, split, study, sub, tie, tied, tr, undef, unless, untie, until, use, while, and y. The rest can.
A standard Perl module that overrides a built-in is Cwd, which can overload chdir. Others are the by-name versions of functions that return lists: File::stat, Net::hostent, Net::netent, Net::protoent, Net::servent, Time::gmtime, Time::localtime, Time::tm, User::grent, and User::pwent. These modules all override built-in functions like
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Overriding a Built-in Function in All Packages
Inhaltsvorschau
You want to change the definition of a core built-in function within your entire program, not just the current package.
Manually import, via direct symbol-table manipulation, the function into the CORE::GLOBAL pseudopackage.
*CORE::GLOBAL::int = \&myown_int;
The technique demonstrated in the previous recipe only overrides a built-in in a particular package. It doesn't change everything for your whole program, no matter what package that function is called from. To do so would risk changing the behavior of code from modules you didn't write, and which were therefore not prepared for the change.
It has been said that Unix was not designed to stop you from doing stupid things, because that would also stop you from doing clever things. So, too, with Perl. Just because overriding a function in all packages at once might seem, well, imprudent doesn't mean a clever person won't someday find a marvelous use for such a facility.
For example, let's suppose that you've decided that the core int function's behavior of integer truncation, also known as rounding toward zero, is so annoying to your program that you want to provide an alternative by the same name. This would do it:
package Math::Rounding;

use warnings;

use Carp;

use Exporter;

our @EXPORT = qw(int);

our @ISA    = qw(Exporter);



sub int(;$) {

    my $arg = @_ ? shift : $_;

    use warnings FATAL => "numeric";  # promote to die( )ing

    my $result = eval { sprintf("%.0f", $arg) };

    if ($@) {

        die if $@ !~ /isn't numeric/;

        $@ =~ s/ in sprintf.*/ in replacement int/s;

        croak $@;

    } else {

        return $result;

    } 

}
Your replacement version uses sprintf( ) to round to the closest integer. It also raises an exception if passed a non-numeric string. A program could access this function either by saying:
use Math::Rounding ( );

$y = Math::Rounding::int($x);
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Reporting Errors and Warnings Like Built-ins
Inhaltsvorschau
You want to generate errors and warnings in your modules, but when you use warn or die, the user sees your own filename and line number. You'd like your functions to act like built-ins and report messages from the perspective of the user's code, not your own.
The standard Carp module provides functions to do this. Use carp instead of warn. Use croak (for a short message) and confess (for a long message) instead of die.
Like built-ins, some of your module's functions generate warnings or errors if all doesn't go well. Think about sqrt: when you pass it a negative number (and you haven't used the Math::Complex module), an exception is raised, producing a message such as "Can't take sqrt of -3 at /tmp/negroot line 17", where /tmp/negroot is the name of your own program. But if you write your own function that dies, perhaps like this:
sub even_only {

    my $n = shift;

    die "$n is not even" if $n & 1;  # one way to test

    #....

}
then the message will say it's coming from the file your even_only function was itself compiled in, rather than from the file the user was in when they called your function. That's where the Carp module comes in handy. Instead of using die, use croak instead:
use Carp;

sub even_only {

    my $n = shift;

    croak "$n is not even" if $n % 2;  # here's another

    #....

}
If you just want to complain about something, but have the message report where in the user's code the problem occurred, call carp instead of warn. For example:
use Carp;

sub even_only {

    my $n = shift;

    if ($n & 1) {         # test whether odd number

        carp "$n is not even, continuing";

        ++$n;

    }

    #....

}
Many built-ins emit warnings only when the
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Customizing Warnings
Inhaltsvorschau
You would like your module to respect its caller's settings for lexical warnings, but you can't inspect the predefined $^W variable to determine those settings.
Your module should use this pragma:
use warnings::register;
Then from inside your module, use the warnings::enabled function from that module as described in the Discussion to check whether the caller has warnings enabled. This works for both the old-style, global warnings and for lexical warnings set via the use warnings pragma.
Perl's -w command-line flag, mirrored by the global $^W variable, suffers from several problems. For one thing, it's an all-or-nothing affair, so if you turn it on for the program, module code included by that program—including code you may not have written—is also affected by it. For another, it's at best cumbersome to control compile-time warnings with it, forcing you to resort to convoluted BEGIN blocks. Finally, suppose you were interested in numeric warnings but not any other sort; you'd have to write a $SIG{_ _WARN_ _} handler to sift through all warnings to find those you did or did not want to see.
Lexical warnings, first introduced in Perl v5.6, address all this and more. By lexical, we mean that their effects are constrained to the lexical scope in which use warnings or no warnings occurs. Lexical warnings pay no attention to the -w command-line switch. Now when you turn warnings on in one scope, such as the main program's file scope, that doesn't enable warnings in modules you load. You can also selectively enable or disable individual categories of warnings. For example:
use warnings qw(numeric uninitialized);



use warnings qw(all);

no warnings qw(syntax);
The warnings::register pragma permits a module to check the warnings preferences of its caller's lexical scope. The pragma also creates a new warning category, taken from the name of the current package. These user-defined warning categories are easily distinguishable from the built-in warning categories because a module's package always starts (or
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Referring to Packages Indirectly
Inhaltsvorschau
You want to refer to a variable or function in a package unknown until runtime, but syntax like $packname::$varname is illegal.
Use symbolic references:
{

    no strict "refs";

    $val  = ${ $packname . "::" . $varname };

    @vals = @{ $packname . "::" . $aryname };

    &{ $packname . "::" . $funcname }("args");

    ($packname . "::" . $funcname) -> ("args");

}
A package declaration has meaning at compile time. If you don't know the name of the package or variable until runtime, you'll have to resort to symbolic references for direct access to the package symbol table. Assuming you normally run with use strict in effect, you must disable part of it to use symbolic references. Once you've used the no strict "refs" directive in that block, build up a string with the fully qualified name of the variable or function you're interested in. Then dereference this name as though it were a proper Perl reference.
During the prehistoric eras (before Perl 5), programmers were forced to use an eval for this kind of thing:
eval "package $packname; \$'$val = \$$varname"; # set $main'val

die if $@;
As you see, this approach makes quoting difficult. It's also comparatively slow. Fortunately, you never need to do this just to access variables indirectly by name. Symbolic references are a necessary compromise.
Similarly, eval could be used to define functions on the fly. Suppose you wanted to be able to get the base 2 or base 10 logs of numbers:
printf "log2  of 100 is %.2f\n", log2(100);

printf "log10 of 100 is %.2f\n", log10(100);
Perl has only the natural log function. Here's how one could use eval to create these functions at runtime. Here we'll create functions named log2 up through log999:
$packname = "main";

for ($i = 2; $i < 1000; $i++) {

    $logN = log($i);

    eval "sub ${packname}::log$i { log(shift) / $logN }";

    die if $@;

}
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Using h2ph to Translate C #include Files
Inhaltsvorschau
Someone gave you code that generates the bizarre error message:

               Can't locate sys/syscall.ph in @INC (did you run h2ph?)

               (@INC contains: /usr/lib/perl5/i686-linux/5.00404 /usr/lib/perl5

               /usr/lib/perl5/site_perl/i686-linux /usr/lib/perl5/site_perl .)

               at some_program line 7

            
You want to know what it means and how to fix it.
Get your system administrator to do this, running as the superuser:
% cd /usr/include; h2ph sys/syscall.h
However, most include files require other include files, which means you should probably just translate them all:
% cd /usr/include; h2ph *.h */*.h
If that reports too many filenames or misses some that are more deeply nested, try this instead:
% cd /usr/include; find . -name "*.h" -print | xargs h2ph
A file whose name ends in .ph has been created by the h2ph tool, which translates C preprocessor directives from C #include files into Perl. The goal is to allow Perl code to access the same constants as C code. h2xs is a better approach in most cases because it provides compiled C code for your modules, not Perl code simulating C code. However, using h2xs requires a lot more programming savvy (at least, for accessing C code) than h2ph does.
When h2ph's translation process works, it's wonderful. When it doesn't, you're probably out of luck. As system architectures and include files become more complex, h2ph fails more frequently. If you're lucky, the constants you need are already in the Fcntl, Socket, or POSIX modules. The POSIX module implements constants from sys/file.h, sys/errno.h, and sys/wait.h, among others. It also allows fancy tty handling, as described in Recipe 15.8.
So what can you do with these .ph files? Here are a few examples. The first uses the pessimally non-portable
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Using h2xs to Make a Module with C Code
Inhaltsvorschau
You'd like to access your system's unique C functions from Perl.
Use the h2xs tool to generate the necessary template files, fill the files in appropriately, and then type:
% perl Makefile.PL

% make
A Perl module need not be written solely in Perl. As with any other module, first pick a module name and use h2xs on it. We'll make a FineTime::time function with the same semantics as in the previous recipe, but this time around, we'll implement it using real C.
First, we run the following command:
% h2xs -cn FineTime
If we had a .h file with function prototype declarations, we could include that, but because we're writing this one from scratch, we'll use the -c switch to omit building code to translate any #define symbols. The -n switch says to create a module directory named FineTime/, which will have the following files:
Manifest
List of files in the distribution
Changes
Change log
Makefile.PL
A meta-makefile
FineTime.pm
The Perl parts
FineTime.xs
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Writing Extensions in C with Inline::C
Inhaltsvorschau
You'd like to write functions in C that you can call from Perl. You may already have tried XS and found it harmful to your mental health.
Use the Inline::C module available from CPAN:
use Inline C;

$answer = somefunc(20, 4);

print "$answer\n";               # prints 80

_ _END_ _

_ _C_ _

double somefunc(int a, int b) {  /* Inline knows most basic C types */

  double answer = a * b;

  return answer;

}
Inline::C was created as an alternative to the XS system for building C extension modules. Rather than jumping through all the hoopla of h2xs and the format of an .xs file, Inline::C lets you embed C code into your Perl program. There are also Inline modules for Python, Ruby, and Java, among other languages.
By default, your C source is in the _ _END_ _ or _ _DATA_ _ section of your program after a _ _C_ _ token. This permits multiple Inlined language blocks in a single file. If you want, use a here document when you load Inline:
use Inline C <<'END_OF_C';

double somefunc(int a, int b) {  /* Inline knows most basic C types */

  double answer = a * b;

  return answer;

}

END_OF_C
Inline::C scans the source code for ANSI-style function definitions. When it finds a function definition it knows how to deal with, it creates a Perl wrapper for the function. Inline can automatically translate the basic C data types (double, int, char *, etc.) by using the typemap that comes with Perl. A typemap shows Perl how to convert between C values and Perl data types, and you can install your own if you need to use more complex data structures than the basic typemap supports.
You can link against external libraries, parse header files as h2xs does, pass and return multiple values, handle objects, and more. See the Inline::C-Cookbook manpage that comes with the Inline::C module for more details.
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Documenting Your Module with Pod
Inhaltsvorschau
You need to document your module, but don't know what format to use.
Embed your documentation in the your module file using pod format.
Pod stands for plain old documentation. It's documentation embedded in your program using a very simple markup format. Programmers are notorious for writing the code first and the documentation never, so pod was designed to make writing documentation so easy that anyone can and will do so. Sometimes this even works.
When Perl is parsing your source code, a line starting with an equals sign (where a new statement is expected) says to ignore all text until it finds a line beginning with =cut, after which it will start parsing code again. This lets you mix code and documentation throughout your Perl program or module file. Since it's mostly plain text, type in your documentation as literal text, or nearly so. The translators try to be clever and make output-specific decisions so the programmer doesn't have to specifically format variable names, function calls, etc.
Perl ships with several translators that filter generic pod format into specific output styles. These include pod2man to change your pods into troff for use with the man program or for phototypesetting and printing; pod2html for creating web pages (which works even on non-Unix systems); and pod2text for plain ASCII. Other translators, such as pod2ipf, pod2fm, pod2texi, pod2latex, and pod2ps, may also be available or can be found on CPAN.
Many books are written using proprietary word processors with limited scripting capabilities. Not this one! It was written in pod format using common text editors (vi for Tom, emacs for Nat). The final book was produced by converting the pod source files to FrameMaker.
Although formally documented in perlpod
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Building and Installing a CPAN Module
Inhaltsvorschau
You want to install a module file that you downloaded from CPAN over the Net or obtained from a CD.
Type the following commands into your shell. It will build and install Version 4.54 of the Some::Module package.
% gunzip Some-Module-4.54.tar.gz

% tar xf Some-Module-4.54

% cd Some-Module-4.54

% perl Makefile.PL

% make

% make test

% make install
Like most programs on the Net, Perl modules are available in source kits stored as tar archives in GNU zip format. If tar warns of "Directory checksum errors", then you downloaded the binary file in text format, mutilating it.
You'll probably have to become a privileged user with adequate permissions to install the module in the system directories. Standard modules are installed in a directory like /usr/lib/perl5, whereas third-party modules are installed in /usr/lib/perl5/site_ perl.
Here's a sample run, showing the installation of the MD5 module:
% gunzip MD5-1.7.tar.gz

% tar xf MD5-1.7.tar

% cd MD5-1.7

% perl Makefile.PL 

Checking if your kit is complete..

               Looks good

               Writing Makefile for MD5

% make

mkdir ./blib

               mkdir ./blib/lib

               cp MD5.pm ./blib/lib/MD5.pm

               AutoSplitting MD5 (./blib/lib/auto/MD5)

               /usr/bin/perl -I/usr/local/lib/perl5/i386 ..

               ..

               cp MD5.bs ./blib/arch/auto/MD5/MD5.bs

               chmod 644 ./blib/arch/auto/MD5/MD5.bsmkdir ./blib/man3

               Manifying ./blib/man3/MD5.3

% make test

PERL_DL_NONLAZY=1 /usr/bin/perl -I./blib/arch -I./blib/lib

               -I/usr/local/lib/perl5/i386-freebsd/5.00404 -I/usr/local/lib/perl5 test.pl

               1..14

               ok 1

               ok 2

               ..

               ok 13

               ok 14

% sudo make install

Password

               Installing /usr/local/lib/perl5/site_perl/i386-freebsd/./auto/MD5/
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Example: Module Template
Inhaltsvorschau
Following is the skeleton of a module. If you want to write a module of your own, you can copy this and customize it.
package Some::Module;  # must live in Some/Module.pm



use strict;



require Exporter;



# set the version for version checking

our $VERSION     = 0.01;



our @ISA         = qw(Exporter);

our @EXPORT      = qw(&func1 &func2 &func4);

our %EXPORT_TAGS = ( );     # eg: TAG => [ qw!name1 name2! ],



# your exported package globals go here,

# as well as any optionally exported functions

our @EXPORT_OK   = qw($Var1 %Hashit &func3);



use vars qw($Var1 %Hashit);

# non-exported package globals go here

our(@more, $stuff);



# initialize package globals, first exported ones

$Var1   = "";

%Hashit = ( );



# then the others (which are still accessible as $Some::Module::stuff)

$stuff  = "";

@more   = ( );



# all file-scoped lexicals must be created before

# the functions below that use them.



# file-private lexicals go here

my $priv_var    = "";

my %secret_hash = ( );



# here's a file-private function as a closure,

# callable as &$priv_func.

my $priv_func = sub {

    # stuff goes here.

};



# make all your functions, whether exported or not;

# remember to put something interesting in the {  } stubs

sub func1      { .... }    # no prototype

sub func2( )    { .... }    # proto'd void

sub func3($$)  { .... }    # proto'd to 2 scalars



# this one isn't auto-exported, but could be called!

sub func4(\%)  { .... }    # proto'd to 1 hash ref



END { }       # module cleanup code here (global destructor)



1;
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Program: Finding Versions and Descriptions of Installed Modules
Inhaltsvorschau
Perl comes with many modules included standard. Even more can be found on CPAN. The following program prints out the names, versions, and descriptions of all modules installed on your system. It uses standard modules like File::Find and includes several techniques described in this chapter.
To run it, type:
% pmdesc
It prints a list of modules and their descriptions:

            FileHandle (2.00) - supply object methods for filehandles

            IO::File (1.06021) - supply object methods for filehandles

            IO::Select (1.10) - OO interface to the select system call

            IO::Socket (1.1603) - Object interface to socket communications

            ..

         
With the -v flag, pmdesc provides the names of the directories the files are in:
% pmdesc -v



<<<Modules from /usr/lib/perl5/i686-linux/5.00404>>>



            FileHandle (2.00) - supply object methods for filehandles

            ..

         
The -w flag warns if a module doesn't come with a pod description, and -s sorts the module list within each directory.
The program is given in Example 12-3.
Example 12-3. pmdesc
  #!/usr/bin/perl -w

  # pmdesc - describe pm files

  # tchrist@perl.com

  

  use strict;

  use File::Find      qw(find);

  use Getopt::Std     qw(getopts);

  use Carp;

  

  use vars (

      q!$opt_v!,              # give debug info

      q!$opt_w!,              # warn about missing descs on modules

      q!$opt_a!,              # include relative paths

      q!$opt_s!,              # sort output within each directory

  );

  

  $| = 1;

  

  getopts("wvas")             or die "bad usage";

  

  @ARGV = @INC unless @ARGV;

  

  # Globals.  wish I didn't really have to do this.

  use vars (

      q!$Start_Dir!,          # The top directory find was called with

      q!%Future!,             # topdirs find will handle later

  );

  

  my $Module;

  

  # install an output filter to sort my module list, if wanted.

  if ($opt_s) {

      if (open(ME, "-|")) {

          $/ = "";

          while (<ME>) {

              chomp;

              print join("\n", sort split /\n/), "\n";

          }

          exit;

      }

  }

  

  MAIN: {

      my %visited;

      my ($dev,$ino);

  

      @Future{@ARGV} = (1) x @ARGV;

  

      foreach $Start_Dir (@ARGV) {

          delete $Future{$Start_Dir};

  

          print "\n << Modules from $Start_Dir>>\n\n"

              if $opt_v;

  

          next unless ($dev,$ino) = stat($Start_Dir);

          next if $visited{$dev,$ino}++;

          next unless $opt_a || $Start_Dir =~ m!^/!;

  

          find(\&wanted, $Start_Dir);

      }

      exit;

  }

  

  # calculate module name from file and directory

  sub modname {

      local $_ = $File::Find::name;

  

      if (index($_, $Start_Dir . "/") =  = 0) {

          substr($_, 0, 1+length($Start_Dir)) = "";

      }

  

      s { /              }    {::}gx;

      s { \.p(m|od)$     }    {  }x;

  

      return $_;

  }

  

  # decide if this is a module we want

  sub wanted {

      if ( $Future{$File::Find::name} ) {

          warn "\t(Skipping $File::Find::name, qui venit in futuro.)\n"

              if 0 and $opt_v;

          $File::Find::prune = 1;

          return;

      }

      return unless /\.pm$/ && -f;

      $Module = &modname;

      # skip obnoxious modules

      if ($Module =~ /^CPAN(\Z|::)/) {

          warn("$Module -- skipping because it misbehaves\n");

          return;

      }

  

      my    $file = $_;

  

      unless (open(POD, "<", $file)) {

          warn "\tcannot open $file: $!";

              # if $opt_w;

          return 0;

      }

  

      $: = " -:";

  

      local $/ = "";

      local $_;

      while (<POD>) {

          if (/=head\d\s+NAME/) {

              chomp($_ = <POD>);

              s/^.*?-\s+//s;

              s/\n/ /g;

              #write;

              my $v;

              if (defined ($v = getversion($Module))) {

                  print "$Module ($v) ";

              } else {

                  print "$Module ";

              }

              print "- $_\n";

              return 1;

          }

      }

  

      warn "\t(MISSING DESC FOR $File::Find::name)\n"

          if $opt_w;

  

      return 0;

  }

  

  # run Perl to load the module and print its verson number, redirecting

  # errors to /dev/null

  sub getversion {

      my $mod = shift;

  

      my $vers = `$^X -m$mod -e 'print \$${mod}::VERSION' 2>/dev/null`;

      $vers =~ s/^\s*(.*?)\s*$/$1/; # remove stray whitespace

      return ($vers || undef);

  }

  

  format = ^<<<<<<<<<<<<<<<<<~~^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

  $Module,        $_

  .
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Chapter 13: Classes, Objects, and Ties
Inhaltsvorschau
All the world over, I will back the masses against the classes.
—William E. Gladstone, Speech at Liverpool, 28 June 1886
Although Perl was not initially conceived of as an object-oriented language, within a few years of its initial release, complete support for object-oriented programming had been added. As usual, Perl doesn't try to enforce one true style, but embraces many. This helps more people do their job the way they want to do it.
You don't have to use objects to write programs, unlike Java, where programs are instances of objects. If you want to, though, you can write Perl programs that use nearly every weapon in the object-oriented arsenal. Perl supports classes and objects, single and multiple inheritance, instance methods and class methods, access to overridden methods, constructors and destructors, operator overloading, proxy methods through autoloading, delegation, a rooted hierarchy for all objects, and two levels of garbage collection.
You can use as many or as few object-oriented techniques as you want and need. Ties are the only part of Perl where you must use object orientation. And even then, only the module implementor need be aware of this; the casual user gets to remain blissfully unaware of the internal mechanics. Ties, discussed in Recipe 13.15, let you transparently intercept access to a variable. For example, you can use ties to create hashes that support lookups by key or value instead of just by key.
If you ask 10 people what object orientation is, you'll get 10 different answers. People bandy about terms like abstraction and encapsulation, trying to isolate the basic units of object-oriented programming languages and give them big names to write papers and books about. Not all object-oriented languages offer the same features, yet they are still deemed object-oriented. This, of course, produces more papers and books.
We follow the nomenclature used in Perl's documentation, the
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Introduction
Inhaltsvorschau
Although Perl was not initially conceived of as an object-oriented language, within a few years of its initial release, complete support for object-oriented programming had been added. As usual, Perl doesn't try to enforce one true style, but embraces many. This helps more people do their job the way they want to do it.
You don't have to use objects to write programs, unlike Java, where programs are instances of objects. If you want to, though, you can write Perl programs that use nearly every weapon in the object-oriented arsenal. Perl supports classes and objects, single and multiple inheritance, instance methods and class methods, access to overridden methods, constructors and destructors, operator overloading, proxy methods through autoloading, delegation, a rooted hierarchy for all objects, and two levels of garbage collection.
You can use as many or as few object-oriented techniques as you want and need. Ties are the only part of Perl where you must use object orientation. And even then, only the module implementor need be aware of this; the casual user gets to remain blissfully unaware of the internal mechanics. Ties, discussed in Recipe 13.15, let you transparently intercept access to a variable. For example, you can use ties to create hashes that support lookups by key or value instead of just by key.
If you ask 10 people what object orientation is, you'll get 10 different answers. People bandy about terms like abstraction and encapsulation, trying to isolate the basic units of object-oriented programming languages and give them big names to write papers and books about. Not all object-oriented languages offer the same features, yet they are still deemed object-oriented. This, of course, produces more papers and books.
We follow the nomenclature used in Perl's documentation, the perlobj(1) manpage, and Chapter 12 of Programming Perl. An object is a variable that belongs to a class. Methods are functions associated with a class. In Perl, a class is a package—and usually a module. An object is a reference to something associated with a class.
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Constructing an Object
Inhaltsvorschau
You want to create a way for your users to generate new objects.
Make a constructor. In Perl, the constructor method must not only initialize its object, but must also first allocate memory for it, typically using an anonymous hash. C++ constructors, on the other hand, are called with memory already allocated. Some of the object-oriented world would call C++'s constructors initializers.
Here's the canonical object constructor in Perl:
sub new {

    my $class = shift;

    my $self  = { };

    bless($self, $class);

    return $self;

}
This is the equivalent one-liner:
sub new { bless( { }, shift ) }
Any method that allocates and initializes a new object acts as a constructor. The most important thing to remember is that a reference isn't an object until bless has been called on it. The simplest possible constructor, although not particularly useful, is the following:
sub new { bless({  }) }
Let's add some initialization:
sub new {

    my $self = { };  # allocate anonymous hash

    bless($self);

    # init two sample attributes/data members/fields

    $self->{START} = time( );

    $self->{AGE}   = 0;

    return $self;

}
This constructor isn't very useful, because it uses the single-argument form of bless, which always blesses the object into the current package. This means it can't be usefully inherited from; objects it constructs will always be blessed into the class that the new function was compiled into. With inheritance, this is not necessarily the class on whose behalf the method was invoked.
To solve this, have the constructor heed its first argument. For a class method, this is the package name. Pass this class name as the second argument to bless:
sub new {

    my $classname  = shift;           # What class are we constructing?

    my $self       = {  };            # Allocate new memory

    bless($self, $classname);         # Mark it of the right type

    $self->{START}  = time( );       # init data fields

    $self->{AGE}    = 0;

    return $self;                     # And give it back

}
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Destroying an Object
Inhaltsvorschau
You want to run special code whenever an object is no longer used. This is sometimes needed when the object is an interface to the outside world—or contains circular data structures—and must clean up after itself. You might remove temporary files, break circular links, gracefully disconnect from a socket, or kill a spawned subprocess.
Create a method named DESTROY. This will be invoked when there are no more references to the object, or else when the program shuts down, whichever comes first. You don't need to do any memory deallocation here, just any finalization code that makes sense for the class.
sub DESTROY {

    my $self = shift;

    printf("$self dying at %s\n", scalar localtime);

}
Every story has a beginning and an end. The beginning of the object's story is its constructor, explicitly invoked when the object comes into existence. The end of its story is the destructor, a method implicitly invoked when an object leaves this life. Any per-object clean-up code is placed in the destructor, which must be named DESTROY.
Why can't destructors have arbitrary names? Because although constructors are explicitly called by name, the destructor is not. Destruction happens automatically via Perl's garbage collection (GC) system, which is currently implemented as a quick but lazy reference-based GC system. To know what to call, Perl insists that the destructor be named DESTROY. If more than one object goes out of scope at once, Perl makes no promise about invoking destructors in any particular order.
Why is DESTROY in all caps? Perl on occasion uses purely uppercase function names as a convention to indicate that the function will be automatically called by Perl. Others that are called implicitly include BEGIN, INIT, END, AUTOLOAD, plus all methods used by tied objects (see Recipe 13.15), such as STORE and FETCH.
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Managing Instance Data
Inhaltsvorschau
Each data attribute of an object, sometimes named data members or properties, needs its own method for access. How do you write functions that manipulate the object's instance data?
Either write pairs of get and set methods that affect the appropriate key in the object hash, like this:
sub get_name {

    my $self = shift;

    return $self->{NAME};

}



sub set_name {

    my $self      = shift;

    $self->{NAME} = shift;

}
or make single methods that do both jobs depending on whether they're passed an argument:
sub name {

    my $self = shift;

    if (@_) { $self->{NAME} = shift }

    return $self->{NAME};

}
When setting a new value, sometimes it may be useful to return not that new value, but the previous one:
# returns previously set value if changing it

sub age {

    my $self = shift;

    my $oldage = $self->{AGE};

    if (@_) { $self->{AGE} = shift }

    return $oldage;

}

$previous_age = $obj->age( $obj->age( ) + $TIME_PASSES );
Methods are how you implement the public interface to the object. A proper class doesn't encourage anyone to poke around inside its innards. Each data attribute should have a method to update it, retrieve it, or both. If a user writes code like this:
$him = Person->new( );

$him->{NAME} = "Sylvester";

$him->{AGE}  = 23;
then an argument could justifiably be made that they have violated the interface and so deserve whatever they get.
For nominally private data elements, you may omit methods that access them. However, then if—better make that when—you update the implementation, you'll need to scour the class to find where other methods within the class rely upon the particular representation that you're now changing. To be squeaky clean, you could have the class itself go through a mediated, functional interface to access instance data.
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Managing Class Data
Inhaltsvorschau
You need a method invoked on behalf of the whole class, not just on one object. This might be a procedural request, or it might be a global data attribute shared by all instances of the class.
Instead of expecting a reference as their first argument as object methods do, class methods expect a string containing the name of the class. Class methods access package data, not object data, as in the population method shown here:
package Person;



$Body_Count = 0;



sub population { return $Body_Count }



sub new {                                   # constructor

    $Body_Count++;

    return bless({  }, shift);

}



sub DESTROY { --$Body_Count }                # destructor



# later, the user can say this:

package main;



for (1..10) { push @people, Person->new }

printf "There are %d people alive.\n", Person->population( );



There are 10 people alive

            
Normally, each object has its own complete state stored within itself. The value of a data attribute in one object is unrelated to the value that attribute might have in another instance of the same class. For example, setting her gender here does nothing to his gender, because they are different objects with distinct states:
$him = Person->new( );

$him->gender("male");



$her = Person->new( );

$her->gender("female");
Imagine a classwide attribute where changing the attribute for one instance changes it for all of them. Just as some programmers prefer capitalized global variables, some prefer uppercase names when the method affects class data instead of instance data. Here's an example of using a class method named Max_Bounds:
FixedArray->Max_Bounds(100);                # set for whole class

$alpha = FixedArray->new( );

printf "Bound on alpha is %d\n", $alpha->Max_Bounds( );

100

$beta = FixedArray->new( );

$beta->Max_Bounds(50);                      # still sets for whole class

printf "Bound on alpha is %d\n", $alpha->Max_Bounds( );

Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Using Classes as Structs
Inhaltsvorschau
You're used to structured data types more complex than Perl's arrays and hashes, such as C's structs and Pascal's records. You've heard that Perl's classes are comparable, but you aren't an object-oriented programmer.
Use the standard Class::Struct module's struct to declare data structures reminiscent of those in the C programming language:
use Class::Struct;          # load struct-building module



struct Person => {          # create a definition for a "Person"

    name   => '$',          #    name field is a scalar

    age    => '$',          #    age field is also a scalar

    peers  => '@',          #    but peers field is an array (reference)

};



my $p = Person->new( );      # allocate an empty Person struct



$p->name("Jason Smythe");                   # set its name field

$p->age(13);                                # set its age field

$p->peers( ["Wilbur", "Ralph", "Fred" ] );  # set its peers field



# or this way:

@{$p->peers} = ("Wilbur", "Ralph", "Fred");



# fetch various values, including the zeroth friend

printf "At age %d, %s's first friend is %s.\n",

    $p->age, $p->name, $p->peers(0);
The Class::Struct::struct function builds struct-like classes on the fly. It creates a class of the name given in the first argument, complete with a constructor named new and per-field accessor methods.
In the structure layout definition, the keys are the names of the fields and the values are the data type. This type can be one of the three base types: '$' for scalars, '@' for arrays, and '%' for hashes. Each accessor method can be invoked without arguments to fetch the current value, or with an argument to set the value. For a field whose type is an array or hash, a zero-argument method invocation returns a reference to the entire array or hash, a one-argument invocation retrieves the value at that subscript, and a two-argument invocation sets the value at that subscript.
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Cloning Constructors
Inhaltsvorschau
You want to write a constructor method that might be invoked on an existing object, and if so, to use that object for default values.
Start your constructor like this:
my $proto  = shift;

my $class  = ref($proto) || $proto;

my $parent = ref($proto) && $proto;
The $class variable will contain the class to bless into, and the $parent variable will either be false, or else the object you're cloning.
Sometimes you need another object of the same type as the current one. You could do this:
$ob1 = SomeClass->new( );

# later on

$ob2 = (ref $ob1)->new( );
but that's not very clear. It's clearer to have a single constructor that behaves correctly, regardless of whether its invocant is a class name or an existing object of that class. As a class method, it should return a new object with the default initialization. As an instance method, it should return a new object initialized from the object it was invoked on:
$ob1 = Widget->new( );

$ob2 = $ob1->new( );
Here's a version of new that takes this into consideration:
sub new {

    my $proto  = shift;

    my $class  = ref($proto) || $proto;

    my $parent = ref($proto) && $proto;



    my $self;

    # check whether we're shadowing a new from @ISA

    if (@ISA && $proto->SUPER::can("new") ) {

        $self = $proto->SUPER::new(@_);

    } else {

        $self = {  };

        bless ($self, $class);

    }



    $self->{PARENT}  = $parent;

    $self->{START}   = time( );   # init data fields

    $self->{AGE}     = 0;



    return $self;

}
Initializing doesn't have to mean simply copying values from the parent. If you're writing a linked list or binary tree class, your constructor can return a new object linked into the list or tree, when invoked as an instance method.
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Copy Constructors
Inhaltsvorschau
You would like to provide users of your class with a copy method, or you would like to copy an object for which no copy method has been provided by the class.
Use the dclone( ) function from the standard Storable module.
use Storable qw(dclone);

use Carp;

sub copy {

     my $self = shift;

     croak "can't copy class $self" unless ref $self;

     my $copy = Storable::dclone($self);

     return $copy;

}
As described in Recipe 11.12, the Storable module's dclone function will recursively copy (virtually) any data structure. It works on objects, too, correctly giving you back new objects that are appropriately blessed. This assumes that the underlying types are SCALAR, ARRAY, HASH, or CODE refs. Things like GLOB and IO refs won't serialize.
Some classes already provide methods to copy their objects; others do not, not so much out of intent as out of neglect. Consider this:
sub UNIVERSAL::copy {

    my $self = shift;

    unless (ref $self) { 

        require Carp;

        Carp::croak("can't copy class $self");

    }

    require Storable;

    my $copy = Storable::dclone($self);

    return $copy;

}
Now all objects can be copied, providing they're of the supported types. Classes that provide their own copy methods are unaffected, but any class that doesn't provide its own copy method will pick up this definition. We placed the require on Storable within the function call itself so that you load Storable only if you actually plan to use it. Likewise, we placed the one for Carp inside the test that will end up using it. By using require, we delay loading until the module is actually needed.
We also avoid use because it would import things into our current package. This could be antisocial. From the previous code snippet, you cannot determine what package you're even in. Just because we've declared a subroutine named copy
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Invoking Methods Indirectly
Inhaltsvorschau
You want to invoke a method by a name that isn't known until runtime.
Store the method name as a string in a scalar variable and use it where you would use the real method name to the right of the arrow operator:
$methname = "flicker";

$obj->$methname(10);         # invokes $ob->flicker(10);



# invoke three methods on the object, by name

foreach $m ( qw(start run stop) ) {

    $obj->$m( );

}
Sometimes you need to invoke a method whose name you've stored somewhere. You can't take the address of a method, but you can store its name. If you have a scalar variable $meth containing the method name, invoke the method on an object $crystal with $crystal->$meth( ).
@methods = qw(name rank serno);

%his_info = map { $_ => $ob->$_( ) } @methods;



# same as this:



%his_info = (

    'name'  => $ob->name( ),

    'rank'  => $ob->rank( ),

    'serno' => $ob->serno( ),

);
If you're desperate to devise a way to get a method's address, you should try to rethink your algorithm. For example, instead of incorrectly taking \$ob->method( ), which simply applies the backslash to that method's return value or values, do this:
my $fnref = sub { $ob->method(@_) };
Now when it's time to invoke that indirectly, you would use:
$fnref->(10, "fred");
and have the closure in turn correctly use the original value of $ob (provided $ob was a lexical variable) that was around when it was created to call:
$ob->method(10, "fred");
This works even if $ob has gone out of scope. This solution is much cleaner.
When using indirect method invocation, it is permitted to store a subroutine reference in the scalar variable instead of a string representing the method name. No verification that the function represents a valid method.
The code reference returned by the UNIVERSAL can method should probably not be used for indirect method invocation on objects other than the one on which it was called, or at least those of the same class. That's because you have no reason to believe that this will be a valid method when applied to an object of an arbitrary class.
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Determining Subclass Membership
Inhaltsvorschau
You want to know whether an object is an instance of a particular class or that class's subclasses. Perhaps you want to decide whether a particular method should be invoked on an arbitrary object.
Use methods from the special UNIVERSAL class:
$obj->isa("HTTP::Message");                  # as object method

HTTP::Response->isa("HTTP::Message");        # as class method



if ($obj->can("method_name")) { .... }       # check method validity
Wouldn't it be convenient if all objects were rooted at some ultimate base class? That way you could give every object common methods without having to add to each @ISA. Well, you can. You don't see it, but Perl pretends there's an extra element at the end of @ISA—the package named UNIVERSAL.
UNIVERSAL has only a few predefined methods, although you are free to add your own. These are built right into your Perl binary, so they don't take extra time to load. Predefined methods include isa, can, and VERSION. All three may be used for both sorts of invocants: classes and objects.
The isa method reports whether its invocant inherits the class name directly or indirectly from the class name supplied as the argument. This saves having to traverse the hierarchy yourself, and is much better than testing with an exact check against the string returned by the ref built-in. You may even supply a basic type that ref might return as an argument, such as SCALAR, ARRAY, HASH, or GLOB.
$has_io = $fd->isa("IO::Handle") || $fd->isa("GLOB");

$itza_handle = IO::Socket->isa("IO::Handle");
Type checks like this are sometimes frowned upon as being too constraining. If you just want to know if a certain method can be invoked against something, it might be better to just try to invoke the method you're hoping will be there instead of checking for the class.
Another possibility is to use another UNIVERSAL method,
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Writing an Inheritable Class
Inhaltsvorschau
You're not sure whether you've designed your class robustly enough to be inherited.
Use the "empty subclass test" on your class.
Imagine you've implemented a class named Person that supplies a constructor named new, and methods such as age and name. Here's the straightforward implementation:
package Person;

sub new {

    my $class = shift;

    my $self  = { };

    return bless $self, $class;

}

sub name {

    my $self = shift;

    $self->{NAME} = shift if @_;

    return $self->{NAME};

}

sub age {

    my $self = shift;

    $self->{AGE} = shift if @_;

    return $self->{AGE};

}
You might use the class in this way:
use Person;

my $dude = Person->new( );

$dude->name("Jason");

$dude->age(23);

printf "%s is age %d.\n", $dude->name, $dude->age;
Now consider another class, the one named Employee:
package Employee;

use Person;

@ISA = ("Person");

1;
There's not a lot to that one. All it's doing is loading in class Person and stating that Employee will inherit any needed methods from Person. Since Employee has no methods of its own, it will get all of its methods from Person. We rely upon an Employee to behave just like a Person.
Setting up an empty class like this is called the empty base class test; that is, it creates a derived class that does nothing but inherit from a base class. If the original base class has been designed properly, then the new derived class can be used as a drop-in replacement for the old one. This means you should be able to change just the class name and everything will still work:
use Employee;

my $empl = Employee->new( );

$empl->name("Jason");

$empl->age(23);

printf "%s is age %d.\n", $empl->name, $empl->age;
By proper design, we mean using only the two-argument form of bless, avoiding any direct access of class data, and exporting nothing. In the
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Accessing Overridden Methods
Inhaltsvorschau
Your class's constructor method overrides the constructor of its parent class. You want your constructor to invoke the parent class's constructor.
Learn about the special pseudoclass, SUPER.
sub meth {

    my $self = shift;

    $self->SUPER::meth( );

}
In languages like C++ where constructors don't actually allocate memory but just initialize the object, all base class constructors are automatically invoked for you. In languages like Java and Perl, you have to invoke them yourself.
To invoke a method in a particular class, the notation $self->SUPER::meth( ) is used. This is an extension of the regular notation that means to begin searching for a method in a particular class. It is valid only from within an overridden method. Here's a comparison of styles:
$self->meth( );                # Call wherever first meth is found

$self->Where::meth( );         # Start looking in package "Where"

$self->SUPER::meth( );         # Call overridden version
Simple users of the class should probably restrict themselves to the first line in the previous example. The second is possible, but not suggested for this situation, because we have the special notation shown in the third line, which only works within the overridden method.
An overriding constructor should invoke its SUPER's constructor to allocate and bless the object, limiting itself to instantiating any data fields needed. It makes sense here to separate the object allocation code from the object initialization code for reasons that will become clear a couple paragraphs from now. We'll name it with a leading underscore, a convention indicating a nominally private method. Think of it as a "Do Not Disturb" sign.
sub new {

    my $classname  = shift;         # What class are we constructing?

    my $self       = $classname->SUPER::new(@_);

    $self->_init(@_);

    return $self;                   # And give it back

}



sub _init {

    my $self = shift;

    $self->{START}   = time( );   # init data fields

    $self->{AGE}     = 0;

    $self->{EXTRA}   = { @_ };   # anything extra

}
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Generating Attribute Methods Using AUTOLOAD
Inhaltsvorschau
Your object needs accessor methods to set or get its data fields, and you're tired of writing them all out one at a time.
Carefully use Perl's AUTOLOAD mechanism as something of a proxy method generator so you don't have to create them all yourself each time you want to add a new data field.
Perl's AUTOLOAD mechanism intercepts all possible undefined method invocations. To disallow arbitrary data names, we store the list of permitted fields in a hash. The AUTOLOAD method checks for whether the accessed field name is in that hash.
package Person;

use strict;

use Carp;

our(%ok_field);



# Authorize four attribute fields

for my $attr ( qw(name age peers parent) ) { $ok_field{$attr}++; }



sub AUTOLOAD {

    my $self = shift;

    my $attr = our $AUTOLOAD;

    $attr =~ s/.*:://;

    return unless $attr =~ /[^A-Z]/;  # skip DESTROY and all-cap methods

    croak "invalid attribute method: ->$attr( )" unless $ok_field{$attr};

    $self->{uc $attr} = shift if @_;

    return $self->{uc $attr};

}

sub new {

    my $proto  = shift;

    my $class  = ref($proto) || $proto;

    my $parent = ref($proto) && $proto;

    my $self = {  };

    bless($self, $class);

    $self->parent($parent);

    return $self;

}

1;
This class supports a constructor method named new, plus four attribute methods: name, age, peers, and parent. Use the module this way:
use Person;

my ($dad, $kid);

$dad = Person->new;

$dad->name("Jason");

$dad->age(23);

$kid = $dad->new;

$kid->name("Rachel");

$kid->age(2);

printf "Kid's parent is %s\n", $kid->parent->name;

Kid's parent is Jason

            
This is tricky when producing inheritance trees. Suppose you'd like an Employee class that had every data attribute of the Person class, plus two new ones, like salary and boss. Class Employee can't rely upon an inherited Person::AUTOLOAD to determine what Employee's attribute methods are. So each class would need its own
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Coping with Circular Data Structures Using Objects
Inhaltsvorschau
You have an inherently self-referential data structure, so Perl's reference-based garbage collection system won't notice when it's no longer being used. You want to prevent your program from leaking memory.
Create a non-circular container object that holds a pointer to the self-referential data structure. Define a DESTROY method for the containing object's class that manually breaks the self-referential circularities.
Or use weak references, as described in Recipe 11.15.
Many interesting data structures include references back to themselves. This can occur in code as simple as this:
$node->{NEXT} = $node;
As soon as you do that, you've created a circularity that will hide the data structure from Perl's referenced-based garbage collection system. Destructors will eventually be invoked when your program exits, but sometimes you don't want to wait that long.
A circular linked list is similarly self-referential. Each node contains a front pointer, a back pointer, and the node's value. If you implement it with references in Perl, you get a circular set of references and the data structure won't be automatically garbage collected when there are no external references to its nodes.
Making each node an instance of class Ring doesn't solve the problem. What you want is for Perl to clean up this structure as it would any other structure—which it will do if you implement your object as a structure that contains a reference to the real circle. That reference will be stored in the "DUMMY" field:
package Ring;



# return an empty ring structure

sub new {

    my $class = shift;

    my $node  = { };

    $node->{NEXT} = $node->{PREV} = $node;

    my $self  = { DUMMY => $node, COUNT => 0 };

    bless $self, $class;

    return $self;

}
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Overloading Operators
Inhaltsvorschau
You want to use familiar operators like = = or + on objects from a class you've written, or you want to define the print interpolation value for objects.
Use the use overload pragma. Here are two of the more commonly overloaded operators:
use overload 

  '<=>' => \&threeway_compare;



sub threeway_compare {

    my ($s1, $s2) = @_;

    return uc($s1->{NAME}) cmp uc($s2->{NAME});

}



use overload 

    '""'  => \&stringify;



sub stringify {

    my $self = shift;

    return sprintf "%s (%05d)",

            ucfirst(lc($self->{NAME})),

            $self->{IDNUM};

}
When you use built-in types, certain operators apply, like + for addition or . for string concatenation. With the use overload pragma, you can customize these operators so they do something special on your own objects.
This pragma takes a list of operator/function call pairs, such as:
package TimeNumber;

use overload         '+' => \&my_plus,

                     '-' => \&my_minus,

                     '*' => \&my_star,

                     '/' => \&my_slash;
Those four operators can now be used with objects of class TimeNumber, and the listed functions will be called as method invocations. These functions can do anything you'd like.
Here's a simple example of an overload of + for use with an object that holds hours, minutes, and seconds. It assumes that both operands are of a class that has a new method that can be invoked as an object method, and that the structure names are as shown:
sub my_plus {

    my($left, $right) = @_;

    my $answer = $left->new( );

    $answer->{SECONDS} = $left->{SECONDS} + $right->{SECONDS};

    $answer->{MINUTES} = $left->{MINUTES} + $right->{MINUTES};

    $answer->{HOURS}   = $left->{HOURS}   + $right->{HOURS};



    if ($answer->{SECONDS} >= 60) {

        $answer->{SECONDS} %= 60;

        $answer->{MINUTES} ++;

    }



    if ($answer->{MINUTES} >= 60) {

        $answer->{MINUTES} %= 60;

        $answer->{HOURS}   ++;

    }



    return $answer;



}
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Creating Magic Variables with tie
Inhaltsvorschau
You want to add special processing to a variable or handle.
Use the tie function to give your ordinary variables object hooks.
Anyone who's ever used a DBM file under Perl has already used tied objects. Perhaps the most excellent way of using objects is such that the user need never notice them. With tie, you can bind a variable or handle to a class, after which all access to the tied variable or handle is transparently intercepted by specially named object methods (see Table 13-2).
The most important tie methods are FETCH to intercept read access, STORE to intercept write access, and the constructor, which is one of TIESCALAR, TIEARRAY, TIEHASH, or TIEHANDLE.
Table 13-2: How tied variables are interpreted
User code
Executed code
tie $s, "SomeClass"
SomeClass->TIESCALAR( )
$p = $s
$p = $obj->FETCH( )
$s = 10
$obj->STORE(10)
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Chapter 14: Database Access
Inhaltsvorschau
I only ask for information.
—Charles Dickens, David Copperfield
Everywhere you find data, you find databases. At the simplest level, every file can be considered a database. At the most complex level, expensive and complex relational database systems handle thousands of transactions per second. In between are countless improvised schemes for fast access to loosely structured data. Perl can work with all of them.
Early in the history of computers, people noticed that flat file databases don't scale to large data sets. Flat files were tamed using fixed-length records or auxiliary indices, but updating became expensive, and previously simple applications bogged down with I/O overhead.
After some head-scratching, clever programmers devised a better solution. As hashes in memory provide more flexible access to data than do arrays, hashes on disk offer more convenient kinds of access than array-like text files. These benefits in access time cost you space, but disk space is cheap these days (or so the reasoning goes).
The DBM library gives Perl programmers a simple, easy-to-use database. You use the same standard operations on hashes bound to DBM files as you do on hashes in memory. In fact, that's how you use DBM databases from Perl. You use tie to associate a hash with a class and a file. Then whenever you access the hash, the class consults or changes the DBM database on disk. The old dbmopen function also did this, but only let you use one DBM implementation in your program, so you couldn't copy from one format to another.
Recipe 14.1 shows how to create a DBM database and gives tips on using it efficiently. Although you can do with DBM files the same things you do with regular hashes, their disk-based nature leads to performance concerns that don't exist with in-memory hashes. Because DBM files are disk-based and can be shared between processors, use a sentinel lock file (see Recipe 7.24) to regulate concurrent access to them.Recipes Recipe 14.2 and Recipe 14.4 explain these concerns and show how to work around them. DBM files also make possible operations that aren't available using regular hashes. Recipe 14.5 explains two of these things.
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Introduction
Inhaltsvorschau
Everywhere you find data, you find databases. At the simplest level, every file can be considered a database. At the most complex level, expensive and complex relational database systems handle thousands of transactions per second. In between are countless improvised schemes for fast access to loosely structured data. Perl can work with all of them.
Early in the history of computers, people noticed that flat file databases don't scale to large data sets. Flat files were tamed using fixed-length records or auxiliary indices, but updating became expensive, and previously simple applications bogged down with I/O overhead.
After some head-scratching, clever programmers devised a better solution. As hashes in memory provide more flexible access to data than do arrays, hashes on disk offer more convenient kinds of access than array-like text files. These benefits in access time cost you space, but disk space is cheap these days (or so the reasoning goes).
The DBM library gives Perl programmers a simple, easy-to-use database. You use the same standard operations on hashes bound to DBM files as you do on hashes in memory. In fact, that's how you use DBM databases from Perl. You use tie to associate a hash with a class and a file. Then whenever you access the hash, the class consults or changes the DBM database on disk. The old dbmopen function also did this, but only let you use one DBM implementation in your program, so you couldn't copy from one format to another.
Recipe 14.1 shows how to create a DBM database and gives tips on using it efficiently. Although you can do with DBM files the same things you do with regular hashes, their disk-based nature leads to performance concerns that don't exist with in-memory hashes. Because DBM files are disk-based and can be shared between processors, use a sentinel lock file (see Recipe 7.24) to regulate concurrent access to them.Recipes Recipe 14.2 and Recipe 14.4 explain these concerns and show how to work around them. DBM files also make possible operations that aren't available using regular hashes. Recipe 14.5 explains two of these things.
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Making and Using a DBM File
Inhaltsvorschau
You want to create, populate, inspect, or delete values in a DBM database.
Use tie to open the database and make it accessible through a hash. Then use the hash as you normally would. When you're done, call untie:
use DB_File;             # load database module



tie %HASH, "DB_File", $FILENAME                # open database to be accessed

    or die "Can't open $FILENAME:$!\n";        # through %HASH



$V = $HASH{$KEY};                              # retrieve from database

$HASH{$KEY} = $VALUE;                          # put value into database

if (exists $HASH{$KEY}) {                      # check whether in database

    # ...

}

delete $HASH{$KEY};                            # delete from database

untie %HASH;                                   # close the database
Accessing a database as a hash is powerful but easy, giving you a persistent hash that sticks around after the program using it has finished running. It's also much faster than loading in a new hash every time; even if the hash has a million entries, your program starts up virtually instantaneously.
The program in Example 14-1 treats the database as though it were a normal hash. You can even call keys or each on it. Likewise, exists and defined are implemented for tied DBM hashes. Unlike a normal hash, a DBM hash does not distinguish between those two functions.
Example 14-1. userstats
  #!/usr/bin/perl -w

  # userstats - generates statistics on who is logged in.

  # call with an argument to display totals

  

  use DB_File;

  

  $db = "/tmp/userstats.db";       # where data is kept between runs

  

  tie(%db, 'DB_File', $db)         or die "Can't open DB_File $db : $!\n";

  

  if (@ARGV) {

      if ("@ARGV" eq "ALL") {

          @ARGV = sort keys %db;

      }

      foreach $user (@ARGV) {

              print "$user\t$db{$user}\n";

      }

  } else {

      @who = `who`;                                   # run who(1)

      if ($?) {

          die "Couldn't run who: $?\n";               # exited abnormally

      }

      # extract username (first thing on the line) and update

      foreach $line (@who) {

          $line =~ /^(\S+)/;

          die "Bad line from who: $line\n" unless $1;

          $db{$1}++;

      }

  }

  

  untie %db;
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Emptying a DBM File
Inhaltsvorschau
You want to clear out a DBM file.
Open the database and assign ( ) to it. Use tie:
use DB_File;



tie(%HASH, "DB_File", $FILENAME)        or die "Can't open FILENAME: $!\n";

%HASH = ( );

untie %HASH;
Alternatively, delete the file and reopen:
unlink $FILENAME

    or die "Couldn't unlink $FILENAME to empty the database: $!\n";

tie(%HASH => "DB_File", $FILENAME)

    or die "Couldn't create $FILENAME database: $!\n";
It may be quicker to delete the file and create a new one than to reset it, but doing so opens you up to a race condition that trips up a careless program or makes it vulnerable to an attacker. The attacker could make a link pointing to the file /etc/precious with the same name as your file between the time when you deleted the file and when you recreated it. When the DBM library opens the file, it clobbers /etc/precious.
If you delete a DB_File database and recreate it, you'll lose any customizable settings like page size, fill-factor, and so on. This is another good reason to assign the empty list to the tied hash.
The documentation for the standard DB_File module, also in Chapter 32 of Programming Perl; the unlink function in perlfunc(1); Recipe 14.1
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Converting Between DBM Files
Inhaltsvorschau
You have a file in one DBM format, but another program expects input in a different DBM format.
Read the keys and values from the initial DBM file and write them to a new file in the different DBM format as in Example 14-2.
Example 14-2. db2gdbm
  #!/usr/bin/perl -w

  # db2gdbm: converts DB to GDBM

  

  use strict;

  

  use DB_File;

  use GDBM_File;

  

  unless (@ARGV =  = 2) {

      die "usage: db2gdbm infile outfile\n";

  }

  

  my ($infile, $outfile) = @ARGV;                     

  my (%db_in, %db_out);                               

  

  # open the files

  tie(%db_in, 'DB_File', $infile)

      or die "Can't tie $infile: $!";

  tie(%db_out, 'GDBM_File', $outfile, GDBM_WRCREAT, 0666)

      or die "Can't tie $outfile: $!";

  

  # copy (don't use %db_out = %db_in because it's slow on big databases)

  while (my($k, $v) = each %db_in) {

      $db_out{$k} = $v;

  }

  

  # these unties happen automatically at program exit

  untie %db_in;

  untie %db_out;
Call the program as:
% db2gdbm /tmp/users.db /tmp/users.gdbm
When multiple types of DBM file are used in the same program, you have to use tie, not the dbmopen interface. That's because with dbmopen you can use only one database format, which is why its use is deprecated.
Copying hashes by simple assignment, as in %new = %old, works on DBM files. However, it loads everything into memory first as a list, which doesn't matter with small hashes, but can be prohibitively expensive in the case of DBM files. For database hashes, use each to iterate through them instead.
The documentation for the standard modules GDBM_File, NDBM_File, SDBM_File, DB_File, some of which are in Chapter 32 of Programming Perl; Recipe 14.1
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Merging DBM Files
Inhaltsvorschau
You want to combine two DBM files into a single DBM file with original key-value pairs.
Either merge the databases by treating their hashes as lists:
%OUTPUT = (%INPUT1, %INPUT2);
or, more wisely, by iterating over each key-value pair:
%OUTPUT = ( );

foreach $href ( \%INPUT1, \%INPUT2 ) {

    while (my($key, $value) = each(%$href)) {

        if (exists $OUTPUT{$key}) {

            # decide which value to use and set $OUTPUT{$key} if necessary

        } else {

            $OUTPUT{$key} = $value;

        }

    }

}
This straightforward application of Recipe 5.11 comes with the same caveats. Merging hashes by treating them as lists requires that the hashes be preloaded into memory, creating a potentially humongous temporary list. If you're dealing with large hashes, have little virtual memory, or both, then you want to iterate over the keys with each to save memory.
Another difference between these merging techniques is what to do if the same key exists in both input databases. The blind assignment merely overwrites the first value with the second value. The iterative merging technique lets you decide what to do. Possibilities include issuing a warning or error, choosing the first over the second, choosing the second over the first, or concatenating the new value to the old one. If you're using the MLDBM module, you can even store them both, using an array reference to the two values.
Recipe 5.11; Recipe 14.6
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Sorting Large DBM Files
Inhaltsvorschau
You want to process a large dataset you'd like to commit to a DBM file in a particular order.
Use the DB_File's B-tree bindings and supply a comparison function of your own devising:
use DB_File;



# specify the Perl sub to do key comparison using the

# exported $DB_BTREE hash reference

$DB_BTREE->{'compare'} = sub {

    my ($key1, $key2) = @_ ;

    "\L$key1" cmp "\L$key2" ;

};



tie(%hash, "DB_File", $filename, O_RDWR|O_CREAT, 0666, $DB_BTREE)

    or die "can't tie $filename: $!";
An annoyance of hashes, whether in memory or as DBM files, is that they do not maintain proper ordering. The CPAN module Tie::IxHash can make a regular hash in memory maintain its insertion order, but that doesn't help for DBM databases or arbitrary sorting criteria.
The DB_File module supports a nice solution to this using a B-tree implementation. One advantage of a B-tree over a regular DBM hash is its ordering. When the user defines a comparison function, all calls to keys, values, and each are automatically ordered. For example, Example 14-3 is a program that maintains a hash whose keys will always be sorted case-insensitively.
Example 14-3. sortdemo
  #!/usr/bin/perl

  # 

  sortdemo - show auto dbm sorting

  use strict;

  use DB_File;

  

  $DB_BTREE->{'compare'} = sub {

      my ($key1, $key2) = @_ ;

      "\L$key1" cmp "\L$key2" ;

  };

  

  my %hash;

  my $filename = '/tmp/sorthash.db';

  tie(%hash, "DB_File", $filename, O_RDWR|O_CREAT, 0666, $DB_BTREE)

      or die "can't tie $filename: $!";

  

  my $i = 0;

  for my $word (qw(Can't you go camp down by Gibraltar)) {

      $hash{$word} = ++$i;

  }

  

  while (my($word, $number) = each %hash) {

      printf "%-12s %d\n", $word, $number;

  }
By default, the entries in a B-tree DB_File database are stored alphabetically. Here, though, we provide a case-insensitive comparison function, so using
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Storing Complex Data in a DBM File
Inhaltsvorschau
You want values in a DBM file to be something other than scalars. For instance, you use a hash of hashes in your program and want to store them in a DBM file for other programs to access, or you want them to persist across process runs.
Use the CPAN module MLDBM to store more complex values than strings and numbers.
use MLDBM 'DB_File';

tie(%HASH, 'MLDBM', [... other DBM arguments]) or die $!;
Specify a particular serializing module with:
use MLDBM qw(DB_File Storable);
MLDBM uses a serializing module like Storable, Data::Dumper, or FreezeThaw (see Recipe 11.4) to convert data structures to and from strings so that they can be stored in a DBM file. It doesn't store references; instead, it stores the data those references refer to:
# %hash is a tied hash

$hash{"Tom Christiansen"} = [ "book author", 'tchrist@perl.com' ];          

$hash{"Tom Boutell"} = [ "shareware author", 'boutell@boutell.com' ];



# names to compare

$name1 = "Tom Christiansen";

$name2 = "Tom Boutell";



$tom1 = $hash{$name1};      # snag local pointer

$tom2 = $hash{$name2};      # and another           



print "Two Toming: $tom1 $tom2\n";



Tom Toming: ARRAY(0x73048) ARRAY(0x73e4c)

            
Each time MLDBM retrieves a data structure from the DBM file, it generates a new copy of that data. To compare data that you retrieve from a MLDBM database, you need to compare the values within the structure:
if ($tom1->[0] eq $tom2->[0] &&

    $tom1->[1] eq $tom2->[1]) {

    print "You're having runtime fun with one Tom made two.\n";

} else {

    print "No two Toms are ever alike.\n";

}
This is more efficient than:
if ($hash{$name1}->[0] eq $hash{$name2}->[0] &&     # INEFFICIENT

    $hash{$name1}->[1] eq $hash{$name2}->[1]) {

    print "You're having runtime fun with one Tom made two.\n";

 } else {

    print "No two Toms are ever alike.\n";

}
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Persistent Data
Inhaltsvorschau
You want your variables to retain their values between calls to your program.
Use a MLDBM to store the values between calls to your program:
use MLDBM "DB_File";



my ($VARIABLE1,$VARIABLE2);

my $Persistent_Store = "/projects/foo/data";

BEGIN {

    my %data;

    tie(%data, "MLDBM", $Persistent_Store)

        or die "Can't tie to $Persistent_Store : $!";

    $VARIABLE1 = $data{VARIABLE1};

    $VARIABLE2 = $data{VARIABLE2};

    # ...

    untie %data;

}

END {

    my %data;

    tie (%data, "MLDBM", $Persistent_Store)

        or die "Can't tie to $Persistent_Store : $!";

    $data{VARIABLE1} = $VARIABLE1;

    $data{VARIABLE2} = $VARIABLE2;

    # ...

    untie %data;

}
An important limitation of MLDBM is that you can't add to or alter the structure in the reference without assignment to a temporary variable. We do this in the sample program in Example 14-4, assigning to $array_ref before we push. You can't simply do this:
push(@{$db{$user}}, $duration);
For a start, MLDBM doesn't allow it. Also, $db{$user} might not be in the database (the array reference isn't automatically created as it would be if %db weren't tied to a DBM file). This is why we test exists $db{$user} when we give $array_ref its initial value. We're creating the empty array for the case where it doesn't already exist.
Example 14-4. mldbm-demo
  #!/usr/bin/perl -w

  # mldbm_demo - show how to use MLDBM with DB_File

  

  use MLDBM "DB_File";

  

  $db = "/tmp/mldbm-array";

  

  tie %db, "MLDBM", $db

    or die "Can't open $db : $!";

  

  while(<DATA>) {

      chomp;

      ($user, $duration) = split(/\s+/, $_);

      $array_ref = exists $db{$user} ? $db{$user} : [  ];

      push(@$array_ref, $duration);

      $db{$user} = $array_ref;

  }

  

  foreach $user (sort keys %db) {

      print "$user: ";

      $total = 0;

      foreach $duration (@{ $db{$user} }) {

          print "$duration ";

          $total += $duration;

      }

          print "($total)\n";

      }

  

  _ _END_ _

  gnat        15.3

  tchrist     2.5

  jules       22.1

  tchrist     15.9

  gnat        8.7
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Saving Query Results to Excel or CSV
Inhaltsvorschau
You want to query a relational database and create a file of the results so that another program or person can use them. The two common formats people want to get data in are CSV and Excel.
Use the CPAN module DBIx::Dump to dump the statement handle after the query:
use DBIx::Dump;

use DBI;



# ... connect to your database as normal

$sth = $dbh->prepare("SELECT ...");  # your query here

$sth->execute( );



$out = DBIx::Dump->new('format' => $FORMAT,   # excel or csv

                       'output' => $FILENAME, # file to save as

                       'sth'    => $sth);

$out->dump( );
The CPAN module DBIx::Dump supports Excel and CSV file formats. It uses the CPAN module Spreadsheet::WriteExcel to write Excel files, and the CPAN module Text::CSV_XS to write CSV files.
The first row in the output files holds the column names. For example:
ID,NAME

1,Nat

2,Tom

4,Larry

5,Damian

6,Jon

7,Dan
The documentation for the CPAN modules DBIx::Dump, Spreadsheet::WriteExcel, and Text::CSV_XS; Recipe 14.17
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Executing an SQL Command Using DBI
Inhaltsvorschau
You want to send SQL queries to a database system such as Oracle, Sybase, mSQL, or MySQL, and process the results.
Use the DBI (DataBase Interface) and DBD (DataBase Driver) modules available from CPAN:
use DBI;



$dbh = DBI->connect('dbi:driver:database', 'username', 'auth',

            { RaiseError => 1, AutoCommit => 1});



$dbh->do($NON_SELECT_SQL_STATEMENT);



$results = $dbh->selectall_arrayref($SELECT_SQL_STATEMENT);



$sth = $dbh->prepare($SQL_SELECT_STATEMENT);

$sth->execute( );

while (@row = $sth->fetchrow_array) {

    # ...

}



$dbh->disconnect( );
The DBI module abstracts away the different database APIs, offering you a single set of functions for accessing every database. The actual work of connecting to a database, issuing queries, parsing results, etc. is done by a DBD module specific to that database (e.g., DBD::mysql, DBD::Oracle, etc.).
All work with databases via the DBI is done through handles. A handle is simply an object, created by calling DBI->connect. This is attached to a specific database and driver using the DBI->connect call.
The first argument to DBI->connect is a single string with three colon-separated fields. This DSN (Data Source Name) identifies the database you're connecting to. The first field is always dbi (though this is case-insensitive, so DBI will do just as well), and the second is the name of the driver you're going to use (Oracle, mysql, etc.). The rest of the string is passed by the DBI module to the requested driver module (DBD::mysql, for example) where it identifies the database.
The second and third arguments authenticate the user.
The fourth argument is an optional hash reference defining attributes of the connection. PrintError controls whether DBI warns when a DBI method fails (the default is true; setting it to a false value keeps DBI quiet). Setting RaiseError is like PrintError except that
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Escaping Quotes
Inhaltsvorschau
You want to put Perl values into queries as literal strings, but you're not sure how your database wants strings to be quoted.
Use the database handle's quote method:
$quoted = $dbh->quote($unquoted);
This $quoted value is now suitable for interpolation into queries:
$sth->prepare("SELECT id,login FROM People WHERE name = $quoted");
Or simply use placeholders in your query and DBI automatically quotes strings for you:
$sth->prepare("SELECT id,login FROM People WHERE name = ?");

$sth->execute($unquoted);
Each database has its own quoting idiosyncrasies, so leave the quoting to the quote method or placeholders rather than trying to roll your own quoting function. Not only is hardcoding quotes into your SQL non-portable, it doesn't take into account the possibility that the strings you're interpolating might have quotes in them. For example, take this:
$sth = $dbh->prepare(qq{SELECT id,login FROM People WHERE name="$name"});
If $name is Jon "maddog" Orwant, then you are effectively preparing this query, which is invalid SQL:
SELECT id,login FROM People WHERE name="Jon "maddog" Orwant"
The only strange quoting behavior from quote is this: because the DBI represents NULL values as undef, if you pass undef to quote, it returns NULL without quotes.
The documentation with the DBI module from CPAN; http://dbi.perl.org; Programming the Perl DBI
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Dealing with Database Errors
Inhaltsvorschau
You want your program to catch and handle database errors, possibly displaying informative error messages.
The best solution is to enable RaiseError when you connect to the database, then wrap database calls in eval:
$dbh = DBI->connect($DSN, $user, $password,

                    { RaiseError => 1 });

eval {

  $dbh->do($SQL);

  $sth = $dbh->prepare($SQL2);

  $sth->execute( );

  while (@row = $sth->fetchrow_array) {

    # ...

  }

};

if ($@) {

  # recover here using $DBI::lasth->errstr to get

  # the error message

}
The logic here is simple: first tell DBI to die if there's a problem with your SQL (otherwise, the database can't do what you wanted it to). Then, wrap the code that might die in eval to catch fatal errors. Next, check $@ (either the error message you would have died with or empty if there was no error) to see whether something went wrong. If it did, somehow deal with the error.
DBI supplies the $DBI::lasth variable, containing the last handle acted on. If something went wrong, that's the handle that caused it. You could use the $@ error message, but that also has the "died at file ... line ..." text from die that you might not want. To discover the SQL statement that died, use $DBI::lasth->{Statement}. If you are using only the one handle, you can call the methods directly on your handle instead of on $DBI::lasth:
$msg = $dbh->errstr;

$sql = $dbh->{Statement};
An alternative approach is to disable RaiseError and check the return value for each database call. Methods such as do and execute return a true value if successful, so you can say:
$dbh->do($SQL) or die $dbh->errstr;

$sth->execute( ) or die $sth->errstr;
The do method returns the number of rows affected, but in such a way that it always returns a true value if successful. (If you're curious how to do this, see the Introduction to Chapter 1 for the gory details of how Perl decides what's true and what's false.)
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Repeating Queries Efficiently
Inhaltsvorschau
You have a query that you want to execute repeatedly, and you'd like to do it as efficiently as possible. Sometimes you have several queries that are similar, but not quite identical, that you'd like to execute efficiently (for example, you have a loop through an array of names and want to SELECT ... WHERE name=$name).
Take advantage of the fact that you can repeatedly execute a query that you need prepare only once.
$sth = $dbh->prepare($SQL);

# execute query ten times

for ($i=0; $i < 10; $i++) {

  $sth->execute( );

  while (@row = $sth->fetchrow_array) {

    # ...

  }

}
If you have changing parameters, use the DBI's binding features:
$sth = $dbh->prepare('SELECT uid,login FROM People WHERE name = ?');

foreach $person (@names) {

  $sth->execute($person);

  while (@row = $sth->fetchrow_array) {

    # ...

  }

}
"Prepare once, execute often" is one secret to DBI success. By separating preparation from execution, the database server can parse and optimize queries once and then execute them many times. Most databases can do this even when the queries contain placeholders for values to be filled when the query is executed.
The process of replacing placeholders with actual values is known as binding. The simplest way is to bind when you execute:
$sth = $dbh->prepare('SELECT id,login FROM People WHERE middle_initial = ?');

$sth->execute('J');
If you have multiple parameters to bind, pass more values to execute:
$sth = $dbh->prepare('SELECT * FROM Addresses WHERE House = ?

                      AND Street LIKE ?');

$sth->execute('221b', 'Baker%');
You don't have to do the binding and the execution in one step. The bind_param function binds without executing:
$sth = $dbh->prepare('SELECT id,login FROM People WHERE middle_initial = ?');

$sth->bind_param(1, 'J');

$sth->execute( );
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Building Queries Programmatically
Inhaltsvorschau
You want to create searches at runtime. For example, you want users of your program to be able to specify combinations of columns and allowable ranges of values.
Build a list of clauses and join them together to form the SQL WHERE clause:
if ($year_min)     { push @clauses, "Year >= $year_min" }

if ($year_max)     { push @clauses, "Year <= $year_max" }

if ($bedrooms_min) { push @clauses, "Beds >= $bedrooms_min" }

if ($bedrooms_max) { push @clauses, "Beds <= $bedrooms_max" }

# ...

$clause = join(" AND ", @clauses);

$sth = $dbh->prepare("SELECT beds,baths FROM Houses WHERE $clause");
Don't try to build up a string in a loop:
$where = '';

foreach $possible (@names) {

  $where .= ' OR Name=' . $dbh->quote($possible);

}
That code will end up creating a WHERE clause like:
OR Name="Tom" OR Name="Nat" OR Name="Larry" OR Name="Tim"
Then you end up having to lop off the leading " OR ". It's much cleaner to use map and never have the extra text at the start:
$where = join(" OR ", map { "Name=".$dbh->quote($_) } @names);
The map produces a list of strings like:
Name="Nat"

Name="Tom"

Name="Larry"

Name="Tim"
and then they're joined together with " OR " to create a well-formed clause:
Name="Nat" OR Name="Tom" OR Name="Larry" OR Name="Tim"
Unfortunately, you cannot use placeholders here:
$sth = $dbh->prepare("SELECT id,login FROM People WHERE ?");   # BAD

$sth->bind_param(1, $where);
As explained in Recipe 14.12, placeholders can only be used for simple scalar values and not entire clauses. However, there is an elegant solution: construct the clause and the values to be bound in parallel:
if ($year_min)     { push @clauses, "Year >= ?"; push @bind, $year_min }

if ($year_max)     { push @clauses, "Year <= ?"; push @bind, $year_max }

if ($bedrooms_min) { push @clauses, "Beds >= ?"; push @bind, $bedrooms_min }

if ($bedrooms_max) { push @clauses, "Beds <= ?"; push @bind, $bedrooms_max }

$clause = join(" AND ", @clauses);

$sth = $dbh->prepare("SELECT id,price FROM Houses WHERE $clause");

$sth->execute(@bind);
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Finding the Number of Rows Returned by a Query
Inhaltsvorschau
You want to find out how many rows were returned by a query.
For operations that aren't queries (such as INSERTs, UPDATEs, and DELETEs), the do method returns the number of rows affected, -1 when it can't determine the right value, or else undef in case of failure.
$rows = $dbh->do("DELETE FROM Conference WHERE Language='REBOL'");

if (! defined $rows) {

  # failed, but this is not needed if RaiseError is active

} else {

  print "Deleted $rows rows\n";

}
You can't reliably get row counts from queries without either fetching all of the results and then counting them, or writing another query.
The easiest way to find out how many rows a query will return is to use the COUNT function in SQL. For example, take this query:
SELECT id,name FROM People WHERE Age > 30
To find out how many rows it will return, simply issue this query:
SELECT COUNT(*) FROM People WHERE Age > 30
If the database is so volatile that you're afraid the number of rows will change between the COUNT query and the data-fetching query, your best option is to fetch the data and then count rows yourself.
With some DBD modules, execute returns the number of rows affected. This isn't portable and may change in the future.
The documentation with the DBI module from CPAN; http://dbi.perl.org; Programming the Perl DBI
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Using Transactions
Inhaltsvorschau
A single change to the database requires several INSERT, UPDATE, or DELETE commands in SQL. For example, you might have to add a person to the People table, add an address to the Address table, and add a link between them to the LivesAt table. The database is in a logically inconsistent state from the first insert until the last completes. If another client queries the database, it will get inconsistent data (e.g., there'll be a Person with no address).
You want to perform the update in such a way that another client never sees an inconsistent database—either all of the changes or none of them should be visible in the database during and after the changes, regardless of any client or server failures during processing.
Use transactions. The DBI supports these via the commit and rollback methods on a database handle. Use them thus:
$dbh->{AutoCommit} = 0; # enable transactions

$dbh->{RaiseError} = 1; # die( ) if a query has problems



eval {

  # do inserts, updates, deletes, queries here

  $dbh->commit( );

};



if ($@) {

  warn "Transaction aborted: $@";

  eval { $dbh->rollback( ) };   # in case rollback( ) fails

  # do your application cleanup here

}
The AutoCommit option controls whether the database commits each change as soon as you issue the command. When AutoCommit is disabled, the database won't update until you call the commit method. If midway through the series of updates you change your mind or an error occurs, the rollback method undoes all pending changes.
You don't have to explicitly set the AutoCommit and RaiseError attributes before each transaction. For convenience, set those attributes in the connect call:
$dbh = DBI->connect($dsn, $username, $password,

                    { AutoCommit => 0, RaiseError => 1 });
Because RaiseError causes DBI to call die whenever a database operation fails, you break out of the
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Viewing Data One Page at a Time
Inhaltsvorschau
You want to display the contents of a table or the results of a query one page at a time.
Keep track of which record you're starting with, then use that value to decide how many records to skip before you display a page worth. If your database supports a LIMIT clause that takes a range, use this to avoid transferring unnecessary rows into your program.
The example code in this recipe pages through the contents of a table. To page through the results of a query, select the data into a temporary table and page through that.
In desktop applications (e.g., Tk), you can keep track of the current page number yourself. With web applications, the easiest thing to do is to use query parameters in the URL to indicate where you are. For example:
/users-report/view?start=1
Begin by finding out how many records there are in total:
$row = $Dbh->selectrow_arrayref("SELECT COUNT(*) FROM Users");

$count = $row->[0];
Find the first record to display by looking at the start parameter, then calculate the last record from that. You need to know the number of records per page, which here we assume is in the $Page_Size variable:
$first = param('start') || 1;

$last  = $first + $Page_Size - 1;

$last = $count if $last > $count;   # don't go past the end
Now fetch the data into an array and display the records you're interested in:
$results = $Dbh->selectall_arrayref('SELECT id,lastname,firstname FROM Users

                                     ORDER BY lastname,firstname,id');



for (my $i=$first; $i <= $last; $i++) {

  my $user = $results->[$i-1];        # result 1 is in row 0

  printf("%d.  %s, %s.<br>\n", $i, $user->[1], $user->[2]);

}
That will produce output like:

               1.  Brocard, Leon.<br>

               2.  Cawley, Piers.<br>

               3.  Christiansen, Tom.<br>

            
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Querying a CSV File with SQL
Inhaltsvorschau
You want to use SQL to insert, delete, or retrieve data from a comma-separated values (CSV) file.
Use the DBD::CSV module from CPAN:
use DBI;



$dbh = DBI->connect("dbi:CSV:f_dir=/home/gnat/payroll", "", "",

                    { AutoCommit => 1, RaiseError => 1 });



$dbh->do("UPDATE salaries SET salary = salary * 2 WHERE name = 'Nat'");



$sth = $dbh->prepare("SELECT name,salary FROM salaries WHERE name = 'Nat'");

$sth->execute( );

while (@row = $sth->fetchrow_array) {

  # ...

}

$sth->finish( );



$dbh->disconnect( );
A "table" in CSV terms is a file (the table name becomes the filename). The tables are kept in the directory specified by the f_dir parameter in the connect method call. The DBD::CSV module supports CREATE and DROP to make and destroy tables:
$dbh->do("CREATE TABLE salaries (salary FLOAT, name CHAR(20))");
Valid column types are: TINYINT, BIGINT, LONGVARBINARY, VARBINARY, BINARY, LONGVARCHAR, CHAR, NUMERIC, DECIMAL, INTEGER, SMALLINT, FLOAT, REAL, and DOUBLE.
When you access a table, the DBD::CSV module locks the corresponding file with the flock(2) syscall. If flock(2) isn't supported on the filesystem containing the CSV file, two processes will be able to access the file at the same time, possibly leading to incorrect results or lost data.
If you're reading or writing an Excel CSV file, you need to tell the DBD::CSV module that the value separator is actually a semicolon:
$dbh = DBI->connect('dbi:CSV:f_dir=/home/gnat/payroll;csv_sep_char=\;');
We need to quote the semicolon to prevent connect from thinking it's separating csv_sep_char= from another connection attribute. We use single quotes rather than double quotes to avoid having to backslash the backslash:
$dbh = DBI->connect("dbi:CSV:f_dir=/home/gnat/payroll;csv_sep_char=\\;");
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Using SQL Without a Database Server
Inhaltsvorschau
You want to make complex SQL queries but don't want to maintain a relational database server.
Use the DBD::SQLite module from CPAN:
use DBI;



$dbh = DBI->connect("dbi:SQLite:dbname=/Users/gnat/salaries.sqlt", "", "",

                    { RaiseError => 1, AutoCommit => 1 });



$dbh->do("UPDATE salaries SET salary = 2 * salary WHERE name = 'Nat'");



$sth = $dbh->prepare("SELECT id,deductions FROM salaries WHERE name = 'Nat'");

# ...
An SQLite database lives in a single file, specified with the dbname parameter in the DBI constructor. Unlike most relational databases, there's no database server here—DBD::SQLite interacts directly with the file. Multiple processes can read from the same database file at the same time (with SELECTs), but only one process can make changes (and other processes are prevented from reading while those changes are being made).
SQLite supports transactions. That is, you can make a number of changes to different tables, but the updates won't be written to the file until you commit them:
use DBI;

$dbh = DBI->connect("dbi:SQLite:dbname=/Users/gnat/salaries.sqlt", "", "",

                    { RaiseError => 1, AutoCommit => 0 });

eval {

  $dbh->do("INSERT INTO people VALUES (29, 'Nat', 1973)");

  $dbh->do("INSERT INTO people VALUES (30, 'William', 1999)");

  $dbh->do("INSERT INTO father_of VALUES (29, 30)");

  $dbh->commit( );

};

if ($@) {

      eval { $dbh->rollback( ) };

      die "Couldn't roll back transaction" if $@;

}
SQLite is a typeless database system. Regardless of the types specified when you created a table, you can put any type (strings, numbers, dates, blobs) into any field. Indeed, you can even create a table without specifying any types:
CREATE TABLE people (id, name, birth_year);
The only time that data typing comes into play is when comparisons occur, either through WHERE clauses or when the database has to sort values. The database ignores the type of the column and looks only at the type of the specific value being compared. Like Perl, SQLite recognizes only strings and numbers. Two numbers are compared as floating-point values, two strings are compared as strings, and a number is always less than a string when values of two different types are compared.
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Program: ggh—Grep Netscape Global History
Inhaltsvorschau
This program divulges the contents of Netscape's history.db file. It can be called with full URLs or with a (single) pattern. If called without arguments, it displays every entry in the history file. The ~/.netscape/history.db file is used unless the -database option is given.
Each output line shows the URL and its access time. The time is converted into localtime representation with -localtime (the default) or gmtime representation with -gmtime—or left in raw form with -epochtime, which is useful for sorting by date.
To specify a pattern to match against, give one single argument without a ://.
To look up one or more URLs, supply them as arguments:
% ggh http://www.perl.com/index.html
To find out a link you don't quite recall, use a regular expression (a single argument without a :// is a pattern):
% ggh perl
To find out everyone you've mailed:
% ggh mailto:
To find out the FAQ sites you've visited, use a snazzy Perl pattern with an embedded /i modifier:
% ggh -regexp '(?i)\bfaq\b'
If you don't want the internal date converted to localtime, use -epoch:
% ggh -epoch http://www.perl.com/perl/
If you prefer gmtime to localtime, use -gmtime:
% ggh -gmtime http://www.perl.com/perl/
To look at the whole file, give no arguments (but perhaps redirect to a pager):
% ggh | less
If you want the output sorted by date, use the -epoch flag:
% ggh -epoch | sort -rn | less
If you want it sorted by date into your local time zone format, use a more sophisticated pipeline:
% ggh -epoch | sort -rn | perl -pe 's/\d+/localtime $&/e' | less
The Netscape release notes claim that they're using NDBM format. This is misleading: they're actually using Berkeley DB format, which is why we require DB_File (not supplied standard with all systems Perl runs on) instead of NDBM_File (which is). The program is shown in Example 14-7.
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Chapter 15: Interactivity
Inhaltsvorschau
And then the Windows failed—and then I could not see to see
—Emily Dickinson, "I heard a Fly buzz—when I died"
Everything we use has a user interface: VCRs, computers, telephones, even books. Our programs have user interfaces: do we have to supply arguments on the command line? Can we drag and drop files into the program? Do we have to press Enter after every response we make, or can the program read a single keystroke at a time?
This chapter won't discuss designing user interfaces: entire bookshelves are filled with books written on the subject. Instead, we focus on implementing user interfaces—parsing command-line arguments, reading a character at a time, writing anywhere on the screen, and writing a graphical user interface.
The simplest user interface is what we are called line mode interfaces. Line mode programs normally read entire lines and write characters or entire lines. Filters like grep and utilities like mail exemplify this type of interface. We don't really talk much about this type of interface in this chapter, because so much of the rest of the book does.
A more complex interface is what is called full-screen mode. Programs such as vi, elm, and lynx have full-screen interfaces. They read single characters at a time and can write to any character position on the screen. We address this type of interface in Recipe 15.4, Recipe 15.6, Recipe 15.9, Recipe 15.10, and Recipe 15.11.
Still more complex are the graphical user interfaces (GUIs). Programs with GUIs can address individual pixels, not just characters. GUIs often follow a windowing metaphor, in which a program creates windows that appear on the user's display device. The windows are filled with widgets, which include things like scrollbars to drag or buttons to click. Netscape Navigator provides a full graphical user interface, as does your window manager. Perl can use many GUI toolkits, but here we'll cover the Tk toolkit, since it's the most well-known and portable. See Recipe 15.14, Recipe 15.15, and Recipe 15.22.
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Introduction
Inhaltsvorschau
Everything we use has a user interface: VCRs, computers, telephones, even books. Our programs have user interfaces: do we have to supply arguments on the command line? Can we drag and drop files into the program? Do we have to press Enter after every response we make, or can the program read a single keystroke at a time?
This chapter won't discuss designing user interfaces: entire bookshelves are filled with books written on the subject. Instead, we focus on implementing user interfaces—parsing command-line arguments, reading a character at a time, writing anywhere on the screen, and writing a graphical user interface.
The simplest user interface is what we are called line mode interfaces. Line mode programs normally read entire lines and write characters or entire lines. Filters like grep and utilities like mail exemplify this type of interface. We don't really talk much about this type of interface in this chapter, because so much of the rest of the book does.
A more complex interface is what is called full-screen mode. Programs such as vi, elm, and lynx have full-screen interfaces. They read single characters at a time and can write to any character position on the screen. We address this type of interface in Recipe 15.4, Recipe 15.6, Recipe 15.9, Recipe 15.10, and Recipe 15.11.
Still more complex are the graphical user interfaces (GUIs). Programs with GUIs can address individual pixels, not just characters. GUIs often follow a windowing metaphor, in which a program creates windows that appear on the user's display device. The windows are filled with widgets, which include things like scrollbars to drag or buttons to click. Netscape Navigator provides a full graphical user interface, as does your window manager. Perl can use many GUI toolkits, but here we'll cover the Tk toolkit, since it's the most well-known and portable. See Recipe 15.14, Recipe 15.15, and Recipe 15.22.
The final class of UIs is one we won't address here—web user interfaces. Increasingly, people are eschewing the complicated programming of a fully responsive GUI whose every pixel is addressable, preferring relatively clunky and plain-looking HTML pages. After all, everyone has a web browser, but not everyone can figure out how to install Perl/Tk. We cover the Web in Chapter 19, Chapter 20, and Chapter 21.
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Parsing Program Arguments
Inhaltsvorschau
You want to let users change your program's behavior by giving options on the command line. For instance, you want to allow the user to control the level of output that your program produces with a -v (verbose) option.
Use the standard Getopt::Std module to permit single-character options:
use Getopt::Std;



# -v ARG, -D ARG, -o ARG, sets $opt_v, $opt_D, $opt_o

getopt("vDo");              

# -v ARG, -D ARG, -o ARG, sets $args{v}, $args{D}, $args{o}

getopt("vDo", \%args);



getopts("vDo:");         # -v, -D, -o ARG, sets $opt_v, $opt_D, $opt_o

getopts("vDo:", \%args); # -v, -D, -o ARG, sets $args{v}, $args{D}, $args{o}
Or, use the standard Getopt::Long module to permit named arguments:
use Getopt::Long;



GetOptions( "verbose"  => \$verbose,     # --verbose

            "Debug"    => \$debug,       # --Debug

            "output=s" => \$output );    # --output=string or --output string
Most traditional programs like ls and rm take single-character options (also known as flags or switches), such as -l and -r. In the case of ls -l and rm -r, the argument is Boolean: either it is present or it isn't. Contrast this with gcc -o compiledfile source.c, where compiledfile is a value associated with the option -o. We can combine Boolean options into a single option in any order. For example:
% rm -r -f /tmp/testdir
Another way of saying this is:
% rm -rf /tmp/testdir
The Getopt::Std module, part of the standard Perl distribution, parses these types of traditional options. Its getopt function takes a single string of characters (each corresponding to an option that takes a value), parses the command-line arguments stored in @ARGV, and sets a global variable for each option. For example, the value for the -D option will be stored in
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Testing Whether a Program Is Running Interactively
Inhaltsvorschau
You want to know whether your program is being called interactively or not. For instance, a user running your program from a shell is interactive, whereas the program being called from cron is not.
Use -t to test STDIN and STDOUT:
sub I_am_interactive {

    return -t STDIN && -t STDOUT;

}
If you're on a POSIX system, test process groups:
use POSIX qw/getpgrp tcgetpgrp/;



sub I_am_interactive {

    my $tty;

    open($tty, "<", "/dev/tty")    or die "can't open /dev/tty: $!";

    my $tpgrp = tcgetpgrp(fileno($tty));

    my $pgrp  = getpgrp( );

    close $tty;

    return ($tpgrp =  = $pgrp);

}
The -t file test operator tells whether the filehandle or file is a tty device. Such devices are signs of interactive use. This only tells you whether your program has been redirected. Running your program from the shell and redirecting STDIN and STDOUT makes the -t version of I_am_interactive return false. Called from cron, I_am_interactive also returns false.
The POSIX test tells you whether your program has exclusive control over its tty. A program whose input and output has been redirected still can control its tty if it wants to, so the POSIX version of I_am_interactive returns true. A program run from cron has no tty, so I_am_interactive returns false.
Whichever I_am_interactive you choose to use, here's how you'd call it:
while (1) {

    if (I_am_interactive( )) {

        print "Prompt: ";

    }

    $line = <STDIN>;

    last unless defined $line; 

    # do something with the line

}
Or, more clearly:
sub prompt { print "Prompt: " if I_am_interactive( ) }

for (prompt( ); $line = <STDIN>; prompt( )) {

    # do something with the line

}
The documentation for the standard POSIX module, also in Chapter 32 of
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Clearing the Screen
Inhaltsvorschau
You want to clear the screen.
Use the Term::Cap module to send the appropriate character sequence. Use POSIX:: Termios to get the output speed of the terminal (or guess 9600 bps). Use eval to trap any exceptions that arise using POSIX::Termios.
use Term::Cap;



$OSPEED = 9600;

eval {

    require POSIX;

    my $termios = POSIX::Termios->new( );

    $termios->getattr;

    $OSPEED = $termios->getospeed;

};



$terminal = Term::Cap->Tgetent({OSPEED=>$OSPEED});

$terminal->Tputs('cl', 1, STDOUT);
Or, just run the clear command:
system("clear");
If you clear the screen a lot, cache the return value from the termcap or clear command:
$clear = $terminal->Tputs('cl');

$clear = `clear`;
Then you can clear the screen a hundred times without running clear a hundred times:
print $clear;
Your system's clear(1) and termcap(5) manpages (if you have them); the documentation for the standard module Term::Cap module, also in Chapter 32 of Programming Perl; the documentation for the Term::Lib module from CPAN
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Determining Terminal or Window Size
Inhaltsvorschau
You need to know the size of the terminal or window. For instance, you want to format text so that it doesn't pass the righthand boundary of the screen.
Either use the ioctl described in Recipe 12.17, or else use the CPAN module Term::ReadKey:
use Term::ReadKey;



($wchar, $hchar, $wpixels, $hpixels) = GetTerminalSize( );
GetTerminalSize returns four elements: the width and height in characters and the width and height in pixels. If the operation is unsupported for the output device (for instance, if output has been redirected to a file), it returns an empty list.
Here's how you'd graph the contents of @values, assuming no value is less than 0:
use Term::ReadKey;



($width) = GetTerminalSize( );

die "You must have at least 10 characters" unless $width >= 10;



$max = 0;

foreach (@values) {

    $max = $_ if $max < $_;

}



$ratio = ($width-10)/$max;          # chars per unit

foreach (@values) {

    printf("%8.1f %s\n", $_, "*" x ($ratio*$_));

}
The documentation for the Term::ReadKey module from CPAN; Recipe 12.17
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Changing Text Color
Inhaltsvorschau
You want text to appear in different colors on the screen. For instance, you want to emphasize a mode line or highlight an error message.
Use the CPAN module Term::ANSIColor to send the ANSI color-change sequences to the user's terminal:
use Term::ANSIColor;



print color("red"), "Danger, Will Robinson!\n", color("reset");

print "This is just normal text.\n";

print colored("<BLINK>Do you hurt yet?</BLINK>", "blink");
Or, you can use convenience functions from Term::ANSIColor:
use Term::ANSIColor qw(:constants);



print RED, "Danger, Will Robinson!\n", RESET;
Term::ANSIColor prepares escape sequences that some (but far from all) terminals will recognize. For example, if you normally launch a color-xterm, this recipe will work. If you normally use the normal xterm program, or have a vt100 in your kitchen, it won't.
There are two ways of using the module: either by calling the exported functions color($attribute) and colored($text, $attribute), or by using convenience functions like BOLD, BLUE, and RESET.
Attributes can be a combination of colors and controls. The colors are black, red, green, yellow, blue, magenta, on_black, on_red, on_green, on_yellow, on_blue, on_magenta, on_cyan, and on_white. (Apparently orange and purple don't matter.) The controls are clear, reset, bold, underline, underscore, blink, reverse, and concealed. clear and reset are synonyms, as are underline and underscore. reset restores the colors to the way they were when the program started, and concealed makes foreground and background colors the same.
You can combine attributes:
# rhyme for the deadly coral snake

print color("red on_black"),  "venom lack\n";

print color("red on_yellow"), "kill that fellow\n";



print color("green on_cyan blink"), "garish!\n";

print color("reset");
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Reading Single Characters from the Keyboard
Inhaltsvorschau
You want to read a single character from the keyboard. For instance, you've displayed a menu of one-character options, and you don't want to require users to press the Enter key to make their selection.
Use the CPAN module Term::ReadKey to put the terminal into cbreak mode, read characters from STDIN, and then put the terminal back into its normal mode:
use Term::ReadKey;



ReadMode 'cbreak';

$key = ReadKey(0);

ReadMode 'normal';
Term::ReadKey can put the terminal into many modes—cbreak is just one of them. cbreak mode makes each character available to your program as it is typed (see Example 15-1). It also echoes the characters to the screen; see Recipe 15.10 for an example of a mode that does not echo.
Example 15-1. sascii
  #!/usr/bin/perl -w

  # sascii - Show ASCII values for keypresses

  

  use Term::ReadKey;

  ReadMode('cbreak');

  print "Press keys to see their ASCII values.  Use Ctrl-C to quit.\n";

  

  while (1) {

      $char = ReadKey(0);

      last unless defined $char;

      printf(" Decimal: %d\tHex: %x\n", ord($char), ord($char));

  }

  

  ReadMode('normal');
Using cbreak mode doesn't prevent the terminal's device driver from interpreting end-of-file and flow-control characters. If you want to be able to read a real Ctrl-C (which normally sends a SIGINT to your process) or a Ctrl-D (which indicates end-of-file under Unix), you want to use raw mode.
An argument of 0 to ReadKey indicates that we want a normal read using getc. If no input is available, the program will pause until there is some. We can also pass -1 to indicate a non-blocking read, or a number greater than 0 to indicate the number of seconds to wait for input to become available; fractional seconds are allowed. Non-blocking reads and timed-out reads return either
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Ringing the Terminal Bell
Inhaltsvorschau
You want to sound an alarm on the user's terminal.
Print the "\a" character to sound a bell:
print "\aWake up!\n";
Or use the "vb" terminal capability to show a visual bell:
use Term::Cap;



$OSPEED = 9600;

eval {

    require POSIX;

    my $termios = POSIX::Termios->new( );

    $termios->getattr;

    $OSPEED = $termios->getospeed;

};



$terminal = Term::Cap->Tgetent({OSPEED=>$OSPEED});

$vb = "";

eval {

    $terminal->Trequire("vb");

    $vb = $terminal->Tputs('vb', 1);

};



print $vb;                                  # ring visual bell
The "\a" escape is the same as "\cG", "\007", and "\x07". They all correspond to the ASCII BEL character and cause an irritating ding. In a crowded terminal room at the end of the semester, this beeping caused by dozens of vi novices all trying to get out of insert mode at once can be maddening. The visual bell is a workaround to avoid irritation. Based upon the polite principle that terminals should be seen and not heard (at least, not in crowded rooms), some terminals let you briefly reverse the foreground and background colors to give a flash of light instead of an audible ring.
Not every terminal supports the visual bell, which is why we eval the code that finds it. If the terminal doesn't support it, Trequire will die without having changed the value of $vb from "". If the terminal does support it, the value of $vb will be set to the character sequence to flash the bell.
There's a better approach to the bell issue in graphical terminal systems like xterm. Many of these let you enable the visual bell from the enclosing application itself, allowing all programs that blindly output a chr(7) to become less noisy.
The section on "String Literals" in Chapter 2 of Programming Perl
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Using POSIX termios
Inhaltsvorschau
You'd like to manipulate your terminal characteristics directly.
Use the POSIX termios interface.
Think of everything you can do with the stty command—you can set everything from special characters to flow control and carriage-return mapping. The standard POSIX module provides direct access to the low-level terminal interface to implement stty-like capabilities in your program.
Example 15-2 finds what your tty's erase and kill characters are (probably backspace and Ctrl-U). Then it sets them back to their original values out of antiquity, # and @, and has you type something. It restores them when done.
Example 15-2. demo POSIX termios
  #!/usr/bin/perl -w

  # demo POSIX termios

  

  use POSIX qw(:termios_h);

  

  $term = POSIX::Termios->new;

  $term->getattr(fileno(STDIN));

  

  $erase = $term->getcc(VERASE);

  $kill = $term->getcc(VKILL);

  printf "Erase is character %d, %s\n", $erase, uncontrol(chr($erase));

  printf "Kill is character %d, %s\n", $kill, uncontrol(chr($kill));

  

  $term->setcc(VERASE, ord('#'));

  $term->setcc(VKILL, ord('@'));

  $term->setattr(1, TCSANOW);

  

  print("erase is #, kill is @; type something: ");

  $line = <STDIN>;

  print "You typed: $line";

  

  $term->setcc(VERASE, $erase);

  $term->setcc(VKILL, $kill);

  $term->setattr(1, TCSANOW);

  

  sub uncontrol {

  

      local $_ = shift;

      s/([\200-\377])/sprintf("M-%c",ord($1) & 0177)/eg;

      s/([\0-\37\177])/sprintf("^%c",ord($1) ^ 0100)/eg;

      return $_;

  }
Here's a module called HotKey that implements a readkey function in pure Perl. It doesn't provide any benefit over Term::ReadKey, but it shows POSIX termios in action:
# HotKey.pm

package HotKey;



@ISA = qw(Exporter);

@EXPORT = qw(cbreak cooked readkey);



use strict;

use POSIX qw(:termios_h);

my ($term, $oterm, $echo, $noecho, $fd_stdin);



$fd_stdin = fileno(STDIN);

$term     = POSIX::Termios->new( );

$term->getattr($fd_stdin);

$oterm    = $term->getlflag( );



$echo     = ECHO | ECHOK | ICANON;

$noecho   = $oterm & ~$echo;



sub cbreak {

    $term->setlflag($noecho);  # ok, so i don't want echo either

    $term->setcc(VTIME, 1);

    $term->setattr($fd_stdin, TCSANOW);

}



sub cooked {

    $term->setlflag($oterm);

    $term->setcc(VTIME, 0);

    $term->setattr($fd_stdin, TCSANOW);

}



sub readkey {

    my $key = '';

    cbreak( );

    sysread(STDIN, $key, 1);

    cooked( );

    return $key;

}



END { cooked( ) }



1;
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Checking for Waiting Input
Inhaltsvorschau
You want to know whether keyboard input is waiting without actually reading it.
Use the CPAN module Term::ReadKey, and try to read a key in non-blocking mode by passing it an argument of -1:
use Term::ReadKey;



ReadMode ('cbreak');



if (defined ($char = ReadKey(-1)) ) {

    # input was waiting and it was $char

} else {

    # no input was waiting

}



ReadMode ('normal');                  # restore normal tty settings
The -1 parameter to ReadKey indicates a non-blocking read of a character. If no character is available, ReadKey returns undef.
The documentation for the Term::ReadKey module from CPAN; Recipe 15.6
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Reading Passwords
Inhaltsvorschau
You want to read input from the keyboard without the keystrokes being echoed on the screen. For instance, you want to read passwords as passwd does, i.e., without displaying the user's password.
Use the CPAN module Term::ReadKey, set the input mode to noecho, and then use ReadLine:
use Term::ReadKey;



ReadMode('noecho');

$password = ReadLine(0);
Example 15-3 shows how to verify a user's password. If your system uses shadow passwords, only the superuser can get the encrypted form of the password with getpwuid. Everyone else just gets * as the password field of the database, which is useless for verifying passwords.
Example 15-3. checkuser
  #!/usr/bin/perl -w

  # checkuser - demonstrates reading and checking a user's password

  

  use Term::ReadKey;

  

  print "Enter your password: ";

  ReadMode 'noecho';

  $password = ReadLine 0;

  chomp $password;

  ReadMode 'normal';

  

  print "\n";

  

  ($username, $encrypted) = ( getpwuid $<)[0,1];

  

  if (crypt($password, $encrypted) ne $encrypted) {

      die "You are not $username\n";

  } else {

      print "Welcome, $username\n";

  }
The documentation for the Term::ReadKey module from CPAN; the crypt and getpwuid functions in Chapter 29 of Programming Perl and in perlfunc(1), which demonstrate using the stty(1) command; your system's crypt(3) and passwd(5) manpages (if you have them)
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Editing Input
Inhaltsvorschau
You want a user to be able to edit a line before sending it to you for reading.
Use the standard Term::ReadLine library along with the Term::ReadLine::Gnu module from CPAN:
use Term::ReadLine;



$term = Term::ReadLine->new("APP DESCRIPTION");

$OUT = $term->OUT || *STDOUT;



$term->addhistory($fake_line);

$line = $term->readline($prompt);



print $OUT "Any program output\n";
The program in Example 15-4 acts as a crude shell. It reads a line and passes it to the shell to execute. The readline method reads a line from the terminal, with editing and history recall. It automatically adds the user's line to the history.
Example 15-4. vbsh
  #!/usr/bin/perl -w

  # vbsh -  very bad shell

  use strict;

  

  use Term::ReadLine;

  use POSIX qw(:sys_wait_h);

  

  my $term = Term::ReadLine->new("Simple Shell");

  my $OUT = $term->OUT( ) || *STDOUT;

  my $cmd;

  

  while (defined ($cmd = $term->readline('$ ') )) {

      my @output = `$cmd`;

      my $exit_value  = $? >> 8;

      my $signal_num  = $? & 127;

      my $dumped_core = $? & 128;

      printf $OUT "Program terminated with status %d from signal %d%s\n",

             $exit_value, $signal_num, 

             $dumped_core ? " (core dumped)" : "";

      print @output;

      $term->addhistory($cmd);

  }
If you want to seed the history with your own functions, use the addhistory method:
$term->addhistory($seed_line);
You can't seed with more than one line at a time. To remove a line from the history, use the remove_history method, which takes an index into the history list. 0 is the first (least recent) entry, 1 the second, and so on up to the most recent history lines.
$term->remove_history($line_number);
To get a list of history lines, use the GetHistory method, which returns a list of the lines:
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Managing the Screen
Inhaltsvorschau
You want to control the screen layout or highlighting, detect when special keys are pressed, or present full-screen menus, but you don't want to think about what kind of display device the user has.
Use the Curses module from CPAN, which makes use of your native curses(3) library.
The curses library provides easy access to the full-screen display in an efficient and device-independent fashion. (By display, we mean any cursor-addressable monitor.) With Curses, you write high-level code to put data on the logical display, building it up character by character or string by string. When you want output to show up, call the refresh function. The library generates output consisting only of the changes on the virtual display since the last call to refresh. This is particularly appreciated on a slow connection.
The example program in Example 15-5, called rep, demonstrates this. Call it with arguments of the program to run, like any of these:
% rep ps aux

% rep netstat

% rep -2.5 lpq
The rep script will repeatedly call the listed command, printing its output to the screen, updating only what changed since the previous run. This is most effective when the changes between runs are small. It maintains the current date in reverse video at the bottom-right corner of your screen.
By default, rep waits 10 seconds before rerunning the command. You can change this delay period by calling it an optional number of seconds (which can be a decimal number) as shown in the previous example when calling lpq. You may also hit any key during the pause for it to run the command right then.
Example 15-5. rep
  #!/usr/bin/perl -w

  # rep - screen repeat command

  use strict;

  use Curses;

  

  my $timeout = 10;

  if (@ARGV && $ARGV[0] =~ /^-(\d+\.?\d*)$/) { 

      $timeout = $1; 

      shift; 

  } 

  

  die "usage: $0 [ -timeout ] cmd args\n" unless @ARGV;

      

  initscr( );          # start screen

  noecho( );           

  cbreak( ); 

  nodelay(1);         # so getch( ) is non-blocking

  

  $SIG{INT} = sub { done("Ouch!") };

  sub done { endwin( ); print "@_\n"; exit; }

  

  while (1) {

      while ((my $key = getch( )) ne ERR) {    # maybe multiple keys

          done("See ya") if $key eq 'q' 

      }

      my @data = `(@ARGV) 2>&1`;              # gather output+errors

      for (my $i = 0; $i < $LINES; $i++) {

          addstr($i, 0, $data[$i] || ' ' x $COLS);

      } 

  

      standout( );

      addstr($LINES-1, $COLS - 24, scalar localtime);

      standend( );

  

      move(0,0); 

      refresh( );                              # flush new output to display

  

      my ($in, $out) = ('', '');

      vec($in,fileno(STDIN),1) = 1;           # look for key on stdin 

      select($out = $in,undef,undef,$timeout);# wait up to this long

  }
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Controlling Another Program with Expect
Inhaltsvorschau
You want to automate interaction with a full-screen program that expects to have a terminal behind STDIN and STDOUT.
Use the Expect module from CPAN:
use Expect;



$command = Expect->spawn("program to run")

    or die "Couldn't start program: $!\n";



# prevent the program's output from being shown on our STDOUT

$command->log_stdout(0);



# wait 10 seconds for "Password:" to appear

unless ($command->expect(10, "Password")) {

    # timed out

}



# wait 20 seconds for something that matches /[lL]ogin: ?/

unless ($command->expect(20, -re => '[lL]ogin: ?')) {

    # timed out

}



# wait forever for "invalid" to appear

unless ($command->expect(undef, "invalid")) {

    # error occurred; the program probably went away

}



# send "Hello, world" and a carriage return to the program

print $command "Hello, world\r";



# if the program will terminate by itself, finish up with

$command->soft_close( );



# if the program must be explicitly killed, finish up with

$command->hard_close( );
This module requires two other modules from CPAN: IO::Pty and IO::Stty. It sets up a pseudo-terminal to interact with programs that insist on talking to the terminal device driver. People often use this for talking to passwd to change passwords. telnet (Net::Telnet, described in Recipe 18.6, is probably more suitable and portable) and ftp are also programs that expect a real tty.
Start the program you want to run with Expect->spawn, passing a program name and arguments either in a single string or as a list. Expect starts the program and returns an object representing that program, or undef if the program couldn't be started.
To wait for the program to emit a particular string, use the expect method. Its first argument is the number of seconds to wait for the string, or undef to wait forever. To wait for a string, give that string as the second argument to
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Creating Menus with Tk
Inhaltsvorschau
You want to create a window that has a menu bar at the top.
Use the Tk Menubutton and Frame widgets:
use Tk;



$main = MainWindow->new( );



# Create a horizontal space at the top of the window for the

# menu to live in.

$menubar = $main->Frame(-relief              => "raised",

                        -borderwidth         => 2)

                ->pack (-anchor              => "nw",

                        -fill                => "x");



# Create a button labeled "File" that brings up a menu

$file_menu = $menubar->Menubutton(-text      => "File",

                                  -underline => 1)

                     ->pack      (-side      => "left" );

# Create entries in the "File" menu

$file_menu->command(-label   => "Print",

                    -command => \&Print);
This is considerably easier if you use the -menuitems shortcut:
$file_menu = $menubar->Menubutton(-text             => "File",

                                 -underline         => 1,

                                 -menuitems         => [

              [ Button => "Print",-command          => \&Print ],

               [ Button => "Save",-command          => \&Save  ] ])

                           ->pack(-side             => "left");
Menus in applications can be viewed as four separate components working together: Frames, Menubuttons, Menus, and Menu Entries. The Frame is the horizontal bar at the top of the window that the menu resides in (the menubar). Inside the Frame are a set of Menubuttons, corresponding to Menus: File, Edit, Format, Buffers, and so on. When the user clicks on a Menubutton, the Menubutton brings up the corresponding Menu, a vertically arranged list of Menu Entries.
Options on a Menu are labels (Open, for example) or separators (horizontal lines dividing one set of entries from another in a single menu).
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Creating Dialog Boxes with Tk
Inhaltsvorschau
You want to create a dialog box, i.e., a new top-level window with buttons to make the window go away. The dialog box might also have other items, such as labels and text entry widgets for creating a fill-out form. You could use such a dialog box to collect registration information, and you want it to go away when registration is sent or if the user chooses not to register.
For simple jobs, use the Tk::DialogBox widget:
use Tk::DialogBox;



$dialog = $main->DialogBox( -title   => "Register This Program",

                            -buttons => [ "Register", "Cancel" ] );



# add widgets to the dialog box with $dialog->Add( )



# later, when you need to display the dialog box

$button = $dialog->Show( );

if ($button eq "Register") {

    # ...

} elsif ($button eq "Cancel") {

    # ...

} else {

    # this shouldn't happen

}
A DialogBox has two parts: the bottom is a set of buttons, and the top has the widgets of your choosing. Showing a DialogBox pops it up and returns the button the user selected.
Example 15-6 contains a complete program demonstrating the DialogBox.
Example 15-6. tksample3
  #!/usr/bin/perl -w

  # tksample3 - demonstrate dialog boxes

  

  use Tk;

  use Tk::DialogBox;

  

  $main = MainWindow->new( );

  

  $dialog = $main->DialogBox( -title   => "Register",

                              -buttons => [ "Register", "Cancel" ],

                             );

  

  # the top part of the dialog box will let people enter their names,

  # with a Label as a prompt

  

  $dialog->add("Label", -text => "Name")->pack( );

  $entry = $dialog->add("Entry", -width => 35)->pack( );

  

  # we bring up the dialog box with a button

  $main->Button( -text    => "Click Here For Registration Form",

                 -command => \&register)    ->pack(-side => "left");

  $main->Button( -text    => "Quit",

                 -command => sub { exit } ) ->pack(-side => "left");

  

  MainLoop;

  

  #

  # register

  #

  # Called to pop up the registration dialog box

  #

  

  sub register {

      my $button;

      my $done = 0;

  

      do {    

          # show the dialog

          $button = $dialog->Show;

  

          # act based on what button they pushed

          if ($button eq "Register") {

              my $name = $entry->get;

  

              if (defined($name) && length($name)) {

                  print "Welcome to the fold, $name\n";

                  $done = 1;

              } else {

                  print "You didn't give me your name!\n";

              }

          } else {

              print "Sorry you decided not to register.\n";

              $done = 1;

          }

      } until $done;

  }
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Responding to Tk Resize Events
Inhaltsvorschau
You've written a Tk program, but your widget layout goes awry when the user resizes their window.
You can prevent the user from resizing the window by intercepting the Configure event:
use Tk;



$main = MainWindow->new( );



$main->bind('<Configure>' => sub {

    $xe = $main->XEvent;

    $main->maxsize($xe->w, $xe->h);

    $main->minsize($xe->w, $xe->h);

});
Or you can use pack to control how each widget resizes and expands when the user resizes its container:
$widget->pack( -fill => "both", -expand => 1 );

$widget->pack( -fill => "x",    -expand => 1 );
By default, packed widgets resize if their container changes size—they don't scale themselves or their contents to the new size. This can lead to empty space between widgets, or cropped or cramped widgets if the user resizes the window.
One solution is to prevent resizing. We bind to the Configure event, which is sent when a widget's size or position changes, registering a callback to reset the window's size. This is how you'd ensure a pop-up error-message box couldn't be resized.
You often want to let the user resize the application's windows. You must then define how each widget will react. Do this through the arguments to the pack method: -fill controls the dimensions the widget will resize in, and -expand controls whether the widget's size will change to match available space. The -expand option takes a Boolean value, true or false. The -fill option takes a string indicating the dimensions the widget can claim space in: "x", "y", "both", or "none".
The solution requires both options. Without -fill, -expand won't claim space to grow into. Without -expand, -fill will claim empty space but won't expand in it.
Different parts of your application will behave differently. The main area of a web browser, for example, should probably change size in both dimensions when the window is resized. You'd pack the widget like this:
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Removing the DOS Shell Window with Windows Perl/Tk
Inhaltsvorschau
You have written a Perl program for the Windows port of Perl and Tk, but you get a DOS shell window every time you start your program.
Add this to the start of your program:
BEGIN {

  if ($^O eq 'MSWin32') {

    require Win32::Console;

    Win32::Console::Free( );

  }

}
The Win32::Console module lets you control the terminal window that launched your program. All you need to do is close that window (or Free it in, in the peculiar parlance of the Windows API) and voilà—no pesky DOS shell window.
The documentation for the Win32::Console module, which is included with distributions of Perl destined for Microsoft systems
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Graphing Data
Inhaltsvorschau
You have numerical data that you want to represent as a bar, pie, or line chart.
Use the GD::Graph::* modules from CPAN:
use GD::Graph::lines;                      # bars, lines, points, pie

$chart = GD::Graph::lines->new(480,320);

$chart->set(x_label => $X_AXIS_LABEL,      # no axes for pie chart

            y_label => $Y_AXIS_LABEL,

            title   => $GRAPH_TITLE,

            # ... more options possible

           );

$plot = $chart->plot($DATA_REF) or die $chart->error;

# do something with $plot->png which is the image in PNG form
Here is a sample data structure (every row must have the same number of values):
$DATA_REF = [

              [ 1990, 1992, 1993, 1995, 2002 ],      # X values

              [ 10,   15,   18,   20,   25   ],      # first dataset

              [ 9,    undef,17,   undef,12   ],      # second dataset

              # ...

            ];
The GD::Graph module requires you to have the GD module installed, which itself depends on a C library available from http://www.boutell.com/gd/. Early versions of this library created GIF images, but since the owners of the GIF patent are cracking down, the library now emits PNG and JPEG images:
$png_data = $plot->png;

$jpg_data = $plot->jpeg;
The documentation for GD::Graph lists a large number of options you can fine-tune (colors, fonts, placement), but the most important ones are labels and the image title. There are no axes to label in pie charts, so the x_label and y_label options are not available. By default, pie charts are drawn with a pseudo-3D look, which you can disable by setting the 3d option to a false value.
Recipe 15.23 contains a program that (crudely) extracts the day of the week on which each mail message in a mailbox was sent, and then graphs that data.
Documentation for the GD and GD::Graph modules;
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Thumbnailing Images
Inhaltsvorschau
You have a large image and you want to create a smaller version of that image, the thumbnail. For example, on a web site you might use a thumbnail as a preview to let readers see the basic image before they decide whether to download the larger original.
Use the Image::Magick module from CPAN:
use Image::Magick;



$image = Image::Magick->new( );

$image->Read($ORIGINAL_FILENAME);

$image->Resize(geometry => '120x90');

$image->Write($THUMBNAIL_FILENAME);
The Image::Magick module is a frontend to the ImageMagick suite, available from http://imagemagick.sourceforge.net. It handles many complex and powerful image manipulations, but here we're only concerned with the very simple resizing.
The Resize method's geometry parameter indicates the new geometry (width x height). You can also specify percentages: '75%' to resize each axis proportionally to 3/4 of its original size, or '10%x30%' to resize the X axis to 10% of its original value and the Y axis to 30%.
You can also specify a filter to use and how much to blur or sharpen the image with that filter:
$image->Resize(geometry => '120x90',

               filter   => 'Gaussian',

               blur     => 2);
A blur value greater than 1 indicates blurring; a value less than 1 indicates sharpening. The valid filters are Point, Box, Triangle, Hermite, Hanning, Hamming, Blackman, Gaussian, Quadratic, Cubic, Catrom, Mitchell, Lanczos, Bessel, and Sinc.
The documentation for the Image::Magick modules; Perl Graphics Programming
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Adding Text to an Image
Inhaltsvorschau
You want to write text onto an existing image. For example, you want to add a small copyright message to all photos on your web site.
Use the GD module from CPAN:
use GD;

$image = GD::Image->new($FILENAME);

$blue = $image->colorAllocate(0,0,255);

$image->string(gdTinyFont, 10, 10, "Copyright Me, 2037", $blue);

# write $image->png( ) to file
The GD module can load only certain file formats; precisely which depends on the C libraries available when the underlying C library of GD was built. At the time of this writing, GD could read and write PNG, JPEG, XBM, XPM, and WBMP (Windows Bitmap), as well as its own GD2 and GD formats.
The arguments to the string method are: the font to use, the x and y coordinates to draw at, the string to draw, and the color to draw the text in.
GD comes with five fonts: gdTinyFont, gdSmallFont, gdMediumBoldFont, gdLargeFont, and gdGiantFont. If your GD was compiled to handle TrueType fonts, you can write with a TrueType font using:
$image->stringFT($color, $font, $point_size, $angle, $x, $y, $string);
Here, $font is the absolute pathname of the .ttf file containing the TrueType font. The $point_size and $angle parameters indicate the size (in points; fractions are acceptable) and rotation from horizontal (in radians). For example:
$image->stringFT($blue, '/Users/gnat/fonts/arial.ttf', 8, 0,

                 10, 20, 'Copyright Me Me Me');
The documentation for the GD module; Perl Graphics Programming
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Program: Small termcap Program
Inhaltsvorschau
This program clears your screen and scribbles all over it until you interrupt it. It shows how to use Term::Cap to clear the screen, move the cursor, and write anywhere on the screen. It also uses Recipe 16.6.
The program text is shown in Example 15-8.
Example 15-8. tcapdemo
  #!/usr/bin/perl -w

  # tcapdemo - show off direct cursor placement

  

  use POSIX;

  use Term::Cap;

      

  init( );                     # Initialize Term::Cap.

  zip( );                      # Bounce lines around the screen.

  finish( );                   # Clean up afterward.

  exit( );

  

  # Two convenience functions.  clear_screen is obvious, and

  # clear_end clears to the end of the screen.

  sub clear_screen { $tcap->Tputs('cl', 1, *STDOUT) } 

  sub clear_end    { $tcap->Tputs('cd', 1, *STDOUT) } 

  

  # Move the cursor to a particular location.

  sub gotoxy {

      my($x, $y) = @_;

      $tcap->Tgoto('cm', $x, $y, *STDOUT);

  } 

  

  # Get the terminal speed through the POSIX module and use that

  # to initialize Term::Cap.

  sub init { 

      $| = 1;

      $delay = (shift( ) || 0) * 0.005;

      my $termios = POSIX::Termios->new( );

      $termios->getattr;

      my $ospeed = $termios->getospeed;

      $tcap = Term::Cap->Tgetent ({ TERM => undef, OSPEED => $ospeed });

      $tcap->Trequire(qw(cl cm cd));

  }

  

  # Bounce lines around the screen until the user interrupts with

  # Ctrl-C.

  sub zip { 

      clear_screen( );

      ($maxrow, $maxcol) = ($tcap->{_li} - 1, $tcap->{_co} - 1);

  

      @chars = qw(* - / | \ _ );

      sub circle { push(@chars, shift @chars); }

  

      $interrupted = 0;

      $SIG{INT} = sub { ++$interrupted };

  

      $col = $row = 0;

      ($row_sign, $col_sign) = (1,1);

  

      do {

          gotoxy($col, $row);

          print $chars[0];

          select(undef, undef, undef, $delay);

  

          $row += $row_sign;

          $col += $col_sign;

  

          if    ($row =  = $maxrow) { $row_sign = -1; circle; } 

          elsif ($row =  = 0 )      { $row_sign = +1; circle; }

  

          if    ($col =  = $maxcol) { $col_sign = -1; circle; } 

          elsif ($col =  = 0 )      { $col_sign = +1; circle; }

      

      } until $interrupted;

  

  }

  

  # Clean up the screen.

  sub finish { 

      gotoxy(0, $maxrow);

      clear_end( );

  }
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Program: tkshufflepod
Inhaltsvorschau
This short program uses Tk to list the =head1 sections in the file using the Listbox widget, and it lets you drag the sections around to reorder them. When you're done, press "s" or "q" to save or quit. You can even double-click a section to view it with the Pod widget. It writes the section text to a temporary file in /tmp and removes the file when the Pod widget is destroyed.
Call it with the name of the Pod file to view:
% tkshufflepod chap15.pod
We used this a lot when we wrote this book.
The program text is shown in Example 15-9.
Example 15-9. tkshufflepod
  #!/usr/bin/perl -w

  # tkshufflepod - reorder =head1 sections in a pod file

  

  use Tk;

  use Tk::Pod;

  use strict;

  

  # declare variables

  

  my $podfile;     # name of the file to open

  my $m;             # main window

  my $l;             # listbox

  my ($up, $down);   # positions to move

  my @sections;      # list of pod sections

  my $all_pod;       # text of pod file (used when reading)

  

  # read the pod file into memory, and split it into sections.

  

  $podfile = shift || "-";

  

  undef $/;

  open(F, " < $podfile")

    or die "Can't open $podfile : $!\n";

  $all_pod = <F>;

  close(F);

  @sections = split(/(?=  =head1)/, $all_pod);

  

  # turn @sections into an array of anonymous arrays.  The first element

  # in each of these arrays is the original text of the message, while

  # the second element is the text following =head1 (the section title).

  

  foreach (@sections) {

      /(.*)/;

      $_ = [ $_, $1 ];

  }

  

  # fire up Tk and display the list of sections.

  

  $m = MainWindow->new( );

  $l = $m->Listbox('-width' => 60)->pack('-expand' => 1, '-fill' => 'both');

  

  foreach my $section (@sections) {

      $l->insert("end", $section->[1]);

  }

  

  # permit dragging by binding to the Listbox widget.

  $l->bind( '<Any-Button>'     => \&down );

  $l->bind( '<Any-ButtonRelease>' => \&up );

  

  # permit viewing by binding double-click

  $l->bind( '<Double-Button>'     => \&view );

  

  # 'q' quits and 's' saves

  $m->bind( '<q>'             => sub { exit } );

  $m->bind( '<s>'            => \&save );

  

  MainLoop;

  

  # down(widget): called when the user clicks on an item in the Listbox.

  

  sub down {

      my $self = shift;

      $down = $self->curselection;;

  }

  

  # up(widget): called when the user releases the mouse button in the

  # Listbox.

  

  sub up {

      my $self = shift;

      my $elt;

  

      $up = $self->curselection;;

  

      return if $down =  = $up;

  

      # change selection list

      $elt = $sections[$down];

      splice(@sections, $down, 1);

      splice(@sections, $up, 0, $elt);

  

      $self->delete($down);

      $self->insert($up, $sections[$up]->[1]);

  }

  

  # save(widget): called to save the list of sections.

  

  sub save {

      my $self = shift;

  

      open(F, "> $podfile")

        or die "Can't open $podfile for writing: $!";

      print F map { $_->[0] } @sections;

      close F;

  

      exit;

  }

  

  # view(widget): called to display the widget.  Uses the Pod widget.

  

  sub view {

      my $self = shift;

      my $temporary = "/tmp/$$-section.pod";

      my $popup;

  

      open(F, "> $temporary")

        or warn ("Can't open $temporary : $!\n"), return;

      print F $sections[$down]->[0];

      close(F);

      $popup = $m->Pod('-file' => $temporary);

  

      $popup->bind('<Destroy>' => sub { unlink $temporary } );

  

  

  

  

  }
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Program: graphbox
Inhaltsvorschau
The graphbox program shown in Example 15-10 generates a bar graph of how many email messages were sent on each day of the week, using the GD::Graph::Bars module (see Recipe 15.18). It extracts the day of the week from the Date: headers and then plots the results.
Example 15-10. graphbox
#!/usr/bin/perl -w

# graphbox - graph number of messages by day of week they were sent



use GD::Graph::bars;

use Getopt::Std;

use strict;



my %count;  # running total of messages for each day of the week

my $chart;  # the GD::Graph::bars object

my $plot;   # the GD object containing the actual graph



my @DAYS = qw(Mon Tue Wed Thu Fri Sat Sun);

my $day_re = join("|", @DAYS);

$day_re = qr/$day_re/;



# process options



my %Opt;

getopts('ho:', \%Opt);

if ($Opt{h} or !$Opt{o}) {

  die "Usage:\n\t$0 -o outfile.png < mailbox\n";

}



# extract dates from Date headers (guessing!)



while (<>) {

  if (/^Date: .*($day_re)/) {

    $count{$1}++;

  }

}



# build graph



$chart = GD::Graph::bars->new(480,320);

$chart->set(x_label => "Day",

            y_label => "Messages",

            title   => "Mail Activity");

$plot = $chart->plot([ [ @DAYS ],

                       [ @count{@DAYS} ],

                     ]);



# save it



open(F, "> $Opt{o}")

  or die "Can't open $Opt{o} for writing: $!\n";

print F $plot->png;

close F;
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Chapter 16: Process Management and Communication
Inhaltsvorschau
It is quite a three-pipe problem, and I beg that you won't speak to me for fifty minutes.
—Sherlock Holmes, The Red-Headed League
Perl may be many things to many people, but to most of us it is the glue that connects diverse components. This chapter is about launching commands and connecting separate processes together. It's about managing their creation, communication, and ultimate demise. It's about systems programming.
When it comes to systems programming, Perl, as usual, makes easy things easy and hard things possible. If you want to use it as you would the shell, Perl is happy to assist you. If you want to roll up your sleeves for low-level hacking like a hardcore C programmer, you can do that, too.
Because Perl lets you get so close to the system, portability issues can sneak in. This chapter is the most Unix-centric chapter of the book. It will be tremendously useful to those on Unix systems, but of limited use to others. (If you're not on Unix, consult the perlport(3) manpage that came with Perl to see which of the techniques we describe are available on other operating systems or emulated by Perl.) We deal with features that aren't as universal as strings and numbers and basic arithmetic. Most basic operations work more or less the same everywhere. But if you're not using some kind of Unix or other POSIX conformant system, most of the interesting features in this chapter may work differently for you—or not at all. Check the documentation that came with your Perl port if you aren't sure.
You might even be pleasantly surprised. Windows users, for example, are often astonished to learn that Perl's fork function, long unique to Unix, is supported on their platform. See perlfork(1).
In this chapter, we cover the proper care and feeding of your own child processes. Sometimes this means launching a standalone command and letting it have its own way with the world (using system). Other times it means keeping a tight rein on your child, feeding it carefully filtered input or taking hold of its output stream (backticks and piped
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Introduction
Inhaltsvorschau
Perl may be many things to many people, but to most of us it is the glue that connects diverse components. This chapter is about launching commands and connecting separate processes together. It's about managing their creation, communication, and ultimate demise. It's about systems programming.
When it comes to systems programming, Perl, as usual, makes easy things easy and hard things possible. If you want to use it as you would the shell, Perl is happy to assist you. If you want to roll up your sleeves for low-level hacking like a hardcore C programmer, you can do that, too.
Because Perl lets you get so close to the system, portability issues can sneak in. This chapter is the most Unix-centric chapter of the book. It will be tremendously useful to those on Unix systems, but of limited use to others. (If you're not on Unix, consult the perlport(3) manpage that came with Perl to see which of the techniques we describe are available on other operating systems or emulated by Perl.) We deal with features that aren't as universal as strings and numbers and basic arithmetic. Most basic operations work more or less the same everywhere. But if you're not using some kind of Unix or other POSIX conformant system, most of the interesting features in this chapter may work differently for you—or not at all. Check the documentation that came with your Perl port if you aren't sure.
You might even be pleasantly surprised. Windows users, for example, are often astonished to learn that Perl's fork function, long unique to Unix, is supported on their platform. See perlfork(1).
In this chapter, we cover the proper care and feeding of your own child processes. Sometimes this means launching a standalone command and letting it have its own way with the world (using system). Other times it means keeping a tight rein on your child, feeding it carefully filtered input or taking hold of its output stream (backticks and piped opens). Without even starting a new process, you can use
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Gathering Output from a Program
Inhaltsvorschau
You want to run a program and collect its output into a variable.
Either use backticks:
$output = `program args`;   # collect output into one multiline string

@output = `program args`;   # collect output into array, one line per element
or use Recipe 16.4:
open(my $fh, "-|", "program", @args)

    or die "Can't run program: $!\n";

while (<$fh>) {

    $output .= $_;

}

close $fh;
The backticks are a convenient way to run other programs and gather their output. The backticks do not return until the called program exits. Perl goes to some trouble behind the scenes to collect the output, so it is inefficient to use the backticks and ignore their return value:
`fsck -y /dev/rsd1a`;       # BAD AND SCARY
The backtick operator calls the shell to run the command. This makes it unsafe when used in a program with special privileges, but lets you use shell wildcards in the command:
@files = `ls -1 /music/*.mp3`;
If you want to read the output of a wildcarded command line as it's generated (and don't mind the potential security problems), use this form of open:
open(README, "ls -l /music/*.mp3 |") or die "Can't run program: $!\n";

while(<README>) {

    # the latest line is in $_

}

close(README);
In versions of Perl before 5.8, this two-argument form of open was the only one available to you. In those versions of Perl, you wrote the solution as:
open(FH, "program @args |")

  or die "Can't run program: $!\n";
Here's a low-level workaround, using pipe (to create two connected filehandles), fork (to split off a new process), and exec (to replace the new process with the program to read from):
use POSIX qw(:sys_wait_h);



my  ($readme, $writeme);

pipe $readme, $writeme;

if ($pid = fork) {

    # parent

    $SIG{CHLD} = sub { 1 while ( waitpid(-1, WNOHANG)) > 0 };

    close $writeme;

} else {

    die "cannot fork: $!" unless defined $pid;

    # child

    open(STDOUT, ">&=", $writeme)   or die "Couldn't redirect STDOUT: $!";

    close $readme;

    exec($program, $arg1, $arg2)    or die "Couldn't run $program : $!\n";

}



while (<$readme>) {

    $string .= $_;

    # or  push(@strings, $_);

}

close($readme);
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Running Another Program
Inhaltsvorschau
From one program you want to run another, pause until it is done, and then continue with the original program. The other program should have the same STDIN and STDOUT as you have.
Call system with a string to have the shell interpret the string as a command line:
$status = system("vi $myfile");
If you don't want the shell involved, pass system a list:
$status = system("vi", $myfile);
The system function is the simplest and most generic way to run another program in Perl. It doesn't gather the program's STDOUT like backticks or open. Instead, its return value is (essentially) that program's exit status. While the new program is running, your main program is suspended, so the new program can read from your STDIN and write to your STDOUT so users can interact with it.
Like open, exec, and backticks, system uses the shell to start the program whenever it's called with one argument. This is convenient when you want to do redirection or other tricks:
system("cmd1 args | cmd2 | cmd3 >outfile");

system("cmd args < infile >outfile 2>errfile");
To avoid the shell, call system with a list of arguments:
$status = system($program, $arg1, $arg);

die "$program exited funny: $?" unless $status =  = 0;
The returned status value is not just the exit value: it includes the signal number (if any) that the process died from. This is the same value that wait sets $? to. See Recipe 16.19 to learn how to decode this value.
The system function ignores SIGINT and SIGQUIT while child processes are running. That way those signals will kill only the child process. If you want your main program to die as well, check the return value of system or the value of the $? variable.
if (($signo = system(@arglist)) &= 127) { 

    die "program killed by signal $signo\n";

}
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Replacing the Current Program with a Different One
Inhaltsvorschau
You want to replace the running program with another, as when checking parameters and setting up the initial environment before running another program.
Use the built-in exec function. If exec is called with a single argument containing metacharacters, the shell will be used to run the program:
exec("archive *.data")

    or die "Couldn't replace myself with archive: $!\n";
If you pass exec more than one argument, the shell will not be used:
exec("archive", "accounting.data")

    or die "Couldn't replace myself with archive: $!\n";
If called with a single argument containing no shell metacharacters, the argument will be split on whitespace and then interpreted as though the resulting list had been passed to exec:
exec("archive accounting.data")

    or die "Couldn't replace myself with archive: $!\n";
The exec function in Perl is a direct interface to the execlp(2) syscall, which replaces the current program with another, leaving the process intact. The program that calls exec gets wiped clean, and its place in the operating system's process table is taken by the program specified in the arguments to exec. As a result, the new program has the same process ID ($$) as the original program. If the specified program couldn't be run, exec returns a false value and the original program continues. Be sure to check for this.
As with system (see Recipe 16.2), an indirect object identifies the program to be run:
exec { '/usr/local/bin/lwp-request' } 'HEAD', $url;
The first real argument ('HEAD' here) is what the new program will be told it is. Some programs use this to control their behavior, and others use it for logging. The main use of this, however, is that exec called with an indirect object will never use the shell to run the program.
If you
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Reading or Writing to Another Program
Inhaltsvorschau
You want to run another program and either read its output or supply the program with input.
Use open with a pipe symbol at the beginning or end. To read from a program, put the pipe symbol at the end:
$pid = open $readme, "-|", "program", "arguments"

                                            or die "Couldn't fork: $!\n";

while (<$readme>) {

    # ...

}

close $readme                               or die "Couldn't close: $!\n";
To write to the program, put the pipe at the beginning:
$pid = open $writeme, "|-", "program", "arguments"

                                            or die "Couldn't fork: $!\n";

print $writeme "data\n";

close $writeme                              or die "Couldn't close: $!\n";
In the case of reading, this is similar to using backticks, except you have a process ID and a filehandle, and the shell is never involved. If you want Perl to use the shell when it sees shell-special characters in its argument—for example, to let the shell do filename wildcard expansion and I/O redirection—then you must use the two-argument form of open:
open($writeme, "| program args");

open($readme, "program args |");
However, sometimes this isn't desirable. Piped opens that include unchecked user data would be unsafe while running in taint mode or in untrustworthy situations.
Notice how we specifically call close on the filehandle. When you use open to connect a filehandle to a child process, Perl remembers this and automatically waits for the child when you close the filehandle. If the child hasn't exited by then, Perl waits until it does. This can be a very, very long wait if your child doesn't exit:
$pid = open $f, "-|", "sleep", "100000";  # child goes to sleep

close $f;                                 # and the parent goes to lala land
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Filtering Your Own Output
Inhaltsvorschau
You want to postprocess your program's output without writing a separate program to do so.
Use the forking form of open to attach a filter to yourself. For example, this will restrict your program to a hundred lines of output:
head(100);

while (<>) {

    print;

} 



sub head {

    my $lines = shift || 20;

    return if $pid = open STDOUT, "|-";

    die "cannot fork: $!" unless defined $pid;

    while (<STDIN>) {

        print;

        last unless --$lines;

    } 

    exit;

}
It's easy to add an output filter. Just use the forking open on your own STDOUT, and let the child filter STDIN to STDOUT, performing whatever alterations you care about. Notice that we install the output filter before we generate the output. This makes sense—you can't filter your output if it has already left your program. Any such filters should be applied in LIFO order—the last one inserted is the first one run.
Here's an example that uses two output filters. One numbers lines; the other quotes the lines like a mail reply. When run on /etc/motd, you get something like:

               1:  > Welcome to Linux, version 2.0.33 on a i686

               2:  >

               3:  >    "The software required `Windows 95 or better'

               4:  >     so I installed Linux.

            
If you reversed the order of the two filters, you'd get:

               > 1: Welcome to Linux, Kernel version 2.0.33 on a i686

               > 2

               > 3:     "The software required `Windows 95 or better'

               > 4:      so I installed Linux.

            
The program is in Example 16-1.
Example 16-1. qnumcat
  #!/usr/bin/perl

  # qnumcat - demo additive output filters

  

  number( );                   # push number filter on STDOUT

  quote( );                    # push quote filter on STDOUT

  

  while (<>) {                # act like /bin/cat

      print;

  } 

  close STDOUT;               # tell kids we're done--politely

  exit;

  

  sub number {

      my $pid;

      return if $pid = open STDOUT, "|-";

      die "cannot fork: $!" unless defined $pid;

      while (<STDIN>) { printf "%d: %s", $., $_ } 

      exit;

  } 

  

  sub quote {

      my $pid;

      return if $pid = open STDOUT, "|-";

      die "cannot fork: $!" unless defined $pid;

      while (<STDIN>) { print "> $_" } 

      exit;

  }
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Preprocessing Input
Inhaltsvorschau
You'd like your programs to work on files with funny formats, such as compressed files or remote web documents specified with a URL, but your program only knows how to access regular text in local files.
Take advantage of Perl's easy pipe handling by changing your input files' names to pipes before opening them.
To autoprocess gzipped or compressed files by decompressing them with gzip, use:
@ARGV = map { /\.(gz|Z)$/ ? "gzip -dc $_ |" : $_  } @ARGV;

while (<>) {

    # .......

}
To fetch URLs before processing them, use the GET program from LWP (see Chapter 20):
@ARGV = map { m#^\w+://# ? "GET $_ |" : $_ } @ARGV;

while (<>) {

    # .......

}
You might prefer to fetch just the text, of course, not the HTML. That just means using a different command, perhaps lynx -dump.
As shown in Recipe 16.1, Perl's built-in open function is magical: you don't have to do anything special to get Perl to open a pipe instead of a file. (That's why it's sometimes called magic open and, when applied to implicit ARGV processing, magic ARGV.) If it looks like a pipe, Perl will open it like a pipe. We take advantage of this by rewriting certain filenames to include a decompression or other preprocessing stage. For example, the file "09tails.gz" becomes "gzcat -dc 09tails.gz|".
This technique has further applications. Suppose you wanted to read /etc/passwd if the machine isn't using NIS, and the output of ypcat passwd if it is. You'd use the output of the domainname program to decide if you're running NIS, and then set the filename to open to be either "</etc/passwd" or "ypcat passwd|":
$pwdinfo = `domainname` =~ /^(\(none\))?$/

                ? '</etc/passwd'

                : 'ypcat  passwd |';



open(PWD, $pwdinfo)                 or die "can't open $pwdinfo: $!";
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Reading STDERR from a Program
Inhaltsvorschau
You want to run a program as you would with system, backticks, or open, but you don't want its STDERR to be sent to your STDERR. You would like to be able to either ignore or read the STDERR.
Use the shell's numeric redirection and duplication syntax for file descriptors. (We don't check the return value from open here in order to make the examples easier to read, but you should always check it in your programs!)
To capture a command's STDERR and STDOUT together:
$output = `cmd 2>&1`;                          # with backticks

# or

$pid = open(PH, "cmd 2>&1 |");                 # with an open pipe

while (<PH>) { }                               # plus a read
To capture a command's STDOUT and discard its STDERR:
$output = `cmd 2>/dev/null`;                   # with backticks

# or

$pid = open(PH, "cmd 2>/dev/null |");          # with an open pipe

while (<PH>) { }                               # plus a read
To capture a command's STDERR and discard its STDOUT:
$output = `cmd 2>&1 1>/dev/null`;              # with backticks

# or

$pid = open(PH, "cmd 2>&1 1>/dev/null |");     # with an open pipe

while (<PH>) { }                               # plus a read
To exchange a command's STDOUT and STDERR, i.e., capture the STDERR but have its STDOUT come out on our old STDERR:
$output = `cmd 3>&1 1>&2 2>&3 3>&-`;           # with backticks

# or

$pid = open(PH, "cmd 3>&1 1>&2 2>&3 3>&-|");   # with an open pipe

while (<PH>) { }                               # plus a read
To read both a command's STDOUT and its STDERR separately, it's easiest and safest to redirect them separately to files, and then read from those files when the program is done:
system("program args 1>/tmp/program.stdout 2>/tmp/program.stderr");
When you launch a command with backticks, a piped
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Controlling Input and Output of Another Program
Inhaltsvorschau
You want to both write to and read from another program. The open function lets you do one or the other, but not both.
Use the standard IPC::Open2 module:
use IPC::Open2;



$pid = open2(*README, *WRITEME, $program);

print WRITEME "here's your input\n";

$output = <README>;

close(WRITEME);

close(README);

waitpid($pid, 0);
Wanting simultaneous read and write access to another program is very common, but surprisingly perilous. That's one reason the built-in open doesn't permit:
open(my $double_handle, "| program args |")     # WRONG
The big problem here is buffering. Because you can't force the other program to unbuffer its output, you can't guarantee that your reads won't block. If you block trying to read at the same time the other process blocks waiting for you to send something, you've achieved the unholy state of deadlock. There you'll both stay, wedged, until someone kills your process or the machine reboots.
If you control the other process's buffering because you wrote the other program and know how it works, then IPC::Open2 may be the module for you. If you pass undefined scalar values as the first two arguments, open2 creates new filehandles:
use IPC::Open2;



$pid = open2(my $reader, my $writer, $program);
Alternatively, you can pass in arguments that look like "<&OTHERFILEHANDLE" or ">&OTHERFILEHANDLE", which specify existing filehandles for the child process to read from or write to. These filehandles don't have to be controlled by your program—they may be connected to other programs, files, or sockets.
You can specify the program either as a list (where the first element is the program name and the remaining elements are arguments to the program) or as a single string (which is passed to the shell as a command to start the program). If you also want control over the process's standard error, use the IPC::Open3 module and see the next recipe.
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Controlling the Input, Output, and Error of Another Program
Inhaltsvorschau
You want full control over a command's input, output, and error streams.
Carefully use the standard IPC::Open3 module, possibly in conjunction with the standard IO::Select module.
If you're interested in only one of the program's STDIN, STDOUT, or STDERR, the task is simple. When you want to manage two or more of these, however, it abruptly stops being simple. Multiplexing multiple I/O streams is never a pretty picture. Here's an easy workaround:
@all = `($cmd | sed -e 's/^/stdout: /' ) 2>&1`;

for (@all) { push @{ s/stdout: // ? \@outlines : \@errlines }, $_ }

print "STDOUT:\n", @outlines, "\n";

print "STDERR:\n", @errlines, "\n";
If you don't have sed on your system, you'll find that for simple cases like this, perl -pe works just as well as sed -e.
However, that's not really simultaneous processing. All we're doing is marking STDOUT lines with "stdout:" and then stripping them back out once we've read all the STDOUT and STDERR the program produced.
You can use the standard IPC::Open3 module for this. Mysteriously, the argument order is different for IPC::Open3 than for IPC::Open2.
open3($write_me, $read_me, $errors, "program to run");
Using this has even more potential for chaos than using open2. If you're reading the program's STDERR as it is trying to write more than one buffer's worth to its STDOUT, the program will block on the write because its buffers are full, and you will block on the read because there's nothing available.
You can avoid this deadlock by mimicking open3 with fork, open, and exec; making all filehandles unbuffered; and using sysread, syswrite, and select to decide which readable filehandle to read a byte from. This makes your program slower and bulkier, though, and it doesn't solve the classic open2 deadlock where each program is expecting the other to say something.
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Communicating Between Related Processes
Inhaltsvorschau
You have two related processes that need to communicate, and you need better control than you can get from open, system, and backticks.
Use pipe and then fork:
my ($reader, $writer);

pipe $reader, $writer;

if (fork) {

    # run parent code, either reading or writing, not both

} else {

    # run child code, either reading or writing, not both

}
Or use a special forking form of open:
if ($pid = (open $child, "|-")) {

        # run parent code, writing to child

} else {

    die "cannot fork: $!" unless defined $pid;

    # otherwise run child code here, reading from parent

}
Or, going the other way:
if ($pid = open ($child, "-|")) {

    # run parent code, reading from child

} else {

    die "cannot fork: $!" unless defined $pid;

    # otherwise run child code here, writing to parent

}
Pipes are simply two connected filehandles, where data written to one filehandle can be read by the other. The pipe function creates two filehandles linked in this way, one writable and one readable. Even though you can't take two already existing filehandles and link them, pipe can be used for communication between processes. One process creates a pair of filehandles with the pipe functions, then forks off a child, resulting in two distinct processes both running in the same program, each with a copy of the connected filehandles. As with open, if pipe is passed undefined scalars instead of filehandles, it creates filehandles in those scalars.
It doesn't matter which process is the reader and which is the writer, so long as one of them takes one role and its peer process takes the other. You can only have one-way communication. (But read on.)
We'll pull in the IO::Handle module so we can call its autoflush method. (You could instead play the select games described in Chapter 7, if you prefer a lightweight solution.) If we didn't, our single line of output would get lodged in the pipe and not make it through to the other side until we closed that handle.
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Making a Process Look Like a File with Named Pipes
Inhaltsvorschau
You want a process to intercept all access to a file. For instance, you want to make your ~/.plan file a program that returns a random quote.
Use named pipes. First create one, probably from your shell:
% mkfifo /path/to/named.pipe
Here's a reader for it:
open($fifo, "<", "/path/to/named.pipe")      or die $!;

while (<$fifo>) {

    print "Got: $_";

}

close $fifo;
Here's a writer for it:
open($fifo, ">", "/path/to/named.pipe")      or die $!;

print $fifo "Smoke this.\n";

close $fifo;
A named pipe, or FIFO as they are also known, is a special file that acts as a buffer to connect processes on the same machine. Ordinary pipes also allow processes to communicate, but those processes must have inherited the filehandles from their parents. To use a named pipe, a process need know only the named pipe's filename. In most cases, processes don't even need to be aware that they're reading from a FIFO.
Named pipes can be read from and written to just as though they were ordinary files (unlike Unix-domain sockets as discussed in Chapter 17). Data written into the FIFO is buffered up by the operating system, then read back in the order it was written in. Because a FIFO acts as a buffer to connect processes, opening one for reading will block until another process opens it for writing, and vice versa. If you open for read and write using the +< mode to open, you won't block (on most systems), because your process could be both reader and writer.
Let's examine how to use a named pipe so people will get a different file each time they finger you. To create a named pipe, use mkfifo or mknod to create a named pipe called .plan in your home directory:
% mkfifo ~/.plan                    # isn't this everywhere yet?

% mknod  ~/.plan p                  # in case you don't have mkfifo
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Sharing Variables in Different Processes
Inhaltsvorschau
You want to share variables across forks or between unrelated processes.
Use SysV IPC if your operating system supports it.
While SysV IPC (shared memory, semaphores, etc.) isn't as widely used as pipes, named pipes, and sockets for interprocess communication, it still has some interesting properties. Normally, however, you can't expect to use shared memory via shmget or the mmap(2) syscall to share a variable among several processes. That's because Perl would reallocate your string when you weren't wanting it to.
The CPAN module IPC::Shareable takes care of that. Using a clever tie module, SysV shared memory, and the Storable module from CPAN allows data structures of arbitrary complexity to be shared among cooperating processes on the same machine. These processes don't even have to be related to each other.
Example 16-11 is a simple demonstration of the module.
Example 16-11. sharetest
  #!/usr/bin/perl 

  # sharetest - test shared variables across forks

  use IPC::Shareable;

  

  $handle = tie $buffer, 'IPC::Shareable', undef, { destroy => 1 };

  $SIG{INT} = sub { die "$$ dying\n" };

  

  for (1 .. 10) { 

      unless ($child = fork) {        # i'm the child

          die "cannot fork: $!" unless defined $child;

          squabble( );

          exit;

      } 

      push @kids, $child;  # in case we care about their pids

  }

  

  while (1) {

      print "Buffer is $buffer\n";

      sleep 1;

  } 

  die "Not reached";

  

  sub squabble {

      my $i = 0;

      while (1) { 

          next if $buffer =~ /^$$\b/o;  

          $handle->shlock( );

          $i++;

          $buffer = "$$ $i";

          $handle->shunlock( );

      }

  }
The starting process creates the shared variable, forks off 10 children, and then sits back and prints out the value of the buffer every second or so, forever, or until you hit Ctrl-C.
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Listing Available Signals
Inhaltsvorschau
You want to know the signals your operating system provides.
If your shell has a built-in kill -l command, use it:
% kill -l

HUP INT QUIT ILL TRAP ABRT BUS FPE KILL USR1 SEGV USR2 PIPE

               ALRM TERM CHLD CONT STOP TSTP TTIN TTOU URG XCPU XFSZ VTALRM

               PROF WINCH POLL PWR

            
Or using just Perl, print the keys in %SIG if you have release 5.004 or later:
% perl -e 'print join(" ", keys %SIG), "\n"'

XCPU ILL QUIT STOP EMT ABRT BUS USR1 XFSZ TSTP INT IOT USR2 INFO TTOU

               ALRM KILL HUP URG PIPE CONT SEGV VTALRM PROF TRAP IO TERM WINCH CHLD

               FPE TTIN SYS

            
Before Version 5.004, you had to use the Config module:
% perl -MConfig -e 'print $Config{sig_name}'

ZERO HUP INT QUIT ILL TRAP ABRT EMT FPE KILL BUS SEGV SYS PIPE ALRM

               TERM URG STOP TSTP CONT CHLD TTIN TTOU IO XCPU XFSZ VTALRM PROF WINCH

               INFO USR1 USR2 IOT

            
If your version of Perl is before 5.004, you have to use signame and signo in Config to find the list of available signals, since keys %SIG wasn't implemented then.
The following code retrieves by name and number the available signals from Perl's standard Config.pm module. Use @signame indexed by number to get the signal name, and %signo indexed by name to get the signal number.
use Config;

defined $Config{sig_name} or die "No sigs?";

$i = 0;                     # Config prepends fake 0 signal called "ZERO".

foreach $name (split(' ', $Config{sig_name})) {

    $signo{$name} = $i;

    $signame[$i] = $name;

    $i++;

}
The documentation for the standard Config module, also in Chapter 32 of Programming Perl; the "Signals" sections in Chapter 16 of Programming Perl and in perlipc(1)
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Sending a Signal
Inhaltsvorschau
You want to send a signal to a process. This could be sent to your own process or to another on the same system. For instance, you caught SIGINT and want to pass it on to your children.
Use kill to send a signal by name or number to the process IDs listed in the remaining arguments:
kill  9     => $pid;                    # send $pid a signal 9

kill -1     => $pgrp;                   # send whole job a signal 1

kill  USR1  => $$;                      # send myself a SIGUSR1

kill  HUP   => @pids;                   # send a SIGHUP to processes in @pids
Perl's kill function is an interface to the syscall of the same name. The first argument is the signal to send, identified by number or by name; subsequent arguments are process IDs to send the signal to. It returns the count of processes successfully signaled. You can only send signals to processes running under the same real or saved UID as your real or effective UID—unless you're the superuser.
If the signal number is negative, Perl interprets remaining arguments as process group IDs and sends that signal to all those groups' processes using the killpg(2) syscall.
A process group is essentially a job. It's how the operating system ties related processes together. For example, when you use your shell to pipe one command into another, you've started two processes, but only one job. When you use Ctrl-C to interrupt the current job or Ctrl-Z to suspend it, this sends the appropriate signals to the entire job, which may be more than one process.
kill can also check whether a process is alive. Sending the special pseudo-signal number 0 checks whether it's legal for you to send a signal to the process—without actually sending one. If it returns true, the process is still alive. If it returns false, the process either has changed its effective UID (in which case
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Installing a Signal Handler
Inhaltsvorschau
You want to control how your program responds to signals. You need to do this if you want to catch Ctrl-C, avoid accumulating finished subprocesses, or prevent your process from dying when it writes to a child that has gone away.
Use the %SIG hash to install your own handler by name or by code reference:
$SIG{QUIT} = \&got_sig_quit;     # call &got_sig_quit for every SIGQUIT 

$SIG{PIPE} = 'got_sig_pipe';     # call main::got_sig_pipe for every SIGPIPE 

$SIG{INT}  = sub { $ouch++ };    # increment $ouch for every SIGINT
%SIG also lets you ignore a signal:
$SIG{INT} = 'IGNORE';            # ignore the signal INT
It also restores handling for that signal to the default:
$SIG{STOP} = 'DEFAULT';          # restore default STOP signal handling
Perl uses the %SIG hash to control what happens when signals are received. Each key in %SIG corresponds to a signal. Each value is the action to take when Perl receives the corresponding signal. Perl provides two special behaviors: "IGNORE" to take no action when a particular signal is received, and "DEFAULT" to perform the default Unix action for that signal.
Although a C programmer might think of a signal as SIGINT, Perl uses just INT. Perl figures you only use signal names in functions that deal with signals, so the SIG prefix is redundant. This means that you'll assign to $SIG{CHLD} to change what your process does when it gets a SIGCHLD.
If you want to run your own code when a given signal is received, you have two choices of what to put in the hash: either a code reference or a subroutine name. (This means you can't name a signal handler IGNORE or DEFAULT if you store the string, but they'd be mighty strange names for signal handlers anyway.) If you use a subroutine name that isn't qualified by a package, Perl will interpret this name to be a function in the
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Temporarily Overriding a Signal Handler
Inhaltsvorschau
You want to install a signal handler only for a particular subroutine. For instance, your subroutine catches SIGINT, and you don't want to disturb SIGINT handling outside the subroutine.
Use local to temporarily override a signal's behavior:
# the signal handler

sub ding {

    $SIG{INT} = \&ding;

    warn "\aEnter your name!\n";

}



# prompt for name, overriding SIGINT

sub get_name {

    local $SIG{INT} = \&ding;

    my $name;



    print "Kindly Stranger, please enter your name: ";

    chomp( $name = <> );

    return $name;

}
You must use local rather than my to save away one value out of %SIG. The change remains in effect throughout the execution of that block, including in anything called from it. In this case, that's the get_name subroutine. If the signal is delivered while another function that your function calls is running, your signal handler is triggered—unless the called subroutine installs its own signal handler. The previous value of the hash is automatically restored when the block exits. This is one of the (few) places where dynamic scoping is more convenient than confusing.
Recipe 10.13; Recipe 16.15; Recipe 16.18
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Writing a Signal Handler
Inhaltsvorschau
You want to write a subroutine that will be called whenever your program receives a signal.
A signal handler is just a subroutine. With some risk, you can do anything in a signal handler you'd do in any Perl subroutine, but the more you do, the riskier it gets.
Some systems require you to reinstall your signal handler after each signal:
$SIG{INT} = \&got_int;

sub got_int {

    $SIG{INT} = \&got_int;          # but not for SIGCHLD!

    # ...

}
Some systems restart blocking operations, such as reading data. In such cases, you must call die within the handler and trap it with eval:
my $interrupted = 0;



sub got_int {

    $interrupted = 1;

    $SIG{INT} = 'DEFAULT';          # or 'IGNORE'

    die;

}



eval {

    $SIG{INT} = \&got_int;

    # ... long-running code that you don't want to restart

};



if ($interrupted) {

    # deal with the signal

}
At the C level, signals can interrupt just about anything. Unfortunately, this means that signals could interrupt Perl while Perl is changing its own internal data structures, leaving those data structures inconsistent and leading to a core dump. As of Perl 5.8, Perl tries very hard to ensure that this doesn't happen—when you install a signal handler, Perl installs a C-level signal handler that says "Perl received this signal." When Perl's data structures are consistent (after each operation it performs), the Perl interpreter checks to see whether a signal was received. If one was, your signal handler is called.
This prevents core dumps, but at the cost of slightly delaying signals in cases where one of Perl's built-in operations takes a long time to finish. For example, building a long list like this:
@a = 1..5_000_000;
might take 10 seconds on a heavily loaded system, but you won't be able to interrupt it because Perl will not check whether a signal was received while the list is being built. There are two operations in this statement, list generation and assignment, and Perl checks for signals only after each operation completes.
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Catching Ctrl-C
Inhaltsvorschau
You want to intercept Ctrl-C, which would otherwise kill your whole program. You'd like to ignore it or run your own function when the signal is received.
Set a handler for SIGINT. Set it to "IGNORE" to make Ctrl-C have no effect:
$SIG{INT} = 'IGNORE';
Or set it to a subroutine of your own devising to respond to Ctrl-C:
$SIG{INT} = \&tsktsk;



sub tsktsk {

    $SIG{INT} = \&tsktsk;           # See ``Writing A Signal Handler''

    warn "\aThe long habit of living indisposeth us for dying.\n";

}
Ctrl-C isn't directly affecting your program. The terminal driver processing your keystrokes recognizes the Ctrl-C combination (or whatever you've set your terminal to recognize as the interrupt character), and sends a SIGINT to every process in the foreground process group (foreground job) for that terminal. The foreground job normally comprises all programs started from the shell on a single command line, plus any other programs run by those programs. See Signals in the Introduction to this chapter for details.
The interrupt character isn't the only special character interpreted by your terminal driver. Type stty -a to find out your current settings:
% stty -a

speed 9600 baud; 38 rows; 80 columns;

               lflags: icanon isig iexten echo echoe -echok echoke -echonl echoctl

               -echoprt -altwerase -noflsh -tostop -flusho pendin -nokerninfo

               -extproc

               iflags: -istrip icrnl -inlcr -igncr ixon -ixoff ixany imaxbel -ignbrk

               brkint -inpck -ignpar -parmrk

               oflags: opost onlcr oxtabs

               cflags: cread cs8 -parenb -parodd hupcl -clocal -cstopb -crtscts -dsrflow

               -dtrflow -mdmbuf

               cchars: discard = ^O; dsusp = ^Y; eof = ^D; eol = <undef;>

               eol2 = <undef; erase = ^H; intr = ^C; kill = ^U; lnext = ^V;>
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Avoiding Zombie Processes
Inhaltsvorschau
Your program forks children, but the dead children accumulate, fill up your process table, and aggravate your system administrator.
If you don't need to record the children that have terminated, use:
$SIG{CHLD} = 'IGNORE';
To keep better track of deceased children, install a SIGCHLD handler to call waitpid:
use POSIX ":sys_wait_h";



$SIG{CHLD} = \&REAPER;

sub REAPER {

    my $stiff;

    while (($stiff = waitpid(-1, WNOHANG)) > 0) {

        # do something with $stiff if you want

    }

    $SIG{CHLD} = \&REAPER;                  # install *after* calling waitpid

}
When a process exits, the system keeps it in the process table so the parent can check its status—whether it terminated normally or abnormally. Fetching a child's status (thereby freeing it to drop from the system altogether) is rather grimly called reaping dead children. (This entire recipe is full of ways to harvest your dead children. If this makes you queasy, we understand.) It involves a call to wait or waitpid. Some Perl functions (piped opens, system, and backticks) will automatically reap the children they make, but you must explicitly wait when you use fork to manually start another process.
To avoid accumulating dead children, simply tell the system that you're not interested in them by setting $SIG{CHLD} to "IGNORE". If you want to know which children die and when, you'll need to use waitpid.
The waitpid function reaps a single process. Its first argument is the process to wait for—use -1 to mean any process—and its second argument is a set of flags. We use the WNOHANG flag to make waitpid immediately return 0 if there are no dead children. A flag value of 0 is supported everywhere, indicating a blocking wait. Call waitpid from a SIGCHLD handler, as we do in the Solution, to reap the children as soon as they die.
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Blocking Signals
Inhaltsvorschau
You'd like to delay the reception of a signal, possibly to prevent unpredictable behavior from signals that can interrupt your program at any point.
Use the POSIX module's interface to the sigprocmask(2) syscall. This is available only if your system is POSIX conformant.
To block a signal around an operation:
use POSIX qw(:signal_h);



$sigset = POSIX::SigSet->new(SIGINT);    # define the signals to block

$old_sigset = POSIX::SigSet->new;        # where the old sigmask will be kept



sigprocmask(SIG_BLOCK, $sigset, $old_sigset)

    or die "Could not block SIGINT\n";
To unblock:
defined sigprocmask(SIG_UNBLOCK, $old_sigset)

    or die "Could not unblock SIGINT\n";
The POSIX standard introduced sigaction and sigprocmask to give you better control over how signals are delivered. The sigprocmask function controls delayed delivery of signals, and sigaction installs handlers. If available, Perl uses sigaction when you change %SIG.
To use sigprocmask, first build a signal set using POSIX::SigSet->new. This takes a list of signal numbers. The POSIX module exports functions named after the signals, which return their signal numbers.
use POSIX qw(:signal_h);



$sigset = POSIX::SigSet->new( SIGINT, SIGKILL );
Pass the POSIX::SigSet object to sigprocmask with the SIG_BLOCK flag to delay signal delivery, SIG_UNBLOCK to restore delivery of the signals, or SIG_SETMASK to block only signals in the POSIX::SigSet. The most paranoid of programmers block signals for a fork to prevent a signal handler in the child process being called before Perl can update the child's $$ variable, its process id. If the signal handler were called immediately and reported $$ in that handler, it could possibly report its parent's $$, not its own. This issue does not arise often.
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Timing Out an Operation
Inhaltsvorschau
You want to make sure an operation doesn't take more than a certain amount of time. For instance, you're running filesystem backups and want to abort if it takes longer than an hour. Or, you want to give the user a limited amount of time to respond to a query.
To interrupt a long-running operation, set a SIGALRM handler to call die, in effect transforming the signal into an exception. Set an alarm with alarm, then eval your code:
eval { 

    local $SIG{ALRM} = sub { die "alarm clock restart" };

    alarm 10;                   # schedule alarm in 10 seconds 

    eval { 

          ########

          # long-running operation goes here

          ########

    };

    alarm 0;                    # cancel the alarm

};

alarm 0;                        # race condition protection

die if $@ && $@ !~ /alarm clock restart/; # reraise
The alarm function takes one argument: the integer number of seconds before the kernel sends your process a SIGALRM, that is, an alarm signal. It may be delivered after that time in busy time-sharing systems. The default action for SIGALRM is to terminate your program, so you should install your own signal handler.
Because this example should work no matter what operation is being timed out, we take some special precautions in case your long-running operation contains a slow syscall. Slow syscalls are those that don't return immediately, but await some external event, such as for I/O to happen or some sort of timer to go off. These external events include read (including readline, the <FH> operator), write, and open on certain devices, fifos, and sockets, as well as accept, connect, send, recv, flock, wait, waitpid, and of course, sleep. If the alarm hits while you're in a slow syscall and you simply catch the signal and return, you'll go right back into that syscall. That's because Perl automatically restarts syscalls where it's able to. The only way out of them is to raise an exception through
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Turning Signals into Fatal Errors
Inhaltsvorschau
END blocks aren't run when your program dies from an uncaught signal. Your program gets such signals, and you'd like your END blocks to have a chance to clean up.
Use the sigtrap pragma:
use sigtrap qw(die untrapped normal-signals);
Untrapped signals cause your program to die without running END blocks. Although you could manually install signal handlers that call die, this becomes tedious for a lot of signals:
$SIG{INT} = $SIG{HUP} = $SIG{PIPE} = $SIG{TERM} = sub { die };
The sigtrap pragma provides a convenient shorthand for installing such handlers:
use sigtrap qw(die untrapped normal-signals);
The die import tells sigtrap to call die (you can also import stack-trace to install handlers that trigger stack traces). The untrapped import tells sigtrap to install handlers only for signals that don't already have them, so if you handle SIGPIPE yourself, sigtrap won't replace your handler.
normal-signals is one of several imports that specify predefined lists of useful signals to trap. The signal lists are given in Table 16-2.
Table 16-2: Signal lists
List
Signals
normal-signals
HUP, INT, PIPE, TERM
error-signals
ABRT,
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Program: sigrand
Inhaltsvorschau
The following program gives you random signatures by using named pipes. It expects the signatures file to have records in the format of the fortune program—that is, each possible multiline record is terminated with "%%\n". Here's an example:
Make is like Pascal: everybody likes it, so they go in and change it.

                                            --Dennis Ritchie

%%

I eschew embedded capital letters in names; to my prose-oriented eyes,

they are too awkward to read comfortably. They jangle like bad typography.

                                            --Rob Pike

%%

God made the integers; all else is the work of Man.  

                                            --Kronecker

%%

I'd rather have :rofix than const.          --Dennis Ritchie

%%

If you want to program in C, program in C.  It's a nice language.

I use it occasionally...   :-)              --Larry Wall

%%

Twisted cleverness is my only skill as a programmer.       

                                            --Elizabeth Zwicky

%%

Basically, avoid comments. If your code needs a comment to be understood,

it would be better to rewrite it so it's easier to understand.  

                                            --Rob Pike

%%

Comments on data are usually much more helpful than on algorithms.  

                                            --Rob Pike

%% 

Programs that write programs are the happiest programs in the world.

                                            --Andrew Hume 

%%
We check whether we're already running by using a file with our PID in it. If sending a signal number 0 indicates that PID still exists (or, rarely, that something else has reused it), we just exit. We also look at the current Usenet posting to decide whether to look for a per-newsgroup signature file. That way, you can have different signatures for each newsgroup you post to. For variety, a global signature file is still on occasion used even if a per-newsgroup file exists.
You can even use sigrand on systems without named pipes if you remove the code to create a named pipe and extend the sleep interval before file updates. Then
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Chapter 17: Sockets
Inhaltsvorschau
Glendower: I can call spirits from the vasty deep.Hotspur: Why so can I, or so can any man, But will they come when you do call for them?
—Shakespeare, Henry IV, part I, Act III scene 1
Sockets are endpoints for communication. Some types of sockets provide reliable communications. Others offer few guarantees, but consume low system overhead. Socket communication can be used to let processes talk on just one machine or over the Internet.
In this chapter we consider the two most commonly used types of sockets: streams and datagrams. Streams provide a bidirectional, sequenced, and reliable channel of communication—similar to pipes. Datagram sockets do not guarantee sequenced, reliable delivery, but they do guarantee that message boundaries will be preserved when read. Your system may support other types of sockets as well; consult your socket(2) manpage or equivalent documentation for details.
We also consider both the Internet and Unix domains. The Internet domain gives sockets two-part names: a host (an IP address in a particular format) and a port number. In the Unix domain, sockets are named using files (e.g., /tmp/mysock).
In addition to domains and types, sockets also have a protocol associated with them. Protocols are not very important to the casual programmer, as there is rarely more than one protocol for a given domain and type of socket.
Domains and types are normally identified by numeric constants (available through functions exported by the Socket and IO::Socket modules). Stream sockets have the type SOCK_STREAM, and datagram sockets have the type SOCK_DGRAM. The Internet domain is PF_INET, and the Unix domain PF_UNIX. (POSIX uses PF_LOCAL instead of PF_UNIX, but PF_UNIX will almost always be an acceptable constant simply because of the preponderance of existing software that uses it.) You should use these symbolic names instead of numbers because the numbers may change (and historically, have).
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Introduction
Inhaltsvorschau
Sockets are endpoints for communication. Some types of sockets provide reliable communications. Others offer few guarantees, but consume low system overhead. Socket communication can be used to let processes talk on just one machine or over the Internet.
In this chapter we consider the two most commonly used types of sockets: streams and datagrams. Streams provide a bidirectional, sequenced, and reliable channel of communication—similar to pipes. Datagram sockets do not guarantee sequenced, reliable delivery, but they do guarantee that message boundaries will be preserved when read. Your system may support other types of sockets as well; consult your socket(2) manpage or equivalent documentation for details.
We also consider both the Internet and Unix domains. The Internet domain gives sockets two-part names: a host (an IP address in a particular format) and a port number. In the Unix domain, sockets are named using files (e.g., /tmp/mysock).
In addition to domains and types, sockets also have a protocol associated with them. Protocols are not very important to the casual programmer, as there is rarely more than one protocol for a given domain and type of socket.
Domains and types are normally identified by numeric constants (available through functions exported by the Socket and IO::Socket modules). Stream sockets have the type SOCK_STREAM, and datagram sockets have the type SOCK_DGRAM. The Internet domain is PF_INET, and the Unix domain PF_UNIX. (POSIX uses PF_LOCAL instead of PF_UNIX, but PF_UNIX will almost always be an acceptable constant simply because of the preponderance of existing software that uses it.) You should use these symbolic names instead of numbers because the numbers may change (and historically, have).
Protocols have names such as tcp and udp, which correspond to numbers that the operating system uses. The getprotobyname function (built into Perl) returns the number when given a protocol name. Pass protocol number
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Writing a TCP Client
Inhaltsvorschau
You want to connect to a socket on a remote machine.
This solution assumes you're using the Internet to communicate. For TCP-like communication within a single machine, see Recipe 17.6.
Either use the standard IO::Socket::INET class:
use IO::Socket;



$socket = IO::Socket::INET->new(PeerAddr => $remote_host,

                                PeerPort => $remote_port,

                                Proto    => "tcp",

                                Type     => SOCK_STREAM)

    or die "Couldn't connect to $remote_host:$remote_port : $@\n";



# ... do something with the socket

print $socket "Why don't you call me anymore?\n";



$answer = <$socket>;



# and terminate the connection when we're done

close($socket);
or create a socket by hand for better control:
use Socket;



# create a socket

socket(TO_SERVER, PF_INET, SOCK_STREAM, getprotobyname('tcp'));



# build the address of the remote machine

$internet_addr = inet_aton($remote_host)

    or die "Couldn't convert $remote_host into an Internet address: $!\n";

$paddr = sockaddr_in($remote_port, $internet_addr);



# connect

connect(TO_SERVER, $paddr)

    or die "Couldn't connect to $remote_host:$remote_port : $!\n";



# ... do something with the socket

print TO_SERVER "Why don't you call me anymore?\n";



# and terminate the connection when we're done

close(TO_SERVER);
While coding this by hand requires a lot of steps, the IO::Socket::INET class wraps them all in a convenient constructor. The important things to know are where you're going (the PeerAddr and PeerPort parameters) and how you're getting there (the Type parameter). IO::Socket::INET tries to determine these things from what you've given it. It deduces Proto from the Type and Port if possible, and assumes tcp otherwise.
PeerAddr is a string containing either a hostname ("www.oreilly.com
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Writing a TCP Server
Inhaltsvorschau
You want to write a server that waits for clients to connect over the network to a particular port.
This recipe assumes you're using the Internet to communicate. For TCP-like communication within a single Unix machine, see Recipe 17.6.
Use the standard IO::Socket::INET class:
use IO::Socket;



$server = IO::Socket::INET->new(LocalPort => $server_port,

                                Type      => SOCK_STREAM,

                                Reuse     => 1,

                                Listen    => 10 )   # or SOMAXCONN

    or die "Couldn't be a tcp server on port $server_port : $@\n";



while ($client = $server->accept( )) {

    # $client is the new connection

}



close($server);
Or craft it by hand for better control:
use Socket;



# make the socket

socket(SERVER, PF_INET, SOCK_STREAM, getprotobyname('tcp'));



# so we can restart our server quickly

setsockopt(SERVER, SOL_SOCKET, SO_REUSEADDR, 1);



# build up my socket address

$my_addr = sockaddr_in($server_port, INADDR_ANY);

bind(SERVER, $my_addr)

    or die "Couldn't bind to port $server_port : $!\n";



# establish a queue for incoming connections

listen(SERVER, SOMAXCONN)

    or die "Couldn't listen on port $server_port : $!\n";



# accept and process connections

while (accept(CLIENT, SERVER)) {

    # do something with CLIENT

}



close(SERVER);
Setting up a server is more complicated than being a client. The optional listen function tells the operating system how many pending, unanswered connections can queue up while waiting for your server. The setsockopt function used in the Solution allows you to avoid waiting two minutes after killing your server before you restart it again (valuable in testing). The bind call registers your server with the kernel so others can find you. Finally, accept takes the incoming connections one by one.
The numeric argument to listen
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Communicating over TCP
Inhaltsvorschau
You want to read or write data over a TCP connection.
This recipe assumes you're using the Internet to communicate. For TCP-like communication within a single machine, see Recipe 17.6.
Use print or <> :
print SERVER "What is your name?\n";

chomp ($response = <SERVER>);
Or use send and recv :
defined (send(SERVER, $data_to_send, $flags))

    or die "Can't send : $!\n";



recv(SERVER, $data_read, $maxlen, $flags)

    or die "Can't receive: $!\n";
Or use the corresponding methods on an IO::Socket object:
use IO::Socket;



$server->send($data_to_send, $flags)

    or die "Can't send: $!\n";



$server->recv($data_read, $maxlen, $flags)

    or die "Can't recv: $!\n";
To find out whether data can be read or written, use the select function, which is nicely wrapped by the standard IO::Socket class:
use IO::Select;



$select = IO::Select->new( );

$select->add(*FROM_SERVER);

$select->add($to_client);



@read_from = $select->can_read($timeout);

foreach $socket (@read_from) {

    # read the pending data from $socket

}
Sockets handle two completely different types of I/O, each with attendant pitfalls and benefits. The normal Perl I/O functions used on files (except for seek and sysseek) work for stream sockets, but datagram sockets require the system calls send and recv, which work on complete records.
Awareness of buffering issues is particularly important in socket programming. That's because buffering, while designed to enhance performance, can interfere with the interactive feel that some programs require. Gathering input with <> may try to read more data from the socket than is yet available as it looks for a record separator. Both print and <> use stdio buffers, so unless you've changed autoflushing (see the Introduction to Chapter 7) on the socket handle, your data won't be sent to the other end as soon as you
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Setting Up a UDP Client
Inhaltsvorschau
You want to exchange messages with another process using UDP (datagrams).
To set up a UDP socket handle, use either the low-level Socket module on your own filehandle:
use Socket;

socket(SOCKET, PF_INET, SOCK_DGRAM, getprotobyname("udp")) 

    or die "socket: $!";
or else IO::Socket, which returns an anonymous one:
use IO::Socket;

$handle = IO::Socket::INET->new(Proto => 'udp') 

    or die "socket: $@";     # yes, it uses $@ here
Then to send a message to a machine named $HOSTNAME on port number $PORTNO, use:
$ipaddr   = inet_aton($HOSTNAME);

$portaddr = sockaddr_in($PORTNO, $ipaddr);

send(SOCKET, $MSG, 0, $portaddr) =  = length($MSG)

        or die "cannot send to $HOSTNAME($PORTNO): $!";
To receive a message of length no greater than $MAXLEN, use:
$portaddr = recv(SOCKET, $MSG, $MAXLEN, 0)      or die "recv: $!";

($portno, $ipaddr) = sockaddr_in($portaddr);

$host = gethostbyaddr($ipaddr, AF_INET);

print "$host($portno) said $MSG\n";
Datagram sockets are unlike stream sockets. Streams provide sessions, giving the illusion of a stable connection. You might think of them as working like a telephone call—expensive to set up, but once established, reliable and easy to use. Datagrams, though, are more like the postal system—it's cheaper and easier to send a letter to your friend on the other side of the world than to call them on the phone. Datagrams are easier on the system than streams. You send a small amount of information one message at a time. But your messages' delivery isn't guaranteed, and they might arrive in the wrong order. Like a small post box, the receiver's queue might fill up and cause further messages to be dropped.
Why then, if datagrams are unreliable, do we have them? Because some applications are most sensibly implemented in terms of datagrams. For instance, in streaming audio, it's more important that the stream as a whole be preserved than that every packet get through, especially if packets are being dropped because there's not enough bandwidth for them all. Another use for datagrams is broadcasting, which corresponds to mass mailing of advertisements in the postal model, and is equally popular in most circles. One use for broadcast packets is to send out a message to your local subnet saying, "Hey, is there anybody around here who wants to be my server?"
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Setting Up a UDP Server
Inhaltsvorschau
You want to write a UDP server.
First bind to the port on which the server is to be contacted. With IO::Socket, this is easily accomplished:
use IO::Socket;

$server = IO::Socket::INET->new(LocalPort => $server_port,

                                Proto     => "udp")

    or die "Couldn't be a udp server on port $server_port : $@\n";
Then, go into a loop receiving messages:
while ($him = $server->recv($datagram, $MAX_TO_READ, $flags)) {

    # do something

}
Life with UDP is much simpler than life with TCP. Instead of accepting client connections one at a time and committing yourself to a long-term relationship, take messages from clients as they come in. The recv function returns the address of the sender, which you must then decode.
Example 17-2 is a small UDP-based server that just sits around waiting for messages. Every time a message comes in, we find out who sent it, send that person a message based on the previous message, and then save the new message.
Example 17-2. udpqotd
  #!/usr/bin/perl -w

  # udpqotd - UDP message server

  use strict;

  use IO::Socket;

  my ($sock, $oldmsg, $newmsg, $hisaddr, $hishost, $MAXLEN, $PORTNO);

  $MAXLEN = 1024;

  $PORTNO = 5151;

  $sock = IO::Socket::INET->new(LocalPort => $PORTNO, Proto => 'udp')

      or die "socket: $@";

  print "Awaiting UDP messages on port $PORTNO\n";

  $oldmsg = "This is the starting message.";

  while ($sock->recv($newmsg, $MAXLEN)) {

      my($port, $ipaddr) = sockaddr_in($sock->peername);

      $hishost = gethostbyaddr($ipaddr, AF_INET);

      print "Client $hishost said ``$newmsg''\n";

      $sock->send($oldmsg);

      $oldmsg = "[$hishost] $newmsg";

  } 

  die "recv: $!";
This program is easier using IO::Socket than the raw Socket module. We don't have to say where to send the message, because the library keeps track of who sent the last message and stores that information away on the
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Using Unix Domain Sockets
Inhaltsvorschau
You want to communicate with other processes on only the local machine.
Use domain sockets. You can use the code and techniques from the preceding Internet domain recipes, with the following changes:
  • Because the naming system is different, use sockaddr_un instead of sockaddr_in.
  • Use IO::Socket::UNIX instead of IO::Socket::INET, and use Peer and Local instead of PeerAddr/PeerPort and LocalAddr/LocalPort.
  • Use PF_UNIX instead of PF_INET, and give PF_UNSPEC as the last argument to socket.
  • SOCK_STREAM clients don't have to bind to a local address before they connect.
Unix domain sockets have names like files on the filesystem. In fact, most systems implement them as special files; that's what Perl's -S filetest operator looks for—whether the file is a Unix domain socket.
Supply the filename as the Peer argument to IO::Socket::UNIX->new, or encode it with sockaddr_un and pass it to connect. Here's how to make server and client Unix domain stream sockets with IO::Socket::UNIX:
use IO::Socket;



unlink "/tmp/mysock";

$server = IO::Socket::UNIX->new(LocalAddr => "/tmp/mysock",

                                Type      => SOCK_STREAM,

                                Listen    => 5 )

    or die $@;



$client = IO::Socket::UNIX->new(PeerAddr  => "/tmp/mysock",

                                Type      => SOCK_STREAM,

                                Timeout   => 10 )

    or die $@;
Here's how to use the traditional functions to make stream sockets:
use Socket;



socket(SERVER, PF_UNIX, SOCK_STREAM, 0);

unlink "/tmp/mysock";

bind(SERVER, sockaddr_un("/tmp/mysock"))

    or die "Can't create server: $!";



socket(CLIENT, PF_UNIX, SOCK_STREAM, 0);

connect(CLIENT, sockaddr_un("/tmp/mysock"))

    or die "Can't connect to /tmp/mysock: $!";
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Identifying the Other End of a Socket
Inhaltsvorschau
You have a socket and want to identify the machine at the other end.
If you're only interested in the IP address of the remote machine, use:
use Socket;



$other_end         = getpeername(SOCKET)

    or die "Couldn't identify other end: $!\n";

($port, $iaddr)    = unpack_sockaddr_in($other_end);

$ip_address        = inet_ntoa($iaddr);
If you want its actual hostname, use:
use Socket;



$other_end        = getpeername(SOCKET)

    or die "Couldn't identify other end: $!\n";

($port, $iaddr)   = unpack_sockaddr_in($other_end);

$actual_ip        = inet_ntoa($iaddr);

$claimed_hostname = gethostbyaddr($iaddr, AF_INET);

@name_lookup      = gethostbyname($claimed_hostname)

    or die "Could not look up $claimed_hostname : $!\n";

@resolved_ips     = map { inet_ntoa($_) }

    @name_lookup[ 4 .. $#name_lookup ];
For a long time, figuring out who connected to you was considered more straightforward than it really is. The getpeername function returns the IP address of the remote machine in a packed binary structure (or undef if an error occurred). To unpack it, use inet_ntoa. If you want the name of the remote end, call gethostbyaddr to look up the name of the machine in the DNS tables, right?
Not really. That's only half the solution. Because a name lookup goes to the name's owner's DNS server and a lookup of an IP address goes to the address's owner's DNS server, you have to contend with the possibility that the machine that connected to you is giving incorrect names. For instance, the machine evil.crackers.org could belong to malevolent cyberpirates who tell their DNS server that its IP address (1.2.3.4) should be identified as trusted.dod.gov. If your program trusts trusted.dod.gov, a connection from evil.crackers.org will cause getpeername to return the right IP address (1.2.3.4), but gethostbyaddr will return the duplicitous name.
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Finding Your Own Name and Address
Inhaltsvorschau
You want to find your (fully qualified) hostname.
First, get your (possibly qualified) hostname. Try either the standard Sys::Hostname module:
use Sys::Hostname;



$hostname = hostname( );
or POSIX's uname function:
use POSIX qw(uname);

($kernel, $hostname, $release, $version, $hardware) = uname( );



$hostname = (uname)[1];             # or just one
Then turn it into an IP address and convert to its canonical name:
use Socket;                         # for AF_INET

$address  = gethostbyname($hostname)

    or die "Couldn't resolve $hostname : $!";

$hostname = gethostbyaddr($address, AF_INET)

    or die "Couldn't re-resolve $hostname : $!";
Sys::Hostname tries to be portable by using knowledge about your system to decide how best to find the hostname. It tries many different ways of getting the hostname, but several involve running other programs. This can lead to tainted data (see Recipe 19.1).
POSIX::uname, on the other hand, works only on POSIX systems and isn't guaranteed to provide anything useful in the nodename field that we are examining. That said, the value is useful on many machines and doesn't suffer from the tainted data problem that Sys::Hostname does.
Once you have the name, though, you must consider that it might be missing a domain name. For instance, Sys::Hostname may return you guanaco instead of guanaco.camelids.org. To fix this, convert the name back into an IP address with gethostbyname and then back into a name again with gethostbyaddr. By involving the domain name system, you are guaranteed of getting a full name.
The gethostbyname and gethostbyaddr functions in Chapter 29 of Programming Perl and in perlfunc(1); the documentation for the standard Net::hostent and Sys::Hostname modules
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Closing a Socket After Forking
Inhaltsvorschau
Your program has forked and you want to tell the other end that you're done sending data. You've tried close on the socket, but the remote end never gets an EOF or SIGPIPE.
Use shutdown:
shutdown(SOCKET, 0);                # I/we have stopped reading data

shutdown(SOCKET, 1);                # I/we have stopped writing data

shutdown(SOCKET, 2);                # I/we have stopped using this socket
On an IO::Socket object, you could also write:
$socket->shutdown(0);               # I/we have stopped reading data
When a process forks, the child has copies of the parent's open filehandles, including sockets. When you close a file or socket, you close only the current process's copy. If another process (parent or child) still has the socket open, the operating system doesn't consider their file or socket closed.
Take the case of a socket that data is being sent to. If two processes have this socket open, one can close it but, because the other still has it open, the socket isn't considered closed by the operating system. Until the other process closes the socket, the process reading from the socket won't get an end-of-file. This can lead to confusion and deadlock.
To avoid this, either close unused filehandles after a fork or use shutdown. The shutdown function is a more insistent form of close—it tells the operating system that even though other processes have copies of this filehandle, it should be marked as closed, and the other end should get an end-of-file if the processes read from it or a SIGPIPE if they write to it.
The numeric argument to shutdown lets you specify which sides of the connection are closed. An argument of 0 says that we're done reading data, so the other end of the socket will get a SIGPIPE if they try writing. 1 says that we're done writing data, so the other end of the socket will get an end-of-file if they try reading.
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Writing Bidirectional Clients
Inhaltsvorschau
You want set up a fully interactive client so you can type a line, get the answer, type a line, get the answer, etc., somewhat like telnet.
Once you've connected, fork off a duplicate process. One twin reads only your input and passes it on to the server, and the other reads only the server's output and sends it to your own output.
In a client-server relationship, it is difficult to know whose turn it is to talk. Single-threaded solutions involving the four-argument version of select are hard to write and maintain. But there's no reason to ignore multitasking solutions. The fork function dramatically simplifies this problem.
Once you've connected to the service you'd like to chat with, call fork to clone a twin. Each of these two (nearly) identical processes has a simple job. The parent copies everything from the socket to standard output, and the child simultaneously copies everything from standard input to the socket.
The code is in Example 17-4.
Example 17-4. biclient
  #!/usr/bin/perl -w

  # biclient - bidirectional forking client

  use strict;

  use IO::Socket;

  my ($host, $port, $kidpid, $handle, $line);

  

  unless (@ARGV =  = 2) { die "usage: $0 host port" }

  ($host, $port) = @ARGV;

  

  # create a tcp connection to the specified host and port

  $handle = IO::Socket::INET->new(Proto     => "tcp",

                                  PeerAddr  => $host,

                                  PeerPort  => $port)

         or die "can't connect to port $port on $host: $!";

  

  $handle->autoflush(1);              # so output gets there right away

  print STDERR "[Connected to $host:$port]\n";

  

  # split the program into two processes, identical twins

  die "can't fork: $!" unless defined($kidpid = fork( ));

  

  if ($kidpid) {                      

      # parent copies the socket to standard output

      while (defined ($line = <$handle>)) {

          print STDOUT $line;

      }

      kill("TERM" => $kidpid);        # send SIGTERM to child

  }

  else {                              

      # child copies standard input to the socket

      while (defined ($line = <STDIN>)) {

          print $handle $line;

      }

  }

  exit;
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Forking Servers
Inhaltsvorschau
You want to write a server that forks a subprocess to handle each new client.
Fork in the accept loop, and use a $SIG{CHLD} handler to reap the children.
# set up the socket SERVER, bind and listen ...

use POSIX qw(:sys_wait_h);



sub REAPER {

    1 until (-1 =  = waitpid(-1, WNOHANG));

    $SIG{CHLD} = \&REAPER;                 # unless $]>= 5.002

}



$SIG{CHLD} = \&REAPER;



while ($hisaddr = accept(CLIENT, SERVER)) {

    next if $pid = fork;                    # parent

    die "fork: $!" unless defined $pid;     # failure

    # otherwise child

    close(SERVER);                          # no use to child

    # ... do something

    exit;                                   # child leaves

} continue { 

    close(CLIENT);                          # no use to parent

}
This approach is very common for SOCK_STREAM servers in the Internet and Unix domains. Each incoming connection gets a cloned server of its own. The model is:
  1. Accept a stream connection.
  2. Fork off a duplicate to communicate over that stream.
  3. Return to 1.
This technique isn't used with SOCK_DGRAM sockets, because their method of communication is different. The time it takes to fork makes the forking model impractical for UDP-style servers. Instead of working with a series of stateful, long-running connections, SOCK_DGRAM servers work with a bunch of sporadic datagrams, usually statelessly. With them, the model must become:
  1. Read a datagram.
  2. Handle the datagram.
  3. Return to 1.
The child process deals with the new connection. Because it will never use the SERVER socket, we immediately close it. This is partly to keep a tidy house, but mainly so that the server socket is closed when the parent (server) process exits. If the children do not close the SERVER socket, the operating system considers the socket still open even when the parent dies. For more on this, see Recipe 17.9.
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Pre-Forking Servers
Inhaltsvorschau
You want to write a server that concurrently processes several clients (as in Recipe 17.11), but connections are coming in so fast that forking slows the server too much.
Have a master server maintain a pool of pre-forked children, as shown in Example 17-5.
Example 17-5. preforker
  #!/usr/bin/perl

  # preforker - server who forks first

  use IO::Socket;

  use Symbol;

  use POSIX;

  

  # establish SERVER socket, bind and listen.

  $server = IO::Socket::INET->new(LocalPort => 6969,

                                  Type      => SOCK_STREAM,

                                  Proto     => 'tcp',

                                  Reuse     => 1,

                                  Listen    => 10 )

    or die "making socket: $@\n";

  

  # global variables

  $PREFORK                = 5;        # number of children to maintain

  $MAX_CLIENTS_PER_CHILD  = 5;        # number of clients each child should process

  %children               = ( );       # keys are current child process IDs

  $children               = 0;        # current number of children

  

  sub REAPER {                        # takes care of dead children

      $SIG{CHLD} = \&REAPER;

      my $pid = wait;

      $children --;

      delete $children{$pid};

  }

  

  sub HUNTSMAN {                      # signal handler for SIGINT

      local($SIG{CHLD}) = 'IGNORE';   # we're going to kill our children

      kill 'INT' => keys %children;

      exit;                           # clean up with dignity

  }

      

  # Fork off our children.

  for (1 .. $PREFORK) {

      make_new_child( );

  }

  

  # Install signal handlers.

  $SIG{CHLD} = \&REAPER;

  $SIG{INT}  = \&HUNTSMAN;

  

  # And maintain the population.

  while (1) {

      sleep;                          # wait for a signal (i.e., child's death)

      for ($i = $children; $i < $PREFORK; $i++) {

          make_new_child( );           # top up the child pool

      }

  }

  

  sub make_new_child {

      my $pid;

      my $sigset;

      

      # block signal for fork

      $sigset = POSIX::SigSet->new(SIGINT);

      sigprocmask(SIG_BLOCK, $sigset)

          or die "Can't block SIGINT for fork: $!\n";

      

      die "fork: $!" unless defined ($pid = fork);

      

      if ($pid) {

          # Parent records the child's birth and returns.

          sigprocmask(SIG_UNBLOCK, $sigset)

              or die "Can't unblock SIGINT for fork: $!\n";

          $children{$pid} = 1;

          $children++;

          return;

      } else {

          # Child can *not* return from this subroutine.

          $SIG{INT} = 'DEFAULT';      # make SIGINT kill us as it did before

      

          # unblock signals

          sigprocmask(SIG_UNBLOCK, $sigset)

              or die "Can't unblock SIGINT for fork: $!\n";

      

          # handle connections until we've reached $MAX_CLIENTS_PER_CHILD

          for ($i=0; $i < $MAX_CLIENTS_PER_CHILD; $i++) {

              $client = $server->accept( )     or last;

              # do something with the connection

          }

      

          # tidy up gracefully and finish

      

          # this exit is VERY important, otherwise the child will become

          # a producer of more and more children, forking yourself into

          # process death.

          exit;

      }

  }
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Non-Forking Servers
Inhaltsvorschau
You want a server to deal with several simultaneous connections, but you don't want to fork a process to deal with each connection.
Keep an array of open clients, use select to read information when it becomes available, and deal with a client only when you have read a full request from it, as shown in Example 17-6.
Example 17-6. nonforker
  #!/usr/bin/perl -w

  # nonforker - server who multiplexes without forking

  use POSIX;

  use IO::Socket;

  use IO::Select;

  use Socket;

  use Fcntl;

  use Tie::RefHash;

  

  $port = 1685;               # change this at will

  

  # Listen to port.

  $server = IO::Socket::INET->new(LocalPort => $port,

                                  Listen    => 10 )

    or die "Can't make server socket: $@\n";

  

  # begin with empty buffers

  %inbuffer  = ( );

  %outbuffer = ( );

  %ready     = ( );

  

  tie %ready, 'Tie::RefHash';

  

  nonblock($server);

  $select = IO::Select->new($server);

  

  # Main loop: check reads/accepts, check writes, check ready to process

  while (1) {

      my $client;

      my $rv;

      my $data;

  

      # check for new information on the connections we have

  

      # anything to read or accept?

      foreach $client ($select->can_read(1)) {

  

          if ($client =  = $server) {

              # accept a new connection

  

              $client = $server->accept( );

              $select->add($client);

              nonblock($client);

          } else {

              # read data

              $data = '';

              $rv   = $client->recv($data, POSIX::BUFSIZ, 0);

  

              unless (defined($rv) && length $data) {

                  # This would be the end of file, so close the client

                  delete $inbuffer{$client};

                  delete $outbuffer{$client};

                  delete $ready{$client};

  

                  $select->remove($client);

                  close $client;

                  next;

              }

  

              $inbuffer{$client} .= $data;

  

              # test whether the data in the buffer or the data we

              # just read means there is a complete request waiting

              # to be fulfilled.  If there is, set $ready{$client}

              # to the requests waiting to be fulfilled.

              while ($inbuffer{$client} =~ s/(.*\n)//) {

                  push( @{$ready{$client}}, $1 );

              }

          }

      }

  

      # Any complete requests to process?

      foreach $client (keys %ready) {

          handle($client);

      }

  

      # Buffers to flush?

      foreach $client ($select->can_write(1)) {

          # Skip this client if we have nothing to say

          next unless exists $outbuffer{$client};

  

          $rv = $client->send($outbuffer{$client}, 0);

          unless (defined $rv) {

              # Whine, but move on.

              warn "I was told I could write, but I can't.\n";

              next;

          }

          if ($rv =  = length $outbuffer{$client} ||

              $!  =  = POSIX::EWOULDBLOCK )  

          {

              substr($outbuffer{$client}, 0, $rv) = '';

              delete $outbuffer{$client} unless length $outbuffer{$client};

          } else {

              # Couldn't write all the data, and it wasn't because

              # it would have blocked.  Shutdown and move on.

              delete $inbuffer{$client};

              delete $outbuffer{$client};

              delete $ready{$client};

  

              $select->remove($client);

              close($client);

              next;

          }

      }

  

      # Out of band data?

      foreach $client ($select->has_exception(0)) {  # arg is timeout

          # Deal with out-of-band data here, if you want to.

      }

  }

  

  # handle($socket) deals with all pending requests for $client

  sub handle {

      # requests are in $ready{$client}

      # send output to $outbuffer{$client}

      my $client = shift;

      my $request;

  

      foreach $request (@{$ready{$client}}) {

          # $request is the text of the request

          # put text of reply into $outbuffer{$client}

      }

      delete $ready{$client};

  }

  

  # nonblock($socket) puts socket into nonblocking mode

  sub nonblock {

      my $socket = shift;

      my $flags;

      

      $flags = fcntl($socket, F_GETFL, 0)

              or die "Can't get flags for socket: $!\n";

      fcntl($socket, F_SETFL, $flags | O_NONBLOCK)

              or die "Can't make socket nonblocking: $!\n";

  }
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Multitasking Server with Threads
Inhaltsvorschau
You want to write a server that handles multiple clients from within the one process using your operating system's threads.
Use Perl v5.8.1 or later and threads.pm:
use threads;

use IO::Socket;

my $listen = IO::Socket::INET->new(

                                   LocalPort => $SERVER_PORT,

                                   ReuseAddr => 1,

                                   Listen => 10,

                                   );

sub handle_connection {

    my $socket = shift;

    my $output = shift || $socket;

    my $exit = 0;

    while (<$socket>) {

        # work with $_,

        # print to $output

        # set $exit to true when connection is done

        last if $exit;

    }

}

while (my $socket = $listen->accept) {

    async(\&handle_connection, $socket)->detach;

}
Threading in Perl is still evolving, but it became functional as of v5.8.1. The code in the Solution will not work in earlier versions of Perl. In particular, earlier versions of Perl implemented an entirely different threading model than the current "interpreter threads" system that threads.pm assumes.
The hard work of handling the connection to the client is done in the handle_connection subroutine. It is given the client socket as a parameter, and can call blocking routines like <$socket> because it runs in its own thread. If one thread blocks while reading, other threads can still run.
The master thread in the program creates the socket and accepts connections on it. When a new client connects, the master thread spawns a new thread (with the async call) to handle the connection. The thread runs until the subroutine it is called with (handle_connection in this case) returns.
We detach the newly created thread to ensure that its variables are garbage collected (closing the socket to the client) when the thread ends. If we didn't call detach, our process would accumulate dead threads until we could no longer spawn new threads.
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Writing a Multitasking Server with POE
Inhaltsvorschau
You want to write a server that handles multiple clients from within the one process, without using Perl 5.8's threads or the complexity of non-blocking I/O.
Use the cooperative multitasking framework POE (available from CPAN) and the accompanying POE::Component::Server::TCP module to create the server for you:
#!/usr/bin/perl



use warnings;

use strict;



use POE qw(Component::Server::TCP);



# Start a TCP server.  Client input will be logged to the console and

# echoed back to the client, one line at a time.



POE::Component::Server::TCP->new

  ( Port => $PORT_NUMBER,           # port to listen on 

    ClientInput => \&handle_input,  # method to call with input

  );



# Start the server.



$poe_kernel->run( );

exit 0;



sub handle_input {

  my ( $session, $heap, $input ) = @_[ SESSION, HEAP, ARG0 ];

  # $session is a POE::Session object unique to this connection,

  # $heap is this connection's between-callback storage.

  # New data from client is in $input.  Newlines are removed.

  # To echo input back to the client, simply say:

  $heap->{client}->put($input);

  # and log it to the console

  print "client ", $session->ID, ": $input\n";

}
POE is a cooperatively multitasking framework for Perl built entirely out of software components. POE doesn't require you to recompile the Perl interpreter to support threads, but it does require you to design your program around the ideas of events and callbacks. Documentation for this framework is available at http://poe.perl.org/.
It helps to think of POE as an operating system: there's the kernel (an object responsible for deciding which piece of code is run next) and your processes (called sessions, implemented as objects). POE stores the kernel object in the variable $poe_kernel, which is automatically imported into your namespace. Each process in your operating system has a heap, memory where the variables for that process are stored. Sessions have heaps as well. In an operating system, I/O libraries handle buffered I/O. In POE, a
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Writing a Multihomed Server
Inhaltsvorschau
You want to write a server that knows that the machine it runs on has multiple IP addresses, and that it should possibly do different things for each address.
Don't bind your server to a particular address. Instead, bind to INADDR_ANY. Then once you've accepted a connection, use getsockname on the client socket to find out which address the client connected to:
use Socket;



socket(SERVER, PF_INET, SOCK_STREAM, getprotobyname('tcp'));

setsockopt(SERVER, SOL_SOCKET, SO_REUSEADDR, 1);

bind(SERVER, sockaddr_in($server_port, INADDR_ANY))

    or die "Binding: $!\n";



# accept loop

while (accept(CLIENT, SERVER)) {

    $my_socket_address = getsockname(CLIENT);

    ($port, $myaddr)   = sockaddr_in($my_socket_address);

}
Whereas getpeername (as discussed in Recipe 17.7) returns the address of the remote end of the socket, getsockname returns the address of the local end. When we've bound to INADDR_ANY, thus accepting connections on any address the machine has, we need to use getsockname to identify which address the client connected to.
If you're using IO::Socket::INET, your code will look like this:
$server = IO::Socket::INET->new(LocalPort => $server_port,

                                Type      => SOCK_STREAM,

                                Proto     => 'tcp',

                                Listen    => 10)

    or die "Can't create server socket: $@\n";



while ($client = $server->accept( )) {

    $my_socket_address = $client->sockname( );

    ($port, $myaddr)   = sockaddr_in($my_socket_address);

    # ...

}
If you don't specify a local port to IO::Socket::INET->new, your socket will be bound to INADDR_ANY.
If you want your server to listen only for a particular virtual host, don't use INADDR_ANY. Instead, bind to a specific host address:
use Socket;



$port = 4269;                       # port to bind to

$host = "specific.host.com";        # virtual host to listen on



socket(Server, PF_INET, SOCK_STREAM, getprotobyname("tcp"))

    or die "socket: $!";

bind(Server, sockaddr_in($port, inet_aton($host)))

    or die "bind: $!";

while ($client_address = accept(Client, Server)) {

    # ...

}
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Making a Daemon Server
Inhaltsvorschau
You want your program to run as a daemon.
If you are paranoid and running as root, chroot to a safe directory:
chroot("/var/daemon")

    or die "Couldn't chroot to /var/daemon: $!";
Fork once, and let the parent exit:
$pid = fork;

exit if $pid;

die "Couldn't fork: $!" unless defined($pid);
Close the three standard filehandles by reopening them to /dev/null:
for my $handle (*STDIN, *STDOUT, *STDERR) {

    open($handle, "+<", "/dev/null")

      || die "can't reopen $handle to /dev/null: $!";

}
Dissociate from the controlling terminal that started us and stop being part of whatever process group we had been a member of:
use POSIX;



POSIX::setsid( )

    or die "Can't start a new session: $!";
Trap fatal signals, setting a flag to indicate that we need to gracefully exit:
$time_to_die = 0;



sub signal_handler {

    $time_to_die = 1;

}



$SIG{INT} = $SIG{TERM} = $SIG{HUP} = \&signal_handler;

# trap or ignore $SIG{PIPE}
Wrap your actual server code in a loop:
until ($time_to_die) {

    # ...

}
Before POSIX, every operating system had its own way for a process to tell the operating system "I'm going it alone, please interfere with me as little as possible." POSIX makes it much cleaner. That said, you can still take advantage of any operating system-specific calls if you want to.
The chroot call is one of those non-POSIX calls. It makes a process change where it thinks the directory / is. For instance, after chroot "/var/daemon", if the process tries to read the file /etc/passwd, it will read /var/daemon/etc/passwd. A chrooted process needs copies of any files it will run made available inside its new /, of course. For instance, our chrooted process would need /var/daemon/bin/csh if it were going to glob files. For security reasons, only the superuser may
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Restarting a Server on Demand
Inhaltsvorschau
You want your server to shut down and restart when it receives a HUP signal, just like inetd or httpd.
Catch the SIGHUP. Within the handler, set harmless signal handlers, unblock signals, and re-execute your program:
use POSIX qw(:signal_h sigprocmask);



my $SELF = "/path/to/my/program";

my @ARGS = @ARGV;   # save for later



$SIG{HUP} = \&phoenix;



# your program



sub phoenix {

  # make signals harmless

  for my $nal (qw[ALRM CHLD HUP INT PIPE TERM]) {

    $SIG{$nal} = sub {  };

  }



  # reenable them

  my $s = POSIX::SigSet->new;

  my $t = POSIX::SigSet->new;

  sigprocmask(SIG_BLOCK, $s, $t);



  # and restart

  print "Restarting\n";

  exec $SELF => @ARGS;

  die "Couldn't exec $SELF => @ARGS\n";

}
It sounds simple ("when I get a HUP signal, restart"), but it's tricky. You must know your own program name, and that isn't easy to find out. You could use $0 or the FindBin module. For normal programs, this is fine, but critical system utilities must be more cautious, as there's no guarantee that $0 is valid. You can hardcode the filename and arguments into your program, as we do here. That's not necessarily the most convenient solution, however, so you might want to read the program and arguments from an external file (using the filesystem's protections to ensure it hasn't been tampered with).
Be sure to install your signal handler after you define $SELF and @ARGS; otherwise there's a race condition when a SIGHUP could run restart but you don't know the program to run. This would cause your program to die.
Signals are tricky beasts. When you exec to restart your program, the reborn version inherits a set of blocked signals from its parent. Inside a signal handler, that signal is blocked. So if your signal handler simply called exec right away, the new process would have SIGHUP blocked. You could only restart your program once!
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Managing Multiple Streams of Input
Inhaltsvorschau
The next input to your program could be coming from any number of filehandles, but you don't know which. You've tried using select( ), but the need to then do unbuffered I/O is more than you can deal with (and it's making your code very difficult to follow).
Use the IO::Multiplex module from CPAN. It calls a mux_input( ) function when input is received over a socket, and handles input and output buffering for you:
use IO::Multiplex;

$mux = IO::Multiplex->new( );

$mux->add($FH1);

$mux->add($FH2); # ... and so on for all the filehandles to manage

$mux->set_callback_object(_ _PACKAGE_ _);  # or an object

$mux->Loop( );



sub mux_input {

  my ($package, $mux, $fh, $input) = @_;

  # $input is ref to the filehandle's input buffer

  # ...

}
Although you can use select to manage input coming at you from multiple directions, there are many tricks and traps. For example, you can't use <> to read a line of input, because you never know whether the client has sent a full line yet (or will ever finish sending a line). You can't print to a socket without the risk of the output buffer being full and your process blocking. You need to use non-blocking I/O and maintain your own buffers, and, consequently, life rapidly becomes unmanageably complex.
Fortunately, we have a way of hiding complexity: modules. The IO::Multiplex module from CPAN takes care of non-blocking I/O and select for you. You tell it which filehandles to watch, and it tells you when new data arrives. You can even print to the filehandles, and it'll buffer and non-blockingly output it. An IO::Multiplex object manages a pool of filehandles.
Use the add method to tell IO::Multiplex to manage a filehandle. This enables non-blocking I/O and disables the stdio buffering. When IO::Multiplex receives data on one of its managed filehandles, it calls a mux_input method on an object or class of your choosing. Specify where
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Program: backsniff
Inhaltsvorschau
This program logs attempts to connect to ports. It uses the Sys::Syslog module to log the connection attempt through a Unix socket, with logging level LOG_NOTICE and facility LOG_DAEMON. It uses getsockname to find out what port was connected to and getpeername to find out what machine made the connection. It uses getservbyport to convert the local port number (e.g., 7) into a service name (e.g, "echo").
It produces entries in the system log file like this:

            May 25 15:50:22 coprolith sniffer: Connection from 207.46.131.141 to

            207.46.130.164:echo

         
Install it in the inetd.conf file with a line like this:

            echo    stream  tcp nowait  nobody /usr/scripts/snfsqrd sniffer

         
The program is shown in Example 17-9.
Example 17-9. backsniff
  #!/usr/bin/perl -w

  # backsniff - log attempts to connect to particular ports

  use strict;

  use Sys::Syslog qw(:DEFAULT setlogsock);

  use Socket;

  # identify my port and address

  my $sockname       = getsockname(STDIN)

                       or die "Couldn't identify myself: $!\n";

  my ($port, $iaddr) = sockaddr_in($sockname);

  my $my_address     = inet_ntoa($iaddr);

  # get a name for the service

  my $service = (getservbyport ($port, "tcp"))[0] || $port;

  # now identify remote address

  $sockname          = getpeername(STDIN)

                           or die "Couldn't identify other end: $!\n";

  ($port, $iaddr)    = sockaddr_in($sockname);

  my $ex_address     = inet_ntoa($iaddr);

  # and log the information

  setlogsock("unix");

  openlog("sniffer", "ndelay", "daemon");

  syslog("notice", "Connection from %s to %s:%s\n", $ex_address,

          $my_address, $service);

  closelog( );
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Program: fwdport
Inhaltsvorschau
Imagine that you're nestled deep inside a protective firewall. Somewhere in the outside world is a server that you'd like access to, but only processes on the firewall can reach it. You don't want to log into the firewall machine each time to access that service.
For example, this might arise if your company's ISP provides a news-reading service that seems to come from your main firewall machine, but rejects any NNTP connections from any other address. As the administrator of the firewall, you don't want dozens of people logging onto it, but you would like to let them read and post news from their own workstations.
The program in Example 17-10, fwdport, solves this problem in a generic fashion. You may run as many of these as you like, one per outside service. Sitting on the firewall, it can talk to both worlds. When someone wants to access the outside service, they contact this proxy, which connects on their behalf to the external service. To that outside service, the connection is coming from your firewall, so it lets it in. Then your proxy forks off twin processes, one only reading data from the external server and writing that data back to the internal client, the other only reading data from the internal client and writing that data back to the external server.
For example, you might invoke it this way:
% fwdport -s nntp -l fw.oursite.com -r news.bigorg.com
That means that the program will act as the server for the NNTP service, listening for local connections on the NNTP port on the host fw.oursite.com. When one comes in, it contacts news.bigorg.com (on the same port), and then ferries data between the remote server and local client.
Here's another example:
% fwdport -l myname:9191 -r news.bigorg.com:nntp
This time we listen for local connections on port 9191 of the host myname, and patch those connecting clients to the remote server news.bigorg.com on its NNTP port.
In a way, fwdport acts as both a server and a client. It's a server from the perspective of inside the firewall and a client from the perspective of the remote server outside. The program summarizes this chapter well because it demonstrates just about everything we've covered here. It has server activity, client activity, collecting of zombie children, forking and process management, plus much more thrown in.
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Chapter 18: Internet Services
Inhaltsvorschau
This "telephone" has too many shortcomings to be seriously considered as a means of communication. The device is inherently of no value to us.
—Western Union internal memo, 1876
Correct use of sockets is only part of writing programs that communicate over the network. Once you have a way for two programs to talk, you still need a protocol for communication. This protocol lets each party know when to talk, and it precisely defines who is responsible for which part of the service.
Common Internet protocols are listed in Table 18-1.
Table 18-1: Common Internet protocols
Protocol
Meaning
Action
FTP
File Transfer Protocol
Copying files between remote machines
telnet
Remote login
rsh and rcp
Remote shell and Remote copy
Remote login and remote file copying
NNTP
Network News Transfer Protocol
Reading and posting USENET news
HTTP
Hypertext Transfer Protocol
Transferring documents on the Web
SMTP
Simple Mail Transfer Protocol
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Introduction
Inhaltsvorschau
Correct use of sockets is only part of writing programs that communicate over the network. Once you have a way for two programs to talk, you still need a protocol for communication. This protocol lets each party know when to talk, and it precisely defines who is responsible for which part of the service.
Common Internet protocols are listed in Table 18-1.
Table 18-1: Common Internet protocols
Protocol
Meaning
Action
FTP
File Transfer Protocol
Copying files between remote machines
telnet
Remote login
rsh and rcp
Remote shell and Remote copy
Remote login and remote file copying
NNTP
Network News Transfer Protocol
Reading and posting USENET news
HTTP
Hypertext Transfer Protocol
Transferring documents on the Web
SMTP
Simple Mail Transfer Protocol
Sending mail
POP3
Post Office Protocol
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Simple DNS Lookups
Inhaltsvorschau
You want to find the IP address of a host or turn an IP address into a name. Network servers do this to authenticate their clients, and clients do it when the user gives them a hostname. But Perl's socket library requires an IP address. Furthermore, many servers produce log files containing IP addresses, but hostnames are more useful to analysis software and humans.
If you have a name like www.perl.com, use gethostbyname if you want all addresses:
use Socket;



@addresses = gethostbyname($name)   or die "Can't resolve $name: $!\n";

@addresses = map { inet_ntoa($_) } @addresses[4 .. $#addresses];

# @addresses is a list of IP addresses ("208.201.239.48", "208.201.239.49")
Or use inet_aton if you need only the first address:
use Socket;



$address = inet_ntoa(inet_aton($name));

# $address is a single IP address "208.201.239.48"
If you have an IP address like "208.201.239.48", use:
use Socket;



$name = gethostbyaddr(inet_aton($address), AF_INET)

            or die "Can't resolve $address: $!\n";

# $name is the hostname ("www.perl.com")
This process is complicated because the functions are mere wrappers for C system calls, so you have to convert IP addresses from ASCII strings ("208.146.240.1") into C structures. The standard Socket module provides inet_aton to convert from ASCII to the packed numeric format and inet_ntoa to convert back:
use Socket;

$packed_address = inet_aton("208.146.140.1");

$ascii_address  = inet_ntoa($packed_address);
The gethostbyname function takes a string containing the hostname (or IP address). In scalar context, it returns the remote IP address suitable for passing to inet_ntoa (or undef on error). In list context, it returns a list of at least five elements (or an empty list on error). The returned list is:
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Being an FTP Client
Inhaltsvorschau
You want to connect to an FTP server and transfer files. For example, you might want to automate the one-time transfer of many files or automatically mirror an entire section of an FTP server.
Use the Net::FTP module:
use Net::FTP;



$ftp = Net::FTP->new("ftp.host.com")    or die "Can't connect: $@\n";

$ftp->login($username, $password)       or die "Couldn't login\n";

$ftp->cwd($directory)                   or die "Couldn't change directory\n";

$ftp->get($filename)                    or die "Couldn't get $filename\n";

$ftp->put($filename)                    or die "Couldn't put $filename\n";
Using the Net::FTP module is a three-part process: connect to a server, identify and authenticate yourself, and transfer files. All interaction with the FTP server happens through method calls on a Net::FTP object. If an error occurs, methods return undef in scalar context or the empty list in list context.
The connection is established with the new constructor. If an error occurs, $@ is set to an error message and new returns undef. The first argument is the hostname of the FTP server, optionally followed by named options:
$ftp = Net::FTP->new("ftp.host.com",

                     Timeout => 30,

                     Debug   => 1)

    or die "Can't connect: $@\n";
The Timeout option gives the number of seconds all operations wait before giving up. Debug sets the debugging level (non-zero sends copies of all commands to STDERR). Firewall takes a string as an argument, specifying the machine acting as an FTP proxy. Port lets you select an alternate port number (the default is 21, the standard port for FTP). Finally, if the Passive option is set to true, all transfers are done passively (some firewalls and proxies require this). The Firewall and Passive options override the environment variables FTP_FIREWALL and FTP_PASSIVE.
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Sending Mail
Inhaltsvorschau
You want your program to send mail. Some programs monitor system resources like disk space and notify appropriate people when disk space becomes dangerously low. CGI script authors may not want programs to report errors like "the database is down" to the user, preferring instead to send mail to the database administrator about the problem.
Use the CPAN module Mail::Mailer:
use Mail::Mailer;



$mailer = Mail::Mailer->new("sendmail");

$mailer->open({ From    => $from_address,

                To      => $to_address,

                Subject => $subject,

              })

    or die "Can't open: $!\n";

print $mailer $body;

$mailer->close( );
Or use the sendmail program directly:
open(SENDMAIL, "|/usr/lib/sendmail -oi -t -odq")

                    or die "Can't fork for sendmail: $!\n";

print SENDMAIL <<"EOF";

From: User Originating Mail <me\@host>

To: Final Destination <you\@otherhost>

Subject: A relevant subject line



Body of the message goes here, in as many lines as you like.

EOF

close(SENDMAIL)     or warn "sendmail didn't close nicely";
You have three choices for sending mail from your program. You can call another program normally used to send mail, such as Mail or mailx; these are called MUAs or Mail User Agents. You can use a system-level mail program, such as sendmail; this is an MTA, or Mail Transport Agent. Or you can connect to an Simple Mail Transfer Protocol (SMTP) server. Unfortunately, there's no standard user-level mail program, sendmail doesn't have a standard location, and SMTP isn't particularly simple. The CPAN module Mail::Mailer hides these complexities from you.
Create a Mail::Mailer object with Mail::Mailer->new. If you don't pass any arguments, it uses the default mail sending method (probably a program like mail). Arguments to new select an alternate way to send the message. The first argument is the type of delivery method ("
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Reading and Posting Usenet News Messages
Inhaltsvorschau
You want to connect to a Usenet news server to read and post messages. Your program could send a periodic posting to a newsgroup, summarize a newsgroup, or identify first-time contributors in a newsgroup so you can send them a helpful welcome message.
Use the Net::NNTP module:
use Net::NNTP;



$server = Net::NNTP->new("news.host.dom")

    or die "Can't connect to news server: $@\n";

($narticles, $first, $last, $name) = $server->group( "misc.test" )

    or die "Can't select misc.test\n";

$headers  = $server->head($first)

    or die "Can't get headers from article $first in $name\n";

$bodytext = $server->body($first)

    or die "Can't get body from article $first in $name\n";

$article  = $server->article($first)

    or die "Can't get article $first from $name\n";



$server->postok( )

    or warn "Server didn't tell me I could post.\n";



$server->post( [ @lines ] )

    or die "Can't post: $!\n";
Usenet is a distributed bulletin board system. Servers exchange messages to ensure that each server gets all messages for the newsgroups it carries. Each server sets its own expiration criteria for how long messages stay on the server. Client newsreaders connect to their designated server (usually belonging to their company, ISP, or university) and can read existing postings and contribute new ones.
Each message (or article, as they're also known) has a set of headers and a body, separated by a blank line. Articles are identified in two ways: the message ID header and an article number within a newsgroup. An article's message ID is stored in the message itself and is guaranteed unique no matter which news server the article was read from. When an article references others, it does so by message ID. A message ID is a string like:
<0401@jpl-devvax.JPL.NASA.GOV>
An article can also be identified by its newsgroup and article number within the group. Each news server assigns its own article numbers, so they're valid only for the news server you got them from.
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Reading Mail with POP3
Inhaltsvorschau
You want to fetch mail from a POP3 server. This lets you write a program to summarize your unread mail, move it from a remote server to a local mailbox, or toggle between Internet and local mail systems.
Use the Net::POP3 module:
$pop = Net::POP3->new($mail_server)

    or die "Can't open connection to $mail_server : $!\n";

defined ($pop->login($username, $password))

    or die "Can't authenticate: $!\n";

$messages = $pop->list

    or die "Can't get list of undeleted messages: $!\n";

foreach $msgid (keys %$messages) {

    $message = $pop->get($msgid);

    unless (defined $message) {

        warn "Couldn't fetch $msgid from server: $!\n";

        next;

    }

    # $message is a reference to an array of lines

    $pop->delete($msgid);

}
Traditionally, mail has been a three-party system: the MTA (Mail Transport Agent, a system program like sendmail) delivers mail to the spool, where it is read by the MUA (Mail User Agent, a program like mail). This dates from the days of big servers holding mail and users reading it through dumb terminals. As PCs and networks entered the picture, the need arose for MUAs like Pine to run on different machines than the one housing the spool. The Post Office Protocol (POP) implements efficient message listing, reading, and deleting over a TCP/IP session.
The Net::POP3 module is a POP client. That is, it lets your Perl program act as an MUA. The first step in using Net::POP3 is to create a new Net::POP3 object. Pass new the name of the POP3 server:
$pop = Net::POP3->new( "pop.myisp.com" )

    or die "Can't connect to pop.myisp.com: $!\n";
All Net::POP3 functions return undef or the empty list upon error, depending on the calling context. If an error occurs, the fickle $! variable just might contain a meaningful error message—but also might not.
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Simulating Telnet from a Program
Inhaltsvorschau
You want to simulate a telnet connection from your program by logging into a remote machine, issuing commands, and reacting to what is returned. This has many applications, from automating tasks on machines you can telnet to but which don't support scripting or rsh, to simply testing whether a machine's Telnet daemon is still running properly.
Use the CPAN module Net::Telnet:
use Net::Telnet;



$t = Net::Telnet->new( Timeout => 10,

                       Prompt  => '/%/',

                       Host    => $hostname );



$t->login($username, $password);

@files = $t->cmd("ls");

$t->print("top");

(undef, $process_string) = $t->waitfor('/\d+ processes/');

$t->close;
Net::Telnet provides an object-oriented interface to the Telnet protocol. Create a connection with Net::Telnet->new, then interact with the remote machine using method calls on the resulting object.
Give the new method a list of named-parameter pairs, much like initializing a hash. We'll cover only a few possible parameters. The most important is Host, the machine you're telnetting to. The default host is localhost. To connect to a port other than the one Telnet normally uses, specify this in the Port option. Error handling is done through the function whose reference is specified in the Errmode parameter.
Another important option is Prompt. When you log in or run a command, Net::Telnet uses the Prompt pattern to determine when the login or command has completed. The default Prompt is:
/[\$%#>] $/
which matches the common Unix shell prompts. If the prompt on the remote machine doesn't match the default pattern, you have to specify your own. Remember to include slashes.
Timeout lets you control how long (in seconds) network operations wait before they give up. The default is 10 seconds.
An error or timeout in the Net::Telnet module raises an exception by default, which, if uncaught, prints a message to
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Pinging a Machine
Inhaltsvorschau
You want to test whether a machine is alive. Network- and system-monitoring software often use the ping program as an indication of availability.
Use the standard Net::Ping module:
use Net::Ping;



$p = Net::Ping->new( )

    or die "Can't create new ping object: $!\n";

print "$host is alive" if $p->ping($host);

$p->close;
Testing whether a machine is up isn't as easy as it sounds. It's not only possible but also unpleasantly common for machines to respond to the ping command when they have no working services. It's better to think of ping as testing whether a machine is reachable, rather than whether the machine is doing its job. To check the latter, you must try its services (Telnet, FTP, web, NFS, etc.).
In the form shown in the Solution, Net::Ping attempts to connect to the TCP echo port (port number 7) on the remote machine. The ping method returns true if the connection could be made, false otherwise.
You can also ping using other protocols by passing the protocol name to new. Valid protocols are tcp, udp, syn, and icmp (all lowercase). A UDP ping attempts to connect to the echo port (port 7) on the remote machine, sends a datagram, and attempts to read the response. The machine is considered unreachable if it can't connect, if the reply datagram isn't received, or if the reply differs from the original datagram. An ICMP ping uses the ICMP protocol, just like the ping(8) command. On Unix machines, you must be the superuser to use the ICMP protocol:
# use TCP if we're not root, ICMP if we are

$pong = Net::Ping->new( $> ? "tcp" : "icmp" );



(defined $pong)

    or die "Couldn't create Net::Ping object: $!\n";



if ($pong->ping("kingkong.com")) {

    print "The giant ape lives!\n";

} else {

    print "All hail mighty Gamera, friend of children!\n";

}
A SYN ping is asynchronous: you first send out many pings and then receive responses by repeatedly invoking the
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Accessing an LDAP Server
Inhaltsvorschau
You want to fetch or maintain information from a Lightweight Directory Access Protocol (LDAP) server. For example, you have a list of email addresses in your company, and you want correct names for those people.
Use the Net::LDAP module from CPAN. For example, to search, use:
use Net::LDAP;



$ldap = Net::LDAP->new("ldap.example.com") or die $@;

$ldap->bind( );

$mesg = $ldap->search(base => $base_dn,

                      filter => $FILTER);



$mesg->code( ) && die $mesg->error;



foreach $result ($mesg->all_entries) {

    # do something with $result

}

$ldap->unbind( );
The Net::LDAP module manages an LDAP session. It is a pure Perl module, so it doesn't require a C compiler to install. To use it effectively, though, you'll need to know a little about LDAP in general and the query syntax in particular. If you're new to LDAP, you might want to read the articles at http://www.onlamp.com/topics/apache/ldap.
The four steps to working with an LDAP server are connecting, authenticating, interacting, and logging off. Interacting includes searching, adding, deleting, and altering records.
The connect method establishes a connection to the LDAP server and is immediately followed by a call to the bind method. If you give no argument to bind, you log into the LDAP server anonymously. You can give a fully qualified Distinguished Name (DN) and password to authenticate yourself:
$ldap->bind("cn=directory manager,ou=gurus,dc=oreilly,dc=com",

            password => "timtoady") or die $@;
This sends the username and password unencrypted over the wire. For encrypted access, pass a sasl parameter to bind to use an Authen::SASL object to authenticate with.
The
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Sending Attachments in Mail
Inhaltsvorschau
You want to send mail that includes attachments; for example, you want to mail a PDF document.
Use the MIME::Lite module from CPAN. First, create a MIME::Lite object representing the multipart message:
use MIME::Lite;



$msg = MIME::Lite->new(From    => 'sender@example.com',

                       To      => 'recipient@example.com',

                       Subject => 'My photo for the brochure',

                       Type    => 'multipart/mixed');
Then, add content through the attach method:
$msg->attach(Type        => 'image/jpeg',

             Path        => '/Users/gnat/Photoshopped/nat.jpg',

             Filename    => 'gnat-face.jpg');



$msg->attach(Type        => 'TEXT',

             Data        => 'I hope you can use this!');
Finally, send the message, optionally specifying how to send it:
$msg->send( );            # default is to use sendmail(1)

# alternatively

$msg->send('smtp', 'mailserver.example.com');
The MIME::Lite module creates and sends mail with MIME-encoded attachments. MIME stands for Multimedia Internet Mail Extensions, and is the standard way of attaching files and documents. It can't, however, extract attachments from mail messages—for that you need to read Recipe 18.10.
When creating and adding to a MIME::Lite object, pass parameters as a list of named parameter pairs. The pair conveys both mail headers (e.g., From, To, Subject) and those specific to MIME::Lite. In general, mail headers should be given with a trailing colon:
$msg = MIME::Lite->new('X-Song-Playing:' => 'Natchez Trace');
However, MIME::Lite accepts the headers in Table 18-1 without a trailing colon. * indicates a wildcard, so Content-* includes Content-Type and Content-ID but not Dis-Content.
Table 18-2:
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Extracting Attachments from Mail
Inhaltsvorschau
You have one or more mail messages with MIME attachments, and you want to process these messages with a Perl program to extract files or otherwise manipulate attachments.
Use the MIME-Tools bundle from CPAN:
use MIME::Parser;



$parser = MIME::Parser->new( );

$parser->output_to_core(1);      # don't write attachments to disk



$message  = $parser->parse_data($MESSAGE);    # die( )s if can't parse

# OR

$message  = $parser->parse($FILEHANDLE);      # die( )s if can't parse



$head     = $message->head( );                 # object--see docs

$preamble = $message->preamble;               # ref to array of lines

$epilogue = $message->epilogue;               # ref to array of lines



$num_parts = $message->parts;

for (my $i=0; $i < $num_parts; $i++) {

  my $part         = $message->parts($i);

  my $content_type = $part->mime_type;

  my $body         = $part->as_string;

}
Formally, a MIME message has only two parts: the head (containing headers such as From and Subject) and the body (containing the message, rather than its metadata). The body, however, has three parts: the preamble (text before the first attachment), a series of parts (the attachments), and the epilogue (text after the last attachment). This is shown in Figure 18-1.
Figure 18-1: Composition of a MIME message
In the Solution, we disable the default behavior of MIME::Parser that writes the attachments to disk. Doing so increases memory consumption because now the decoded attachments must be stored in memory, but prevents the need to clean up temporary files and directories once the attachments are no longer needed.
To write attachments to a file, replace the call to output_to_core with calls to methods that specify the directory in which to store the attachments and what to name the files. The
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Writing an XML-RPC Server
Inhaltsvorschau
You want to write a server for an XML-RPC web service.
Use the SOAP-Lite distribution from CPAN, which supports XML-RPC. Your server can be either standalone:
use XMLRPC::Transport::HTTP;



$daemon = XMLRPC::Transport::HTTP::Daemon

          ->new(LocalPort => $PORT)

          ->dispatch_to('ClassName')

          ->handle( );
or a CGI script:
use XMLRPC::Transport::HTTP;



$daemon = XMLRPC::Transport::HTTP::CGI

          ->dispatch_to('ClassName')

          ->handle( );
In both cases, incoming methods will be invoked on the class named in the dispatch_to method (it will be required if it is not already loaded):
package ClassName;



sub handler {

  my ($class, $arg_hash_ref) = @_;

  # ...

}
The SOAP-Lite toolkit's modules take care of translating between Perl's native data structures and the XML representation of those values. However, it's up to you how the server decides which method to invoke when a request arrives. This process of matching an XML-RPC request to a Perl function is known as dispatching.
It looks strange to see all those chained method invocation in the Solution. When used to set a value, XMLRPC::Lite methods return their invocation, which lets you chain methods rather than repeat $daemon over and over again:
$daemon = XMLRPC::Transport::HTTP::Daemon;

$daemon->new(LocalPort => $PORT);

$daemon->dispatch_to('ClassName');

$daemon->handle( );
The new constructor takes IO::Socket::INET's constructor parameters as well, so you can say ReuseAddr => 1, for example.
When you give the dispatch_to method a class name argument (as in the Solution), the XML-RPC server looks for methods in that class. If the server in the Solution receives a request for the ClassName.hasBeen method (XML-RPC methods are typically in IntraCaps), it invokes the ClassName->hasBeen
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Writing an XML-RPC Client
Inhaltsvorschau
You want to write a client for an XML-RPC service.
Use the XMLRPC::Lite module from the SOAP-Lite distribution:
use XMLRPC::Lite;



$server = XMLRPC::Lite->proxy("http://server.example.com/path");

$result = $server->call('ClassName.handler', @ARGS);

die $call->faultstring if $call->fault;

print $call->result;
A single XML-RPC server may run many services, differentiated by their method name: ClassName.handler corresponds to ClassName->handler on the server side; A.B.method corresponds to A::B->method; and a call to handler corresponds to main->handler.
The proxy is the actual URL of the server. If you're using a CGI server, the proxy method looks something like this:
$server->proxy("http://server.example.com/path/to/server.cgi")
There are three ways to invoke remote methods. The first way is to use the call method on your XMLRPC::Lite object. The first argument to call is the remote method name, and the remaining arguments are parameters for the remote method:
$returned = $server

         -> call("getRecordByNumber", 12, { format => "CSV" })

         -> result;
The second way to invoke a remote method is to call that method on the XMLRPC::Lite object. This works only when the remote method name isn't the same as a method provided by the XMLRPC::Lite object. For example:
$returned = $server

         -> getRecordByNumber(12, { format => "CSV" })

         -> result;
The last way to invoke a remote method is with autodispatch, turning unrequired function calls and method invocations in your Perl program into XML-RPC requests. Enable autodispatch with:
use XMLRPC::Lite +autodispatch =>

  proxy => "http://server.example.com/path";



$returned = getRecordByNumber(12, { format => "CSV" });
A critical difference between autodispatch and the other styles is that autodispatch automatically decodes the result into a Perl value for you. When you use an XMLRPC::Lite object, you must explicitly invoke the
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Writing a SOAP Server
Inhaltsvorschau
You want to write a web service where SOAP is the transport.
Use the SOAP-Lite distribution from CPAN. Your server can be either standalone:
use SOAP::Transport::HTTP;



$daemon = SOAP::Transport::HTTP::Daemon

          ->new(LocalPort => $PORT)

          ->dispatch_to('ClassName')

          ->handle( );
or a CGI script:
use SOAP::Transport::HTTP;



$daemon = SOAP::Transport::HTTP::CGI

          ->dispatch_to('ClassName')

          ->handle( );
In both cases, the only methods that SOAP clients are permitted to invoke are those in the classes named in the argument to dispatch_to (those classes will be required if not already loaded):
package ClassName;



sub handler {

  my ($class, $arg_hash_ref) = @_;

  # ...

}
The SOAP-Lite toolkit contains SOAP and XML-RPC modules. Writing a SOAP service is similar to writing an XML-RPC service. Control method dispatch in SOAP as in XML-RPC. See Recipe 18.11 for details.
Recipe 18.14; Recipe 18.11
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Writing a SOAP Client
Inhaltsvorschau
You want to write a client for a SOAP web service.
Use the SOAP::Lite module from the SOAP-Lite distribution:
use SOAP::Lite;



$server = SOAP::Lite

       -> uri("http://localhost/Namespace")

       -> proxy("http://server.example.com/path");

$result = $server->call('ClassName.handler', @ARGS);

die $call->faultstring if $call->fault;

print $call->result;
A single SOAP server may offer remote access to the methods of many classes. A client identifies the class upon which it wishes to invoke methods with the uri parameter. The hostname in the argument is irrelevant; only the path portion (the class name) matters. For example, these two URIs are equivalent:
http://modacrylic.clue.com/GimpyMod

http://weenies.mit.edu/GimpyMod
As with XML-RPC, the proxy argument is the server's URL. For example, if your SOAP server is implemented as a CGI script, the proxy call looks like this:
$server->proxy("http://server.example.com/path/to/server.cgi");
Invoke remote methods as you do with XML-RPC, either with the call method:
$returned = $server

         -> call("getRecordByNumber", 12, { format => "CSV" })

         -> result;
or by invoking the method on a SOAP::Lite object directly:
$returned = $server

         -> getRecordByNumber(12, { format => "CSV" })

         -> result;
or using autodispatch:
use SOAP::Lite +autodispatch =>

  uri   => "http://identifier.example.com/Namespace",

  proxy => "http://server.example.com/path";



$returned = getRecordByNumber(12, { format => "CSV" });
You can also use this with OO syntax:
$returned = Some::Remote::Module->getRecordByNumber(12, { format => "CSV" });
There's a lot more to SOAP than we can explain here. The books Programming Web Services with SOAP
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Program: rfrm
Inhaltsvorschau
This program fetches a list of messages waiting on a POP3 server and summarizes that list:
# ./rfrm

Nathan Torkington    Re: YAPC

            Rob Brown            Re: Net::Ping syn round trip time

            Rael Dornfest        Re: Book Proposal - Blosxom in a Nutshell

            spam@example.com     Extend your ping times 633%!!!!

         
Tell the program which POP3 server to contact and the username and password to authenticate with using a ~/.rfrmrc file like this:
SERVER=pop3.example.com

USER=gnat

PASS=I(heart)Perl
The program verifies that your .rfrmrc file isn't readable or writable by anyone but you, and stops if it is.
The program is shown in Example 18-4.
Example 18-4. rfrm
#!/usr/bin/perl -w

# rfrm - get a list of mail messages waiting on a pop server



use Net::POP3;

use strict;



my ($Pop_host, $Pop_user, $Pop_pass) = read_conf( ) or usage( );



my $pop = Net::POP3->new($Pop_host)

  or die "Can't connect to $Pop_host: $!\n";

defined ($pop->login($Pop_user, $Pop_pass))

  or die "Can't authenticate\n";



my $messages = $pop->list

  or die "Can't get a list of messages\n";



foreach my $msgid (sort { $a <=> $b } keys %$messages) {

  my ($msg, $subject, $sender, $from);



  $msg = $pop->top($msgid, 0); # returns ref to array

  $msg = join "\n", @$msg;     # now it's one big string



  # extract From and Subject lines, and boil From down

  $subject = $sender = '';

  if ($msg =~ /^Subject: (.*)/m) { $subject = $1 }

  if ($msg =~ /^From: (.*)/m)    { $sender  = $1 }

  ($from = $sender) =~ s{<.*>}{  };

  if ($from =~ m{\(.*\)}) { $from = $1 }

  $from ||= $sender;



  # print boiled down summary of this message

  printf("%-20.20s %-58.58s\n", $from, $subject);

}



sub usage {

  die <<EOF ;

usage: rfrm

Configure with ~/.rfrmrc thus:

  SERVER=pop.mydomain.com

  USER=myusername

  PASS=mypassword

EOF

}



sub read_conf {

  my ($server, $user, $pass, @stat);



  open(FH, "< $ENV{HOME}/.rfrmrc") or return;



  # paranoia check

  @stat = stat(FH) or die "Can't stat ~/.rfrmrc: $!\n";

  if ($stat[2] & 177) {

    die "~/.rfrmrc should be mode 600 or tighter\n";

  }



  # read config file  

  while (<FH>) {

    if (/SERVER=(.*)/) { $server = $1 }

    if (/USER=(.*)/)   { $user   = $1 }

    if (/PASS=(.*)/)   { $pass   = $1 }

  }

  close FH;



  # must have something for every value

  return unless $server && $user && $pass;



  return ($server, $user, $pass);

}
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Program: expn and vrfy
Inhaltsvorschau
This program uses Net::SMTP to talk to an SMTP server and uses the EXPN and VRFY commands to figure out whether an address is going to work. It isn't perfect, because it relies on the remote SMTP giving meaningful information with the EXPN and VRFY commands—they are common ways for spammers to harvest email addresses, and so many servers have disabled these options. It uses Net::DNS if available, but can also work without it.
This program inspects $0 (the program name) to see how it was called. If run as expn, it uses the EXPN command; if called as vrfy, it uses the VRFY command. Use links to install it with two names (on a system without links, simply copy the program code in Example 18-5):
% cat > expn

#!/usr/bin/perl -w

...

^D

% ln expn vrfy
When given an email address as an argument, the program reports what the mail server says when you try to EXPN or VRFY the address. If you have Net::DNS installed, it tries all hosts listed as mail exchangers in the DNS entry for the address.
Here's what it looks like without Net::DNS:
% expn gnat@frii.com



               Expanding gnat at frii.com (gnat@frii.com):

calisto.frii.com Hello coprolith.frii.com [207.46.130.14],

    pleased to meet you

gnat@mail.frii.com

            

         
And here's the same address with Net::DNS installed:
% expn gnat@frii.com



               Expanding gnat at mail.frii.net (gnat@frii.com):

deimos.frii.com Hello coprolith.frii.com [207.46.130.14],

    pleased to meet you

Nathan Torkington <gnat@deimos.frii.com>



Expanding gnat at mx1.frii.net (gnat@frii.com):

phobos.frii.com Hello coprolith.frii.com [207.46.130.14],

    pleased to meet you

gnat@mail.frii.com



Expanding gnat at mx2.frii.net (gnat@frii.com):

europa.frii.com Hello coprolith.frii.com [207.46.130.14],

    pleased to meet you

gnat@mail.frii.com



Expanding gnat at mx3.frii.net (gnat@frii.com):

ns2.winterlan.com Hello coprolith.frii.com [207.46.130.14],

    pleased to meet you

550 gnat... User unknown
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Chapter 19: CGI Programming
Inhaltsvorschau
A successful tool is one that was used to do something undreamt of by its author.
—Stephen C. Johnson
Changes in the environment or the availability of food can make certain species more successful than others at finding food or avoiding predators. Many scientists believe a comet struck the Earth millions of years ago, throwing an enormous cloud of dust into the atmosphere. Subsequent radical changes to the environment proved too much for some organisms, say dinosaurs, and hastened their extinction. Other creatures, such as mammals, found new food supplies and freshly exposed habitats to compete in.
Much as the comet altered the environment for prehistoric species, the Web has altered the environment for modern programming languages. It's opened up new vistas, and although some languages have found themselves eminently unsuited to this new world order, Perl has positively thrived. Because of its strong background in text processing and system glue, Perl has readily adapted itself to the task of providing information using text-based protocols.
The Web is driven by plain text. Web servers and web browsers communicate using a text protocol called HTTP, Hypertext Transfer Protocol. Many of the documents exchanged are encoded in a text markup system called HTML, Hypertext Markup Language. This grounding in text is the source of much of the Web's flexibility, power, and success. The only notable exception to the predominance of plain text is the Secure Socket Layer (SSL) protocol that encrypts other protocols like HTTP into binary data that snoopers can't decode.
Web pages are identified using the Uniform Resource Locator (URL) naming scheme. URLs look like this:
http://www.perl.com/CPAN/

http://www.perl.com:8001/bad/mojo.html

ftp://gatekeeper.dec.com/pub/misc/netlib.tar.Z

ftp://anonymous@myplace:gatekeeper.dec.com/pub/misc/netlib.tar.Z

file:///etc/motd
The first part (
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Introduction
Inhaltsvorschau
Changes in the environment or the availability of food can make certain species more successful than others at finding food or avoiding predators. Many scientists believe a comet struck the Earth millions of years ago, throwing an enormous cloud of dust into the atmosphere. Subsequent radical changes to the environment proved too much for some organisms, say dinosaurs, and hastened their extinction. Other creatures, such as mammals, found new food supplies and freshly exposed habitats to compete in.
Much as the comet altered the environment for prehistoric species, the Web has altered the environment for modern programming languages. It's opened up new vistas, and although some languages have found themselves eminently unsuited to this new world order, Perl has positively thrived. Because of its strong background in text processing and system glue, Perl has readily adapted itself to the task of providing information using text-based protocols.
The Web is driven by plain text. Web servers and web browsers communicate using a text protocol called HTTP, Hypertext Transfer Protocol. Many of the documents exchanged are encoded in a text markup system called HTML, Hypertext Markup Language. This grounding in text is the source of much of the Web's flexibility, power, and success. The only notable exception to the predominance of plain text is the Secure Socket Layer (SSL) protocol that encrypts other protocols like HTTP into binary data that snoopers can't decode.
Web pages are identified using the Uniform Resource Locator (URL) naming scheme. URLs look like this:
http://www.perl.com/CPAN/

http://www.perl.com:8001/bad/mojo.html

ftp://gatekeeper.dec.com/pub/misc/netlib.tar.Z

ftp://anonymous@myplace:gatekeeper.dec.com/pub/misc/netlib.tar.Z

file:///etc/motd
The first part (http, ftp, file) is called the scheme, which identifies how the file is retrieved. The next part (://) means a hostname will follow, whose interpretation depends on the scheme. After the hostname comes the
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Writing a CGI Script
Inhaltsvorschau
You want to write a CGI script to process the contents of an HTML form. In particular, you want to access the form contents and produce valid output in return.
A CGI script is a server-side program launched by a web server to generate dynamic content. It receives encoded information from the remote client (user's browser) via STDIN and environment variables, and it must produce a valid HTTP header and body on STDOUT. The standard CGI module, shown in Example 19-1, painlessly manages input and output encoding.
Example 19-1. hiweb
  #!/usr/bin/perl -w

  # hiweb - load CGI module to decode information given by web server

  use strict;

  

  use CGI qw(:standard escapeHTML);

  

  # get a parameter from a form

  my $value = param('PARAM_NAME');

  

  # output a document

  print header( ), start_html("Howdy there!"),

        p("You typed: ", tt(escapeHTML($value))),

        end_html( );
CGI is just a protocol, a formal agreement between a web server and a separate program. The server encodes the client's form input data, and the CGI program decodes the form and generates output. The protocol says nothing regarding which language the program must be written in; programs and scripts that obey the CGI protocol have been written in C, shell, Rexx, C++, VMS DCL, Smalltalk, Tcl, Python, and of course Perl.
The full CGI specification lays out which environment variables hold which data (such as form input parameters) and how it's all encoded. In theory, it should be easy to follow the protocol to decode the input, but in practice, it is surprisingly tricky to get right. That's why we strongly recommend using the CGI module. The hard work of handling the CGI requirements correctly and conveniently has already been done, freeing you to write the core of your program without getting bogged down in network protocols.
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Redirecting Error Messages
Inhaltsvorschau
You're having trouble tracking down your script's warnings and error messages, or your script's STDERR output is confusing your server.
Use the CGI::Carp module from the standard Perl distribution to prefix each line on STDERR with the program name and current date. You can also send warnings and errors to a file or the browser if you wish.
Tracking down error messages from CGI scripts is notoriously annoying. Even if you manage to find the server error log, you still can't determine which message came from which script, or at what time. Some unfriendly web servers even abort the script if it has the audacity to emit anything out its STDERR before the Content-Type header is generated on STDOUT, so warnings can get you into trouble.
Enter the CGI::Carp module. It replaces warn and die—plus the normal Carp module's carp, croak, cluck, and confess functions—with more verbose and safer versions. It still sends them to the normal server error log.
use CGI::Carp;

warn "This is a complaint";

die "But this one is serious";
The following use of CGI::Carp also redirects errors to a file of your choice, placed in a BEGIN block to catch compile-time warnings as well:
BEGIN {

    use CGI::Carp qw(carpout);

    open(LOG, ">>/var/local/cgi-logs/mycgi-log")

        or die "Unable to append to mycgi-log: $!\n";

    carpout(*LOG);

}
You can even arrange for fatal errors to show up at the client browser, which is nice for your own debugging but might confuse the end user.
use CGI::Carp qw(fatalsToBrowser);

die "Bad error here";
Even if the error happens before you get the HTTP header out, the module will try to detect this and avoid the dreaded 500 Server Error. Normal warnings still go to the server error log (or wherever you've sent them with carpout) with the program name and date stamp prepended.
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Fixing a 500 Server Error
Inhaltsvorschau
Your CGI script gives you a 500 Server Error.
Follow the checklist given in the discussion.
This checklist is aimed at a Unix audience, but the general principles embodied in the questions apply to all systems.

Section 19.3.3.1: Make sure the web server can run the script

Check ownership and permissions with ls -l. The appropriate read and execute bits must be set on the script before the web server can run it. The script should be readable and executable by everyone (or at least by whomever the server runs scripts as). Use chmod 0755 scriptname if it's owned by you, or otherwise chmod 0555 scriptname if owned by the designated anonymous web user, assuming you are running as that user or the superuser. All directories in the path must also have their execute bit set (most FTP clients support changing protections on uploaded files if you don't have shell access to your server).
Make sure the script can be identified as a script by the web server. Most web servers have a system-wide cgi-bin, and all files in that directory will be run as scripts. Some servers identify a CGI script as any file whose name ends in a particular extension, such as .cgi or .plx. Some servers have options to permit access via the GET method alone, not through the POST method that your form likely uses. Consult your web server documentation, configuration files, webmaster, and (if all else fails) technical support.
If you're running on Unix, do you have the right path to the Perl executable on the #! line? The #! line must be the first line in the script; you can't even have blank lines before the #! line. Some operating systems have ridiculously short limits on the number of characters that can be in this line, so you may need to make a link (e.g., from /home/richh/perl to /opt/installed/third-party/software/perl-5.004/bin/perl
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Writing a Safe CGI Program
Inhaltsvorschau
Because CGI programs allow external users to run programs on systems they would not otherwise have access on, all CGI programs represent a potential security risk. You want to minimize your exposure.
  • Use taint mode (the -T switch on the #! line).
  • Don't blindly untaint data. (See the Discussion.)
  • Sanity-check everything, including all form widget return values, even hidden widgets or values generated by JavaScript code. Many people naïvely assume that just because they tell JavaScript to check the form's values before the form is submitted, the form's values will actually be checked. Not at all! The user can trivially circumvent this by disabling JavaScript in their browser, by downloading the form and altering the JavaScript, or quit by talking HTTP without a browser using any of the examples in Chapter 20 .
  • Check return conditions from system calls.
  • Be conscious of race conditions (described in the Discussion).
  • Run with use warnings and use strict to make sure Perl isn't assuming things incorrectly.
  • Don't run anything setuid unless you absolutely must. If you must, think about running setgid instead if you can. Certainly avoid setuid root at all costs. If you must run setuid or setgid, use a wrapper unless Perl is convinced your system has secure setuid scripts and you know what this means.
  • Always encode login passwords, credit card numbers, social security numbers, and anything else you'd not care to read pasted across the front page of your local newspaper. Use a secure protocol like SSL when dealing with such data. Ensuring that a CGI only ever runs as HTTPS can be as simple as:
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Executing Commands Without Shell Escapes
Inhaltsvorschau
You need to use a user's input as part of a command, but you don't want to allow the user to trick the shell into running other commands or looking at other files. If you just blindly call the system function or backticks on a single string containing a command line derived from untested user input, the shell might be used to run the command. This would be unsafe.
Unlike its single-argument version, the list form of the system function is safe from shell escapes. When the command's arguments involve user input from a form, never use this:
system("command $input @files");            # UNSAFE
Write it this way instead:
system("command", $input, @files);          # safer
Because Perl was designed as a glue language, it's easy to use it to call other programs—too easy, in some cases.
If you're merely trying to run a shell command but don't need to capture its output, it's easy enough to call system using its multiple argument form. But what happens if you're using the command in backticks or as part of a piped open? Now you have a real problem, because those don't permit the multiple argument form that system does. The solution (prior to v5.8; see later in this Discussion) is to manually fork and exec the child processes on your own. It's more work, but at least stray shell escapes won't be ruining your day.
It's safe to use backticks in a CGI script only if the arguments you give the program are internally generated, as in:
chomp($now = `date`);
But if the command within the backticks contains user-supplied input, perhaps like this:
@output = `grep $input @files`;
you have to be much more careful.
die "cannot fork: $!" unless defined ($pid = open(SAFE_KID, "-|"));

if ($pid =  = 0) {

    exec('grep', $input, @files) or die "can't exec grep: $!";

} else {

    @output = <SAFE_KID>;

    close SAFE_KID;                 # $? contains status

}
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Formatting Lists and Tables with HTML Shortcuts
Inhaltsvorschau
You have several lists and tables to generate and wish they were easier to output.
The CGI module provides HTML helper functions that, when passed array references, apply themselves to each element of the referenced array:
print ol( li([ qw(red blue green)]) );

<OL><LI>red</LI><LI>blue</LI><LI>green</LI></OL>

@names = qw(Larry Moe Curly);

print ul( li({ -TYPE => "disc" }, \@names) );

<UL><LI TYPE="disc">Larry</LI> <LI TYPE="disc">Moe</LI>

               <LI TYPE="disc">Curly</LI></UL>

            
The HTML-generating functions in CGI.pm can make it easy to generate lists and tables. Passed a simple string, these functions produce HTML for that string. But passed an array reference, they work on all strings in that array.
print li("alpha");

<LI>alpha</LI>

print li( [ "alpha", "omega"] ); 

<LI>alpha</LI> <LI>omega</LI>

            
The shortcut functions for lists are loaded when you use the :standard import tag, but you need to ask for :html3 explicitly to get helper functions for tables. There's also a conflict between the <TR> tag, which would normally make a tr( ) function, and Perl's built-in tr/// operator. Therefore, to make a table row, use the Tr( ) function.
This example generates an HTML table starting with a hash of arrays. The keys will be the row headers, and the array of values are the columns.
use CGI qw(:standard :html3);



%hash = (

    "Wisconsin"  => [ "Superior", "Lake Geneva", "Madison" ],

    "Colorado"   => [ "Denver", "Fort Collins", "Boulder" ],

    "Texas"      => [ "Plano", "Austin", "Fort Stockton" ],

    "California" => [ "Sebastopol", "Santa Rosa", "Berkeley" ],

);



$\ = "\n";



print "<TABLE><CAPTION>Cities I Have Known</CAPTION>";

print Tr(th [qw(State Cities)]);

for $k (sort keys %hash) {

    print Tr(th($k), td( [ sort @{$hash{$k}} ] ));

}

print "</TABLE>";
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Redirecting to a Different Location
Inhaltsvorschau
You need to tell the client's browser to look elsewhere for a page.
Instead of a normal header, just issue a location redirect and exit. Don't forget the extra newline at the end of the header.
$url = "http://www.perl.com/CPAN/";

print "Location: $url\n\n";

exit;
Sometimes your CGI program doesn't need to generate the document on its own. It only needs to tell the client at the other end to fetch a different document instead. In that case, the HTTP header needs to include this directive as a Location line followed by the URL you want to send them to. Make sure to use an absolute URL, not a relative one.
The direct and literal solution given in the Solution is usually sufficient, but if you already have the CGI module loaded, use the redirect function. You might use this code for building and setting a cookie, as shown in Example 19-4.
Example 19-4. oreobounce
  #!/usr/bin/perl -w

  # oreobounce - set a cookie and redirect the browser

  use CGI qw(:cgi);

  use strict;

  

  my $oreo = cookie( -NAME    => 'filling',

                     -VALUE   => "vanilla crème",

                     -EXPIRES => '+3M',    # M for month, m for minute

                     -DOMAIN  => '.perl.com');

  

  my $whither  = "http://somewhere.perl.com/nonesuch.html";

  

  print redirect( -URL     => $whither,

                  -COOKIE  => $oreo);
That would produce:

               Status: 302 Moved Temporarily

               Set-Cookie: filling=vanilla%20cr%E4me; domain=.perl.com;

               expires=Tue, 21-Jul-1998 11:58:55 GMT

               Date: Tue, 21 Apr 1998 11:55:55 GMT

               Location: http://somewhere.perl.com/nonesuch.html

               Content-Type: text/html

               <<blank line here>>

            
Example 19-5 is a complete program that looks at the client browser name and redirects it to a page in Eric Raymond's edition of the
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Debugging the Raw HTTP Exchange
Inhaltsvorschau
Your CGI script is misbehaving strangely with your browser, and you suspect something in the HTTP header is missing. You want to find out exactly what your browser is sending to the server in the HTTP header.
Create your own fake web server and point your browser at it, as shown in Example 19-6.
Example 19-6. dummyhttpd
  #!/usr/bin/perl -w

  # dummyhttpd - start an HTTP daemon and print what the client sends

  

  use strict;

  use LWP 5.32;  # minimal good version

  use HTTP::Daemon;

  

  my $server = HTTP::Daemon->new(Timeout => 60, LocalPort => 8989);

  print "Please contact me at: <URL:", $server->url, ">\n";

  

  while (my $client = $server->accept) {

    CONNECTION:

      while (my $answer = $client->get_request) {

          print $answer->as_string;

          $client->autoflush;

    RESPONSE:

          while (<STDIN>) {

              last RESPONSE   if $_ eq ".\n";

              last CONNECTION if $_ eq "..\n";

              print $client $_;

          }

          print "\nEOF\n";

      }

      print "CLOSE: ", $client->reason, "\n";

      $client->close;

      undef $client;

  }
It's hard to keep track of which versions of all browsers still have which bugs. The fake server program can save you days of head scratching, because sometimes a misbehaving browser doesn't send the server the right thing. Historically, we have seen aberrant browsers lose their cookies, mis-escape a URL, send the wrong status line, and other blunders even less obvious.
The fake server is best run on the same machine as the real server. That way your browser still sends any cookies destined for that domain. Then instead of pointing your browser at:
http://somewhere.com/cgi-bin/whatever
use the alternate port given in the new constructor in the Solution. You don't need to be the superuser to run the server if you use the alternate port.
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Managing Cookies
Inhaltsvorschau
You want to get or set a cookie to help manage sessions or user preferences.
Using CGI.pm, retrieve an existing cookie like this:
$preference_value = cookie("preference name");
To prepare a cookie, do this:
$packed_cookie = cookie( -NAME    => "preference name",

                         -VALUE   => "whatever you'd like",

                         -EXPIRES => "+2y");
To save a cookie back to the client browser, you must include it in the HTTP header, probably using either the header or redirect functions:
print header(-COOKIE => $packed_cookie);
Cookies store information on the client's browser. If you're using Netscape under Unix, you can inspect your own ~/.netscape/cookies file, although this doesn't show your current cookies. It holds only those cookies present when you last exited the browser. Think of them as per-application user preferences or a way to help with transactions. Benefits of cookies are that they can be shared between several different programs on your server, and they persist even across browser invocations.
However, cookies can be used for dubious purposes like traffic analysis and click tracing. This makes some folks very nervous about who is collecting their personal data and what use will be made of their page viewing habits. Cookies don't travel well, either. If you use a browser at home or in someone else's office, it won't have the cookies from the browser at your office. For this reason, do not expect every browser to accept the cookies you give it. As if that weren't bad enough, browsers can't guarantee they'll keep cookies around forever. Here's an excerpt from the HTTP State Management Mechanism RFC (number 2109):
Because user agents have finite space in which to store cookies, they may also discard older cookies to make space for newer ones, using, for example, a least-recently-used algorithm, along with constraints on the maximum number of cookies that each origin server may set.
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Creating Sticky Widgets
Inhaltsvorschau
You want form fields to default to the last values submitted. For instance, you want a search form like Google (http://www.google.com/) where the keywords you searched for appear in the search dialog above the results.
Use CGI.pm's HTML shortcuts to create your form, which automatically provides previous values as defaults:
print textfield("SEARCH");          # previous SEARCH value is the default
Example 19-8 is a simple script for producing the list of users currently logged in.
Example 19-8. who.cgi
  #!/usr/bin/perl -wT

  # who.cgi - run who(1) on a user and format the results nicely

  

  $ENV{IFS}='';

  $ENV{PATH}='/bin:/usr/bin';

  

  use CGI qw(:standard);

  

  # print search form

  print header( ), start_html("Query Users"), h1("Search");

  print start_form( ), p("Which user?", textfield("WHO")); submit( ), end_form( );

  

  # print results of the query if we have someone to look for

  $name = param("WHO");

  if ($name) {

      print h1("Results");

      $html = '';

      

      # call who and build up text of response

      foreach (`who`) {

          next unless /^$name\s/o;            # only lines matching $name

          s/&/&amp;/g;                        # escape HTML

          s/</&lt;/g;

          s/>/&gt;/g;

          $html .= $_;

      }

      # nice message if we didn't find anyone by that name

      $html = $html || "$name is not logged in";

      

      print pre($html);

  }

  

  print end_html( );
The call to textfield generates HTML for a text entry field whose parameter name is WHO. After printing the form, we check whether we were called with a value for the WHO parameter. If so, we try to find lines in the output from who for that user.
The documentation for the standard CGI module; Recipe 19.4; Recipe 19.6
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Writing a Multiscreen CGI Script
Inhaltsvorschau
You want a single CGI script that can return several different pages to the browser. For instance, you want a single script for administering a database of products. The script will be called to display the form to add a product, to process the add-product form, to display a list of products to delete, to process the delete-product form, to display a list of product to edit, to display a form of the product's attributes for the user to change, and to process the edit-product form. You can use these multiscreen CGI scripts to form an elementary shopping-cart application.
Use a hidden field to encode the current screen.
It is easy to generate sticky hidden fields with the CGI module. The hidden function returns HTML for a hidden widget and uses the widget's current value if you pass hidden only the widget name:
use CGI qw(:standard);

print hidden("bacon");
To determine which page ("display product list", "display all items in shopping cart", "confirm order") to display, use another hidden field. We'll call this one .State so it won't conflict with any field we might have called State (for instance, in credit card billing information). To let the user move from page to page, use submit buttons that set .State to the name of the page to go to. For instance, to make a button to take the user to the "Checkout" page, use:
print submit(-NAME => ".State", -VALUE => "Checkout");
We wrap this in a function to make it easier to type:
sub to_page { return submit( -NAME => ".State", -VALUE => shift ) }
To decide what code to display, check the .State parameter:
$page = param(".State") || "Default";
Put the code to generate each page in separate subroutines. You could decide which subroutine to call with a long if ... elsif ... elsif:
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Saving a Form to a File or Mail Pipe
Inhaltsvorschau
Your CGI script needs to save or mail the entire form contents to a file.
To store a form, use the CGI module's save_parameters function or save method, which take a filehandle argument. You can save to a file:
# first open and exclusively lock the file

open(FH, ">>/tmp/formlog")              or die "can't append to formlog: $!";

flock(FH, 2)                            or die "can't flock formlog: $!";



# either using the procedural interface

use CGI qw(:standard);

save_parameters(*FH);                   # with CGI::save



# or using the object interface

use CGI;

$query = CGI->new( );

$query->save(*FH);



close(FH)                               or die "can't close formlog: $!";
or save to a pipe, such as one connected to a mailer process:
use CGI qw(:standard);

open(MAIL, "|/usr/lib/sendmail -oi -t") or die "can't fork sendmail: $!";

print MAIL <<EOF;

From: $0 (your cgi script)

To: hisname\@hishost.com

Subject: mailed form submission



EOF

save_parameters(*MAIL);

close(MAIL)                             or die "can't close sendmail: $!";
Sometimes all you want to do with form data is to save it for later use. The save_parameters function and save method in CGI.pm write form parameters to an open filehandle. That filehandle can be attached to an open file (preferably one opened in append mode and locked, as in the Solution), or to a pipe whose other end is a mail program.
File entries are stored one per line as variable=value pairs, with any funny characters URL-escaped. Each record is separated by a line with a single equals sign. These are typically read back by invoking the CGI->new method with a filehandle argument that manages all of the unescaping automatically, as described later.
If you want to add extra information to your query before you save it, the
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Program: chemiserie
Inhaltsvorschau
The CGI script in Example 19-9 lets people order t-shirts and sweaters over the Web, using techniques described in Recipe 19.11. Its output isn't elegant or beautiful, but illustrating the multiscreen technique in a short program was challenging enough without trying to make it pretty as well.
The shirt and sweater subroutines check their widget values. If the user somehow submits an invalid color or size, the value is reset to the first in the list of allowable colors or sizes.
Example 19-9. chemiserie
  #!/usr/bin/perl -w

  # chemiserie - simple CGI shopping for shirts and sweaters

  

  use strict;

  use CGI qw(:standard);

  use CGI::Carp qw(fatalsToBrowser);

  

  my %States;                    # state table mapping pages to functions

  my $Current_Screen;            # the current screen

  croak "This CGI works only over HTTPS"

    if $ENV{'SERVER_PORT'} && !$ENV{'HTTPS'};

  # Since we deal with sensitive data like credit card numbers

  # Hash of pages and functions.

  

  %States = (

      'Default'     => \&front_page,

      'Shirt'       => \&shirt,

      'Sweater'     => \&sweater,

      'Checkout'    => \&checkout,

      'Card'        => \&credit_card,

      'Order'       => \&order,

      'Cancel'      => \&front_page,

  );

  

  $Current_Screen = param(".State") || "Default";

  die "No screen for $Current_Screen" unless $States{$Current_Screen};

  

  # Generate the current page.

  

  standard_header( );

  

  while (my($screen_name, $function) = each %States) {

      $function->($screen_name eq $Current_Screen);

  }

  standard_footer( );

  exit;

  

  ################################

  # header, footer, menu functions

  ################################

  

  sub standard_header {

      print header( ), start_html(-Title => "Shirts", -BGCOLOR=>"White");

      print start_form( ); # start_multipart_form( ) if file upload

  }

  

  sub standard_footer { print end_form( ), end_html( ) }

  

  sub shop_menu {

      print p(defaults("Empty My Shopping Cart"),

          to_page("Shirt"),

          to_page("Sweater"),

          to_page("Checkout"));

  }

  

  #############################

  # subroutines for each screen

  #############################

  

  # The default page.

  sub front_page {

      my $active = shift;

      return unless $active;

  

      print "<H1>Hi!</H1>\n";

      print "Welcome to our Shirt Shop!  Please make your selection from ";

      print "the menu below.\n";

  

      shop_menu( );

  }

  

  # Page to order a shirt from.

  sub shirt {

      my $active = shift;

      my @sizes  = qw(XL L M S);

      my @colors = qw(Black White);

  

      my ($size, $color, $count) =

        (param("shirt_size"), param("shirt_color"), param("shirt_count"));

  

      # sanity check

      if ($count) {

          $color = $colors[0] unless grep { $_ eq $color } @colors;

          $size  = $sizes[0]  unless grep { $_ eq $size  } @sizes;

          param("shirt_color", $color);

          param("shirt_size",  $size);

      }

  

      unless ($active) {

          print hidden("shirt_size")  if $size;

          print hidden("shirt_color") if $color;

          print hidden("shirt_count") if $count;

          return;

      }

  

      print h1("T-Shirt");

      print p("What a shirt!  This baby is decked out with all the options.",

          "It comes with full luxury interior, cotton trim, and a collar",

          "to make your eyes water!  Unit price: \$33.00");

  

      print h2("Options");

      print p("How Many?", textfield("shirt_count"));

      print p("Size?",  popup_menu("shirt_size",  \@sizes ),

          "Color?", popup_menu("shirt_color", \@colors));

  

      shop_menu( );

  }

  

  # Page to order a sweater from.

  sub sweater {

      my $active = shift;

      my @sizes  = qw(XL L M);

      my @colors = qw(Chartreuse Puce Lavender);

  

      my ($size, $color, $count) =

        (param("sweater_size"), param("sweater_color"), param("sweater_count"));

  

      # sanity check

      if ($count) {

          $color = $colors[0] unless grep { $_ eq $color } @colors;

          $size  = $sizes[0]  unless grep { $_ eq $size  } @sizes;

          param("sweater_color", $color);

          param("sweater_size",  $size);

      }

  

      unless ($active) {

          print hidden("sweater_size")  if $size;

          print hidden("sweater_color") if $color;

          print hidden("sweater_count") if $count;

          return;

      }

  

      print h1("Sweater");

      print p("Nothing implies preppy elegance more than this fine",

          "sweater.  Made by peasant workers from black market silk,",

          "it slides onto your lean form and cries out \"Take me,",

          "for I am a god!\".  Unit price: \$49.99.");

  

      print h2("Options");

      print p("How Many?", textfield("sweater_count"));

      print p("Size?",  popup_menu("sweater_size",  \@sizes));

      print p("Color?", popup_menu("sweater_color", \@colors));

  

      shop_menu( );

  }

  

  # Page to display current order for confirmation.

  sub checkout {

      my $active = shift;

  

      return unless $active;

  

      print h1("Order Confirmation");

      print p("You ordered the following:");

      print order_text( );

      print p("Is this right?  Select 'Card' to pay for the items",

          "or 'Shirt' or 'Sweater' to continue shopping.");

      print p(to_page("Card"),

          to_page("Shirt"), 

          to_page("Sweater"));

  }

  

  # Page to gather credit-card information.

  sub credit_card {

      my $active = shift;

      my @widgets = qw(Name Address1 Address2 City Zip State Phone Card Expiry);

  

      unless ($active) {

          print map { hidden($_) } @widgets;

          return;

      }

  

      print pre(p("Name:          ", textfield("Name")),

                p("Address:       ", textfield("Address1")),

                p("               ", textfield("Address2")),

                p("City:          ", textfield("City")),

                p("Zip:           ", textfield("Zip")),

                p("State:         ", textfield("State")),

                p("Phone:         ", textfield("Phone")),

                p("Credit Card #: ", textfield("Card")),

                p("Expiry:        ", textfield("Expiry")));

  

      print p("Click on 'Order' to order the items.  Click on 'Cancel' 

to return shopping.");

  

      print p(to_page("Order"), to_page("Cancel"));

  }

  

  # Page to complete an order.

  sub order {

      my $active = shift;

  

      unless ($active) {

          return;

      }

  

      # you'd check credit card values here

  

      print h1("Ordered!");

      print p("You have ordered the following toppings:");

      print order_text( );

  

      print p(defaults("Begin Again"));

  }

  

  # Returns HTML for the current order ("You have ordered ...")

  sub order_text {

      my $html = '';

  

      if (param("shirt_count")) {

          $html .= p("You have ordered ", param("shirt_count"),

                     " shirts of size ",  param("shirt_size"),

                     " and color ", param("shirt_color"), ".");

      }

      if (param("sweater_count")) {

          $html .= p("You have ordered ",  param("sweater_count"),

                     " sweaters of size ", param("sweater_size"),

                     " and color ", param("sweater_color"), ".");

      }

      $html = p("Nothing!") unless $html;

      $html .= p("For a total cost of ", calculate_price( ));

      return $html;

  }

  

  sub calculate_price {

      my $shirts   = param("shirt_count")   || 0;

      my $sweaters = param("sweater_count") || 0;

      return sprintf("\$%.2f", $shirts*33 + $sweaters * 49.99);

  }

  

  

  sub to_page { submit(-NAME => ".State", -VALUE => shift) }
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Chapter 20: Web Automation
Inhaltsvorschau
The web, then, or the pattern, a web at once sensuous and logical, an elegant and pregnant texture: that is style, that is the foundation of the art of literature.
—Robert Louis Stevenson, On some Technical Elements of Style in Literature (1885)
Chapter 19 concentrated on responding to browser requests and producing documents using CGI. This chapter approaches the Web from the other side: instead of responding to a browser, you pretend to be one, generating requests and processing returned documents. We make extensive use of modules to simplify this process because the intricate network protocols and document formats are tricky to get right. By letting existing modules handle the hard parts, you can concentrate on the interesting part—your own program.
The relevant modules can all be found under the following URL:
http://search.cpan.org/modlist/World_Wide_Web
There you'll find modules for computing credit card checksums, interacting with Netscape or Apache server APIs, processing image maps, validating HTML, and manipulating MIME. The largest and most important modules for this chapter, though, are found in the libwww-perl suite of modules, referred to collectively as LWP. Table 20-1 lists just a few modules included in LWP.
Table 20-1: LWP modules (continued)
Module name
Purpose
LWP::UserAgent
WWW user agent class
LWP::RobotUA
Develop robot applications
LWP::Protocol
Interface to various protocol schemes
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Introduction
Inhaltsvorschau
Chapter 19 concentrated on responding to browser requests and producing documents using CGI. This chapter approaches the Web from the other side: instead of responding to a browser, you pretend to be one, generating requests and processing returned documents. We make extensive use of modules to simplify this process because the intricate network protocols and document formats are tricky to get right. By letting existing modules handle the hard parts, you can concentrate on the interesting part—your own program.
The relevant modules can all be found under the following URL:
http://search.cpan.org/modlist/World_Wide_Web
There you'll find modules for computing credit card checksums, interacting with Netscape or Apache server APIs, processing image maps, validating HTML, and manipulating MIME. The largest and most important modules for this chapter, though, are found in the libwww-perl suite of modules, referred to collectively as LWP. Table 20-1 lists just a few modules included in LWP.
Table 20-1: LWP modules (continued)
Module name
Purpose
LWP::UserAgent
WWW user agent class
LWP::RobotUA
Develop robot applications
LWP::Protocol
Interface to various protocol schemes
LWP::Authen::Basic
Handle 401 and 407 responses
LWP::MediaTypes
MIME types configuration (text/html, etc.)
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Fetching a URL from a Perl Script
Inhaltsvorschau
You have a URL whose contents you want to fetch from a script.
Use the get function from the CPAN module LWP::Simple, part of LWP.
use LWP::Simple;

$content = get($URL);
The right library makes life easier, and the LWP modules are the right ones for this task. As you can see from the Solution, LWP makes this task a trivial one.
The get function from LWP::Simple returns undef on error, so check for errors this way:
use LWP::Simple;

unless (defined ($content = get $URL)) {

    die "could not get $URL\n";

}
When called that way, however, you can't determine the cause of the error. For this and other elaborate processing, you'll have to go beyond LWP::Simple.
Example 20-1 is a program that fetches a remote document. If it fails, it prints out the error status line. Otherwise, it prints out the document title and the number of bytes of content. We use three modules, two of which are from LWP.
LWP::UserAgent
This module creates a virtual browser. The object returned from the new constructor is used to make the actual request. We've set the name of our agent to "Schmozilla/v9.14 Platinum" just to give the remote webmaster browser-envy when they see it in their logs. This is useful on obnoxious web servers that needlessly consult the user agent string to decide whether to return a proper page or an infuriating "you need Internet Navigator v12 or later to view this site" cop-out.
HTTP::Response
This is the object type returned when the user agent actually runs the request. We check it for errors and contents.
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Automating Form Submission
Inhaltsvorschau
You want to submit form values to a CGI script from your program. For example, you want to write a program that searches Amazon and notifies you when new books with a particular keyword in the title or new books by a particular author appear.
If you're submitting form values with GET, use the get method on an LWP::UserAgent object:
use LWP::Simple;

use URI::URL;



$url = url("http://www.amazon.com/exec/obidos/search-handle-url/index=books");

$url->query_form("field-author" => "Larry Wall"); # more params if needed

$page = get($url);
If you're using the POST method, create your own user agent and encode the content appropriately:
use LWP::UserAgent;



$ua = LWP::UserAgent->new( );

$resp = $ua->post("www.amazon.com/exec/obidos/search-handle-form",

                  { "url"            => "index-books",

                    "field-keywords" => "perl" });

$content = $resp->content;
For simple operations, the procedural interface of the LWP::Simple module is sufficient. For fancier ones, the LWP::UserAgent module provides a virtual browser object, which you manipulate using method calls.
The format of a query string is:
field1=value1&field2=value2&field3=value3
In GET requests, this is encoded in the URL being requested:
script.cgi?field1=value1&field2=value2&field3=value3
Fields must still be properly escaped, so setting the arg form parameter to "this isn't <EASY> & <FUN>" would yield:
http://www.site.com/path/to/

        script.cgi?arg=%22this+isn%27t+%3CEASY%3E+%26+%3CFUN%3E%22
The query_form method called on a URL object correctly escapes the form values for you, or you could use the URI::Escape::uri_escape or CGI::escape_html functions on your own. In POST requests, the query string is in the body of the HTTP document sent to the CGI script.
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Extracting URLs
Inhaltsvorschau
You want to extract all URLs from an HTML file. For example, you have downloaded a page that lists the MP3 files downloadable from some site. You want to extract those MP3s' URLS so you can filter the list and write a program to download the ones you want.
Use the HTML::LinkExtor module from CPAN:
use HTML::LinkExtor;



$parser = HTML::LinkExtor->new(undef, $base_url);

$parser->parse_file($filename);

@links = $parser->links;

foreach $linkarray (@links) {

    my @element = @$linkarray;

    my $elt_type = shift @element;                  # element type



    # possibly test whether this is an element we're interested in

    while (@element) {

        # extract the next attribute and its value

        my ($attr_name, $attr_value) = splice(@element, 0, 2);

        # ... do something with them ...

    }

}
You can use HTML::LinkExtor in two different ways: either by calling links to get a list of all links in the document once it is completely parsed, or by passing a code reference in the first argument to new. The referenced function is called on each link as the document is parsed.
The links method clears the link list, so call it only once per parsed document. It returns a reference to an array of elements. Each element is itself an array reference with an HTML::Element object at the front followed by a list of attribute name and attribute value pairs. For instance, the HTML:
<A HREF="http://www.perl.com/">Home page</A>

<IMG SRC="images/big.gif" LOWSRC="images/big-lowres.gif">
would return a data structure like this:
[

  [ a,   href   => "http://www.perl.com/" ],

  [ img, src    => "images/big.gif",

         lowsrc => "images/big-lowres.gif" ]

]
Here's an example of how to use $elt_type and $attr_name to print out and anchor an image:
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Converting ASCII to HTML
Inhaltsvorschau
You want to convert ASCII text to HTML. For example, you have mail you want to display intelligently on a web page.
Use the simple little encoding filter in Example 20-3.
Example 20-3. text2html
  #!/usr/bin/perl -w -p00

  # text2html - trivial html encoding of normal text

  # -p means apply this script to each record.

  # -00 mean that a record is now a paragraph

  

  use HTML::Entities;

  $_ = encode_entities($_, "\200-\377");

  

  if (/^\s/) {

      # Paragraphs beginning with whitespace are wrapped in <PRE> 

      s{(.*)$}        {<PRE>\n$1</PRE>\n}s;           # indented verbatim

  } else {

      s{^(>.*)}       {$1<BR>}gm;                     # quoted text

      s{<URL:(.*?)>}    {<A HREF="$1">$1</A>}gs       # embedded URL  (good)

                      ||

      s{(http:\S+)}   {<A HREF="$1">$1</A>}gs;        # guessed URL   (bad)

      s{*(\S+)*}    {<STRONG>$1</STRONG>}g;           # this is *bold* here

      s{\b_(\S+)\_\b} {<EM>$1</EM>}g;                 # this is _italics_ here

      s{^}            {<P>\n};                        # add paragraph tag 

  }
Converting arbitrary plain text to HTML has no general solution because there are too many conflicting ways to represent formatting information. The more you know about the input, the better you can format it.
For example, if you knew that you would be fed a mail message, you could add this block to format the mail headers:
BEGIN {

    print "<TABLE>";

    $_ = encode_entities(scalar <>);

    s/\n\s+/ /g;  # continuation lines

    while ( /^(\S+?:)\s*(.*)$/gm ) {                # parse heading

        print "<TR><TH ALIGN='LEFT'>$1</TH><TD>$2</TD></TR>\n";

    }

    print "</TABLE><HR>";

}
The CPAN module HTML::TextToHTML has options for headers, footers, indentation, tables, and more.
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Converting HTML to ASCII
Inhaltsvorschau
You want to convert an HTML file into formatted, plain ASCII. For example, you want to mail a web document to someone.
If you have an external formatter like lynx, call an external program:
$ascii = `lynx -dump $filename`;
If you want to do it within your program and don't care about the things that the HTML::FormatText formatter doesn't yet handle well (tables and frames):
use HTML::FormatText 3;

$ascii = HTML::FormatText->format_file(

  $filename,

  leftmargin => 0, rightmargin => 50

);
These examples both assume the HTML is in a file. If your HTML is in a variable, you need to write it to a file for lynx to read. With HTML::FormatText, use the format_string( ) method:
use HTML::FormatText 3;

$ascii = HTML::FormatText->format_string(

  $filename,

  leftmargin => 0, rightmargin => 50

);
If you use Netscape, its "Save as" option with the type set to "Text" does the best job with tables.
The documentation for the CPAN modules HTML::TreeBuilder and HTML::FormatText; your system's lynx(1) manpage; Recipe 20.6
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Extracting or Removing HTML Tags
Inhaltsvorschau
You want to remove HTML tags from a string, leaving just plain text. For example, you are indexing a document but don't want your index to show "words" like <B> and <body>.
The following oft-cited solution is simple but wrong on all but the most trivial HTML:
($plain_text = $html_text) =~ s/<[^>]*>//gs;     # WRONG
A correct but slower and slightly more complicated way is to use the technique from Recipe 20.5:
use HTML::FormatText 2;

$plain_text = HTML::FormatText->format_string($html_text);
As with almost everything else in Perl, there is more than one way to do it. Each solution attempts to strike a balance between speed and flexibility. Occasionally you may find HTML that's simple enough that a trivial command-line call works:
% perl -pe 's/<[^>]*>//g' file
However, this breaks with files whose tags cross line boundaries, like this:
<IMG SRC = "foo.gif"

     ALT = "Flurp!">
So, you'll see people doing this instead:
% perl -0777 -pe 's/<[^>]*>//gs' file
or its scripted equivalent:
{

    local $/;               # temporary whole-file input mode

    $html = <FILE>;

    $html =~ s/<[^>]*>//gs;

}
But even that isn't good enough except for simplistic HTML without any interesting bits in it. This approach fails for the following examples of valid HTML (among many others):
<IMG SRC = "foo.gif" ALT = "A > B">



<!-- <A comment> -->



<script>if (a<b && a>c)</script>



<# Just data #>



<![INCLUDE CDATA [ >>>>>>>>>>>> ]]>
If HTML comments include other tags, those solutions would also break on text like this:
<!-- This section commented out.

    <B>You can't see me!</B>

-->
The only solution that works well here is to use the HTML parsing routines from CPAN. The second code snippet shown in the Solution demonstrates this better technique.
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Finding Stale Links
Inhaltsvorschau
You want to check a document for invalid links.
Use the technique outlined in Recipe 20.3 to extract each link, and then use LWP::Simple's head function to make sure that link exists.
Example 20-5 is an applied example of the link-extraction technique. Instead of just printing the name of the link, we call LWP::Simple's head function on it. The HEAD method fetches the remote document's metainformation without downloading the whole document. If it fails, the link is bad, so we print an appropriate message.
Because this program uses the get function from LWP::Simple, it is expecting a URL, not a filename. If you want to supply either, use the URI::Heuristic module described in Recipe 20.1.
Example 20-5. churl
  #!/usr/bin/perl -w

  # churl - check urls

  use HTML::LinkExtor;

  use LWP::Simple;

  $base_url = shift

      or die "usage: $0 <start_url>\n";

  $parser = HTML::LinkExtor->new(undef, $base_url);

  $html = get($base_url);

  die "Can't fetch $base_url" unless defined($html);

  $parser->parse($html);

  @links = $parser->links;

  print "$base_url: \n";

  foreach $linkarray (@links) {

      my @element  = @$linkarray;

      my $elt_type = shift @element;

      while (@element) {

          my ($attr_name , $attr_value) = splice(@element, 0, 2);

          if ($attr_value->scheme =~ /\b(ftp|https?|file)\b/) {

              print "  $attr_value: ", head($attr_value) ? "OK" : "BAD", "\n";

          }

      }

  }
Here's an example of a program run:
% churl http://www.wizards.com

http://www.wizards.com

               FrontPage/FP_Color.gif:  OK

               FrontPage/FP_BW.gif:  BAD

               #FP_Map:  OK

               Games_Library/Welcome.html:  OK

            
This program has the same limitation as the HTML::LinkExtor program in Recipe 20.3.
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Finding Fresh Links
Inhaltsvorschau
Given a list of URLs, you want to determine which have been modified most recently. For example, you want to sort your bookmarks so those most recently updated are on the top.
The program in Example 20-6 reads URLs from standard input, rearranges them by date, and prints them to standard output with those dates prepended.
Example 20-6. surl
  #!/usr/bin/perl -w

  # surl - sort URLs by their last modification date

  use strict;

  use LWP::UserAgent;

  use HTTP::Request;

  use URI::URL qw(url);

  my %Date;

  my $ua = LWP::UserAgent->new( );

  while ( my $url = url(scalar <>) ) {

      my $ans;

      next unless $url->scheme =~ /^(file|https?)$/;

      $ans = $ua->head($url);

      if ($ans->is_success) {

          $Date{$url} = $ans->last_modified || 0;  # unknown

      } else {

          warn("$url: Error [", $ans->code, "] ", $ans->message, "!\n");

      }

  }

  foreach my $url ( sort { $Date{$b} <=> $Date{$a} } keys %Date ) {

      printf "%-25s %s\n", $Date{$url} ? (scalar localtime $Date{$url})

                                       : "<NONE SPECIFIED>", $url;

  }
The surl script works more like a traditional filter program. It reads from standard input one URL per line. (Actually, it uses ARGV to read, which defaults to STDIN when @ARGV is empty.) The last-modified date on each URL is fetched by a HEAD request. That date is stored in a hash with the URL as key. Then a simple sort by value is run on the hash to reorder the URLs by date. On output, the internal date is converted into localtime format.
Here's an example of using the xurl program from the earlier recipe to extract the URLs, then running that program's output to feed into surl.
% xurl http://use.perl.org/~gnat/journal | surl | head

Mon Jan 13 22:58:16 2003  http://www.nanowrimo.org/

               Sun Jan 12 19:29:00 2003  http://www.costik.com/gamespek.html
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Using Templates to Generate HTML
Inhaltsvorschau
You want to store a parameterized template in an external file, read that template from your CGI script, and substitute your own variables for escapes embedded in the text. This separates your program from the static parts of the document.
To expand only variable references, use this template function:
sub template {

    my ($filename, $fillings) = @_;

    my $text;

    local $/;                    # slurp mode (undef)

    open(my $fh, "<", $filename) or return;

    $text = <$fh>;               # read whole file

    close($fh);                  # ignore retval

    # replace quoted words with value in %$fillings hash

    $text =~ s{ %% ( .*? ) %% }

              { exists( $fillings->{$1} )

                      ? $fillings->{$1}

                      : ""

              }gsex;

    return $text;

}
on a data file like this:
<!-- simple.template for internal template( ) function -->

<HTML><HEAD><TITLE>Report for %%username%%</TITLE></HEAD>

<BODY><H1>Report for %%username%%</H1>

%%username%% logged in %%count%% times, for a total of %%total%% minutes.
If you can guarantee the data file is secure from tampering, use the CPAN module Text::Template to expand full expressions. A data file for Text::Template looks like this:
<!-- fancy.template for Text::Template -->

<HTML><HEAD><TITLE>Report for {$user}</TITLE></HEAD>

<BODY><H1>Report for {$user}</H1>

{ lcfirst($user) } logged in {$count} times, for a total of 

{ int($total / 60) } minutes.
For a complete templating solution, see the Template Toolkit's Template module This offers a scripting language and mod_perl integration, and is covered in Recipe 21.17.
Parameterized output for your CGI scripts is a good idea for many reasons. Separating your program from its data lets other people (art directors, for instance) change the HTML but not the program. Even better, two programs can share the same template, so style changes in the template are immediately reflected in both programs' output.
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Mirroring Web Pages
Inhaltsvorschau
You want a local copy of a web page kept up-to-date.
Use LWP::Simple's mirror function:
use LWP::Simple;

mirror($URL, $local_filename);
Although closely related to the get function discussed in Recipe 20.1, the mirror function doesn't download the file unconditionally. It adds the If-Modified-Since header to the GET request it creates, so the server does not transfer the file unless the file has been updated.
The mirror function mirrors only a single page, not a full tree. To mirror a set of pages, use this recipe in conjunction with Recipe 20.3. A good solution to mirroring an entire directory hierarchy can be found in the w3mir program, also found on CPAN, and the wget program from ftp.gnu.org.
Be careful! It's possible (and easy) to write programs that run amok and begin downloading all web pages on the net. This is not only poor etiquette, it's also an infinite task, since some pages are dynamically generated. It could also get you into trouble with someone who doesn't want their pages downloaded en masse.
The documentation for the CPAN module LWP::Simple; the HTTP specification at http://www.w3.org/pub/WWW/Protocols/HTTP/
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Creating a Robot
Inhaltsvorschau
You want to create a script that navigates the Web on its own (i.e., a robot), and you'd like to respect the remote sites' wishes.
Instead of writing your robot with LWP::UserAgent, use LWP::RobotUA instead:
use LWP::RobotUA;

$ua = LWP::RobotUA->new('websnuffler/0.1', 'me@wherever.com');
To avoid marauding robots and web crawlers hammering their servers, sites are encouraged to create a file with access rules called robots.txt. If you're fetching only one document, this is no big deal, but if your script fetches many documents from the same server, you could easily exhaust that site's bandwidth.
When writing scripts to run around the Web, it's important to be a good net citizen: don't request documents from the same server too often, and heed the advisory access rules in their robots.txt file.
The easiest way to handle this is to use the LWP::RobotUA module instead of LWP::UserAgent to create agents. This agent automatically knows to fetch data slowly when calling the same server repeatedly. It also checks each site's robots.txt file to see whether you're trying to grab a file that is off-limits. If you do, you'll get a response like this:

               403 (Forbidden) Forbidden by robots.txt

            
Here's an example robots.txt file, fetched using the GET program that comes with the LWP module suite:
% GET http://www.webtechniques.com/robots.txt 

User-agent: *

               Disallow: /stats

               Disallow: /db

               Disallow: /logs

               Disallow: /store

               Disallow: /forms

               Disallow: /gifs

               Disallow: /wais-src

               Disallow: /scripts

               Disallow: /config

            
A more interesting and extensive example is at http://www.cnn.com/robots.txt. This file is so big, they even keep it under RCS control!
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Parsing a Web Server Log File
Inhaltsvorschau
You want to extract selected information from a web server log file.
Pull apart the log file as follows:
while (<LOGFILE>) {

  my ($client, $identuser, $authuser, $date, $time, $tz, $method,

      $url, $protocol, $status, $bytes) =

  /^(\S+) (\S+) (\S+) \[([^:]+):(\d+:\d+:\d+) ([^\]]+)\] "(\S+) (.*?) (\S+)"

      (\S+) (\S+)$/;

  # ...

}
This regular expression pulls apart entries in Common Log Format, an informal standard that most web servers adhere to. The fields are listed in Table 20-1.
Table 20-2: Common Log Format fields
Field
Meaning
client
IP address or hostname of browser's machine
identuser
If IDENT (RFC 1413) was used, what it returned
authuser
If username/password authentication was used, whom they logged in as
date
Date of request (e.g., 01/Mar/1997)
time
Time of request (e.g., 12:55:36)
tz
Time zone (e.g., -0700)
method
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Processing Server Logs
Inhaltsvorschau
You need to summarize your server logs, but you don't have a customizable program to do it.
Parse the error log yourself with regular expressions, or use the Logfile modules from CPAN.
Example 20-9 is a sample report generator for an Apache weblog.
Example 20-9. sumwww
  #!/usr/bin/perl -w

  # sumwww - summarize web server log activity

  

  $lastdate = "";

  daily_logs( );

  summary( );

  exit;

  

  # read CLF files and tally hits from the host and to the URL

  sub daily_logs {

      while (<>) {

          ($type, $what) = /"(GET|POST)\s+(\S+?) \S+"/ or next;

          ($host, undef, undef, $datetime) = split;

          ($bytes) = /\s(\d+)\s*$/ or next;

          ($date)  = ($datetime =~ /\[([^:]*)/);

          $posts  += ($type eq POST);

          $home++ if m, / ,;

          if ($date ne $lastdate) {

              if ($lastdate) { write_report( )     }

              else           { $lastdate = $date  }

          }

          $count++;

          $hosts{$host}++;

          $what{$what}++;

          $bytesum += $bytes;

      }

      write_report( ) if $count;

  }

  

  # use *typeglob aliasing of global variables for cheap copy

  sub summary  {

      $lastdate = "Grand Total";

      *count   = *sumcount;

      *bytesum = *bytesumsum;

      *hosts   = *allhosts;

      *posts   = *allposts;

      *what    = *allwhat;

      *home    = *allhome;

      write;

  }

  

  # display the tallies of hosts and URLs, using formats

  sub write_report {

      write;

  

      # add to summary data

      $lastdate    = $date;

      $sumcount   += $count;

      $bytesumsum += $bytesum;

      $allposts   += $posts;

      $allhome    += $home;

  

      # reset daily data

      $posts = $count = $bytesum = $home = 0;

      @allwhat{keys %what}   = keys %what;

      @allhosts{keys %hosts} = keys %hosts;

      %hosts = %what = ( );

  }

  

  format STDOUT_TOP =

  @|||||||||| @|||||| @||||||| @||||||| @|||||| @|||||| @|||||||||||||

  "Date",     "Hosts", "Accesses", "Unidocs", "POST", "Home", "Bytes"

  ----------- ------- -------- -------- ------- ------- --------------

  .

  

  format STDOUT =

  @>>>>>>>>>> @>>>>>> @>>>>>>> @>>>>>>> @>>>>>> @>>>>>> @>>>>>>>>>>>>>

  $lastdate,  scalar(keys %hosts), 

              $count, scalar(keys %what),

                               $posts,  $home,   $bytesum

  .
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Using Cookies
Inhaltsvorschau
You want to fetch web pages, but the server is using cookies to track you. For example, some sites use a cookie to remember that you've authenticated. If you don't send the right cookie, you'll never get past the login screen.
Let LWP::UserAgent handle cookies for you. You can enable cookies for just this program run with:
$ua->cookie_jar({  });
Or instead store cookies in a file between invocations with:
$ua->cookie_jar({ file => "$ENV{HOME}/.cookies" });
The default behavior of LWP::UserAgent is never to send a Cookie: header, even when the server offers cookies in a response. To keep track of cookies LWP::UserAgent receives and send them when appropriate, provide the user agent object with a special "cookie jar" object to hold the cookies: an HTTP::Cookies object.
Pass the cookie_jar method either an HTTP::Cookies object to use that object as the cookie jar, or else a hash reference whose contents go into a new HTTP::Cookies object.
Without parameters, an HTTP::Cookies object keeps cookies in memory, so they're no longer available once your program exits. The file parameter in the cookie_jar method call specifies a filename to use for initializing the cookie jar and for saving updated or new cookies. This is how you give cookies a shelf life beyond a single run of your program.
To disable cookies, call cookie_jar with no parameters:
$ua->cookie_jar( );
The documentation for the CPAN modules LWP::UserAgent and HTTP::Cookie
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Fetching Password-Protected Pages
Inhaltsvorschau
You want to use LWP to fetch web pages or submit forms, but the web server requires authentication.
Set the username and password for a particular realm with the user agent's credentials method:
$ua->credentials('http://www.perlcabal.com/cabal_only/',

                 'Secret Perl Cabal Files',

                 'username' => 'password');
To access pages protected by basic authentication, a browser must supply the username and password for the realm of the authentication. The realm is just a string that identifies which username and password the user must supply. The credentials method tells the user agent to send the username and password for a particular realm.
A somewhat kludgey solution is to specify URLs with the username and password in them:
http://user:password@www.example.com/private/pages/
This is kludgey because links within the returned document do not have your username and password encoded in them. Solutions that rely entirely on URL-encoded usernames and passwords often quickly degenerate into code that wishes it had used credentials to begin with.
The documentation for the CPAN module LWP::UserAgent
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Fetching https:// Web Pages
Inhaltsvorschau
You want to work with a web server over a secure (SSL) connection. For example, you want to automate ordering supplies for your company from an online store, and the online store wisely protects its transactions with SSL.
Install Crypt::SSLeay and https: URLs automatically work with LWP. You do not need to reinstall LWP.
When LWP sends a request to an https server, it checks whether there is a module installed to do SSL. The two modules that LWP can use are, in order of preference, Crypt::SSLeay and IO::Socket::SSL. Of the two, Crypt::SSLeay is the more fully featured, but requires the OpenSSL libraries from http://www.openssl.org.
The documentation for the CPAN module Crypt::SSLeay; the README.SSL file in the libwww-perl distribution
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Resuming an HTTP GET
Inhaltsvorschau
You have part of a file and want to download the rest without refetching the content you already have. For example, your initial download was interrupted, so you want to complete it.
Use the HTTP 1.1 Range header in your GET request:
use LWP;

$have = length($file);

$response = $ua->get($URL,

                     'Range', "bytes=$have-");

# $response->content hold the rest of the file
The Range header lets you specify which bytes to fetch. The 0th byte is the first in the file, so the range "bytes=0-" fetches the whole file.
You can also specify a range with two endpoints: "0-25", for example, fetches the first 26 bytes of the file. If you want to fetch an interior range, use "26-99".
Some servers don't support ranges, even though they claim to understand HTTP 1.1. In this case you'll be sent the whole file, not the range you asked for. To detect this, use HEAD to see the size of the file and then use a GET with a range to fetch the rest. If the content in the GET response is the same length as the original file, your range was ignored.
Here is the full list of ranges possible in the HTTP 1.1 specification:
[start]-
From start on (inclusive)
[start]-[end]
From start to end (inclusive)
-[num]
The last num bytes
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Parsing HTML
Inhaltsvorschau
You need to extract complex information from a web page or pages. For example, you want to extract news stories from web sites like CNN.com or news.bbc.co.uk.
Use regular expressions for data that's well identified:
# story is everything from <!-- story --> to <!-- /story -->

if ($html =~ m{<!-- story -->(.*?)<!-- /story -->}s) {

  my $story = $1;

  # ...

} else {

  warn "No story found in the page";

}
But for tables and data identifiable only by complex patterns of HTML, use a parser:
use HTML::TokeParser;



my $parser = HTML::TokeParser->new($FILENAME)

    or die "Can't open $FILENAME: $!\n";

while (my $token = $parser->get_token( )) {

    my $type = $token->[0];

    if    ($type eq 'S')  { ... }   # start tag

    elsif ($type eq 'E')  { ... }   # end tag

    elsif ($type eq 'T')  { ... }   # text

    elsif ($type eq 'C')  { ... }   # comment

    elsif ($type eq 'D')  { ... }   # declaration

    elsif ($type eq 'PI') { ... }   # processing instruction

    else { die "$type isn't a valid HTML token type" }

}
Regular expressions are a convenient way to extract information from HTML. However, as the complexity of the HTML and the amount of information to be extracted go up, the maintainability of the regular expressions goes down. For a few well-defined fields, regular expressions are fine. For anything else, use a proper parser.
As an example of processing HTML with regular expressions, let's get the list of recent O'Reilly book releases. The list is found on http://www.oreilly.com/catalog/new.html, but there's also a navigation bar and a list of upcoming releases, so we can't simply extract all links.
The relevant HTML from the page looks like this:
<!--  New titles  -->

<h3>New Titles</h3>

<ul><li><a href="netwinformian/">.NET Windows Forms in a

Nutshell</a> <em>(March)</em></li><li><a href="actscrptpr/">

ActionScript for Flash MX Pocket Reference</a> <em>(March)</em>

</li><li><a href="abcancer/">After Breast Cancer</a> <em>(March)

...

<li><a href="samba2/">Using Samba, 2nd Edition</a> <em>(February)

</em></li><li><a href="vbscriptian2/">VBScript in a Nutshell, 2nd

Edition</a> <em>(March)</em></li><li><a href="tpj2/">Web, Graphics

& Perl/Tk</a> <em>(March)</em></li></ul></td>

<td valign="top">

<!--  Upcoming titles  -->
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Extracting Table Data
Inhaltsvorschau
You have data in an HTML table, and you would like to turn that into a Perl data structure. For example, you want to monitor changes to an author's CPAN module list.
Use the HTML::TableContentParser module from CPAN:
use HTML::TableContentParser;



$tcp = HTML::TableContentParser->new;

$tables = $tcp->parse($HTML);



foreach $table (@$tables) {

  @headers = map { $_->{data} } @{ $table->{headers} };

  # attributes of table tag available as keys in hash

  $table_width = $table->{width};



  foreach $row (@{ $tables->{rows} }) {

    # attributes of tr tag available as keys in hash

    foreach $col (@{ $row->{cols} }) {

      # attributes of td tag available as keys in hash

      $data = $col->{data};

    }

  }

}
The HTML::TableContentParser module converts all tables in the HTML document into a Perl data structure. As with HTML tables, there are three layers of nesting in the data structure: the table, the row, and the data in that row.
Each table, row, and data tag is represented as a hash reference. The hash keys correspond to attributes of the tag that defined that table, row, or cell. In addition, the value for a special key gives the contents of the table, row, or cell. In a table, the value for the rows key is a reference to an array of rows. In a row, the cols key points to an array of cells. In a cell, the data key holds the HTML contents of the data tag.
For example, take the following table:
<table width="100%" bgcolor="#ffffff">

  <tr>

    <td>Larry &amp; Gloria</td>

    <td>Mountain View</td>

    <td>California</td>

  </tr>

  <tr>

    <td><b>Tom</b></td>

    <td>Boulder</td>

    <td>Colorado</td>

  </tr>

  <tr>

    <td>Nathan &amp; Jenine</td>

    <td>Fort Collins</td>

    <td>Colorado</td>

  </tr>

</table>
The parse method returns this data structure:
[

  {

    'width' => '100%',

    'bgcolor' => '#ffffff',

    'rows' => [

               {

                'cells' => [

                            { 'data' => 'Larry &amp; Gloria' },

                            { 'data' => 'Mountain View' },

                            { 'data' => 'California' },

                           ],

                'data' => "\n      "

               },

               {

                'cells' => [

                            { 'data' => '<b>Tom</b>' },

                            { 'data' => 'Boulder' },

                            { 'data' => 'Colorado' },

                           ],

                'data' => "\n      "

               },

               {

                'cells' => [

                            { 'data' => 'Nathan &amp; Jenine' },

                            { 'data' => 'Fort Collins' },

                            { 'data' => 'Colorado' },

                           ],

                'data' => "\n      "

               }

              ]

  }

]
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Program: htmlsub
Inhaltsvorschau
This program makes substitutions in HTML files so changes happen only in normal text. If you had the file scooby.html that contained:
<HTML><HEAD><TITLE>Hi!</TITLE></HEAD>

<BODY><H1>Welcome to Scooby World!</H1>

I have <A HREF="pictures.html">pictures</A> of the crazy dog

himself.  Here's one!<P>

<IMG SRC="scooby.jpg" ALT="Good doggy!"><P>

<BLINK>He's my hero!</BLINK>  I would like to meet him some day,

and get my picture taken with him.<P>

P.S. I am deathly ill.  <A HREF="shergold.html">Please send

cards</A>.

</BODY></HTML>
you could use htmlsub to change every occurrence of the word "picture" in the document text to read "photo". It prints the new document on STDOUT:
% htmlsub picture photo scooby.html

<HTML><HEAD><TITLE>Hi!</TITLE></HEAD>

            <BODY><H1>Welcome to Scooby World!</H1>

            I have <A HREF="pictures.html">photos</A> of the crazy dog

            himself.  Here's one!<P>

            <IMG SRC="scooby.jpg" ALT="Good doggy!"><P>

            <BLINK>He's my hero!</BLINK>  I would like to meet him some day

            and get my photo taken with him.<P>

            P.S. I am deathly ill.  <A HREF="shergold.html">Please send

            cards</A>

            </BODY></HTML

         
The program is shown in Example 20-12.
Example 20-12. htmlsub
  #!/usr/bin/perl -w

  # htmlsub - make substitutions in normal text of HTML files

  # from Gisle Aas <gisle@aas.no>

  

  sub usage { die "Usage: $0 <from> <to> <file>...\n" }

  

  my $from = shift or usage;

  my $to   = shift or usage;

  usage unless @ARGV;

  

  # Build the HTML::Filter subclass to do the substituting.

  

  package MyFilter;

  use HTML::Filter;

  @ISA=qw(HTML::Filter);

  use HTML::Entities qw(decode_entities encode_entities);

  

  sub text

  {

     my $self = shift;

     my $text = decode_entities($_[0]);

     $text =~ s/\Q$from/$to/go;       # most important line

     $self->SUPER::text(encode_entities($text));

  }

  

  # Now use the class.

  

  package main;

  foreach (@ARGV) {

      MyFilter->new->parse_file($_);

  }
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Program: hrefsub
Inhaltsvorschau
hrefsub makes substitutions in HTML files, so changes apply only to text in <A HREF="..." > tags. For instance, if you had the scooby.html file from the previous recipe, and you've moved shergold.html to be cards.html, you need but say:
% hrefsub shergold.html cards.html scooby.html

<HTML><HEAD><TITLE>Hi!</TITLE></HEAD>

            <BODY><H1>Welcome to Scooby World!</H1>

            I have <A HREF="pictures.html">pictures</A> of the crazy dog

            himself.  Here's one!<P>

            <IMG SRC="scooby.jpg" ALT="Good doggy!"><P

            <BLINK>He's my hero!</BLINK>  I would like to meet him some day

            and get my picture taken with him.<P>

            P.S. I am deathly ill.  <a href="cards.html">Please send

            cards</A>

            </BODY></HTML>

         
The HTML::Filter manual page has a BUGS section that says:
Comments in declarations are removed from the declarations and then inserted as separate comments after the declaration. If you turn on strict_comment( ), then comments with embedded "-\|-" are split into multiple comments.
This version of hrefsub (shown in Example 20-13) always lowercases the a and the attribute names within this tag when substitution occurs. If $foo is a multiword string, then the text given to MyFilter->text may be broken such that these words do not come together; i.e., the substitution does not work. There should probably be a new option to HTML::Parser to make it not return text until the whole segment has been seen. Also, some people may not be happy with having their 8-bit Latin-1 characters replaced by ugly entities, so htmlsub does that, too.
Example 20-13. hrefsub
  #!/usr/bin/perl -w

  # hrefsub - make substitutions in <A HREF="..."> fields of HTML files

  # from Gisle Aas <gisle@aas.no>

  

  sub usage { die "Usage: $0 <from> <to> <file>...\n" }

  

  my $from = shift or usage;

  my $to   = shift or usage;

  usage unless @ARGV;

  

  # The HTML::Filter subclass to do the substitution.

  

  package MyFilter;

  use HTML::Filter;

  @ISA=qw(HTML::Filter);

  use HTML::Entities qw(encode_entities);

  

  sub start {

     my($self, $tag, $attr, $attrseq, $orig) = @_;

     if ($tag eq 'a' && exists $attr->{href}) {

             if ($attr->{href} =~ s/\Q$from/$to/g) {

                 # must reconstruct the start tag based on $tag and $attr.

                 # wish we instead were told the extent of the 'href' value

                 # in $orig.

                 my $tmp = "<$tag";

                 for (@$attrseq) {

                     my $encoded = encode_entities($attr->{$_});

                     $tmp .= qq( $_="$encoded ");

                 }

                 $tmp .= ">";

                 $self->output($tmp);

                 return;

             }

     }

     $self->output($orig);

  }

  

  # Now use the class.

  

  package main;

  foreach (@ARGV) {

          MyFilter->new->parse_file($_);

  }
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Chapter 21: mod_perl
Inhaltsvorschau
Speed is good only when wisdom leads the way.
—James Poe
The mod_perl project (http://perl.apache.org/) integrates Perl with the Apache web server. That way, you can use Perl to configure Apache, manipulate and respond to requests, write to log files, and much more.
Most people begin using mod_perl to avoid the performance penalty of CGI. With CGI programs, the web server starts a separate process for each request. This can be a costly business on most operating systems, with lots of kernel data structures to copy and file I/O to load the new process's binary. If you serve a lot of requests, the operating system may be unable to keep up with the demand for new processes, leaving your web server (and indeed the whole machine) unresponsive.
By embedding the Perl interpreter within the Apache process, mod_perl removes the need to start a separate process to generate dynamic content. Indeed, the Apache::Registry and Apache::PerlRun modules provide a CGI environment within this persistent Perl interpreter (and form the basis of Recipe 21.12). This gives you an immediate performance boost over CGI (some report 10-100x performance) but doesn't take full advantage of the integration of Perl with Apache. For that, you need to write your own handlers.
Because Apache has access to Perl at every step as it processes a request (and vice versa), you can write code (handlers) for every phase of a request-response cycle. There are 13 phases for which you can write handlers, and each phase has a default handler (so you don't have to install a handler for every phase).
You must do three things to install a handler for a specific phase: write the code, load the code into mod_perl, and tell mod_perl to call the code.
Handlers are simply subroutines. They're passed an Apache request object as the first argument, and through that object they can learn about the request, change Apache's information about the request, log errors, generate the response, and more. The return value of a handler determines whether the current phase continues with other handlers, the current phase ends successfully and execution proceeds to the next phase, or the current phase ends with an error. The return values are constants from the Apache::Constants module.
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Introduction
Inhaltsvorschau
The mod_perl project (http://perl.apache.org/) integrates Perl with the Apache web server. That way, you can use Perl to configure Apache, manipulate and respond to requests, write to log files, and much more.
Most people begin using mod_perl to avoid the performance penalty of CGI. With CGI programs, the web server starts a separate process for each request. This can be a costly business on most operating systems, with lots of kernel data structures to copy and file I/O to load the new process's binary. If you serve a lot of requests, the operating system may be unable to keep up with the demand for new processes, leaving your web server (and indeed the whole machine) unresponsive.
By embedding the Perl interpreter within the Apache process, mod_perl removes the need to start a separate process to generate dynamic content. Indeed, the Apache::Registry and Apache::PerlRun modules provide a CGI environment within this persistent Perl interpreter (and form the basis of Recipe 21.12). This gives you an immediate performance boost over CGI (some report 10-100x performance) but doesn't take full advantage of the integration of Perl with Apache. For that, you need to write your own handlers.
Because Apache has access to Perl at every step as it processes a request (and vice versa), you can write code (handlers) for every phase of a request-response cycle. There are 13 phases for which you can write handlers, and each phase has a default handler (so you don't have to install a handler for every phase).
You must do three things to install a handler for a specific phase: write the code, load the code into mod_perl, and tell mod_perl to call the code.
Handlers are simply subroutines. They're passed an Apache request object as the first argument, and through that object they can learn about the request, change Apache's information about the request, log errors, generate the response, and more. The return value of a handler determines whether the current phase continues with other handlers, the current phase ends successfully and execution proceeds to the next phase, or the current phase ends with an error. The return values are constants from the Apache::Constants module.
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Authenticating
Inhaltsvorschau
You want to verify the username and password supplied by users who are authenticating themselves.
Get the password with $r->get_basic_auth_pw, and the username with $r->connection->user. Indicate success by returning OK. Indicate failure by calling $r->note_basic_auth_failure and returning AUTH_REQUIRED.
package Your::Authentication::Package;

use Apache::Constants ':common';



sub handler {

  my $r = shift;



  return OK unless $r->is_main;    # skip for subrequests



  my ($res, $sent_pw) = $r->get_basic_auth_pw;

  if ($res != OK) {

    $r->note_basic_auth_failure;

    return $res;

  }



  my $user = $r->user;



  # check username and password, setting $failed if they don't match



  if ($failed) {

    $r->note_basic_auth_failure;

    return AUTH_REQUIRED;

  }



  return OK;

}
Install the handler for a directory or set of files with:
# the realm

AuthName "Holiday Photos"

# next line shouldn't be changed

AuthType Basic

PerlAuthenHandler Your::Authentication::Package

require valid-user
The realm is what the user sees when their browser prompts for a username and password. If you set the realm to "Holiday Photos", the user is prompted to "enter username and password for Holiday Photos". You need at least one require directive to trigger the call to the authentication handler.
When you invoke $r->get_basic_auth_pw, Apache processes any authentication information sent by the client. Therefore you can't call $r->user before you call $r->get_basic_auth_pw (well, you can, but you won't get anything back).
The call to $r->get_basic_auth_pw returns two values, a status code and a password. If the status is OK, the browser agreed to authenticate and provided information. If the status is DECLINED, either the area isn't protected by basic authentication or there's no AuthType specified in httpd.conf. If the status is SERVER_ERROR, there's no realm defined for this area. If the status is AUTH_REQUIRED, the browser mangled or omitted basic authentication. If you decide to return AUTH_REQUIRED, first call
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Setting Cookies
Inhaltsvorschau
You want to send a cookie to a client as part of a response.
Use the Apache::Cookie module from CPAN. From within your content handler, create a new cookie and attach it to the outgoing headers:
use Apache::Cookie;

$cookie = Apache::Cookie->new($r,

                              -name    => "cookie name",

                              -value   => "its value",

                              -expires => "+1d" );

$cookie->bake;
Don't forget to send the headers before generating content:
$r->send_http_header;



$r->print("...");
The Apache::Cookie module builds a string that represents a cookie. To specify an expiration time for your cookie, use one of these formats:
+30s
30 seconds from now
+10m
10 minutes from now
+1h
1 hour from now
-1d
1 day ago
now
Now
+3M
Three months from now
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Accessing Cookie Values
Inhaltsvorschau
You want to examine values client sent you in a cookie.
Use the CPAN module Apache::Cookie to populate a hash of cookie objects derived from the header sent by the client.
use Apache::Cookie;

$ac = Apache::Cookie->new($r);

%all_cookies = $ac->parse( );
Now each element of that hash is an object representing a single cookie:
$one_cookie = $all_cookies{COOKIE_NAME};
Interrogate the object to learn about that cookie's values:
$one_cookie->value( )     $one_cookie->name( )

$one_cookie->domain( )    $one_cookie->path( )

$one_cookie->expires( )   $one_cookie->secure( )
To test whether a cookie was sent by the browser, use exists on the hash element:
unless (exists $all_cookies{chocolate}) {

  $r->header_out(Location => "http://www.site.com/login");

  return REDIRECT;

}
Don't simply test for truth:
unless ($all_cookies{chocolate}) {  # BAD
Valid cookie values include the empty string and 0, both false to Perl. See the Introduction to Chapter 1 for more.
The CGI::Cookie module is a pure Perl substitute for Apache::Cookie. Its strategy for getting a hash of cookies is slightly different from that of Apache::Cookies:
use CGI::Cookie;

%all_cookies = CGI::Cookie->fetch;
This hash of cookies works the same as the one by Apache::Cookie.
Writing Apache Modules with Perl and C; Recipe 20.14; Recipe 3.7 in mod_perl Developer's Cookbook; the Apache.pm manpage; documentation for the CGI::Cookie and Apache::Cookie modules from CPAN
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Redirecting the Browser
Inhaltsvorschau
You want to send a redirection back to the browser.
Use $r->header_out to set the Location header, then return REDIRECT:
$r->header_out(Location => "http://www.example.com/somewhere");

return REDIRECT;
If you set the Location header and return REDIRECT, the client knows the address of the new page. This is called an external redirection, because the browser (external to the web server) handles the mechanics of requesting the new page. The URL should be a complete URL (with http, etc.), never a partial one.
An internal redirection is one where Apache sends back another page from the same site. The browser never knows that the page has changed, which means relative URLs from the page could be broken. Request an internal redirection with:
$r->internal_redirect($new_partial_url);

return OK;
Apache treats internal redirections almost as though they were new requests: each phase of the request cycle is called again for the new request. Unlike the Location header, internal_redirect takes only a partial URL. You should have no logic after calling internal_redirect other than to return OK.
Writing Apache Modules with Perl and C; Recipe 19.7; Recipe 8.5 in mod_perl Developer's Cookbook; the Apache.pm manpage
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Interrogating Headers
Inhaltsvorschau
You want to learn the value of a header sent by the client.
Use the $r->header_in method:
$value = $r->header_in("Header-name");
For example, suppose you want to discover the client's preferred language (as sent in the Accept-Language header).
if ($r->header_in("Accept-Language") !~ /\ben-US\b/i) {

  $r->print("No furriners!");

  return OK;

}
If you want to access more than one header, use the $r->headers_in method, which returns a list of key-value pairs of all clients' request headers, which are typically assigned to a hash:
%h = $r->headers_in;

if ($h{"Accept-Language"} !~ /\ben-US\b/i) {

  $r->print("No furriners!");

  return OK;

}
Writing Apache Modules with Perl and C; Recipe 3.4 in mod_perl Developer's Cookbook; the Apache.pm manpage
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Accessing Form Parameters
Inhaltsvorschau
You want the values for form fields submitted by the client.
To access the form's various parameters, use $r->content to access POSTed parameters and $r->args to access GET parameters encoded in the URL.
%post_parameters = $r->content;

%get_parameters  = $r->args;
You can call $r->content only once per request because the first call consumes all POSTed data.
The Apache::Request module from CPAN gives you a $r->param method to access specific parameters, regardless of whether they're from GET or POST:
use Apache::Request;



sub handler {

  my $r = Apache::Request->instance(shift);

  my @param_names = $r->param;

  my $value       = $r->param("username");  # single value

  my @values      = $r->param("toppings");  # multiple values



  # ...

}
Processing form parameters without Apache::Request is problematic with values that occur multiple times. For example, a SELECT list with MULTIPLE enabled sends repeated entries for the same parameter name. Putting them into a hash preserves only one of those entries. Apache::Request solves this problem by accumulating multiply-submitted parameters in an array.
Form parameters POSTed to your handler can be a problem. The nature of Apache is that once one handler reads the POSTed data, another handler cannot come along later and reread that same information. So if you're going to process POSTed form parameters, you had better keep the decoded parameters around in case another handler wants to access them. The instance constructor handles this for us. When two handlers both call the instance constructor, the second handler gets back the Apache::Request object populated by the first, with form parameters already decoded.
The Apache::Request $r->param interface is based on the CGI module's parameter-parsing interface.
The Apache.pm manpage;
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Receiving Uploaded Files
Inhaltsvorschau
You want a mod_perl handler that processes an uploaded file. For example, an image gallery might let the owner upload image files to the gallery.
Use the Apache::Request module's $r->upload and $r->param methods from within your handler (assuming the file upload field was called fileParam):
use Apache::Request;



my $TEN_MEG = 10 * 2 ** 20;       # 10 megabytes

sub handler {

  my $r = Apache::Request->new(shift,

                               DISABLE_UPLOADS => 0,

                               POST_MAX        => $TEN_MEG);

  $r->parse;

  my $uploaded_file = $r->upload("fileParam");

  my $filename      = $uploaded_file->filename;  # filename

  my $fh            = $uploaded_file->fh;        # filehandle

  my $size          = $uploaded_file->size;      # size in bytes

  my $info          = $uploaded_file->info;      # headers

  my $type          = $uploaded_file->type;      # Content-Type        

  my $tempname      = $uploaded_file->tempname;  # temporary name

  # ...

}
By default, Apache::Request won't process uploaded file data. This is because the file is read into memory, which might not be released to the operating system once the request is over. If you do enable uploaded files (by setting DISABLE_UPLOADS to false), set an upper limit on the size of the file you will accept. This prevents a malicious attacker from sending an infinite stream of data and exhausting your system's memory. The POST_MAX value (10M in the Solution code) is that maximum value, specified in bytes.
The $r->upload method processes the POSTed file data and returns an Apache::Upload object. This object has the following methods for accessing information on the uploaded file:
Method
Returns
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Speeding Up Database Access
Inhaltsvorschau
You use the DBI module within your mod_perl handler, but connecting to the database server for each request is slowing down your web application unacceptably.
To cache database connections transparently, load the Apache::DBI module before the DBI module:
use Apache::DBI;

use DBI;
Many sites load the Apache::DBI module from the httpd.conf file to ensure it's loaded before anything else:
PerlModule Apache::DBI
The Apache::DBI module intercepts the DBI->connect method, returning a previously opened handle if the handle had the same connection parameters as the current request. The module also prevents $dbh->disconnect from closing connections. This lets you add use Apache::DBI to the start of an existing program without having to touch the rest of your code.
The Apache::DBI module uses an open database connection for each different database login in each Apache child process. You might need to change your database server's configuration to increase its maximum number of connections. With commercial database systems, you might even need to buy more client licenses.
This proliferation of connections can lead to situations where Apache::DBI isn't the best choice. For example, if each user of your site has his own database login, you'll need as many concurrent database connections as the number of active users multiplied by however many httpd processes are running worth of database connections, which might well be more than your server supports! Similarly, if you have many Apache child processes running concurrently, this could open more simultaneous database connections than your server supports.
One strategy for optimizing database access is to batch requests where possible. For example, if you're logging to a database, consider accumulating log information and updating the database only after every 5 or 10 hits.
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Customizing Apache's Logging
Inhaltsvorschau
You want to change how Apache logs requests. For example, you want a database of URLs and access counts, or per-user logs.
Install a handler with PerlLogHandler:
PerlModule Apache::MyLogger

PerlLogHandler Apache::MyLogger
Within the handler, methods on the request object obtain information about the completed request. In the following code, $r is the request object and $c is the connection object obtained from $r->connection:
$r->the_request               GET /roast/chickens.html HTTP/1.1

$r->uri                       /roast/chickens.html

$r->header_in("User-Agent")   Mozilla-XXX

$r->header_in("Referer")      http://gargle.com/?search=h0t%20chix0rz

$r->bytes_sent                1648

$c->get_remote_host           208.201.239.56

$r->status_line               200 OK

$r->server_hostname           www.myserver.com
Apache calls logging handlers after sending the response to the client. You have full access to the request and response parameters, such as client IP address, headers, status, and even content. Access this information through method calls on the request object.
You'll probably want to escape values before writing them to a text file because spaces, newlines, and quotes could spoil the formatting of the files. Two useful functions are:
# return string with newlines and double quotes escaped

sub escape {

  my $a = shift;

  $a =~ s/([\n\"])/sprintf("%%%02x", ord($1))/ge;

 return $a;

}



# return string with newlines, spaces, and double quotes escaped

sub escape_plus {

  my $a = shift;

  $a =~ s/([\n \"])/sprintf("%%%02x", ord($1))/ge;

  return $a;

}
Two prebuilt logging modules on CPAN are Apache::Traffic and Apache::DBILogger. Apache::Traffic lets you assign owner strings (either usernames, UIDs, or arbitrary strings) to your web server's directories in
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Transparently Storing Information in URLs
Inhaltsvorschau
You want to store information like session IDs in the URL, but you don't want to figure out to how work around the extra data when constructing relative URLs.
Store the ID at the start of the URL:
http://www.example.com/ID/12345678/path/to/page
Extract it with a PerlTransHandler, and store it in a pnote, a hash entry accessible by other Perl handlers in this request:
sub trans {

  my $r = shift;



  my $uri = $r->uri( );

  if ($uri =~ s{/ID/(\d{8})}{  }) {

    $r->pnotes("ID", $1);

  }

  $r->uri($uri);

  return DECLINED;

}
Restore the URL in a PerlFixupHandler:
sub fixup {

  my $r = shift;



  my $id = $r->pnotes("ID");

  if ($id) {

    $r->uri("/ID/$id" . $r->uri);

  }

  return DECLINED;

}
Consult the pnote in the content handler:
use Apache::URI;



sub content {

  my $r = shift;

  my $id = $r->pnotes("ID");



  unless ($id) {

    join(('', map { int rand 10 } (1..8));

    my $uri = Apache::URI->parse($r);

    $uri->path("ID/$id" . $uri->path);

    $r->header_out(Location => $uri->unparse);

    return REDIRECT;

  }



  # use $id



  return OK;

}
The client thinks your pages have a URL like http://www.example.com/ID/12345678/path/to/page.html. Your PerlTransHandler intercepts the incoming request and removes the /ID/12345678 part before Apache tries to translate the request into a file location. Just before your content handler runs, your PerlFixupHandler reinserts the ID. When your content handler calls $r->uri, it gets a URI that includes the ID.
We returned DECLINED from our PerlTransHandler and PerlFixupHandler to indicate that any other translation or fixup handlers that were installed should also be run. If we returned OK in the PerlTransHandler, Apache would not call any subsequent translation handlers. In PerlFixupHandlers, DECLINED and OK both mean a successful fixup, and that other fixup handlers should also run.
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Communicating Between mod_perl and PHP
Inhaltsvorschau
You want to build your site from both mod_perl and PHP. For example, you might want to use mod_perl for authentication and logging, while PHP generates the actual content. However, doing so means that Perl and PHP must share values; for example, so the PHP content handler knows which username successfully authenticated through mod_perl.
Use Apache notes. From Perl, you simply say:
$main = $r->main || $r;

$main->notes($KEY => $VALUE);

$VALUE = $main->notes($KEY);
From PHP, you say:
apache_note($KEY, $VALUE);

$VALUE = apache_note($KEY);
A note is a string value attached to an Apache request. They're a perfect way to pass information between handlers, even when those handlers are written in different programming languages. Each request has a different set of notes, so from Perl always identify the main request and use it to communicate with PHP code.
Don't confuse the $r->notes method with the $r->pnotes method. The latter is only available to Perl modules.
Recipe 21.10
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Migrating from CGI to mod_perl
Inhaltsvorschau
Your CGI script is called so often that your web server's performance deteriorates unacceptably. You'd like to use mod_perl to make things faster.
Use Apache::Registry or Apache::PerlRun:
PerlModule Apache::Registry

# or Apache::PerlRun

PerlModule CGI

PerlSendHeader On



Alias /perl/ /real/path/to/perl/scripts/

<Location /perl>

SetHandler  perl-script

PerlHandler Apache::Registry

# or Apache::PerlRun

Options ExecCGI

</Location>
The Solution tells Apache that requests with URLs starting in /perl/ are in /real/path/to/perl/scripts/ and that Apache::Registry handles them. This module runs them in a CGI environment. PerlModule CGI preloads the CGI module, and PerlSendHeader On makes most CGI scripts work out of the box with mod_perl.
We have configured /perl/ to work analogously to /cgi-bin/. To make the suffix .perl indicate mod_perl CGI scripts, just as the suffix .cgi indicates regular CGI scripts, use the following in your Apache configuration file:
<Files *.perl>

SetHandler  perl-script

PerlHandler Apache::Registry

Options ExecCGI

</Files>
Because the Perl interpreter that runs your CGI script doesn't shut down when your script is done, as would occur when the web server runs your script as a separate program, you cannot rely on global variables being undefined when the script runs repeatedly. The warnings and strict pragmas check for many bad habits in these kinds of scripts. There are other gotchas, too—see the mod_perl_traps manpage.
The Apache::PerlRun handler can work around some of these traps. This is like Apache::Registry, but doesn't cache the compiled module. If your CGI program is sloppy and doesn't initialize variables or close filehandles, you can still gain speed by not starting a new process for every request. To use it, substitute Apache::PerlRun for Apache::Registry.
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Sharing Information Between Handlers
Inhaltsvorschau
You want to share information between handlers, but global variables are global to a process and not automatically cleaned up after every request.
Use Apache pnotes (Perl notes):
# in one handler

$r->pnotes("Name", $name);



# in another handler

$name = $r->pnotes("Name");
Apache modules communicate with each other using notes (see Recipe 21.11). Apache notes act like a hash attached to a request—one handler stores a value for a key in the hash, so that another handler can read it later. The Perl notes features is also a hash attached to the request object, but it's only for the Perl handlers.
To set a pnote, pass a key and a value to the $r->pnotes method. To retrieve a pnote, pass only the key. You can store complex data structures:
$r->pnotes("Person", { Name => "Nat",

                       Age  => 30,

                       Kids => 2 });

# later

$person = $r->pnotes("Person");
and even objects:
$person = new Person;

$person->name("Nat");

$person->age(30);

$person->kids(2);



$r->pnotes(Person => $person);



# later



$person = $r->pnotes("Person");

# $person is a reference to the same object
Recipe 8.11 in mod_perl Developer's Cookbook; Apache::Table; pnotes method in the Apache manpage
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Reloading Changed Modules
Inhaltsvorschau
You've updated your mod_perl modules, but you have to restart the web server for Apache to notice the change.
Use Apache::StatINC (standard with mod_perl) to automatically reload any code when it changes on disk:
PerlModule Apache::StatINC

PerlInitHandler Apache::StatINC
Or use the CPAN module Apache::Reload to limit the monitoring to specific modules:
PerlModule Apache::Reload

PerlInitHandler Apache::Reload

PerlSetVar ReloadAll Off

PerlSetVar ReloadModules "Example::One Example::Two Example::Three"
Apache::Reload includes the functionality of Apache::StatINC. Simply saying:
PerlModule Apache::Reload

PerlInitHandler Apache::Reload
is enough to duplicate the functionality of Apache::StatINC. That is, at the start of each request, Apache::Reload goes through all currently loaded modules, checking timestamps to see which have changed. Because checking every module on every request is a burden on popular sites, Apache::Reload also lets you specify which modules to check and reload.
The documentation for the Apache::StatINC and Apache::Reload modules; Recipe 8.1 in mod_perl Developer's Cookbook; the mod_perl guide at http://perl.apache.org/guide
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Benchmarking a mod_perl Application
Inhaltsvorschau
You have an idea to speed up your application, but you're not sure whether your change will help.
Use Apache::Timeit to time how long your content handler takes to run:
PerlModule Apache::Timeit

PerlFixupHandler Apache::Timeit
For more detailed analysis, use the Apache::DProf module available from CPAN:
PerlModule Apache::DProf
The Apache::Timeit module is available at http://perl.apache.org/dist/contrib/Timeit.pm.
This module records in the error log the amount of time that elapsed while the content handler ran. By scanning the logs and averaging these numbers, you can see which pages take longer to generate, and you can then start to figure out why.
To drill down into your code and figure out which parts of the content handler are taking the most time, use the Apache::DProf module. This module connects the standard (as of v5.8) Devel::DProf module to Apache and mod_perl.
The profiler records time spent in every subroutine the Perl module executes. The record of times is written to a file named dprof/ $$ /tmon.out ($$ is the process ID of the Apache child process), located under the ServerRoot directory. This file corresponds to every Perl subroutine encountered during the lifetime of the Apache child process. To profile just a single request, set the MaxRequestsPerChild directive in the httpd.conf file:
MaxRequestsPerChild 1
You must create and chmod this directory yourself:
cd $APACHE_SERVER_ROOT

mkdir logs/dprof

chmod 777 logs/dprof
To analyze the output, use the dprofpp program:
dprofpp -r dprof/13169/tmon.out

Total Elapsed Time = 89.93962 Seconds

               Real Time = 89.93962 Seconds

               Exclusive Times

               %Time ExclSec CumulS #Calls sec/call Csec/c  Name
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Templating with HTML::Mason
Inhaltsvorschau
You want to separate presentation (HTML formatting) from logic (Perl code) in your program. Your web site has a lot of components with only slight variations between them. You'd like to abstract out common elements and build your pages from templates without having a lot of "if I'm in this page, then print this; else if I'm in some other page . . . " conditional statements in a single master template.
Use HTML::Mason components and inheritance.
HTML::Mason (also simply called Mason) offers the power of Perl in templates. The basic unit of a web site built with Mason is the component—a file that produces output. The file can be HTML, Perl, or a mixture of both. Components can take arguments and execute arbitrary Perl code. Mason has many features, documented at http://masonhq.com and in Embedding Perl in HTML with Mason by Dave Rolsky and Ken Williams (O'Reilly; online at http://masonbook.com).
Mason works equally well with CGI, mod_perl, and non-web programs. For the purposes of this recipe, however, we look at how to use it with mod_perl. The rest of this recipe contains a few demonstrations to give you a feel for what you can do with Mason and how your site will be constructed. There are more tricks, traps, and techniques for everything we discuss, though, so be sure to visit the web site and read the book for the full story.

Section 21.16.3.1: Configuration

Install the HTML-Mason distribution from CPAN and add the following to your httpd.conf:
PerlModule HTML::Mason::ApacheHandler

<Location /mason>

  SetHandler perl-script

  PerlHandler HTML::Mason::ApacheHandler

  DefaultType text/html

</Location>
This tells mod_perl that every URL that starts with /mason is handled by Mason. So if you request /mason/hello.html, the file mason/hello.html
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Templating with Template Toolkit
Inhaltsvorschau
You want to separate presentation (HTML formatting) from logic (Perl code) in your program. You want designers and other people who don't speak Perl to be able to edit the templates.
Use the Template Toolkit and Apache::Template.
The Template Toolkit (TT2) is a general templating system that can be used not just for web pages, but for any kind of templated text. The Apache::Template module is an Apache content handler that uses TT2 to build the returned page. The biggest benefit of TT2 is that it has a simple language for variables, loops, and data structures, which can be used instead of Perl for presentation logic. This simple language can be read and written by people who don't know Perl.
This recipe documents Version 2 of the Template Toolkit. As with HTML::Mason, there's far more to TT2 that we can possibly cover here. This recipe is just a tour of some of the highlights of TT2's syntax and functionality. The Template Toolkit is well documented at http://www.template-toolkit.org, and in the upcoming book Perl Template Toolkit, by Darren Chamberlain, Dave Cross, and Andy Wardley (O'Reilly).

Section 21.17.3.1: Configuration

Install the Template and Apache::Template modules from CPAN. Add this to your httpd.conf file:
PerlModule Apache::Template



TT2EvalPerl On

TT2Params all



TT2IncludePath /usr/local/apache/htdocs/tt2



<Location /tt2>

  SetHandler perl-script

  PerlHandler Apache::Template

  DefaultType text/html

</Location>
The TT2EvalPerl directive lets us embed Perl code in our templates as well as the TT2 language. TT2Params tells Apache::Template to give our templates access to form parameters, Apache environment variables, notes, cookies, and more. TT2IncludePath tells the Template Toolkit where to look for templates that our templates include. Finally, we designate the
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Chapter 22: XML
Inhaltsvorschau
I am a little world made cunningly. Of elements, and an angelic sprite
—John Donne, Holy Sonnets
The Extensible Markup Language (XML) standard was released in 1998. It quickly became the standard way to represent and exchange almost every kind of data, from books to genes to function calls.
XML succeeded where other past "standard" data formats failed (including XML's ancestor, SGML—the Standard Generalized Markup Language). There are three reasons for XML's success: it is text-based instead of binary, it is simple rather than complex, and it has a superficial resemblance to HTML.
Text
Unix realized nearly 30 years before XML that humans primarily interact with computers through text. Thus text files are the only files any system is guaranteed to be able to read and write. Because XML is text, programmers can easily make legacy systems emit XML reports.
Simplicity
As we'll see, a lot of complexity has arisen around XML, but the XML standard itself is very simple. There are very few things that can appear in an XML document, but from those basic building blocks you can build extremely complex systems.
HTML
XML is not HTML, but XML and HTML share a common ancestor: SGML. The superficial resemblance meant that the millions of programmers who had to learn HTML to put data on the web were able to learn (and accept) XML more easily.
Example 22-1 shows a simple XML document.
Example 22-1. Simple XML document
<?xml version="1.0" encoding="UTF-8"?>

<books>

  <!-- Programming Perl 3ed -->

  <book id="1">

    <title>Programming Perl</title>

    <edition>3</edition>

    <authors>

      <author>

        <firstname>Larry</firstname>

        <lastname>Wall</lastname>

      </author>

      <author>

        <firstname>Tom</firstname>

        <lastname>Christiansen</lastname>

      </author>

      <author>

        <firstname>Jon</firstname>

        <lastname>Orwant</lastname>

      </author>

    </authors>

    <isbn>0-596-00027-8</isbn>

  </book>

  <!-- Perl & LWP -->

  <book id="2">

    <title>Perl &amp; </title>

    <edition>1</edition>

    <authors>

      <author>

        <firstname>Sean</firstname>

        <lastname>Burke</lastname>

      </author>

    </authors>

    <isbn>0-596-00178-9</isbn>

  </book>

  <book id="3">

    <!-- Anonymous Perl -->

    <title>Anonymous Perl</title>

    <edition>1</edition>

    <authors />

    <isbn>0-555-00178-0</isbn>

  </book>

</books>
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Introduction
Inhaltsvorschau
The Extensible Markup Language (XML) standard was released in 1998. It quickly became the standard way to represent and exchange almost every kind of data, from books to genes to function calls.
XML succeeded where other past "standard" data formats failed (including XML's ancestor, SGML—the Standard Generalized Markup Language). There are three reasons for XML's success: it is text-based instead of binary, it is simple rather than complex, and it has a superficial resemblance to HTML.
Text
Unix realized nearly 30 years before XML that humans primarily interact with computers through text. Thus text files are the only files any system is guaranteed to be able to read and write. Because XML is text, programmers can easily make legacy systems emit XML reports.
Simplicity
As we'll see, a lot of complexity has arisen around XML, but the XML standard itself is very simple. There are very few things that can appear in an XML document, but from those basic building blocks you can build extremely complex systems.
HTML
XML is not HTML, but XML and HTML share a common ancestor: SGML. The superficial resemblance meant that the millions of programmers who had to learn HTML to put data on the web were able to learn (and accept) XML more easily.
Example 22-1 shows a simple XML document.
Example 22-1. Simple XML document
<?xml version="1.0" encoding="UTF-8"?>

<books>

  <!-- Programming Perl 3ed -->

  <book id="1">

    <title>Programming Perl</title>

    <edition>3</edition>

    <authors>

      <author>

        <firstname>Larry</firstname>

        <lastname>Wall</lastname>

      </author>

      <author>

        <firstname>Tom</firstname>

        <lastname>Christiansen</lastname>

      </author>

      <author>

        <firstname>Jon</firstname>

        <lastname>Orwant</lastname>

      </author>

    </authors>

    <isbn>0-596-00027-8</isbn>

  </book>

  <!-- Perl & LWP -->

  <book id="2">

    <title>Perl &amp; </title>

    <edition>1</edition>

    <authors>

      <author>

        <firstname>Sean</firstname>

        <lastname>Burke</lastname>

      </author>

    </authors>

    <isbn>0-596-00178-9</isbn>

  </book>

  <book id="3">

    <!-- Anonymous Perl -->

    <title>Anonymous Perl</title>

    <edition>1</edition>

    <authors />

    <isbn>0-555-00178-0</isbn>

  </book>

</books>
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Parsing XML into Data Structures
Inhaltsvorschau
You want a Perl data structure (a combination of hashes and arrays) that corresponds to the structure and content of an XML file. For example, you have XML representing a configuration file, and you'd like to say $xml->{config}{server}{hostname} to access the contents of <config><server><hostname>...</hostname>.
Use the XML::Simple module from CPAN. If your XML is in a file, pass the filename to XMLin:
use XML::Simple;

$ref = XMLin($FILENAME, ForceArray => 1);
If your XML is in a string, pass the string to XMLin:
use XML::Simple;

$ref = XMLin($STRING, ForceArray => 1);
Here's the data structure that XML::Simple produces from the XML in Example 22-1:
{
'book' => {

  '1' => {

    'authors' => [

      {

        'author' => [

          {

            'firstname' => [ 'Larry' ],

            'lastname'  => [ 'Wall'  ]

          },

          {

            'firstname' => [ 'Tom' ],

            'lastname'  => [ 'Christiansen' ]

          },

          {

            'firstname' => [ 'Jon' ],

            'lastname'  => [ 'Orwant' ]

          }

        ]

      }

    ],

    'edition' => [ '3' ],

    'title'   => [ 'Programming Perl' ],

    'isbn'    => [ '0-596-00027-8' ]

  },

  '2' => {

    'authors' => [

      {

        'author' => [

          {

            'firstname' => [ 'Sean' ],

            'lastname'  => [ 'Burke' ]

          }

        ]

      }

    ],

    'edition' => [ '1' ],

    'title'   => [ 'Perl & LWP' ],

    'isbn'    => [ '0-596-00178-9' ]

  },

  '3' => {

    'authors' => [ {  } ],

    'edition' => [ '1' ],

    'title'   => [ 'Anonymous Perl' ],

    'isbn'    => [ '0-555-00178-0' ]

  },

}

  }
The basic function of XML::Simple is to turn an element that contains other elements into a hash. If there are multiple identically named elements inside a single containing element (e.g., book), they become an array of hashes unless XML::Simple knows they are uniquely identified by attributes (as happens here with the
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Parsing XML into a DOM Tree
Inhaltsvorschau
You want to use the Document Object Model (DOM) to access and perhaps change the parse tree of an XML file.
Use the XML::LibXML module from CPAN:
use XML::LibXML;

my $parser = XML::LibXML->new( );

my $dom    = $parser->parse_string($XML);

# or

my $dom    = $parser->parse_file($FILENAME);

my $root   = $dom->getDocumentElement;
DOM is a framework of classes for representing XML parse trees. Each element is a node in the tree, with which you can do operations like find its children nodes (the XML elements in this case), add another child node, and move the node somewhere else in the tree. The parse_string, parse_file, and parse_fh (filehandle) constructors all return a DOM object that you can use to find nodes in the tree.
For example, given the books XML from Example 22-1, Example 22-2 shows one way to print the titles.
Example 22-2. dom-titledumper
#!/usr/bin/perl -w

# dom-titledumper -- display titles in books file using DOM



use XML::LibXML;

use Data::Dumper;

use strict;



my $parser = XML::LibXML->new;

my $dom = $parser->parse_file("books.xml") or die;



# get all the title elements

my @titles = $dom->getElementsByTagName("title");

foreach my $t (@titles) {

    # get the text node inside the <title> element, and print its value

    print $t->firstChild->data, "\n";

}
The getElementsByTagName method returns a list of elements as nodes within the document that have the specific tag name. Here we get a list of the title elements, then go through each title to find its contents. We know that each title has only a single piece of text, so we assume the first child node is text and print its contents.
If we wanted to confirm that the node was a text node, we could say:
die "the title contained something other than text!"

  if $t->firstChild->nodeType != 3;
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Parsing XML into SAX Events
Inhaltsvorschau
You want to receive Simple API for XML (SAX) events from an XML parser because event-based parsing is faster and uses less memory than parsers that build a DOM tree.
Use the XML::SAX module from CPAN:
use XML::SAX::ParserFactory;

use MyHandler;



my $handler = MyHandler->new( );

my $parser = XML::SAX::ParserFactory->parser(Handler => $handler);



$parser->parse_uri($FILENAME);

# or

$parser->parse_string($XML);
Logic for handling events goes into the handler class (MyHandler in this example), which you write:
# in MyHandler.pm

package MyHandler;



use base qw(XML::SAX::Base);



sub start_element {   # method names are specified by SAX

  my ($self, $data) = @_;

  # $data is hash with keys like Name and Attributes

  # ...

}



# other possible methods include end_element( ) and characters( )



1;
An XML processor that uses SAX has three parts: the XML parser that generates SAX events, the handler that reacts to them, and the stub that connects the two. The XML parser can be XML::Parser, XML::LibXML, or the pure Perl XML::SAX::PurePerl that comes with XML::SAX. The XML::SAX::ParserFactory module selects a parser for you and connects it to your handler. Your handler takes the form of a class that inherits from XML::SAX::Base. The stub is the program shown in the Solution.
The XML::SAX::Base module provides stubs for the different methods that the XML parser calls on your handler. Those methods are listed in Table 22-2, and are the methods defined by the SAX1 and SAX2 standards at http://www.saxproject.org/. The Perl implementation uses more Perl-ish data structures and is described in the XML::SAX::Intro manpage.
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Making Simple Changes to Elements or Text
Inhaltsvorschau
You want to filter some XML. For example, you want to make substitutions in the body of a document, or add a price to every book described in an XML document, or you want to change <book id="1"> to <book> <id>1</id>.
Use the XML::SAX::Machines module from CPAN:
#!/usr/bin/perl -w



use MySAXFilter1;

use MySAXFilter2;

use XML::SAX::ParserFactory;

use XML::SAX::Machines qw(Pipeline);



my $machine = Pipeline(MySAXFilter1 => MySAXFilter2); # or more

$machine->parse_uri($FILENAME);
Write a handler, inheriting from XML::SAX::Base as in Recipe 22.3, then whenever you need a SAX event, call the appropriate handler in your superclass. For example:
$self->SUPER::start_element($tag_struct);
A SAX filter accepts SAX events and triggers new ones. The XML::SAX::Base module detects whether your handler object is called as a filter. If so, the XML::SAX::Base methods pass the SAX events onto the next filter in the chain. If your handler object is not called as a filter, then the XML::SAX::Base methods consume events but do not emit them. This makes it almost as simple to write events as it is to consume them.
The XML::SAX::Machines module chains the filters for you. Import its Pipeline function, then say:
my $machine = Pipeline(Filter1 => Filter2 => Filter3 => Filter4);

$machine->parse_uri($FILENAME);
SAX events triggered by parsing the XML file go to Filter1, which sends possibly different events to Filter2, which in turn sends events to Filter3, and so on to Filter4. The last filter should print or otherwise do something with the incoming SAX events. If you pass a reference to a typeglob, XML::SAX::Machines writes the XML to the filehandle in that typeglob.
Example 22-5 shows a filter that turns the id attribute in book elements from the XML document in Example 22-1 into a new
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Validating XML
Inhaltsvorschau
You want to ensure that the XML you're processing conforms to a DTD or XML Schema.
To validate against a DTD, use the XML::LibXML module:
use XML::LibXML;

my $parser = XML::LibXML->new;

$parser->validation(1);

$parser->parse_file($FILENAME);
To validate against a W3C Schema, use the XML::Xerces module:
use XML::Xerces;



my $parser = XML::Xerces::DOMParser->new;

$parser->setValidationScheme($XML::Xerces::DOMParser::Val_Always);



my $error_handler = XML::Xerces::PerlErrorHandler->new( );

$parser->setErrorHandler($error_handler);



$parser->parse($FILENAME);
The libxml2 library, upon which XML::LibXML is based, can validate as it parses. The validation method on the parser enables this option. At the time of this writing, XML::LibXML could only validate with DOM parsing—validation is not available with SAX-style parsing.
Example 22-7 is a DTD for the books.xml file in Example 22-1.
Example 22-7. validating-booksdtd
<!ELEMENT books (book*)>

<!ELEMENT book (title,edition,authors,isbn)>

<!ELEMENT authors (author*)>

<!ELEMENT author (firstname,lastname)>

<!ELEMENT title (#PCDATA)>

<!ELEMENT edition (#PCDATA)>

<!ELEMENT firstname (#PCDATA)>

<!ELEMENT lastname (#PCDATA)>

<!ELEMENT isbn (#PCDATA)>



<!ATTLIST book

  id    CDATA    #REQUIRED

>
To make XML::LibXML parse the DTD, add this line to the books.xml file:
<!DOCTYPE books

  SYSTEM "books.dtd">
Example 22-8 is a simple driver used to parse and validate.
Example 22-8. validating-bookchecker
#!/usr/bin/perl -w

# bookchecker - parse and validate the books.xml file



use XML::LibXML;



$parser = XML::LibXML->new;

$parser->validation(1);

$parser->parse_file("books.xml");
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Finding Elements and Text Within an XML Document
Inhaltsvorschau
You want to get to a specific part of the XML; for example, the href attribute of an a tag whose contents are an img tag with alt text containing the word "monkey".
Use XML::LibXML and construct an XPath expression to find nodes you're interested in:
use XML::LibXML;



my $parser = XML::LibXML->new;

$doc = $parser->parse_file($FILENAME);

my @nodes = $doc->findnodes($XPATH_EXPRESSION);
Example 22-9 shows how you would print all the titles in the book XML from Example 22-1.
Example 22-9. xpath-1
#!/usr/bin/perl -w



use XML::LibXML;



my $parser = XML::LibXML->new;

$doc = $parser->parse_file("books.xml");



# find title elements

my @nodes = $doc->findnodes("/books/book/title");



# print the text in the title elements

foreach my $node (@nodes) {

  print $node->firstChild->data, "\n";

}
The difference between DOM's getElementsByTagName and findnodes is that the former identifies elements only by their name. An XPath expression specifies a set of steps that the XPath engine takes to find nodes you're interested in. In Example 22-9 the XPath expression says "start at the top of the document, go into the books element, go into the book element, and then go into the title element."
The difference is important. Consider this XML document:
<message>

  <header><to>Tom</to><from>Nat</from></header>

  <body>

    <order><to>555 House St, Mundaneville</to>

           <product>Fish sticks</product>

    </order>

  </body>

</message>
There are two to elements here: one in the header and one in the body. If we said $doc->getElementsByTagName("to"), we'd get both to elements. The XPath expression "/message/header/to" restricts output to the to element in the header.
XPath expressions are like regular expressions that operate on XML structure instead of text. As with regular expressions, there are a lot of things you can specify in XPath expressions—far more than the simple "find this child node and go into it" that we've been doing.
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Processing XML Stylesheet Transformations
Inhaltsvorschau
You have an XML stylesheet that you want to use to convert XML into something else. For example, you want to produce HTML from files of XML using the stylesheet.
Use XML::LibXSLT:
use XML::LibXSLT;



my $xslt       = XML::LibXSLT->new;



my $stylesheet = $xslt->parse_stylesheet_file($XSL_FILENAME);

my $results    = $stylesheet->transform_file($XML_FILENAME);



print $stylesheet->output_string($results);
XML::LibXSLT is built on the fast and powerful libxslt library from the GNOME project. To perform a transformation, first build a stylesheet object from the XSL source and then use it to transform an XML file. If you wanted to (for example, your XSL is dynamically generated rather than being stored in a file), you could break this down into separate steps:
use XML::LibXSLT;

use XML::LibXML;



my $xml_parser  = XML::LibXML->new;

my $xslt_parser = XML::LibXSLT->new;



my $xml         = $xml_parser->parse_file($XML_FILENAME);

my $xsl         = $xml_parser->parse_file($XSL_FILENAME);



my $stylesheet  = $xslt_parser->parse_stylesheet($xsl);

my $results     = $stylesheet->transform($xml);

my $output      = $stylesheet->output_string($results);
To save the output to a file, use output_file:
$stylesheet->output_file($OUTPUT_FILENAME);
Similarly, write the output to an already-opened filehandle with output_fh:
$stylesheet->output_fh($FILEHANDLE);
It's possible to pass parameters to the transformation engine. For example, your transformation might use parameters to set a footer at the bottom of each page:
$stylesheet->transform($xml, footer => "'I Made This!'");
The strange quoting is because the XSLT engine expects to see quoted values. In the preceding example, the double quotes tell Perl it's a string, whereas the single quotes are for the XSLT engine.
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Processing Files Larger Than Available Memory
Inhaltsvorschau
You want to work with a large XML file, but you can't read it into memory to form a DOM or other kind of tree because it's too big.
Use SAX (as described in Recipe 22.3) to process events instead of building a tree.
Alternatively, use XML::Twig to build trees only for the parts of the document you want to work with (as specified by XPath expressions):
use XML::Twig;



my $twig = XML::Twig->new( twig_handlers => {

                               $XPATH_EXPRESSION => \&HANDLER,

                               # ...

                            });

$twig->parsefile($FILENAME);

$twig->flush( );
You can call a lot of DOM-like functions from within a handler, but only the elements identified by the XPath expression (and whatever those elements enclose) go into a tree.
DOM modules turn the entire document into a tree, regardless of whether you use all of it. With SAX modules, there are no trees built—if your task depends on document structure, you must keep track of that structure yourself. A happy middle ground is XML::Twig, which creates DOM trees only for the bits of the file that you're interested in. Because you work with files a piece at a time, you can cope with very large files by processing pieces that fit in memory.
For example, to print the titles of books in books.xml (Example 22-1), you could write:
use XML::Twig;



my $twig = XML::Twig->new( twig_roots => { '/books/book' => \&do_book });

$twig->parsefile("books.xml");

$twig->purge( );



sub do_book {

  my($title) = $_->find_nodes("title");

  print $title->text, "\n";

}
For each book element, XML::Twig calls do_book on its contents. That subroutine finds the title node and prints its text. Rather than having the entire file parsed into a DOM structure, we keep only one book element at a time.
Consult the XML::Twig manpages for details on how much DOM and XPath the module supports—it's not complete, but it's growing all the time. XML::Twig uses XML::Parser for its XML parsing, and as a result the functions available on nodes are slightly different from those provided by XML::LibXSLT's DOM parsing.
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Reading and Writing RSS Files
Inhaltsvorschau
You want to create a Rich Site Summary (RSS) file, or read one produced by another application.
Use the CPAN module XML::RSS to read an existing RSS file:
use XML::RSS;



my $rss = XML::RSS->new;

$rss->parsefile($RSS_FILENAME);



my @items = @{$rss->{items}};

foreach my $item (@items) {

  print "title: $item->{'title'}\n";

  print "link: $item->{'link'}\n\n";

 }
To create an RSS file:
use XML::RSS;



my $rss  = XML::RSS->new (version => $VERSION);

$rss->channel( title       => $CHANNEL_TITLE,

               link        => $CHANNEL_LINK,

               description => $CHANNEL_DESC);

$rss->add_item(title       => $ITEM_TITLE,

               link        => $ITEM_LINK,

               description => $ITEM_DESC,

               name        => $ITEM_NAME);

print $rss->as_string;
There are at least four variations of RSS extant: 0.9, 0.91, 1.0, and 2.0. At the time of this writing, XML::RSS understood all but RSS 2.0. Each version has different capabilities, so methods and parameters depend on which version of RSS you're using. For example, RSS 1.0 supports RDF and uses the Dublin Core metadata (http://dublincore.org/). Consult the documentation for what you can and cannot call.
XML::RSS uses XML::Parser to parse the RSS. Unfortunately, not all RSS files are well-formed XML, let alone valid. The XML::RSSLite module on CPAN offers a looser approach to parsing RSS—it uses regular expressions and is much more forgiving of incorrect XML.
Example 22-13 uses XML::RSSLite and LWP::Simple to download The Guardian's RSS feed and print out the items whose descriptions contain the keywords we're interested in.
Example 22-13. rss-parser
#!/usr/bin/perl -w

# guardian-list -- list Guardian articles matching keyword



use XML::RSSLite;

use LWP::Simple;

use strict;



# list of keywords we want

my @keywords = qw(perl internet porn iraq bush);



# get the RSS

my $URL = 'http://www.guardian.co.uk/rss/1,,,00.xml';

my $content = get($URL);



# parse the RSS

my %result;

parseRSS(\%result, \$content);



# build the regex from keywords

my $re = join "|", @keywords;

$re = qr/\b(?:$re)\b/i;



# print report of matching items

foreach my $item (@{ $result{items} }) {

  my $title = $item->{title};

  $title =~ s{\s+}{ };  $title =~ s{^\s+}{  }; $title =~ s{\s+$}{  };



  if ($title =~ /$re/) {

    print "$title\n\t$item->{link}\n\n";

  }

}
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Writing XML
Inhaltsvorschau
You have a data structure that you'd like to convert to XML.
Use XML::Simple's XMLout function:
use XML::Simple qw(XMLout);



my $xml = XMLout($hashref);
The XMLout function takes a data structure and produces XML from it. For example, here's how to generate part of the book data:
#!/usr/bin/perl -w



use XML::Simple qw(XMLout);



$ds =  {

    book => [

              {

                id      => 1,

                title   => [ "Programming Perl" ],

                edition => [ 3 ],

              },

              {

                id      => 2,

                title   => [ "Perl & LWP" ],

                edition => [ 1 ],

              },

              {

                id      => 3,

                title   => [ "Anonymous Perl" ],

                edition => [ 1 ],

              },

             ]

  };



print XMLout($ds, RootName => "books" );
This produces:

               <books>

               <book id="1">

               <edition>3</edition>

               <title>Programming Perl</title>

               </book>

               <book id="2">

               <edition>1</edition>

               <title>Perl &amp; </title

               </book>

               <book id="3">

               <edition>1</edition>

               <title>Anonymous Perl</title>

               </book>

               </books>

            
The rule is: if you want something to be text data, rather than an attribute value, put it in an array. Notice how we used the RootName option to XMLout to specify that books is the top-level element. Pass undef or the empty string to generate an XML fragment with no top-level fragment. The default value is opt.
The id entry in each hash became an attribute because the default behavior of XMLout is to do this for the id, key, and name fields. Prevent this with:
XMLout($ds, RootName => "books", KeyAttr => [  ]);
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
	

Zurück zu Perl Cookbook


Themen

Buchreihen

Special Interest

International Sites

O'Reilly China O'Reilly USA O'Reilly Japan O'Reilly Taiwan