JETZT ONLINE BESTELLEN
Second Edition August 2003
ISBN 978-0-596-00313-5
964 Seiten
EUR41.00
Weitere Informationen zu diesem Buch
Inhaltsverzeichnis |
Kolophon |
Rezensionen |
Inhaltsverzeichnis
- Chapter 1: Strings
- InhaltsvorschauHe multiplieth words without knowledge.—Job 35:16Many programming languages force you to work at an uncomfortably low level. You think in lines, but your language wants you to deal with pointers. You think in strings, but it wants you to deal with bytes. Such a language can drive you to distraction. Don't despair; Perl isn't a low-level language, so lines and strings are easy to handle.Perl was designed for easy but powerful text manipulation. In fact, Perl can manipulate text in so many ways that they can't all be described in one chapter. Check out other chapters for recipes on text processing. In particular, see Chapter 6 and Chapter 8, which discuss interesting techniques not covered here.Perl's fundamental unit for working with data is the scalar, that is, single values stored in single (scalar) variables. Scalar variables hold strings, numbers, and references. Array and hash variables hold lists or associations of scalars, respectively. References are used for referring to values indirectly, not unlike pointers in low-level languages. Numbers are usually stored in your machine's double-precision floating-point notation. Strings in Perl may be of any length, within the limits of your machine's virtual memory, and can hold any arbitrary data you care to put there—even binary data containing null bytes.A string in Perl is not an array of characters—nor of bytes, for that matter. You cannot use array subscripting on a string to address one of its characters; use
substrfor that. Like all data types in Perl, strings grow on demand. Space is reclaimed by Perl's garbage collection system when no longer used, typically when the variables have gone out of scope or when the expression in which they were used has been evaluated. In other words, memory management is already taken care of, so you don't have to worry about it.A scalar value is either defined or undefined. If defined, it may hold a string, number, or reference. The only undefined value isEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Introduction
- InhaltsvorschauMany programming languages force you to work at an uncomfortably low level. You think in lines, but your language wants you to deal with pointers. You think in strings, but it wants you to deal with bytes. Such a language can drive you to distraction. Don't despair; Perl isn't a low-level language, so lines and strings are easy to handle.Perl was designed for easy but powerful text manipulation. In fact, Perl can manipulate text in so many ways that they can't all be described in one chapter. Check out other chapters for recipes on text processing. In particular, see Chapter 6 and Chapter 8, which discuss interesting techniques not covered here.Perl's fundamental unit for working with data is the scalar, that is, single values stored in single (scalar) variables. Scalar variables hold strings, numbers, and references. Array and hash variables hold lists or associations of scalars, respectively. References are used for referring to values indirectly, not unlike pointers in low-level languages. Numbers are usually stored in your machine's double-precision floating-point notation. Strings in Perl may be of any length, within the limits of your machine's virtual memory, and can hold any arbitrary data you care to put there—even binary data containing null bytes.A string in Perl is not an array of characters—nor of bytes, for that matter. You cannot use array subscripting on a string to address one of its characters; use
substrfor that. Like all data types in Perl, strings grow on demand. Space is reclaimed by Perl's garbage collection system when no longer used, typically when the variables have gone out of scope or when the expression in which they were used has been evaluated. In other words, memory management is already taken care of, so you don't have to worry about it.A scalar value is either defined or undefined. If defined, it may hold a string, number, or reference. The only undefined value isundef. All other values are defined, even numeric and the empty string. Definedness is not the same as Boolean truth, though; to check whether a value is defined, use theEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Accessing Substrings
- InhaltsvorschauYou want to access or modify just a portion of a string, not the whole thing. For instance, you've read a fixed-width record and want to extract individual fields.The
substrfunction lets you read from and write to specific portions of the string.$value = substr($string, $offset, $count); $value = substr($string, $offset); substr($string, $offset, $count) = $newstring; substr($string, $offset, $count, $newstring); # same as previous substr($string, $offset) = $newtail;
Theunpackfunction gives only read access, but is faster when you have many substrings to extract.# get a 5-byte string, skip 3 bytes, # then grab two 8-byte strings, then the rest; # (NB: only works on ASCII data, not Unicode) ($leading, $s1, $s2, $trailing) = unpack("A5 x3 A8 A8 A*", $data); # split at 5-byte boundaries @fivers = unpack("A5" x (length($string)/5), $string); # chop string into individual single-byte characters @chars = unpack("A1" x length($string), $string);Strings are a basic data type; they aren't arrays of a basic data type. Instead of using array subscripting to access individual characters as you sometimes do in other programming languages, in Perl you use functions likeunpackorsubstrto access individual characters or a portion of the string.The offset argument tosubstrindicates the start of the substring you're interested in, counting from the front if positive and from the end if negative. If the offset is 0, the substring starts at the beginning. The count argument is the length of the substring.$string = "This is what you have"; # +012345678901234567890 Indexing forwards (left to right) # 109876543210987654321- Indexing backwards (right to left) # note that 0 means 10 or 20, etc. above $first = substr($string, 0, 1); # "T" $start = substr($string, 5, 2); # "is" $rest = substr($string, 13); # "you have" $last = substr($string, -1); # "e" $end = substr($string, -4); # "have" $piece = substr($string, -8, 3); # "you"
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Establishing a Default Value
- InhaltsvorschauYou would like to supply a default value to a scalar variable, but only if it doesn't already have one. It often happens that you want a hardcoded default value for a variable that can be overridden from the command line or through an environment variable.Use the
||or||=operator, which work on both strings and numbers:# use $b if $b is true, else $c $a = $b || $c; # set $x to $y unless $x is already true $x ||= $y;
If0, "0", and "" are valid values for your variables, usedefinedinstead:# use $b if $b is defined, else $c $a = defined($b) ? $b : $c; # the "new" defined-or operator from future perl use v5.9; $a = $b // $c;
The big difference between the two techniques (definedand||) is what they test: definedness versus truth. Three defined values are still false in the world of Perl:0, "0", and "". If your variable already held one of those, and you wanted to keep that value, a||wouldn't work. You'd have to use the more elaborate three-way test withdefinedinstead. It's often convenient to arrange for your program to care about only true or false values, not defined or undefined ones.Rather than being restricted in its return values to a mere 1 or 0 as in most other languages, Perl's||operator has a much more interesting property: it returns its first operand (the lefthand side) if that operand is true; otherwise it returns its second operand. The&&operator also returns the last evaluated expression, but is less often used for this property. These operators don't care whether their operands are strings, numbers, or references—any scalar will do. They just return the first one that makes the whole expression true or false. This doesn't affect the Boolean sense of the return value, but it does make the operators' return values more useful.Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Exchanging Values Without Using Temporary Variables
- InhaltsvorschauYou want to exchange the values of two scalar variables, but don't want to use a temporary variable.Use list assignment to reorder the variables.
($VAR1, $VAR2) = ($VAR2, $VAR1);
Most programming languages require an intermediate step when swapping two variables' values:$temp = $a; $a = $b; $b = $temp;
Not so in Perl. It tracks both sides of the assignment, guaranteeing that you don't accidentally clobber any of your values. This eliminates the temporary variable:$a = "alpha"; $b = "omega"; ($a, $b) = ($b, $a); # the first shall be last -- and versa vice
You can even exchange more than two variables at once:($alpha, $beta, $production) = qw(January March August); # move beta to alpha, # move production to beta, # move alpha to production ($alpha, $beta, $production) = ($beta, $production, $alpha);
When this code finishes,$alpha,$beta, and$productionhave the values "March", "August", and "January".The section on "List value constructors" in perldata(1) and on "List Values and Arrays" in Chapter 2 of Programming PerlEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Converting Between Characters and Values
- InhaltsvorschauYou want to print the number represented by a given character, or you want to print a character given a number.Use
ordto convert a character to a number, or usechrto convert a number to its corresponding character:$num = ord($char); $char = chr($num);
The%cformat used inprintfandsprintfalso converts a number to a character:$char = sprintf("%c", $num); # slower than chr($num) printf("Number %d is character %c\n", $num, $num); Number 101 is character eAC*template used withpackandunpackcan quickly convert many 8-bit bytes; similarly, useU*for Unicode characters.@bytes = unpack("C*", $string); $string = pack("C*", @bytes); $unistr = pack("U4",0x24b6,0x24b7,0x24b8,0x24b9); @unichars = unpack("U*", $unistr);Unlike low-level, typeless languages such as assembler, Perl doesn't treat characters and numbers interchangeably; it treats strings and numbers interchangeably. That means you can't just assign characters and numbers back and forth. Perl provides Pascal'schrandordto convert between a character and its corresponding ordinal value:$value = ord("e"); # now 101 $character = chr(101); # now "e"If you already have a character, it's really represented as a string of length one, so just print it out directly usingprintor the%sformat inprintfandsprintf. The%cformat forcesprintforsprintfto convert a number into a character; it's not used for printing a character that's already in character format (that is, a string).printf("Number %d is character %c\n", 101, 101);Thepack,unpack,chr, andordfunctions are all faster thanEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Using Named Unicode Characters
- InhaltsvorschauYou want to use Unicode names for fancy characters in your code without worrying about their code points.Place a
usecharnamesat the top of your file, then freely insert "\N{CHARSPEC}" escapes into your string literals.Theusecharnamespragma lets you use symbolic names for Unicode characters. These are compile-time constants that you access with the\N{CHARSPEC} double-quoted string sequence. Several subpragmas are supported. The:fullsubpragma grants access to the full range of character names, but you have to write them out in full, exactly as they occur in the Unicode character database, including the loud, all-capitals notation. The:shortsubpragma gives convenient shortcuts. Any import without a colon tag is taken to be a script name, giving case-sensitive shortcuts for those scripts.use charnames ':full'; print "\N{GREEK CAPITAL LETTER DELTA} is called delta.\n"; Δ is called delta use charnames ':short'; print "\N{greek:Delta} is an upper-case delta.\n"; Δ is an upper-case delta use charnames qw(cyrillic greek); print "\N{Sigma} and \N{sigma} are Greek sigmas.\n"; print "\N{Be} and \N{be} are Cyrillic bes.\n"; Σ and σ are Greek sigmas Б and б are Cyrillic besTwo functions,charnames::viacodeandcharnames::vianame, can translate between numeric code points and the long names. The Unicode documents use the notation U+XXXX to indicate the Unicode character whose code point is XXXX, so we'll use that here in our output.use charnames qw(:full); for $code (0xC4, 0x394) { printf "Character U+%04X (%s) is named %s\n", $code, chr($code), charnames::viacode($code); } Character U+00C4 (Ä) is named LATIN CAPITAL LETTER A WITH DIAERESISEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Processing a String One Character at a Time
- InhaltsvorschauYou want to process a string one character at a time.Use
splitwith a null pattern to break up the string into individual characters, or useunpackif you just want the characters' values:@array = split(//, $string); # each element a single character @array = unpack("U*", $string); # each element a code point (number)Or extract each character in turn with a loop:while (/(.)/g) { # . is never a newline here # $1 has character, ord($1) its number }As we said before, Perl's fundamental unit is the string, not the character. Needing to process anything a character at a time is rare. Usually some kind of higher-level Perl operation, like pattern matching, solves the problem more handily. See, for example, Recipe 7.14, where a set of substitutions is used to find command-line arguments.Splitting on a pattern that matches the empty string returns a list of individual characters in the string. This is a convenient feature when done intentionally, but it's easy to do unintentionally. For instance,/X*/matches all possible strings, including the empty string. Odds are you will find others when you don't mean to.Here's an example that prints the characters used in the string "anappleaday", sorted in ascending order:%seen = ( ); $string = "an apple a day"; foreach $char (split //, $string) { $seen{$char}++; } print "unique chars are: ", sort(keys %seen), "\n"; unique chars are: adelnpyThesesplitandunpacksolutions give an array of characters to work with. If you don't want an array, use a pattern match with the/gflag in awhileloop, extracting one character at a time:%seen = ( ); $string = "an apple a day"; while ($string =~ /(.)/g) { $seen{$1}++; } print "unique chars are: ", sort(keys %seen), "\n";Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Reversing a String by Word or Character
- InhaltsvorschauYou want to reverse the words or characters of a string.Use the
reversefunction in scalar context for flipping characters:$revchars = reverse($string);
To flip words, usereversein list context withsplitandjoin:$revwords = join(" ", reverse split(" ", $string));Thereversefunction is two different functions in one. Called in scalar context, it joins together its arguments and returns that string in reverse order. Called in list context, it returns its arguments in the opposite order. When usingreversefor its character-flipping behavior, usescalarto force scalar context unless it's entirely obvious.$gnirts = reverse($string); # reverse letters in $string @sdrow = reverse(@words); # reverse elements in @words $confused = reverse(@words); # reverse letters in join("", @words)Here's an example of reversing words in a string. Using a single space, " ", as the pattern tosplitis a special case. It causessplitto use contiguous whitespace as the separator and also discard leading null fields, just like awk. Normally,splitdiscards only trailing null fields.# reverse word order $string = 'Yoda said, "can you see this?"'; @allwords = split(" ", $string); $revwords = join(" ", reverse @allwords); print $revwords, "\n"; this?" see you "can said, YodaWe could remove the temporary array@allwordsand do it on one line:$revwords = join(" ", reverse split(" ", $string));Multiple whitespace in$stringbecomes a single space in$revwords. If you want to preserve whitespace, use this:$revwords = join("", reverse split(/(\s+)/, $string));One use ofreverseis to test whether a word is a palindrome (a word that reads the same backward or forward):Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Treating Unicode Combined Characters as Single Characters
- InhaltsvorschauYou have a Unicode string that contains combining characters, and you'd like to treat each of these sequences as a single logical character.Process them using
\Xin a regular expression.$string = "fac\x{0327}ade"; # "façade" $string =~ /fa.ade/; # fails $string =~ /fa\Xade/; # succeeds @chars = split(//, $string); # 7 letters in @chars @chars = $string =~ /(.)/g; # same thing @chars = $string =~ /(\X)/g; # 6 "letters" in @charsIn Unicode, you can combine a base character with one or more non-spacing characters following it; these are usually diacritics, such as accent marks, cedillas, and tildas. Due to the presence of precombined characters, for the most part to accommodate legacy character systems, there can be two or more ways of writing the same thing.For example, the word "façade" can be written with one character between the two a's, "\x{E7}", a character right out of Latin1 (ISO 8859-1). These characters might be encoded into a two-byte sequence under the UTF-8 encoding that Perl uses internally, but those two bytes still only count as one single character. That works just fine.There's a thornier issue. Another way to write U+00E7 is with two different code points: a regular "c" followed by "\x{0327}". Code point U+0327 is a non-spacing combining character that means to go back and put a cedilla underneath the preceding base character.There are times when you want Perl to treat each combined character sequence as one logical character. But because they're distinct code points, Perl's character-related operations treat non-spacing combining characters as separate characters, includingsubstr,length, and regular expression metacharacters, such as in/./or/[^abc]/.In a regular expression, theEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Canonicalizing Strings with Unicode Combined Characters
- InhaltsvorschauYou have two strings that look the same when you print them out, but they don't test as string equal and sometimes even have different lengths. How can you get Perl to consider them the same strings?When you have otherwise equivalent strings, at least some of which contain Unicode combining character sequences, instead of comparing them directly, compare the results of running them through the
NFD( )function from the Unicode::Normalize module.use Unicode::Normalize; $s1 = "fa\x{E7}ade"; $s2 = "fac\x{0327}ade"; if (NFD($s1) eq NFD($s2)) { print "Yup!\n" }The same character sequence can sometimes be specified in multiple ways. Sometimes this is because of legacy encodings, such as the letters from Latin1 that contain diacritical marks. These can be specified directly with a single character (like U+00E7, LATIN SMALL LETTER C WITH CEDILLA) or indirectly via the base character (like U+0063, LATIN SMALL LETTER C) followed by a combining character (U+0327, COMBINING CEDILLA).Another possibility is that you have two or more marks following a base character, but the order of those marks varies in your data. Imagine you wanted the letter "c" to have both a cedilla and a caron on top of it in order to print a ̌. That could be specified in any of these ways:$string = v231.780; # LATIN SMALL LETTER C WITH CEDILLA # COMBINING CARON $string = v99.807.780; # LATIN SMALL LETTER C # COMBINING CARON # COMBINING CEDILLA $string = v99.780.807 # LATIN SMALL LETTER C # COMBINING CEDILLA # COMBINING CARON
The normalization functions rearrange those into a reliable ordering. Several are provided, includingNFD( )for canonical decomposition andNFC( )for canonical decomposition followed by canonical composition. No matter which of these three ways you used to specify yourEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Treating a Unicode String as Octets
- InhaltsvorschauYou have a Unicode string but want Perl to treat it as octets (e.g., to calculate its length or for purposes of I/O).The
use bytespragma makes all Perl operations in its lexical scope treat the string as a group of octets. Use it when your code is calling Perl's character-aware functions directly:$ff = "\x{FB00}"; # ff ligature $chars = length($ff); # length is one character { use bytes; # force byte semantics $octets = length($ff); # length is two octets } $chars = length($ff); # back to character semanticsAlternatively, theEncodemodule lets you convert a Unicode string to a string of octets, and back again. Use it when the character-aware code isn't in your lexical scope:use Encode qw(encode_utf8); sub somefunc; # defined elsewhere $ff = "\x{FB00}"; # ff ligature $ff_oct = encode_utf8($ff); # convert to octets $chars = somefunc($ff); # work with character string $octets = somefunc($ff_oct); # work with octet stringAs explained in this chapter's Introduction, Perl knows about two types of string: those made of simple uninterpreted octets, and those made of Unicode characters whose UTF-8 representation may require more than one octet. Each individual string has a flag associated with it, identifying the string as either UTF-8 or octets. Perl's I/O and string operations (such aslength) check this flag and give character or octet semantics accordingly.Sometimes you need to work with bytes and not characters. For example, many protocols have aContent-Lengthheader that specifies the size of the body of a message in octets. You can't simply use Perl'slengthfunction to calculate the size, because if the string you're callinglengthon is marked as UTF-8, you'll get the size in characters.Theuse bytesEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Expanding and Compressing Tabs
- InhaltsvorschauYou want to convert tabs in a string to the appropriate number of spaces, or vice versa. Converting spaces into tabs can be used to reduce file size when the file has many consecutive spaces. Converting tabs into spaces may be required when producing output for devices that don't understand tabs or think them at different positions than you do.Either use a rather funny looking substitution:
while ($string =~ s/\t+/' ' x (length($&) * 8 - length($`) % 8)/e) { # spin in empty loop until substitution finally fails }or use the standard Text::Tabs module:use Text::Tabs; @expanded_lines = expand(@lines_with_tabs); @tabulated_lines = unexpand(@lines_without_tabs);
Assuming tab stops are set every N positions (where N is customarily eight), it's easy to convert them into spaces. The standard textbook method does not use the Text::Tabs module but suffers slightly from being difficult to understand. Also, it uses the$`variable, whose very mention currently slows down every pattern match in the program. This is explained in Special Variables in Chapter 6. You could use this algorithm to make a filter to expand its input's tabstops to eight spaces each:while (<>) { 1 while s/\t+/' ' x (length($&) * 8 - length($`) % 8)/e; print; }To avoid$`, you could use a slightly more complicated alternative that uses the numbered variables for explicit capture; this one expands tabstops to four each instead of eight:1 while s/^(.*?)(\t+)/$1 . ' ' x (length($2) * 4 - length($1) % 4)/e;
Another approach is to use the offsets directly from the@+and@-arrays. This also expands to four-space positions:1 while s/\t+/' ' x (($+[0] - $-[0]) * 4 - $-[0] % 4)/e;
If you're looking at all of these1 whileloops and wondering why they couldn't have been written as part of a simpleEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Expanding Variables in User Input
- InhaltsvorschauYou've read a string with an embedded variable reference, such as:
You owe $debt to me.
Now you want to replace$debtin the string with its value.Use a substitution with symbolic references if the variables are all globals:$text =~ s/\$(\w+)/${$1}/g;But use a double/eeif they might be lexical (my) variables:$text =~ s/(\$\w+)/$1/gee;
The first technique is basically to find what looks like a variable name, then use symbolic dereferencing to interpolate its contents. If$1contains the stringsomevar,${$1}will be whatever$somevarcontains. This won't work if theusestrict'refs' pragma is in effect because that bans symbolic dereferencing.Here's an example:our ($rows, $cols); no strict 'refs'; # for ${$1}/g below my $text; ($rows, $cols) = (24, 80); $text = q(I am $rows high and $cols long); # like single quotes! $text =~ s/\$(\w+)/${$1}/g; print $text; I am 24 high and 80 longYou may have seen the/esubstitution modifier used to evaluate the replacement as code rather than as a string. It's designed for situations where you don't know the exact replacement value, but you do know how to calculate it. For example, doubling every whole number in a string:$text = "I am 17 years old"; $text =~ s/(\d+)/2 * $1/eg;
When Perl is compiling your program and sees a/eon a substitute, it compiles the code in the replacement block along with the rest of your program, long before the substitution actually happens. When a substitution is made,$1is replaced with the string that matched. The code to evaluate would then be something like:2 * 17
If we tried saying:$text = 'I am $AGE years old'; # note single quotes $text =~ s/(\$\w+)/$1/eg; # WRONG
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Controlling Case
- InhaltsvorschauA string in uppercase needs converting to lowercase, or vice versa.Use the
lcanducfunctions or the\Land\Ustring escapes.$big = uc($little); # "bo peep" -> "BO PEEP" $little = lc($big); # "JOHN" -> "john" $big = "\U$little"; # "bo peep" -> "BO PEEP" $little = "\L$big"; # "JOHN" -> "john"
To alter just one character, use thelcfirstanducfirstfunctions or the\land\ustring escapes.$big = "\u$little"; # "bo" -> "Bo" $little = "\l$big"; # "BoPeep" -> "boPeep"
The functions and string escapes look different, but both do the same thing. You can set the case of either just the first character or the whole string. You can even do both at once to force uppercase (actually, titlecase; see later explanation) on initial characters and lowercase on the rest.$beast = "dromedary"; # capitalize various parts of $beast $capit = ucfirst($beast); # Dromedary $capit = "\u\L$beast"; # (same) $capall = uc($beast); # DROMEDARY $capall = "\U$beast"; # (same) $caprest = lcfirst(uc($beast)); # dROMEDARY $caprest = "\l\U$beast"; # (same)
These capitalization-changing escapes are commonly used to make a string's case consistent:# titlecase each word's first character, lowercase the rest $text = "thIS is a loNG liNE"; $text =~ s/(\w+)/\u\L$1/g; print $text; This Is A Long LineYou can also use these for case-insensitive comparison:Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Properly Capitalizing a Title or Headline
- InhaltsvorschauYou have a string representing a headline, the title of book, or some other work that needs proper capitalization.Use a variant of this
tc( )titlecasing function:INIT { our %nocap; for (qw( a an the and but or as at but by for from in into of off on onto per to with )) { $nocap{$_}++; } } sub tc { local $_ = shift; # put into lowercase if on stop list, else titlecase s/(\pL[\pL']*)/$nocap{$1} ? lc($1) : ucfirst(lc($1))/ge; s/^(\pL[\pL']*) /\u\L$1/x; # first word guaranteed to cap s/ (\pL[\pL']*)$/\u\L$1/x; # last word guaranteed to cap # treat parenthesized portion as a complete title s/\( (\pL[\pL']*) /(\u\L$1/x; s/(\pL[\pL']*) \) /\u\L$1)/x; # capitalize first word following colon or semi-colon s/ ( [:;] \s+ ) (\pL[\pL']* ) /$1\u\L$2/x; return $_; }The rules for correctly capitalizing a headline or title in English are more complex than simply capitalizing the first letter of each word. If that's all you need to do, something like this should suffice:s/(\w+\S*\w*)/\u\L$1/g;
Most style guides tell you that the first and last words in the title should always be capitalized, along with every other word that's not an article, the particle "to" in an infinitive construct, a coordinating conjunction, or a preposition.Here's a demo, this time demonstrating the distinguishing property of titlecase. Assume thetcfunction is as defined in the Solution.# with apologies (or kudos) to Stephen Brust, PJF, # and to JRRT, as always. @data = ( "the enchantress of \x{01F3}ur mountain", "meeting the enchantress of \x{01F3}ur mountain", "the lord of the rings: the fellowship of the ring", ); $mask = "%-20s: %s\n"; sub tc_lame { local $_ = shift; s/(\w+\S*\w*)/\u\L$1/g; return $_; } for $datum (@data) { printf $mask, "ALL CAPITALS", uc($datum); printf $mask, "no capitals", lc($datum); printf $mask, "simple titlecase", tc_lame($datum); printf $mask, "better titlecase", tc($datum); print "\n"; }Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Interpolating Functions and Expressions Within Strings
- InhaltsvorschauYou want a function call or expression to expand within a string. This lets you construct more complex templates than with simple scalar variable interpolation.Break up your expression into distinct concatenated pieces:
$answer = $var1 . func( ) . $var2; # scalar only
Or use the slightly sneaky@{[LISTEXPR]}or${\(SCALAREXPR)}expansions:$answer = "STRING @{[ LIST EXPR ]} MORE STRING"; $answer = "STRING ${\( SCALAR EXPR )} MORE STRING";This code shows both techniques. The first line shows concatenation; the second shows the expansion trick:$phrase = "I have " . ($n + 1) . " guanacos."; $phrase = "I have ${\($n + 1)} guanacos.";The first technique builds the final string by concatenating smaller strings, avoiding interpolation but achieving the same end. Becauseprinteffectively concatenates its entire argument list, if we were going toprint$phrase, we could have just said:print "I have ", $n + 1, " guanacos.\n";
When you absolutely must have interpolation, you need the punctuation-riddled interpolation from the Solution. Only@,$, and\are special within double quotes and most backquotes. (As withm//ands///, theqx( )synonym is not subject to double-quote expansion if its delimiter is single quotes!$home=qx'echohomeis$HOME';would get the shell$HOMEvariable, not one in Perl.) So, the only way to force arbitrary expressions to expand is by expanding a${ }or@{ }whose block contains a reference.In the example:$phrase = "I have ${\( count_em( ) )} guanacos.";the function call within the parentheses is not in scalar context; it is still in list context. The following overrules that:Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Indenting Here Documents
- InhaltsvorschauWhen using the multiline quoting mechanism called a here document, the text must be flush against the margin, which looks out of place in the code. You would like to indent the here document text in the code, but not have the indentation appear in the final string value.Use a
s///operator to strip out leading whitespace.# all in one ($var = << HERE_TARGET) =~ s/^\s+//gm; your text goes here HERE_TARGET # or with two steps $var = << HERE_TARGET; your text goes here HERE_TARGET $var =~ s/^\s+//gm;The substitution is straightforward. It removes leading whitespace from the text of the here document. The/mmodifier lets the^character match at the start of each line in the string, and the/gmodifier makes the pattern-matching engine repeat the substitution as often as it can (i.e., for every line in the here document).($definition = << 'FINIS') =~ s/^\s+//gm; The five varieties of camelids are the familiar camel, his friends the llama and the alpaca, and the rather less well-known guanaco and vicuña. FINISBe warned: all patterns in this recipe use\s, meaning one whitespace character, which will also match newlines. This means they will remove any blank lines in your here document. If you don't want this, replace\swith[^\S\n]in the patterns.The substitution uses the property that the result of an assignment can be used as the lefthand side of=~. This lets us do it all in one line, but works only when assigning to a variable. When you're using the here document directly, it would be considered a constant value, and you wouldn't be able to modify it. In fact, you can't change a here document's value unless you first put it into a variable.Not to worry, though, because there's an easy way around this, particularly if you're going to do this a lot in the program. Just write a subroutine:Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Reformatting Paragraphs
- InhaltsvorschauYour string is too big to fit the screen, and you want to break it up into lines of words, without splitting a word between lines. For instance, a style correction script might read a text file a paragraph at a time, replacing bad phrases with good ones. Replacing a phrase like utilizes the inherent functionality of with uses will change the length of lines, so it must somehow reformat the paragraphs when they're output.Use the standard Text::Wrap module to put line breaks at the right place:
use Text::Wrap; @output = wrap($leadtab, $nexttab, @para);
Or use the more discerning CPAN module, Text::Autoformat, instead:use Text::Autoformat; $formatted = autoformat $rawtext;
The Text::Wrap module provides thewrapfunction, shown in Example 1-3, which takes a list of lines and reformats them into a paragraph with no line more than$Text::Wrap::columnscharacters long. We set$columnsto 20, ensuring that no line will be longer than 20 characters. We passwraptwo arguments before the list of lines: the first is the indent for the first line of output, the second the indent for every subsequent line.Example 1-3. wrapdemo#!/usr/bin/perl -w # wrapdemo - show how Text::Wrap works @input = ("Folding and splicing is the work of an editor,", "not a mere collection of silicon", "and", "mobile electrons!"); use Text::Wrap qw($columns &wrap); $columns = 20; print "0123456789" x 2, "\n"; print wrap(" ", " ", @input), "\n";The result of this program is:01234567890123456789 Folding and splicing is the work of an editor, not a mere collection of silicon andEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Escaping Characters
- InhaltsvorschauYou need to output a string with certain characters (quotes, commas, etc.) escaped. For instance, you're producing a format string for
sprintfand want to convert literal%signs into%%.Use a substitution to backslash or double each character to be escaped:# backslash $var =~ s/([CHARLIST])/\\$1/g; # double $var =~ s/([CHARLIST])/$1$1/g;
$varis the variable to be altered. TheCHARLISTis a list of characters to escape and can contain backslash escapes like\tand\n. If you just have one character to escape, omit the brackets:$string =~ s/%/%%/g;
The following code lets you do escaping when preparing strings to submit to the shell. (In practice, you would need to escape more than just ' and " to make any arbitrary string safe for the shell. Getting the list of characters right is so hard, and the risks if you get it wrong are so great, that you're better off using the list form ofsystemandexecto run programs, shown in Recipe 16.2. They avoid the shell altogether.)$string = q(Mom said, "Don't do that."); $string =~ s/(['"])/\\$1/g;
We had to use two backslashes in the replacement because the replacement section of a substitution is read as a double-quoted string, and to get one backslash, you need to write two. Here's a similar example for VMS DCL, where you need to double every quote to get one through:$string = q(Mom said, "Don't do that."); $string =~ s/(['"])/$1$1/g;
Microsoft command interpreters are harder to work with. In Windows, COMMAND.COM recognizes double quotes but not single ones, disregards backquotes for running commands, and requires a backslash to make a double quote into a literal. Any of the many free or commercial Unix-like shell environments available for Windows will work just fine, though.Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Trimming Blanks from the Ends of a String
- InhaltsvorschauYou have read a string that may have leading or trailing whitespace, and you want to remove it.Use a pair of pattern substitutions to get rid of them:
$string =~ s/^\s+//; $string =~ s/\s+$//;
Or write a function that returns the new value:$string = trim($string); @many = trim(@many); sub trim { my @out = @_; for (@out) { s/^\s+//; # trim left s/\s+$//; # trim right } return @out = = 1 ? $out[0] # only one to return : @out; # or many }This problem has various solutions, but this one is the most efficient for the common case. This function returns new versions of the strings passed in to it with their leading and trailing whitespace removed. It works on both single strings and lists.To remove the last character from the string, use thechopfunction. Be careful not to confuse this with the similar but differentchompfunction, which removes the last part of the string contained within that variable if and only if it is contained in the$/variable, "\n" by default. These are often used to remove the trailing newline from input:# print what's typed, but surrounded by > < symbols while (<STDIN>) { chomp; print ">$_<\n"; }This function can be embellished in any of several ways.First, what should you do if several strings are passed in, but the return context demands a single scalar? As written, the function given in the Solution does a somewhat silly thing: it (inadvertently) returns a scalar representing the number of strings passed in. This isn't very useful. You could issue a warning or raise an exception. You could also squash the list of return values together.For strings with spans of extra whitespace at points other than their ends, you could have your function collapse any remaining stretch of whitespace characters in the interior of the string down to a single space each by adding this line as the new last line of the loop:Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Parsing Comma-Separated Data
- InhaltsvorschauYou have a data file containing comma-separated values that you need to read, but these data fields may have quoted commas or escaped quotes in them. Most spreadsheets and database programs use comma-separated values as a common interchange format.If your data file follows normal Unix quoting and escaping conventions, where quotes within a field are backslash-escaped "
like \"this\"", use the standard Text::ParseWords and this simple code:use Text::ParseWords; sub parse_csv0 { return quotewords("," => 0, $_[0]); }However, if quotes within a field are doubled "like ""this""", you could use the following procedure from Mastering Regular Expressions, Second Edition:sub parse_csv1 { my $text = shift; # record containing comma-separated values my @fields = ( ); while ($text =~ m{ # Either some non-quote/non-comma text: ( [^"',] + ) # ...or... | # ...a double-quoted field: (with "" allowed inside) " # field's opening quote; don't save this ( now a field is either (?: [^"] # non-quotes or | "" # adjacent quote pairs ) * # any number ) " # field's closing quote; unsaved }gx) { if (defined $1) { $field = $1; } else { ($field = $2) =~ s/""/"/g; } push @fields, $field; } return @fields; }Or use the CPAN Text:CSV module:use Text::CSV; sub parse_csv1 { my $line = shift; my $csv = Text::CSV->new( ); return $csv->parse($line) && $csv->fields( ); }Or use the CPAN Tie::CSV_File module:tie @data, "Tie::CSV_File", "data.csv"; for ($i = 0; $i < @data; $i++) { printf "Row %d (Line %d) is %s\n", $i, $i+1, "@{$data[$i]}"; for ($j = 0; $j < @{$data[$i]}; $j++) { print "Column $j is <$data[$i][$j]>\n"; } }Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Constant Variables
- InhaltsvorschauYou want a variable whose value cannot be modified once set.If you don't need it to be a scalar variable that can interpolate, the
useconstantpragma will work:use constant AVOGADRO => 6.02252e23; printf "You need %g of those for guac\n", AVOGADRO;
If it does have to be a variable, assign to the typeglob a reference to a literal string or number, then use the scalar variable:*AVOGADRO = \6.02252e23; print "You need $AVOGADRO of those for guac\n";
But the most foolproof way is via a smalltieclass whoseSTOREmethod raises an exception:package Tie::Constvar; use Carp; sub TIESCALAR { my ($class, $initval) = @_; my $var = $initval; return bless \$var => $class; } sub FETCH { my $selfref = shift; return $$selfref; } sub STORE { confess "Meddle not with the constants of the universe"; }Theuse constantpragma is the easiest to use, but has a few drawbacks. The biggest one is that it doesn't give you a variable that you can expand in double-quoted strings. Another is that it isn't scoped; it puts a subroutine of that name into the package namespace.The way the pragma really works is to create a subroutine of that name that takes no arguments and always returns the same value (or values if a list is provided). That means it goes into the current package's namespace and isn't scoped. You could do the same thing yourself this way:sub AVOGADRO( ) { 6.02252e23 }If you wanted it scoped to the current block, you could make a temporary subroutine by assigning an anonymous subroutine to the typeglob of that name:use subs qw(AVOGADRO); local *AVOGADRO = sub ( ) { 6.02252e23 };But that's pretty magical, so you should comment the code if you don't plan to use the pragma.If instead of assigning to the typeglob a reference to a subroutine, you assign to it a reference to a constant scalar, then you'll be able to use the variable of that name. That's the second technique given in the Solution. Its disadvantage is that typeglobs are available only for package variables, not for lexicals created viaEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Soundex Matching
- InhaltsvorschauYou have two English surnames and want to know whether they sound somewhat similar, regardless of spelling. This would let you offer users a "fuzzy search" of names in a telephone book to catch "Smith" and "Smythe" and others within the set, such as "Smite" and "Smote".Use the standard Text::Soundex module:
use Text::Soundex; $CODE = soundex($STRING); @CODES = soundex(@LIST);
Or use the CPAN module Text::Metaphone:use Text::Metaphone; $phoned_words = Metaphone('Schwern');The soundex algorithm hashes words (particularly English surnames) into a small space using a simple model that approximates an English speaker's pronunciation of the words. Roughly speaking, each word is reduced to a four-character string. The first character is an uppercase letter; the remaining three are digits. By comparing the soundex values of two strings, we can guess whether they sound similar.The following program prompts for a name and looks for similarly sounding names from the password file. This same approach works on any database with names, so you could key the database on the soundex values if you wanted to. Such a key wouldn't be unique, of course.use Text::Soundex; use User::pwent; print "Lookup user: "; chomp($user =<STDIN>); exit unless defined $user; $name_code = soundex($user); while ($uent = getpwent( )) { ($firstname, $lastname) = $uent->gecos =~ /(\w+)[^,]*\b(\w+)/; if ($name_code eq soundex($uent->name) || $name_code eq soundex($lastname) || $name_code eq soundex($firstname) ) { printf "%s: %s %s\n", $uent->name, $firstname, $lastname; } }The Text::Metaphone module from CPAN addresses the same problem in a different and better way. Thesoundexfunction returns a letter and a three-digit code that maps just the beginning of the input string, whereasEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Program: fixstyle
- InhaltsvorschauImagine you have a table with both old and new strings, such as the following:Old wordsNew wordsbonnethoodrubbereraserlorrytrucktrouserspantsThe program in Example 1-4 is a filter that changes all occurrences of each element in the first set to the corresponding element in the second set.When called without filename arguments, the program is a simple filter. If filenames are supplied on the command line, an in-place edit writes the changes to the files, with the original versions saved in a file with a "
.orig" extension. See Recipe 7.16 for a description. A -v command-line option writes notification of each change to standard error.The table of original strings and their replacements is stored below_ _END_ _in the main program, as described in Recipe 7.12. Each pair of strings is converted into carefully escaped substitutions and accumulated into the$codevariable like the popgrep2 program in Recipe 6.10.A-tcheck to test for an interactive run check tells whether we're expecting to read from the keyboard if no arguments are supplied. That way if users forget to give an argument, they aren't wondering why the program appears to be hung.Example 1-4. fixstyle#!/usr/bin/perl -w # fixstyle - switch first set of <DATA> strings to second set # usage: $0 [-v] [files ...] use strict; my $verbose = (@ARGV && $ARGV[0] eq '-v' && shift); if (@ARGV) { $^I = ".orig"; # preserve old files } else { warn "$0: Reading from stdin\n" if -t STDIN; } my $code = "while (<>) {\n"; # read in config, build up code to eval while (<DATA>) { chomp; my ($in, $out) = split /\s*=>\s*/; next unless $in && $out; $code .= "s{\\Q$in\\E}{$out}g"; $code .= "&& printf STDERR qq($in => $out at \$ARGV line \$.\\n)" if $verbose; $code .= ";\n"; } $code .= "print;\n}\n"; eval "{ $code } 1" || die; _ _END_ _ analysed => analyzed built-in => builtin chastized => chastised commandline => command-line de-allocate => deallocate dropin => drop-in hardcode => hard-code meta-data => metadata multicharacter => multi-character multiway => multi-way non-empty => nonempty non-profit => nonprofit non-trappable => nontrappable pre-define => predefine preextend => pre-extend re-compiling => recompiling reenter => re-enter turnkey => turn-keyEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Program: psgrep
- InhaltsvorschauMany programs, including ps, netstat, lsof, ls -l, find -ls, and tcpdump, can produce more output than can be conveniently summarized. Logfiles also often grow too long to be easily viewed. You could send these through a filter like grep to pick out only certain lines, but regular expressions and complex logic don't mix well; just look at the hoops we jump through in Recipe 6.18.What we'd really like is to make full queries on the program output or logfile. For example, to ask ps something like, "Show me all processes that exceed 10K in size but which aren't running as the superuser" or "Which commands are running on pseudo-ttys?"The psgrep program does this—and infinitely more—because the specified selection criteria are not mere regular expressions; they're full Perl code. Each criterion is applied in turn to every line of output. Only lines matching all arguments are output. The following is a list of things to find and how to find them.Lines containing "sh" at the end of a word:
% psgrep '/sh\b/'
Processes whose command names end in "sh":% psgrep 'command =~ /sh$/'
Processes running with a user ID below 10:% psgrep 'uid < 10'
Login shells with active ttys:% psgrep 'command =~ /^-/' 'tty ne "?"'
Processes running on pseudo-ttys:% psgrep 'tty =~ /^[p-t]/'
Non-superuser processes running detached:% psgrep 'uid && tty eq "?"'
Huge processes that aren't owned by the superuser:% psgrep 'size > 10 * 2**10' 'uid != 0'
The last call to psgrep produced the following output when run on our system. As one might expect, only netscape and its spawn qualified.FLAGS UID PID PPID PRI NI SIZE RSS WCHAN STA TTY TIME COMMAND 0 101 9751 1 0 0 14932 9652 do_select S p1 0:25 netscape 100000 101 9752 9751 0 0 10636 812 do_select S p1 0:00 (dns helper)Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Chapter 2: Numbers
- InhaltsvorschauAnyone who considers arithmetical methods of producing random digits is, of course, in a state of sin.—John von Neumann (1951)Numbers, the most basic data type of almost any programming language, can be surprisingly tricky. Random numbers, numbers with decimal points, series of numbers, and conversion between strings and numbers all pose trouble.Perl works hard to make life easy for you, and the facilities it provides for manipulating numbers are no exception to that rule. If you treat a scalar value as a number, Perl converts it to one. This means that when you read ages from a file, extract digits from a string, or acquire numbers from any of the other myriad textual sources that Real Life pushes your way, you don't need to jump through the hoops created by other languages' cumbersome requirements to turn an ASCII string into a number.Perl tries its best to interpret a string as a number when you use it as one (such as in a mathematical expression), but it has no direct way of reporting that a string doesn't represent a valid number. Perl quietly converts non-numeric strings to zero, and it will stop converting the string once it reaches a non-numeric character—so "
A7" is still0, and "7A" is just7. (Note, however, that the -w flag will warn of such improper conversions.) Sometimes, such as when validating input, you need to know whether a string represents a valid number. We show you how in Recipe 2.1.Recipe 2.15 shows how to get a number from strings containing hexadecimal, octal, or binary representations of numbers such as "0xff", "0377", and "0b10110". Perl automatically converts numeric literals of these non-decimal bases that occur in your program code (so$a=3+0xffwill set$ato 258) but not data read by that program (you can't read "ff" or even "0xff" into$band then say$a=3Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Introduction
- InhaltsvorschauNumbers, the most basic data type of almost any programming language, can be surprisingly tricky. Random numbers, numbers with decimal points, series of numbers, and conversion between strings and numbers all pose trouble.Perl works hard to make life easy for you, and the facilities it provides for manipulating numbers are no exception to that rule. If you treat a scalar value as a number, Perl converts it to one. This means that when you read ages from a file, extract digits from a string, or acquire numbers from any of the other myriad textual sources that Real Life pushes your way, you don't need to jump through the hoops created by other languages' cumbersome requirements to turn an ASCII string into a number.Perl tries its best to interpret a string as a number when you use it as one (such as in a mathematical expression), but it has no direct way of reporting that a string doesn't represent a valid number. Perl quietly converts non-numeric strings to zero, and it will stop converting the string once it reaches a non-numeric character—so "
A7" is still0, and "7A" is just7. (Note, however, that the -w flag will warn of such improper conversions.) Sometimes, such as when validating input, you need to know whether a string represents a valid number. We show you how in Recipe 2.1.Recipe 2.15 shows how to get a number from strings containing hexadecimal, octal, or binary representations of numbers such as "0xff", "0377", and "0b10110". Perl automatically converts numeric literals of these non-decimal bases that occur in your program code (so$a=3+0xffwill set$ato 258) but not data read by that program (you can't read "ff" or even "0xff" into$band then say$a=3+$bto make$abecome 258).As if integers weren't giving us enough grief, floating-point numbers can cause even more headaches. Internally, a computer represents numbers with decimal points as floating-point numbers in binary format. Floating-point numbers are not the same as real numbers; they are an approximation of real numbers, with limited precision. Although infinitely many real numbers exist, you only have finite space to represent them, usually about 64 bits or so. You have to cut corners to fit them all in.Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Checking Whether a String Is a Valid Number
- InhaltsvorschauYou want to check whether a string represents a valid number. This is a common problem when validating input, as in CGI scripts, configuration files, and command-line arguments.Compare it against a regular expression that matches the kinds of numbers you're interested in:
if ($string =~ /PATTERN/) { # is a number } else { # is not }Or use the patterns provided by the CPAN module Regexp::Common:if ($string =~ m{^$RE{num}{real}$}) { # is a real number } else { # is not }This problem gets to the heart of what we mean by a number. Even things that sound simple, like integer, make you think hard about what you will accept; for example, "Is a leading + for positive numbers optional, mandatory, or forbidden?" The many ways that floating-point numbers can be represented could overheat your brain.Decide what you will and will not accept. Then, construct a regular expression to match those things alone. Here are some precooked solutions (the Cookbook's equivalent of just-add-water meals) for most common cases:warn "has nondigits" if /\D/; warn "not a natural number" unless /^\d+$/; # rejects -3 warn "not an integer" unless /^-?\d+$/; # rejects +3 warn "not an integer" unless /^[+-]?\d+$/; warn "not a decimal number" unless /^-?\d+\.?\d*$/; # rejects .2 warn "not a decimal number" unless /^-?(?:\d+(?:\.\d*)?|\.\d+)$/; warn "not a C float" unless /^([+-]?)(?=\d|\.\d)\d*(\.\d*)?([Ee]([+-]?\d+))?$/;These lines do not catch the IEEE notations of "Infinity" and "NaN", but unless you're worried that IEEE committee members will stop by your workplace and beat you over the head with copies of the relevant standards documents, you can probably forget about these strange forms.If your number has leading or trailing whitespace, those patterns won't work. Either add the appropriate logic directly, or call theEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Rounding Floating-Point Numbers
- InhaltsvorschauYou want to round a floating-point value to a certain number of decimal places. This problem arises from the same inaccuracies in representation that make testing for equality difficult (see Recipe 2.3), as well as in situations where you must reduce the precision of your answers for readability.Use the Perl function
sprintf, orprintfif you're just trying to produce output:# round off to two places $rounded = sprintf("%.2f", $unrounded);Or you can use other rounding functions described in the Discussion.Whether visible or not, rounding of some sort is virtually unavoidable when working with floating-point numbers. Carefully defined standards (namely, IEEE 754, the standard for binary floating-point arithmetic) coupled with reasonable defaults within Perl often manage to eliminate or at least hide these round-off errors.In fact, Perl's implicit rounding on output is usually good enough so that it rarely surprises. It's almost always best to leave the numbers unrounded until output, and then, if you don't like Perl's default rounding, useprintforsprintfyourself with a format that makes the rounding explicit. The%f,%e, and%gformats all let you specify how many decimal places to round their argument to. Here's an example showing how all three behave; in each case, we're asking for a field that's 12 spaces wide, but with a precision of no more than four digits to the right of the decimal place.for $n ( 0.0000001, 10.1, 10.00001, 100000.1 ) { printf "%12.4e %12.4f %12.4g\n", $n, $n, $n; }This produces the following output:1.0000e-07 0.0000 1e-07 1.0100e+01 10.1000 10.1 1.0000e+01 10.0000 10 1.0000e+05 100000.1000 1e+05
If that were all there were to the matter, rounding would be pretty easy. You'd just pick your favorite output format and be done with it.Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Comparing Floating-Point Numbers
- InhaltsvorschauFloating-point arithmetic isn't exact. You want to compare two floating-point numbers and know whether they're equal when carried out to a certain number of decimal places. Most of the time, this is the way you should compare floating-point numbers for equality.Use
sprintfto format the numbers to a certain number of decimal places, then compare the resulting strings:# equal(NUM1, NUM2, PRECISION) : returns true if NUM1 and NUM2 are # equal to PRECISION number of decimal places sub equal { my ($A, $B, $dp) = @_; return sprintf("%.${dp}g", $A) eq sprintf("%.${dp}g", $B); }Alternatively, store the numbers as integers by assuming the decimal place.You need theequalroutine because computers' floating-point representations are just approximations of most real numbers, as we discussed in the Introduction to this chapter. Perl's normal printing routines display numbers rounded to 15 decimal places or so, but its numeric tests don't round. So sometimes you can print out numbers that look the same (after rounding) but do not test the same (without rounding).This problem is especially noticeable in a loop, where round-off error can silently accumulate. For example, you'd think that you could start a variable out at zero, add one-tenth to it ten times, and end up with one. Well, you can't, because a base-2 computer can't exactly represent one-tenth. For example:for ($num = $i = 0; $i < 10; $i++) { $num += 0.1 } if ($num != 1) { printf "Strange, $num is not 1; it's %.45f\n", $num; }prints out:Strange, 1 is not 1; it's 0.999999999999999888977697537484345957636833191The$numis interpolated into the double-quoted string using a default conversion format of "%.15g" (on most systems), so it looks like 1. But internally, it really isn't. If you had checked only to a few decimal places, for example, five:Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Operating on a Series of Integers
- InhaltsvorschauYou want to perform an operation on all integers between X and Y, such as when you're working on a contiguous section of an array or wherever you want to process all numbers within a range.Use a
forloop, or.. in conjunction with aforeachloop:foreach ($X .. $Y) { # $_ is set to every integer from X to Y, inclusive } foreach $i ($X .. $Y) { # $i is set to every integer from X to Y, inclusive } for ($i = $X; $i <= $Y; $i++) { # $i is set to every integer from X to Y, inclusive } for ($i = $X; $i <= $Y; $i += 7) { # $i is set to every integer from X to Y, stepsize = 7 }The first two approaches use aforeachloop in conjunction with the$X .. $Yconstruct, which creates a list of integers between$Xand$Y. Now, if you were just assigning that range to an array, this would use up a lot of memory whenever$Xand$Ywere far apart. But in aforeachloop, Perl notices this and doesn't waste time or memory allocating a temporary list. When iterating over consecutive integers, theforeachloop will run faster than the equivalentforloop.Another difference between the two constructs is that theforeachloop implicitly localizes the loop variable to the body of the loop, but theforloop does not. That means that after theforloop finishes, the loop variable will contain the value it held upon the final iteration. But in the case of theforeachloop, that value will be inaccessible, and the variable will hold whatever it held—if anything—prior to entering the loop. You can, however, use a lexically scoped variable as the loop variable:foreach my $i ($X .. $Y) { ... } for (my $i=$X; $i <= $Y; $i++) { ... }The following code shows each technique. Here we just print the numbers we generate:print "Infancy is: "; foreach (0 .. 2) { print "$_ "; } print "\n"; print "Toddling is: "; foreach $i (3 .. 4) { print "$i "; } print "\n"; print "Childhood is: "; for ($i = 5; $i <= 12; $i++) { print "$i "; } print "\n";Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Working with Roman Numerals
- InhaltsvorschauYou want to convert between regular numbers and Roman numerals. You need to do this with items in outlines, page numbers on a preface, and copyrights for movie credits.Use the Roman module from CPAN:
use Roman; $roman = roman($arabic); # convert to roman numerals $arabic = arabic($roman) if isroman($roman); # convert from roman numerals
The Roman module provides bothRomanandromanfor converting Arabic ("normal") numbers to their Roman equivalents.Romanproduces uppercase letters, whereasromangives lowercase ones.The module only deals with Roman numbers from 1 to 3999, inclusive. The Romans didn't represent negative numbers or zero, and 5000 (which 4000 is represented in terms of) uses a symbol outside the ASCII character set.use Roman; $roman_fifteen = roman(15); # "xv" print "Roman for fifteen is $roman_fifteen\n"; $arabic_fifteen = arabic($roman_fifteen); print "Converted back, $roman_fifteen is $arabic_fifteen\n"; Roman for fifteen is xv Converted back, xv is 15
Or to print the current year:use Time::localtime; use Roman; printf "The year is now %s\n", Roman(1900 + localtime->year); The year is now MMIIINow, if you happen to have Unicode fonts available, you'll find that code points U+2160 through U+2183 represent Roman numerals, including those beyond the typical ASCII values.use charnames ":full"; print "2003 is", "\N{ROMAN NUMERAL ONE THOUSAND}" x 2, "\N{ROMAN NUMERAL THREE}\n";2003 is Ⅿ Ⅿ ⅢHowever, the Roman module doesn't yet have an option to use those characters.Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Generating Random Numbers
- InhaltsvorschauYou want to make random numbers in a given range, inclusive, such as when you randomly pick an array index, simulate rolling a die in a game of chance, or generate a random password.Use Perl's
randfunction:$random = int( rand( $Y-$X+1 ) ) + $X;
This code generates and prints a random integer between 25 and 75, inclusive:$random = int( rand(51)) + 25; print "$random\n";
Therandfunction returns a fractional number, from (and including) 0 up to (but not including) its argument. We give it an argument of 51 to get a number that can be 0 or more, but never 51 or more. We take the integer portion of this to get a number from 0 to 50, inclusive (50.99999.... will be turned into 50 byint). We then add 25 to it to get a number from 25 to 75, inclusive.A common application of this is the random selection of an element from an array:$elt = $array[ rand @array ];
That's just like saying:$elt = $array[ int( rand(0+@array) ) ];
Becauserandis prototyped to take just one argument, it implicitly imposes scalar context on that argument, which, on a named array, is the number of elements in that array. The function then returns a floating-point number smaller than its argument and greater than or equal to zero. A floating-point number used as an array subscript implicitly undergoes integer truncation (rounding toward zero), producing in the end an evenly distributed, randomly selected array element to assign to$elt.Generating a random password from a sequence of characters is similarly easy:@chars = ( "A" .. "Z", "a" .. "z", 0 .. 9, qw(! @ $ % ^ & *) ); $password = join("", @chars[ map { rand @chars } ( 1 .. 8 ) ]);We usemapto generate eight random indices into@chars, extract the corresponding characters with a slice, and join them together to form the random password. This isn't aEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Generating Repeatable Random Number Sequences
- InhaltsvorschauEvery time you run your program, you get a different sequence of (pseudo-)random numbers. But you want a reproducible sequence, useful when running a simulation, so you need Perl to produce the same set of random numbers each time.Use Perl's
srandfunction:srand EXPR; # use a constant here for repeated sequences
Making random numbers is hard. The best that computers can do, without special hardware, is generate "pseudo-random" numbers, which are evenly distributed in their range of values. These are generated using a mathematical formula, which means that given the same seed (starting point), two programs will produce identical pseudo-random numbers.Thesrandfunction creates a new seed for the pseudo-random number generator. If given an argument, it uses that number as the seed. If no argument is given,sranduses a value that's reasonably difficult to guess as the seed.If you callrandwithout first callingsrandyourself, Perl callssrandfor you, choosing a "good" seed. This way, every time you run your program you'll get a different set of random numbers. Ancient versions of Perl did not callsrand, so the same program always produced the same sequence of pseudo-random numbers every time the program was run. Certain sorts of programs don't want a different set of random numbers each time; they want the same set. When you need that behavior, callsrandyourself, supplying it with a particular seed:srand( 42 ); # pick any fixed starting point
Don't callsrandmore than once in a program, because if you do, you'll start the sequence again from that point. Unless, of course, that's what you want.Just because Perl tries to use a good default seed does not necessarily guarantee that the numbers generated are cryptographically secure against the most intrepid crackers. Textbooks on cryptography are usually good sources of cryptographically secure random number generators.Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Making Numbers Even More Random
- InhaltsvorschauYou want to generate numbers that are more random than Perl's random numbers. Limitations of your C library's random number generator seeds can sometimes cause problems. The sequence of pseudo-random numbers may repeat too soon for some applications.Use a different random number generator, such as those provided by the Math::Random and Math::TrulyRandom modules from CPAN:
use Math::TrulyRandom; $random = truly_random_value( ); use Math::Random; $random = random_uniform( );
The Perl build process tries to find the best C-library routine to use for generating pseudo-random numbers, looking at rand(3), random(3), and drand48(3). (This can be changed manually at build time, however.) The standard library functions are getting pretty good, but some ancient implementations of therandfunction return only 16-bit random numbers or have other algorithmic weaknesses, and may therefore not be sufficiently random for your purposes.The Math::TrulyRandom module uses inadequacies of your system's timers to generate the random numbers. This takes a while, so it isn't useful for generating a lot of random numbers.The Math::Random module uses therandliblibrary to generate random numbers. It also includes a wide range of related functions for generating random numbers according to specific distributions, such as binomial, poisson, and exponential.Thesrandandrandfunctions in perlfunc(1) and Chapter 29 of Programming Perl; Recipe 2.6 and Recipe 2.7; the documentation for the CPAN modules Math::Random and Math::TrulyRandomEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Generating Biased Random Numbers
- InhaltsvorschauYou want to pick a random value where the probabilities of the values are not equal (the distribution is not even). You might be trying to randomly select a banner to display on a web page, given a set of relative weights saying how often each banner is to be displayed. Alternatively, you might want to simulate behavior according to a normal distribution (the bell curve).If you want a random value distributed according to a specific function—e.g., the Gaussian (Normal) distribution—consult a statistics textbook to find the appropriate function or algorithm. This subroutine generates random numbers that are normally distributed, with a standard deviation of 1 and a mean of 0:
sub gaussian_rand { my ($u1, $u2); # uniformly distributed random numbers my $w; # variance, then a weight my ($g1, $g2); # gaussian-distributed numbers do { $u1 = 2 * rand( ) - 1; $u2 = 2 * rand( ) - 1; $w = $u1*$u1 + $u2*$u2; } while ($w >= 1 || $w = = 0); $w = sqrt( (-2 * log($w)) / $w ); $g2 = $u1 * $w; $g1 = $u2 * $w; # return both if wanted, else just one return wantarray ? ($g1, $g2) : $g1; }If you have a list of weights and values you want to randomly pick from, follow this two-step process: first, turn the weights into a probability distribution withweight_to_dist, and then use the distribution to randomly pick a value withweighted_rand:# weight_to_dist: takes a hash mapping key to weight and returns # a hash mapping key to probability sub weight_to_dist { my %weights = @_; my %dist = ( ); my $total = 0; my ($key, $weight); local $_; foreach (values %weights) { $total += $_; } while ( ($key, $weight) = each %weights ) { $dist{$key} = $weight/$total; } return %dist; } # weighted_rand: takes a hash mapping key to probability, and # returns the corresponding element sub weighted_rand { my %dist = @_; my ($key, $weight); while (1) { # to avoid floating point inaccuracies my $rand = rand; while ( ($key, $weight) = each %dist ) { return $key if ($rand -= $weight) < 0; } } }Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Doing Trigonometry in Degrees, Not Radians
- InhaltsvorschauYou want your trigonometry routines to operate in degrees instead of Perl's native radians.Convert between radians and degrees (2π radians equals 360 degrees):
use constant PI => (4 * atan2 (1, 1)); sub deg2rad { my $degrees = shift; return ($degrees / 180) * PI; } sub rad2deg { my $radians = shift; return ($radians / PI) * 180; }Alternatively, use the standard Math::Trig module:use Math::Trig; $radians = deg2rad($degrees); $degrees = rad2deg($radians);
If you're doing a lot of trigonometry, look into using either the standard Math::Trig or POSIX modules. They provide many more trigonometric functions than are defined in the Perl core. Otherwise, the first solution will define therad2deganddeg2radfunctions. The value of π isn't built directly into Perl, but you can calculate it to as much precision as your floating-point hardware provides. In the Solution, thePIfunction is a constant created withuseconstant. Instead of having to remember that π is 3.14159265358979 or so, we use the built-in function call, resolved at compile time, which, besides sparing us from memorizing a long string of digits, is also guaranteed to provide as much accuracy as the platform supports.If you're looking for the sine in degrees, use this:# deg2rad and rad2deg defined either as above or from Math::Trig sub degree_sine { my $degrees = shift; my $radians = deg2rad($degrees); my $result = sin($radians); return $result; }Thesin,cos, andatan2functions in perlfunc(1) and Chapter 29 of Programming Perl; the documentation for the standard POSIX and Math::Trig modules (also in Chapter 32 of Programming Perl)Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Calculating More Trigonometric Functions
- InhaltsvorschauYou want to calculate values for trigonometric functions like sine, tangent, or arc-cosine.Perl provides only
sin,cos, andatan2as standard functions. From these, you can derivetanand all other trig functions (if you're intimately familiar with esoteric trig identities):sub tan { my $theta = shift; return sin($theta)/cos($theta); }The POSIX module provides a wider range of trig functions:use POSIX; $y = acos(3.7);
The standard Math::Trig module provides a complete set of functions and supports operations on or resulting in complex numbers:use Math::Trig; $y = acos(3.7);
Thetanfunction will cause a division-by-zero exception when$thetais π/2, 3π/2, and so on, because the cosine is 0 for these values. Similarly,tanand many other functions from Math::Trig may generate the same error. To trap these, useeval:eval { $y = tan($pi/2); } or return undef;Thesin,cos, andatan2functions in perlfunc(1) and Chapter 29 of Programming Perl; the documentation for the standard Math::Trig module; we talk about trigonometry in the context of imaginary numbers in Recipe 2.14; we talk about the use ofevalto catch exceptions in Recipe 10.12Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Taking Logarithms
- InhaltsvorschauYou want to take a logarithm in various bases.For logarithms to base e, use the built-in
log:$log_e = log(VALUE);
For logarithms to base 10, use the POSIX module'slog10function:use POSIX qw(log10); $log_10 = log10(VALUE);
For other bases, use the mathematical identity:
where x is the number whose logarithm you want, n is the desired base, and e is the natural logarithm base.sub log_base { my ($base, $value) = @_; return log($value)/log($base); }Thelog_basefunction lets you take logarithms to any base. If you know the base you'll want in advance, it's more efficient to cache the log of the base instead of recalculating it every time.# log_base as defined earlier $answer = log_base(10, 10_000); print "log10(10,000) = $answer\n"; log10(10,000) = 4The Math::Complex module does the caching for you via itslogn( )routine, so you can write:use Math::Complex; printf "log2(1024) = %lf\n", logn(1024, 2); # watch out for argument order! log2(1024) = 10.000000even though no complex number is involved here. This is not very efficient, but there are plans to rewrite Math::Complex in C for speed.Thelogfunction in perlfunc(1) and Chapter 29 of Programming Perl; the documentation for the standard POSIX and Math::Complex modules (also in Chapter 32 of Programming Perl)Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Multiplying Matrices
- InhaltsvorschauYou want to multiply a pair of two-dimensional arrays. Mathematicians and engineers often need this.Use the PDL modules, available from CPAN. PDL is the Perl Data Language—modules that give fast access to compact matrix and mathematical functions:
use PDL; # $a and $b are both pdl objects $c = $a x $b;
Alternatively, apply the matrix multiplication algorithm to your two-dimensional array:sub mmult { my ($m1,$m2) = @_; my ($m1rows,$m1cols) = matdim($m1); my ($m2rows,$m2cols) = matdim($m2); unless ($m1cols = = $m2rows) { # raise exception die "IndexError: matrices don't match: $m1cols != $m2rows"; } my $result = [ ]; my ($i, $j, $k); for $i (range($m1rows)) { for $j (range($m2cols)) { for $k (range($m1cols)) { $result->[$i][$j] += $m1->[$i][$k] * $m2->[$k][$j]; } } } return $result; } sub range { 0 .. ($_[0] - 1) } sub veclen { my $ary_ref = $_[0]; my $type = ref $ary_ref; if ($type ne "ARRAY") { die "$type is bad array ref for $ary_ref" } return scalar(@$ary_ref); } sub matdim { my $matrix = $_[0]; my $rows = veclen($matrix); my $cols = veclen($matrix->[0]); return ($rows, $cols); }If you have the PDL library installed, you can use its lightning-fast manipulation of numbers. This requires far less memory and CPU than Perl's array manipulation. When using PDL objects, many numeric operators (such as+and*) are overloaded and work on an element-by-element basis (e.g.,*is the so-called scalar multiplication operator). To get true matrix multiplication, use the overloadedxoperator.use PDL; $a = pdl [ [ 3, 2, 3 ], [ 5, 9, 8 ], ]; $b = pdl [ [ 4, 7 ], [ 9, 3 ], [ 8, 1 ], ]; $c = $a x $b; # x overloadEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Using Complex Numbers
- InhaltsvorschauYour application must manipulate complex numbers, as are often needed in engineering, science, and mathematics.Either keep track of the real and imaginary components yourself:
# $c = $a * $b manually $c_real = ( $a_real * $b_real ) - ( $a_imaginary * $b_imaginary ); $c_imaginary = ( $a_real * $b_imaginary ) + ( $b_real * $a_imaginary );
or use the Math::Complex module (part of the standard Perl distribution):# $c = $a * $b using Math::Complex use Math::Complex; $c = $a * $b;
Here's how you'd manually multiply3+5iand2-2i:$a_real = 3; $a_imaginary = 5; # 3 + 5i; $b_real = 2; $b_imaginary = -2; # 2 - 2i; $c_real = ( $a_real * $b_real ) - ( $a_imaginary * $b_imaginary ); $c_imaginary = ( $a_real * $b_imaginary ) + ( $b_real * $a_imaginary ); print "c = ${c_real}+${c_imaginary}i\n"; c = 16+4iand with Math::Complex:use Math::Complex; $a = Math::Complex->new(3,5); # or Math::Complex->new(3,5); $b = Math::Complex->new(2,-2); $c = $a * $b; print "c = $c\n"; c = 16+4iYou may create complex numbers via thecplxconstructor or via the exported constant i:use Math::Complex; $c = cplx(3,5) * cplx(2,-2); # easier on the eye $d = 3 + 4*i; # 3 + 4i printf "sqrt($d) = %s\n", sqrt($d); sqrt(3+4i) = 2+iThe Math::Trig module uses the Math::Complex module internally because some functions can break out from the real axis into the complex plane—for example, the inverse sine of 2.The documentation for the standard Math::Complex module (also in Chapter 32 of Programming Perl)Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Converting Binary, Octal, and Hexadecimal Numbers
- InhaltsvorschauYou want to convert a string (e.g., "
0b10110", "0x55", or "0755") containing a binary, octal, or hexadecimal number to the correct number.Perl understands numbers specified in binary (base-2), octal (base-8), and hexadecimal (base-16) notation only when they occur as literals in your programs. If they come in as data—such as by reading from files or environment variables, or when supplied as command-line arguments—no automatic conversion takes place.Use Perl'shexfunction if you have a hexadecimal string like "2e" or "0x2e":$number = hex($hexadecimal); # hexadecimal only ("2e" becomes 47)Use theoctfunction if you have a hexadecimal string like "0x2e", an octal string like "047", or a binary string like "0b101110":$number = oct($hexadecimal); # "0x2e" becomes 47 $number = oct($octal); # "057" becomes 47 $number = oct($binary); # "0b101110" becomes 47
Theoctfunction converts octal numbers with or without the leading "0"; for example, "0350" or "350". Despite its name,octdoes more than convert octal numbers: it also converts hexadecimal ("0x350") numbers if they have a leading "0x" and binary ("0b101010") numbers if they have a leading "0b". Thehexfunction converts only hexadecimal numbers, with or without a leading "0x": "0x255", "3A", "ff", or "deadbeef". (Letters may be in upper- or lowercase.)Here's an example that accepts an integer in decimal, binary, octal, or hex, and prints that integer in all four bases. It uses theoctfunction to convert the data from binary, octal, and hexadecimal if the input begins with a 0. It then usesprintfto convert into all four bases as needed.print "Gimme an integer in decimal, binary, octal, or hex: "; $num = <STDIN>; chomp $num; exit unless defined $num; $num = oct($num) if $num =~ /^0/; # catches 077 0b10 0x20 printf "%d %#x %#o %#b\n", ($num) x 4;
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Putting Commas in Numbers
- InhaltsvorschauYou want to output a number with commas in the right places. People like to see long numbers broken up in this way, especially in reports.Reverse the string so you can use backtracking to avoid substitution in the fractional part of the number. Then use a regular expression to find where you need commas, and substitute them in. Finally, reverse the string back.
sub commify { my $text = reverse $_[0]; $text =~ s/(\d\d\d)(?=\d)(?!\d*\.)/$1,/g; return scalar reverse $text; }It's a lot easier in regular expressions to work from the front than from the back. With this in mind, we reverse the string and make a minor change to the algorithm that repeatedly inserts commas three digits from the end. When all insertions are done, we reverse the final string and return it. Becausereverseis sensitive to its implicit return context, we force it to scalar context.This function can easily be adjusted to accommodate the use of periods instead of commas, as are used in many countries.Here's an example ofcommifyin action:# more reasonable web counter :-) use Math::TrulyRandom; $hits = truly_random_value( ); # negative hits! $output = "Your web page received $hits accesses last month.\n"; print commify($output); Your web page received -1,740,525,205 accesses last monthperllocale(1); thereversefunction in perlfunc(1) and Chapter 29 of Programming Perl; the section "Adding Commas to a Number with Lookaround" in Chapter 2 of Mastering Regular Expressions, Second EditionEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Printing Correct Plurals
- InhaltsvorschauYou're printing something like "
Ittook$timehours", but "Ittook1hours" is ungrammatical. You would like to get it right.Useprintfand the conditional operator (X?Y:Z) to alter the noun or verb:printf "It took %d hour%s\n", $time, $time = = 1 ? "" : "s"; printf "%d hour%s %s enough.\n", $time, $time = = 1 ? "" : "s", $time = = 1 ? "is" : "are";Or use the Lingua::EN::Inflect module from CPAN, as described in the following Discussion.The only reason inane messages like "1file(s)updated" appear is because their authors are too lazy to bother checking whether the count is 1 or not.If your noun changes by more than an "-s", you'll need to change theprintfaccordingly:printf "It took %d centur%s", $time, $time = = 1 ? "y" : "ies";
This is good for simple cases, but you'll tire of writing it. This leads you to write funny functions like this:sub noun_plural { local $_ = shift; # order really matters here! s/ss$/sses/ || s/([psc]h)$/${1}es/ || s/z$/zes/ || s/ff$/ffs/ || s/f$/ves/ || s/ey$/eys/ || s/y$/ies/ || s/ix$/ices/ || s/([sx])$/$1es/ || s/$/s/ || die "can't get here"; return $_; } *verb_singular = \&noun_plural; # make function aliasAs you find more exceptions, your function will become increasingly convoluted. When you need to handle such morphological changes, turn to the flexible solution provided by the Lingua::EN::Inflect module from CPAN.Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Program: Calculating Prime Factors
- InhaltsvorschauThe following program takes one or more integer arguments and determines the prime factors. It uses Perl's native numeric representation, unless those numbers use floating-point representation and thus lose accuracy. Otherwise (or if the program's -b switch is used), it uses the standard Math::BigInt library, thus allowing for huge numbers. However, it only loads this library if necessary. That's why we use
requireandimportinstead ofuse, which would unconditionally load the library at compile time instead of conditionally at runtime. This is not an efficient way to crack the huge integers used for cryptographic purposes.Call the program with a list of numbers, and it will show you the prime factors of those numbers:% bigfact 8 9 96 2178 8 2**3 9 3**2 96 2**5 3 2178 2 3**2 11**2
You can give it very large numbers:% bigfact 239322000000000000000000 +239322000000000000000000 2**19 3 5**18 +39887 % bigfact 25000000000000000000000000 +25000000000000000000000000 2**24 5**26
The program is shown in Example 2-2.Example 2-2. bigfact#!/usr/bin/perl # bigfact - calculate prime factors use strict; use integer; our ($opt_b, $opt_d); use Getopt::Std; @ARGV && getopts('bd') or die "usage: $0 [-b] number ..."; load_biglib( ) if $opt_b; ARG: foreach my $orig ( @ARGV ) { my ($n, %factors, $factor); $n = $opt_b ? Math::BigInt->new($orig) : $orig; if ($n + 0 ne $n) { # don't use -w for this printf STDERR "bigfact: %s would become %s\n", $n, $n+0 if $opt_d; load_biglib( ); $n = Math::BigInt->new($orig); } printf "%-10s ", $n; # Here $sqi will be the square of $i. We will take advantage # of the fact that ($i + 1) ** 2 = = $i ** 2 + 2 * $i + 1. for (my ($i, $sqi) = (2, 4); $sqi <= $n; $sqi += 2 * $i ++ + 1) { while ($n % $i = = 0) { $n /= $i; print STDERR "<$i>" if $opt_d; $factors {$i} ++; } } if ($n != 1 && $n != $orig) { $factors{$n}++ } if (! %factors) { print "PRIME\n"; next ARG; } for $factor ( sort { $a <=> $b } keys %factors ) { print "$factor"; if ($factors{$factor} > 1) { print "**$factors{$factor}"; } print " "; } print "\n"; } # this simulates a use, but at runtime sub load_biglib { require Math::BigInt; Math::BigInt->import( ); #immaterial? }Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Chapter 3: Dates and Times
- InhaltsvorschauIt is inappropriate to require that a time represented as seconds since the Epoch precisely represent the number of seconds between the referenced time and the Epoch.—IEEE Std 1003.1b-1993 (POSIX) Section B.2.2.2Times and dates are important things to be able to manipulate. "How many users logged in last month?", "How many seconds should I sleep if I want to wake up at midday?", and "Has this user's password expired yet?" are common questions whose answers involve surprisingly non-obvious manipulations.Perl represents points in time as intervals, measuring seconds past a point in time called the Epoch. On Unix and many other systems, the Epoch was 00:00 Jan 1, 1970, UTC (Universal Corrected Time).When we talk about dates and times, we often interchange two different concepts: points in time (dates and times) and intervals between points in time (weeks, months, days, etc.). Epoch seconds represent intervals and points in the same units, so you can do basic arithmetic on them.However, people are not used to working with Epoch seconds. We are more used to dealing with individual year, month, day, hour, minute, and second values. Furthermore, the month can be represented by its full name or its abbreviation. The day can precede or follow the month. Because of the difficulty of performing calculations with a variety of formats, we typically convert human-supplied strings or lists to Epoch seconds, calculate, and then convert back to strings or lists for output.Epoch seconds are an absolute number of seconds, so they don't take into account time zones or daylight saving times. When converting to or from distinct values, always consider whether the time represented is UTC or local. Use different conversion functions depending on whether you need to convert from UTC to local time or vice versa.Perl'sEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
- Introduction
- InhaltsvorschauTimes and dates are important things to be able to manipulate. "How many users logged in last month?", "How many seconds should I sleep if I want to wake up at midday?", and "Has this user's password expired yet?" are common questions whose answers involve surprisingly non-obvious manipulations.Perl represents points in time as intervals, measuring seconds past a point in time called the Epoch. On Unix and many other systems, the Epoch was 00:00 Jan 1, 1970, UTC (Universal Corrected Time).When we talk about dates and times, we often interchange two different concepts: points in time (dates and times) and intervals between points in time (weeks, months, days, etc.). Epoch seconds represent intervals and points in the same units, so you can do basic arithmetic on them.However, people are not used to working with Epoch seconds. We are more used to dealing with individual year, month, day, hour, minute, and second values. Furthermore, the month can be represented by its full name or its abbreviation. The day can precede or follow the month. Because of the difficulty of performing calculations with a variety of formats, we typically convert human-supplied strings or lists to Epoch seconds, calculate, and then convert back to strings or lists for output.Epoch seconds are an absolute number of seconds, so they don't take into account time zones or daylight saving times. When converting to or from distinct values, always consider whether the time represented is UTC or local. Use different conversion functions depending on whether you need to convert from UTC to local time or vice versa.Perl's
timefunction returns the number of seconds that have passed since the Epoch—more or less. POSIX requires thattimenot include leap seconds, a peculiar practice of adjusting the world's clock by a second here and there to account for the slowing down of the Earth's rotation due to tidal angular-momentum dissipation. (See theEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Finding Today's Date
- InhaltsvorschauYou need to find the year, month, and day values for today's date.Use
localtime, which returns values for the current date and time if given no arguments. You can either uselocaltimeand extract the information you want from the list it returns:($DAY, $MONTH, $YEAR) = (localtime)[3,4,5];
or use Time::localtime, which overrideslocaltimeto return a Time::tm object:use Time::localtime; $tm = localtime; ($DAY, $MONTH, $YEAR) = ($tm->mday, $tm->mon, $tm->year);
Here's how you'd print the current date as "YYYY MM DD", using the non-overriddenlocaltime:($day, $month, $year) = (localtime)[3,4,5]; printf("The current date is %04d %02d %02d\n", $year+1900, $month+1, $day); The current date is 2003 03 06To extract the fields we want from the list returned bylocaltime, we take a list slice. We could also have written it as:($day, $month, $year) = (localtime)[3..5];
This is how we'd print the current date as "YYYY-MM-DD" (in approved ISO 8601 fashion), using Time::localtime:use Time::localtime; $tm = localtime; printf("The current date is %04d-%02d-%02d\n", $tm->year+1900, ($tm->mon)+1, $tm->mday); The current date is 2003-03-06The object interface might look out of place in a short program. However, when you do a lot of work with the distinct values, accessing them by name makes code much easier to understand.A more obfuscated way that does not involve temporary variables is:printf("The current date is %04d-%02d-%02d\n", sub {($_[5]+1900, $_[4]+1, $_[3])}->(localtime));There is alsostrftimefrom the POSIX module discussed in Recipe 3.8:use POSIX qw(strftime); print strftime "%Y-%m-%d\n", localtime;
Thegmtimefunction works just asEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Converting DMYHMS to Epoch Seconds
- InhaltsvorschauYou want to convert a date, a time, or both with distinct values for day, month, year, etc. to Epoch seconds.Use the
timelocalortimegmfunctions in the standard Time::Local module, depending on whether the date and time is in the current time zone or in UTC.use Time::Local; $TIME = timelocal($sec, $min, $hours, $mday, $mon, $year); $TIME = timegm($sec, $min, $hours, $mday, $mon, $year);
The built-in functionlocaltimeconverts an Epoch seconds value to distinct DMYHMS values; thetimelocalsubroutine from the standard Time::Local module converts distinct DMYHMS values to an Epoch seconds value. Here's an example that shows how to find Epoch seconds for a time in the current day. It gets the day, month, and year values fromlocaltime:# $hours, $minutes, and $seconds represent a time today, # in the current time zone use Time::Local; $time = timelocal($seconds, $minutes, $hours, (localtime)[3,4,5]);
If you're passing month and year values totimelocal, it expects values with the same range as those whichlocaltimereturns. Namely, months start at 0, and years have 1900 subtracted from them.Thetimelocalfunction assumes the DMYHMS values represent a time in the current time zone. Time::Local also exports atimegmsubroutine that assumes the DMYHMS values represent a time in the UTC time zone. Unfortunately, there is no convenient way to convert from a time zone other than the current local time zone or UTC. The best you can do is convert to UTC and add or subtract the time zone offset in seconds.This code illustrates both the use oftimegmand how to adjust the ranges of months and years:# $day is day in month (1-31) # $month is month in year (1-12) # $year is four-digit year e.g., 1967 # $hours, $minutes and $seconds represent UTC (GMT) time use Time::Local; $time = timegm($seconds, $minutes, $hours, $day, $month-1, $year-1900);
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Converting Epoch Seconds to DMYHMS
- InhaltsvorschauYou have a date and time in Epoch seconds, and you want to calculate individual DMYHMS values from it.Use the
localtimeorgmtimefunctions, depending on whether you want the date and time in UTC or your local time zone.($seconds, $minutes, $hours, $day_of_month, $month, $year, $wday, $yday, $isdst) = localtime($time);The standard Time::timelocal and Time::gmtime modules override thelocaltimeandgmtimefunctions to provide named access to the individual values.use Time::localtime; # or Time::gmtime $tm = localtime($TIME); # or gmtime($TIME) $seconds = $tm->sec; # ...
Thelocaltimeandgmtimefunctions return strange year and month values; the year has 1900 subtracted from it, and 0 is the month value for January. Be sure to correct the base values for year and month, as this example does:($seconds, $minutes, $hours, $day_of_month, $month, $year, $wday, $yday, $isdst) = localtime($time); printf("Dateline: %02d:%02d:%02d-%04d/%02d/%02d\n", $hours, $minutes, $seconds, $year+1900, $month+1, $day_of_month);We could have used the Time::localtime module to avoid the temporary variables:use Time::localtime; $tm = localtime($time); printf("Dateline: %02d:%02d:%02d-%04d/%02d/%02d\n", $tm->hour, $tm->min, $tm->sec, $tm->year+1900, $tm->mon+1, $tm->mday);Thelocaltimefunction in perlfunc(1) and Chapter 29 of Programming Perl; the documentation for the standard Time::localtime and Time::gmtime modules; convert in the other direction using Recipe 3.3Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Adding to or Subtracting from a Date
- InhaltsvorschauYou have a date and time and want to find the date and time of some period in the future or past.Simply add or subtract Epoch seconds:
$when = $now + $difference; $then = $now - $difference;
If you have distinct DMYHMS values, use the CPAN Date::Calc module. If you're doing arithmetic with days only, useAdd_Delta_Days($offsetis a positive or negative integral number of days):use Date::Calc qw(Add_Delta_Days); ($y2, $m2, $d2) = Add_Delta_Days($y, $m, $d, $offset);
If you are concerned with hours, minutes, and seconds (in other words, times as well as dates), useAdd_Delta_DHMS:use Date::Calc qw(Add_Delta_DHMS); ($year2, $month2, $day2, $h2, $m2, $s2) = Add_Delta_DHMS( $year, $month, $day, $hour, $minute, $second, $days_offset, $hour_offset, $minute_offset, $second_offset );Calculating with Epoch seconds is easiest, disregarding the effort to get dates and times into and out of Epoch seconds. This code shows how to calculate an offset (55 days, 2 hours, 17 minutes, and 5 seconds, in this case) from a given base date and time:$birthtime = 96176750; # 18/Jan/1973, 3:45:50 am $interval = 5 + # 5 seconds 17 * 60 + # 17 minutes 2 * 60 * 60 + # 2 hours 55 * 60 * 60 * 24; # and 55 days $then = $birthtime + $interval; print "Then is ", scalar(localtime($then)), "\n"; Then is Wed Mar 14 06:02:55 1973We could have used Date::Calc'sAdd_Delta_DHMSfunction and avoided the conversion to and from Epoch seconds:use Date::Calc qw(Add_Delta_DHMS); ($year, $month, $day, $hh, $mm, $ss) = Add_Delta_DHMS( 1973, 1, 18, 3, 45, 50, # 18/Jan/1973, 3:45:50 am 55, 2, 17, 5); # 55 days, 2 hrs, 17 min, 5 sec print "To be precise: $hh:$mm:$ss, $month/$day/$year\n";Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Difference of Two Dates
- InhaltsvorschauYou need to find the number of days between two dates or times.If your dates are in Epoch seconds and fall in the range
Fri Dec 13 20:45:52 1901toTue Jan 19 03:14:07 2038(inclusive), subtract one from the other and convert the seconds to days:$seconds = $recent - $earlier;
If you have distinct DMYMHS values or are worried about the range limitations of Epoch seconds, use the Date::Calc module from CPAN. It can calculate the difference between dates:use Date::Calc qw(Delta_Days); $days = Delta_Days( $year1, $month1, $day1, $year2, $month2, $day2);
It also calculates the difference between a pair of dates and times:use Date::Calc qw(Delta_DHMS); ($days, $hours, $minutes, $seconds) = Delta_DHMS( $year1, $month1, $day1, $hour1, $minute1, $seconds1, # earlier $year2, $month2, $day2, $hour2, $minute2, $seconds2); # laterOne problem with Epoch seconds is how to convert the large integers back to forms that people can read. The following example shows one way of converting an Epoch seconds value back to its component numbers of weeks, days, hours, minutes, and seconds:$bree = 361535725; # 16 Jun 1981, 4:35:25 $nat = 96201950; # 18 Jan 1973, 3:45:50 $difference = $bree - $nat; print "There were $difference seconds between Nat and Bree\n"; There were 265333775 seconds between Nat and Bree $seconds = $difference % 60; $difference = ($difference - $seconds) / 60; $minutes = $difference % 60; $difference = ($difference - $minutes) / 60; $hours = $difference % 24; $difference = ($difference - $hours) / 24; $days = $difference % 7; $weeks = ($difference - $days) / 7; print "($weeks weeks, $days days, $hours:$minutes:$seconds)\n"; (438 weeks, 4 days, 23:49:35)
Date::Calc's functions can ease these calculations. TheDelta_DaysEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Day in a Week/Month/Year or Week Number
- InhaltsvorschauYou have a date, either in Epoch seconds or as distinct year, month, etc. values. You want to find out what week of the year, day of the week, day of the month, or day of the year that the date falls on.If you have Epoch seconds, the day of the year, day of the month, and day of the week are returned by
localtime. The week of the year is easily calculated from the day of the year (but see the following discussion, as standards differ).($MONTHDAY, $WEEKDAY, $YEARDAY) = (localtime $DATE)[3,6,7]; $WEEKNUM = int($YEARDAY / 7) + 1;
If you have distinct DMYHMS values, you can either convert them to Epoch seconds values as in Recipe 3.2 and then use the previous solution, or else use theDay_of_Week,Week_Number, andDay_of_Yearfunctions from the CPAN module Date::Calc:use Date::Calc qw(Day_of_Week Week_Number Day_of_Year); # you have $year, $month, and $day # $day is day of month, by definition. $wday = Day_of_Week($year, $month, $day); $wnum = Week_Number($year, $month, $day); $dnum = Day_of_Year($year, $month, $day);
TheDay_of_Week,Week_Number, andDay_of_Yearfunctions all expect years that haven't had 1900 subtracted from them and months where January is 1, not 0. The return value fromDay_of_Weekcan be 1 through 7 (corresponding to Monday through Sunday) or 0 in case of an error (an invalid date, for example).use Date::Calc qw(Day_of_Week Week_Number Day_of_Week_to_Text); $year = 1981; $month = 6; # (June) $day = 16; $wday = Day_of_Week($year, $month, $day); print "$month/$day/$year was a ", Day_of_Week_to_Text($wday), "\n"; ## see comment above $wnum = Week_Number($year, $month, $day); print "in the $wnum week.\n"; 6/16/1981 was a Tuesday in week number 25
The governing standard bodies of particular countries may have rules about when the first week of the year starts. For example, in Norway the first week must have at least 4 days in it (and weeks start on Mondays). If January 1 falls on a week with 3 or fewer days, it is counted as week 52 (or 53) of the previous year. In America, the first Monday of the year is usually the start of the first workweek. Given such rules, you may have to write your own algorithm, or at least look at theEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Parsing Dates and Times from Strings
- InhaltsvorschauYou read in a date or time specification in an arbitrary format but need to parse that string into distinct year, month, etc. values.If your date is already numeric, or in a rigid and easily parsed format, use a regular expression (and possibly a hash mapping month names to numbers) to extract individual day, month, and year values, and then use the standard Time::Local module's
timelocalandtimegmfunctions to turn that into an Epoch seconds value.use Time::Local; # $date is "2003-02-13" (YYYY-MM-DD form). ($yyyy, $mm, $dd) = ($date =~ /(\d+)-(\d+)-(\d+)/); # calculate epoch seconds at midnight on that day in this timezone $epoch_seconds = timelocal(0, 0, 0, $dd, $mm-1, $yyyy);
For a more flexible solution, use theParseDatefunction provided by the CPAN module Date::Manip, and then useUnixDateto extract the individual values.use Date::Manip qw(ParseDate UnixDate); $date = ParseDate($STRING); if (!$date) { # bad date } else { @VALUES = UnixDate($date, @FORMATS); }The flexibleParseDatefunction accepts many formats. It even converts strings such as "today", "2 weeks ago Friday", "2nd Sunday in 1996", and "last Sunday in December", plus it understands the date and time format used in mail and news headers. It returns the decoded date in its own format: a string of the form "YYYYMMDDHH:MM:SS". You could compare two such strings to compare the dates they represent, but arithmetic is difficult. We therefore use theUnixDatefunction to extract the year, month, and day values in a preferred format.UnixDatetakes a date (as returned byParseDate) and a list of formats. It applies each format to the string and returns the result. A format is a string describing one or more elements of the date and time and the way that the elements are to be formatted. For example,Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Printing a Date
- InhaltsvorschauYou need to print a date and time shown in Epoch seconds format in human-readable form.Call
localtimeorgmtimein scalar context, which takes an Epoch seconds value and returns a string of the formTueJuly2205:15:202003:$STRING = localtime($EPOCH_SECONDS);
Alternatively, thestrftimefunction in the standard POSIX module supports a more customizable output format and takes individual DMYHMS values:use POSIX qw(strftime); $STRING = strftime($FORMAT, $SECONDS, $MINUTES, $HOUR, $DAY_OF_MONTH, $MONTH, $YEAR, $WEEKDAY, $YEARDAY, $DST);The CPAN module Date::Manip has aUnixDateroutine that works like a specialized formsprintfdesigned to handle dates. Pass it a Date::Manip date value. Using Date::Manip in lieu of POSIX::strftime has the advantage of not requiring a POSIX-compliant system.use Date::Manip qw(UnixDate); $STRING = UnixDate($DATE, $FORMAT);
The simplest solution is built into Perl already: thelocaltimefunction. In scalar context, it returns the string formatted in a particular way:Wed July 16 23:58:36 2003This makes for simple code, although it restricts the format of the string:use Time::Local; $time = timelocal(50, 45, 3, 18, 0, 73); print "Scalar localtime gives: ", scalar(localtime($time)), "\n"; Scalar localtime gives: Thu Jan 18 03:45:50 1973Of course,localtimerequires the date and time in Epoch seconds. ThePOSIX::strftimefunction takes individual DMYMHS values plus a format and returns a string. The format is similar to aprintfformat:%directives specify fields in the output string. A full list of these directives is available in your system's documentation forEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - High-Resolution Timers
- InhaltsvorschauYou need to measure time with a finer granularity than the full seconds that
timereturns.The Time::HiRes module, which is included standard starting with the v5.8 release of Perl, encapsulates this functionality for most systems:use Time::HiRes qw(gettimeofday); $t0 = gettimeofday( ); ## do your operation here $t1 = gettimeofday( ); $elapsed = $t1 - $t0; # $elapsed is a floating point value, representing number # of seconds between $t0 and $t1
Here's some code that uses Time::HiRes to time how long the user takes to press the Return key:use Time::HiRes qw(gettimeofday); print "Press return when ready: "; $before = gettimeofday( ); $line = <STDIN>; $elapsed = gettimeofday( ) - $before; print "You took $elapsed seconds.\n"; Press return when ready You took 0.228149 seconds
The module'sgettimeofdayfunction returns a two-element list representing seconds and microseconds when called in list context, or a single floating-point number combining the two when called in scalar context. You can also import itstimefunction to replace the standard core version by that name; this always acts like scalargettimeofday.The module also providesusleepandualarmfunctions, which are alternate versions of the standard Perlsleepandalarmfunctions that understand granularities of microseconds instead of just seconds. They take arguments in microseconds; alternatively, you can import the module'ssleepandalarmfunctions, which take floating-point arguments in seconds, to replace the standard versions, which take integer arguments in seconds. For access to your system's low-levelitimerroutines (if you have them),setitimerandgetitimerare also provided.Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Short Sleeps
- InhaltsvorschauYou need to sleep for less than a second.Use the
select( )function, if your system supports it:select(undef, undef, undef, $time_to_sleep);
Some systems don't support a four-argumentselect. The Time::HiRes module provides a sleep function that takes a floating-point number of seconds:use Time::HiRes qw(sleep); sleep($time_to_sleep);
Here's an example ofselect. It's a simpler version of the program in Recipe 1.6. Think of it as your very own 300-baud terminal.while (<>) { select(undef, undef, undef, 0.25); print; }Using Time::HiRes, we'd write it as:use Time::HiRes qw(sleep); while (<>) { sleep(0.25); print; }The documentation for the CPAN modules Time::HiRes and Benchmark; thesleepandselectfunctions in perlfunc(1) and Chapter 29 of Programming Perl; we use theselectfunction for short sleeps in theslowcatprogram in Recipe 1.6Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Program: hopdelta
- InhaltsvorschauHave you ever wondered why it took so long for someone's mail to get to you? With postal mail, you can't trace how long each intervening post office let your letter gather dust in their back office. But with electronic mail, you can. The message carries in its header
Received: lines showing when each intervening mail transport agent along the way got the message.The dates in the headers are hard to read. You have to read them backwards, bottom to top. They are written in many varied formats, depending on the whim of each transport agent. Worst of all, each date is written in its own local time zone. It's hard to eyeball "Tue,26May199823:57:38-0400" and "Wed,27May199805:04:03+0100" and realize these two dates are only 6 minutes and 25 seconds apart.TheParseDateandDateCalcfunctions in the Date::Manip module from CPAN can help this:use Date::Manip qw(ParseDate DateCalc); $d1 = ParseDate("Sun, 09 Mar 2003 23:57:38 -0400"); $d2 = ParseDate("Mon, 10 Mar 2003 05:04:03 +0100"); print DateCalc($d1, $d2); +0:0:0:0:0:6:25That's a nice format for a program to read, but it's still not what the casual reader wants to see. The hopdelta program, shown in Example 3-1, takes a mailer header and tries to analyze the deltas (difference) between each hop (mail stop). Its output is shown in the local time zone.Example 3-1. hopdelta#!/usr/bin/perl # hopdelta - feed mail header, produce lines # showing delay at each hop. use strict; use Date::Manip qw (ParseDate UnixDate); # print header; this should really use format/write due to # printf complexities printf "%-20.20s %-20.20s %-20.20s %s\n", "Sender", "Recipient", "Time", "Delta"; $/ = ''; # paragraph mode $_ = <>; # read header s/\n\s+/ /g; # join continuation lines # calculate when and where this started my($start_from) = /^From.*\@([^\s>]*)/m; my($start_date) = /^Date:\s+(.*)/m; my $then = getdate($start_date); printf "%-20.20s %-20.20s %s\n", 'Start', $start_from, fmtdate($then); my $prevfrom = $start_from; # now process the headers lines from the bottom up for (reverse split(/\n/)) { my ($delta, $now, $from, $by, $when); next unless /^Received:/; s/\bon (.*?) (id.*)/; $1/s; # qmail header, I think unless (($when) = /;\s+(.*)$/) { # where the date falls warn "bad received line: $_"; next; } ($from) = /from\s+(\S+)/; ($from) = /\((.*?)\)/ unless $from; # some put it here $from =~ s/\)$//; # someone was too greedy ($by) = /by\s+(\S+\.\S+)/; # who sent it on this hop # now random mungings to get their string parsable for ($when) { s/ (for|via) .*$//; s/([+-]\d\d\d\d) \(\S+\)/$1/; s/id \S+;\s*//; } next unless $now = getdate($when); # convert to Epoch $delta = $now - $then; printf "%-20.20s %-20.20s %s ", $from, $by, fmtdate($now); $prevfrom = $by; puttime($delta); $then = $now; } exit; # convert random date strings into Epoch seconds sub getdate { my $string = shift; $string =~ s/\s+\(.*\)\s*$//; # remove nonstd tz my $date = ParseDate($string); my $epoch_secs = UnixDate($date,"%s"); return $epoch_secs; } # convert Epoch seconds into a particular date string sub fmtdate { my $epoch = shift; my($sec,$min,$hour,$mday,$mon,$year) = localtime($epoch); return sprintf "%02d:%02d:%02d %04d/%02d/%02d", $hour, $min, $sec, $year + 1900, $mon + 1, $mday, } # take seconds and print in pleasant-to-read format sub puttime { my($seconds) = shift; my($days, $hours, $minutes); $days = pull_count($seconds, 24 * 60 * 60); $hours = pull_count($seconds, 60 * 60); $minutes = pull_count($seconds, 60); put_field('s', $seconds); put_field('m', $minutes); put_field('h', $hours); put_field('d', $days); print "\n"; } # usage: $count = pull_count(seconds, amount) # remove from seconds the amount quantity, altering caller's version. # return the integral number of those amounts so removed. sub pull_count { my($answer) = int($_[0] / $_[1]); $_[0] -= $answer * $_[1]; return $answer; } # usage: put_field(char, number) # output number field in 3-place decimal format, with trailing char # suppress output unless char is 's' for seconds sub put_field { my ($char, $number) = @_; printf " %3d%s", $number, $char if $number || $char eq 's'; } =endEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Chapter 4: Arrays
- InhaltsvorschauWorks of art, in my opinion, are the only objects in the material universe to possess internal order, and that is why, though I don't believe that only art matters, I do believe in Art for Art's sake.—E.M. ForsterIf you are asked about the contents of your pockets, or the names of the first three Greek letters, or how to get to the highway, you recite a list: you name one thing after another in a particular order. Lists are part of your conception of the world. With Perl's powerful list- and array-handling primitives, you can translate this world view directly into code.In this chapter, we'll use the terms list and array as the Perl language thinks of them. Take
("alpha", "beta", "gamma"); that's a list of the names of the first three Greek letters, in order. To store that list into a variable, use an array, as in@greeks=("alpha", "beta", "gamma"). Both are ordered groups of scalar values; the difference is that an array is a named variable, one whose array length can be directly changed, whereas a list is a more ephemeral notion. You might think of an array as a variable and a list as the values it contains.This distinction may seem arbitrary, but operations that modify the length of these groupings (likepushandpop) require a proper array and not merely a list. Think of the difference between$aand4. You can say$a++but not4++. Likewise, you can saypop(@a)but notpop(1,2,3).The most important thing to glean from this is that Perl's lists and arrays are both ordered groupings of scalars. Operators and functions that work on lists or arrays are designed to provide faster or more convenient access to the elements than manual access would provide. Since few actually deal with modifying the array's length, you can usually use arrays and lists interchangeably.You can't use nested parentheses to create a list of lists. If you try that in Perl, your lists getEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Introduction
- InhaltsvorschauIf you are asked about the contents of your pockets, or the names of the first three Greek letters, or how to get to the highway, you recite a list: you name one thing after another in a particular order. Lists are part of your conception of the world. With Perl's powerful list- and array-handling primitives, you can translate this world view directly into code.In this chapter, we'll use the terms list and array as the Perl language thinks of them. Take
("alpha", "beta", "gamma"); that's a list of the names of the first three Greek letters, in order. To store that list into a variable, use an array, as in@greeks=("alpha", "beta", "gamma"). Both are ordered groups of scalar values; the difference is that an array is a named variable, one whose array length can be directly changed, whereas a list is a more ephemeral notion. You might think of an array as a variable and a list as the values it contains.This distinction may seem arbitrary, but operations that modify the length of these groupings (likepushandpop) require a proper array and not merely a list. Think of the difference between$aand4. You can say$a++but not4++. Likewise, you can saypop(@a)but notpop(1,2,3).The most important thing to glean from this is that Perl's lists and arrays are both ordered groupings of scalars. Operators and functions that work on lists or arrays are designed to provide faster or more convenient access to the elements than manual access would provide. Since few actually deal with modifying the array's length, you can usually use arrays and lists interchangeably.You can't use nested parentheses to create a list of lists. If you try that in Perl, your lists get flattened, meaning that both these lines are equivalent:@nested = ("this", "that", "the", "other"); @nested = ("this", "that", ("the", "other"));Why doesn't Perl (usefully) just support nested lists directly? Although partially for historical reasons, this easily allows for operations (likeEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Specifying a List in Your Program
- InhaltsvorschauYou want to include a list in your program. This is how you initialize arrays.You can write out a comma-separated list of elements:
@a = ("quick", "brown", "fox");If you have a lot of single-word elements, use theqw( )operator:@a = qw(Meddle not in the affairs of wizards.);
If you have a lot of multiword elements, use a here document and extract lines:@lines = (<< "END_OF_HERE_DOC" =~ /^\s*(.+)/gm); I sit beside the fire and think of all that I have seen, of meadow-flowers and butterflies and summers that have been; END_OF_HERE_DOCThe first technique is the one most commonly used, often because only small arrays are normally initialized as program literals. Initializing a large array would fill your program with values and make it hard to read, so such arrays either tend to be initialized in a separate library file (see Chapter 12), or else have their values read in from a file:@bigarray = ( ); open(FH, "<", "myinfo") or die "Couldn't open myinfo: $!"; while (<FH>) { chomp; push(@bigarray, $_); } close(FH);The second technique usesqw( ), one of several pseudo-functions in Perl used for quoting without having to resort to actual quotation marks. This one splits its string argument on whitespace to produce a list of words, where "words" in this instance means strings that don't contain any whitespace. The initial argument is not subject to interpolation of variables or (most) backslash escape sequences.@banner = ('Costs', 'only', '$4.95'); @banner = qw(Costs only $4.95); @banner = split(' ', 'Costs only $4.95');You can useqw( )only when each whitespace-separated argument is to be a distinct element in the return list. Be careful not to give Columbus four ships instead of three:Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Printing a List with Commas
- InhaltsvorschauYou'd like to print out a list containing an unknown number of elements, placing an "and" before the last element and commas between each element if there are more than two.Use this function, which returns the formatted string:
sub commify_series { (@_ = = 0) ? '' : (@_ = = 1) ? $_[0] : (@_ = = 2) ? join(" and ", @_) : join(", ", @_[0 .. ($#_-1)], "and $_[-1]"); }It often looks odd to print out arrays:@array = ("red", "yellow", "green"); print "I have ", @array, " marbles.\n"; print "I have @array marbles.\n"; I have redyellowgreen marbles I have red yellow green marblesWhat you really want it to say is, "Ihavered,yellow,andgreenmarbles". The function given in the solution generates strings in that format. The word "and" is placed between the last two list elements. If there are more than two elements in the list, a comma is placed between every element.Example 4-1 gives a complete demonstration of the function, with one addition: if any element in the list already contains a comma, a semicolon is used for the separator character instead.Example 4-1. commify_series#!/usr/bin/perl -w # commify_series - show proper comma insertion in list output # @lists is an array of (references to anonymous) arrays @lists = ( [ 'just one thing' ], [ qw(Mutt Jeff) ], [ qw(Peter Paul Mary) ], [ 'To our parents', 'Mother Theresa', 'God' ], [ 'pastrami', 'ham and cheese', 'peanut butter and jelly', 'tuna' ], [ 'recycle tired, old phrases', 'ponder big, happy thoughts' ], [ 'recycle tired, old phrases', 'ponder big, happy thoughts', 'sleep and dream peacefully' ], ); foreach $aref (@lists) { print "The list is: " . commify_series(@$aref) . ".\n"; } # demo for single list @list = qw(one two three); print "The last list is: " . commify_series(@list) . ".\n"; sub commify_series { my $sepchar = grep(/,/ => @_) ? ";" : ","; (@_ = = 0) ? '' : (@_ = = 1) ? $_[0] : (@_ = = 2) ? join(" and ", @_) : join("$sepchar ", @_[0 .. ($#_-1)], "and $_[-1]"); }Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Changing Array Size
- InhaltsvorschauYou want to enlarge or truncate an array. For example, you might truncate an array of employees that's already sorted by salary to list the five highest-paid employees. Or, if you know how big your array will get and that it will grow piecemeal, it's more efficient to grab memory for it in one step by enlarging just once than to keep
pushing values onto the end.Assign to$#ARRAY:# grow or shrink @ARRAY $#ARRAY = $NEW_LAST_ELEMENT_INDEX_NUMBER;
Assigning to an element past the end automatically extends the array:$ARRAY[$NEW_LAST_ELEMENT_INDEX_NUMBER] = $VALUE;
$#ARRAYis the number of the last valid index in@ARRAY. If we assign it a number smaller than its current value, we truncate the array. Truncated elements are lost forever. If we assign$#ARRAYa number larger than its current value, the array grows. New elements have the undefined value.$#ARRAYis not@ARRAY, though. Although$#ARRAYis the last valid index in the array,@ARRAY(in scalar context, as when treated as a number) is the number of elements.$#ARRAYis one less than@ARRAYbecause array indices start at 0.Here's some code that uses both. We have to sayscalar@arrayin theprintbecause Perl gives list context to (most) functions' arguments, but we want@arrayin scalar context.sub what_about_that_array { print "The array now has ", scalar(@people), " elements.\n"; print "The index of the last element is $#people.\n"; print "Element #3 is `$people[3]'.\n"; } @people = qw(Crosby Stills Nash Young); what_about_that_array( );prints:The array now has 4 elements The index of the last element is 3 Element #3 is `Young'whereas:Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Implementing a Sparse Array
- InhaltsvorschauAn array with large, unoccupied expanses between occupied elements wastes memory. How do you reduce that overhead?Use a hash instead of an array.If you assign to the millionth element of an array, Perl allocates a million and one slots to store scalars. Only the last element contains interesting data, leaving earlier ones each set to
undefat a cost of four (or more) bytes per unoccupied slot.In recent versions of Perl, if you grow an array by assigning either past the end or directly to$#ARRAY, you can distinguish these implicitundefs from those that would result from assigningundefthere by usingexistsinstead ofdefined, just as you would with a hash.$#foo = 5; @bar = ( (undef) x 5 ) ; printf "foo element 3 is%s defined\n", defined $foo[3] ? "" : "n't"; printf "foo element 3 does%s exist\n", exists $foo[3] ? "" : "n't"; printf "bar element 3 is%s defined\n", defined $bar[3] ? "" : "n't"; printf "bar element 3 does%s exist\n", exists $bar[3] ? "" : "n't"; foo element 3 isn't defined foo element 3 doesn't exist bar element 3 isn't defined bar element 3 does existHowever, you still waste a lot of space. That's because Perl's array implementation reserves a contiguous vector, one for each element up to the highest occupied position.$real_array[ 1_000_000 ] = 1; # costs 4+ megabytes
A hash works differently: you pay only for what you really use, not for unoccupied positions. Although a hash element costs somewhat more than an array element because you need to store both the value and its key, with sparse arrays, the savings can be astonishing.$fake_array{ 1_000_000 } = 1; # costs 28 bytesWhat's the trade-off? Because a hash's keys aren't ordered, a little more work is needed to sort the numeric keys so you can handle their values in the same order as you would if they were stored as a real array. With an array, you'd just do this to process elements in index order:Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Iterating Over an Array
- InhaltsvorschauYou want to repeat a procedure for every element in a list.Often you use an array to collect information you're interested in; for instance, login names of users who have exceeded their disk quota. When you finish collecting the information, you want to process it by doing something with every element in the array. In the disk quota example, you might send each user a stern mail message.Use a
foreachloop:foreach $item (LIST) { # do something with $item }Let's say we've used@bad_usersto compile a list of users who are over their allotted disk quotas. To call somecomplainsubroutine for each user, we'd use:foreach $user (@bad_users) { complain($user); }Rarely is this recipe so simply applied. Instead, we often use functions to generate the list:foreach $var (sort keys %ENV) { print "$var=$ENV{$var}\n"; }Here we're usingsortandkeysto build a sorted list of environment variable names. If you use the list more than once, you'll obviously keep it around by saving in an array. But for one-shot processing, it's often tidier to process the list directly.Not only can we add complexity to this formula by building up the list in theforeach, we can also add complexity by doing more work inside the code block. A common application offoreachis to gather information on every element of a list and then, based on that information, decide whether to do something. For instance, returning to the disk quota example:foreach $user (@all_users) { $disk_space = get_usage($user); # find out how much disk space in use if ($disk_space > $MAX_QUOTA) { # if it's more than we want ... complain($user); # ... then object vociferously } }More complicated program flow is possible. The code can calllastto jump out of the loop,nextEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Iterating Over an Array by Reference
- InhaltsvorschauYou have a reference to an array, and you want to use a loop to work with the array's elements.Use
foreachorforto loop over the dereferenced array:# iterate over elements of array in $ARRAYREF foreach $item (@$ARRAYREF) { # do something with $item } for ($i = 0; $i <= $#$ARRAYREF; $i++) { # do something with $ARRAYREF->[$i] }The solutions assume you have a scalar variable containing the array reference. This lets you do things like this:@fruits = ( "Apple", "Blackberry" ); $fruit_ref = \@fruits; foreach $fruit (@$fruit_ref) { print "$fruit tastes good in a pie.\n"; } Apple tastes good in a pie Blackberry tastes good in a pieWe could have rewritten theforeachloop as aforloop like this:for ($i=0; $i <= $#$fruit_ref; $i++) { print "$fruit_ref->[$i] tastes good in a pie.\n"; }Frequently, though, the array reference is the result of a more complex expression. Use the@{EXPR}notation to turn the result of the expression back into an array:$namelist{felines} = \@rogue_cats; foreach $cat ( @{ $namelist{felines} } ) { print "$cat purrs hypnotically..\n"; } print "--More--\nYou are controlled.\n";Again, we can replace theforeachwith aforloop:for ($i=0; $i <= $#{ $namelist{felines} }; $i++) { print "$namelist{felines}[$i] purrs hypnotically.\n"; }perlref(1) and perllol(1); Chapter 8 of Programming Perl; Recipe 11.1; Recipe 4.5Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Extracting Unique Elements from a List
- InhaltsvorschauYou want to eliminate duplicate values from a list, such as when you build the list from a file or from the output of another command. This recipe is equally applicable to removing duplicates as they occur in input and to removing duplicates from an array you've already populated.Use a hash to record which items have been seen, then
keysto extract them. You can use Perl's idea of truth to shorten and speed up your code.Section 4.7.2.1: Straightforward
%seen = ( ); @uniq = ( ); foreach $item (@list) { unless ($seen{$item}) { # if we get here, we have not seen it before $seen{$item} = 1; push(@uniq, $item); } }Section 4.7.2.2: Faster
%seen = ( ); foreach $item (@list) { push(@uniq, $item) unless $seen{$item}++; }Section 4.7.2.3: Similar but with user function
%seen = ( ); foreach $item (@list) { some_func($item) unless $seen{$item}++; }Section 4.7.2.4: Faster but different
%seen = ( ); foreach $item (@list) { $seen{$item}++; } @uniq = keys %seen;Section 4.7.2.5: Faster and even more different
%seen = ( ); @uniq = grep { ! $seen{$_} ++ } @list;The question at the heart of the matter is "Have I seen this element before?" Hashes are ideally suited to such lookups. The first technique (Recipe 4.7.2.1) builds up the array of unique values as we go along, using a hash to record whether something is already in the array.The second technique (Recipe 4.7.2.2) is the most natural way to write this sort of thing in Perl. It creates a new entry in the hash every time it sees an element that hasn't been seen before, using theEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Finding Elements in One Array but Not Another
- InhaltsvorschauYou want to find elements that are in one array but not another.You want to find elements in
@Athat aren't in@B. Build a hash of the keys of@Bto use as a lookup table. Then check each element in@Ato see whether it is in@B.Section 4.8.2.1: Straightforward implementation
# assume @A and @B are already loaded %seen = ( ); # lookup table to test membership of B @aonly = ( ); # answer # build lookup table foreach $item (@B) { $seen{$item} = 1 } # find only elements in @A and not in @B foreach $item (@A) { unless ($seen{$item}) { # it's not in %seen, so add to @aonly push(@aonly, $item); } }Section 4.8.2.2: More idiomatic version
my %seen; # lookup table my @aonly; # answer # build lookup table @seen{@B} = ( ); foreach $item (@A) { push(@aonly, $item) unless exists $seen{$item}; }Section 4.8.2.3: Loopless version
my @A = ...; my @B = ...; my %seen; @seen {@A} = ( ); delete @seen {@B}; my @aonly = keys %seen;As with nearly any problem in Perl that asks whether a scalar is in one list or another, this one uses a hash. First, process@Bso that the%seenhash records each element from@Bby setting its value to 1. Then process@Aone element at a time, checking whether that particular element had been in@Bby consulting the%seenhash.The given code retains duplicate elements in@A. This can be fixed easily by adding the elements of@Ato%seenas they are processed:foreach $item (@A) { push(@aonly, $item) unless $seen{$item}; $seen{$item} = 1; # mark as seen }The first two solutions differ mainly in how they build the hash. The first iterates throughEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Computing Union, Intersection, or Difference of Unique Lists
- InhaltsvorschauYou have a pair of lists, each holding unduplicated items. You'd like to find out which items are in both lists (intersection), one but not the other (difference), or either (union).The following solutions need the listed initializations:
@a = (1, 3, 5, 6, 7, 8); @b = (2, 3, 5, 7, 9); @union = @isect = @diff = ( ); %union = %isect = ( ); %count = ( );
Section 4.9.2.1: Simple solution for union and intersection
foreach $e (@a) { $union{$e} = 1 } foreach $e (@b) { if ( $union{$e} ) { $isect{$e} = 1 } $union{$e} = 1; } @union = keys %union; @isect = keys %isect;Section 4.9.2.2: More idiomatic version
foreach $e (@a, @b) { $union{$e}++ && $isect{$e}++ } @union = keys %union; @isect = keys %isect;Section 4.9.2.3: Union, intersection, and symmetric difference
foreach $e (@a, @b) { $count{$e}++ } @union = keys %count; foreach $e (keys %count) { if ($count{$e} = = 2) { push @isect, $e; } else { push @diff, $e; } }Section 4.9.2.4: Indirect solution
@isect = @diff = @union = ( ); foreach $e (@a, @b) { $count{$e}++ } @union = keys %count; foreach $e (keys %count) { push @{ $count{$e} = = 2 ? \@isect : \@diff }, $e; }The first solution most directly computes the union and intersection of two lists, neither containing duplicates. Two hashes are used to record whether a particular item goes in the union or the intersection. We put every element of the first array in the union hash, giving it a true value. Then, processing each element of the second array, we check whether that element is already present in the union. If it is, we put it in the intersection as well. In any event, it goes into the union. When we're done, we extract the keys of both the union and intersection hashes. The values aren't needed.Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Appending One Array to Another
- InhaltsvorschauYou want to join two arrays by appending all elements of one to the other.Use
push:# push push(@ARRAY1, @ARRAY2);
Thepushfunction is optimized for appending a list to the end of an array. You can take advantage of Perl's list flattening to join two arrays, but this results in significantly more copying thanpush:@ARRAY1 = (@ARRAY1, @ARRAY2);
Here's an example ofpushin action:@members = ("Time", "Flies"); @initiates = ("An", "Arrow"); push(@members, @initiates); # @members is now ("Time", "Flies", "An", "Arrow")To insert the elements of one array into the middle of another, use thesplicefunction:splice(@members, 2, 0, "Like", @initiates); print "@members\n"; splice(@members, 0, 1, "Fruit"); splice(@members, -2, 2, "A", "Banana"); print "@members\n";
This is the output:Time Flies Like An Arrow Fruit Flies Like A BananaThespliceandpushfunctions in perlfunc(1) and Chapter 29 of Programming Perl; the "List Values and Arrays" section of Chapter 2 of Programming Perl; the "List Value Constructors" section of perldata(1)Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Reversing an Array
- InhaltsvorschauYou want to reverse an array.Use the
reversefunction:# reverse @ARRAY into @REVERSED @REVERSED = reverse @ARRAY;
Or process with aforeachloop on a reversed list:foreach $element (reverse @ARRAY) { # do something with $element }Or use aforloop, starting with the index of the last element and working your way down:for ($i = $#ARRAY; $i >= 0; $i--) { # do something with $ARRAY[$i] }Called in list context, thereversefunction reverses elements of its argument list. You can save a copy of that reversed list into an array, or just useforeachto walk through it directly if that's all you need. Theforloop processes the array elements in reverse order by using explicit indices. If you don't need a reversed copy of the array, theforloop can save memory and time on very large arrays.If you're usingreverseto reverse a list that you just sorted, you should have sorted it in the correct order to begin with. For example:# two-step: sort then reverse @ascending = sort { $a cmp $b } @users; @descending = reverse @ascending; # one-step: sort with reverse comparison @descending = sort { $b cmp $a } @users;Thereversefunction in perlfunc(1) and Chapter 29 of Programming Perl; we usereversein Recipe 1.7Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Processing Multiple Elements of an Array
- InhaltsvorschauYou want to
poporshiftmultiple elements at a time.Usesplice:# remove $N elements from front of @ARRAY (shift $N) @FRONT = splice(@ARRAY, 0, $N); # remove $N elements from the end of the array (pop $N) @END = splice(@ARRAY, -$N);
Thesplicefunction allows you to add elements, delete elements, or both, at any point in an array, not just at the ends. All other operations that modify an array's length can also be written as asplice:Direct methodSplice equivalentpush(@a, $x, $y)
splice(@a, @a, 0, $x, $y)
pop(@a)
splice(@a, -1)
shift(@a)
splice(@a, 0, 1)
unshift(@a, $x, $y)
splice(@a, 0, 0, $x, $y)
$a[$x] = $y
splice(@a, $x, 1, $y)
(@a, @a = ( ))
splice(@a)
Unlikepopandunshift, though, which always delete and return just one element at a time—and from the ends only—splicelets you specify the number of elements. This leads to code like the examples in the Solution.It's often convenient to wrap thesesplices as functions:sub shift2 (\@) { return splice(@{$_[0]}, 0, 2); } sub pop2 (\@) { return splice(@{$_[0]}, -2); }Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Finding the First List Element That Passes a Test
- InhaltsvorschauYou want the first element in the list (or its index) that passes a test. Alternatively, you want to know whether any element passes the test. The test can be simple identity ("Is this element in the list?") or more complex ("I have a list of Employee objects, sorted from highest salary to lowest. Which manager has the highest salary?"). Simple cases normally require only the value of the element, but when the array itself will be altered, you probably need to know the index number of the first matching element.To find a matching value, use
foreachto loop over every element, and calllastas soon as you find a match:my ($match, $found, $item); foreach $item (@array) { if (CRITERION) { $match = $item; # must save $found = 1; last; } } if ($found) { ## do something with $match } else { ## unfound }To find a matching index, useforto loop a variable over every array index, and calllastas soon as you find a match:my ($i, $match_idx); for ($i = 0; $i < @array; $i++) { if (CRITERION) { $match_idx = $i; # save the index last; } } if (defined $match_idx) { ## found in $array[$match_idx] } else { ## unfound }The List::Util module, shipped standard with Perl as of v5.8 but available on CPAN for earlier versions, provides an even easier approach:use List::Util qw(first); $match = first { CRITERION } @listLacking (until recently) a built-in mechanism to do this, we must write our own code to go through the list and test each element. We useforeachandfor, and calllastto ensure that we stop as soon as we find a match. Before we uselastto stop looking, though, we save the value or index.A common approach is to try to usegrephere. Butgrepalways tests all elements and finds all matches, so it's inefficient if you want only the first match. However,Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Finding All Elements in an Array Matching Certain Criteria
- InhaltsvorschauFrom a list, you want only the elements that match certain criteria.This notion of extracting a subset of a larger list is common. It's how you find all engineers in a list of employees, all users in the "staff" group, or all the filenames you're interested in.Use
grepto apply a condition to all elements in the list and return only those for which the condition was true:@MATCHING = grep { TEST ($_) } @LIST;This could also be accomplished with aforeachloop:@matching = ( ); foreach (@list) { push(@matching, $_) if TEST ($_); }The Perlgrepfunction is shorthand for all that looping and mucking about. It's not really like the Unixgrepcommand; it doesn't have options to return line numbers or to negate the test, and it isn't limited to regular-expression tests. For example, to filter out just the large numbers from an array or to find out which keys in a hash have very large values:@bigs = grep { $_ > 1_000_000 } @nums; @pigs = grep { $users{$_} > 1e7 } keys %users;Here's something that sets@matchingto lines from the who command that start with "gnat":@matching = grep { /^gnat / } `who`;Here's another example:@engineers = grep { $_->position( ) eq "Engineer" } @employees;It extracts only those objects from the array@employeeswhosepositionmethod returns the stringEngineer.You could have even more complex tests in agrep:@secondary_assistance = grep { $_->income >= 26_000 && $_->income < 30_000 } @applicants;But at that point you may decide it would be more legible to write a proper loop instead.The "For Loops," "Foreach Loops," and "Loop Control" sections ofEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Sorting an Array Numerically
- InhaltsvorschauYou want to sort a list of numbers, but Perl's
sort(by default) sorts in ASCII order.Use Perl'ssortfunction and the <=> numerical comparison operator:@sorted = sort { $a <=> $b } @unsorted;Thesortfunction takes an optional code block, which lets you replace the default alphabetic comparison with your own subroutine. This comparison function is called each timesorthas to compare two values. The values to compare are loaded into the special package variables$aand$b, which are automaticallylocalized.The comparison function should return a negative number if$aought to appear before$bin the output list,0if they're the same and their order doesn't matter, or a positive number if$aought to appear after$b. Perl has two operators that behave this way: <=> for sorting numbers in ascending numeric order, andcmpfor sorting strings in ascending alphabetic order. By default,sortusescmp-style comparisons.Here's code that sorts the list of PIDs in@pids, lets the user select one, then sends it a TERM signal followed by a KILL signal. We use a code block that compares$ato$bwith <=> to sort numerically:# @pids is an unsorted array of process IDs foreach my $pid (sort { $a <=> $b } @pids) { print "$pid\n"; } print "Select a process ID to kill:\n"; chomp ($pid = <>); die "Exiting ... \n" unless $pid && $pid =~ /^\d+$/; kill('TERM',$pid); sleep 2; kill('KILL',$pid);If you use$a<=>$bor$acmp$b, the list will be sorted in ascending order. For a descending sort, all we have to do is swap$aand$bin the sort subroutine:@descending = sort { $b <=> $a } @unsorted;Comparison routines must be consistent; that is, they should always return the same answer when called with the same values. Inconsistent comparison routines lead to infinite loops or core dumps, especially in older releases of Perl.Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Sorting a List by Computable Field
- InhaltsvorschauYou want to sort a list by something more complex than a simple string or numeric comparison.This is common when working with objects ("sort by the employee's salary") or complex data structures ("sort by the third element in the array that this is a reference to"). It's also applicable when you want to sort by more than one key; for instance, sorting by birthday and then by name when multiple people share the same birthday.Use the customizable comparison routine in
sort:@ordered = sort { compare( ) } @unordered;You can speed this up by precomputing the field.@precomputed = map { [compute( ),$_] } @unordered; @ordered_precomputed = sort { $a->[0] <=> $b->[0] } @precomputed; @ordered = map { $_->[1] } @ordered_precomputed;And, finally, you can combine the three steps:@ordered = map { $_->[1] } sort { $a->[0] <=> $b->[0] } map { [compute( ), $_] } @unordered;The use of a comparison routine was explained in Recipe 4.15. As well as using built-in operators like<=>, you can construct more complex tests:@ordered = sort { $a->name cmp $b->name } @employees;You often seesortused like this in part of aforeachloop:foreach $employee (sort { $a->name cmp $b->name } @employees) { print $employee->name, " earns \$", $employee->salary, "\n"; }If you're going to do a lot of work with elements in a particular order, it's more efficient to sort once and work from that:@sorted_employees = sort { $a->name cmp $b->name } @employees; foreach $employee (@sorted_employees) { print $employee->name, " earns \$", $employee->salary, "\n"; } # load %bonus foreach $employee (@sorted_employees) { if ( $bonus{ $employee->ssn } ) { print $employee->name, " got a bonus!\n"; } }Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Implementing a Circular List
- InhaltsvorschauYou want to create and manipulate a circular list.Use
unshiftandpop(orpushandshift) on a normal array.unshift(@circular, pop(@circular)); # the last shall be first push(@circular, shift(@circular)); # and vice versa
Circular lists are commonly used to repeatedly process things in order; for example, connections to a server. The code shown previously isn't a true computer science circular list, with pointers and true circularity. Instead, the operations provide for moving the last element to the first position, and vice versa.sub grab_and_rotate ( \@ ) { my $listref = shift; my $element = $listref->[0]; push(@$listref, shift @$listref); return $element; } @processes = ( 1, 2, 3, 4, 5 ); while (1) { $process = grab_and_rotate(@processes); print "Handling process $process\n"; sleep 1; }Theunshiftandpushfunctions in perlfunc(1) and Chapter 29 of Programming Perl; Recipe 13.13Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Randomizing an Array
- InhaltsvorschauYou want to randomly shuffle the elements of an array. The obvious application is writing a card game, where you must shuffle a deck of cards, but it is equally applicable to any situation where you want to treat elements of an array in a random order.Use the
shufflefunction from the standard List::Util module, which returns the elements of its input list in a random order.use List::Util qw(shuffle); @array = shuffle(@array);
Shuffling is a surprisingly tricky process. It's easy to write a bad shuffle:sub naive_shuffle { # DON'T DO THIS for (my $i = 0; $i < @_; $i++) { my $j = int rand @_; # pick random element ($_[$i], $_[$j]) = ($_[$j], $_[$i]); # swap 'em } }This algorithm is biased; the list's possible permutations don't all have the same probability of being generated. The proof of this is simple: take the case where we're passed a three-element list. We generate three random numbers, each of which can have three possible values, yielding 27 possible outcomes. There are only six permutations of the three-element list, though. Because 27 isn't evenly divisible by 6, some outcomes are more likely than others.The List::Util module'sshufflefunction avoids this bias to produce a more randomly shuffled result.If all you want to do is pick one random element from the array, use:$value = $array[ int(rand(@array)) ];
Therandfunction in perlfunc(1) and Chapter 29 of Programming Perl; for more on random numbers, see Recipe 2.6, Recipe 2.7, and Recipe 2.8; Recipe 4.20 provides another way to select a random permutationEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Program: words
- InhaltsvorschauHave you ever wondered how programs like ls generate columns of sorted output that you read down the columns instead of across the rows? For example:
awk cp ed login mount rmdir sum basename csh egrep ls mt sed sync cat date fgrep mail mv sh tar chgrp dd grep mkdir ps sort touch chmod df kill mknod pwd stty vi chown echo ln more rm su
Example 4-2 does this.Example 4-2. words#!/usr/bin/perl -w # words - gather lines, present in columns use strict; my ($item, $cols, $rows, $maxlen); my ($xpixel, $ypixel, $mask, @data); getwinsize( ); # first gather up every line of input, # remembering the longest line length seen $maxlen = 1; while (<>) { my $mylen; s/\s+$//; $maxlen = $mylen if (($mylen = length) > $maxlen); push(@data, $_); } $maxlen += 1; # to make extra space # determine boundaries of screen $cols = int($cols / $maxlen) || 1; $rows = int(($#data+$cols) / $cols); # pre-create mask for faster computation $mask = sprintf("%%-%ds ", $maxlen-1); # subroutine to check whether at last item on line sub EOL { ($item+1) % $cols = = 0 } # now process each item, picking out proper piece for this position for ($item = 0; $item < $rows * $cols; $item++) { my $target = ($item % $cols) * $rows + int($item/$cols); my $piece = sprintf($mask, $target < @data ? $data[$target] : ""); $piece =~ s/\s+$// if EOL( ); # don't blank-pad to EOL print $piece; print "\n" if EOL( ); } # finish up if needed print "\n" if EOL( ); # not portable -- linux only sub getwinsize { my $winsize = "\0" x 8; my $TIOCGWINSZ = 0x40087468; if (ioctl(STDOUT, $TIOCGWINSZ, $winsize)) { ($rows, $cols, $xpixel, $ypixel) = unpack('S4', $winsize); } else { $cols = 80; } }Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Program: permute
- InhaltsvorschauHave you ever wanted to generate all possible permutations of an array or to execute some code for every possible permutation? For example:
% echo man bites dog | permute dog bites man bites dog man dog man bites man dog bites bites man dog man bites dog
The number of permutations of a set is the factorial of the size of the set. This number grows extremely fast, so you don't want to run it on many permutations:Set Size Permutations 1 1 2 2 3 6 4 24 5 120 6 720 7 5040 8 40320 9 362880 10 3628800 11 39916800 12 479001600 13 6227020800 14 87178291200 15 1307674368000
Doing something for each alternative takes a correspondingly large amount of time. In fact, factorial algorithms exceed the number of particles in the universe with very small inputs. The factorial of 500 is greater than ten raised to the thousandth power!use Math::BigInt; sub factorial { my $n = shift; my $s = 1; $s *= $n-- while $n > 0; return $s; } print factorial(Math::BigInt->new("500")); +1220136... (1035 digits total)The two solutions that follow differ in the order of the permutations they return.The solution in Example 4-3 uses a classic list permutation algorithm used by Lisp hackers. It's relatively straightforward but makes unnecessary copies. It's also hardwired to do nothing but print out its permutations.Example 4-3. tsc-permute#!/usr/bin/perl -n # tsc_permute: permute each word of input permute([split], [ ]); sub permute { my @items = @{ $_[0] }; my @perms = @{ $_[1] }; unless (@items) { print "@perms\n"; } else { my (@newitems,@newperms,$i); foreach $i (0 .. $#items) { @newitems = @items; @newperms = @perms; unshift(@newperms, splice(@newitems, $i, 1)); permute( \@newitems, \@newperms); } } }Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Chapter 5: Hashes
- InhaltsvorschauDoing linear scans over an associative array is like trying to club someone to death with a loaded Uzi.—Larry WallPeople and parts of computer programs interact in all sorts of ways. Single scalar variables are like hermits, living a solitary existence whose only meaning comes from within the individual. Arrays are like cults, where multitudes marshal themselves under the name of a charismatic leader. In the middle lies the comfortable, intimate ground of the one-to-one relationship that is the hash. (Older documentation for Perl often called hashes associative arrays, but that's a mouthful. Other languages that support similar constructs sometimes use different terms for them; you may hear about hash tables, tables, dictionaries, mappings, or even alists, depending on the language.)Unfortunately, this isn't a relationship of equals. The relationship encoded in a hash is that of the genitive case or the possessive, like the word "of " in English, or like "'s". We could encode that the boss of Nat is Tim. Hashes only give convenient ways to access values for Nat's boss; you can't ask whose boss Tim is. Finding the answer to that question is a recipe in this chapter.Fortunately, hashes have their own special benefits, just like relationships. Hashes are a built-in data type in Perl. Their use reduces many complex algorithms to simple variable accesses. They are also fast and convenient to build indices and quick lookup tables.Only use the
%when referring to the hash as a whole, such as%boss. When referring to the value associated with a particular key, that's a single scalar value, so a$is called for—just as when referring to one element of an array, you also use a$. This means that "the boss of Nat" would be written as$boss{"Nat"}. We can assign "Tim" to that:$boss{"Nat"} = "Tim";It's time to put a name to these notions. The relationship embodied in a hash is a good thing to use for its name. In the previous example you see a dollar sign, which might surprise you since this is a hash, not a scalar. But we're setting a single scalar value in that hash, so use a dollar sign. Where a lone scalar hasEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Introduction
- InhaltsvorschauPeople and parts of computer programs interact in all sorts of ways. Single scalar variables are like hermits, living a solitary existence whose only meaning comes from within the individual. Arrays are like cults, where multitudes marshal themselves under the name of a charismatic leader. In the middle lies the comfortable, intimate ground of the one-to-one relationship that is the hash. (Older documentation for Perl often called hashes associative arrays, but that's a mouthful. Other languages that support similar constructs sometimes use different terms for them; you may hear about hash tables, tables, dictionaries, mappings, or even alists, depending on the language.)Unfortunately, this isn't a relationship of equals. The relationship encoded in a hash is that of the genitive case or the possessive, like the word "of " in English, or like "'s". We could encode that the boss of Nat is Tim. Hashes only give convenient ways to access values for Nat's boss; you can't ask whose boss Tim is. Finding the answer to that question is a recipe in this chapter.Fortunately, hashes have their own special benefits, just like relationships. Hashes are a built-in data type in Perl. Their use reduces many complex algorithms to simple variable accesses. They are also fast and convenient to build indices and quick lookup tables.Only use the
%when referring to the hash as a whole, such as%boss. When referring to the value associated with a particular key, that's a single scalar value, so a$is called for—just as when referring to one element of an array, you also use a$. This means that "the boss of Nat" would be written as$boss{"Nat"}. We can assign "Tim" to that:$boss{"Nat"} = "Tim";It's time to put a name to these notions. The relationship embodied in a hash is a good thing to use for its name. In the previous example you see a dollar sign, which might surprise you since this is a hash, not a scalar. But we're setting a single scalar value in that hash, so use a dollar sign. Where a lone scalar hasEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Adding an Element to a Hash
- InhaltsvorschauYou need to add an entry to a hash.Simply assign to the hash key:
$HASH{$KEY} = $VALUE;Putting something into a hash is straightforward. In languages that don't provide the hash as an intrinsic data type, you have to worry about overflows, resizing, and collisions in your hash table. In Perl, all that is taken care of for you with a simple assignment. If that entry was already occupied (had a previous value), memory for that value is automatically freed, just as when assigning to a simple scalar.# %food_color defined per the introduction $food_color{Raspberry} = "pink"; print "Known foods:\n"; foreach $food (keys %food_color) { print "$food\n"; } Known foods Banana Apple Raspberry Carrot LemonIf you don't want to overwrite an existing value, but somehow have one key reference multiple values, see Recipe 5.8 and Recipe 11.2.The "List Value Constructors" section of perldata(1); the "List Values and Arrays" section of Chapter 2 of Programming Perl; Recipe 5.2Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Testing for the Presence of a Key in a Hash
- InhaltsvorschauYou need to know whether a hash has a particular key, regardless of whatever value may be associated with that key.Use the
existsfunction.# does %HASH have a value for $KEY ? if (exists($HASH{$KEY})) { # it exists } else { # it doesn't }This code usesexiststo check whether a key is in the%food_colorhash:# %food_color per the introduction foreach $name ("Banana", "Martini") { if (exists $food_color{$name}) { print "$name is a food.\n"; } else { print "$name is a drink.\n"; } } Banana is a food Martini is a drinkTheexistsfunction tests whether a key is in the hash. It doesn't test whether the value corresponding to that key is defined, nor whether the value is true or false. We may be splitting hairs, but problems caused by confusing existence, definedness, and truth can multiply like rabbits. Take this code:%age = ( ); $age{"Toddler"} = 3; $age{"Unborn"} = 0; $age{"Phantasm"} = undef; foreach $thing ("Toddler", "Unborn", "Phantasm", "Relic") { print "$thing: "; print "Exists " if exists $age{$thing}; print "Defined " if defined $age{$thing}; print "True " if $age{$thing}; print "\n"; } Toddler: Exists Defined True Unborn: Exists Defined Phantasm: Exists Relic$age{"Toddler"}passes the existence, definedness, and truth tests. It exists because we gave "Toddler" a value in the hash; it's defined because that value isn'tundef; and it's true because the value isn't one of Perl's false values.$age{"Unborn"}passes only the existence and definedness tests. It exists because we gave "Unborn" a value in the hash, and it's defined because that value isn'tundef. It isn't true, however, because0is one of Perl's false values.Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Creating a Hash with Immutable Keys or Values
- InhaltsvorschauYou'd like to have a hash whose keys or values can't be altered once set.Use the appropriate functions from the standard Hash::Util module.
use Hash::Util qw{ lock_keys unlock_keys lock_value unlock_value lock_hash unlock_hash };To restrict access to keys already in the hash, so no new keys can be introduced:lock_keys(%hash); # restrict to current keys lock_keys(%hash, @klist); # restrict to keys from @klist
To forbid deletion of the key or modification of its value:lock_value(%hash, $key);
To make all keys and their values read-only:lock_hash(%hash);
Suppose you're using a hash to implement a record (or an object) with some pre-determined set of keys, such as "NAME", "RANK", and "SERNO". You'd like to consider it an error to access any keys besides the ones initially in the hash, such as "ANME", a typo. Because Perl always creates hash elements on demand, this wouldn't be caught the way it would if you misspelled a variable name while under theuse strictpragma.The Hash::Util module'slock_keysfunction takes care of this for you. Once a hash is marked as having locked keys, you can't use any other keys than those. The keys need not yet be in the hash, and they may still be deleted if they are. But no new keys may be used.Access to the values in those locked keys is not restricted bylock_keys. However, you may use thelock_valuefunction to render a value in a hash read-only. That hash can also have its keys locked, but doesn't need to if the goal is just to have one or more values marked read-only.If you want to lock down the entire hash, thereby restricting both its keys and its values, thelock_hashfunction will do.The documentation for the Hash::Util moduleEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Deleting from a Hash
- InhaltsvorschauYou want to remove an entry from a hash so that it doesn't show up with
keys,values, oreach. If you were using a hash to associate salaries with employees, and an employee resigned, you'd want to remove their entry from the hash.Use thedeletefunction:# remove $KEY and its value from %HASH delete($HASH{$KEY});Sometimes people mistakenly try to useundefto remove an entry from a hash.undef$hash{$key}and$hash{$key}=undefboth make%hashhave an entry with key$keyand valueundef.Thedeletefunction is the only way to remove a specific entry from a hash. Once you've deleted a key, it no longer shows up in akeyslist or aneachiteration, andexistswill return false for that key.This demonstrates the difference betweenundefanddelete:# %food_color as per Introduction sub print_foods { my @foods = keys %food_color; my $food; print "Keys: @foods\n"; print "Values: "; foreach $food (@foods) { my $color = $food_color{$food}; if (defined $color) { print "$color "; } else { print "(undef) "; } } print "\n"; } print "Initially:\n"; print_foods( ); print "\nWith Banana undef\n"; undef $food_color{"Banana"}; print_foods( ); print "\nWith Banana deleted\n"; delete $food_color{"Banana"}; print_foods( ); Initially Keys: Banana Apple Carrot Lemon Values: yellow red orange yellow With Banana undef Keys: Banana Apple Carrot Lemon Values: (undef) red orange yellow With Banana deleted Keys: Apple Carrot Lemon Values: red orange yellowAs you see, if we set$food_color{"Banana"}toundef, "Banana" still shows up as a key in the hash. The entry is still there; we only succeeded in making the valueEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Traversing a Hash
- InhaltsvorschauYou want to perform an action on each entry (i.e., each key-value pair) in a hash.Use
eachwith awhileloop:while(($key, $value) = each(%HASH)) { # do something with $key and $value }Or usekeyswith aforeachloop, unless the hash is potentially very large:foreach $key (keys %HASH) { $value = $HASH{$key}; # do something with $key and $value }Here's a simple example, iterating through the%food_colorhash from the introduction:# %food_color per the introduction while(($food, $color) = each(%food_color)) { print "$food is $color.\n"; } Banana is yellow Apple is red Carrot is orange Lemon is yellow foreach $food (keys %food_color) { my $color = $food_color{$food}; print "$food is $color.\n"; } Banana is yellow Apple is red Carrot is orange Lemon is yellowWe didn't really need the$colorvariable in theforeachexample, because we use it only once. Instead, we could have written:print "$food is $food_color{$food}.\n"Every timeeachis called on the same hash, it returns the "next" key-value pair. We say "next" because the pairs are returned in the order the underlying lookup structure imposes on them, which appears to be no order at all. Wheneachruns out of hash elements, it returns the empty list( ), whose assignment tests false and terminates thewhileloop.Theforeachexample useskeys, which constructs an entire list containing every key from the hash before the loop even begins executing. The advantage to usingeachis that it gets the keys and values one pair at a time. If the hash contains many keys, not having to preconstruct a complete list of them can save substantial memory. TheeachEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Printing a Hash
- InhaltsvorschauYou want to print a hash, but neither
print"%hash" norprint%hashdoes what you want; the first is a literal, while the second just has the keys and values all scrunched together.One of several approaches is to iterate over every key-value pair in the hash using Recipe 5.5 and print them:while ( ($k,$v) = each %hash ) { print "$k => $v\n"; }Or usemapto generate a list of strings:print map { "$_ => $hash{$_}\n" } keys %hash;Or use the interpolation trick from Recipe 1.15 to interpolate the hash as a list:print "@{[ %hash ]}\n";Or use a temporary array variable to hold the hash, and then print that:{ my @temp = %hash; print "@temp"; }The methods differ in the degree that their output is customizable (in order and formatting) and in their efficiency.The first method, iterating over the hash, is flexible and space-efficient. You can format the output as you like it, and it requires only two scalar variables: the current key and value. You can print the hash in key order (at the cost of building a list of sorted keys) if you use aforeachloop:foreach $k (sort keys %hash) { print "$k => $hash{$k}\n"; }Themapfunction is just as flexible. You can still process the list in any order by sorting the keys. You can customize the output to your heart's content. But it builds up a list of strings like "KEY=>VALUE\n" to pass toprint.The last two methods are interpolation tricks. By treating the hash as a list, you can't predict or control the output order of key-value pairs. Furthermore, the output will consist of a list of keys and values, each separated by whatever string that$" happens to hold. You can't put newlines between pairs or "=>" within them, as we could with the other methods.Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Retrieving from a Hash in Insertion Order
- InhaltsvorschauThe
keysandeachfunctions traverse the hash elements in a strange order, and you want them in the order in which you inserted them.Use the Tie::IxHash module.use Tie::IxHash; tie %HASH, "Tie::IxHash"; # manipulate %HASH @keys = keys %HASH; # @keys is in insertion order
Tie::IxHash makeskeys,each, andvaluesreturn the hash elements in the order they were added. This often removes the need to preprocess the hash keys with a complexsortcomparison or maintain a distinct array containing the keys in the order they were inserted into the hash.Tie::IxHash also provides an object-oriented interface tosplice,push,pop,shift,unshift,keys,values, anddelete, among others.Here's an example, showing bothkeysandeach:# initialize use Tie::IxHash; tie %food_color, "Tie::IxHash"; $food_color{"Banana"} = "Yellow"; $food_color{"Apple"} = "Green"; $food_color{"Lemon"} = "Yellow"; print "In insertion order, the foods are:\n"; foreach $food (keys %food_color) { print " $food\n"; } print "Still in insertion order, the foods' colors are:\n"; while (( $food, $color ) = each %food_color ) { print "$food is colored $color.\n"; } In insertion order, the foods are Banana Apple Lemon Still in insertion order, the foods' colors are Banana is colored Yellow Apple is colored Green Lemon is colored YellowThe documentation for the CPAN module Tie::IxHash; Recipe 13.5Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Hashes with Multiple Values per Key
- InhaltsvorschauYou want to store more than one value for each key.Store an array reference in
$hash{$key}, then put the values into the referenced array.You can store only scalar values in a hash. References, however, are scalars. This solves the problem of storing multiple values for one key by making$hash{$key}a reference to an array containing values for$key. The normal hash operations—insertion, deletion, iteration, and testing for existence—can now be written in terms of array operations likepush,splice, andforeach.This code shows simple insertion into the hash. It processes the output of who(1) on Unix machines and outputs a terse listing of users and the ttys they're logged in on:%ttys = ( ); open(WHO, "who|") or die "can't open who: $!"; while (<WHO>) { ($user, $tty) = split; push( @{$ttys{$user}}, $tty ); } foreach $user (sort keys %ttys) { print "$user: @{$ttys{$user}}\n"; }The heart of the code is thepushline, the multivalued version of$ttys{$user}=$tty. The first time through, that hash value is undefined, so Perl automatically allocates a new anonymous hash and stores its reference in that value so that thepushcan succeed. This is called autovivification, and is explained more in Chapter 11.We interpolate all the tty names in theprintline with@{$ttys{$user}}. We'd loop over the anonymous array if, for instance, we wanted to print the owner of each tty:foreach $user (sort keys %ttys) { print "$user: ", scalar( @{$ttys{$user}} ), " ttys.\n"; foreach $tty (sort @{$ttys{$user}}) { @stat = stat("/dev/$tty"); $user = @stat ? ( getpwuid($stat[4]) )[0] : "(not available)"; print "\t$tty (owned by $user)\n"; } }Theexistsfunction can have two meanings: "Is there at least one value for this key?" and "Does this value exist for this key?" Implementing the second approach requires searching the array for the value. TheEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Inverting a Hash
- InhaltsvorschauHashes map keys to values. You have a hash and a value whose corresponding key you want to find.Use
reverseto create an inverted hash whose values are the original hash's keys and vice versa.# %LOOKUP maps keys to values %REVERSE = reverse %LOOKUP;
This technique uses the list equivalence of hashes mentioned in the introduction. In list context,reversetreats%LOOKUPas a list and reverses the order of its elements. The significant property of a hash treated as a list is that the list elements come in associated pairs: the first element is the key; the second, the value. When youreversesuch a list, the first element is now the value, and the second the key. Treating this list as a hash results in a hash whose values are the keys of the original hash and vice versa.Here's an example:%surname = ( "Mickey" => "Mantle", "Babe" => "Ruth" ); %first_name = reverse %surname; print $first_name{"Mantle"}, "\n"; MickeyWhen we treat%surnameas a list, it becomes:("Mickey", "Mantle", "Babe", "Ruth")(or maybe("Babe", "Ruth", "Mickey", "Mantle")because we can't predict the order). Reversing this list gives us:("Ruth", "Babe", "Mantle", "Mickey")When we treat this list as a hash, it becomes:("Ruth" => "Babe", "Mantle" => "Mickey")Now instead of turning first names into surnames, it turns surnames into first names.Example 5-2 is a program calledfoodfind. If you give it a food name, it'll tell you the color of that food. If you give it a color, it'll tell you a food of that color.Example 5-2. foodfind#!/usr/bin/perl -w # foodfind - find match for food or color $given = shift @ARGV or die "usage: foodfind food_or_color\n"; %color = ( "Apple" => "red", "Banana" => "yellow", "Lemon" => "yellow", "Carrot" => "orange" ); %food = reverse %color; if (exists $color{$given}) { print "$given is a food with color $color{$given}.\n"; } if (exists $food{$given}) { print "$food{$given} is a food with color $given.\n"; }Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Sorting a Hash
- InhaltsvorschauYou need to work with the elements of a hash in a particular order.Use
keysto get a list of keys, thensortthem based on the ordering you want:# %hash is the hash to sort @keys = sort { criterion( ) } (keys %hash); foreach $key (@keys) { $value = $hash{$key}; # do something with $key, $value }Even though you can't directly maintain a hash in a specific order (unless you use the Tie::IxHash module mentioned in Recipe 5.7), you can access its entries in any order.This technique offers many variations on the same basic mechanism: you extract the keys, reorder them using thesortfunction, and then process the entries in the new order. All the sorting tricks shown in Chapter 4 can be used here. Let's look at some applications.The following code simply usessortto order the keys alphabetically:foreach $food (sort keys %food_color) { print "$food is $food_color{$food}.\n"; }This sorts the keys by their associated values:foreach $food (sort { $food_color{$a} cmp $food_color{$b} } keys %food_color) { print "$food is $food_color{$food}.\n"; }This sorts by length of the values:@foods = sort { length($food_color{$a}) <=> length($food_color{$b}) } keys %food_color; foreach $food (@foods) { print "$food is $food_color{$food}.\n"; }Thesortandkeysfunctions in perlfunc(1) and in Chapter 29 of Programming Perl; Recipe 5.7; we discuss sorting lists in Recipe 4.16Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Merging Hashes
- InhaltsvorschauYou need to make a new hash with the entries of two existing hashes.Treat them as lists, and join them as you would lists.
%merged = (%A, %B);
To save memory, loop over the hashes' elements and build a new hash that way:%merged = ( ); while ( ($k,$v) = each(%A) ) { $merged{$k} = $v; } while ( ($k,$v) = each(%B) ) { $merged{$k} = $v; }The first method, like the earlier recipe on inverting a hash, uses the hash-list equivalence explained in the introduction.(%A,%B)evaluates to a list of paired keys and values. When we assign it to%merged, Perl turns that list of pairs back into a hash.Here's an example of that technique:# %food_color as per the introduction %drink_color = ( Galliano => "yellow", "Mai Tai" => "blue" ); %ingested_color = (%drink_color, %food_color);Keys in both input hashes appear just once in the output hash. If a food and a drink shared the same name, for instance, then the last one seen by the first merging technique would be the one that showed up in the resultant hash.This style of direct assignment, as in the first example, is easier to read and write, but requires a lot of memory if the hashes are large. That's because Perl has to unroll both hashes into a temporary list before the assignment to the merged hash is done. Step-by-step merging usingeach, as in the second technique, spares you that cost and lets you decide what to do with duplicate keys.The first example could be rewritten to use theeachtechnique:# %food_color per the introduction, then %drink_color = ( Galliano => "yellow", "Mai Tai" => "blue" ); %substance_color = ( ); while (($k, $v) = each %food_color) { $substance_color{$k} = $v; } while (($k, $v) = each %drink_color) { $substance_color{$k} = $v; }Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Finding Common or Different Keys in Two Hashes
- InhaltsvorschauYou need to find keys in one hash that are or are not present in another hash.Use
keysto loop through the keys of one hash, checking whether each key is also in the other hash.Section 5.12.2.1: Find common keys
my @common = ( ); foreach (keys %hash1) { push(@common, $_) if exists $hash2{$_}; } # @common now contains common keysSection 5.12.2.2: Find keys from one hash that aren't in both
my @this_not_that = ( ); foreach (keys %hash1) { push(@this_not_that, $_) unless exists $hash2{$_}; }Because we're finding common or different keys of the hashes, we can apply our earlier array recipes for finding common or different elements to arrays of the hashes' keys. For an explanation, see Recipe 4.9.This code uses the difference technique to find non-citrus foods:# %food_color per the introduction # %citrus_color is a hash mapping citrus food name to its color. %citrus_color = ( Lemon => "yellow", Orange => "orange", Lime => "green" ); # build up a list of non-citrus foods @non_citrus = ( ); foreach (keys %food_color) { push (@non_citrus, $_) unless $citrus_color{$_}; }The "Hashes" section of Chapter 2 of Programming Perl; theeachfunction in perlfunc(1) and in Chapter 29 of Programming PerlEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Hashing References
- InhaltsvorschauWhen you use
keyson a hash whose keys are references, the references thatkeysreturns no longer work. This situation often arises when you want to cross-reference two different hashes.Use Tie::RefHash:use Tie::RefHash; tie %hash, "Tie::RefHash"; # you may now use references as the keys to %hash
Hash keys are automatically "stringified," that is, treated as though they appeared between double quotes. With numbers or strings, nothing is lost. This isn't so with references, though.Stringified references look like these:Class::Somewhere=HASH(0x72048) ARRAY(0x72048)A stringified reference can't be dereferenced, because it is just a string and no longer a reference. This means you can't use references as the keys to a hash without losing their "magic."Hand-rolled solutions to this problem involve maintaining a distinct hash whose keys are stringified references and whose values are the actual references. This is what Tie::RefHash does. We'll use IO objects for filehandles here to show you that even such strange references can index a hash tied with Tie::RefHash.Here's an example:use Tie::RefHash; use IO::File; tie %name, "Tie::RefHash"; foreach $filename ("/etc/termcap", "/vmunix", "/bin/cat") { $fh = IO::File->new("< $filename") or next; $name{$fh} = $filename; } print "open files: ", join(", ", values %name), "\n"; foreach $file (keys %name) { seek($file, 0, 2); # seek to the end printf("%s is %d bytes long.\n", $name{$file}, tell($file)); }If you're storing objects as the keys to a hash, though, you almost always should be storing a unique attribute of the object (e.g., name or ID number) instead.The documentation for the standard Tie::RefHash module; the "Warning" section ofEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Presizing a Hash
- InhaltsvorschauYou want to preallocate memory for a hash to speed up your program so Perl won't have to incrementally allocate memory each time a new entry is added to the hash. Often you know the final size of a hash before you start building it up, and it's possible to use this information to speed up your program.Assign the number of key-value pairs your hash will have to
keys%HASH.# presize %hash to $num keys(%hash) = $num;
This feature may or may not improve your performance. Perl already shares keys between hashes, so if you already have a hash with "Apple" as a key, Perl won't need to allocate memory for another copy of "Apple" when you add an entry whose key is "Apple" to another hash.# will have 512 users in %users keys(%users) = 512;
Perl's internal data structures require the number of keys to be a power of 2. If we had said:keys(%users) = 1000;
Perl would have internally allocated 1024 "buckets" for the hash. Keys and buckets aren't always one to one. You get the best performance when they are, but the distribution of keys to buckets is dependent on your keys and Perl's (immutable) hash algorithm.Thekeysfunction in perlfunc(1) and Chapter 29 of Programming Perl; Recipe 4.3Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Finding the Most Common Anything
- InhaltsvorschauYou have an aggregate data structure, such as an array or a hash. You want to know how often each element in the array (or value in the hash) occurs. For instance, if your array contains web server transactions, you might want to find the most commonly requested file. If your hash maps usernames to number of logins, you want to find the most common number of logins.Use a hash to count how many times each element, key, or value appears:
%count = ( ); foreach $element (@ARRAY) { $count{$element}++; }Any time you want to count how often different things appear, you should probably be using a hash. Theforeachadds one to$count{$element}for every occurrence of$element.Recipe 4.7 and Recipe 4.8Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Representing Relationships Between Data
- InhaltsvorschauYou want to represent relationships between elements of data—for instance, the mother of relationship in a family tree or parent process for a process table. This is closely related to representing tables in relational databases (tables represent relationships between information) and to representing computer science graph structures (edges represent relationships between nodes).Use a hash to represent the relationship.Here's part of the family tree from the Bible:
%father = ( 'Cain' => 'Adam', 'Abel' => 'Adam', 'Seth' => 'Adam', 'Enoch' => 'Cain', 'Irad' => 'Enoch', 'Mehujael' => 'Irad', 'Methusael' => 'Mehujael', 'Lamech' => 'Methusael', 'Jabal' => 'Lamech', 'Jubal' => 'Lamech', 'Tubalcain' => 'Lamech', 'Enos' => 'Seth' );This lets us, for instance, easily trace a person's lineage:while (<>) { chomp; do { print "$_ "; # print the current name $_ = $father{$_}; # set $_ to $_'s father } while defined; # until we run out of fathers print "\n"; }We can already ask questions like "Who begat Seth?" by checking the%fatherhash. By inverting this hash, we invert the relationship. This lets us use Recipe 5.9 to answer questions like "Whom did Lamech beget?"while ( ($k,$v) = each %father ) { push( @{ $children{$v} }, $k ); } $" = ', '; # separate output with commas while (<>) { chomp; if ($children{$_}) { @children = @{$children{$_}}; } else { @children = "nobody"; } print "$_ begat @children.\n"; }Hashes can also represent relationships such as the C language#includes. A includes B if A contains#includeEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Program: dutree
- InhaltsvorschauThe dutree program, shown in Example 5-3, turns the output of du:
% du pcb 19 pcb/fix 20 pcb/rev/maybe/yes 10 pcb/rev/maybe/not 705 pcb/rev/maybe 54 pcb/rev/web 1371 pcb/rev 3 pcb/pending/mine 1016 pcb/pending 2412 pcb
into sorted, indented output:2412 pcb | 1371 rev | | 705 maybe | | | 675 | | | 20 yes | | | 10 not | | 612 | | 54 web | 1016 pending | | 1013 | | 3 mine | 19 fix | 6The arguments you give dutree are passed through to du. That way you could call dutree in any of these ways, or maybe more if your du supports other options:% dutree % dutree /usr % dutree -a % dutree -a /bin
The%Dirsizehash maintains the mapping of names to sizes. For example,$Dirsize{"pcb"}contains 2412 in this sample run. We'll use that hash both for output and for sorting each directory's subdirectories by size.%Kidsis more interesting. For any given path PATH,$Kids{PATH} contains a (reference to an) array of names of subdirectories of this one. The "pcb" entry contains a reference to an anonymous array containing "fix", "rev", and "pending". The "rev" entry contains "maybe" and "web". The "maybe" entry contains "yes" and "not", which do not have their own entries because they are end nodes in the tree.Theoutputfunction is passed the start of the tree—the last line read in from the output of du. First it prints that directory and its size. Then the function sorts the directory's children (if any) so that those with the most disk usage float to the top. Finally,Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Chapter 6: Pattern Matching
- Inhaltsvorschau[Art is] pattern informed by sensibility.—Sir Herbert Read, The Meaning of ArtMost modern programming languages offer primitive pattern-matching tools, usually through an extra library. In contrast, Perl's patterns are integrated directly into the language core. Perl's pattern matching boasts features not found elsewhere, features that encourage a whole different way of looking at data. Just as chess players see patterns in the board positions that their pieces control, Perl adepts look at data in terms of patterns. These patterns, expressed in the intensely symbolic notation of regular expressions, provide access to powerful algorithms normally available only to scholars of computer science."If this pattern matching thing is so powerful and so fantastic," you may be asking, "why don't you have a hundred different recipes on regular expressions in this chapter?" Regular expressions are the natural solution to many problems involving numbers, strings, dates, web documents, mail addresses, and almost everything else in this book; we use pattern matching over 100 times in other chapters. This chapter mostly presents recipes in which pattern matching forms part of the questions, not just part of the answers.Perl's extensive and integrated support for regular expressions means that you not only have features available that you won't find in any other language, but you have new ways of using them, too. Programmers new to Perl often look for functions like these:
match( $string, $pattern ); subst( $string, $pattern, $replacement );
but matching and substituting are such common tasks that they merit their own notation:$meadow =~ m/sheep/; # True if $meadow contains "sheep" $meadow !~ m/sheep/; # True if $meadow doesn't contain "sheep" $meadow =~ s/old/new/; # Replace "old" with "new" in $meadow
Pattern matching isn't like direct string comparison, even at its simplest level; it's more like string searching with mutant wildcards on steroids. Without anchors, the position where the match occurs can float freely throughout the string. Any of the following lines would also be matched by the expressionEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Introduction
- InhaltsvorschauMost modern programming languages offer primitive pattern-matching tools, usually through an extra library. In contrast, Perl's patterns are integrated directly into the language core. Perl's pattern matching boasts features not found elsewhere, features that encourage a whole different way of looking at data. Just as chess players see patterns in the board positions that their pieces control, Perl adepts look at data in terms of patterns. These patterns, expressed in the intensely symbolic notation of regular expressions, provide access to powerful algorithms normally available only to scholars of computer science."If this pattern matching thing is so powerful and so fantastic," you may be asking, "why don't you have a hundred different recipes on regular expressions in this chapter?" Regular expressions are the natural solution to many problems involving numbers, strings, dates, web documents, mail addresses, and almost everything else in this book; we use pattern matching over 100 times in other chapters. This chapter mostly presents recipes in which pattern matching forms part of the questions, not just part of the answers.Perl's extensive and integrated support for regular expressions means that you not only have features available that you won't find in any other language, but you have new ways of using them, too. Programmers new to Perl often look for functions like these:
match( $string, $pattern ); subst( $string, $pattern, $replacement );
but matching and substituting are such common tasks that they merit their own notation:$meadow =~ m/sheep/; # True if $meadow contains "sheep" $meadow !~ m/sheep/; # True if $meadow doesn't contain "sheep" $meadow =~ s/old/new/; # Replace "old" with "new" in $meadow
Pattern matching isn't like direct string comparison, even at its simplest level; it's more like string searching with mutant wildcards on steroids. Without anchors, the position where the match occurs can float freely throughout the string. Any of the following lines would also be matched by the expressionEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Copying and Substituting Simultaneously
- InhaltsvorschauYou're tired of using two separate statements with redundant information, one to copy and another to substitute.Instead of:
$dst = $src; $dst =~ s/this/that/;
use:($dst = $src) =~ s/this/that/;
Sometimes you wish you could run a search and replace on a copy of a string, but you don't care to write this in two separate steps. You don't have to, because you can apply the regex operation to the result of the copy operation.For example:# strip to basename ($progname = $0) =~ s!^.*/!!; # Make All Words Title-Cased ($capword = $word) =~ s/(\w+)/\u\L$1/g; # /usr/man/man3/foo.1 changes to /usr/man/cat3/foo.1 ($catpage = $manpage) =~ s/man(?=\d)/cat/;
You can even use this technique on an entire array:@bindirs = qw( /usr/bin /bin /usr/local/bin ); for (@libdirs = @bindirs) { s/bin/lib/ } print "@libdirs\n"; /usr/lib /lib /usr/local/libBecause of precedence, parentheses are required when combining an assignment if you wish to change the result in the leftmost variable. The result of a substitution is its success: either "" for failure, or an integer number of times the substitution was done. Contrast this with the preceding examples where the parentheses surround the assignment itself. For example:($a = $b) =~ s/x/y/g; # 1: copy $b and then change $a $a = ($b =~ s/x/y/g); # 2: change $b, count goes in $a $a = $b =~ s/x/y/g; # 3: same as 2
The "Variables" section of Chapter 2 of Programming Perl, and the "Assignment Operators" section of perlop(1) and Chapter 3 of Programming PerlEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Matching Letters
- InhaltsvorschauYou want to see whether a string contains only alphabetic characters.The obvious character class for matching regular letters isn't good enough in the general case:
if ($var =~ /^[A-Za-z]+$/) { # it is purely alphabetic }because it doesn't pay attention to letters with diacritics or characters from other writing systems. The best solution is to use Unicode properties:if ($var =~ /^\p{Alphabetic}+$/) { # or just /^\pL+$/ print "var is purely alphabetic\n"; }On older releases of Perl that don't support Unicode, your only real option was to use either a negated character class:if ($var =~ /^[^\W\d_]+$/) { print "var is purely alphabetic\n"; }or, if supported, POSIX character classes:if ($var =~ /^[[:alpha:]]+$/) { print "var is purely alphabetic\n"; }But these don't work for non-ASCII letters unless youuselocaleand the system you're running on actually supports POSIX locales.Apart from Unicode properties or POSIX character classes, Perl can't directly express "something alphabetic" independent of locale, so we have to be more clever. The\wregular expression notation matches one alphabetic, numeric, or underscore character—hereafter known as an "alphanumunder" for short. Therefore,\Wis one character that is not one of those. The negated character class[^\W\d_]specifies a character that must be neither a non-alphanumunder, a digit, nor an underscore. That leaves nothing but alphabetics, which is what we were looking for.Here's how you'd use this in a program:use locale; use POSIX 'locale_h'; # the following locale string might be different on your system unless (setlocale(LC_ALL, "fr_CA.ISO8859-1")) { die "couldn't set locale to French Canadian\n"; } while (<DATA>) { chomp; if (/^[^\W\d_]+$/) { print "$_: alphabetic\n"; } else { print "$_: line noise\n"; } } _ _END_ _ silly façade coöperate niño Renée Molière hæmoglobin naïve tschüß random!stuff#hereEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Matching Words
- InhaltsvorschauYou want to pick out words from a string.Think hard about what you want a word to be and what separates one word from the next, and then write a regular expression that encodes your decisions. For example:
/\S+/ # as many non-whitespace characters as possible /[A-Za-z'-]+/ # as many letters, apostrophes, and hyphens
Because words vary between applications, languages, and input streams, Perl does not have built-in definitions of words. You must make them from character classes and quantifiers yourself, as we did previously. The second pattern is an attempt to recognize "shepherd's" and "sheep-shearing" each as single words.Most approaches have limitations because of the vagaries of written language. For instance, although the second pattern successfully identifies "spank'd" and "counter-clockwise" as words, it also pulls the "rd" out of "23rdPsalm". To be more precise when pulling words out from a string, specify the characters surrounding the word. Normally, this should be a word boundary, not whitespace:/\b([A-Za-z]+)\b/ # usually best /\s([A-Za-z]+)\s/ # fails at ends or w/ punctuation
Although Perl provides\w, which matches a character that is part of a valid Perl identifier, Perl identifiers are rarely what you think of as words, since we mean a string of alphanumerics and underscores, but not colons or quotes. Because it's defined in terms of\w,\bmay surprise you if you expect to match an English word boundary (or, even worse, a Mongolian word boundary).\band\Bcan still be useful. For example,/\Bis\B/matches the string "is" within a word only, not at the edges. And while "thistle" would be found, "vis-à-vis" wouldn't.The treatment of\b,\w, and\sin perlreEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Commenting Regular Expressions
- InhaltsvorschauYou want to make your complex regular expressions understandable and maintainable.You have several techniques at your disposal: electing alternate delimiters to avoid so many backslashes, placing comments outside the pattern or inside it using the
/xmodifier, and building up patterns piecemeal in named variables.The piece of sample code in Example 6-1 uses the first couple techniques, and its initial comment describes the overall intent of the regular expression. For simple patterns, this may be all that is needed. More complex patterns, as in the example, require more documentation.Example 6-1. resname#!/usr/bin/perl -p # resname - change all "foo.bar.com" style names in the input stream # into "foo.bar.com [204.148.40.9]" (or whatever) instead use Socket; # load inet_addr s{ ( # capture the hostname in $1 (?: # these parens for grouping only (?! [-_] ) # lookahead for neither underscore nor dash [\w-] + # hostname component \. # and the domain dot ) + # now repeat that whole thing a bunch of times [A-Za-z] # next must be a letter [\w-] + # now trailing domain part ) # end of $1 capture }{ # replace with this: "$1 " . # the original bit, plus a space ( ($addr = gethostbyname($1)) # if we get an addr ? "[" . inet_ntoa($addr) . "]" # format it : "[???]" # else mark dubious ) }gex; # /g for global # /e for execute # /x for nice formattingFor aesthetics, the example uses alternate delimiters. When you split your match or substitution over multiple lines, using matching braces aids readability. A more common use of alternate delimiters is for patterns and replacements that themselves contain slashes, such as inEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Finding the Nth Occurrence of a Match
- InhaltsvorschauYou want to find the Nth match in a string, not just the first one. For example, you'd like to find the word preceding the third occurrence of "
fish":One fish two fish red fish blue fishUse the/gmodifier in awhileloop, keeping count of matches:$WANT = 3; $count = 0; while (/(\w+)\s+fish\b/gi) { if (++$count = = $WANT) { print "The third fish is a $1 one.\n"; # Warning: don't `last' out of this loop } } The third fish is a red one.Or use a repetition count and repeated pattern like this:/(?:\w+\s+fish\s+){2}(\w+)\s+fish/i;As explained in this chapter's Introduction, using the/gmodifier in scalar context creates something of a progressive match, useful inwhileloops. This is commonly used to count the number of times a pattern matches in a string:# simple way with while loop $count = 0; while ($string =~ /PAT/g) { $count++; # or whatever you'd like to do here } # same thing with trailing while $count = 0; $count++ while $string =~ /PAT/g; # or with for loop for ($count = 0; $string =~ /PAT/g; $count++) { } # Similar, but this time count overlapping matches $count++ while $string =~ /(?=PAT)/g;To find the Nth match, it's easiest to keep your own counter. When you reach the appropriate N, do whatever you care to. A similar technique could be used to find every Nth match by checking for multiples of N using the modulus operator. For example,(++$count%3)= =0would be used to find every third match.If this is too much bother, you can always extract all matches and then hunt for the ones you'd like.$pond = 'One fish two fish red fish blue fish'; # using a temporary @colors = ($pond =~ /(\w+)\s+fish\b/gi); # get all matches $color = $colors[2]; # then the one we want # or without a temporary array $color = ( $pond =~ /(\w+)\s+fish\b/gi )[2]; # just grab element 3 print "The third fish in the pond is $color.\n";
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Matching Within Multiple Lines
- InhaltsvorschauYou want to use regular expressions on a string containing more than one logical line, but the special characters . (any character but newline),
^(start of string), and$(end of string) don't seem to work for you. This might happen if you're reading in multiline records or the whole file at once.Use/m,/s, or both as pattern modifiers./sallows . to match a newline (normally it doesn't). If the target string has more than one line in it,/foo.*bar/scould match a "foo" on one line and a "bar" on a following line. This doesn't affect dots in character classes like[#%.], since they are literal periods anyway.The/mmodifier allows^and$to match immediately before and after an embedded newline, respectively./^=head[1-7]/mwould match that pattern not just at the beginning of the record, but anywhere right after a newline as well.A common, brute-force approach to parsing documents where newlines are not significant is to read the file one paragraph at a time (or sometimes even the entire file as one string) and then extract tokens one by one. If the pattern involves dot, such as.+or.*?, and must match across newlines, you need to do something special to make dot match a newline; ordinarily, it does not. When you've read more than one line into a string, you'll probably prefer to have^and$match beginning- and end-of-line, not just beginning- and end-of-string.The difference between/mand/sis important:/mallows^and$to match next to an embedded newline, whereas/sallows . to match newlines. You can even use them together—they're not mutually exclusive.Example 6-2 creates a simplistic filter to strip HTML tags out of each file in@ARGVand then send those results toSTDOUT. First we undefine the record separator so each read operation fetches one entire file. (There could be more than one file, becauseEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Reading Records with a Separator
- InhaltsvorschauYou want to read records separated by a pattern, but Perl doesn't allow its input record separator variable to be a regular expression.Many problems, most obviously those involving parsing complex file formats, become simpler when you can extract records separated by different strings.Read the whole file and use
split:undef $/; @chunks = split(/pattern/, <FILEHANDLE>);
Perl's official record separator, the$/variable, must be a fixed string, not a pattern. To sidestep this limitation, undefine the input record separator entirely so that the nextreadlineoperation reads the rest of the file. This is sometimes called slurp mode, because it slurps in the whole file as one big string. Thensplitthat huge string using the record separating pattern as the first argument.Here's an example where the input stream is a text file that includes lines consisting of ".Se", ".Ch", and ".Ss", which are special codes in the troff macro set that this book was developed under. These strings are the separators, and we want to find text that falls between them.# .Ch, .Se and .Ss divide chunks of STDIN { local $/ = undef; @chunks = split(/^\.(Ch|Se|Ss)$/m, <>); } print "I read ", scalar(@chunks), " chunks.\n";We create a localized version of$/so its previous value is restored once the block finishes. By usingsplitwith parentheses in the pattern, captured separators are also returned. This way data elements in the return list alternate with elements containing "Se", "Ch", or "Ss".If you don't want separators returned, but still need parentheses, use non-capturing parentheses in the pattern:/^\.(?:Ch|Se|Ss)$/m.To split before a pattern but include the pattern in the return, use a lookahead assertion:/^(?=\.(?:Ch|Se|Ss))/m. That way each chunk except the first starts with the pattern.Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Extracting a Range of Lines
- InhaltsvorschauYou want to extract all lines from a starting pattern through an ending pattern or from a starting line number up to an ending line number.A common example of this is extracting the first 10 lines of a file (line numbers 1 to 10) or just the body of a mail message (everything past the blank line).Use the operators
.. or... with patterns or line numbers.The.. operator will test the right operand on the same iteration that the left operand flips the operator into the true state.while (<>) { if (/BEGIN PATTERN/ .. /END PATTERN/) { # line falls between BEGIN and END in the # text, inclusive. } } while (<>) { if (FIRST_LINE_NUM .. LAST_LINE_NUM) { # operate only between first and last line, inclusive. } }But the... operator waits until the next iteration to check the right operand.while (<>) { if (/BEGIN PATTERN/ ... /END PATTERN/) { # line is between BEGIN and END on different lines } } while (<>) { if (FIRST_LINE_NUM ... LAST_LINE_NUM) { # operate only between first and last line, not inclusive } }The range operators,.. and..., are probably the least understood of Perl's myriad operators. They were designed to allow easy extraction of ranges of lines without forcing the programmer to retain explicit state information. Used in scalar context, such as in the test ofifandwhilestatements, these operators return a true or false value that's partially dependent on what they last returned. The expressionleft_operand..right_operandreturns false untilleft_operandis true, but once that test has been met, it stops evaluatingleft_operandand keeps returning true untilright_operandbecomes true, after which it restarts the cycle. Put another way, the first operand turns on the construct as soon as it returns a true value, whereas the second one turns it off as soon asEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Matching Shell Globs as Regular Expressions
- InhaltsvorschauYou want to allow users to specify matches using traditional shell wildcards, not full Perl regular expressions. Wildcards are easier to type than full regular expressions for simple cases.Use the following subroutine to convert four shell wildcard characters into their equivalent regular expression; all other characters are quoted to render them literals.
sub glob2pat { my $globstr = shift; my %patmap = ( '*' => '.*', '?' => '.', '[' => '[', ']' => ']', ); $globstr =~ s{(.)} { $patmap{$1} || "\Q$1" }ge; return '^' . $globstr . '$'; }A Perl regex pattern is not the same as a shell wildcard pattern. The shell's*.*is not a valid regular expression. Its meaning as a pattern would be/^.*\..*$/s, which is admittedly much less fun to type.The function given in the Solution makes these conversions for you, following the standard wildcard rules used by theglobbuilt-in. Table 6-1Table 6-1 shows equivalent wildcard patterns in the shell and in Perl.Table 6-2: Shell globs and equivalent Perl wildcard patterns ShellPerllist.?^list\..$project.*^project\..*$Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Speeding Up Interpolated Matches
- InhaltsvorschauYou want your function or program to take one or more regular expressions as arguments, but doing so seems to run slower than using literals.To overcome this bottleneck, if you have only one pattern whose value won't change during the entire run of a program, store it in a string and use
/$pattern/o:while ($line = <>) { if ($line =~ /$pattern/o) { # do something } }However, that won't work for more than one pattern. Precompile the pattern strings using theqr//operator, then match each result against each of the targets:@pats = map { qr/$_/ } @strings; while ($line = <>) { for $pat (@pats) { if ($line =~ /$pat/) { # do something; } } }When Perl compiles a program, it converts patterns into an internal form. This conversion occurs at compile time for patterns without variables, but at runtime for those that do. Interpolating variables into patterns, as in/$pattern/, can slow your program down—sometimes substantially. This is particularly noticeable when$patternchanges often.The/omodifier locks in the values from variables interpolated into the pattern. That is, variables are interpolated only once: the first time the match is run. Because Perl ignores any later changes to those variables, make sure to use it only on unchanging variables.Using/oon patterns without interpolated variables doesn't hurt, but it also doesn't help. The/omodifier is also of no help when you have an unknown number of regular expressions and need to check one or more strings against all of these patterns, since you need to vary the patterns' contents. Nor is it of any use when the interpolated variable is a function argument, since each call to the function gives the variable a new value.Example 6-4 is an example of the slow but straightforward technique for matching many patterns against many lines. The arrayEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Testing for a Valid Pattern
- InhaltsvorschauYou want to let users enter their own patterns, but an invalid one would abort your program the first time you tried to use it.Test the pattern in an
eval{ }construct first, matching against some dummy string. If$@is not set, no exception occurred, so you know the pattern successfully compiled as a valid regular expression. Here is a loop that continues prompting until the user supplies a valid pattern:do { print "Pattern? "; chomp($pat = <>); eval { "" =~ /$pat/ }; warn "INVALID PATTERN $@" if $@; } while $@;Here's a standalone subroutine that verifies whether a pattern is valid:sub is_valid_pattern { my $pat = shift; eval { "" =~ /$pat/ }; return $@ ? 0 : 1; }Another way to write that is like this:sub is_valid_pattern { my $pat = shift; return eval { "" =~ /$pat/; 1 } || 0; }This version doesn't need to use$@, because if the pattern match executes without exception, the next statement with just a 1 is reached and returned. Otherwise it's skipped, so just a 0 is returned.There's no limit to the number of invalid, uncompilable patterns. The user could mistakenly enter "<I\s*[^">, "*** GET RICH ***", or "+5-i". If you blindly use the proffered pattern in your program, it raises an exception, normally a fatal event.The tiny program in Example 6-6 demonstrates this.Example 6-6. paragrep#!/usr/bin/perl # paragrep - trivial paragraph grepper die "usage: $0 pat [files]\n" unless @ARGV; $/ = ''; $pat = shift; eval { "" =~ /$pat/; 1 } or die "$0: Bad pattern $pat: $@\n"; while (<>) { print "$ARGV $.: $_" if /$pat/o; }That/omeans to interpolate variables once only, even if their contents later change.You could encapsulate this in a function call that returns 1 if the block completes and 0 if not, as shown in the Solution. The simplerEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Honoring Locale Settings in Regular Expressions
- InhaltsvorschauYou want to translate case when in a different locale, or you want to make
\wmatch letters with diacritics, such as José or déjà vu.For example, let's say you're given half a gigabyte of text written in German and told to index it. You want to extract words (with\w+) and convert them to lowercase (withlcor\L), but the normal versions of\wandlcneither match the German words nor change the case of accented letters.Perl's regular-expression and text-manipulation routines have hooks to the POSIX locale setting. Under theuselocalepragma, accented characters are taken care of—assuming a reasonableLC_CTYPEspecification and system support for the same.use locale;
By default,\w+and case-mapping functions operate on upper- and lowercase letters, digits, and underscores. This works only for the simplest of English words, failing even on many common imports. Theuselocaledirective redefines what a "word character" means.In Example 6-7 you see the difference in output between having selected the English ("en") locale and the German ("de") one.Example 6-7. localeg#!/usr/bin/perl -w # localeg - demonstrate locale effects use locale; use POSIX 'locale_h'; $name = "andreas k\xF6nig"; @locale{qw(German English)} = qw(de_DE.ISO_8859-1 us-ascii); setlocale(LC_CTYPE, $locale{English}) or die "Invalid locale $locale{English}"; @english_names = ( ); while ($name =~ /\b(\w+)\b/g) { push(@english_names, ucfirst($1)); } setlocale(LC_CTYPE, $locale{German}) or die "Invalid locale $locale{German}"; @german_names = ( ); while ($name =~ /\b(\w+)\b/g) { push(@german_names, ucfirst($1)); } print "English names: @english_names\n"; print "German names: @german_names\n";Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Approximate Matching
- InhaltsvorschauYou want to match fuzzily, that is, allowing for a margin of error, where the string doesn't quite match the pattern. Whenever you want to be forgiving of misspellings in user input, you want fuzzy matching.Use the String::Approx module, available from CPAN:
use String::Approx qw(amatch); if (amatch("PATTERN", @list)) { # matched } @matches = amatch("PATTERN", @list);String::Approx calculates the difference between the pattern and each string in the list. If less than a certain number—by default, 10 percent of the pattern length—of one-character insertions, deletions, or substitutions are required to make the string fit the pattern, it still matches. In scalar context,amatchreturns the number of successful matches. In list context, it returns the strings matched.use String::Approx qw(amatch); open(DICT, "/usr/dict/words") or die "Can't open dict: $!"; while(<DICT>) { print if amatch("balast"); } ballast balustrade blast blastula sandblastOptions passed toamatchcontrol case-sensitivity and the permitted number of insertions, deletions, or substitutions. These are fully described in the String::Approx documentation.The module's matching function seems to run between 10 and 40 times slower than Perl's built-in pattern matching. So use String::Approx only if you're after a fuzziness in your matching that Perl's patterns can't provide.The documentation for the CPAN module String::Approx; Recipe 1.22Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Matching from Where the Last Pattern Left Off
- InhaltsvorschauYou want to match again in the same string, starting from where the last match left off. This is a useful approach to take when repeatedly extracting data in chunks from a string.Use a combination of the
/gand/cmatch modifiers, the\Gpattern anchor, and theposfunction.The/gmodifier on a pattern match makes the matching engine keep track of the position in the string where it finished matching. If the next match also uses/gon that string, the engine starts looking for a match from this remembered position. This lets you, for example, use awhileloop to progressively extract repeated occurrences of a match. Here we find all non-negative integers:while (/(\d+)/g) { print "Found number $1\n"; }Within a pattern,\Gmeans the end of the previous match. For example, if you had a number stored in a string with leading blanks, you could change each leading blank into the digit zero this way:$n = " 49 here"; $n =~ s/\G /0/g; print $n; 00049 hereYou can also make good use of\Gin awhileloop. Here we use\Gto parse a comma-separated list of numbers (e.g., "3,4,5,9,120"):while (/\G,?(\d+)/g) { print "Found number $1\n"; }By default, when your match fails (when we run out of numbers in the examples, for instance) the remembered position is reset to the start. If you don't want this to happen, perhaps because you want to continue matching from that position but with a different pattern, use the modifier/cwith/g:$_ = "The year 1752 lost 10 days on the 3rd of September"; while (/(\d+)/gc) { print "Found number $1\n"; } # the /c above left pos at end of final match if (/\G(\S+)/g) { print "Found $1 right after the last number.\n"; } Found number 1752Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Greedy and Non-Greedy Matches
- InhaltsvorschauYou have a pattern with a greedy quantifier like
*,+,?, or{ }, and you want to stop it from being greedy.A classic example is the naïve substitution to remove tags from HTML. Although it looks appealing,s#<TT>.*</TT>##gsideletes everything from the first openTTtag through the last closing one. This would turn "Even<TT>vi</TT>canedit<TT>troff</TT>effectively." into "Eveneffectively", completely changing the meaning of the sentence!Replace the offending greedy quantifier with the corresponding non-greedy version. That is, change*,+,?, and{ }into*?,+?,??, and{ }?, respectively.Perl has two sets of quantifiers: the maximal ones—*,+,?, and{ }—and the minimal ones—*?,+?,??,and{ }?. Less formally, these two sorts of quantifiers are often referred to as greedy and non-greedy (or sometimes lazy), respectively. For instance, given the string "PerlisaSwissArmyChainsaw!", the pattern/(r.*s)/matches "rlisaSwissArmyChains", whereas/(r.*?s)/matches "rlis".With maximal quantifiers, when you ask to match a variable number of times, such as zero or more times for*or one or more times for+, the matching engine prefers the "or more" portion of that description. Thus/foo.*bar/matches the first "foo" through the last "bar" in the string, rather than only through the next "bar" as some might expect. That's because the greedy.*first expands to the rest of the string, but since that wouldn't leave any characters for "bar" to match, the engine backs up one character at a time until it finds "bar".To make any repetition operator match minimally instead of maximally, add an extraEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Detecting Doubled Words
- InhaltsvorschauYou want to check for doubled words in a document.Use backreferences in your pattern.Parentheses in a pattern make the matching engine remember what text that portion of the pattern matched. Later in the pattern, refer to the actual string that matched with
\1(indicating the string matched by the first set of parentheses),\2(for the string matched by the second set of parentheses), and so on. Don't use$1within a regex, because it would be a variable interpolated before the match began. The pattern/([A-Z])\1/matches a capital letter followed not just by any capital letter, but by whichever one was just matched (i.e., captured by the first set of parentheses in that pattern).The next sample code reads its input files by paragraph, with the definition of paragraph following Perl's notion of a paragraph—a chunk of text terminated by two or more contiguous newlines. Within each paragraph, the code finds all doubled words. It ignores case and can match across newlines.Here we use/xto embed whitespace and comments to improve readability. The/ipermits both instances of "is" in the sentence "Isisthisok?" to match, even though they differ in case. We use/gin awhileloop to keep finding doubled words until we run out of text.$/ = ''; # paragrep mode while (<>) { while ( m{ \b # start at a word boundary (begin letters) (\S+) # find chunk of non-whitespace \b # until another word boundary (end letters) ( \s+ # separated by some whitespace \1 # and that very same chunk again \b # until another word boundary ) + # one or more sets of those }xig ) { print "dup word '$1' at paragraph $.\n"; } }Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Matching Nested Patterns
- InhaltsvorschauYou want to match a nested set of enclosing delimiters, such as the arguments to a function call.Use match-time pattern interpolation, recursively:
my $np; $np = qr{ \( (?: (?> [^( )]+ ) # Non-capture group w/o backtracking | (??{ $np }) # Group with matching parens )* \) }x;Or use the Text::Balanced module'sextract_bracketedfunction.The$(??{CODE})construct runs the code and interpolates the string that the code returns right back into the pattern. A simple, non-recursive example that matches palindromes demonstrates this:if ($word =~ /^(\w+)\w?(??{reverse $1})$/ ) { print "$word is a palindrome.\n"; }Consider a word like "reviver", which this pattern correctly reports as a palindrome. The$1variable contains "rev" partway through the match. The optional word character following catches the "i". Then the codereverse $1runs and produces "ver", and that result is interpolated into the pattern.For matching something balanced, you need to recurse, which is a bit tricker. A compiled pattern that uses(??{CODE})can refer to itself. The pattern given in the Solution matches a set of nested parentheses, however deep they may go. Given the value of$npin that pattern, you could use it like this to match a function call:$text = "myfunfun(1,(2*(3+4)),5)"; $funpat = qr/\w+$np/; # $np as above $text =~ /^$funpat$/; # Matches!
You'll find many CPAN modules that help with matching (parsing) nested strings. The Regexp::Common module supplies canned patterns that match many of the tricker strings. For example:use Regexp::Common; $text = "myfunfun(1,(2*(3+4)),5)"; if ($text =~ /(\w+\s*$RE{balanced}{-parens=>'( )'})/o) { print "Got function call: $1\n"; }Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Expressing AND, OR, and NOT in a Single Pattern
- InhaltsvorschauYou have an existing program that accepts a pattern as an argument or as input. It doesn't allow you to add extra logic, like case-insensitive options, ANDs, or NOTs. So you need to write a single pattern that matches either of two different patterns (the "or" case) or both of two patterns (the "and" case), or that reverses the sense of the match ("not").This situation arises often in configuration files, web forms, or command-line arguments. Imagine there's a program that does this:
chomp($pattern = <CONFIG_FH>); if ( $data =~ /$pattern/ ) { ..... }As the maintainer ofCONFIG_FH, you need to convey Booleans through to the program using one configuration parameter.True if either/ALPHA/or/BETA/matches, like/ALPHA/||/BETA/:/ALPHA|BETA/ /(?:ALPHA)|(?:BETA)/ # works no matter what in both
True if both/ALPHA/and/BETA/match, but may overlap, meaning "BETALPHA" should be okay, like/ALPHA/&&/BETA/:/^(?=.*ALPHA)BETA/s
True if both/ALPHA/and/BETA/match, but may not overlap, meaning that "BETALPHA" should fail:/ALPHA.*BETA|BETA.*ALPHA/s
True if pattern/PAT/does not match, like$var!~/PAT/:/^(?:(?!PAT).)*$/s
True if patternBADdoes not match, but patternGOODdoes:/(?=^(?:(?!BAD).)*$)GOOD/s
(You can't actually count on being able to place the/smodifier there after the trailing slash, but we'll show how to include it in the pattern itself at the end of the Discussion.)When in a normal program you want to know whether something doesn't match, use one of:if (!($string =~ /pattern/)) { something( ) } # ugly if ( $string !~ /pattern/) { something( ) } # preferred unless ( $string =~ /pattern/) { something( ) } # sometimes clearerEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Matching a Valid Mail Address
- InhaltsvorschauYou want to find a pattern to verify the validity of a supplied mail address.Because you cannot do real-time validation of deliverable mail addresses, no single, succinct pattern will solve this problem. You must pick from several available compromise approaches.Our best advice for verifying a person's mail address is to have them enter their address twice, just as you would when changing a password. This usually weeds out typos. If both entries match, send mail to that address with a personal message such as:
Dear someuser@host.com, Please confirm the mail address you gave us on Sun Jun 29 10:29:01 MDT 2003 by replying to this message. Include the string "Rumpelstiltskin" in that reply, but spelled in reverse; that is, start with "Nik...". Once this is done, your confirmed address will be entered into our records.
If you get back a message where they've followed your directions, you can be reasonably assured that it's real.A related strategy that's less open to forgery is to give them a personal identification number (PIN). Record the address and PIN (preferably a random one) for later processing. In the mail you send, ask them to include the PIN in their reply. In case your email bounces, or the message is included via a vacation script, ask them to mail back the PIN slightly altered, such as with the characters reversed, one added or subtracted to each digit, etc.Most common patterns used for address verification or validation fail in various and sometimes subtle ways. For example, the addressthis&that@somewhere.comis valid and quite possibly deliverable, but most patterns that allegedly match valid mail addresses fail to let that one pass.1 while $addr =~ s/\([^( )]*\)//g;
You could use the 6598-byte pattern given on the last page of the first edition ofEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Matching Abbreviations
- InhaltsvorschauSuppose you had a list of commands, such as "
send", "abort", "list", and "edit". The user types one in, but you don't want to make them type out the whole thing.Use the following technique if all strings start with different characters, or to arrange matches so one takes precedence over another, as "SEND" has precedence over "STOP" here:chomp($answer = <>); if ("SEND" =~ /^\Q$answer/i) { print "Action is send\n" } elsif ("STOP" =~ /^\Q$answer/i) { print "Action is stop\n" } elsif ("ABORT" =~ /^\Q$answer/i) { print "Action is abort\n" } elsif ("LIST" =~ /^\Q$answer/i) { print "Action is list\n" } elsif ("EDIT" =~ /^\Q$answer/i) { print "Action is edit\n" }Or use the Text::Abbrev module:use Text::Abbrev; $href = abbrev qw(send abort list edit); for (print "Action: "; <>; print "Action: ") { chomp; my $action = $href->{ lc($_) }; print "Action is $action\n"; }The first technique exchanges the typical operand order of a match. Normally you have a variable on the left side of the match and a known pattern on the right side. We might try to decide which action the user wanted us to take by saying$answer=~/^ABORT/i, which is true if$answerbegins with the string "ABORT". It matches regardless of whether$answerhas anything after "ABORT", so "ABORTLATER" would still match. Handling abbreviations generally requires quite a bit of ugliness:$answer=~/^A(B(O(R(T)?)?)?)?$/i.Compare the classicvariable=~/pattern/with "ABORT"=~/^\Q$answer/i. The\Qescapes characters that would otherwise be treated specially: that way your program won't blow up if the user enters an invalid pattern. When the user enters something like "ab", the expanded match becomes "ABORT"=~Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Program: urlify
- InhaltsvorschauThis program puts HTML links around URLs in files. It doesn't work on all possible URLs, but does hit the most common ones. It tries to avoid including end-of-sentence punctuation in the marked-up URL.It is a typical Perl filter, so it can be fed input from a pipe:
% gunzip -c ~/mail/archive.gz | urlify > archive.urlified
or by supplying files on the command line:% urlify ~/mail/*.inbox > ~/allmail.urlified
The program is shown in Example 6-10.Example 6-10. urlify#!/usr/bin/perl # urlify - wrap HTML links around URL-like constructs $protos = '(http|telnet|gopher|file|wais|ftp)'; $ltrs = '\w'; $gunk = ';/#~:.?+=&%@!\-'; $punc = '.:?\-'; $any = "${ltrs}${gunk}${punc}"; while (<>) { s{ \b # start at word boundary ( # begin $1 { $protos : # need resource and a colon [$any] +? # followed by on or more # of any valid character, but # be conservative and take only # what you need to.... ) # end $1 } (?= # look-ahead non-consumptive assertion [$punc]* # either 0 or more punctuation [^$any] # followed by a non-url char | # or else $ # then end of the string ) }{<A HREF="$1">$1</A>}igox; print; }Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Program: tcgrep
- InhaltsvorschauThis program is a Perl rewrite of the Unix grep program. Although it runs slower than C versions (especially the GNU greps), it offers many more features.The first and perhaps most important feature is that it runs anywhere Perl does. Other enhancements are that it can ignore anything that's not a plain text file, automatically expand compressed or gzipped files, recurse down directories, search complete paragraphs or user-defined records, look in younger files before older ones, and add underlining or highlighting of matches. It also supports the -c option to indicate a count of matching records, as well as -C for a count of matching patterns when there could be more than one per record.This program uses gzcat or zcat to decompress compressed files, so this feature is unavailable on systems without these programs and systems that can't run external programs (such as old Macs).Run the program with no arguments for a usage message (see the
usagesubroutine in the following code). The following example recursively and case-insensitively greps every file in ~/mail for mail messages from someone called "kate", reporting filenames that contained matches:% tcgrep -ril '^From: .*kate' ~/mail
The program is shown in Example 6-11.Example 6-11. tcgrep#!/usr/bin/perl -w # tcgrep: tom christiansen's rewrite of grep # v1.0: Thu Sep 30 16:24:43 MDT 1993 # v1.1: Fri Oct 1 08:33:43 MDT 1993 # v1.2: Fri Jul 26 13:37:02 CDT 1996 # v1.3: Sat Aug 30 14:21:47 CDT 1997 # v1.4: Mon May 18 16:17:48 EDT 1998 use strict; # globals our ($Me, $Errors, $Grand_Total, $Mult, %Compress, $Matches); my ($matcher, $opt); # matcher - anon. sub to check for matches # opt - ref to hash w/ command-line options init( ); # initialize globals ($opt, $matcher) = parse_args( ); # get command line options and patterns matchfile($opt, $matcher, @ARGV); # process files exit(2) if $Errors; exit(0) if $Grand_Total; exit(1); ################################### sub init { ($Me = $0) =~ s!.*/!!; # get basename of program, "tcgrep" $Errors = $Grand_Total = 0; # initialize global counters $Mult = ""; # flag for multiple files in @ARGV $| = 1; # autoflush output %Compress = ( # file extensions and program names z => 'gzcat', # for uncompressing gz => 'gzcat', Z => 'zcat', ); } ################################### sub usage { die << EOF usage: $Me [flags] [files] Standard grep options: i case insensitive n number lines c give count of lines matching C ditto, but >1 match per line possible w word boundaries only s silent mode x exact matches only v invert search sense (lines that DON'T match) h hide filenames e expression (for exprs beginning with -) f file with expressions l list filenames matching Specials: 1 1 match per file H highlight matches u underline matches r recursive on directories or dot if none t process directories in 'ls -t' order p paragraph mode (default: line mode) P ditto, but specify separator, e.g. -P '%%\\n' a all files, not just plain text files q quiet about failed file and dir opens T trace files as opened May use a TCGREP environment variable to set default options. EOF } ################################### sub parse_args { use Getopt::Std; my ($optstring, $zeros, $nulls, %opt, $pattern, @patterns, $match_code); my ($SO, $SE); if (my $opts = $ENV{TCGREP}) { # get envariable TCGREP $opts =~ s/^([^\-])/-$1/; # add leading - if missing unshift(@ARGV, $opts); # add TCGREP opt string to @ARGV } $optstring = "incCwsxvhe:f:l1HurtpP:aqT"; $zeros = 'inCwxvhelut'; # options to init to 0 $nulls = 'pP'; # options to init to "" @opt{ split //, $zeros } = ( 0 ) x length($zeros); @opt{ split //, $nulls } = ( '' ) x length($nulls); getopts($optstring, \%opt) or usage( ); # handle option "-f patfile", for list of patterns if ($opt{f}) { open(PATFILE, $opt{f}) or die "$Me: Can't open '$opt{f}': $!"; # make sure each pattern in file is valid while ($pattern = <PATFILE>) { chomp $pattern; eval { 'foo' =~ /$pattern/, 1 } or die "$Me: $opt{f}:$.: bad pattern: $@"; push @patterns, $pattern; } close PATFILE; } else { # make sure pattern is valid $pattern = $opt{e} || shift(@ARGV) || usage( ); eval { 'foo' =~ /$pattern/; 1 } or die "$Me: bad pattern: $@"; @patterns = ($pattern); } # option -H is for highlight, option -u is for underline if ($opt{H} || $opt{u}) { my $term = $ENV{TERM} || 'vt100'; my $terminal; # eval{ } only to trap potential exceptions in function calls eval { # try to look up escapes for stand-out require POSIX; # or underline via Term::Cap use Term::Cap; my $termios = POSIX::Termios->new( ); $termios->getattr; my $ospeed = $termios->getospeed; $terminal = Tgetent Term::Cap { TERM=>undef, OSPEED=>$ospeed } }; unless ($@) { # if successful, get escapes for either local $^W = 0; # stand-out (-H) or underlined (-u) ($SO, $SE) = $opt{H} ? ($terminal->Tputs('so'), $terminal->Tputs('se')) : ($terminal->Tputs('us'), $terminal->Tputs('ue')); } else { # if use of Term::Cap fails, ($SO, $SE) = $opt{H} # use tput command to get escapes ? (`tput -T $term smso`, `tput -T $term rmso`) : (`tput -T $term smul`, `tput -T $term rmul`) } } # option -i makes all pattern case insensitive if ($opt{i}) { @patterns = map {"(?i)$_"} @patterns; } # option -p or -P is paragraph mode, so add /m if ($opt{p} || $opt{P}) { @patterns = map {"(?m)$_"} @patterns; } # option -p is standard paragraph mode $opt{p} && ($/ = ''); # option -p is user-defined paragraph mode $opt{P} && ($/ = eval(qq("$opt{P}"))); # for -P '%%\n' # option -w is at word boundary only (XXX: is this always optimal?) $opt{w} && (@patterns = map {'\b' . $_ . '\b'} @patterns); # option -x is for whole lines only $opt{'x'} && (@patterns = map {"^$_\$"} @patterns); # determine whether to emit file name in front of each match if (@ARGV) { $Mult = 1 if ($opt{r} || (@ARGV > 1) || -d $ARGV[0]) && !$opt{h}; } # if just listing filenames, stop after first match $opt{1} += $opt{l}; # that's a one and an ell # this way only need look for -H $opt{H} += $opt{u}; # if we're doing a complete count, where doing some counting $opt{c} += $opt{C}; # if we're counting, keep track of status $opt{'s'} += $opt{c}; # stop at first match if checking status but not counting $opt{1} += $opt{'s'} && !$opt{c}; # that's a one # default args are cwd if recursive, stdin otherwise @ARGV = ($opt{r} ? '.' : '-') unless @ARGV; # we're recursive even w/o -r if all args are directories $opt{r} = 1 if !$opt{r} && grep(-d, @ARGV) = = @ARGV; ###### # now the hard part: build of the matching function as text to eval # $match_code = ''; $match_code .= 'study;' if @patterns > 5; # might speed things up a bit foreach (@patterns) { s(/)(\\/)g } # add the stand-out and end-stand-out sequences for highlight mode if ($opt{H}) { foreach $pattern (@patterns) { $match_code .= "\$Matches += s/($pattern)/${SO}\$1${SE}/g;"; } } # option -v means to count a line if it *doesn't* match elsif ($opt{v}) { foreach $pattern (@patterns) { $match_code .= "\$Matches += !/$pattern/;"; } } # do full count, multiple hits per line elsif ($opt{C}) { foreach $pattern (@patterns) { $match_code .= "\$Matches++ while /$pattern/g;"; } } else { foreach $pattern (@patterns) { $match_code .= "\$Matches++ if /$pattern/;"; } } # now compile as a closure, and grab function pointer $matcher = eval "sub { $match_code }"; die if $@; return (\%opt, $matcher); } ################################### sub matchfile { $opt = shift; # reference to option hash $matcher = shift; # reference to matching sub my ($file, @list, $total, $name); local($_); $total = 0; FILE: while (defined ($file = shift(@_))) { if (-d $file) { if (-l $file && @ARGV != 1) { warn "$Me: \"$file\" is a symlink to a directory\n" if $opt->{T}; next FILE; } if (!$opt->{r}) { warn "$Me: \"$file\" is a directory, but no -r given\n" if $opt->{T}; next FILE; } unless (opendir(DIR, $file)) { unless ($opt->{'q'}) { warn "$Me: can't opendir $file: $!\n"; $Errors++; } next FILE; } @list = ( ); for (readdir(DIR)) { # skip cwd and parent dir push(@list, "$file/$_") unless /^\.{1,2}$/; } closedir(DIR); # option -t is sort by age, youngest first # use algorithm from Recipe 4.XXX, Sorting a List by Computable Field if ($opt->{t}) { @list = map { $_->[0] } sort { $a->[1] <=> $b->[1] } map { [ $_, -M $_ ] } @list; } else { @list = sort @list; } matchfile($opt, $matcher, @list); # process files next FILE; } # avoid annoying situation of grep wanting to read from keyboard # but user not realizing this if ($file eq '-') { warn "$Me: reading from stdin\n" if -t STDIN && !$opt->{'q'}; $name = '<STDIN>'; } else { $name = $file; unless (-e $file) { warn qq($Me: file "$file" does not exist\n) unless $opt->{'q'}; $Errors++; next FILE; } unless (-f $file || $opt->{a}) { warn qq($Me: skipping non-plain file "$file"\n) if $opt->{T}; next FILE; } # could use File::Spec instead my ($ext) = $file =~ /\.([^.]+)$/; # check whether it's an extension whose contents we know # how to convert to plain text via a filter program if (defined($ext) && exists($Compress{$ext})) { $file = "$Compress{$ext} < $file |"; } elsif (! (-T $file || $opt->{a})) { warn qq($Me: skipping binary file "$file"\n) if $opt->{T}; next FILE; } } warn "$Me: checking $file\n" if $opt->{T}; unless (open(FILE, $file)) { unless ($opt->{'q'}) { warn "$Me: $file: $!\n"; $Errors++; } next FILE; } $total = 0; $Matches = 0; LINE: while (<FILE>) { $Matches = 0; ############## &{$matcher}( ); # do it! (check for matches) ############## next LINE unless $Matches; $total += $Matches; if ($opt->{p} || $opt->{P}) { s/\n{2,}$/\n/ if $opt->{p}; chomp if $opt->{P}; } print("$name\n"), next FILE if $opt->{l}; # The following commented out block is the # expanded/legible version of the statement # that immediately follows it. This is one # of the few times we sacrifice readability # for execution speed: we carefully arrange # that print( ) be called just once, not four times, # and we don't resort to a braced block either. # (note that $Mult must be "" not 0 for this to work) ######## ## unless ($opt->{'s'}) { ## print "$name:" if $Mult; ## print "$.:" if $opt{n}; ## print; ## print (('-' x 20) . "\n") if $opt->{p} || $opt->{P}; ## } ######## $opt->{'s'} || print $Mult && "$name:", $opt->{n} ? "$.:" : "", $_, ($opt->{p} || $opt->{P}) && ('-' x 20) . "\n"; next FILE if $opt->{1}; # that's a one } } continue { # again, next block equivalent to line following ####### ## if ($opt->{c}) { ## print $name if $Mult; ## print "$total\n"; ## } ####### print $Mult && "$name:", "$total\n" if $opt->{c}; } $Grand_Total += $total; }Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Regular Expression Grab Bag
- InhaltsvorschauWe have found these regular expressions useful or interesting:
- Swap first two words
-
s/(\S+)(\s+)(\S+)/$3$2$1/
- Keyword = Value
-
m/^(\w+)\s*=\s*(.*?)\s*$/ # keyword is $1, value is $2
- Line of at least 80 characters
-
m/.{80,}/ length( ) >= 80 # ok, not a regex - MM/DD/YY HH:MM:SS
-
m|(\d+)/(\d+)/(\d+) (\d+):(\d+):(\d+)|
- Changing directories
-
s(/usr/bin)(/usr/local/bin)g
- Expanding %7E (hex) escapes
-
s/%([0-9A-Fa-f][0-9A-Fa-f])/chr(hex($1))/ge
- Deleting C comments (imperfectly)
-
s{ /* # Match the opening delimiter .*? # Match a minimal number of characters */ # Match the closing delimiter }{ }gsx; - Removing leading and trailing whitespace
-
s/^\s+//; s/\s+$//;
- Turning \ followed by n into a real newline
-
s/\\n/\n/g;
- Removing package portion of fully qualified symbols
-
s/^.*:://
- Dotted quads (most IP addresses)
-
# XXX: fails on legal IPs 127.1 and 2130706433. m{ ^ ( \d | [01]?\d\d | 2[0-4]\d | 25[0-5] ) \. ( \d | [01]?\d\d | 2[0-4]\d | 25[0-5] ) \. ( \d | [01]?\d\d | 2[0-4]\d | 25[0-5] ) \. ( \d | [01]?\d\d | 2[0-4]\d | 25[0-5] ) $ }x
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Chapter 7: File Access
- InhaltsvorschauI the heir of all ages, in the foremost files of time.—Alfred, Lord Tennyson, Locksley HallNothing is more central to data processing than the file. As with everything else in Perl, easy things are easy and hard things are possible. Common tasks (opening files, reading data, writing data) use simple I/O functions and operators, whereas fancier functions do hard things like non-blocking I/O and file locking.This chapter deals with the mechanics of file access: opening a file, telling subroutines which files to work with, locking files, and so on. Chapter 8 deals with techniques for working with the contents of a file: reading, writing, shuffling lines, and other operations you can do once you have access to the file.Here's Perl code for printing all lines from the file /usr/local/widgets/data that contain the word "
blue":open(INPUT, "<", "/acme/widgets/data") or die "Couldn't open /acme/widgets/data for reading: $!\n"; while (<INPUT>) { print if /blue/; } close(INPUT);Central to file access in Perl is the filehandle, likeINPUTin the previous code example. Filehandles are symbols inside your Perl program that you associate with an external file, usually using theopenfunction. Whenever your program performs an input or output operation, it provides that operation with an internal filehandle, not an external filename. It's the job ofopento make that association, and ofcloseto break it. Actually, any of several functions can be used to open files, and handles can refer to entities beyond mere files on disk; see Recipe 7.1 for details.While users think of open files in terms of those files' names, Perl programs do so using their filehandles. But as far as the operating system itself is concerned, an open file is nothing more than a file descriptor, which is a small, non-negative integer. Thefilenofunction divulges the system file descriptor of its filehandle argument. Filehandles are enough for most file operations, but for when they aren't, Recipe 7.9 turns a system file descriptor into a filehandle you can use from Perl.Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Introduction
- InhaltsvorschauNothing is more central to data processing than the file. As with everything else in Perl, easy things are easy and hard things are possible. Common tasks (opening files, reading data, writing data) use simple I/O functions and operators, whereas fancier functions do hard things like non-blocking I/O and file locking.This chapter deals with the mechanics of file access: opening a file, telling subroutines which files to work with, locking files, and so on. Chapter 8 deals with techniques for working with the contents of a file: reading, writing, shuffling lines, and other operations you can do once you have access to the file.Here's Perl code for printing all lines from the file /usr/local/widgets/data that contain the word "
blue":open(INPUT, "<", "/acme/widgets/data") or die "Couldn't open /acme/widgets/data for reading: $!\n"; while (<INPUT>) { print if /blue/; } close(INPUT);Central to file access in Perl is the filehandle, likeINPUTin the previous code example. Filehandles are symbols inside your Perl program that you associate with an external file, usually using theopenfunction. Whenever your program performs an input or output operation, it provides that operation with an internal filehandle, not an external filename. It's the job ofopento make that association, and ofcloseto break it. Actually, any of several functions can be used to open files, and handles can refer to entities beyond mere files on disk; see Recipe 7.1 for details.While users think of open files in terms of those files' names, Perl programs do so using their filehandles. But as far as the operating system itself is concerned, an open file is nothing more than a file descriptor, which is a small, non-negative integer. Thefilenofunction divulges the system file descriptor of its filehandle argument. Filehandles are enough for most file operations, but for when they aren't, Recipe 7.9 turns a system file descriptor into a filehandle you can use from Perl.Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Opening a File
- InhaltsvorschauYou want to read or write a file from Perl.Use
openwith two arguments for convenience or with three arguments for precision. Usesysopenfor access to low-level features.Theopenfunction takes arguments specifying the internal filehandle to open, the external filename, and some indication of how to open it (the access mode). Called with only two arguments, the second comprises both path and mode:open(SOURCE, "< $path") or die "Couldn't open $path for reading: $!\n"; open(SINK, "> $path") or die "Couldn't open $path for writing: $!\n";When called with three (or more) arguments, the mode is split out from the path, lest there be any ambiguity between one and the other:open(SOURCE, "<", $path) or die "Couldn't open $path for reading: $!\n"; open(SINK, ">", $path) or die "Couldn't open $path for writing: $!\n";Thesysopenfunction takes either three or four arguments: filehandle, filename, file-access flags, plus an optional permissions value. The flags argument is a number constructed from constants provided by the Fcntl module:use Fcntl; sysopen(SOURCE, $path, O_RDONLY) or die "Couldn't open $path for reading: $!\n"; sysopen(SINK, $path, O_WRONLY, 0600) or die "Couldn't open $path for writing: $!\n";If you passopenorsysopena scalar variable that's undefined, Perl fills in that variable with a new, anonymous filehandle.open(my $fh, "<", $path) or die "Couldn't open $path for reading: $!\n";All input and output goes through filehandles, regardless of whether filehandles are mentioned. Filehandles aren't exclusively connected to regular files in the filesystem—they're also used to communicate with other programs (see Chapter 16) and for network communication (see Chapter 17). Theopenfunction can also be used to manipulate file descriptors, as discussed in Recipe 7.9.Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Opening Files with Unusual Filenames
- InhaltsvorschauYou want to open a file with a funny filename, such as "
-", or one that starts with<,>, or|; has leading or trailing whitespace; or ends with|. You don't want these to triggeropen's do-what-I-mean behavior, since in this case, that's not what you mean.Whenopenis called with three arguments, not two, placing the mode in the second argument:open(HANDLE, "<", $filename) or die "cannot open $filename : $!\n";
Or simply usesysopen:sysopen(HANDLE, $filename, O_RDONLY) or die "cannot open $filename: $!\n";
Whenopenis called with three arguments, the access mode and the filename are kept separate. But when called with only two arguments,openhas to extract the access mode and the filename from a single string. If your filename begins with the same characters used to specify an access mode,opencould easily do something unexpected. Imagine the following code:$filename = shift @ARGV; open(INPUT, $filename) or die "Couldn't open $filename : $!\n";
If the user gave ">/etc/passwd" as the filename on the command line, this code would attempt to open /etc/passwd for writing. We can try to give an explicit mode, say for writing:open(OUTPUT, ">$filename") or die "Couldn't open $filename for writing: $!\n";but even this would let the user give a filename of ">data", and the code would append to the filedatainstead of erasing the old contents.The easiest solution is to pass three arguments toopen, where the second argument is the mode and the third the path. Now there can be neither confusion nor subterfuge.open(OUTPUT, ">", $filename) or die "Couldn't open $filename for writing: $!\n";Another solution issysopen, which also takes the mode and filename as distinct arguments:use Fcntl; # for file constants sysopen(OUTPUT, $filename, O_WRONLY|O_TRUNC) or die "Can't open $filename for writing: $!\n";Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Expanding Tildes in Filenames
- InhaltsvorschauYou want to open filenames like ~username/blah or ~/.mailrc, but
opendoesn't interpret the tilde to mean a home directory.Either use theglobfunction:open(FH, glob("~joebob/somefile")) || die "Couldn't open file: $!";or expand the filename manually with a substitution:$filename =~ s{ ^ ~ ( [^/]* ) } { $1 ? (getpwnam($1))[7] : ( $ENV{HOME} || $ENV{LOGDIR} || (getpwuid($<))[7] ) }ex;There is a useful convention, begun with the Unix csh shell and propagated widely by web addresses of the form http://www.example.com/~user/, that ~ in a filename represents a user's home directory. Thus:~ # current user's home directory ~/blah # file blah in current user's home directory ~user # a particular user's home directory ~user/blah # file blah in a particular user's home directoryUnfortunately, Perl'sopenfunction does not expand wildcards, including tildes. As of the v5.6 release, Perl internally uses the File::Glob module when you use thegloboperator. So all you need to do isglobthe result first.open(MAILRC, "<", "~/.mailrc") # WRONG: tilde is a shell thing or die "can't open ~/.mailrc: $!"; open(MAILRC, "<", glob("~/.mailrc")) # so expand tilde first or die "can't open ~/.mailrc: $!";The alternative solution, the substitution, uses/eto evaluate the replacement as Perl code. If a username follows the tilde, it's stored in$1, whichgetpwnamuses to extract the user's home directory out of the return list. This directory becomes the replacement string. If the tilde was not followed by a username, substitute in either the currentHOMEenvironment variable or theEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Making Perl Report Filenames in Error Messages
- InhaltsvorschauYour program works with files, but Perl's errors and warnings only report the last used filehandle, not the name of the file.Use the filename as the filehandle:
open($path, "<", $path) or die "Couldn't open $path for reading : $!\n";Ordinarily, error messages say:Argument "3\n" isn't numeric in multiply at tallyweb line 16, <LOG> chunk 17The filehandleLOGdoesn't help much because you don't know which file the handle was connected to. By using the filename itself as indirect filehandle, Perl produces more informative errors and warnings:Argument "3\n" isn't numeric in multiply at tallyweb line 16, </usr/local/data/mylog3.dat> chunk 17Unfortunately, this doesn't work withstrictrefsturned on because the variable$pathdoesn't really have a filehandle in it, only a string that sometimes behaves like one. The chunk number mentioned in warnings and error messages is the current value of the$. variable.Recipe 7.1; theopenfunction in perlfunc(1) and Chapter 29 of Programming PerlEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Storing Filehandles into Variables
- InhaltsvorschauYou want to use a filehandle like a normal variable so you can pass it to or return it from a function, store it in a data structure, and so on.The easiest way to get a filehandle into a variable is to have
openput it there for you:open(my $fh, "<", $filename) or die "$0: can't open $filename: $!";
To store named filehandles into a variable or pass them into or out of a function, use typeglob notation (*FH):$variable = *FILEHANDLE; # save in variable subroutine(*FILEHANDLE); # or pass directly sub subroutine { my $fh = shift; print $fh "Hello, filehandle!\n"; }If you pass an undefined scalar variable as the first argument toopen, Perl allocates an anonymous typeglob and stores a reference to that typeglob in that scalar, effectively creating filehandles on demand. Like all other references, autovivified filehandles are subject to garbage collection, so this code doesn't leak a filehandle:{ open(my $fh, "< /etc/motd") or die; local $/; # slurp mode $text = <$fh>; }When Perl reaches the end of the block,$fhgoes out of scope. As explained earlier in the Introduction, because that variable contained the last reference to the anonymous filehandle created byopen, the variable is garbage collected and the filehandle implicitly closed.Autovivified filehandles, being anonymous and already held in variables, don't help you to understand how to pass named filehandles as function parameters or store them in variables, including elements of arrays or hashes. By named filehandles, we mean those of the formFH, including all predefined handles, such asSTDINandARGV. So let's look at whatFHis and how to extract a scalar value from it to use for all of those things.Named filehandles used in:Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Writing a Subroutine That Takes Filehandles as Built-ins Do
- InhaltsvorschauYou can pass a bareword filehandle to Perl functions like
eof, and you'd like to write similar subroutines of your own.Use the Symbol module'squalify_to_refin conjunction with a*prototype on the subroutine:use Symbol qw(qualify_to_ref); sub my_eof (*) { my $handle = shift; $handle = qualify_to_ref($handle, caller( )); # use $handle }The*prototype tells Perl that the function expects a bareword filehandle as its argument. This lets you call the function like so:my_eof(HANDLE);
This works even whenuse strict 'subs' is in effect. The function receives a string as its argument, though. To safely use the argument as a filehandle, you need the Symbol module to turn it into a reference to a typeglob. And since typeglob refs can be used wherever you'd use named filehandles, store that reference in a scalar variable and use the variable as an indirect filehandle within your subroutine.If you pass in a filehandle that is already a reference to a typeglob, like those autovivified byopen, Perl andqualify_to_refstill do the right thing:open(my $fh, "<", $filename) or die; my_eof($fh);
This technique is used in Recipe 7.23.The documentation for the standard module Symbol (also in Chapter 32 of Programming Perl); the "Prototypes" section in the perlsub(1) manpage (also in Chapter 6 of Programming Perl); Recipe 7.23Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Caching Open Output Filehandles
- InhaltsvorschauYou need more output files open simultaneously than your system allows.Use the standard FileCache module:
use FileCache; cacheout ($path); # each time you use a filehandle print $path "output";
FileCache'scacheoutfunction lets you work with more output files than your operating system lets you have open at any one time. If you use it to open an existing file that FileCache is seeing for the first time, the file is truncated to length zero, no questions asked. However, in its opening and closing of files in the background,cacheouttracks files it has opened before and does not overwrite them, but appends to them instead. This does not create directories for you, so if you give it /usr/local/dates/merino.ewe to open but the directory /usr/local/dates doesn't exist,cacheoutwilldie.Thecacheoutfunction checks the value of the C-level constantNOFILEfrom the standard system include file sys/param.h to determine how many concurrently open files are allowed on your system. This value can be incorrect on some systems and even missing on a few (for instance, on those where the maximum number of open file descriptors is a process resource limit that can be set with the limit or ulimit commands). Ifcacheoutcan't get a value forNOFILE, set$FileCache::cacheout_maxopento be four less than the correct value, or choose a reasonable number by trial and error.Example 7-1 splits an xferlog file (created by most FTP servers nowadays) into separate files, each named after the authenticated user. Fields in xferlog files are space-separated, with the fourth field from the last holding the authenticated username.Example 7-1. splitwulog#!/usr/bin/perl # splitwulog - split wuftpd log by authenticated user use FileCache; $outdir = "/var/log/ftp/by-user"; while (<>) { unless (defined ($user = (split)[-4])) { warn "Invalid line: $.\n"; next; } $path = "$outdir/$user"; cacheout $path; print $path $_; }Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Printing to Many Filehandles Simultaneously
- InhaltsvorschauYou need to output the same data to several different filehandles.If you want to do it without forking, write a
foreachloop that iterates across the filehandles:foreach $filehandle (@FILEHANDLES) { print $filehandle $stuff_to_print; }If you don't mind forking, open a filehandle that's a pipe to a tee program:open(MANY, "| tee file1 file2 file3 > /dev/null") or die $!; print MANY "data\n" or die $!; close(MANY) or die $!;
If you don't have a tee program handy, use the IO::Tee module from CPAN:use IO::Tee; $tee = IO::Tee->new(@FILEHANDLES); print $tee $stuff_to_print;
A filehandle sends output to one file or program only. To duplicate output to several places, you must callprintmultiple times or make a filehandle connected to a program like tee, which distributes its input elsewhere. If you use the first option, it's probably easiest to put the filehandles in a list or array and loop through them (see Recipe 7.5):for $fh (*FH1, *FH2, *FH3) { print $fh "whatever\n" }However, if your system supports the tee(1) program, or if you've installed the Perl version from Recipe 8.25, you may open a pipe to tee and let it do the work of copying the file to several places. Remember that tee normally also copies its output toSTDOUT, so you must redirect tee's standard output to /dev/null if you don't want an extra copy:open (FH, "| tee file1 file2 file3 >/dev/null"); print FH "whatever\n";
You could even redirect your ownSTDOUTto the tee process, and then you're able to use a regularprintdirectly:# make STDOUT go to three files, plus original STDOUT open (STDOUT, "| tee file1 file2 file3") or die "Teeing off: $!\n"; print "whatever\n" or die "Writing: $!\n"; close(STDOUT) or die "Closing: $!\n";
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Opening and Closing File Descriptors by Number
- InhaltsvorschauYou know which file descriptors you'd like to do I/O on, but Perl requires filehandles, not descriptor numbers.To open the file descriptor, supply
openwith "<&=" or "<&" as the part of the file access mode, combined with a directional arrow:open(FH, "<&=", $FDNUM) # open FH to the descriptor itself open(FH, "<&", $FDNUM); # open FH to a copy of the descriptor
Or use the IO::Handle module'snew_from_fdclass method:use IO::Handle; $fh = IO::Handle->new_from_fd($FDNUM, "r");
To close a file descriptor by number, either use thePOSIX::closefunction or open it first as shown previously.Occasionally you have a file descriptor but no filehandle. Perl's I/O system uses filehandles instead of file descriptors, so you have to make a new filehandle from an already open file descriptor. The "<&", ">&", and "+<&" access modes toopendo this for reading, writing, and updating, respectively. Adding an equals sign to these—making them "<&=", ">&=", and "+<&="—is more parsimonious of file descriptors and nearly always what you want. That's because the underlying implementation of Perl'sopenstatement uses only a C-level fdopen(3) function from the C library, not a dup2(2) syscall that calls the kernel.Thenew_from_fdIO::Handle object method is equivalent to:use IO::Handle; $fh = new IO::Handle; $fh->fdopen($FDNUM, "r"); # open fd 3 for reading
Here's how you'd open file descriptors that the MH mail system feeds its child processes. It identifies them in the environment variableMHCONTEXTFD:$fd = $ENV{MHCONTEXTFD}; open(MHCONTEXT, "<&=", $fd) or die "couldn't fdopen $fd: $!"; # after processing close(MHCONTEXT) or die "couldn't close context file: $!";Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Copying Filehandles
- InhaltsvorschauYou want a copy of a filehandle.To create an alias for a named filehandle, say:
*ALIAS = *ORIGINAL;
Useopenwith the&file access mode to create an independent copy of the file descriptor for that filehandle:open(OUTCOPY, ">&STDOUT") or die "Couldn't dup STDOUT: $!"; open(INCOPY, "<&STDIN") or die "Couldn't dup STDIN : $!";
Useopenwith the&=mode to create an alias for that filehandle or file descriptor:open(OUTALIAS, ">&=STDOUT") or die "Couldn't alias STDOUT: $!"; open(INALIAS, "<&=STDIN") or die "Couldn't alias STDIN : $!"; open(BYNUMBER, ">&=5") or die "Couldn't alias file descriptor 5: $!";
With other types of filehandles (typeglobs, objects), use the same technique with a three-argumentopen:open(my $copy, "<&", $original) or die "Couldn't alias original: $!"; open(my $copy, "<&=", $original) or die "Couldn't alias original: $!";
If you create an alias for a filehandle with typeglobs, only one Perl I/O object is still being accessed. If you close one of these aliased filehandles, the I/O object is closed. Any further attempt to use a copy of that filehandle fails, silently by default or, if you have warnings enabled, with the warning "printonclosedfilehandle". When alternating access through aliased filehandles, writes work as you'd expect because there are no duplicated stdio data structures to get out of sync.If you create a copy of a file descriptor withopen(COPY, ">&HANDLE"), you're really calling the dup(2) syscall. You get two independent file descriptors whose file position, locks, and flags are shared, but which have independent stdio buffers. Closing one filehandle doesn't affect its copy. Simultaneously accessing the file through both filehandles is a recipe for disaster. Instead, this technique is normally used to save and restoreEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Creating Temporary Files
- InhaltsvorschauYou need to create a temporary file and have it deleted automatically when your program exits. For instance, if you needed a temporary configuration file to feed a program you're about to launch, you'd need a name for that file so you could pass that filename along to the utility program. In other cases, you may want a temporary file to write to and read from, but don't need a filename for it.Use the
tempfilefunction from the File::Temp module:use File::Temp qw/ tempdir /; $fh = tempfile( ); # just the handle
perhaps in conjunction with a temporary directory:use File::Temp qw/ tempdir /; # or specify a directory $dir = tempdir( CLEANUP => 1 ); ($fh, $filename) = tempfile( DIR => $dir ); $template = "myprogtempXXXXXX"; # trailing Xs are changed ($fh, $filename) = tempfile( $template, DIR => $dir); ($fh, $filename) = tempfile( $template, SUFFIX => ".data");
The File::Temp module's functions are the best way to make temporary files. For one thing, they're extremely easy to use. For another, they're more portable than direct calls to the operating system. But perhaps of greatest importance is the care they take in security matters both various and subtle, especially those involving race conditions.Although this module provides a handful of slightly different functions for creating a temporary file, most are there simply to support legacy interfaces; few users will need more than the basictempfile( )function. This function safely and atomically creates and opens a brand new, empty file in read-write mode. In scalar context, it returns a filehandle to that temporary file; in list context, it returns the handle and pathname of the temporary file:use File::Temp qw(tempfile); # just the handle $fh = tempfile( ); # handle and filename ($fh, $filename) = tempfile( );
Thetempfilefunction optionally accepts an argument containing a template and then named arguments in pairs. Named arguments specify such things as the directory to use instead of the current directory, that a specific file extension should be used, and on systems that support such a thing, whether the tempfile should be immediately unlinked before its handle is returned. (Files whose names have already been deleted from the filesystem are especially difficult for the guys with the black hats to find.) Any trailing X characters in the template are replaced by random characters in the final filename. You might use this feature if you need a temporary file with a specific extension.Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Storing a File Inside Your Program Text
- InhaltsvorschauYou have data that you want to bundle with your program and treat as though it were in a file, but you don't want it to be in a different file.Use the
_ _DATA_ _or_ _END_ _tokens after your program code to mark the start of a data block, which can be read inside your program or module from theDATAfilehandle.Use_ _DATA_ _within a module:while ( <DATA>) { # process the line } _ _DATA_ _ # your data goes hereSimilarly, use_ _END_ _within the main program file:while (<main::DATA>) { # process the line } _ _END_ _ # your data goes hereThe_ _DATA_ _and_ _END_ _symbols tell the Perl compiler there's nothing more for it to do in the current file. They represent the logical end for code in a module or a program before the physical end-of-file.Text after_ _DATA_ _or_ _END_ _can be read through the per-packageDATAfilehandle. For example, take the hypothetical module Primes. Text after_ _DATA_ _in Primes.pm can be read from thePrimes::DATAfilehandle._ _END_ _behaves as a synonym for_ _DATA_ _in the main package. Any text occurring after an_ _END_ _token in a module is completely inaccessible.This lets you write self-contained programs instead of keeping data in separate files. Often this is used for documentation. Sometimes it's configuration data or old test data that the program was originally developed with, left lying about should it ever be needed again.Another trick is to useDATAto find out the current program's or module's size or last modification date. On most systems, the$0variable will contain the full pathname to your running script. On systems where$0is not correct, you could try theDATAfilehandle instead. This can be used to pull in the size, modification date, etc. Put a special tokenEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Storing Multiple Files in the DATA Area
- InhaltsvorschauYou've figured out how to use
_ _END_ _or_ _DATA_ _to store a virtual file in your source code, but you now want multiple virtual files in one source file.Use the Inline::Files module from CPAN. Carefully.use Inline::Files; while (<SETUP>) { # ... } while (<EXECUTION>) { # ... } _ _SETUP_ _ everything for the SETUP filehandle goes here _ _EXECUTION_ _ everything for the EXECUTION filehandle goes hereOne limitation with the_ _DATA_ _setup is that you can have only one embedded data file per physical file. The CPAN module Inline::Files cleverly circumvents this restriction by providing logical embedded files. It's used like this:use Inline::Files; # # All your code for the file goes here first, then... # _ _ALPHA_ _ This is the data in the first virtual file, ALPHA. _ _BETA_ _ This is the data in the next virtual file, BETA. _ _OMEGA_ _ This is the data in yet another virtual file, OMEGA. _ _ALPHA_ _ This is more data in the second part of virtual file, ALPHA.
The code is expected to read from filehandles whose names correspond to the double-underbarred symbols: hereALPHA,BETA, andOMEGA. You may have more than one section by the same name in the same program, and differently named sections needn't be read in any particular order. These handles work much like theARGVhandle does. For one thing, they're implicitly opened on first usage. For example, using the following code in the designated spot in the preceding code example:while (<OMEGA>) { print "omega data: $_"; } while (<ALPHA>) { print "alpha data: $_"; }would produce this:omega data: This is the data in yet another virtual file, OMEGA. omega data: alpha data: This is the data in the first virtual file, ALPHA. alpha data: alpha data: This is more data in the second part of virtual file, ALPHA. alpha data:
Also like theARGVhandle, while reading from a particular handle, the list of available virtual files is in the array by that name, and the currently opened virtual file is in the scalar by that name. There's also a hash by that name that holds various bits of status information about that set of virtual files, including the current file, line number, and byte offset. If we used the Perl debugger on this program and dumped out the variables, it might show this:Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Writing a Unix-Style Filter Program
- InhaltsvorschauYou want to write a program that takes a list of filenames on the command line and reads from
STDINif no filenames were given. You'd like the user to be able to give the file "-" to indicateSTDINor "someprogram|" to indicate the output of another program. You might want your program to modify the files in place or to produce output based on its input.Read lines with<>:while (<>) { # do something with the line }When you say:while (<>) { # ... }Perl translates this into:unshift(@ARGV, "-") unless @ARGV; while ($ARGV = shift @ARGV) { unless (open(ARGV, $ARGV)) { warn "Can't open $ARGV: $!\n"; next; } while (defined($_ = <ARGV>)) { # ... } }You can accessARGVand$ARGVinside the loop to read more from the filehandle or to find the filename currently being processed. Let's look at how this works.Section 7.14.3.1: Behavior
If the user supplies no arguments, Perl sets@ARGVto a single string, "-". This is shorthand forSTDINwhen opened for reading andSTDOUTwhen opened for writing. It's also what lets the user of your program specify "-" as a filename on the command line to read fromSTDIN.Next, the file-processing loop removes one argument at a time from@ARGVand copies the filename into the global variable$ARGV. If the file cannot be opened, Perl goes on to the next one. Otherwise, it processes a line at a time. When the file runs out, the loop goes back and opens the next one, repeating the process until@ARGVis exhausted.Theopenstatement didn't sayopen(ARGV, "<", $ARGV). There's no extra less-than sign supplied. This allows for interesting effects, like passing the string "gzip-dcEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Modifying a File in Place with a Temporary File
- InhaltsvorschauYou need to update a file in place, and you can use a temporary file.Read from the original file, write changes to a temporary file, and then rename the temporary back to the original:
open(OLD, "<", $old) or die "can't open $old: $!"; open(NEW, ">", $new) or die "can't open $new: $!"; while (<OLD>) { # change $_, then... print NEW $_ or die "can't write $new: $!"; } close(OLD) or die "can't close $old: $!"; close(NEW) or die "can't close $new: $!"; rename($old, "$old.orig") or die "can't rename $old to $old.orig: $!"; rename($new, $old) or die "can't rename $new to $old: $!";This is the best way to update a file "in place."This technique uses little memory compared to the approach that doesn't use a temporary file. It has the added advantages of giving you a backup file and being easier and safer to program.You can make the same changes to the file using this technique that you can with the version that uses no temporary file. For instance, to insert lines at line 20, say:while (<OLD>) { if ($. = = 20) { print NEW "Extra line 1\n"; print NEW "Extra line 2\n"; } print NEW $_; }To delete lines 20 through 30, say:while (<OLD>) { next if 20 .. 30; print NEW $_; }Note thatrenamewon't work across filesystems, so you should create your temporary file in the same directory as the file being modified.The truly paranoid programmer would lock the file during the update. The tricky part is that you have to open the file for writing without destroying its contents before you can get a lock to modify it. Recipe 7.18 shows how to do this.Recipe 7.1; Recipe 7.16; Recipe 7.17; Recipe 7.18Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Modifying a File in Place with the -i Switch
- InhaltsvorschauYou need to modify a file in place from the command line, and you're too lazy for the file manipulation of Recipe 7.15.Use the -i and -p switches to Perl. Write your program on the command line:
% perl -i.orig -p -e 'FILTER COMMAND' file1 file2 file3 ...
or use the switches in programs:#!/usr/bin/perl -i.orig -p # filter commands go here
The -i command-line switch modifies each file in place. It creates a temporary file as in the previous recipe, but Perl takes care of the tedious file manipulation for you. Use it with -p (explained in Recipe 7.14) to turn:while (<>) { if ($ARGV ne $oldargv) { # are we at the next file? rename($ARGV, $ARGV . ".orig"); open(ARGVOUT, ">", $ARGV); # plus error check select(ARGVOUT); $oldargv = $ARGV; } s/DATE/localtime/e; } continue{ print; } select (STDOUT); # restore default outputinto:% perl -pi.orig -e 's/DATE/localtime/e'
The -i switch takes care of making a backup (say-iinstead of-i.origto discard the original file contents instead of backing them up), and -p makes Perl loop over filenames given on the command line (orSTDINif no files were given).The preceding one-liner would turn a file containing the following:Dear Sir/Madam/Ravenous Beast, As of DATE, our records show your account is overdue. Please settle by the end of the month. Yours in cheerful usury, --A. Moneylenderinto:Dear Sir/Madam/Ravenous Beast, As of Sat Apr 25 12:28:33 1998, our records show your account is overdue. Please settle by the end of the month. Yours in cheerful usury, --A. MoneylenderThis switch makes in-place translators a lot easier to write and to read. For instance, this changes isolated instances of "Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Modifying a File in Place Without a Temporary File
- InhaltsvorschauYou need to insert, delete, or change one or more lines in a file, and you don't want to (or can't) use a temporary file.Open the file in update mode ("
+<"), read the whole file into an array of lines, change the array, then rewrite the file and truncate it to its current seek pointer.open(FH, "+<", $FILE) or die "Opening: $!"; @ARRAY = <FH>; # change ARRAY here seek(FH,0,0) or die "Seeking: $!"; print FH @ARRAY or die "Printing: $!"; truncate(FH,tell(FH)) or die "Truncating: $!"; close(FH) or die "Closing: $!";
As explained in this chapter's Introduction, the operating system treats files as unstructured streams of bytes. This makes it impossible to insert, modify, or change bits of the file in place. (Except for the special case of fixed-record-length files, discussed in Recipe 8.13.) You can use a temporary file to hold the changed output, or you can read the entire file into memory, change it, and write it back out again.Reading everything into memory is fine for small files, but doesn't scale well. Trying it on your 800 MB web server log files will either deplete your virtual memory or thrash your machine's VM system. For small files, though, this works:open(F, "+<", $infile) or die "can't read $infile: $!"; $out = ""; while (<F>) { s/DATE/localtime/eg; $out .= $_; } seek(F, 0, 0) or die "can't seek to start of $infile: $!"; print F $out or die "can't print to $infile: $!"; truncate(F, tell(F)) or die "can't truncate $infile: $!"; close(F) or die "can't close $infile: $!";For other examples of things you can do in-place, look at the recipes in Chapter 8.This approach is only for the truly determined. It's harder to write, takes more memory (potentially aEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Locking a File
- InhaltsvorschauMany processes need to update the same file simultaneously.Have all processes honor advisory locking by using
flock:use Fcntl qw(:flock); # for the LOCK_* constants open(FH, "+<", $path) or die "can't open $path: $!"; flock(FH, LOCK_EX) or die "can't flock $path: $!"; # update file, then... close(FH) or die "can't close $path: $!";
Operating systems vary greatly in the type and reliability of locking techniques available. Perl tries hard to give you something that works, even if your operating system uses its own underlying technique. Theflockfunction takes two arguments: a filehandle and a number representing what to do with the lock on that filehandle. The numbers are normally represented by names, such asLOCK_EX, which you can get from the Fcntl or IO::File modules.Locks come in two varieties: shared (LOCK_SH) and exclusive (LOCK_EX). Despite what you might infer by "exclusive," processes aren't required to obey locks on files. Another way of saying this is thatflockimplements advisory locking. It allows processes to let the operating system suspend would-be writers of a file until any readers are finished with it.Flocking files is like putting up a stoplight at an intersection. It works only if people pay attention to whether the light is red or green—or yellow for a shared lock. The red light doesn't stop traffic; it merely signals that traffic should stop. A desperate, ignorant, or rude person will still go flying through the intersection no matter what the light says. Likewise,flockonly blocks out other flockers—not all processes trying to do I/O. Unless everyone is polite, accidents can (and will) happen.The polite process customarily indicates its intent to read from the file by requesting aEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Flushing Output
- InhaltsvorschauWhen printing to a filehandle, output doesn't appear immediately. This is a problem in CGI scripts running on some programmer-hostile web servers where, if the web server sees warnings from Perl before it sees the (buffered) output of your script, it sends the browser an uninformative
500ServerError. These buffering problems also arise with concurrent access to files by multiple programs and when talking with devices or sockets.Disable buffering by setting the per-filehandle variable$|to a true value, customarily1:$old_fh = select(OUTPUT_HANDLE); $| = 1; select($old_fh);
Or, if you don't mind the expense of loading an IO module, disable buffering by invoking theautoflushmethod:use IO::Handle; OUTPUT_HANDLE->autoflush(1);
This works with indirect filehandles as well:use IO::Handle; $fh->autoflush(1);
In most stdio implementations, buffering varies with the type of output device. Disk files are block buffered, often with a buffer size of more than 2K. Pipes and sockets are often buffered with a buffer size between ½K and 2K. Serial devices, including terminals, modems, mice, and joysticks, are normally line-buffered; stdio sends the entire line out only when it gets the newline.Perl'sprintfunction does not directly support truly unbuffered output, i.e., a physical write for each individual character. Instead, it supports command buffering, in which one physical write is made after every separate output command. This isn't as hard on your system as no buffering at all, and it still gets the output where you want it, when you want it.Control output buffering through the$|special variable. Enable command buffering on output handles by setting it to a true value. This does not affect input handles at all; see Recipe 15.6 and Recipe 15.8 for unbuffered input. Set this variable to a false value to use default stdio buffering. Example 7-7 illustrates the difference.Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Doing Non-Blocking I/O
- InhaltsvorschauYou want to read from or write to a filehandle without the system blocking your process until the program, file, socket, or device at the other end is ready. This is desired less often of regular files than of special files.Open the file with
sysopen, specifying theO_NONBLOCKoption:use Fcntl; sysopen(MODEM, "/dev/cua0", O_NONBLOCK|O_RDWR) or die "Can't open modem: $!\n";If you already have an open filehandle, invoke theblockingmethod from IO::Handle with an argument of 0:use IO::Handle; MODEM->blocking(0); # assume MODEM already opened
Or use the low-levelfcntlfunction:use Fcntl; $flags = ""; fcntl(HANDLE, F_GETFL, $flags) or die "Couldn't get flags for HANDLE : $!\n"; $flags |= O_NONBLOCK; fcntl(HANDLE, F_SETFL, $flags) or die "Couldn't set flags for HANDLE: $!\n";On a disk file, when no more data can be read because you're at the end of the file, the input operation returns immediately. But suppose the filehandle in question were the user's keyboard or a network connection. In those cases, simply because there's no data there right now doesn't mean there never will be, so the input function normally doesn't return until it gets data. Sometimes, though, you don't want to wait; you want to grab whatever's there and carry on with whatever you were doing.Once a filehandle has been set for non-blocking I/O, thesysreadorsyswritecalls that would otherwise block will instead returnundefand set$!toEAGAIN:use Errno; $rv = syswrite(HANDLE, $buffer, length $buffer); if (!defined($rv) && $!{EAGAIN}) { # would block } elsif ($rv != length $buffer) { # incomplete write } else { # successfully wrote } $rv = sysread(HANDLE, $buffer, $BUFSIZ); if (!defined($rv) && $!{EAGAIN}) { # would block } else { # successfully read $rv bytes from HANDLE }Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Determining the Number of Unread Bytes
- InhaltsvorschauYou want to know how many unread bytes are available for reading from a filehandle.Use the
FIONREADioctl call:$size = pack("L", 0); ioctl(FH, $FIONREAD, $size) or die "Couldn't call ioctl: $!\n"; $size = unpack("L", $size); # $size bytes can be readMake sure the input filehandle is unbuffered (because you've used an I/O layer like:unixon it), or use onlysysread.The Perlioctlfunction is a direct interface to the operating system's ioctl(2) system call. If your system doesn't have theFIONREADrequest or the ioctl(2) call, you can't use this recipe.FIONREADand the other ioctl(2) requests are numeric values normally found lurking in C include files.Perl's h2ph tool tries to convert C include files to Perl code, which can berequired.FIONREADends up defined as a function in the sys/ioctl.ph file:require "sys/ioctl.ph"; $size = pack("L", 0); ioctl(FH, FIONREAD( ), $size) or die "Couldn't call ioctl: $!\n"; $size = unpack("L", $size);If h2ph wasn't installed or doesn't work for you, you can manually grep the include files:% grep FIONREAD /usr/include/*/* /usr/include/asm/ioctls.h:#define FIONREAD 0x541BIf you install Inline::C from CPAN, you can write a C subroutine to obtain the constant for you:use Inline C; $FIONREAD = get_FIONREAD( ); # ... _ _END_ _ _ _C_ _ #include <sys/ioctl.h> int get_FIONREAD( ) { return FIONREAD; }If all else fails, write a small C program using the editor of champions:% cat > fionread.c #include <sys/ioctl.h> main( ) { printf("%#08x\n", FIONREAD); } ^D % cc -o fionread fionread.c % ./fionread 0x4004667fThen hardcode it, leaving porting as an exercise to your successor.$FIONREAD = 0x4004667f; # XXX: opsys dependent $size = pack("L", 0); ioctl(FH, $FIONREAD, $size) or die "Couldn't call ioctl: $!\n"; $size = unpack("L", $size);Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Reading from Many Filehandles Without Blocking
- InhaltsvorschauYou want to learn whether input is available to be read, rather than blocking until there's input the way
<FH>does. This is useful when reading from pipes, sockets, devices, and other programs.Useselectwith a timeout value of 0 seconds if you're comfortable with manipulating bit-vectors representing file descriptor sets:$rin = ""; # repeat next line for all filehandles to poll vec($rin, fileno(FH1), 1) = 1; vec($rin, fileno(FH2), 1) = 1; vec($rin, fileno(FH3), 1) = 1; $nfound = select($rout=$rin, undef, undef, 0); if ($nfound) { # input waiting on one or more of those 3 filehandles if (vec($rout,fileno(FH1),1)) { # do something with FH1 } if (vec($rout,fileno(FH2),1)) { # do something with FH2 } if (vec($rout,fileno(FH3),1)) { # do something with FH3 } }The IO::Select module provides an abstraction layer to hide bit-vector operations:use IO::Select; $select = IO::Select->new( ); # repeat next line for all filehandles to poll $select->add(*FILEHANDLE); if (@ready = $select->can_read(0)) { # input waiting on the filehandles in @ready }Theselectfunction is really two functions in one. If you call it with one argument, you change the current default output filehandle (see Recipe 7.19). If you call it with four arguments, it tells you which filehandles have input waiting or are ready to receive output. This recipe deals only with four-argumentselect.The first three arguments toselectare strings containing bit-vectors. Each bit-vector represents a set of file descriptors to inspect for pending input, pending output, and pending expedited data (like out-of-band or urgent data on a socket), respectively. The final argument is the timeout—how longselectshould spend waiting for status to change. A timeout value of 0 indicates a poll. Timeout can also be a floating-point number of seconds, orEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Reading an Entire Line Without Blocking
- InhaltsvorschauYou need to read a line of data from a handle that
selectsays is ready for reading, but you can't use Perl's normal<FH>operation (readline) in conjunction withselectbecause<FH>may buffer extra data andselectdoesn't know about those buffers.Use the followingsysreadlinefunction, like this:$line = sysreadline(SOME_HANDLE);
In case only a partial line has been sent, include a number of seconds to wait:$line = sysreadline(SOME_HANDLE, TIMEOUT);
Here's the function to do that:use IO::Handle; use IO::Select; use Symbol qw(qualify_to_ref); sub sysreadline(*;$) { my($handle, $timeout) = @_; $handle = qualify_to_ref($handle, caller( )); my $infinitely_patient = (@_ = = 1 || $timeout < 0); my $start_time = time( ); my $selector = IO::Select->new( ); $selector->add($handle); my $line = ""; SLEEP: until (at_eol($line)) { unless ($infinitely_patient) { return $line if time( ) > ($start_time + $timeout); } # sleep only 1 second before checking again next SLEEP unless $selector->can_read(1.0); INPUT_READY: while ($selector->can_read(0.0)) { my $was_blocking = $handle->blocking(0); CHAR: while (sysread($handle, my $nextbyte, 1)) { $line .= $nextbyte; last CHAR if $nextbyte eq "\n"; } $handle->blocking($was_blocking); # if incomplete line, keep trying next SLEEP unless at_eol($line); last INPUT_READY; } } return $line; } sub at_eol($) { $_[0] =~ /\n\z/ }As described in Recipe 7.22, to determine whether the operating system has data on a particular handle for your process to read, you can use either Perl's built-inselectfunction or thecan_readmethod from the standard IO::Select module.Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Program: netlock
- InhaltsvorschauWhen locking files, we recommend that you use
flockwhen possible. However, on some systems,flock's locking strategy is not reliable. For example, perhaps the person who built Perl on your system configuredflockto use a version of file locking that didn't even try to work over the Net, or you're on the increasingly rare system where noflockemulation exists at all.The following program and module provide a basic implementation of a file locking mechanism. Unlike a normalflock, with this module you lock file names, not file descriptors.Thus, you can use it to lock directories, domain sockets, and other non-regular files. You can even lock files that don't exist yet. It uses a directory created at the same level in the directory structure as the locked file, so you must be able to write to the enclosing directory of the file you wish to lock. A sentinel file within the lock directory contains the owner of the lock. This is also useful with Recipe 7.15 because you can lock the filename even though the file that has that name changes.Thenflockfunction takes one or two arguments. The first is the pathname to lock; the second is the optional amount of time to wait for the lock. The function returns true if the lock is granted, returns false if the timeout expired, and raises an exception should various improbable events occur, such as being unable to write the directory.Set the$File::LockDir::Debugvariable to true to make the module emit messages if it stalls waiting for a lock. If you forget to free a lock and try to exit the program, the module will remove them for you. This won't happen if your program is sent a signal it doesn't trap.Example 7-9 shows a driver program to demonstrate the File::LockDir module.Example 7-9. drivelock#!/usr/bin/perl -w # drivelock - demo File::LockDir module use strict; use File::LockDir; $SIG{INT} = sub { die "outta here\n" }; $File::LockDir::Debug = 1; my $path = shift or die "usage: $0 <path>\n"; unless (nflock($path, 2)) { die "couldn't lock $path in 2 seconds\n"; } sleep 100; nunflock($path);Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Program: lockarea
- InhaltsvorschauPerl's
flockfunction only locks complete files, not regions of the file. Althoughfcntlsupports locking of a file's regions, this is difficult to access from Perl, largely because no one has written an XS module that portably packs up the necessary structure.The program in Example 7-11 implements fcntl, but only for the three architectures it already knows about: SunOS, BSD, and Linux. If you're running something else, you'll have to figure out the layout of theflockstructure. We did this by eyeballing the C-language sys/fcntl.h#includefile—and running the c2ph program to figure out alignment and typing. This program, while included with Perl, only works on systems with a strong Berkeley heritage, like those listed above. As with Unix—or Perl itself—you don't have to use c2ph, but it sure makes life easier if you can.Thestruct_flockfunction in the lockarea program packs and unpacks in the proper format for the current architectures by consulting the$^Ovariable, which contains your current operating system name. There is nostruct_flockfunction declaration. It's just aliased to the architecture-specific version. Function aliasing is discussed in Recipe 10.14.The lockarea program opens a temporary file, clobbering any existing contents and writing a screenful (80 by 23) of blanks. Each line is the same length.The program then forks one or more times and lets the child processes try to update the file at the same time. The first argument, N, is the number of times to fork to produce2**Nprocesses. So lockarea 1 makes two children, lockarea 2 makes four, lockarea 3 makes eight, lockarea 4 makes sixteen, etc. The more kids, the more contention for the locks.Each process picks a random line in the file, locks that line only, and then updates it. It writes its process ID into the line, prepended with a count of how many times the line has been updated:Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Chapter 8: File Contents
- InhaltsvorschauThe most brilliant decision in all of Unix was the choice of a single character for the newline sequence.—Mike O'Dell, only half jokinglyBefore the Unix Revolution, every kind of data source and destination was inherently different. Getting two programs merely to understand each other required heavy wizardry and the occasional sacrifice of a virgin stack of punch cards to an itinerant mainframe repairman. This computational Tower of Babel made programmers dream of quitting the field to take up a less painful hobby, like autoflagellation.These days, such cruel and unusual programming is largely behind us. Modern operating systems work hard to provide the illusion that I/O devices, network connections, process control information, other programs, the system console, and even users' terminals are all abstract streams of bytes called files. This lets you easily write programs that don't care where their input came from or where their output goes.Because programs read and write streams of simple text, every program can communicate with every other program. It is difficult to overstate the power and elegance of this approach. No longer dependent upon troglodyte gnomes with secret tomes of JCL (or COM) incantations, users can now create custom tools from smaller ones by using simple command-line I/O redirection, pipelines, and backticks.Treating files as unstructured byte streams necessarily governs what you can do with them. You can read and write sequential, fixed-size blocks of data at any location in the file, increasing its size if you write past the current end. Perl uses an I/O library that emulates C's stdio(3) to implement reading and writing of variable-length records like lines, paragraphs, and words.What can't you do to an unstructured file? Because you can't insert or delete bytes anywhere but at end-of-file, you can't easily change the length of, insert, or delete records. An exception is the last record, which you can delete by truncating the file to the end of the previous record. For other modifications, you need to use a temporary file or work with a copy of the file in memory. If you need to do this a lot, a database system may be a better solution than a raw file (see Chapter 14). Standard with Perl as of v5.8 is the Tie::File module, which offers an array interface to files of records. We use it in Recipe 8.4.Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
- Introduction
- InhaltsvorschauBefore the Unix Revolution, every kind of data source and destination was inherently different. Getting two programs merely to understand each other required heavy wizardry and the occasional sacrifice of a virgin stack of punch cards to an itinerant mainframe repairman. This computational Tower of Babel made programmers dream of quitting the field to take up a less painful hobby, like autoflagellation.These days, such cruel and unusual programming is largely behind us. Modern operating systems work hard to provide the illusion that I/O devices, network connections, process control information, other programs, the system console, and even users' terminals are all abstract streams of bytes called files. This lets you easily write programs that don't care where their input came from or where their output goes.Because programs read and write streams of simple text, every program can communicate with every other program. It is difficult to overstate the power and elegance of this approach. No longer dependent upon troglodyte gnomes with secret tomes of JCL (or COM) incantations, users can now create custom tools from smaller ones by using simple command-line I/O redirection, pipelines, and backticks.Treating files as unstructured byte streams necessarily governs what you can do with them. You can read and write sequential, fixed-size blocks of data at any location in the file, increasing its size if you write past the current end. Perl uses an I/O library that emulates C's stdio(3) to implement reading and writing of variable-length records like lines, paragraphs, and words.What can't you do to an unstructured file? Because you can't insert or delete bytes anywhere but at end-of-file, you can't easily change the length of, insert, or delete records. An exception is the last record, which you can delete by truncating the file to the end of the previous record. For other modifications, you need to use a temporary file or work with a copy of the file in memory. If you need to do this a lot, a database system may be a better solution than a raw file (see Chapter 14). Standard with Perl as of v5.8 is the Tie::File module, which offers an array interface to files of records. We use it in Recipe 8.4.Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
- Reading Lines with Continuation Characters
- InhaltsvorschauYou have a file with long lines split over two or more lines, with backslashes to indicate that a continuation line follows. You want to rejoin those split lines. Makefiles, shell scripts, and many other scripting or configuration languages let you break a long line into several shorter ones in this fashion.Build up the complete lines one at a time until reaching one without a backslash:
while (defined($line = <FH>) ) { chomp $line; if ($line =~ s/\\$//) { $line .= <FH>; redo unless eof(FH); } # process full record in $line here }Here's an example input file:DISTFILES = $(DIST_COMMON) $(SOURCES) $(HEADERS) \ $(TEXINFOS) $(INFOS) $(MANS) $(DATA) DEP_DISTFILES = $(DIST_COMMON) $(SOURCES) $(HEADERS) \ $(TEXINFOS) $(INFO_DEPS) $(MANS) $(DATA) \ $(EXTRA_DIST)You'd like to process that file a record at a time with the escaped newlines ignored. The first record would then be the first two lines, the second record the next three lines, etc.Here's how the algorithm works. Thewhileloop reads lines one at a time. The substitution operators///tries to remove a trailing backslash. If the substitution fails, we've found a line without a backslash at the end. Otherwise, read another record, concatenate it onto the accumulating$linevariable, and useredoto jump back to just inside the opening brace of thewhileloop. This lands us back on thechomp.A frequent problem with files intended to be in this format arises when unnoticed spaces or tabs follow the backslash before the newline. The substitution that found continuation lines would be more forgiving if written this way:if ($line =~ s/\\\s*$//) { # as before }Unfortunately, even if your program is forgiving, surely others will not be. Remember to be liberal in what you accept, but conservative in what you produce.Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Counting Lines (or Paragraphs or Records) in a File
- InhaltsvorschauYou need to compute the number of lines in a file.Many systems have a wc program to count lines in a file:
$count = `wc -l < $file`; die "wc failed: $?" if $?; chomp($count);
You could also open the file and read line-by-line until the end, counting lines as you go:open(FILE, "<", $file) or die "can't open $file: $!"; $count++ while <FILE>; # $count now holds the number of lines read
Here's the fastest solution, assuming your line terminator really is "\n":$count += tr/\n/\n/ while sysread(FILE, $_, 2 ** 20);
Although you can use-s$fileto determine the file size in bytes, you generally cannot use it to derive a line count. See the Introduction in Chapter 9 for more on-s.If you can't or don't want to call another program to do your dirty work, you can emulate wc by opening up and reading the file yourself:open(FILE, "<", $file) or die "can't open $file: $!"; $count++ while <FILE>; # $count now holds the number of lines read
Another way of writing this is:open(FILE, "<", $file) or die "can't open $file: $!"; for ($count=0; <FILE>; $count++) { }If you're not reading from any other files, you don't need the$countvariable in this case. The special variable$. holds the number of lines read since a filehandle was last explicitlyclosed:1 while <FILE>; $count = $.;
This reads in all records in the file, then discards them.To count paragraphs, set the global input record separator variable$/to the empty string ("") before reading to make the input operator (<FH>) read a paragraph at a time.$/ = ""; # enable paragraph mode for all reads open(FILE, "<", $file) or die "can't open $file: $!"; 1 while <FILE>; $para_count = $.;
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Processing Every Word in a File
- InhaltsvorschauYou need to do something to every word in a file, similar to the
foreachfunction of csh.Eitherspliteach line on whitespace:while (<>) { for $chunk (split) { # do something with $chunk } }or use them//goperator to pull out one chunk at a time:while (<>) { while ( /(\w[\w'-]*)/g ) { # do something with $1 } }Decide what you mean by "word." Sometimes you want anything but whitespace, sometimes you want only program identifiers, and sometimes you want English words. Your definition governs which regular expression to use.The preceding two approaches work differently. Patterns are used in the first approach to decide what is not a word. In the second, they're used to decide what is a word.With these techniques, it's easy to make a word frequency counter. Use a hash to store how many times each word has been seen:# Make a word frequency count %seen = ( ); while (<>) { while ( /(\w[\w'-]*)/g ) { $seen{lc $1}++; } } # output hash in a descending numeric sort of its values foreach $word ( sort { $seen{$b} <=> $seen{$a} } keys %seen) { printf "%5d %s\n", $seen{$word}, $word; }To make the example program count line frequency instead of word frequency, omit the secondwhileloop and use$seen{lc$_}++instead:# Line frequency count %seen = ( ); while (<>) { $seen{lc $_}++; } foreach $line ( sort { $seen{$b} <=> $seen{$a} } keys %seen ) { printf "%5d %s", $seen{$line}, $line; }Odd things that may need to be considered as words include "M.I.T.", "Micro$oft", "o'clock", "49ers", "street-wise", "and/or", "&", "c/o", "St.", "Tschüß", and "Niño". Bear this in mind when you choose a pattern to match. The last two require you to place auselocalein your program and then useEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Reading a File Backward by Line or Paragraph
- InhaltsvorschauYou want to process each line or paragraph of a text file in reverse.Read all lines into an array, then process that array from the end to the start:
@lines = <FILE>; while ($line = pop @lines) { # do something with $line }Or store an array of lines in reverse order:@lines = reverse <FILE>; foreach $line (@lines) { # do something with $line }Or use the Tie::File module (standard as of v5.8):use Tie::File; tie(@lines, "Tie::File", $FILENAME, mode => 0) or die "Can't tie $FILENAME: $!"; $max_lines = $#lines; for ($i = $max_lines; $i; $i--) { # do something with $lines[$i], eg line number them: printf "%5d %s\n", $i+1, $lines[$i], }The limitations of file access mentioned in this chapter's Introduction prevent reading a line at a time starting from the end. You must read the lines into memory, then process them in reverse order. This requires at least as much available memory as the size of the file, unless you use tricks like Tie::File does.The first technique moves through the array of lines in reverse order. This destructively processes the array, popping an element off the end of the array each time through the loop. We could do it non-destructively with:for ($i = $#lines; $i != -1; $i--) { $line = $lines[$i]; }The second approach generates an array of lines already in reverse order. This array can then be processed non-destructively. We get the reversed lines because the assignment to@linesconfers list context on the return fromreverse, andreverseconfers list context on its argument of<FILE>, which returns a list of all lines in the file.These approaches are easily extended to paragraphs just by changing$/:# this enclosing block keeps local $/ temporary { local $/ = ""; @paragraphs = reverse <FILE>; } foreach $paragraph (@paragraphs) { # do something }Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Trailing a Growing File
- InhaltsvorschauYou want to read from a continually growing file, but the read fails when you reach the current end-of-file.Read until end-of-file. Sleep, clear the EOF flag, and read some more. Repeat until interrupted. To clear the EOF flag, either use
seek:for (;;) { while (<FH>) { .... } sleep $SOMETIME; seek(FH, 0, 1); }or use the IO::Handle module'sclearerrmethod:use IO::Handle; for (;;) { while (<FH>) { .... } sleep $SOMETIME; FH->clearerr( ); }When you read until end-of-file, an internal flag is set that prevents further reading. The most direct way to clear this flag is theclearerrmethod, if supported: it's in the IO::Handle modules.$naptime = 1; use IO::Handle; open (LOGFILE, "/tmp/logfile") or die "can't open /tmp/logfile: $!"; for (;;) { while (<LOGFILE>) { print } # or appropriate processing sleep $naptime; LOGFILE->clearerr( ); # clear stdio error flag }Because Perl v5.8 ships with its own stdio implementation, that simple approach should almost always work. On the rare system where it doesn't work, you may need to useseek. Theseekcode given in the Solution tries to move zero bytes from the current position, which nearly always works. It doesn't change the current position, but it should clear the end-of-file condition on the handle so that the next<LOGFILE>operation picks up new data.If that still doesn't work, perhaps because it relies on features of your I/O implementation, you may need to use the followingseekcode, which remembers the old file position explicitly and returns there directly.for (;;) { for ($curpos = tell(LOGFILE); <LOGFILE>; $curpos = tell(LOGFILE)) { # process $_ here } sleep $naptime; seek(LOGFILE, $curpos, 0); # seek to where we had been }Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Picking a Random Line from a File
- InhaltsvorschauYou want to return a random line from a file.Use
randand$. (the current line number) to decide which line to print:srand; rand($.) < 1 && ($line = $_) while <>; # $line is the random line
This is a beautiful example of a solution that may not be obvious. We read every line in the file but don't have to store them all in memory. This is great for large files. Each line has a 1 in N (where N is the number of lines read so far) chance of being selected.Here's a replacement for fortune using this algorithm:$/ = "%%\n"; @ARGV = ("/usr/share/games/fortunes") unless @ARGV; srand; rand($.) < 1 && ($adage = $_) while <>; print $adage;If you know line offsets (for instance, you've created an index) and the number of lines, you can randomly select a line and jump to its offset in the file, but you usually don't have such an index.Here's a more rigorous explanation of how the algorithm works. The function callrand($.)picks a random number between 0 and the current line number. Therefore, you have a one in N chance, that is,
, of keeping the Nth line. Therefore you've a 100% chance of keeping the first line, a 50% chance of keeping the second, a 33% chance of keeping the third, and so on. The question is whether this is fair for all N, where N is any positive integer.First, some concrete examples, then abstract ones.Obviously, a file with one line (N=1) is fair: you always keep the first line because
= 100%, making it fair for files of 1 line. For a file with two lines, N=2. You always keep the first line; then when reaching the second line, you have a 50% chance of keeping it. Thus, both lines have an equal chance of being selected, which shows that N=2 is fair. For a file with three lines, N=3. You have a one-third chance, 33%, of keeping that third line. That leaves a two-thirds chance of retaining one of the first two out of the three lines. But we've already shown that for those first two lines there's a 50-50 chance of selecting either one. 50 percent of two-thirds is one-third. Thus, you have a one-third chance of selecting each of the three lines of the file.Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Randomizing All Lines
- InhaltsvorschauYou want to copy a file and randomly reorder its lines.Read all lines into an array, shuffle the array using List::Util's
shufflefunction, and write the shuffled lines back out:use List::Util qw(shuffle); while (<INPUT>) { push(@lines, $_); } @lines = shuffle(@lines); foreach (@reordered) { print OUTPUT $_; }The easiest approach is to read all lines into memory and shuffle them there. Because you don't know where lines start in the file, you can't just shuffle a list of line numbers and then extract lines in the order they'll appear in the shuffled file. Even if you did know the byte offsets of the start of each line, it would probably still be slower because you'd beseeking around in the file instead of sequentially reading it from start to finish.If you have a version of Perl older than v5.8, you can download the List::Util module from CPAN.The documentation for the standard List::Util module; Recipe 2.6; Recipe 2.7; Recipe 4.18Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Reading a Particular Line in a File
- InhaltsvorschauYou want to extract a single line from a file.The simplest solution is to read the lines until you get to the one you want:
# looking for line number $DESIRED_LINE_NUMBER $. = 0; do { $LINE = <HANDLE> } until $. = = $DESIRED_LINE_NUMBER || eof;If you are going to be doing this a lot and the file fits into memory, read the file into an array:@lines = <HANDLE>; $LINE = $lines[$DESIRED_LINE_NUMBER];
The standard (as of v5.8) Tie::File ties an array to a file, one line per array element:use Tie::File; use Fcntl; tie(@lines, Tie::File, $FILE, mode => O_RDWR) or die "Cannot tie file $FILE: $!\n"; $line = $lines[$sought - 1];
If you have the DB_File module, itsDB_RECNOaccess method ties an array to a file, one line per array element:use DB_File; use Fcntl; $tie = tie(@lines, DB_File, $FILE, O_RDWR, 0666, $DB_RECNO) or die "Cannot open file $FILE: $!\n"; # extract it $line = $lines[$sought - 1];Each strategy has different features, useful in different circumstances. The linear access approach is easy to write and best for short files. The Tie::File module gives good performance, regardless of the size of the file or which line you're reading (and is pure Perl, so doesn't require any external libraries). The DB_File mechanism has some initial overhead, but later accesses are faster than with linear access, so use it for long files that are accessed more than once and are accessed out of order.It is important to know whether you're counting lines from 0 or 1. The$. variable is 1 after the first line is read, so count from 1 when using linear access. The index mechanism uses many offsets, so count from 0. Tie::File and DB_File treat the file's records as an array indexed from 0, so count lines from 0.Here are three different implementations of the same program,Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Processing Variable-Length Text Fields
- InhaltsvorschauYou want to extract variable-length fields from your input.Use
splitwith a pattern matching the field separators.# given $RECORD with field separated by a pattern, # extract a list of fields @FIELDS = split(/PATTERN/, $RECORD);
Thesplitfunction takes up to three arguments:PATTERN,EXPRESSION, andLIMIT. TheLIMITparameter is the maximum number of fields to split into. (If the input contains more fields, they are returned unsplit in the final list element.) IfLIMITis omitted, all fields (except any final empty ones) are returned.EXPRESSIONgives the string value to split. IfEXPRESSIONis omitted,$_is split.PATTERNis a pattern matching the field separator. IfPATTERNis omitted, contiguous stretches of whitespace are used as the field separator and leading empty fields are silently discarded.If your input field separator isn't a fixed string, you might wantsplitto return the field separators as well as the data by using parentheses inPATTERNto save the field separators. For instance:split(/([+-])/, "3+5-2");
returns the values:(3, "+", 5, "-", 2)
To split colon-separated records in the style of the /etc/passwd file, use:@fields = split(/:/, $RECORD);
The classic application ofsplitis whitespace-separated records:@fields = split(/\s+/, $RECORD);
If$RECORDstarted with whitespace, this last use ofsplitwould have put an empty string into the first element of@fieldsbecausesplitwould consider the record to have an initial empty field. If you didn't want this, you could use this special form ofsplit:@fields = split(" ", $RECORD);This behaves likesplitwith a pattern of/\s+/, but ignores leading whitespace.When the record separator can appear in the record, you have a problem. The usual solution is to escape occurrences of the record separator in records by prefixing them with a backslash. See Recipe 1.18.Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Removing the Last Line of a File
- InhaltsvorschauYou'd like to remove the last line from a file.Use the standard (as of v5.8) Tie::File module and delete the last element from the tied array:
use Tie::File; tie @lines, Tie::File, $file or die "can't update $file: $!"; delete $lines[-1];
The Tie::File solution is the most efficient solution, at least for large files, because it doesn't have to read through the entire file to find the last line and doesn't read the entire file into memory. It is, however, considerably slower for small files than code you could implement yourself by hand. That doesn't mean you shouldn't use Tie::File; it just means you've optimized for programmer time instead of for computer time.If you don't have Tie::File and can't install it from CPAN, read the file a line at a time and keep track of the byte address of the last line you've seen. When you've exhausted the file, truncate to the last address you saved:open (FH, "+<", $file) or die "can't update $file: $!"; while (<FH>) { $addr = tell(FH) unless eof(FH); } truncate(FH, $addr) or die "can't truncate $file: $!";Remembering the offset is more efficient than reading the whole file into memory because it holds only one given line at a time. Although you still have to grope your way through the whole file, you can use this technique on files larger than available memory.The documentation for the standard Tie::File module; thetruncateandtellfunctions in perlfunc(1) and in Chapter 29 of Programming Perl; your system's open(2) and fopen(3) manpages; Recipe 8.18Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Processing Binary Files
- InhaltsvorschauYou want to read 8-bit binary data as 8-bit binary data, i.e., neither as characters in a particular encoding nor as a text file with any newline or end-of-file conversions that your I/O library might want to do.Use the
binmodefunction on the filehandle:binmode(HANDLE);
Thebinmodefunction lets you specify new I/O layers for a filehandle. The default layer to specify is:raw, which removes any layers that would interfere with binary data. The Solution is thus equivalent to:binmode(HANDLE, ":raw");
except that explicitly specifying:rawonly works on Perl 5.8 and later. The one-argument form ofbinmodeworks on all versions of Perl.Because Perl makes:crlfthe default if you are on an operating system that needs it, you should rarely (if ever) need to specify:crlfin your program. Furthermore, it's generally not wise to add or remove the:crlflayer once you've begun reading from the file, as there may be data already read into buffers that you can't unread. You can, however, safely change the:encoding(...)layer midstream (when parsing XML, for example).You should get into the habit of callingbinmodewhen you open a binary file. This will make your program portable to systems that might (un)helpfully translate bytes in your binary file into something unusable.You may specify the I/O layers when youopena filehandle, rather than usingbinmodeafter the fact:open(FH, "< :raw", $filename); # binary mode
Specify the default set of layers for all subsequently opened input and output filehandles with theopenpragma:use open IN => ":raw"; # binary files
The PerlIO(3) manpage; theopenandbinmodefunctions in perlfunc(1) and in Chapter 29 ofEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Using Random-Access I/O
- InhaltsvorschauYou have to read a binary record from the middle of a large file but don't want to read a record at a time to get there.Once you know the record's size, multiply it by the record number to get the byte address, and then
seekto that byte address andreadthe record:$ADDRESS = $RECSIZE * $RECNO; seek(FH, $ADDRESS, 0) or die "seek:$!"; read(FH, $BUFFER, $RECSIZE);
The Solution assumes the first record has aRECNOof 0. If you're counting from one, use:$ADDRESS = $RECSIZE * ($RECNO-1);
This is best applied to binary data. Applying it to text files assumes you have a constant character width and constant line length. This rules out most Unicode encodings, any kind of Windows text file, and any text file where lines can have different lengths.Theseekfunction in perlfunc(1) and in Chapter 29 of Programming Perl; Recipe 8.13Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Updating a Random-Access File
- InhaltsvorschauYou want to read an old record from a binary file, change its values, and write back the record.After
reading the old record,packup the updated values,seekto the previous address, and write it back.use Fcntl; # for SEEK_SET and SEEK_CUR $ADDRESS = $RECSIZE * $RECNO; seek(FH, $ADDRESS, SEEK_SET) or die "Seeking: $!"; read(FH, $BUFFER, $RECSIZE) = = $RECSIZE or die "Reading: $!"; @FIELDS = unpack($FORMAT, $BUFFER); # update fields, then $BUFFER = pack($FORMAT, @FIELDS); seek(FH, -$RECSIZE, SEEK_CUR) or die "Seeking: $!"; print FH $BUFFER; close FH or die "Closing: $!";You don't have to use anything fancier thanprintin Perl to output a record. Remember that the opposite ofreadis notwritebutprint, although oddly enough, the opposite ofsysreadissyswrite.The example program shown in Example 8-4, weekearly, takes one argument: the user whose record you want to backdate by a week. (Of course, in practice, you wouldn't really want to (nor be able to!) mess with the system accounting files.) This program requires write access to the file to be updated, since it opens the file in update mode. After fetching and altering the record, it packs it up again, skips backward in the file one record, and writes it out.Example 8-4. weekearly#!/usr/bin/perl -w # weekearly -- set someone's login date back a week use User::pwent; use IO::Seekable; $typedef = "L A12 A16"; # linux fmt; sunos is "L A8 A16" $sizeof = length(pack($typedef, ( ))); $user = shift(@ARGV) || $ENV{USER} || $ENV{LOGNAME}; $address = getpwnam($user)->uid * $sizeof; open (LASTLOG, "+<:raw", "/var/log/lastlog") or die "can't update /var/log/lastlog: $!"; seek(LASTLOG, $address, SEEK_SET) or die "seek failed: $!"; read(LASTLOG, $buffer, $sizeof) = = $sizeof or die "read failed: $!"; ($time, $line, $host) = unpack($typedef, $buffer); $time -= 24 * 7 * 60 * 60; # back-date a week $buffer = pack($typedef, $time, $line, $time); seek(LASTLOG, -$sizeof, SEEK_CUR) # backup one record or die "seek failed: $!"; print LASTLOG $record; close(LASTLOG) or die "close failed: $!";Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Reading a String from a Binary File
- InhaltsvorschauYou want to read a NUL-terminated string from a file, starting at a particular address.Ensure you're working with a binary file, set
$/to an ASCII NUL, and read the string with<>:binmode(FH); # binary mode $old_rs = $/; # save old $/ $/ = "\0"; # ASCII 0: NUL seek(FH, $addr, SEEK_SET) or die "Seek error: $!\n"; $string = <FH>; # read string chomp $string; # remove NUL $/ = $old_rs; # restore old $/
You can uselocalto save and restore$/:{ local $/ = "\0"; # ... } # $/ is automatically restoredThe example program shown in Example 8-5, bgets, accepts a filename and one or more byte addresses as arguments. Decimal, octal, or hexadecimal addresses may be specified. For each address, the program reads and prints the null- or EOF-terminated string at that position.Example 8-5. bgets#!/usr/bin/perl -w # bgets - get a string from an address in a binary file use IO::Seekable; use open IO => ":raw"; # binary mode on all opened handles ($file, @addrs) = @ARGV or die "usage: $0 file addr ..."; open(FH, $file) or die "cannot open $file: $!"; $/ = "\000"; foreach $addr (@addrs) { $addr = oct $addr if $addr =~ /^0/; seek(FH, $addr, SEEK_SET) or die "can't seek to $addr in $file: $!"; printf qq{%#x %#o %d "%s"\n}, $addr, $addr, $addr, scalar <>; }Example 8-6 is a simple implementation of the Unix strings program.Example 8-6. strings#!/usr/bin/perl -w # strings - pull strings out of a binary file $/ = "\0"; use open IO => ":raw"; while (<>) { while (/([\040-\176\s]{4,})/g) { print $1, "\n"; } }Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Reading Fixed-Length Records
- InhaltsvorschauYou want to read a file whose records have a fixed length.Use
readandunpack:# $RECORDSIZE is the length of a record, in bytes. # $TEMPLATE is the unpack template for the record # FILE is the file to read from # @FIELDS is an array, one element per field until ( eof(FILE) ) { read(FILE, $record, $RECORDSIZE) = = $RECORDSIZE or die "short read\n"; @FIELDS = unpack($TEMPLATE, $record); }Because the file in question is not a text file, you can't use<FH>or IO::Handle'sgetlinemethod to read records. Instead, you must simplyreada particular number of bytes into a variable. This variable contains one record's data, which you decode usingunpackwith the appropriate format.For binary data, the catch is determining that format. When reading data written by a C program, this can mean peeking at C include files or manpages describing the structure layout, and this requires knowledge of C. It also requires that you become unnaturally chummy with your C compiler, because otherwise it's hard to predict field padding and alignment (such as thex2in the format used in Recipe 8.24). If you're lucky enough to be on a Berkeley Unix system or a system supporting gcc, then you may be able to use the c2ph tool distributed with Perl to cajole your C compiler into helping you with this.The tailwtmp program at the end of this chapter uses the format described in utmp(5) under Linux, and works on its /var/log/wtmp and /var/run/utmp files. Once you commit to working in binary format, machine dependencies creep in fast. It probably won't work unaltered on your system, but the procedure is still illustrative. Here is the relevant layout from the C include file on Linux:#define UT_LINESIZE 12 #define UT_NAMESIZE 8 #define UT_HOSTSIZE 16 struct utmp { /* here are the pack template codes */ short ut_type; /* s for short, must be padded */ pid_t ut_pid; /* i for integer */ char ut_line[UT_LINESIZE]; /* A12 for 12-char string */ char ut_id[2]; /* A2, but need x2 for alignment */ time_t ut_time; /* l for long */ char ut_user[UT_NAMESIZE]; /* A8 for 8-char string */ char ut_host[UT_HOSTSIZE]; /* A16 for 16-char string */ long ut_addr; /* l for long */ };Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Reading Configuration Files
- InhaltsvorschauYou want to allow users of your program to change its behavior through configuration files.Either process a file in trivial VAR=VALUE format, setting a hash key-value pair for each setting:
while (<CONFIG>) { chomp; # no newline s/#.*//; # no comments s/^\s+//; # no leading white s/\s+$//; # no trailing white next unless length; # anything left? my ($var, $value) = split(/\s*=\s*/, $_, 2); $User_Preferences{$var} = $value; }or better yet, treat the config file as full Perl code:do "$ENV{HOME}/.progrc";The first solution lets you read config files in a trivial format like this (comments and empty lines are allowed):# set class C net NETMASK = 255.255.255.0 MTU = 296 DEVICE = cua1 RATE = 115200 MODE = adaptive
After you're done, you can pull in a setting by using something like$User_Preferences{"RATE"}to find the value 115200. If you wanted the config file to set the global variable by that name, instead of assigning to the hash, use this:no strict "refs"; $$var = $value;
and the$RATEvariable would contain 115200.The second solution usesdoto pull in raw Perl code directly. When used with an expression instead of a block,dointerprets the expression as a filename. This is nearly identical to usingrequire, but without risk of taking a fatal exception. In the second format, the config file would look like:# set class C net $NETMASK = "255.255.255.0"; $MTU = 0x128; # Brent, please turn on the modem $DEVICE = "cua1"; $RATE = 115_200; $MODE = "adaptive";
If you don't see the point of having extra punctuation and live code, consider this: you can have all of Perl at your disposal. You can now add arbitrary logic and tests to your simple assignments:Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Testing a File for Trustworthiness
- InhaltsvorschauYou want to read from a file, perhaps because it has configuration information. You want to use the file only if it can't be written to (or perhaps not even be read from) by anyone else than its owner.Use the
statfunction to retrieve ownership and file permissions information. You can use the built-in version, which returns a list:( $dev, $ino, $mode, $nlink, $uid, $gid, $rdev, $size, $atime, $mtime, $ctime, $blksize, $blocks ) = stat($filename) or die "no $filename: $!"; $mode &= 07777; # discard file type infoOr you can use the by-name interface:use File::stat; $info = stat($filename) or die "no $filename: $!"; if ($info->uid = = 0) { print "Superuser owns $filename\n"; } if ($info->atime > $info->mtime) { print "$filename has been read since it was written.\n"; }Usually you trust users to set file permissions as they wish. If they want others to read their files, or even to write to them, that's their business. Applications such as editors, mailers, and shells are often more discerning, though, refusing to evaluate code in configuration files if anyone but the owner can write to them. This helps avoid Trojan horse attacks. Security-minded programs such as ftp and ssh may even reject config files that can be read by anyone but their owner.If the file is writable by someone other than the owner or is owned by someone other than the current user or the superuser, it shouldn't be trusted. To figure out file ownership and permissions, thestatfunction is used. The following function returns true if the file is deemed safe and false otherwise. If thestatfails,undefis returned.use File::stat; sub is_safe { my $path = shift; my $info = stat($path); return unless $info; # owner neither superuser nor me # the real uid is in stored in the $< variable if (($info->uid != 0) && ($info->uid != $<)) { return 0; } # check whether group or other can write file. # use 066 to detect either reading or writing if ($info->mode & 022) { # someone else can write this return 0 unless -d _; # non-directories aren't safe # but directories with the sticky bit (01000) are return 0 unless $info->mode & 01000; } return 1; }Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Treating a File as an Array
- InhaltsvorschauYour file contains a list of lines or records, and you'd like to be able to use Perl's powerful array operations to access and manipulate the file.Use the Tie::File module, standard with v5.8 of Perl:
use Tie::File; use Fcntl; tie @data, Tie::File, $FILENAME or die "Can't tie to $filename : $!\n"; # use array operations on @data to work with the file
The Tie::File module makes a file appear to be an array, one record per element. You can then fetch and assign to elements of the array, use array functions likepushandsplice, use negative indices, orreverseit, and in every instance you're really working with the data on disk.If you don't specify how Tie::File should open the file, it is opened for read and write access and created if it doesn't exist. To specify a particular access mode (see Recipe 7.1), pass the Fcntl mode with themodeparameter when youtie. For example:use Fcntl; tie(@data, Tie::File, $filename, mode => O_RDONLY) or die "Can't open $filename for reading: $!\n";
When you alter the array, the file is rewritten on disk. For example, if you change the length of an element, all records later in the file must be copied to make the change. Take this code:foreach (@data) { s/Perl Cookbook/Perl Cookbook (2nd edition)/g; }That's close because you change the length of record 0, forcing a copy of records 1..N. Then you change the length of record 1, forcing a copy of records 2..N. It's better to defer the update until all changes have been made and then have Tie::File update the file in one single write. To do this, call a method on the object behind the tied array:(tied @data)->defer; # defer updates foreach (@data) { s/Perl Cookbook/Perl Cookbook (2nd edition)/g; } (tied @data)->flush;Exactly how much rewriting to defer is governed by how much memory you let Tie::File use, because the only way to keep track of changes without updating the file is to store those changes in memory. The Tie::File manpage shows how to change options for memory use.Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Setting the Default I/O Layers
- InhaltsvorschauYou want to ensure all files opened by your program use a particular set of I/O layers. For example, you know that every file will contain UTF-8 data.Use the
openpragma:use open IO => ":raw:utf8";
You can easily specify I/O layers when you open a filehandle directly, but that doesn't help you when the filehandle is opened by someone else's code (possibly even the Perl core). Theopenpragma lets you specify a default set of layers for everyopenthat doesn't specify its own layers.Theopenmodule also offers separateINandOUTcontrol for input and output handles. For example, to read bytes and emit UTF-8:use open "IN" => ":bytes", "OUT" => ":utf8";
The:stdoption tellsopento apply the input and output layers toSTDINandSTDOUT/STDERR. For example, the following code makes input handles read Greek (ISO 8859-7) and output handles write in the UTF-8 Unicode encoding. Then it applies the same layers toSTDIN,STDOUT, andSTDERR:use open "IN" => ":encoding(Greek)", # reading Greek "OUT" => ":utf8", # writing 8-bit data in Unicode UTF-8, ":std"; # STDIN is Greek,The documentation for the standardopenpragma; Recipe 8.12 and Recipe 8.19Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Reading or Writing Unicode from a Filehandle
- InhaltsvorschauYou have a file containing text in a particular encoding and when you read data from that into a Perl string, Perl treats it as a series of 8-bit bytes. You'd like to work with characters instead of bytes because your encoding characters can take more than one byte. Also, if Perl doesn't know about your encoding, it may fail to identify certain characters as letters. Similarly, you may want to output text in a particular encoding.Use I/O layers to tell Perl that data from that filehandle is in a particular encoding.
open(my $ifh, "<:encoding(ENCODING_NAME)", $filename); open(my $ofh, ">:encoding(ENCODING_NAME)", $filename);
Perl's text manipulation functions handle UTF-8 strings just as well as they do 8-bit data—they just need to know what type of data they're working with. Each string in Perl is internally marked as either UTF-8 or 8-bit data. Theencoding(...)layer converts data between variable external encodings and the internal UTF-8 within Perl. This is done by way of the Encode module.In the section on Unicode Support in Perl back in the Introduction to Chapter 1, we explained how under Unicode, every different character had a different code point (i.e., a different number) associated with it. Assigning all characters unique code points solves many problems. No longer does the same number, like 0xC4, represent one character under one character repertoire (e.g., a LATIN CAPITAL LETTER A WITH DIAERESIS under ISO-8859-1) and a different character in another repertoire (e.g., a GREEK CAPITAL LETTER DELTA under ISO-8859-7).This neatly solves many problems, but still leaves one important issue: the precise format used in memory or disk for each code point. If most code points fit in 8 bits, it would seem wasteful to use, say, a full 32 bits for each character. But if every character is the same size as every other character, the code is easier to write and may be faster to execute.Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Converting Microsoft Text Files into Unicode
- InhaltsvorschauYou have a text file written on a Microsoft computer that looks like garbage when displayed. How do you fix this?Set the encoding layer appropriately when reading to convert this into Unicode:
binmode(IFH, ":encoding(cp1252)") || die "can't binmode to cp1252 encoding: $!";Suppose someone sends you a file in cp1252 format, Microsoft's default in-house 8-bit character set. Files in this format can be annoying to read—while they might claim to be Latin1, they are not, and if you look at them with Latin1 fonts loaded, you'll get garbage on your screen. A simple solution is as follows:open(MSMESS, "< :crlf :encoding(cp1252)", $inputfile) || die "can't open $inputfile: $!";Now data read from that handle will be automatically converted into Unicode when you read it in. It will also be processed in CRLF mode, which is needed on systems that don't use that sequence to indicate end of line.You probably won't be able to write out this text as Latin1. That's because cp1252 includes characters that don't exist in Latin1. You'll have to leave it in Unicode, and displaying Unicode properly may not be as easy as you wish, because finding tools to work with Unicode is something of a quest in its own right. Most web browsers support ISO 10646 fonts; that is, Unicode fonts (seehttp://www.cl.cam.ac.uk/~mgk25/ucs-fonts.html). Whether your text editor does is a different matter, although both emacs and vi (actually, vim, not nvi) have mechanisms for handling Unicode. The authors used the following xterm(1) command to look at text:xterm -n unicode -u8 -fn -misc-fixed-medium-r-normal--20-200-75-75-c-100-iso10646-1
But many open questions still exist, such as cutting and pasting of Unicode data between windows.TheEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Comparing the Contents of Two Files
- InhaltsvorschauYou have two files and want to see whether they're the same or different.Use the standard File::Compare module with filenames, typeglobs, or any indirect filehandles:
use File::Compare; if (compare($FILENAME_1, $FILENAME_2) = = 0) { # they're equal } if (compare(*FH1, *FH2) = = 0) { # they're equal } if (compare($fh1, $fh2) = = 0) { # they're equal }The File::Compare module (standard as of v5.8 and available on CPAN if you have an earlier version of Perl) compares two files for equality. Thecomparefunction, exported by default, returns 0 when the files are equal, 1 when they differ, and -1 when any error occurs during reading.To compare more than two filehandles, simply loop, comparing two at a time:# ensure all filehandles in @fh hold the same data foreach $fh (@fh[1..$#fh]) { if (compare($fh[0], $fh)) { # $fh differs } }If you want details of exactly how two files differ, use the Text::Diff module from CPAN:use Text::Diff; $diff = diff(*FH1, *FH2); $diff = diff($FILENAME_1, $FILENAME_2, { STYLE => "Context" });In addition to filehandles,diffcan also take filenames, strings, and even arrays of records. Pass a hash of options as the third argument. TheSTYLEoption controls the type of output returned; it can be "Unified" (the default), "Context", or "OldStyle". You can even write your own class for custom diff formats.The value returned bydiffis a string similar to the output of the diff(1) program. This string is in valid diff format, suitable for feeding into patch(1). Although Text::Diff will not always produce the same output as GNU diff, byte for byte, its diffs are nevertheless correct.The documentation for the standard File::Compare module; the documentation for the CPAN module Text::Diff; theEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Pretending a String Is a File
- InhaltsvorschauYou have data in string, but would like to treat it as a file. For example, you have a subroutine that expects a filehandle as an argument, but you would like that subroutine to work directly on the data in your string instead. Additionally, you don't want to write the data to a temporary file.Use the scalar I/O in Perl v5.8:
open($fh, "+<", \$string); # read and write contents of $string
Perl's I/O layers include support for input and output from a scalar. When you read a record with<$fh>, you are reading the next line from$string. When you write a record withprint, you change$string. You can pass$fhto a function that expects a filehandle, and that subroutine need never know that it's really working with data in a string.Perl respects the various access modes inopenfor strings, so you can specify that the strings be opened as read-only, with truncation, in append mode, and so on:open($fh, "<", \$string); # read only open($fh, ">", \$string); # write only, discard original contents open($fh, "+>", \$string); # read and write, discard original contents open($fh, "+<", \$string); # read and write, preserve original contents
These handles behave in all respects like regular filehandles, so all I/O functions work, such asseek,truncate,sysread, and friends.Theopenfunction in perlfunc(1) and in Chapter 29 of Programming Perl; Recipe 8.12 and Recipe 8.19Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Program: tailwtmp
- InhaltsvorschauEvery time a user logs into or out of a Unix system, a record is added to the wtmp file. You can't use the normal tail program on it, because the file is in binary format. The tailwtmp program in Example 8-7 knows the format of the binary file and shows every new record as it appears. You'll have to adjust the
packformat for your own system.Example 8-7. tailwtmp#!/usr/bin/perl -w # tailwtmp - watch for logins and logouts; # uses linux utmp structure, from utmp(5) $typedef = "s x2 i A12 A4 l A8 A16 l"; $sizeof = length pack($typedef, ( ) ); use IO::File; open(WTMP, "< :raw", "/var/log/wtmp") or die "can't open /var/log/wtmp: $!"; seek(WTMP, 0, SEEK_END); for (;;) { while (read(WTMP, $buffer, $sizeof) = = $sizeof) { ($type, $pid, $line, $id, $time, $user, $host, $addr) = unpack($typedef, $buffer); next unless $user && ord($user) && $time; printf "%1d %-8s %-12s %2s %-24s %-16s %5d %08x\n", $type,$user,$line,$id,scalar(localtime($time)), $host,$pid,$addr; } for ($size = -s WTMP; $size = = -s WTMP; sleep 1) { } WTMP->clearerr( ); }Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Program: tctee
- InhaltsvorschauNot all systems support the classic tee program for splitting output pipes to multiple destinations. This command sends the output from someprog to /tmp/output and to the mail pipe beyond:
% someprog | tee /tmp/output | Mail -s "check this" user@host.org
This program helps not only users who aren't on Unix systems and don't have a regular tee; it also helps those who are, because it offers features not found on other versions of tee.The four flag arguments are -i to ignore interrupts, -a to append to output files, -u for unbuffered output, and -n to omit copying the output on to standard out.Because this program uses Perl's magicopen, you can specify pipes as well as files.% someprog | tctee f1 "|cat -n" f2 ">>f3"
That sends the output from someprog to the files f1 and f2, appends it to f3, sends a copy to the program cat -n, and also produces the stream on standard output.The program in Example 8-8 is one of many venerable Perl programs written nearly a decade ago that still runs perfectly well. If written from scratch now, we'd probablyusestrict, warnings, and ten to thirty thousand lines of modules. But if it ain't broke . . .Example 8-8. tctee#!/usr/bin/perl # tctee - clone that groks process tees # perl3 compatible, or better. while ($ARGV[0] =~ /^-(.+)/ && (shift, ($_ = $1), 1)) { next if /^$/; s/i// && (++$ignore_ints, redo); s/a// && (++$append, redo); s/u// && (++$unbuffer, redo); s/n// && (++$nostdout, redo); die "usage $0 [-aiun] [filenames] ...\n"; } if ($ignore_ints) { for $sig ("INT", "TERM", "HUP", "QUIT") { $SIG{$sig} = "IGNORE"; } } $SIG{"PIPE"} = "PLUMBER"; $mode = $append ? ">>" : ">"; $fh = "FH000"; unless ($nostdout) { %fh = ("STDOUT", "standard output"); # always go to stdout } $| = 1 if $unbuffer; for (@ARGV) { if (!open($fh, (/^[^>|]/ && $mode) . $_)) { warn "$0: cannot open $_: $!\n"; # like sun's; i prefer die $status++; next; } select((select($fh), $| = 1)[0]) if $unbuffer; $fh{$fh++} = $_; } while (<STDIN>) { for $fh (keys %fh) { print $fh $_; } } for $fh (keys %fh) { next if close($fh) || !defined $fh{$fh}; warn "$0: couldnt close $fh{$fh}: $!\n"; $status++; } exit $status; sub PLUMBER { warn "$0: pipe to \"$fh{$fh}\" broke!\n"; $status++; delete $fh{$fh}; }Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Program: laston
- InhaltsvorschauWhen you log in to a Unix system, it tells you when you last logged in. That information is stored in a binary file called lastlog. Each user has their own record; UID 8 is at record 8, UID 239 at record 239, and so on. To find out when a given user last logged in, convert their login name to a number, seek to their record in that file, read, and unpack. Doing so with shell tools is hard, but with the laston program, it's easy. Here's an example:
% laston gnat gnat UID 314 at Mon May 25 08:32:52 2003 on ttyp0 from below.perl.comThe program in Example 8-9 is much newer than the tctee program in Example 8-8, but it's less portable. It uses the Linux binary layout of the lastlog file. You'll have to change this for other systems.Example 8-9. laston#!/usr/bin/perl -w # laston - find out when given user last logged on use User::pwent; use IO::Seekable qw(SEEK_SET); open (LASTLOG, "< :raw", "/var/log/lastlog") or die "can't open /var/log/lastlog: $!"; $typedef = "L A12 A16"; # linux fmt; sunos is "L A8 A16" $sizeof = length(pack($typedef, ( ))); for $user (@ARGV) { $U = ($user =~ /^\d+$/) ? getpwuid($user) : getpwnam($user); unless ($U) { warn "no such uid $user\n"; next; } seek(LASTLOG, $U->uid * $sizeof, SEEK_SET) or die "seek failed: $!"; read(LASTLOG, $buffer, $sizeof) = = $sizeof or next; ($time, $line, $host) = unpack($typedef, $buffer); printf "%-8s UID %5d %s%s%s\n", $U->name, $U->uid, $time ? ("at " . localtime($time)) : "never logged in", $line && " on $line", $host && " from $host"; }Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Program: Flat File Indexes
- InhaltsvorschauIt sometimes happens that you need to jump directly to a particular line number in a file, but the lines vary in length, so you can't use Recipe 8.12. Although you could start at the beginning of the file and read every line, this is inefficient if you're making multiple queries.The solution is to build an index of fixed-width records, one per line. Each record contains the offset in the data file of the corresponding line. The subroutine in Example 8-10 takes the data file and a filehandle to send the index to. It reads a record at a time and prints the current offset in the file to the index, packed into a big-ending unsigned 32-bit integer; see the documentation for the
packfunction in perlfunc(1) for alternative storage types.Example 8-10. build_index# usage: build_index(*DATA_HANDLE, *INDEX_HANDLE) sub build_index { my $data_file = shift; my $index_file = shift; my $offset = 0; while (<$data_file>) { print $index_file pack("N", $offset); $offset = tell($data_file); } }Once you have an index, it becomes easy to read a particular line from the data file. Jump to that record in the index, read the offset, and jump to that position in the data file. The next line you read will be the one you want. Example 8-11 returns the line, given the line number and the index and data file handles.Example 8-11. line_with_index# usage: line_with_index(*DATA_HANDLE, *INDEX_HANDLE, $LINE_NUMBER) # returns line or undef if LINE_NUMBER was out of range sub line_with_index { my $data_file = shift; my $index_file = shift; my $line_number = shift; my $size; # size of an index entry my $i_offset; # offset into the index of the entry my $entry; # index entry my $d_offset; # offset into the data file $size = length(pack("N", 0)); $i_offset = $size * ($line_number-1); seek($index_file, $i_offset, 0) or return; read($index_file, $entry, $size); $d_offset = unpack("N", $entry); seek($data_file, $d_offset, 0); return scalar(<$data_file>); }Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Chapter 9: Directories
- InhaltsvorschauUnix has its weak points, but its file system is not one of them.—Chris TorekTo fully understand directories, you need to be acquainted with the underlying mechanics. The following explanation is slanted toward the Unix filesystem, for whose syscalls and behavior Perl's directory access routines were designed, but it is applicable to some degree to most other platforms.A filesystem consists of two parts: a set of data blocks where the contents of files and directories are kept, and an index to those blocks. Each entity in the filesystem has an entry in the index, be it a plain file, a directory, a link, or a special file like those in /dev. Each entry in the index is called an inode (short for index node). Since the index is a flat index, inodes are addressed by number.A directory is a specially formatted file, whose inode entry marks it as a directory. A directory's data blocks contain a set of pairs. Each pair consists of the name of something in that directory and the inode number of that thing. The data blocks for /usr/bin might contain:NameInodebc
17du29nvi8pineEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Introduction
- InhaltsvorschauTo fully understand directories, you need to be acquainted with the underlying mechanics. The following explanation is slanted toward the Unix filesystem, for whose syscalls and behavior Perl's directory access routines were designed, but it is applicable to some degree to most other platforms.A filesystem consists of two parts: a set of data blocks where the contents of files and directories are kept, and an index to those blocks. Each entity in the filesystem has an entry in the index, be it a plain file, a directory, a link, or a special file like those in /dev. Each entry in the index is called an inode (short for index node). Since the index is a flat index, inodes are addressed by number.A directory is a specially formatted file, whose inode entry marks it as a directory. A directory's data blocks contain a set of pairs. Each pair consists of the name of something in that directory and the inode number of that thing. The data blocks for /usr/bin might contain:NameInodebc
17du29nvi8pine55Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Getting and Setting Timestamps
- InhaltsvorschauYou need to retrieve or alter when a file was last modified (written or changed) or accessed (read).Use
statto get those times andutimeto set them. Both functions are built into Perl:($READTIME, $WRITETIME) = (stat($filename))[8,9]; utime($NEWREADTIME, $NEWWRITETIME, $filename);
As explained in the Introduction, three different times are associated with an inode in the traditional Unix filesystem. Of these, any user can set theatimeandmtimewithutime, assuming the user has write access to the parent directory of the file. There is effectively no way to change thectime. This example shows how to callutime:$SECONDS_PER_DAY = 60 * 60 * 24; ($atime, $mtime) = (stat($file))[8,9]; $atime -= 7 * $SECONDS_PER_DAY; $mtime -= 7 * $SECONDS_PER_DAY; utime($atime, $mtime, $file) or die "couldn't backdate $file by a week w/ utime: $!";You must callutimewith bothatimeandmtimevalues. If you want to change only one, you must callstatfirst to get the other:$mtime = (stat $file)[9]; utime(time, $mtime, $file);
This is easier to understand if you use File::stat:use File::stat; utime(time, stat($file)->mtime, $file);
Useutimeto make it appear as though you never touched a file at all (beyond itsctimebeing updated). For example, to edit a file, use the program in Example 9-1.Example 9-1. uvi#!/usr/bin/perl -w # uvi - vi a file without changing its access times $file = shift or die "usage: uvi filename\n"; ($atime, $mtime) = (stat($file))[8,9]; system($ENV{EDITOR} || "vi", $file); utime($atime, $mtime, $file) or die "couldn't restore $file to orig times: $!";Thestatandutimefunctions in perlfunc(1) and in Chapter 29 ofEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Deleting a File
- InhaltsvorschauYou want to delete a file. Perl's
deletefunction isn't what you want.Use Perl'sunlinkfunction:unlink($FILENAME) or die "Can't delete $FILENAME: $!\n"; unlink(@FILENAMES) = = @FILENAMES or die "Couldn't unlink all of @FILENAMES: $!\n";
Theunlinkfunction takes its name from the Unix syscall. Perl'sunlinktakes a list of filenames and returns the number of filenames successfully deleted. This return value can then be tested with||oror:unlink($file) or die "Can't unlink $file: $!";
unlinkdoesn't report which filenames it couldn't delete, only how many it deleted. Here's one way to test for successful deletion of many files and report the number deleted:unless (($count = unlink(@filelist)) = = @filelist) { warn "could only delete $count of " . (@filelist) . " files"; }Aforeachover@filelistwould permit individual error messages.Under Unix, deleting a file from a directory requires write access to the directory, not to the file, because it's the directory you're changing. Under some circumstances, you could remove a file you couldn't write to or write to a file you couldn't remove.If you delete a file that some process still has open, the operating system removes the directory entry but doesn't free up data blocks until all processes have closed the file. This is how thetmpfilefunction in File::Temp works (see Recipe 7.11).Theunlinkfunction in perlfunc(1) and in Chapter 29 of Programming Perl; your system's unlink(2) manpage; Recipe 7.11Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Copying or Moving a File
- InhaltsvorschauYou need to copy a file, but Perl has no built-in copy function.Use the
copyfunction from the standard File::Copy module:use File::Copy; copy($oldfile, $newfile);
You can do it by hand:open(IN, "<", $oldfile) or die "can't open $oldfile: $!"; open(OUT, ">", $newfile) or die "can't open $newfile: $!"; $blksize = (stat IN)[11] || 16384; # preferred block size? while (1) { $len = sysread IN, $buf, $blksize); if (!defined $len) { next if $! =~ /^Interrupted/; # ^Z and fg on EINTR die "System read error: $!\n"; } last unless $len; $offset = 0; while ($len) { # Handle partial writes. defined($written = syswrite OUT, $buf, $len, $offset) or die "System write error: $!\n"; $len -= $written; $offset += $written; }; } close(IN); close(OUT);or you can call your system's copy program:system("cp $oldfile $newfile"); # unix system("copy $oldfile $newfile"); # dos, vmsThe File::Copy module providescopyandmovefunctions. These are more convenient than resorting to low-level I/O calls and more portable than callingsystem. This version ofmoveworks across file-system boundaries; the standard Perl built-inrename(usually) does not.use File::Copy; copy("datafile.dat", "datafile.bak") or die "copy failed: $!"; move("datafile.dat", "datafile.new") or die "move failed: $!";Because these functions return only a simple success status, you can't easily tell which file prevented the copy or move from working. Copying the files manually lets you pinpoint which files didn't copy, but it fills your program with complexsysreads andsyswrites.Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Recognizing Two Names for the Same File
- InhaltsvorschauYou want to determine whether two filenames in a list correspond to the same file on disk (because of hard and soft links, two filenames can refer to a single file). You might do this to make sure that you don't change a file you've already worked with.Maintain a hash, keyed by the device and inode number of the files you've seen. The values are the names of the files:
%seen = ( ); sub do_my_thing { my $filename = shift; my ($dev, $ino) = stat $filename; unless ($seen{$dev, $ino}++) { # do something with $filename because we haven't # seen it before } }A key in%seenis made by combining the device number ($dev) and inode number ($ino) of each file. Files that are the same will have the same device and inode numbers, so they will have the same key.If you want to maintain a list of all files of the same name, instead of counting the number of times seen, save the name of the file in an anonymous array.foreach $filename (@files) { ($dev, $ino) = stat $filename; push( @{ $seen{$dev,$ino} }, $filename); } foreach $devino (sort keys %seen) { ($dev, $ino) = split(/$;/o, $devino); if (@{$seen{$devino}} > 1) { # @{$seen{$devino}} is a list of filenames for the same file } }The$;variable contains the separator string using the old multidimensional associative array emulation syntax,$hash{$x,$y,$z}. It's still a one-dimensional hash, but it has composite keys. The key is reallyjoin($;=>$x,$y,$z). Thesplitseparates them again. Although you'd normally just use a real multilevel hash directly, here there's no need, and it's cheaper not to.The$;($SUBSEP) variable in perlvar(1), and in the "Special Variables" section of Chapter 28 of Programming Perl; thestatfunction inEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Processing All Files in a Directory
- InhaltsvorschauYou want to do something to each file in a particular directory.Use
opendirto open the directory andreaddirto retrieve every filename:opendir(DIR, $dirname) or die "can't opendir $dirname: $!"; while (defined($file = readdir(DIR))) { # do something with "$dirname/$file" } closedir(DIR);Theopendir,readdir, andclosedirfunctions operate on directories asopen,<>, andcloseoperate on files. Both use handles, but the directory handles used byopendirand friends are different from the filehandles used byopenand friends. In particular, you can't use<>on a directory handle.In scalar context,readdirreturns the next filename in the directory until it reaches the end of the directory, when it returnsundef. In list context it returns the rest of the filenames in the directory or an empty list if there were no files left. As explained in this chapter's Introduction, the filenames returned byreaddirdo not include the directory name. When you work with the filenames returned byreaddir, you must either move to the right directory first or prepend the directory to the filename.This shows one way of prepending:$dir = "/usr/local/bin"; print "Text files in $dir are:\n"; opendir(BIN, $dir) or die "Can't open $dir: $!"; while( $file = readdir BIN) { print "$file\n" if -T "$dir/$file"; } closedir(BIN);Thereaddirfunction will return the special directories "." (the directory itself) and ".." (the parent of the directory). Most people skip those files with code like:while ( defined ($file = readdir BIN) ) { next if $file =~ /^\.\.?$/; # skip . and .. # ... }Like filehandles, bareword directory handles are per-package constructs. You can use thelocal*DIRHANDLEsyntax to get a new bareword directory handle. Alternatively, pass an undefined scalar as the first argument toEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Globbing, or Getting a List of Filenames Matching a Pattern
- InhaltsvorschauYou want to get a list of filenames similar to those produced by MS-DOS's
*.*and Unix's*.h. This is called globbing, and the filename wildcard expression is called a glob, or occasionally a fileglob to distinguish it from a typeglob.Perl provides globbing with the semantics of the Unix C shell through theglobkeyword and<>:@list = <*.c>; @list = glob("*.c");You can also usereaddirto extract the filenames manually:opendir(DIR, $path); @files = grep { /\.c$/ } readdir(DIR); closedir(DIR);In versions of Perl before v5.6, Perl's built-ingloband<WILDCARD>notation (not to be confused with<FILEHANDLE>) ran an external program (often the csh shell) to get the list of filenames. This led to globbing being tarred with security and performance concerns. As of v5.6, Perl uses the File::Glob module to glob files, which solves the security and performance problems of the old implementation. Globs have C shell semantics on non-Unix systems to encourage portability. In particular, glob syntax isn't regular expression syntax—globuses?to mean "any single character" and*to mean "zero or more characters," soglob("f?o*")matchesfloandfloodbut notfo.For complex rules about which filenames you want, roll your own selection mechanism usingreaddirand regular expressions.At its simplest, anopendirsolution usesgrepto filter the list returned byreaddir:@files = grep { /\.[ch]$/i } readdir(DH);As always, the filenames returned don't include the directory. When you use the filename, prepend the directory name to get the full pathname:opendir(DH, $dir) or die "Couldn't open $dir for reading: $!"; @files = ( ); while( defined ($file = readdir(DH)) ) { next unless /\.[ch]$/i; my $filename = "$dir/$file"; push(@files, $filename) if -T $filename; }Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Processing All Files in a Directory Recursively
- InhaltsvorschauYou want to do something to each file and subdirectory in a particular directory.Use the standard File::Find module.
use File::Find; sub process_file { # do whatever; } find(\&process_file, @DIRLIST);File::Find provides a convenient way to process a directory recursively. It does the directory scans and recursion for you. All you do is passfinda code reference and a list of directories. For each file in those directories, recursively,findcalls your function.Before calling your function,findby default changes to the directory being visited, whose path relative to the starting directory is stored in the$File::Find::dirvariable.$_is set to the basename of the file being visited, and the full path of that file can be found in$File::Find::name. Your code can set$File::Find::pruneto true to tellfindnot to descend into the directory just seen.This simple example demonstrates File::Find. We givefindan anonymous subroutine that prints the name of each file visited and adds a/to the names of directories:@ARGV = qw(.) unless @ARGV; use File::Find; find sub { print $File::Find::name, -d && "/", "\n" }, @ARGV;The -d file test operator returns the empty string '' if it fails, making the&&return that, too. But if -d succeeds, the&&returns "/", which is then printed.The following program prints the total bytes occupied by everything in a directory, including subdirectories. It givesfindan anonymous subroutine to keep a running sum of the sizes of each file it visits. That includes all inode types, including the sizes of directories and symbolic links, not just regular files. Once thefindfunction returns, the accumulated sum is displayed.use File::Find; @ARGV = (".") unless @ARGV; my $sum = 0; find sub { $sum += -s }, @ARGV; print "@ARGV contains $sum bytes\n";Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Removing a Directory and Its Contents
- InhaltsvorschauYou want to remove a directory tree recursively without using
rm-r.Use thefinddepthfunction from File::Find, shown in Example 9-3.Example 9-3. rmtree1#!/usr/bin/perl # rmtree1 - remove whole directory trees like rm -r use File::Find; die "usage: $0 dir ..\n" unless @ARGV; find { bydepth => 1, no_chdir => 1, wanted => sub { if (!-l && -d _) { rmdir or warn "couldn't rmdir directory $_: $!"; } else { unlink or warn "couldn't unlink file $_: $!"; } } } => @ARGV;Or usermtreefrom File::Path, as shown in Example 9-4.Example 9-4. rmtree2#!/usr/bin/perl # rmtree2 - remove whole directory trees like rm -r use File::Path; die "usage: $0 dir ..\n" unless @ARGV; foreach $dir (@ARGV) { rmtree($dir); }These programs remove an entire directory tree. Use with extreme caution!The File::Find module supports an alternate interface in whichfind's first argument is a hash reference containing options and their settings. Thebydepthoption is the same as callingfinddepthinstead offind. This is guaranteed to visit all files beneath a directory before the directory itself, just what we need to remove a directory and its contents. Theno_chdiroption stopsfindfrom descending into directories during processing; under this option,$_is the same as$File::Find::name. Finally, thewantedoption takes a code reference, our oldwanted( )function.We use two different functions,rmdirandunlink; both default to$_if no argument is provided. Theunlinkfunction deletes only files, andEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Renaming Files
- InhaltsvorschauYou have many files whose names you want to change.Use a
foreachloop and therenamefunction:foreach $file (@NAMES) { my $newname = $file; # change $newname rename($file, $newname) or warn "Couldn't rename $file to $newname: $!\n"; }This is straightforward.renametakes two arguments. The first is the filename to change, and the second is its new name. Perl'srenameis a frontend to the operating system's rename syscall, which typically won't rename files across filesystem boundaries.A small change turns this into a genericrenamescript, such as the one by Larry Wall shown in Example 9-5.Example 9-5. rename#!/usr/bin/perl -w # rename - Larry's filename fixer $op = shift or die "Usage: rename expr [files]\n"; chomp(@ARGV = <STDIN>) unless @ARGV; for (@ARGV) { $was = $_; eval $op; die $@ if $@; rename($was,$_) unless $was eq $_; }This script's first argument is Perl code that alters the filename (stored in$_) to reflect how you want the file renamed. It can do this because it uses anevalto do the hard work. It also skipsrenamecalls when the filename is untouched. This lets you simply use wildcards likerenameEXPR*instead of making long lists of filenames.Here are five examples of calling the rename program from the shell:% rename 's/\.orig$//' *.orig % rename "tr/A-Z/a-z/ unless /^Make/" * % rename '$_ .= ".bad"' *.f % rename 'print "$_: "; s/foo/bar/ if <STDIN> =~ /^y/i' * % find /tmp -name "*~" -print | rename 's/^(.+)~$/.#$1/'
The first shell command removes a trailing ".orig" from each filename.The second converts uppercase to lowercase. Because a translation is used rather than theEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Splitting a Filename into Its Component Parts
- InhaltsvorschauYou want to extract a filename, its enclosing directory, or the extension(s) from a string that contains a full pathname.Use routines from the standard File::Basename module.
use File::Basename; $base = basename($path); $dir = dirname($path); ($base, $dir, $ext) = fileparse($path);
The standard File::Basename module contains routines to split up a filename.dirnameandbasenamesupply the directory and filename portions, respectively:$path = "/usr/lib/libc.a"; $file = basename($path); $dir = dirname($path); print "dir is $dir, file is $file\n"; # dir is /usr/lib, file is libc.a
Thefileparsefunction can extract the extension. Passfileparsethe path to decipher and a regular expression that matches the extension. You must supply a pattern because an extension isn't necessarily dot-separated. Consider ".tar.gz": is the extension ".tar", ".gz", or ".tar.gz"? By specifying the pattern, you control which you get.$path = "/usr/lib/libc.a"; ($name,$dir,$ext) = fileparse($path,'\..*'); print "dir is $dir, name is $name, extension is $ext\n"; # dir is /usr/lib/, name is libc, extension is .a
By default, these routines parse pathnames using your operating system's normal conventions for directory separators by consulting the$^O($OSNAME) variable, which holds a string identifying the platform you're running on. That value was determined when Perl was built and installed. You can change the default by calling thefileparse_set_fstyperoutine. This alters the behavior of subsequent calls to the File::Basename functions:fileparse_set_fstype("MacOS"); $path = "Hard%20Drive:System%20Folder:README.txt"; ($name,$dir,$ext) = fileparse($path,'\..*'); print "dir is $dir, name is $name, extension is $ext\n"; # dir is Hard%20Drive:System%20Folder, name is README, extension is .txtEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Working with Symbolic File Permissions Instead of Octal Values
- InhaltsvorschauYou want to print, inspect, or change permissions on a file or directory, but you don't want to specify the permissions in octal (e.g.,
0644,0755). You want to print permissions as ls(1) shows them (e.g.,-rwx-r-xr-x) and specify permissions changes in the way that chmod(1) does (e.g.,g-wto remove write access for the group).Use the CPAN module Stat::lsMode to convert numeric permissions to a string:use Stat::lsMode; $lsmode = file_mode($pathname);
Use the CPAN module File::chmod to manipulate symbolic permissions:use File::chmod; chmod("g=rw,o=-w", @files); # group can read/write, others can't write chmod("-rwxr-xr--", @files); # ls-style permissionsThe Stat::lsMode module provides functions for generating ls-style permissions strings. Thefile_modefunction takes a pathname and returns a permissions string. This string is false if the pathname doesn't exist or Perl can't stat it. If all goes well, you get a string like "drwxr-x---" for a directory or "-rwxr-x----" for a file. For more fine-grained control, Stat::lsMode offersformat_mode, which takes a numeric permissions value and returns the 10-character ls-style string.Notice the leadingdand-in those strings. This indicates the type of file whose permissions you're inspecting:-means regular file,dmeans directory,lmeans symbolic link, and so on. Theformat_permsfunction from Stat::lsMode does the same job asformat_mode, but it returns a nine-character string, which does not have the type indicator. For example:use Stat::lsMode; print file_mode("/etc"), "\n"; print format_mode((stat "/etc")[2]), "\n"; drwxr-xr-x r-xr-xr-xThe File::chmod module gives you achmodthat accepts these nine-character permissions strings:use File::chmod; chmod("rwxr-xr-x", @files);These strings are three clusters of three characters. The three clusters represent what the user, group, and others can do to the file (respectively). The three characters representEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Program: symirror
- InhaltsvorschauThe program in Example 9-6 recursively duplicates a directory tree, making a shadow forest full of symlinks pointing back at the real files.Example 9-6. symirror
--- #!/usr/bin/perl # symirror - build spectral forest of symlinks use warnings; use strict; use Cwd qw(realpath); use File::Find qw(find); die "usage: $0 realdir mirrordir" unless @ARGV == 2; our $SRC = realpath $ARGV[0]; our $DST = realpath $ARGV[1]; my $oldmask = umask 077; # in case was insanely uncreatable chdir $SRC or die "can't chdir $SRC: $!"; unless (-d $DST) { mkdir($DST, 0700) or die "can't mkdir $DST: $!"; } find { wanted => \&shadow, postprocess => \&fixmode, } => "."; umask $oldmask; sub shadow { (my $name = $File::Find::name) =~ s!^\./!!; # correct name return if $name eq "."; if (-d) { # make a real dir; we'll copy mode later mkdir("$DST/$name", 0700) or die "can't mkdir $DST/$name: $!"; } else { # all else gets symlinked symlink("$SRC/$name", "$DST/$name") or die "can't symlink $SRC/$name to $DST/$name: $!"; } } sub fixmode { my $dir = $File::Find::dir; my $mode = (stat("$SRC/$dir"))[2] & 07777; chmod($mode, "$DST/$dir") or die "can't set mode on $DST/$dir: $!"; }Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Program: lst
- InhaltsvorschauHave you ever wondered what the newest or biggest files within a directory are? The standard ls program has options for listing out directories sorted in time order (the -t flag) and for recursing into subdirectories (the -R flag). However, it pauses at each directory to display the sorted contents of just that directory. It doesn't descend through all subdirectories first and then sort everything it found.The following lst program does that. Here's an example using its -l flag to get a long listing:
% lst -l /etc 12695 0600 1 root wheel 512 Fri May 29 10:42:41 1998 /etc/ssh_random_seed 12640 0644 1 root wheel 10104 Mon May 25 7:39:19 1998 /etc/ld.so.cache 12626 0664 1 root wheel 12288 Sun May 24 19:23:08 1998 /etc/psdevtab 12304 0644 1 root root 237 Sun May 24 13:59:33 1998 /etc/exports 12309 0644 1 root root 3386 Sun May 24 13:24:33 1998 /etc/inetd.conf 12399 0644 1 root root 30205 Sun May 24 10:08:37 1998 /etc/sendmail.cf 18774 0644 1 gnat perldoc 2199 Sun May 24 9:35:57 1998 /etc/X11/XMetroconfig 12636 0644 1 root wheel 290 Sun May 24 9:05:40 1998 /etc/mtab 12627 0640 1 root root 0 Sun May 24 8:24:31 1998 /etc/wtmplock 12310 0644 1 root tchrist 65 Sun May 24 8:23:04 1998 /etc/issue ....
/etc/X11/XMetroconfig showed up in the middle of the listing for /etc because it wasn't just for /etc, but for everything within that directory, recursively.Other supported options include sorting on read time instead of write time using -u and sorting on size rather than time with -s. The -i flag takes the list of filenames from standard input instead of recursing withEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Chapter 10: Subroutines
- InhaltsvorschauComposing mortals with immortal fire.—W. H. Auden, "Three Songs for St Cecilia's Day"To avoid the dangerous practice of copying and pasting code, larger programs reuse chunks of code as subroutines and functions. We'll use the terms subroutine and function interchangeably because Perl doesn't distinguish between the two. Even object-oriented methods are just subroutines that are called using a special syntax, described in Chapter 13.A subroutine is declared with the
subkeyword. Here's a simple subroutine definition:sub hello { $greeted++; # global variable print "hi there!\n"; }The typical way of calling that subroutine is:hello( ); # call subroutine hello with no arguments/parameters
Because Perl compiles your program before executing it, it doesn't matter where subroutines are declared. Definitions don't have to be in the same file as your main program. They can be pulled in from other files using thedo,require, oruseoperators, as described in Chapter 12. They can even be created on the fly usingevalor AUTOLOAD, or generated using closures, which can act as function templates.If you are familiar with other programming languages, several characteristics of Perl's functions may surprise you if you're unprepared for them. Most recipes in this chapter illustrate how to be aware of—and to take advantage of—these properties.-
Perl functions have no formal, named parameters, but this is not necessarily a bad thing. See Recipe 10.1 and Recipe 10.7.
-
All variables are global unless declared otherwise. See Recipe 10.2, Recipe 10.3, and Recipe 10.13 for details.
-
Passing or returning more than one array or hash normally causes them to lose their separate identities. See Recipe 10.5, Recipe 10.8, Recipe 10.9, and Recipe 10.11 to avoid this.
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. -
- Introduction
- InhaltsvorschauTo avoid the dangerous practice of copying and pasting code, larger programs reuse chunks of code as subroutines and functions. We'll use the terms subroutine and function interchangeably because Perl doesn't distinguish between the two. Even object-oriented methods are just subroutines that are called using a special syntax, described in Chapter 13.A subroutine is declared with the
subkeyword. Here's a simple subroutine definition:sub hello { $greeted++; # global variable print "hi there!\n"; }The typical way of calling that subroutine is:hello( ); # call subroutine hello with no arguments/parameters
Because Perl compiles your program before executing it, it doesn't matter where subroutines are declared. Definitions don't have to be in the same file as your main program. They can be pulled in from other files using thedo,require, oruseoperators, as described in Chapter 12. They can even be created on the fly usingevalor AUTOLOAD, or generated using closures, which can act as function templates.If you are familiar with other programming languages, several characteristics of Perl's functions may surprise you if you're unprepared for them. Most recipes in this chapter illustrate how to be aware of—and to take advantage of—these properties.-
Perl functions have no formal, named parameters, but this is not necessarily a bad thing. See Recipe 10.1 and Recipe 10.7.
-
All variables are global unless declared otherwise. See Recipe 10.2, Recipe 10.3, and Recipe 10.13 for details.
-
Passing or returning more than one array or hash normally causes them to lose their separate identities. See Recipe 10.5, Recipe 10.8, Recipe 10.9, and Recipe 10.11 to avoid this.
-
A function can know in which context it was called, how many arguments it was called with, and even which other function called it. See Recipe 10.4 and Recipe 10.6 to find out how.
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. -
- Accessing Subroutine Arguments
- InhaltsvorschauYou have written a function that takes arguments supplied by its caller, and you need to access those arguments.The special array
@_holds the values passed in as the function's arguments. Thus, the first argument to the function is in$_[0], the second in$_[1], and so on. The number of arguments is simplyscalar(@_).For example:sub hypotenuse { return sqrt( ($_[0] ** 2) + ($_[1] ** 2) ); } $diag = hypotenuse(3,4); # $diag is 5Most subroutines start by copying arguments into named private variables for safer and more convenient access:sub hypotenuse { my ($side1, $side2) = @_; return sqrt( ($side1 ** 2) + ($side2 ** 2) ); }It's been said that programming has only three nice numbers: zero, one, and however many you please. Perl's subroutine mechanism was designed to facilitate writing functions with as many—or as few—elements in the parameter and return lists as you wish. All incoming parameters appear as separate scalar values in the special array@_, which is automatically local to each function (see Recipe 10.13). To return a value or values from a subroutine, use thereturnstatement with arguments. If there is noreturnstatement, the return value is the result of the last evaluated expression.Here are some sample calls to thehypotenusefunction defined in the Solution:print hypotenuse(3, 4), "\n"; # prints 5 @a = (3, 4); print hypotenuse(@a), "\n"; # prints 5
If you look at the arguments used in the second call tohypotenuse, it might appear that only one argument was passed: the array@a. This isn't what happens—the elements of@aare copied into the@_array separately. Similarly, if you called a function with(@a,@b), you'd be giving it all arguments from both arrays. This is the same principle of flattened lists at work as in:Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Making Variables Private to a Function
- InhaltsvorschauYour subroutine needs temporary variables. You shouldn't use global variables, because another subroutine might also use the same variables.Use
myto declare a variable private to a region of your program:sub somefunc { my $variable; # $variable is invisible outside somefunc( ) my ($another, @an_array, %a_hash); # declaring many variables at once # ... }Themyoperator confines a variable to a particular region of code in which it can be used and accessed. Outside that region, it can't be accessed. This region is called its scope.Variables declared withmyhave lexical scope, meaning that they exist only within a specific textual region of code. For instance, the scope of$variablein the Solution is the function it was defined in,somefunc. The variable is created whensomefuncis entered, and it is destroyed when the function returns. The variable can be accessed only from inside the function, not from outside.A lexical scope is usually a block of code with braces around it, such as those defining the body of thesomefuncsubroutine or those marking the code blocks ofif,while,for,foreach, andeval. An entire source file and the string argument toevalare each a lexical scope; think of them as blocks with invisible braces delimiting their confines. Because a lexical scope is most often found as a brace-delimited block, when discussing lexical variables we sometimes say that they are visible only in their block, but what we really mean is that they're visible only in their scope.The code that can legally access amyvariable is determined statically at compile time and never changes, and so lexical scoping is sometimes referred to as static scoping, especially when in contrast to dynamic scoping, a topic we'll cover in Recipe 10.13.Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Creating Persistent Private Variables
- InhaltsvorschauYou want a variable to retain its value between calls to a subroutine but not be visible outside that routine. For instance, you'd like your function to keep track of how many times it was called.Wrap the function in another block, then declare
myvariables in that block's scope rather than in the function's:{ my $variable; sub mysub { # ... accessing $variable } }If the variables require initialization, make that block an INIT so the variable is guaranteed to be set before the main program starts running:INIT { my $variable = 1; # initial value sub othersub { # ... accessing $variable } }Unlike local variables in C or C++, Perl's lexical variables don't necessarily get recycled just because their scope has exited. If something more permanent is still aware of the lexical, it will stick around. In this code,mysubuses$variable, so Perl doesn't reclaim the variable when the block around the definition ofmysubends.Here's how to write a counter:{ my $counter; sub next_counter { return ++$counter } }Each timenext_counteris called, it increments and returns the$countervariable. The first timenext_counteris called,$counteris undefined, so it behaves as though it were 0 for the++. The variable is not part ofnext_counter's scope, but rather part of the block surrounding it. No code from outside can change$counterexcept by callingnext_counter.Generally, you should use an INIT for the extra scope. Otherwise, you could call the function before its variables were initialized.INIT { my $counter = 42; sub next_counter { return ++$counter } sub prev_counter { return --$counter } }This technique creates the Perl equivalent of C's static variables. Actually, it's a little better: rather than being limited to just one function, both functions share their private variable.Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Determining Current Function Name
- InhaltsvorschauYou want to determine the name of the currently running function. This is useful for creating error messages that don't need to be changed if you copy and paste the subroutine code.Use the
callerfunction:$this_function = (caller(0))[3];
Code can always determine the current source line number via the special symbol_ _LINE_ _, the current file via_ _FILE_ _, and the current package via_ _PACKAGE_ _. But no such symbol for the current subroutine name exists, let alone the name for the subroutine that called this one.The built-in functioncallerhandles all of these. In scalar context it returns the calling function's package name, but in list context it returns much more. You can also pass it a number indicating how many frames (nested subroutine calls) back you'd like information about: 0 is your own function, 1 is your caller, and so on.Here's the full syntax, where$iis how far back you're interested in:($package, $filename, $line, $subr, $has_args, $wantarray # 0 1 2 3 4 5 $evaltext, $is_require, $hints, $bitmask # 6 7 8 9 )= caller($i);
Here's what each of those return values means:-
$package -
The package in which the code was compiled.
-
$filename -
The name of the file in which the code was compiled, reporting
-eif launched from that command-line switch, or-if the script was read from standard input. -
$line
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. -
- Passing Arrays and Hashes by Reference
- InhaltsvorschauYou want to pass a function more than one array or hash and have each remain distinct. For example, you want to put the algorithm from Recipe 4.8 into a subroutine. This subroutine must then be called with two arrays that remain distinct.Pass arrays and hashes by reference, using the backslash operator:
array_diff( \@array1, \@array2 );
See Chapter 11 for more about manipulation of references. Here's a subroutine that expects array references, along with code to call it correctly:@a = (1, 2); @b = (5, 8); @c = add_vecpair( \@a, \@b ); print "@c\n"; 6 10 sub add_vecpair { # assumes both vectors the same length my ($x, $y) = @_; # copy in the array references my @result; for (my $i=0; $i < @$x; $i++) { $result[$i] = $x->[$i] + $y->[$i]; } return @result; }A potential problem with this function is that it doesn't verify the number and types of arguments passed into it. You could check explicitly this way:unless (@_ = = 2 && ref($x) eq 'ARRAY' && ref($y) eq 'ARRAY') { die "usage: add_vecpair ARRAYREF1 ARRAYREF2"; }If all you plan to do isdieon error (see Recipe 10.12), you can sometimes omit this check, since dereferencing the wrong kind of reference triggers an exception anyway. However, good defensive programming style encourages argument validation for all functions.The sections on "Passing References" and on "Prototypes" in Chapter 6 of Programming Perl and on "Pass by Reference" in perlsub(1); Recipe 10.11; Chapter 11; Chapter 8 of Programming PerlEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Detecting Return Context
- InhaltsvorschauYou want to know in which context your function was called. This lets one function do different things, depending on how its return value or values are used, just like many of Perl's built-in functions.Use the
wantarray( )function, which has three possible return values, depending on how the current function was called:if (wantarray( )) { # list context } elsif (defined wantarray( )) { # scalar context } else { # void context }Many built-in functions act differently when called in scalar context than they do when called in list context. A user-defined function can learn which context it was called in by checkingwantarray. List context is indicated by a true return value. Ifwantarrayreturns a value that is false but defined, then the function's return value will be used in scalar context. Ifwantarrayreturnsundef, your function isn't being asked to provide any value at all.if (wantarray( )) { print "In list context\n"; return @many_things; } elsif (defined wantarray( )) { print "In scalar context\n"; return $one_thing; } else { print "In void context\n"; return; # nothing } mysub( ); # void context $a = mysub( ); # scalar context if (mysub( )) { } # scalar context @a = mysub( ); # list context print mysub( ); # list contextThereturnandwantarrayfunctions in Chapter 29 of Programming Perl and in perlfunc(1)Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Passing by Named Parameter
- InhaltsvorschauYou want to make a function with many parameters that are easy to call so that programmers remember what the arguments do, rather than having to memorize their order.Name each parameter in the call:
thefunc(INCREMENT => "20s", START => "+5m", FINISH => "+30m"); thefunc(START => "+5m", FINISH => "+30m"); thefunc(FINISH => "+30m"); thefunc(START => "+5m", INCREMENT => "15s");
Then in the subroutine, create a hash loaded up with default values plus the array of named pairs.sub thefunc { my %args = ( INCREMENT => '10s', FINISH => 0, START => 0, @_, # argument pair list goes here ); if ($args{INCREMENT} =~ /m$/ ) { ... } }Functions whose arguments require a particular order work well for short argument lists, but as the number of parameters increases, it's awkward to make some optional or have default values. You can only leave out trailing arguments, never initial ones.A more flexible approach allows the caller to supply arguments using name-value pairs. The first element of each pair is the argument name; the second, its value. This makes for self-documenting code because you can see the parameters' intended meanings without having to read the full function definition. Even better, programmers using your function no longer have to remember argument order, and they can leave unspecified any extraneous, unused arguments.This works by having the function declare a private hash variable to hold the default parameter values. Put the current arguments,@_, after the default values, so the actual arguments override the defaults because of the order of the values in the assignment.A common variation on this is to preface the parameter name with a hyphen, intended to evoke the feel of command-line parameters:Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Skipping Selected Return Values
- InhaltsvorschauYou have a function that returns many values, but you only care about some of them. The
statfunction is a classic example: you often want only one value from its long return list (mode, for instance).Either assign to a list that hasundefin some positions:($a, undef, $c) = func( );
or else take a slice of the return list, selecting only what you want:($a, $c) = (func( ))[0,2];
Using dummy temporary variables is wasteful; plus it feels artificial and awkward:($dev,$ino,$DUMMY,$DUMMY,$uid) = stat($filename);
A nicer style is to useundefinstead of dummy variables to discard a value:($dev,$ino,undef,undef,$uid) = stat($filename);
Or you can take a slice, picking up just the values you care about:($dev,$ino,$uid,$gid) = (stat($filename))[0,1,4,5];
If you want to put an expression into list context and discard all of its return values (calling it simply for side effects), you can assign this to the empty list:( ) = some_function( );
This last strategy is rather like a list version of thescalaroperator—it calls the function in list context, even in a place it wouldn't otherwise do so. You can get just a count of return values this way:$count = ( ) = some_function( );
or you can call it in list context and make sure it returns some non-zero number of items (which you immediately discard) like this:if (( ) = some_function( )) { .... }If you hadn't assigned to the empty list, the Boolean context of theiftest would have called the function in scalar context.The section on "List Values and Arrays" in Chapter 2 of Programming Perl and perlsub(1); Recipe 3.1Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Returning More Than One Array or Hash
- InhaltsvorschauYou want a function to return more than one array or hash, but the return list flattens into just one long list of scalars.Return references to the hashes or arrays:
($array_ref, $hash_ref) = somefunc( ); sub somefunc { my @array; my %hash; # ... return ( \@array, \%hash ); }Just as all arguments collapse into one flat list of scalars, return values do, too. Functions that want to return multiple, distinct arrays or hashes need to return those by reference, and the caller must be prepared to receive references. If a function wants to return three separate hashes, for example, it should use one of the following:sub fn { ..... return (\%a, \%b, \%c); # or return \(%a, %b, %c); # same thing }The caller must expect a list of hash references returned by the function. It cannot just assign to three hashes.(%h0, %h1, %h2) = fn( ); # WRONG! @array_of_hashes = fn( ); # eg: $array_of_hashes[2]{"keystring"} ($r0, $r1, $r2) = fn( ); # eg: $r2->{"keystring"}The general discussions on references in Chapter 11, and in Chapter 8 of Programming Perl; Recipe 10.5Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Returning Failure
- InhaltsvorschauYou want to return a value indicating that your function failed.Use a bare
returnstatement without any argument, which returnsundefin scalar context and the empty list( )in list context.return;
Areturnwithout an argument means:sub empty_retval { return ( wantarray ? ( ) : undef ); }You can't use justreturnundef, because in list context you will get a list of one value:undef. If your caller says:if (@a = yourfunc( )) { ... }then the "error" condition will be perceived as true because@awill be assigned (undef) and then evaluated in scalar context. This yields1, the number of elements assigned to@a, which is true. You could use thewantarrayfunction to see what context you were called in, but a barereturnis a clear and tidy solution that always works:unless ($a = sfunc( )) { die "sfunc failed" } unless (@a = afunc( )) { die "afunc failed" } unless (%a = hfunc( )) { die "hfunc failed" }Some of Perl's built-in functions have a peculiar return value. Bothfcntlandioctlhave the curious habit of returning the string "0buttrue" in some circumstances. (This magic string is conveniently exempt from nagging warnings about improper numerical conversions.) This has the advantage of letting you write code like this:ioctl(....) or die "can't ioctl: $!";
That way, code doesn't have to check for a defined zero as distinct from the undefined value, as it would for thereadorglobfunctions. "0buttrue" is zero when used numerically. It's rare that this kind of return value is needed. A more common (and spectacular) way to indicate failure in a function is to raise an exception, as described in Recipe 10.12.Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Prototyping Functions
- InhaltsvorschauYou want to use function prototypes so the compiler can check your argument types.Perl has something of a prototype facility, but it isn't what you're thinking. Perl's function prototypes are more like a context coercion used to write functions that behave like some Perl built-ins, such as
pushandpop.Manually checking the validity of a function's arguments can't happen until runtime. If you make sure the function is declared before it is used, you can tickle the compiler into using a very limited form of prototype checking. But don't confuse Perl's function prototypes with those found in any other language.A Perl function prototype is zero or more spaces, backslashes, or type characters enclosed in parentheses after the subroutine definition or name. A backslashed type symbol means that the argument is passed by reference, and the argument in that position must start with that type character.A prototype can impose context on the prototyped function's arguments. This is done when Perl compiles your program. But this does not always mean that Perl checks the number or type of arguments; since a scalar prototype is like inserting ascalarin front of just one argument, sometimes an implicit conversion occurs instead. For example, if Perl seesfunc(3,5)for a function prototyped assubfunc($), it will stop with a compile-time error. But if it seesfunc(@array)with the same prototype, it will merely put@arrayinto scalar context instead of complaining that you passed an array, but it wanted a scalar.This is so important that it bears repeating: don't use Perl prototypes expecting the compiler to check type and number of arguments for you. It does a little bit of that, sometimes, but mostly it's about helping you type less, and sometimes to emulate the calling and parsing conventions of built-in functions.Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Handling Exceptions
- InhaltsvorschauHow do you safely call a function that might raise an exception? How do you create a function that raises an exception?Sometimes you encounter a problem so exceptional that merely returning an error isn't strong enough, because the caller could unintentionally ignore the error. Use
dieSTRINGfrom your function to trigger an exception:die "some message"; # raise exception
The caller can wrap the function call in anevalto intercept that exception, then consult the special variable$@to see what happened:eval { func( ) }; if ($@) { warn "func raised an exception: $@"; }Raising exceptions is not a facility to be used lightly. Most functions should return an error using a barereturnstatement. Wrapping every call in an exception trap is tedious and unsightly, removing the appeal of using exceptions in the first place.But, on rare occasions, failure in a function should cause the entire program to abort. Rather than calling the irrecoverableexitfunction, you should calldieinstead, which at least gives the programmer the chance to cope. If no exception handler has been installed viaeval, then the program aborts at that point.To detect this, wrap the call to the function with a blockeval. The$@variable will be set to the offending exception if one occurred; otherwise, it will be false.eval { $val = func( ) }; warn "func blew up: $@" if $@;Anyevalcatches all exceptions, not just specific ones. Usually you should propagate unexpected exceptions to an enclosing handler. For example, suppose your function raised an exception containing the string "Fullmoon!". You could safely trap that exception while letting others through by inspecting the$@variable. Callingdiewithout an argument uses the contents ofEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Saving Global Values
- InhaltsvorschauYou need to temporarily save away the value of a global variable.Use the
localoperator to save a previous global value, automatically restoring it when the current block exits:our $age = 18; # declare and set global variable if (CONDITION) { local $age = 23; func( ); # sees temporary value of 23 } # Perl restores the old value at block exitDespite its name, Perl'slocaloperator does not create a local variable. That's whatmydoes. Instead,localmerely preserves an existing value for the duration of its enclosing block. Hindsight shows that iflocalhad been called save_value instead, much confusion could have been avoided.Three places where you must uselocalinstead ofmyare:-
You need to give a global variable a temporary value, especially
$_. -
You need to create a local file or directory handle or a local function.
-
You want to temporarily change just one element of an array or hash.
Section 10.13.3.1: Using local( ) for temporary values for globals
The first situation is more apt to happen with predefined, built-in variables than with user variables. Often these are variables that Perl consults for hints for its high-level operations. In particular, any function that uses$_, implicitly or explicitly, should certainly have a local$_. This is annoyingly easy to forget to do. See Recipe 13.15 for one solution to this.Another common target forlocalis the$/variable, a global that implicitly affects the behavior of thereadlineoperator used in<FH>operations.$para = get_paragraph(*FH); # pass filehandle glob $para = get_paragraph(*FH); # pass filehandle by glob reference $para = get_paragraph(*IO{FH}); # pass filehandle by IO reference sub get_paragraph { my $fh = shift; local $/ = ''; my $paragraph = <$fh>; chomp($paragraph); return $paragraph; }Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. -
- Redefining a Function
- InhaltsvorschauYou want to temporarily or permanently redefine a function, but functions can't be assigned to.To redefine a function, assign a new code reference to the typeglob of the name of that function. Use
localif you want this redefinition to be temporary.undef &grow; # silence -w complaints of redefinition *grow = \&expand; grow( ); # calls expand( ) { local *grow = \&shrink; # only until this block exists grow( ); # calls shrink( ) }Unlike a variable (but like named filehandles, directory handles, and formats), a named function cannot be directly assigned to. It's just a name and doesn't vary. You can manipulate it almost as though it were a variable, because you can directly manipulate the runtime symbol table using typeglobs like*footo produce interesting aliasing effects.Assigning a reference to a typeglob changes what is accessed the next time a symbol of the referent's type is needed. This is what the Exporter does when you import a function or variable from one package into another. Since this is direct manipulation of the package symbol table, it works only on package variables (globals), not lexicals.*one::var = \%two::Table; # make %one::var alias for %two::Table *one::big = \&two::small; # make &one::big alias for &two::small
A typeglob is one of those things you can only uselocalon, notmy. If you do uselocal, the aliasing effect is then limited to the duration of the current block.local *fred = \&barney; # temporarily alias &fred to &barney
If the value assigned to a typeglob is not a reference but itself another typeglob, then all types by that name are aliased. The types aliased in a full typeglob assignment are scalar, array, hash, function, filehandle, directory handle, and format. That means that assigningEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Trapping Undefined Function Calls with AUTOLOAD
- InhaltsvorschauYou want to intercept calls to undefined functions so you can handle them gracefully.Declare a function called
AUTOLOADfor the package whose undefined function calls you'd like to trap. While running, that package's$AUTOLOADvariable contains the name of the undefined function being called.Another strategy for creating similar functions is to use a proxy function. If you call an undefined function, instead of automatically raising an exception, you can trap the call. If the function's package has a function namedAUTOLOAD, then this function is called in its place, with the special package global$AUTOLOADset to the package-qualified function name. TheAUTOLOADsubroutine can then do whatever that function would do.sub AUTOLOAD { my $color = our $AUTOLOAD; $color =~ s/.*:://; return "<FONT COLOR='$color'>@_</FONT>"; } #note: sub chartreuse isn't defined. print chartreuse("stuff");When the nonexistentmain::chartreusefunction is called, rather than raising an exception,main::AUTOLOADis called with the same arguments as you passedchartreuse. The package variable$AUTOLOADwould contain the stringmain::chartreusebecause that's the function it's proxying.The technique using typeglob assignments shown in Recipe 10.14 is faster and more flexible than usingAUTOLOAD. It's faster because you don't have to run the copy and substitute. It's more flexible because it lets you do this:{ local *yellow = \&violet; local (*red, *green) = (\&green, \&red); print_stuff( ); }Whileprint_stuff( )is running, including from within any functions it calls, anything printed in yellow will come out violet, and the red and green texts will exchange colors.Aliasing subroutines like this won't handle calls to undefined subroutines.AUTOLOADdoes.Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Nesting Subroutines
- InhaltsvorschauYou want subroutines to nest, such that one subroutine is visible and callable only from another. When you try the obvious approach of nesting
subFOO{subBAR{}...}, Perl gives warnings about variables that will not stay shared.Instead of making the inner functions normal subroutines, make them closures and temporarily assign their references to the typeglob of the right name to create a localized function.If you use nested subroutines in other programming languages with their own private variables, you'll have to work at it a bit in Perl. The intuitive coding of this kind of thing gives the warning "will not stay shared." For example, this won't work:sub outer { my $x = $_[0] + 35; sub inner { return $x * 19 } # WRONG return $x + inner( ); }The following is a workaround:sub outer { my $x = $_[0] + 35; local *inner = sub { return $x * 19 }; return $x + inner( ); }Nowinner( )can be called only from withinouter( )because of the temporary assignments of the closure. Once called, it has normal access to the lexical variable$xfrom the scope ofouter( ).This essentially creates a function local to another function, something not directly supported in Perl; however, the programming isn't always clear.The sections on "Symbol Tables" in Chapter 10 in Programming Perl and in perlmod(1); the sections on "Closures" and "Symbol Table References" in Chapter 8 of Programming Perl and the discussion of closures in perlref(1); Recipe 10.13; Recipe 11.4Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Writing a Switch Statement
- InhaltsvorschauYou want to write a multiway branch statement, much as you can in C using its
switchstatement or in the shell usingcase—but Perl seems to support neither.Use the Switch module, standard as of the v5.8 release of Perl.use Switch; switch ($value) { case 17 { print "number 17" } case "snipe" { print "a snipe" } case /[a-f]+/i { print "pattern matched" } case [1..10,42] { print "in the list" } case (@array) { print "in the array" } case (%hash) { print "in the hash" } else { print "no case applies" } }The Switch module extends Perl's basic syntax by providing a powerful and flexibleswitchconstruct. In fact, it's so powerful and flexible that instead of a complete description of how it works, we'll instead provide examples of some common uses. For the full story, make sure to consult the documentation that accompanies the module.Aswitchtakes an argument and a mandatory block, within which can occur any number ofcases. Each of thosecases also takes an argument and a mandatory block. The arguments to eachcasecan vary in type, allowing (among many other things) any or all of string, numeric, or regex comparisons against theswitch's value. When thecaseis an array or hash (or reference to the same), thecasematches if theswitchvalue corresponds to any of the array elements or hash keys. If nocasematches, a trailingelseblock will be executed.Unlike certain languages' multiway branching constructs, here once a validcaseis found and its block executed, control transfers out of the enclosingswitch. In other words, there's no implied fall-through behavior the way there is in C. This is considered desirable because even the best of programmers will occasionally forget about fall-through.However, this is Perl, so you can have your cake and eat it, too. Just use aEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Program: Sorting Your Mail
- InhaltsvorschauThe program in Example 10-1 sorts a mailbox by subject by reading input a paragraph at a time, looking for one with a "
From" at the start of a line. When it finds one, it searches for the subject, strips it of any "Re: " marks, and stores its lowercased version in the@subarray. Meanwhile, the messages themselves are stored in a corresponding@msgsarray. The$msgnovariable keeps track of the message number.Example 10-1. bysub1#!/usr/bin/perl # bysub1 - simple sort by subject my(@msgs, @sub); my $msgno = -1; $/ = ''; # paragraph reads while (<>) { if (/^From/m) { /^Subject:\s*(?:Re:\s*)*(.*)/mi; $sub[++$msgno] = lc($1) || ''; } $msgs[$msgno] .= $_; } for my $i (sort { $sub[$a] cmp $sub[$b] || $a <=> $b } (0 .. $#msgs)) { print $msgs[$i]; }Thatsortis only sorting array indices. If the subjects are the same,cmpreturns 0, so the second part of the||is taken, which compares the message numbers in the order they originally appeared.Ifsortwere fed a list like(0,1,2,3), that list would get sorted into a different permutation, perhaps(2,1,3,0). We iterate across them with aforloop to print out each message.Example 10-2 shows how an awk programmer might code this program, using the -00 switch to read paragraphs instead of lines.Example 10-2. bysub2#!/usr/bin/perl -n00 # bysub2 - awkish sort-by-subject INIT { $msgno = -1 } $sub[++$msgno] = (/^Subject:\s*(?:Re:\s*)*(.*)/mi)[0] if /^From/m; $msg[$msgno] .= $_; END { print @msg[ sort { $sub[$a] cmp $sub[$b] || $a <=> $b } (0 .. $#msg) ] }Perl programmers have used parallel arrays like this since Perl 1. Keeping each message in a hash is a more elegant solution, though. We'll sort on each field in the hash, by making an anonymous hash as described in Chapter 11.Example 10-3 is a program similar in spirit to Example 10-1 and Example 10-2.Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Chapter 11: References and Records
- InhaltsvorschauWith as little a web as this will I ensnare as great a fly as Cassio.—Shakespeare, Othello, Act II, scene iPerl provides three fundamental data types: scalars, arrays, and hashes. It's certainly possible to write many programs without complex records, but most programs need something more sophisticated than simple variables and lists.Perl's three built-in types combine with references to produce arbitrarily complex and powerful data structures. Selecting the proper data structure and algorithm can make the difference between an elegant program that does its job quickly and an ungainly concoction that's glacially slow to execute and consumes system resources voraciously.The first part of this chapter shows how to create and use plain references. The second part shows how to create higher-order data structures out of references.To grasp the concept of references, you must first understand how Perl stores values in variables. Each defined variable has associated with it a name and the address of a chunk of memory. This idea of storing addresses is fundamental to references because a reference is a value that holds the location of another value. The scalar value that contains the memory address is called a reference. Whatever value lives at that memory address is called its referent. See Figure 11-1.
Figure 11-1: Reference and referentThe referent could be any built-in type (scalar, array, hash, ref, code, or glob) or a user-defined type based on one of the built-ins.Referents in Perl are typed. This means, for example, that you can't treat a reference to an array as though it were a reference to a hash. Attempting to do so raises a runtime exception. No mechanism for type casting exists in Perl. This is considered a feature.Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Introduction
- InhaltsvorschauPerl provides three fundamental data types: scalars, arrays, and hashes. It's certainly possible to write many programs without complex records, but most programs need something more sophisticated than simple variables and lists.Perl's three built-in types combine with references to produce arbitrarily complex and powerful data structures. Selecting the proper data structure and algorithm can make the difference between an elegant program that does its job quickly and an ungainly concoction that's glacially slow to execute and consumes system resources voraciously.The first part of this chapter shows how to create and use plain references. The second part shows how to create higher-order data structures out of references.To grasp the concept of references, you must first understand how Perl stores values in variables. Each defined variable has associated with it a name and the address of a chunk of memory. This idea of storing addresses is fundamental to references because a reference is a value that holds the location of another value. The scalar value that contains the memory address is called a reference. Whatever value lives at that memory address is called its referent. See Figure 11-1.
Figure 11-1: Reference and referentThe referent could be any built-in type (scalar, array, hash, ref, code, or glob) or a user-defined type based on one of the built-ins.Referents in Perl are typed. This means, for example, that you can't treat a reference to an array as though it were a reference to a hash. Attempting to do so raises a runtime exception. No mechanism for type casting exists in Perl. This is considered a feature.So far, it may look as though a reference were little more than a raw address with strong typing. But it's far more than that. Perl takes care of automatic memory allocation and deallocation (garbage collection) for references, just as it does for everything else. Every chunk of memory in Perl has aEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Taking References to Arrays
- InhaltsvorschauYou need to manipulate an array by reference.To get a reference to an array:
$aref = \@array; $anon_array = [1, 3, 5, 7, 9]; $anon_copy = [ @array ]; @$implicit_creation = (2, 4, 6, 8, 10);
To deference an array reference, precede it with an at sign (@):push(@$anon_array, 11);
Or use a pointer arrow plus a bracketed subscript for a particular element:$two = $implicit_creation->[0];
To get the last index number by reference, or the number of items in that referenced array:$last_idx = $#$aref; $num_items = @$aref;
Or defensively embracing and forcing context:$last_idx = $#{ $aref }; $num_items = scalar @{ $aref };Here are array references in action:# check whether $someref contains a simple array reference if (ref($someref) ne "ARRAY") { die "Expected an array reference, not $someref\n"; } print "@{$array_ref}\n"; # print original data @order = sort @{ $array_ref }; # sort it push @{ $array_ref }, $item; # append new element to orig arrayIf you can't decide whether to use a reference to a named array or to create a new one, here's a simplistic guideline that will prove right more often than not. Only take a reference to an existing array to return the reference out of scope, thereby creating an anonymous array, or to pass the array by reference to a function. For virtually all other cases, use[@array]to create a new array reference with a copy of the old values.Automatic reference counting and the backslash operator make a powerful combination:sub array_ref { my @array; return \@array; } $aref1 = array_ref( ); $aref2 = array_ref( );Each timearray_refis called, the function allocates a new piece of memory forEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Making Hashes of Arrays
- InhaltsvorschauFor each key in a hash, only one scalar value is allowed, but you'd like to use one key to store and retrieve multiple values. That is, you'd like the value to produce a list.Use references to arrays as the hash values. Use
pushto append:push(@{ $hash{"KEYNAME"} }, "new value");Then, dereference the value as an array reference when printing out the hash:foreach $string (keys %hash) { print "$string: @{$hash{$string}}\n"; }You can only store scalar values in a hash. References, however, are scalars. This solves the problem of storing multiple values for one key by making$hash{$key}a reference to an array containing the values for$key. Normal hash operations acting on individual scalar values (insertion, deletion, iteration, and testing for existence) are now written with array operations acting on lists of values (likepush,splice, andforeach).Here's how to give a key many values:$hash{"a key"} = [ 3, 4, 5 ]; # anonymous arrayOnce you have a key with many values, here's how to use them:@values = @{ $hash{"a key"} };To append a new value to the array of values associated with a particular key, usepush:push @{ $hash{"a key"} }, $value;One common application of such data structures is inverting a hash that may have several keys with the same associated value. When inverted, you end up with a hash that has many values for the same key. This is addressed in Recipe 5.9.Be warned that this:@residents = @{ $phone2name{$number} };causes a runtime exception underusestrictbecause you're dereferencing an undefined reference where autovivification won't occur. You must do this instead:@residents = exists( $phone2name{$number} ) ? @{ $phone2name{$number} } : ( );Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Taking References to Hashes
- InhaltsvorschauYou need to manipulate a hash by reference. This might be because it was passed into a function that way or because it's part of a larger data structure.To get a hash reference:
$href = \%hash; $anon_hash = { "key1" => "value1", "key2" => "value2", ... }; $anon_hash_copy = { %hash };To dereference a hash reference:%hash = %$href; $value = $href->{$key}; @slice = @$href{$key1, $key2, $key3}; # note: no arrow! @keys = keys %$href;To check whether something is a hash reference:if (ref($someref) ne "HASH") { die "Expected a hash reference, not $someref\n"; }This example prints out all keys and values from two predefined hashes:foreach $href ( \%ENV, \%INC ) { # OR: for $href ( \(%ENV,%INC) ) { foreach $key ( keys %$href ) { print "$key => $href->{$key}\n"; } }Access slices of hashes by reference as you'd access slices of arrays by reference. For example:@values = @$hash_ref{"key1", "key2", "key3"}; for $val (@$hash_ref{"key1", "key2", "key3"}) { $val += 7; # add 7 to each value in hash slice }The Introductionin Chapter 5; Chapter 8 of Programming Perl; perlref(1); Recipe 11.9Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Taking References to Functions
- InhaltsvorschauYou need to manipulate a subroutine by reference. This might happen if you need to create a signal handler, a Tk callback, or a hash of function pointers.To get a code reference:
$cref = \&func; $cref = sub { ... };To call a code reference:@returned = $cref->(@arguments); @returned = &$cref(@arguments);
If the name of a function isfunc, you can produce a reference to it by prefixing its name with\&. You can also dynamically allocate anonymous functions using thesub{ }notation. These code references can be stored just like any other reference.It is possible to store the name of a function in a variable, such as:$funcname = "thefunc"; &$funcname( );
but that's not a very good solution for several reasons. First, it uses symbolic references, not real (hard) references, and so is forbidden under theusestrict"refs" pragma. Symbolic references to variables are usually a bad idea, since they can't access lexical variables, only globals, and aren't reference counted. Second, as written it doesn't include package information, so if executed in a different package, it would try to call the wrong function. Finally, in the odd case that the function were redefined at some point, the symbolic reference would get whatever the current definition for the function was, whereas the hard reference would still access the old definition.Instead of placing the name of the function in the variable, use the backslash operator to create a reference to the function. This is the normal way to store a function in a variable or pass along to another function. You can mix and match references to named functions with references to unnamed ones:my %commands = ( "happy" => \&joy, "sad" => \&sullen, "done" => sub { die "See ya!" }, "mad" => \&angry, ); print "How are you? "; chomp($string = <STDIN>); if ($commands{$string}) { $commands{$string}->( ); } else { print "No such command: $string\n"; }Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Taking References to Scalars
- InhaltsvorschauYou want to create and manipulate a reference to a scalar value.To create a reference to a scalar variable, use the backslash operator:
$scalar_ref = \$scalar; # get reference to named scalar
To create a reference to an anonymous scalar value (a value that isn't in a variable), assign to a dereferenced undefined variable:undef $anon_scalar_ref; $$anon_scalar_ref = 15;
This creates a reference to a constant scalar:$anon_scalar_ref = \15;
Use${...}to dereference:print ${ $scalar_ref }; # dereference it ${ $scalar_ref } .= "string"; # alter referent's valueIf you want to create many new anonymous scalars, use a subroutine that returns a reference to a lexical variable out of scope, as explained in this chapter's Introduction:sub new_anon_scalar { my $temp; return \$temp; }Dereference a scalar reference by prefacing it with$to get at its contents:$sref = new_anon_scalar( ); $$sref = 3; print "Three = $$sref\n"; @array_of_srefs = ( new_anon_scalar( ), new_anon_scalar( ) ); ${ $array[0] } = 6.02e23; ${ $array[1] } = "avocado"; print "\@array contains: ", join(", ", map { $$_ } @array ), "\n";Notice we put braces around$array[0]and$array[1]. If we tried to say$$array[0], the tight binding of dereferencing would turn it into$array->[0]. It would treat$arrayas an array reference and return the element at index zero.Here are other examples where it is safe to omit the braces:$var = `uptime`; # $var holds text $vref = \$var; # $vref "points to" $var if ($$vref =~ /load/) { } # look at $var, indirectly chomp $$vref; # alter $var, indirectlyAs mentioned in the Introduction, you may use therefbuilt-in to inspect a reference for its referent's type. CallingEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Creating Arrays of Scalar References
- InhaltsvorschauYou want to create and manipulate an array of references to scalars. This arises when you pass variables by reference to a function so the function can change their values.To create an array, either backslash each scalar in the list to store in the array:
@array_of_scalar_refs = ( \$a, \$b );
or simply backslash the entire list, taking advantage of the backslash operator's distributive property:@array_of_scalar_refs = \( $a, $b );
To get or set the value of an element of the list, use${...}:${ $array_of_scalar_refs[1] } = 12; # $b = 12In the following examples,@arrayis a simple array containing references to scalars (an array of references is not a reference to an array). To access the original data indirectly, braces are mandatory.($a, $b, $c, $d) = (1 .. 4); # initialize @array = (\$a, \$b, \$c, \$d); # refs to each scalar @array = \( $a, $b, $c, $d); # same thing! @array = map { \my $anon } 0 .. 3; # allocate 4 anon scalar refs ${ $array[2] } += 9; # $c now 12 ${ $array[ $#array ] } *= 5; # $d now 20 ${ $array[-1] } *= 5; # same; $d now 100 $tmp = $array[-1]; # using temporary $$tmp *= 5; # $d now 500The two assignments to@arrayare equivalent—the backslash operator is distributive across a list. So preceding a list (including a slice or a function's return list, but not an array) with a backslash is the same as applying a backslash to everything in that list. The ensuing code changes the values of the variables whose references were stored in the array.Here's how to deal with such an array without explicit indexing:use Math::Trig qw(pi); # load the constant pi foreach $sref (@array) { # prepare to change $a,$b,$c,$d ($$sref **= 3) *= (4/3 * pi); # replace with spherical volumes }Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Using Closures Instead of Objects
- InhaltsvorschauYou want records with private state, behavior, and identity, but you don't want to learn object-oriented programming to accomplish this.Write a function that returns (by reference) a hash of code references. These code references are closures created in the same scope, so when they execute, they'll share bindings to the same private variables.Because a closure is a binding of code and data, it can implement what might be thought of as an object.Here's an example that creates and returns a hash of anonymous functions.
mkcountertakes an argument of a seed counter and returns a reference to a hash of code references that you can use to manipulate the counter indirectly.$c1 = mkcounter(20); $c2 = mkcounter(77); printf "next c1: %d\n", $c1->{NEXT}->( ); # 21 printf "next c2: %d\n", $c2->{NEXT}->( ); # 78 printf "next c1: %d\n", $c1->{NEXT}->( ); # 22 printf "last c1: %d\n", $c1->{PREV}->( ); # 21 printf "old c2: %d\n", $c2->{RESET}->( ); # 77The code values in the hash references in$c1and$c2maintain their own separate state. Here's how to set that up:sub mkcounter { my $count = shift; my $start = $count; my $bundle = { "NEXT" => sub { return ++$count }, "PREV" => sub { return --$count }, "GET" => sub { return $count }, "SET" => sub { $count = shift }, "BUMP" => sub { $count += shift }, "RESET" => sub { $count = $start }, }; $bundle->{"LAST"} = $bundle->{"PREV"}; return $bundle; }Because the lexical variables used by the closures in the$bundlehash reference are returned by the function, they are not deallocated. The next timemkcounteris called, the closures get a different set of variable bindings for the same code. Because no one outside those closures can access these two variables, this assures true privacy.Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Creating References to Methods
- InhaltsvorschauYou want to store a reference to a method.Create a closure that makes the proper method call on the appropriate object.When you ask for a reference to a method, you're asking for more than just a raw function pointer. You also need to record which object the method needs to be called upon as the object contains the data the method will work with. The best way to do this is using a closure. Assuming
$objis lexically scoped, you can say:$mref = sub { $obj->meth(@_) }; # later... $mref->("args", "go", "here");Even when$objgoes out of scope, the closure stored in$mrefhas captured it. Later when it's called indirectly, the correct object is used for the method call.Be aware that the notation:$sref = \$obj->meth;
doesn't do what you probably expected. It first calls the method on that object and gives you either a reference to the return value or a reference to the last of the return values if the method returns a list.Thecanmethod from the UNIVERSAL base class, while appealing, is also unlikely to produce what you want.$cref = $obj->can("meth");This produces a code ref to the appropriate method (should one be found), but one that carries no object information. Think of it as a raw function pointer. The information about the object is lost. That's why you need a closure to capture both the object state as well as the method to call.The discussion on methods in the Introduction to Chapter 13; the section on "Closures" in Chapter 8 of Programming Perl; Recipe 11.7; Recipe 13.8Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Constructing Records
- InhaltsvorschauYou want to create a record data type.Use a reference to an anonymous hash.Suppose you wanted to create a data type that contained various data fields. The easiest way is to use an anonymous hash. For example, here's how to initialize and use that record:
$record = { NAME => "Jason", EMPNO => 132, TITLE => "deputy peon", AGE => 23, SALARY => 37_000, PALS => [ "Norbert", "Rhys", "Phineas"], }; printf "I am %s, and my pals are %s.\n", $record->{NAME}, join(", ", @{$record->{PALS}});Just having one of these records isn't much fun—you'd like to build larger structures. For example, you might want to create a%bynamehash that you could initialize and use this way:# store record $byname{ $record->{NAME} } = $record; # later on, look up by name if ($rp = $byname{"Aron"}) { # false if missing printf "Aron is employee %d.\n", $rp->{EMPNO}; } # give jason a new pal push @{$byname{"Jason"}->{PALS}}, "Theodore"; printf "Jason now has %d pals\n", scalar @{$byname{"Jason"}->{PALS}};That makes%bynamea hash of hashes because its values are hash references. Looking up employees by name would be easy using such a structure. If we find a value in the hash, we store a reference to the record in a temporary variable,$rp, which we then use to get any field we want.We can use our existing hash tools to manipulate%byname. For instance, we could use theeachiterator to loop through it in an arbitrary order:# Go through all records while (($name, $record) = each %byname) { printf "%s is employee number %d\n", $name, $record->{EMPNO}; }What about looking employees up by employee number? Just build and use another data structure, an array of hashes called@employees. If your employee numbers aren't consecutive (for instance, they jump from 1 to 159997) an array would be a bad choice. Instead, you should use a hash mapping employee number to record. For consecutive employee numbers, use an array:Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Reading and Writing Hash Records to Text Files
- InhaltsvorschauYou want to read or write hash records stored in text files.Use a simple file format with one field per line:
FieldName: Value
and separate records with blank lines.If you have an array of records that you'd like to store into and retrieve from a text file, you can use a simple format based on mail headers. The format's simplicity requires that the keys have neither colons nor newlines, and the values not have newlines.This code writes them out:foreach $record (@Array_of_Records) { for $key (sort keys %$record) { print "$key: $record->{$key}\n"; } print "\n"; }Reading them in is easy, too.$/ = ""; # paragraph read mode while (<>) { my @fields = split /^([^:]+):\s*/m; shift @fields; # for leading null field push(@Array_of_Records, { map /(.*)/, @fields }); }Thesplitacts upon$_, its default second argument, which contains a full paragraph. The pattern looks for start of line (not just start of record, thanks to the/m) followed by one or more non-colons, followed by a colon and optional whitespace. Whensplit's pattern contains parentheses, these are returned along with the values. The return values placed in@fieldsare in key-value order, with a leading null field we shift off. The braces in the call topushproduce a reference to a new anonymous hash, which we copy@fieldsinto. Since that array was stored in order of the needed key-value pairing, this makes for well-ordered hash contents.All you're doing is reading and writing a plain text file, so you can use related recipes for additional components. You could use Recipe 7.18 to ensure that you have clean, concurrent access; Recipe 1.18 to store colons and newlines in keys and values; and Recipe 11.3 to store more complex structures.Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Printing Data Structures
- InhaltsvorschauYou want to print out a data structure.If the output's legibility and layout are important, write your own custom printing routine.If you are in the Perl debugger, use the
xcommand:DB<1> $reference = [ { "foo" => "bar" }, 3, sub { print "hello, world\n" } ]; DB<2> x $reference 0 ARRAY(0x1d033c) 0 HASH(0x7b390) foo' = 'bar 1 3 2 CODE(0x21e3e4) -> &main::_ _ANON_ _[(eval 15)[/usr/local/...perl5db.pl:17]:2] in (eval 15)[/usr/local/.../perl5db.pl:17]:2-2From within your own programs, use theDumperfunction from the standard module Data::Dumper:use Data::Dumper; print Dumper($reference);
Or if you'd like output formatted in the same style as the Debugger uses:use Dumpvalue; Dumpvalue->new->dumpValue($reference);
Sometimes you'll want to make a dedicated function for your data structure that delivers a particular output format, but often this is overkill. If you're running under the Perl debugger, thexandXcommands provide nice pretty-printing. Thexcommand is more useful because it works on both global and lexical variables, whereasXworks only on globals. Passxa reference to the data structure you want to print.DB<3> x @INC 0 ARRAY(0x807d0a8) 0 '/home/tchrist/perllib 1 '/usr/lib/perl5/i686-linux/5.00403 2 '/usr/lib/perl5 3 '/usr/lib/perl5/site_perl/i686-linux 4 '/usr/lib/perl5/site_perl 5 '.
The standard Dumpvalue module provides the Debugger's output formatting using an object-oriented interface. Here's an example:use Dumpvalue; Dumpvalue->new->dumpvars("main", "INC"); @INC = (Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Copying Data Structures
- InhaltsvorschauYou need to copy a complex data structure.Use the
dclonefunction from the standard Storable module:use Storable; $r2 = dclone($r1);
Two types of "copy" are sometimes confused. A surface copy (also known as shallow copy) simply copies references without creating copies of the data behind them:@original = ( \@a, \@b, \@c ); @surface = @original;
A deep copy creates an entirely new structure with no overlapping references. This copies references to one layer deep:@deep = map { [ @$_ ] } @original;If@a,@b, and@cthemselves contain references, the precedingmapis no longer adequate. Writing your own code to deep-copy structures is laborious and rapidly becomes tiresome.The Storable module provides a function calleddclonethat recursively copies its argument:use Storable qw(dclone); $r2 = dclone($r1);
This only works on references or blessed objects of type SCALAR, ARRAY, HASH, or CODE; references of type GLOB, IO, and the more esoteric types are not supported. ThesafeFreezefunction from the FreezeThaw module supports even these types when used in the same address space by using a reference cache that could interfere with garbage collection and object destructors under some circumstances.Becausedclonetakes and returns references, you must add extra punctuation if you have a hash or arrays to copy:%newhash = %{ dclone(\%oldhash) };The documentation for the standard Storable and Data::Dumper modules, and for the FreezeThaw CPAN module.Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Storing Data Structures to Disk
- InhaltsvorschauYou want to save your large, complex data structure to disk so you don't have to reconstruct it from scratch each time your program runs.Use the Storable module's
storeandretrievefunctions:use Storable; store(\%hash, "filename"); # later on... $href = retrieve("filename"); # by ref %hash = %{ retrieve("filename") }; # direct to hashThe Storable module uses C functions and a binary format to walk Perl's internal data structures and lay out its data. It's more efficient than a pure Perl and string-based approach, but it's also more fragile.Thestoreandretrievefunctions expect binary data using the machine's own byte-ordering. This means files created with these functions cannot be shared across different architectures.nstoredoes the same jobstoredoes, but keeps data in canonical (network) byte order, at a slight speed cost:use Storable qw(nstore); nstore(\%hash, "filename"); # later ... $href = retrieve("filename");No matter whetherstoreornstorewas used, you need to call the sameretrieveroutine to restore the objects in memory. The producer must commit to portability, but the consumer doesn't have to. Code need only be changed in one place when the producer has a change of heart; the code thus offers a consistent interface for the consumer, who does not need to know or care.Thestoreandnstorefunctions don't lock the files they work on. If you're worried about concurrent access, open the file yourself, lock it using Recipe 7.18, and then usestore_fdor its slower but machine-independent versionnstore_fd.Here's code to save a hash to a file, with locking. We don't open with theO_TRUNCflag because we have to wait to get the lock before we can clobber the file.use Storable qw(nstore_fd); use Fcntl qw(:DEFAULT :flock); sysopen(DF, "/tmp/datafile", O_RDWR|O_CREAT, 0666) or die "can't open /tmp/datafile: $!"; flock(DF, LOCK_EX) or die "can't lock /tmp/datafile: $!"; nstore_fd(\%hash, *DF) or die "can't store hash\n"; truncate(DF, tell(DF)); close(DF);Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Transparently Persistent Data Structures
- InhaltsvorschauYou have a complex data structure that you want to persist outside your program.Use MLDBM and either (preferably) DB_File or else GDBM_File:
use MLDBM qw(DB_File); use Fcntl; tie(%hash, "MLDBM", "testfile.db", O_CREAT|O_RDWR, 0666) or die "can't open tie to testfile.db: $!"; # ... act on %hash untie %hash;A hash with 100,000 items in it would undoubtably take considerable time to build. Storing this to disk, either slowly by hand or quickly with Storable, is still an expensive operation in memory and computation.The DBM modules solve this by tying hashes to disk database files. Rather than reading the whole structure in at once, they only pull in what they need, when they need it. To the user, it looks like a hash that persists across program invocations.Unfortunately, the values in this persistent hash must be plain strings. You cannot readily use a database file as a backing store for a hash of hashes, a hash of arrays, and so on—just for a hash of strings.However, the MLDBM module from CPAN allows you to store references in a database. It uses Data::Dumper to stringify these references for external storage:use MLDBM qw(DB_File); use Fcntl; tie(%hash, "MLDBM", "testfile.db", O_CREAT|O_RDWR, 0666) or die "can't open tie to testfile.db: $!";Now you can use%hashto fetch or store complex records from disk. The only drawback is that you can't access the references piecemeal. You have to pull in the reference from the database, work with it, and then store it back.# this doesn't work! $hash{"some key"}[4] = "fred"; # RIGHT $aref = $hash{"some key"}; $aref->[4] = "fred"; $hash{"some key"} = $aref;Recipe 11.13Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Coping with Circular Data Structures Using Weak References
- InhaltsvorschauYou have an inherently self-referential data structure, so Perl's reference-based garbage collection system won't notice when that structure is no longer being used. You want to prevent your program from leaking memory.Convert all internal references within the data structure into weak references so they don't increment the reference count.Perl's memory management system relies on an underlying reference count to know when to reclaim memory. In practice, this works fairly well except for one particular situation: when a variable directly or indirectly points at itself. Consider:
{ my ($a, $b); ($a, $b) = \($b, $a); # same as (\$b, \$a); }The two underlying scalars that$aand$brepresent each start out with a reference count of one apiece in the first line of the block. In the second line, those scalars are each initialized to contain a reference to the other variable;$apoints to$band vice versa. Saving a reference increments the underlying reference count on the scalars, so now both refcounts are set to two. As the block exits and those lexical variables become unreachable (by name), both refcounts are decremented by one, leaving one in each—forever. Since the refcounts can never reach zero, memory used by those two underlying scalars will never be reclaimed. You'll leak two scalars every time that block executes; if it's a loop or a subroutine, you could eventually run out of memory.The standard Devel::Peek module'sDumpfunction shows you underlying reference counts, plus a whole lot more. This code:use Devel::Peek; $a = 42; $b = \$a; Dump $a;
produces this output:SV = IV(0xd7cc4) at 0xd72b8 REFCNT = 2 FLAGS = (IOK,pIOK) IV = 42
The important thing to notice there is that the refcount is two. That's because the scalar can be reached two ways: once via the variable namedEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Program: Outlines
- InhaltsvorschauOutlines are a simple (and thus popular) way of structuring data. The hierarchy of detail implied by an outline maps naturally to our top-down way of thinking about the world. The only problem is that it's not obvious how to represent outlined data as a Perl data structure.Take, for example, this simple outline of some musical genres:
Alternative .Punk ..Emo ..Folk Punk .Goth ..Goth Rock ..Glam Goth Country .Old Time .Bluegrass .Big Hats Rock .80s ..Big Hair ..New Wave .60s ..British ..American
Here we use a period to indicate a subgroup. There are many different formats in which that outline could be output. For example, you might write the genres out in full:Alternative Alternative - Punk Alternative - Punk - Emo Alternative - Punk - Folk Punk Alternative - Goth ...
You might number the sections:1 Alternative 1.1 Punk 1.1.1 Emo 1.1.2 Folk Punk 1.2 Goth ...
or alphabetize:Alternative Alternative - Goth Alternative - Goth - Glam Goth Alternative - Goth - Goth Rock Alternative - Punk Alternative - Punk - Emo ...
or show inheritance:Alternative Punk - Alternative Emo - Punk - Alternative Folk Punk - Punk - Alternative Goth - Alternative Goth Rock - Goth - Alternative ...
These transformations are all much easier than it might seem. The trick is to represent the levels of the hierarchy as elements in an array. For example, you'd represent the third entry in the sample outline as:@array = ("Alternative", "Goth", "Glam Goth");Now reformatting the entry is trivial. There's an elegant way to parse the input file to get this array representation:while (<FH>) { chomp; $tag[$in = s/\G\.//g] = $_; # do something with @tag[0..$in] }The substitution deletes leading periods from the current entry, returning how many it deleted. This number indicates the indentation level of the current entry.Alphabetizing is now simple using the Unixsortprogram:Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Program: Binary Trees
- InhaltsvorschauBecause Perl's built-in data types are already powerful, high-level, dynamic data types in their own right, most code can use what's already provided. If you just want quick lookups, you nearly always want to use a simple hash. As Larry has said, "The trick is to use Perl's strengths rather than its weaknesses."However, hashes provide no inherent ordering. To traverse the hash in a particular order, you must first extract its keys and then sort them. If you find yourself doing so many times, performance will suffer, but probably not enough to justify the time required to craft a fancy algorithm.A tree structure provides ordered traversals. How do you write a tree in Perl? First, you grab one of your favorite textbooks on data structures; the authors recommend Cormen et al., as mentioned in Other Books in the Preface. Using an anonymous hash to represent each node in the tree, translate the algorithms in the book into Perl. This is usually much more straightforward than you would imagine.The program code in Example 11-1 demonstrates an ordered binary tree implementation using anonymous hashes. Each node has three fields: a left child, a right child, and a value. The crucial property of an ordered binary tree is that at every node, all left children have values that are less than the current node value, and all right children have values that are greater.The main program does three things. First, it creates a tree with 20 random nodes. Then it shows the in-order, pre-order, and post-order traversals of that tree. Finally, it allows the user to enter a key and reports whether that key is in the tree.The
insertfunction takes advantage of Perl's implicit pass-by-reference behavior on scalars to initialize an empty tree when asked to insert into an empty node. The assignment of the new node back to$_[0]alters the value in its caller.Although this data structure takes much more memory than a simple hash and the lookups are slower, the ordered traversals themselves are faster.Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Chapter 12: Packages, Libraries, and Modules
- InhaltsvorschauLike all those possessing a library, Aurelian was aware that he was guilty of not knowing his in its entirety.—Jorge Luis Borges, The TheologiansImagine that you have two separate programs, both of which work fine by themselves, and you decide to make a third program that combines the best features from the first two. You copy both programs into a new file or cut and paste selected pieces. You find that the two programs had variables and functions with the same names that should remain separate. For example, both might have an
initfunction or a global$countvariable. When merged into one program, these separate parts would interfere with each other.The solution to this problem is packages. Perl uses packages to partition the global namespace. The package is the basis for both traditional modules and object-oriented classes. Just as directories contain files, packages contain identifiers. Every global identifier (variables, functions, file and directory handles, and formats) has two parts: its package name and the identifier proper. These two pieces are separated from one another with a double colon. For example, the variable$CGI::needs_binmodeis a global variable named$needs_binmode, which resides in packageCGI.Where the filesystem uses slashes to separate the directory from the filename, Perl uses a double colon.$Names::startupis the variable named$startupin the packageNames, whereas$Dates::startupis the$startupvariable in packageDates. Saying$startupby itself without a package name means the global variable$startupin the current package. (This assumes that no lexical$startupvariable is currently visible. Lexical variables are explained in Chapter 10.) When looking at an unqualified variable name, a lexical takes precedence over a global. Lexicals live in scopes; globals live in packages. If you really want the global instead, you need to fully qualify it.packageis a compile-time declaration that sets the default package prefix for unqualified global identifiers, much asEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Introduction
- InhaltsvorschauImagine that you have two separate programs, both of which work fine by themselves, and you decide to make a third program that combines the best features from the first two. You copy both programs into a new file or cut and paste selected pieces. You find that the two programs had variables and functions with the same names that should remain separate. For example, both might have an
initfunction or a global$countvariable. When merged into one program, these separate parts would interfere with each other.The solution to this problem is packages. Perl uses packages to partition the global namespace. The package is the basis for both traditional modules and object-oriented classes. Just as directories contain files, packages contain identifiers. Every global identifier (variables, functions, file and directory handles, and formats) has two parts: its package name and the identifier proper. These two pieces are separated from one another with a double colon. For example, the variable$CGI::needs_binmodeis a global variable named$needs_binmode, which resides in packageCGI.Where the filesystem uses slashes to separate the directory from the filename, Perl uses a double colon.$Names::startupis the variable named$startupin the packageNames, whereas$Dates::startupis the$startupvariable in packageDates. Saying$startupby itself without a package name means the global variable$startupin the current package. (This assumes that no lexical$startupvariable is currently visible. Lexical variables are explained in Chapter 10.) When looking at an unqualified variable name, a lexical takes precedence over a global. Lexicals live in scopes; globals live in packages. If you really want the global instead, you need to fully qualify it.packageis a compile-time declaration that sets the default package prefix for unqualified global identifiers, much aschdirsets the default directory prefix for relative pathnames. This effect lasts until the end of the current scope (a brace-enclosed block, file, orEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Defining a Module's Interface
- InhaltsvorschauYou want the standard Exporter module to define the external interface to your module.In module file YourModule.pm, place the following code. Fill in the ellipses as explained in the Discussion section.
package YourModule; use strict; our (@ISA, @EXPORT, @EXPORT_OK, %EXPORT_TAGS, $VERSION); use Exporter; $VERSION = 1.00; # Or higher @ISA = qw(Exporter); @EXPORT = qw(...); # Symbols to autoexport (:DEFAULT tag) @EXPORT_OK = qw(...); # Symbols to export on request %EXPORT_TAGS = ( # Define names for sets of symbols TAG1 => [...], TAG2 => [...], ... ); ######################## # your code goes here ######################## 1; # this should be your last lineIn other files where you want to use YourModule, choose one of these lines:use YourModule; # Import default symbols into my package use YourModule qw(...); # Import listed symbols into my package use YourModule ( ); # Do not import any symbols use YourModule qw(:TAG1); # Import whole tag set
The standard Exporter module handles the module's external interface. Although you could define your ownimportmethod for your package, almost no one does this.When someone saysuseYourModule, this does arequire"YourModule.pm" statement followed aYourModule->import( )method call, both during compile time. Theimportmethod inherited from the Exporter package looks for global variables in your package to govern its behavior. Because they must be package globals, we've declared them withourto satisfyusestrict. These variables are:-
$VERSION -
When a module is loaded, a minimal required version number can be supplied. If the version isn't at least this high, the
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. -
- Trapping Errors in require or use
- InhaltsvorschauYou need to load in a module that might not be present on your system. This normally results in a fatal exception. You want to detect and trap these failures.Wrap the
requireorusein aneval, and wrap theevalin a BEGIN block:# no import BEGIN { unless (eval "require $mod; 1") { warn "couldn't require $mod: $@"; } } # imports into current package BEGIN { unless (eval "use $mod; 1") { warn "couldn't use $mod: $@"; } }You usually want a program to fail if it tries to load a module that is missing or doesn't compile. Sometimes, though, you'd like to recover from that error, perhaps trying an alternative module instead. As with any other exception, you insulate yourself from compilation errors with aneval.You don't want to useeval{BLOCK}, because this traps only runtime exceptions, anduseis a compile-time event. Instead, you must useeval"string" to catch compile-time problems as well. Remember,requireon a bareword has a slightly different meaning thanrequireon a variable. It adds a ".pm" and translates double-colons into your operating system's path separators, canonically/(as in URLs), but sometimes\, :, or even . on some systems.If you need to try several modules in succession, stopping at the first one that works, you could do something like this:BEGIN { my($found, @DBs, $mod); $found = 0; @DBs = qw(Giant::Eenie Giant::Meanie Mouse::Mynie Moe); for $mod (@DBs) { if (eval "require $mod") { $mod->import( ); # if needed $found = 1; last; } } die "None of @DBs loaded" unless $found; }We wrap theevalin a BEGIN block to ensure the module-loading happens at compile time instead of runtime.Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Delaying use Until Runtime
- InhaltsvorschauYou have a module that you don't need to load each time the program runs, or whose inclusion you wish to delay until after the program starts up.Either break up the
useinto its separaterequireandimportcomponents, or else employ theuseautousepragma.Programs that check their arguments and abort with a usage message on error have no reason to load modules they never use. This delays the inevitable and annoys users. But thoseusestatements happen during compilation, not execution, as explained in the Introduction.Here, an effective strategy is to place argument checking in a BEGIN block before loading the modules. The following is the start of a program that checks to make sure it was called with exactly two arguments, which must be whole numbers, before going on to load the modules it will need:BEGIN { unless (@ARGV = = 2 && (2 = = grep {/^\d+$/} @ARGV)) { die "usage: $0 num1 num2\n"; } } use Some::Module; use More::Modules;A related situation arises in programs that don't always use the same set of modules every time they're run. For example, the factors program from Chapter 2 needs the infinite precision arithmetic library only when the -b command-line flag is supplied. Ausestatement would be pointless within a conditional because it's evaluated at compile time, long before theifcan be checked. So we use arequireinstead:if ($opt_b) { require Math::BigInt; }Because Math::BigInt is an object-oriented module instead of a traditional one, no import was needed. If you have an import list, specify it with aqw( )construct as you would withuse. For example, rather than this:use Fcntl qw(O_EXCL O_CREAT O_RDWR);
you might say this instead:require Fcntl; Fcntl->import(qw(O_EXCL O_CREAT O_RDWR));
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Making Variables Private to a Module
- InhaltsvorschauYou want to make a variable private to a package.You can't. But you can make them private to the file that the module sits in, which usually suffices.Remember that a package is just a way of grouping variables and functions together, conferring no privacy. Anything in a package is by definition global and accessible from anywhere. Packages only group; they don't hide.For privacy, only lexical variables will do. A module is implemented in a Module.pm file, with all its globals in the package named Module. Because that whole file is by definition a scope and lexicals are private to a scope, creating file-scoped lexicals is effectively the same thing as a module-private variable.If you alternate packages within a scope, though, you may be surprised that the scope's lexicals are still visible throughout that scope. That's because a package statement only sets a different prefix for a global identifier; it does not end the current scope, not does it begin a new one.
package Alpha; my $aa = 10; $x = "azure"; package Beta; my $bb = 20; $x = "blue"; package main; print "$aa, $bb, $x, $Alpha::x, $Beta::x\n"; 10, 20, , azure, blueWas that the output you expected? The two lexicals,$aaand$bb, are still in scope because we haven't left the current block, file, oreval. You might think of globals and lexicals as existing in separate dimensions, forever unrelated to each other. Package statements have nothing to do with lexicals. By setting the current prefix, the first global variable$xis really$Alpha::x, whereas the second$xis now$Beta::xbecause of the intervening package statement changing the default prefix. Package identifiers, if fully qualified, can be accessed from anywhere, as we've done in theprintstatement.So, packages can't have privacy—but modules can because they're in a file, which is always its own scope. Here's a simple module, placed in the fileEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Making Functions Private to a Module
- InhaltsvorschauYou want to make a function private to a package.You can't. But you can make a private variable and store a reference to an anonymous function in it.
# this is the file SomeModule.pm package Some_Module; my $secret_function = sub { # your code here }; sub regular_function { # now call your "private" function via the code ref $secret_function->(ARG1, ARG2); }Even a function that isn't exported can still be accessed by anyone, anywhere if they qualify that function's name with its package. That's because function names are always in the package symbol table, which is globally accessible.By creating a lexical variable at the file scope, code in that module file below the point of declaration has full access to that variable. Code in other files will not, because those scopes are unrelated. The subroutine created viasub { ....}is anonymous, so there's no name in the symbol table for anyone outside to find. Not even other code in the module can call the function by name, since it doesn't have one, but that code can use the lexical variable to dereference the code reference indirectly.$secret_function->(ARGS); # infix deref form &$secret_function(ARGS); # prefix deref form
Curiously, if you really wanted to, you could give this anonymous function a temporary name. Using the technique outlined in Recipe 10.16, assign the code reference to a localized typeglob, like this:sub module_function { local *secret = $secret_function; Other_Package::func1( ); secret(ARG1, ARG2); Yet_Another_Package::func2( ); }Now for the duration ofmodule_function, your previously secret function can be called using a direct function call; no indirection required. However, code outside the module can also find that function. In the example, it doesn't matter whetherEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Determining the Caller's Package
- InhaltsvorschauYou need to find out the current or calling package.To find the current package:
$this_pack = _ _PACKAGE_ _;
To find the caller's package:$that_pack = caller( );
The_ _PACKAGE_ _symbol returns the package that the code is currently being compiled into. This doesn't interpolate into double-quoted strings:print "I am in package _ _PACKAGE_ _\n"; # WRONG! I am in package _ _PACKAGE_ _Needing to figure out the caller's package arose more often in older code that received as input a string of code to beevaluated, or a filehandle, format, or directory handle name. Consider a call to a hypotheticalrunitfunction:package Alpha; runit('$line = <TEMP>'); package Beta; sub runit { my $codestr = shift; eval $codestr; die if $@; }Becauserunitwas compiled in a different package than was currently executing, when theevalruns, it acts as though it were passed$Beta::lineandBeta::TEMP. The old workaround was to include your caller's package first:package Beta; sub runit { my $codestr = shift; my $hispack = caller; eval "package $hispack; $codestr"; die if $@; }That approach works only when$lineis a global variable. If it's lexical, that won't help at all. Instead, arrange forrunitto accept a reference to a subroutine:package Alpha; runit( sub { $line = <TEMP> } ); package Beta; sub runit { my $coderef = shift; &$coderef( ); }This not only works with lexicals, but has the added benefit of checking the code's syntax at compile time, which is a major win.If all that's being passed in is a filehandle, it's more portable to use theSymbol::qualifyfunction. This function takes a name and package to qualify the name into. If the name needs qualification, it fixes it; otherwise, it's left alone. But that's considerably less efficient than aEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Automating Module Cleanup
- InhaltsvorschauYou need to create module setup code and cleanup code that gets called automatically, without user intervention.For setup code, put executable statements outside subroutine definitions in the module file. For cleanup code, use an END subroutine in that module.In some languages, the programmer must remember to call module initialization code before accessing any of that module's regular functions. Similarly, when the program is done, the programmer may have to call module-specific finalization code.Not so in Perl. For per-module initialization code, executable statements outside of any subroutines in your module suffice. When the module is loaded in, that code runs right then and there. The user never has to remember to do this, because it's done automatically.Now, why would you want automatic cleanup code? It depends on the module. You might want to write a shutdown message to a logfile, tell a database server to commit any pending state, refresh a screen, or return the tty to its original state.Suppose you want a module to log quietly whenever a program using it starts up or finishes. Add code in an END subroutine to run after your program finishes:
$Logfile = "/tmp/mylog" unless defined $Logfile; open(LF, ">>", $Logfile) or die "can't append to $Logfile: $!"; select(((select(LF), $|=1))[0]); # unbuffer LF logmsg("startup"); sub logmsg { my $now = scalar gmtime; print LF "$0 $$ $now: @_\n" or die "write to $Logfile failed: $!"; } END { logmsg("shutdown"); close(LF) or die "close $Logfile failed: $!"; }The first part of code, outside any subroutine declaration, is executed at module load time. The module user doesn't have to do anything special to make this happen. Someone might be unpleasantly surprised, however, if the file couldn't be accessed, since theEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Keeping Your Own Module Directory
- InhaltsvorschauYou don't want to install your own personal modules in the standard per-system extension library.You have several choices: use Perl's -I command line switch; set your
PERL5LIBenvironment variable; or employ theuselibpragma, possibly in conjunction with the FindBin module.The@INCarray contains a list of directories to consult whendo,require, orusepulls in code from another file. You can print these out easily from the command line:% perl -e 'printf "%d %s\n", $i++, $_ for @INC' 0 /usr/local/lib/perl5/5.8.0/OpenBSD.i386-openbsd 1 /usr/local/lib/perl5/5.8.0 2 /usr/local/lib/perl5/site_perl/5.8.0/OpenBSD.i386-openbsd 3 /usr/local/lib/perl5/site_perl/5.8.0 4 /usr/local/lib/perl5/site_perl/5.6.0 5 /usr/local/lib/perl5/site_perl/5.00554 6 /usr/local/lib/perl5/site_perl/5.005 7 /usr/local/lib/perl5/site_perl 8
The first two directories, elements 0 and 1 of@INC, are respectively the standard architecture-dependent and architecture-independent directories, which all standard libraries, modules, and pragmas will go into. You have two of them because some modules contain information or formatting that makes sense only on that particular architecture. For example, the Config module contains information that cannot be shared across several architectures, so it goes in the 0th array element. Modules that include compiled C components, such as Socket.so, are also placed there. Most modules, however, go in the platform-independent directory in the 1st element.The next pair, elements 2 and 3, fulfills roles analogous to elements and 1, but on a site-specific basis. Suppose you have a module that didn't come with Perl, such as one from CPAN or that you wrote yourself. When you or (more likely) your system administrator installs this module, its components go into one of the site-specific directories. You are encouraged to use these for any modules that your entire site should be able to access conveniently.Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Preparing a Module for Distribution
- InhaltsvorschauYou want to prepare your module in standard distribution format so you can easily send your module to a friend. Better yet, you plan to contribute your module to CPAN so everyone can use it.It's best to start with Perl's standard h2xs tool. Let's say you want to make a Planets module or an Astronomy::Orbits module. You'd type:
% h2xs -XA -n Planets % h2xs -XA -n Astronomy::Orbits
These commands make subdirectories called ./Planets/ and ./Astronomy/Orbits/, respectively, where you will find all the components you need to get you started. The -n flag names the module you want to make, -X suppresses creation of XS (external subroutine) components, and -A means the module won't use the AutoLoader.Writing modules is easy—once you know how. Writing a proper module is like filling out a legal contract: it's full of places to initial, sign, and date exactly right. If you miss any, it's not valid. Instead of hiring a contract lawyer, you can get a quick start on writing modules using the h2xs program. This tool gives you a skeletal module file with the right parts filled in, and it also gives you the other files needed to correctly install your module and its documentation or to bundle up for contributing to CPAN or sending off to a friend.h2xs is something of a misnomer because XS is Perl's external subroutine interface for linking with C or C ++. But the h2xs tool is also extremely convenient for preparing a distribution even when you aren't using the XS interface.Let's look at the module file that h2xs has made. Because the module is called Astronomy::Orbits, the user specifies notuseOrbitsbut ratheruseAstronomy::Orbits. Therefore an extra Astronomy subdirectory is made, under which an Orbits subdirectory is placed. Here is the first and perhaps most important line of Orbit.pm:Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Speeding Module Loading with SelfLoader
- InhaltsvorschauYou'd like to load a very large module quickly.Use the SelfLoader module:
require Exporter; require SelfLoader; @ISA = qw(Exporter SelfLoader); # # other initialization or declarations here # _ _DATA_ _ sub abc { .... } sub def { .... }When you load a module usingrequireoruse, the entire module file must be read and compiled (into internal parse trees, not into byte code or native machine code) right then. For very large modules, this annoying delay is unnecessary if you need only a few functions from a particular file.To address this problem, the SelfLoader module delays compilation of each subroutine until that subroutine is actually called. SelfLoader is easy to use: just place your module's subroutines underneath the_ _DATA_ _marker so the compiler will ignore them, use arequireto pull in the SelfLoader, and include SelfLoader in the module's@ISAarray. That's all there is to it. When your module is loaded, the SelfLoader creates stub functions for all routines below_ _DATA_ _. The first time a function gets called, the stub replaces itself by first compiling the real function and then calling it.There is one significant restriction on modules that employ the SelfLoader (or the AutoLoader for that matter, described in Recipe 12.11). SelfLoaded or AutoLoaded subroutines have no access to lexical variables in the file whose_ _DATA_ _block they are in because they are compiled viaevalin an imported AUTOLOAD block. Such dynamically generated subroutines are therefore compiled in the scope of SelfLoader's or AutoLoader's AUTOLOAD.Whether the SelfLoader helps or hinders performance depends on how many subroutines the module has, how large they are, and whether they are all called over the lifetime of the program or not.You should initially develop and test your module without SelfLoader. Commenting out theEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Speeding Up Module Loading with Autoloader
- InhaltsvorschauYou want to use the AutoLoader module.The easiest solution is to use the h2xs facility to create a directory and all the files you need. Here we assume you have your own directory, ~/perllib/, which contains your personal library modules.
% h2xs -Xn Sample % cd Sample % perl Makefile.PL LIB=~/perllib % (edit Sample.pm) % make install
The AutoLoader addresses the same performance issues as the SelfLoader. It also provides stub functions that get replaced by real ones the first time they're called. But instead of looking for functions all in the same file, hidden below a_ _DATA_ _marker, the AutoLoader expects to find the real definition for each function in its own file. If your Sample.pm module had two functions,fooandbar, then the AutoLoader would expect to find them in Sample/auto/foo.al and Sample/auto/bar.al, respectively. Modules employing the AutoLoader load faster than those using the SelfLoader, but at the cost of extra files, disk space, and complexity.This setup sounds complicated. If you were doing it manually, it probably would be. Fortunately, h2xs helps out tremendously here. Besides creating a module directory with templates for your Sample.pm file and other files you need, it also generates a Makefile that uses the AutoSplit module to break your module's functions into little files, one function per file. Themakeinstallrule installs these so they will be found automatically. All you have to do is put the module functions down below an_ _END_ _line (rather than a_ _DATA_ _line as in SelfLoader) that h2xs already created.As with the SelfLoader, it's easier to develop and test your module without the AutoLoader. Just comment out the_ _END_ _line while developing it.The same restrictions about invisibility of file lexicals that apply to modules using the SelfLoader also apply when using the AutoLoader, so using file lexicals to maintain private state doesn't work. If state is becoming that complex and significant an issue, consider writing an object module instead of a traditional one.Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Overriding Built-in Functions
- InhaltsvorschauYou want to replace a standard, built-in function with your own version.Import that function from another module into your own namespace.Suppose you want to give a function of your own the same name as one of Perl's core built-ins. If you write:
sub time { "it's howdy doody time" } print time( );then you won't get your function called—you'll still get Perl's original, built-in version. You could use an explicit ampersand to call the function:print &time( );
because that always gets your function, never the built-in. But then you forego any prototype checking and context coercion on the function's arguments. However, there is a way to override that.Many (but not all) of Perl's built-in functions may be overridden. This is not something to be attempted lightly, but it is possible. You might do this, for example, if you are running on a platform that doesn't support the function that you'd like to emulate. Or, you might want to add your own wrapper around the built-in.Not all reserved words have the same status. Those that return a negative number in the C-languagekeyword( )function in the toke.c file in your Perl source kit may be overridden. Keywords that cannot be overridden as of v5.8.1 aredefined,delete,do,else,elsif,eval,exists,for,foreach,format,glob,goto,grep,if,last,local,m,map,my,next,no,our,package,pos,print,printf,prototype,q,qq,qr,qw,qx,redo,require,return,s,scalar,sort,split,study,sub,tie,tied,tr,undef,unless,untie,until,use,while, andy. The rest can.A standard Perl module that overrides a built-in is Cwd, which can overloadchdir. Others are the by-name versions of functions that return lists: File::stat, Net::hostent, Net::netent, Net::protoent, Net::servent, Time::gmtime, Time::localtime, Time::tm, User::grent, and User::pwent. These modules all override built-in functions likeEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Overriding a Built-in Function in All Packages
- InhaltsvorschauYou want to change the definition of a core built-in function within your entire program, not just the current package.Manually import, via direct symbol-table manipulation, the function into the CORE::GLOBAL pseudopackage.
*CORE::GLOBAL::int = \&myown_int;
The technique demonstrated in the previous recipe only overrides a built-in in a particular package. It doesn't change everything for your whole program, no matter what package that function is called from. To do so would risk changing the behavior of code from modules you didn't write, and which were therefore not prepared for the change.It has been said that Unix was not designed to stop you from doing stupid things, because that would also stop you from doing clever things. So, too, with Perl. Just because overriding a function in all packages at once might seem, well, imprudent doesn't mean a clever person won't someday find a marvelous use for such a facility.For example, let's suppose that you've decided that the coreintfunction's behavior of integer truncation, also known as rounding toward zero, is so annoying to your program that you want to provide an alternative by the same name. This would do it:package Math::Rounding; use warnings; use Carp; use Exporter; our @EXPORT = qw(int); our @ISA = qw(Exporter); sub int(;$) { my $arg = @_ ? shift : $_; use warnings FATAL => "numeric"; # promote to die( )ing my $result = eval { sprintf("%.0f", $arg) }; if ($@) { die if $@ !~ /isn't numeric/; $@ =~ s/ in sprintf.*/ in replacement int/s; croak $@; } else { return $result; } }Your replacement version usessprintf( )to round to the closest integer. It also raises an exception if passed a non-numeric string. A program could access this function either by saying:use Math::Rounding ( ); $y = Math::Rounding::int($x);
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Reporting Errors and Warnings Like Built-ins
- InhaltsvorschauYou want to generate errors and warnings in your modules, but when you use
warnordie, the user sees your own filename and line number. You'd like your functions to act like built-ins and report messages from the perspective of the user's code, not your own.The standard Carp module provides functions to do this. Usecarpinstead ofwarn. Usecroak(for a short message) andconfess(for a long message) instead ofdie.Like built-ins, some of your module's functions generate warnings or errors if all doesn't go well. Think aboutsqrt: when you pass it a negative number (and you haven't used the Math::Complex module), an exception is raised, producing a message such as "Can'ttakesqrtof-3at/tmp/negrootline17", where /tmp/negroot is the name of your own program. But if you write your own function thatdies, perhaps like this:sub even_only { my $n = shift; die "$n is not even" if $n & 1; # one way to test #.... }then the message will say it's coming from the file youreven_onlyfunction was itself compiled in, rather than from the file the user was in when they called your function. That's where the Carp module comes in handy. Instead of usingdie, usecroakinstead:use Carp; sub even_only { my $n = shift; croak "$n is not even" if $n % 2; # here's another #.... }If you just want to complain about something, but have the message report where in the user's code the problem occurred, callcarpinstead ofwarn. For example:use Carp; sub even_only { my $n = shift; if ($n & 1) { # test whether odd number carp "$n is not even, continuing"; ++$n; } #.... }Many built-ins emit warnings only when theEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Customizing Warnings
- InhaltsvorschauYou would like your module to respect its caller's settings for lexical warnings, but you can't inspect the predefined
$^Wvariable to determine those settings.Your module should use this pragma:use warnings::register;
Then from inside your module, use thewarnings::enabledfunction from that module as described in the Discussion to check whether the caller has warnings enabled. This works for both the old-style, global warnings and for lexical warnings set via theusewarningspragma.Perl's -w command-line flag, mirrored by the global$^Wvariable, suffers from several problems. For one thing, it's an all-or-nothing affair, so if you turn it on for the program, module code included by that program—including code you may not have written—is also affected by it. For another, it's at best cumbersome to control compile-time warnings with it, forcing you to resort to convolutedBEGINblocks. Finally, suppose you were interested in numeric warnings but not any other sort; you'd have to write a$SIG{_ _WARN_ _}handler to sift through all warnings to find those you did or did not want to see.Lexical warnings, first introduced in Perl v5.6, address all this and more. By lexical, we mean that their effects are constrained to the lexical scope in whichusewarningsorno warningsoccurs. Lexical warnings pay no attention to the -w command-line switch. Now when you turn warnings on in one scope, such as the main program's file scope, that doesn't enable warnings in modules you load. You can also selectively enable or disable individual categories of warnings. For example:use warnings qw(numeric uninitialized); use warnings qw(all); no warnings qw(syntax);
Thewarnings::registerpragma permits a module to check the warnings preferences of its caller's lexical scope. The pragma also creates a new warning category, taken from the name of the current package. These user-defined warning categories are easily distinguishable from the built-in warning categories because a module's package always starts (orEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Referring to Packages Indirectly
- InhaltsvorschauYou want to refer to a variable or function in a package unknown until runtime, but syntax like
$packname::$varnameis illegal.Use symbolic references:{ no strict "refs"; $val = ${ $packname . "::" . $varname }; @vals = @{ $packname . "::" . $aryname }; &{ $packname . "::" . $funcname }("args"); ($packname . "::" . $funcname) -> ("args"); }A package declaration has meaning at compile time. If you don't know the name of the package or variable until runtime, you'll have to resort to symbolic references for direct access to the package symbol table. Assuming you normally run withusestrictin effect, you must disable part of it to use symbolic references. Once you've used thenostrict"refs" directive in that block, build up a string with the fully qualified name of the variable or function you're interested in. Then dereference this name as though it were a proper Perl reference.During the prehistoric eras (before Perl 5), programmers were forced to use anevalfor this kind of thing:eval "package $packname; \$'$val = \$$varname"; # set $main'val die if $@;
As you see, this approach makes quoting difficult. It's also comparatively slow. Fortunately, you never need to do this just to access variables indirectly by name. Symbolic references are a necessary compromise.Similarly,evalcould be used to define functions on the fly. Suppose you wanted to be able to get the base 2 or base 10 logs of numbers:printf "log2 of 100 is %.2f\n", log2(100); printf "log10 of 100 is %.2f\n", log10(100);
Perl has only the natural log function. Here's how one could useevalto create these functions at runtime. Here we'll create functions namedlog2up throughlog999:$packname = "main"; for ($i = 2; $i < 1000; $i++) { $logN = log($i); eval "sub ${packname}::log$i { log(shift) / $logN }"; die if $@; }Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Using h2ph to Translate C #include Files
- InhaltsvorschauSomeone gave you code that generates the bizarre error message:
Can't locate sys/syscall.ph in @INC (did you run h2ph?) (@INC contains: /usr/lib/perl5/i686-linux/5.00404 /usr/lib/perl5 /usr/lib/perl5/site_perl/i686-linux /usr/lib/perl5/site_perl .) at some_program line 7You want to know what it means and how to fix it.Get your system administrator to do this, running as the superuser:% cd /usr/include; h2ph sys/syscall.h
However, most include files require other include files, which means you should probably just translate them all:% cd /usr/include; h2ph *.h */*.h
If that reports too many filenames or misses some that are more deeply nested, try this instead:% cd /usr/include; find . -name "*.h" -print | xargs h2ph
A file whose name ends in.phhas been created by the h2ph tool, which translates C preprocessor directives from C#includefiles into Perl. The goal is to allow Perl code to access the same constants as C code. h2xs is a better approach in most cases because it provides compiled C code for your modules, not Perl code simulating C code. However, using h2xs requires a lot more programming savvy (at least, for accessing C code) than h2ph does.When h2ph's translation process works, it's wonderful. When it doesn't, you're probably out of luck. As system architectures and include files become more complex, h2ph fails more frequently. If you're lucky, the constants you need are already in the Fcntl, Socket, or POSIX modules. The POSIX module implements constants from sys/file.h, sys/errno.h, and sys/wait.h, among others. It also allows fancy tty handling, as described in Recipe 15.8.So what can you do with these .ph files? Here are a few examples. The first uses the pessimally non-portableEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Using h2xs to Make a Module with C Code
- InhaltsvorschauYou'd like to access your system's unique C functions from Perl.Use the h2xs tool to generate the necessary template files, fill the files in appropriately, and then type:
% perl Makefile.PL % make
A Perl module need not be written solely in Perl. As with any other module, first pick a module name and use h2xs on it. We'll make aFineTime::timefunction with the same semantics as in the previous recipe, but this time around, we'll implement it using real C.First, we run the following command:% h2xs -cn FineTime
If we had a .h file with function prototype declarations, we could include that, but because we're writing this one from scratch, we'll use the -c switch to omit building code to translate any#definesymbols. The -n switch says to create a module directory named FineTime/, which will have the following files:ManifestList of files in the distributionChangesChange logMakefile.PLA meta-makefileFineTime.pmThe Perl partsFineTime.xsEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Writing Extensions in C with Inline::C
- InhaltsvorschauYou'd like to write functions in C that you can call from Perl. You may already have tried XS and found it harmful to your mental health.Use the Inline::C module available from CPAN:
use Inline C; $answer = somefunc(20, 4); print "$answer\n"; # prints 80 _ _END_ _ _ _C_ _ double somefunc(int a, int b) { /* Inline knows most basic C types */ double answer = a * b; return answer; }Inline::C was created as an alternative to the XS system for building C extension modules. Rather than jumping through all the hoopla of h2xs and the format of an .xs file, Inline::C lets you embed C code into your Perl program. There are also Inline modules for Python, Ruby, and Java, among other languages.By default, your C source is in the_ _END_ _or_ _DATA_ _section of your program after a_ _C_ _token. This permits multiple Inlined language blocks in a single file. If you want, use a here document when you load Inline:use Inline C <<'END_OF_C'; double somefunc(int a, int b) { /* Inline knows most basic C types */ double answer = a * b; return answer; } END_OF_CInline::C scans the source code for ANSI-style function definitions. When it finds a function definition it knows how to deal with, it creates a Perl wrapper for the function. Inline can automatically translate the basic C data types (double,int,char *, etc.) by using the typemap that comes with Perl. A typemap shows Perl how to convert between C values and Perl data types, and you can install your own if you need to use more complex data structures than the basic typemap supports.You can link against external libraries, parse header files as h2xs does, pass and return multiple values, handle objects, and more. See the Inline::C-Cookbook manpage that comes with the Inline::C module for more details.Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Documenting Your Module with Pod
- InhaltsvorschauYou need to document your module, but don't know what format to use.Embed your documentation in the your module file using pod format.Pod stands for plain old documentation. It's documentation embedded in your program using a very simple markup format. Programmers are notorious for writing the code first and the documentation never, so pod was designed to make writing documentation so easy that anyone can and will do so. Sometimes this even works.When Perl is parsing your source code, a line starting with an equals sign (where a new statement is expected) says to ignore all text until it finds a line beginning with
=cut, after which it will start parsing code again. This lets you mix code and documentation throughout your Perl program or module file. Since it's mostly plain text, type in your documentation as literal text, or nearly so. The translators try to be clever and make output-specific decisions so the programmer doesn't have to specifically format variable names, function calls, etc.Perl ships with several translators that filter generic pod format into specific output styles. These include pod2man to change your pods into troff for use with the man program or for phototypesetting and printing; pod2html for creating web pages (which works even on non-Unix systems); and pod2text for plain ASCII. Other translators, such as pod2ipf, pod2fm, pod2texi, pod2latex, and pod2ps, may also be available or can be found on CPAN.Many books are written using proprietary word processors with limited scripting capabilities. Not this one! It was written in pod format using common text editors (vi for Tom, emacs for Nat). The final book was produced by converting the pod source files to FrameMaker.Although formally documented in perlpodEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Building and Installing a CPAN Module
- InhaltsvorschauYou want to install a module file that you downloaded from CPAN over the Net or obtained from a CD.Type the following commands into your shell. It will build and install Version 4.54 of the Some::Module package.
% gunzip Some-Module-4.54.tar.gz % tar xf Some-Module-4.54 % cd Some-Module-4.54 % perl Makefile.PL % make % make test % make install
Like most programs on the Net, Perl modules are available in source kits stored as tar archives in GNU zip format. If tar warns of "Directorychecksumerrors", then you downloaded the binary file in text format, mutilating it.You'll probably have to become a privileged user with adequate permissions to install the module in the system directories. Standard modules are installed in a directory like /usr/lib/perl5, whereas third-party modules are installed in /usr/lib/perl5/site_ perl.Here's a sample run, showing the installation of the MD5 module:% gunzip MD5-1.7.tar.gz % tar xf MD5-1.7.tar % cd MD5-1.7 % perl Makefile.PL Checking if your kit is complete.. Looks good Writing Makefile for MD5 % make mkdir ./blib mkdir ./blib/lib cp MD5.pm ./blib/lib/MD5.pm AutoSplitting MD5 (./blib/lib/auto/MD5) /usr/bin/perl -I/usr/local/lib/perl5/i386 .. .. cp MD5.bs ./blib/arch/auto/MD5/MD5.bs chmod 644 ./blib/arch/auto/MD5/MD5.bsmkdir ./blib/man3 Manifying ./blib/man3/MD5.3 % make test PERL_DL_NONLAZY=1 /usr/bin/perl -I./blib/arch -I./blib/lib -I/usr/local/lib/perl5/i386-freebsd/5.00404 -I/usr/local/lib/perl5 test.pl 1..14 ok 1 ok 2 .. ok 13 ok 14 % sudo make install Password Installing /usr/local/lib/perl5/site_perl/i386-freebsd/./auto/MD5/
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Example: Module Template
- InhaltsvorschauFollowing is the skeleton of a module. If you want to write a module of your own, you can copy this and customize it.
package Some::Module; # must live in Some/Module.pm use strict; require Exporter; # set the version for version checking our $VERSION = 0.01; our @ISA = qw(Exporter); our @EXPORT = qw(&func1 &func2 &func4); our %EXPORT_TAGS = ( ); # eg: TAG => [ qw!name1 name2! ], # your exported package globals go here, # as well as any optionally exported functions our @EXPORT_OK = qw($Var1 %Hashit &func3); use vars qw($Var1 %Hashit); # non-exported package globals go here our(@more, $stuff); # initialize package globals, first exported ones $Var1 = ""; %Hashit = ( ); # then the others (which are still accessible as $Some::Module::stuff) $stuff = ""; @more = ( ); # all file-scoped lexicals must be created before # the functions below that use them. # file-private lexicals go here my $priv_var = ""; my %secret_hash = ( ); # here's a file-private function as a closure, # callable as &$priv_func. my $priv_func = sub { # stuff goes here. }; # make all your functions, whether exported or not; # remember to put something interesting in the { } stubs sub func1 { .... } # no prototype sub func2( ) { .... } # proto'd void sub func3($$) { .... } # proto'd to 2 scalars # this one isn't auto-exported, but could be called! sub func4(\%) { .... } # proto'd to 1 hash ref END { } # module cleanup code here (global destructor) 1;Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Program: Finding Versions and Descriptions of Installed Modules
- InhaltsvorschauPerl comes with many modules included standard. Even more can be found on CPAN. The following program prints out the names, versions, and descriptions of all modules installed on your system. It uses standard modules like File::Find and includes several techniques described in this chapter.To run it, type:
% pmdesc
It prints a list of modules and their descriptions:FileHandle (2.00) - supply object methods for filehandles IO::File (1.06021) - supply object methods for filehandles IO::Select (1.10) - OO interface to the select system call IO::Socket (1.1603) - Object interface to socket communications ..With the -v flag, pmdesc provides the names of the directories the files are in:% pmdesc -v <<<Modules from /usr/lib/perl5/i686-linux/5.00404>>> FileHandle (2.00) - supply object methods for filehandles ..
The -w flag warns if a module doesn't come with a pod description, and -s sorts the module list within each directory.The program is given in Example 12-3.Example 12-3. pmdesc#!/usr/bin/perl -w # pmdesc - describe pm files # tchrist@perl.com use strict; use File::Find qw(find); use Getopt::Std qw(getopts); use Carp; use vars ( q!$opt_v!, # give debug info q!$opt_w!, # warn about missing descs on modules q!$opt_a!, # include relative paths q!$opt_s!, # sort output within each directory ); $| = 1; getopts("wvas") or die "bad usage"; @ARGV = @INC unless @ARGV; # Globals. wish I didn't really have to do this. use vars ( q!$Start_Dir!, # The top directory find was called with q!%Future!, # topdirs find will handle later ); my $Module; # install an output filter to sort my module list, if wanted. if ($opt_s) { if (open(ME, "-|")) { $/ = ""; while (<ME>) { chomp; print join("\n", sort split /\n/), "\n"; } exit; } } MAIN: { my %visited; my ($dev,$ino); @Future{@ARGV} = (1) x @ARGV; foreach $Start_Dir (@ARGV) { delete $Future{$Start_Dir}; print "\n << Modules from $Start_Dir>>\n\n" if $opt_v; next unless ($dev,$ino) = stat($Start_Dir); next if $visited{$dev,$ino}++; next unless $opt_a || $Start_Dir =~ m!^/!; find(\&wanted, $Start_Dir); } exit; } # calculate module name from file and directory sub modname { local $_ = $File::Find::name; if (index($_, $Start_Dir . "/") = = 0) { substr($_, 0, 1+length($Start_Dir)) = ""; } s { / } {::}gx; s { \.p(m|od)$ } { }x; return $_; } # decide if this is a module we want sub wanted { if ( $Future{$File::Find::name} ) { warn "\t(Skipping $File::Find::name, qui venit in futuro.)\n" if 0 and $opt_v; $File::Find::prune = 1; return; } return unless /\.pm$/ && -f; $Module = &modname; # skip obnoxious modules if ($Module =~ /^CPAN(\Z|::)/) { warn("$Module -- skipping because it misbehaves\n"); return; } my $file = $_; unless (open(POD, "<", $file)) { warn "\tcannot open $file: $!"; # if $opt_w; return 0; } $: = " -:"; local $/ = ""; local $_; while (<POD>) { if (/=head\d\s+NAME/) { chomp($_ = <POD>); s/^.*?-\s+//s; s/\n/ /g; #write; my $v; if (defined ($v = getversion($Module))) { print "$Module ($v) "; } else { print "$Module "; } print "- $_\n"; return 1; } } warn "\t(MISSING DESC FOR $File::Find::name)\n" if $opt_w; return 0; } # run Perl to load the module and print its verson number, redirecting # errors to /dev/null sub getversion { my $mod = shift; my $vers = `$^X -m$mod -e 'print \$${mod}::VERSION' 2>/dev/null`; $vers =~ s/^\s*(.*?)\s*$/$1/; # remove stray whitespace return ($vers || undef); } format = ^<<<<<<<<<<<<<<<<<~~^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< $Module, $_ .Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Chapter 13: Classes, Objects, and Ties
- InhaltsvorschauAll the world over, I will back the masses against the classes.—William E. Gladstone, Speech at Liverpool, 28 June 1886Although Perl was not initially conceived of as an object-oriented language, within a few years of its initial release, complete support for object-oriented programming had been added. As usual, Perl doesn't try to enforce one true style, but embraces many. This helps more people do their job the way they want to do it.You don't have to use objects to write programs, unlike Java, where programs are instances of objects. If you want to, though, you can write Perl programs that use nearly every weapon in the object-oriented arsenal. Perl supports classes and objects, single and multiple inheritance, instance methods and class methods, access to overridden methods, constructors and destructors, operator overloading, proxy methods through autoloading, delegation, a rooted hierarchy for all objects, and two levels of garbage collection.You can use as many or as few object-oriented techniques as you want and need. Ties are the only part of Perl where you must use object orientation. And even then, only the module implementor need be aware of this; the casual user gets to remain blissfully unaware of the internal mechanics. Ties, discussed in Recipe 13.15, let you transparently intercept access to a variable. For example, you can use ties to create hashes that support lookups by key or value instead of just by key.If you ask 10 people what object orientation is, you'll get 10 different answers. People bandy about terms like abstraction and encapsulation, trying to isolate the basic units of object-oriented programming languages and give them big names to write papers and books about. Not all object-oriented languages offer the same features, yet they are still deemed object-oriented. This, of course, produces more papers and books.We follow the nomenclature used in Perl's documentation, theEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
- Introduction
- InhaltsvorschauAlthough Perl was not initially conceived of as an object-oriented language, within a few years of its initial release, complete support for object-oriented programming had been added. As usual, Perl doesn't try to enforce one true style, but embraces many. This helps more people do their job the way they want to do it.You don't have to use objects to write programs, unlike Java, where programs are instances of objects. If you want to, though, you can write Perl programs that use nearly every weapon in the object-oriented arsenal. Perl supports classes and objects, single and multiple inheritance, instance methods and class methods, access to overridden methods, constructors and destructors, operator overloading, proxy methods through autoloading, delegation, a rooted hierarchy for all objects, and two levels of garbage collection.You can use as many or as few object-oriented techniques as you want and need. Ties are the only part of Perl where you must use object orientation. And even then, only the module implementor need be aware of this; the casual user gets to remain blissfully unaware of the internal mechanics. Ties, discussed in Recipe 13.15, let you transparently intercept access to a variable. For example, you can use ties to create hashes that support lookups by key or value instead of just by key.If you ask 10 people what object orientation is, you'll get 10 different answers. People bandy about terms like abstraction and encapsulation, trying to isolate the basic units of object-oriented programming languages and give them big names to write papers and books about. Not all object-oriented languages offer the same features, yet they are still deemed object-oriented. This, of course, produces more papers and books.We follow the nomenclature used in Perl's documentation, the perlobj(1) manpage, and Chapter 12 of Programming Perl. An object is a variable that belongs to a class. Methods are functions associated with a class. In Perl, a class is a package—and usually a module. An object is a reference to something associated with a class.Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
- Constructing an Object
- InhaltsvorschauYou want to create a way for your users to generate new objects.Make a constructor. In Perl, the constructor method must not only initialize its object, but must also first allocate memory for it, typically using an anonymous hash. C++ constructors, on the other hand, are called with memory already allocated. Some of the object-oriented world would call C++'s constructors initializers.Here's the canonical object constructor in Perl:
sub new { my $class = shift; my $self = { }; bless($self, $class); return $self; }This is the equivalent one-liner:sub new { bless( { }, shift ) }Any method that allocates and initializes a new object acts as a constructor. The most important thing to remember is that a reference isn't an object untilblesshas been called on it. The simplest possible constructor, although not particularly useful, is the following:sub new { bless({ }) }Let's add some initialization:sub new { my $self = { }; # allocate anonymous hash bless($self); # init two sample attributes/data members/fields $self->{START} = time( ); $self->{AGE} = 0; return $self; }This constructor isn't very useful, because it uses the single-argument form ofbless, which always blesses the object into the current package. This means it can't be usefully inherited from; objects it constructs will always be blessed into the class that thenewfunction was compiled into. With inheritance, this is not necessarily the class on whose behalf the method was invoked.To solve this, have the constructor heed its first argument. For a class method, this is the package name. Pass this class name as the second argument tobless:sub new { my $classname = shift; # What class are we constructing? my $self = { }; # Allocate new memory bless($self, $classname); # Mark it of the right type $self->{START} = time( ); # init data fields $self->{AGE} = 0; return $self; # And give it back }Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Destroying an Object
- InhaltsvorschauYou want to run special code whenever an object is no longer used. This is sometimes needed when the object is an interface to the outside world—or contains circular data structures—and must clean up after itself. You might remove temporary files, break circular links, gracefully disconnect from a socket, or kill a spawned subprocess.Create a method named
DESTROY. This will be invoked when there are no more references to the object, or else when the program shuts down, whichever comes first. You don't need to do any memory deallocation here, just any finalization code that makes sense for the class.sub DESTROY { my $self = shift; printf("$self dying at %s\n", scalar localtime); }Every story has a beginning and an end. The beginning of the object's story is its constructor, explicitly invoked when the object comes into existence. The end of its story is the destructor, a method implicitly invoked when an object leaves this life. Any per-object clean-up code is placed in the destructor, which must be namedDESTROY.Why can't destructors have arbitrary names? Because although constructors are explicitly called by name, the destructor is not. Destruction happens automatically via Perl's garbage collection (GC) system, which is currently implemented as a quick but lazy reference-based GC system. To know what to call, Perl insists that the destructor be namedDESTROY. If more than one object goes out of scope at once, Perl makes no promise about invoking destructors in any particular order.Why isDESTROYin all caps? Perl on occasion uses purely uppercase function names as a convention to indicate that the function will be automatically called by Perl. Others that are called implicitly includeBEGIN,INIT,END,AUTOLOAD, plus all methods used by tied objects (see Recipe 13.15), such asSTOREandFETCH.Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Managing Instance Data
- InhaltsvorschauEach data attribute of an object, sometimes named data members or properties, needs its own method for access. How do you write functions that manipulate the object's instance data?Either write pairs of get and set methods that affect the appropriate key in the object hash, like this:
sub get_name { my $self = shift; return $self->{NAME}; } sub set_name { my $self = shift; $self->{NAME} = shift; }or make single methods that do both jobs depending on whether they're passed an argument:sub name { my $self = shift; if (@_) { $self->{NAME} = shift } return $self->{NAME}; }When setting a new value, sometimes it may be useful to return not that new value, but the previous one:# returns previously set value if changing it sub age { my $self = shift; my $oldage = $self->{AGE}; if (@_) { $self->{AGE} = shift } return $oldage; } $previous_age = $obj->age( $obj->age( ) + $TIME_PASSES );Methods are how you implement the public interface to the object. A proper class doesn't encourage anyone to poke around inside its innards. Each data attribute should have a method to update it, retrieve it, or both. If a user writes code like this:$him = Person->new( ); $him->{NAME} = "Sylvester"; $him->{AGE} = 23;then an argument could justifiably be made that they have violated the interface and so deserve whatever they get.For nominally private data elements, you may omit methods that access them. However, then if—better make that when—you update the implementation, you'll need to scour the class to find where other methods within the class rely upon the particular representation that you're now changing. To be squeaky clean, you could have the class itself go through a mediated, functional interface to access instance data.Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Managing Class Data
- InhaltsvorschauYou need a method invoked on behalf of the whole class, not just on one object. This might be a procedural request, or it might be a global data attribute shared by all instances of the class.Instead of expecting a reference as their first argument as object methods do, class methods expect a string containing the name of the class. Class methods access package data, not object data, as in the
populationmethod shown here:package Person; $Body_Count = 0; sub population { return $Body_Count } sub new { # constructor $Body_Count++; return bless({ }, shift); } sub DESTROY { --$Body_Count } # destructor # later, the user can say this: package main; for (1..10) { push @people, Person->new } printf "There are %d people alive.\n", Person->population( ); There are 10 people aliveNormally, each object has its own complete state stored within itself. The value of a data attribute in one object is unrelated to the value that attribute might have in another instance of the same class. For example, setting her gender here does nothing to his gender, because they are different objects with distinct states:$him = Person->new( ); $him->gender("male"); $her = Person->new( ); $her->gender("female");Imagine a classwide attribute where changing the attribute for one instance changes it for all of them. Just as some programmers prefer capitalized global variables, some prefer uppercase names when the method affects class data instead of instance data. Here's an example of using a class method namedMax_Bounds:FixedArray->Max_Bounds(100); # set for whole class $alpha = FixedArray->new( ); printf "Bound on alpha is %d\n", $alpha->Max_Bounds( ); 100 $beta = FixedArray->new( ); $beta->Max_Bounds(50); # still sets for whole class printf "Bound on alpha is %d\n", $alpha->Max_Bounds( );Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Using Classes as Structs
- InhaltsvorschauYou're used to structured data types more complex than Perl's arrays and hashes, such as C's structs and Pascal's records. You've heard that Perl's classes are comparable, but you aren't an object-oriented programmer.Use the standard Class::Struct module's
structto declare data structures reminiscent of those in the C programming language:use Class::Struct; # load struct-building module struct Person => { # create a definition for a "Person" name => '$', # name field is a scalar age => '$', # age field is also a scalar peers => '@', # but peers field is an array (reference) }; my $p = Person->new( ); # allocate an empty Person struct $p->name("Jason Smythe"); # set its name field $p->age(13); # set its age field $p->peers( ["Wilbur", "Ralph", "Fred" ] ); # set its peers field # or this way: @{$p->peers} = ("Wilbur", "Ralph", "Fred"); # fetch various values, including the zeroth friend printf "At age %d, %s's first friend is %s.\n", $p->age, $p->name, $p->peers(0);TheClass::Struct::structfunction builds struct-like classes on the fly. It creates a class of the name given in the first argument, complete with a constructor namednewand per-field accessor methods.In the structure layout definition, the keys are the names of the fields and the values are the data type. This type can be one of the three base types: '$' for scalars, '@' for arrays, and '%' for hashes. Each accessor method can be invoked without arguments to fetch the current value, or with an argument to set the value. For a field whose type is an array or hash, a zero-argument method invocation returns a reference to the entire array or hash, a one-argument invocation retrieves the value at that subscript, and a two-argument invocation sets the value at that subscript.Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Cloning Constructors
- InhaltsvorschauYou want to write a constructor method that might be invoked on an existing object, and if so, to use that object for default values.Start your constructor like this:
my $proto = shift; my $class = ref($proto) || $proto; my $parent = ref($proto) && $proto;
The$classvariable will contain the class to bless into, and the$parentvariable will either be false, or else the object you're cloning.Sometimes you need another object of the same type as the current one. You could do this:$ob1 = SomeClass->new( ); # later on $ob2 = (ref $ob1)->new( );
but that's not very clear. It's clearer to have a single constructor that behaves correctly, regardless of whether its invocant is a class name or an existing object of that class. As a class method, it should return a new object with the default initialization. As an instance method, it should return a new object initialized from the object it was invoked on:$ob1 = Widget->new( ); $ob2 = $ob1->new( );
Here's a version ofnewthat takes this into consideration:sub new { my $proto = shift; my $class = ref($proto) || $proto; my $parent = ref($proto) && $proto; my $self; # check whether we're shadowing a new from @ISA if (@ISA && $proto->SUPER::can("new") ) { $self = $proto->SUPER::new(@_); } else { $self = { }; bless ($self, $class); } $self->{PARENT} = $parent; $self->{START} = time( ); # init data fields $self->{AGE} = 0; return $self; }Initializing doesn't have to mean simply copying values from the parent. If you're writing a linked list or binary tree class, your constructor can return a new object linked into the list or tree, when invoked as an instance method.Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Copy Constructors
- InhaltsvorschauYou would like to provide users of your class with a copy method, or you would like to copy an object for which no copy method has been provided by the class.Use the
dclone( )function from the standard Storable module.use Storable qw(dclone); use Carp; sub copy { my $self = shift; croak "can't copy class $self" unless ref $self; my $copy = Storable::dclone($self); return $copy; }As described in Recipe 11.12, the Storable module'sdclonefunction will recursively copy (virtually) any data structure. It works on objects, too, correctly giving you back new objects that are appropriately blessed. This assumes that the underlying types are SCALAR, ARRAY, HASH, or CODE refs. Things like GLOB and IO refs won't serialize.Some classes already provide methods to copy their objects; others do not, not so much out of intent as out of neglect. Consider this:sub UNIVERSAL::copy { my $self = shift; unless (ref $self) { require Carp; Carp::croak("can't copy class $self"); } require Storable; my $copy = Storable::dclone($self); return $copy; }Now all objects can be copied, providing they're of the supported types. Classes that provide their owncopymethods are unaffected, but any class that doesn't provide its owncopymethod will pick up this definition. We placed therequireon Storable within the function call itself so that you load Storable only if you actually plan to use it. Likewise, we placed the one for Carp inside the test that will end up using it. By usingrequire, we delay loading until the module is actually needed.We also avoidusebecause it would import things into our current package. This could be antisocial. From the previous code snippet, you cannot determine what package you're even in. Just because we've declared a subroutine namedcopyEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Invoking Methods Indirectly
- InhaltsvorschauYou want to invoke a method by a name that isn't known until runtime.Store the method name as a string in a scalar variable and use it where you would use the real method name to the right of the arrow operator:
$methname = "flicker"; $obj->$methname(10); # invokes $ob->flicker(10); # invoke three methods on the object, by name foreach $m ( qw(start run stop) ) { $obj->$m( ); }Sometimes you need to invoke a method whose name you've stored somewhere. You can't take the address of a method, but you can store its name. If you have a scalar variable$methcontaining the method name, invoke the method on an object$crystalwith$crystal->$meth( ).@methods = qw(name rank serno); %his_info = map { $_ => $ob->$_( ) } @methods; # same as this: %his_info = ( 'name' => $ob->name( ), 'rank' => $ob->rank( ), 'serno' => $ob->serno( ), );If you're desperate to devise a way to get a method's address, you should try to rethink your algorithm. For example, instead of incorrectly taking\$ob->method( ), which simply applies the backslash to that method's return value or values, do this:my $fnref = sub { $ob->method(@_) };Now when it's time to invoke that indirectly, you would use:$fnref->(10, "fred");
and have the closure in turn correctly use the original value of$ob(provided$obwas a lexical variable) that was around when it was created to call:$ob->method(10, "fred");
This works even if$obhas gone out of scope. This solution is much cleaner.When using indirect method invocation, it is permitted to store a subroutine reference in the scalar variable instead of a string representing the method name. No verification that the function represents a valid method.The code reference returned by the UNIVERSALcanmethod should probably not be used for indirect method invocation on objects other than the one on which it was called, or at least those of the same class. That's because you have no reason to believe that this will be a valid method when applied to an object of an arbitrary class.Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Determining Subclass Membership
- InhaltsvorschauYou want to know whether an object is an instance of a particular class or that class's subclasses. Perhaps you want to decide whether a particular method should be invoked on an arbitrary object.Use methods from the special UNIVERSAL class:
$obj->isa("HTTP::Message"); # as object method HTTP::Response->isa("HTTP::Message"); # as class method if ($obj->can("method_name")) { .... } # check method validityWouldn't it be convenient if all objects were rooted at some ultimate base class? That way you could give every object common methods without having to add to each@ISA. Well, you can. You don't see it, but Perl pretends there's an extra element at the end of@ISA—the package named UNIVERSAL.UNIVERSAL has only a few predefined methods, although you are free to add your own. These are built right into your Perl binary, so they don't take extra time to load. Predefined methods includeisa,can, andVERSION. All three may be used for both sorts of invocants: classes and objects.Theisamethod reports whether its invocant inherits the class name directly or indirectly from the class name supplied as the argument. This saves having to traverse the hierarchy yourself, and is much better than testing with an exact check against the string returned by therefbuilt-in. You may even supply a basic type thatrefmight return as an argument, such as SCALAR, ARRAY, HASH, or GLOB.$has_io = $fd->isa("IO::Handle") || $fd->isa("GLOB"); $itza_handle = IO::Socket->isa("IO::Handle");Type checks like this are sometimes frowned upon as being too constraining. If you just want to know if a certain method can be invoked against something, it might be better to just try to invoke the method you're hoping will be there instead of checking for the class.Another possibility is to use another UNIVERSAL method,Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Writing an Inheritable Class
- InhaltsvorschauYou're not sure whether you've designed your class robustly enough to be inherited.Use the "empty subclass test" on your class.Imagine you've implemented a class named Person that supplies a constructor named
new, and methods such asageandname. Here's the straightforward implementation:package Person; sub new { my $class = shift; my $self = { }; return bless $self, $class; } sub name { my $self = shift; $self->{NAME} = shift if @_; return $self->{NAME}; } sub age { my $self = shift; $self->{AGE} = shift if @_; return $self->{AGE}; }You might use the class in this way:use Person; my $dude = Person->new( ); $dude->name("Jason"); $dude->age(23); printf "%s is age %d.\n", $dude->name, $dude->age;Now consider another class, the one named Employee:package Employee; use Person; @ISA = ("Person"); 1;There's not a lot to that one. All it's doing is loading in class Person and stating that Employee will inherit any needed methods from Person. Since Employee has no methods of its own, it will get all of its methods from Person. We rely upon an Employee to behave just like a Person.Setting up an empty class like this is called the empty base class test; that is, it creates a derived class that does nothing but inherit from a base class. If the original base class has been designed properly, then the new derived class can be used as a drop-in replacement for the old one. This means you should be able to change just the class name and everything will still work:use Employee; my $empl = Employee->new( ); $empl->name("Jason"); $empl->age(23); printf "%s is age %d.\n", $empl->name, $empl->age;By proper design, we mean using only the two-argument form ofbless, avoiding any direct access of class data, and exporting nothing. In theEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Accessing Overridden Methods
- InhaltsvorschauYour class's constructor method overrides the constructor of its parent class. You want your constructor to invoke the parent class's constructor.Learn about the special pseudoclass, SUPER.
sub meth { my $self = shift; $self->SUPER::meth( ); }In languages like C++ where constructors don't actually allocate memory but just initialize the object, all base class constructors are automatically invoked for you. In languages like Java and Perl, you have to invoke them yourself.To invoke a method in a particular class, the notation$self->SUPER::meth( )is used. This is an extension of the regular notation that means to begin searching for a method in a particular class. It is valid only from within an overridden method. Here's a comparison of styles:$self->meth( ); # Call wherever first meth is found $self->Where::meth( ); # Start looking in package "Where" $self->SUPER::meth( ); # Call overridden version
Simple users of the class should probably restrict themselves to the first line in the previous example. The second is possible, but not suggested for this situation, because we have the special notation shown in the third line, which only works within the overridden method.An overriding constructor should invoke its SUPER's constructor to allocate and bless the object, limiting itself to instantiating any data fields needed. It makes sense here to separate the object allocation code from the object initialization code for reasons that will become clear a couple paragraphs from now. We'll name it with a leading underscore, a convention indicating a nominally private method. Think of it as a "Do Not Disturb" sign.sub new { my $classname = shift; # What class are we constructing? my $self = $classname->SUPER::new(@_); $self->_init(@_); return $self; # And give it back } sub _init { my $self = shift; $self->{START} = time( ); # init data fields $self->{AGE} = 0; $self->{EXTRA} = { @_ }; # anything extra }Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Generating Attribute Methods Using AUTOLOAD
- InhaltsvorschauYour object needs accessor methods to set or get its data fields, and you're tired of writing them all out one at a time.Carefully use Perl's AUTOLOAD mechanism as something of a proxy method generator so you don't have to create them all yourself each time you want to add a new data field.Perl's AUTOLOAD mechanism intercepts all possible undefined method invocations. To disallow arbitrary data names, we store the list of permitted fields in a hash. The AUTOLOAD method checks for whether the accessed field name is in that hash.
package Person; use strict; use Carp; our(%ok_field); # Authorize four attribute fields for my $attr ( qw(name age peers parent) ) { $ok_field{$attr}++; } sub AUTOLOAD { my $self = shift; my $attr = our $AUTOLOAD; $attr =~ s/.*:://; return unless $attr =~ /[^A-Z]/; # skip DESTROY and all-cap methods croak "invalid attribute method: ->$attr( )" unless $ok_field{$attr}; $self->{uc $attr} = shift if @_; return $self->{uc $attr}; } sub new { my $proto = shift; my $class = ref($proto) || $proto; my $parent = ref($proto) && $proto; my $self = { }; bless($self, $class); $self->parent($parent); return $self; } 1;This class supports a constructor method namednew, plus four attribute methods:name,age,peers, andparent. Use the module this way:use Person; my ($dad, $kid); $dad = Person->new; $dad->name("Jason"); $dad->age(23); $kid = $dad->new; $kid->name("Rachel"); $kid->age(2); printf "Kid's parent is %s\n", $kid->parent->name; Kid's parent is JasonThis is tricky when producing inheritance trees. Suppose you'd like an Employee class that had every data attribute of the Person class, plus two new ones, likesalaryandboss. Class Employee can't rely upon an inheritedPerson::AUTOLOADto determine what Employee's attribute methods are. So each class would need its ownEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Coping with Circular Data Structures Using Objects
- InhaltsvorschauYou have an inherently self-referential data structure, so Perl's reference-based garbage collection system won't notice when it's no longer being used. You want to prevent your program from leaking memory.Create a non-circular container object that holds a pointer to the self-referential data structure. Define a
DESTROYmethod for the containing object's class that manually breaks the self-referential circularities.Or use weak references, as described in Recipe 11.15.Many interesting data structures include references back to themselves. This can occur in code as simple as this:$node->{NEXT} = $node;As soon as you do that, you've created a circularity that will hide the data structure from Perl's referenced-based garbage collection system. Destructors will eventually be invoked when your program exits, but sometimes you don't want to wait that long.A circular linked list is similarly self-referential. Each node contains a front pointer, a back pointer, and the node's value. If you implement it with references in Perl, you get a circular set of references and the data structure won't be automatically garbage collected when there are no external references to its nodes.Making each node an instance of class Ring doesn't solve the problem. What you want is for Perl to clean up this structure as it would any other structure—which it will do if you implement your object as a structure that contains a reference to the real circle. That reference will be stored in the "DUMMY" field:package Ring; # return an empty ring structure sub new { my $class = shift; my $node = { }; $node->{NEXT} = $node->{PREV} = $node; my $self = { DUMMY => $node, COUNT => 0 }; bless $self, $class; return $self; }Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Overloading Operators
- InhaltsvorschauYou want to use familiar operators like
= =or+on objects from a class you've written, or you want to define the print interpolation value for objects.Use theuseoverloadpragma. Here are two of the more commonly overloaded operators:use overload '<=>' => \&threeway_compare; sub threeway_compare { my ($s1, $s2) = @_; return uc($s1->{NAME}) cmp uc($s2->{NAME}); } use overload '""' => \&stringify; sub stringify { my $self = shift; return sprintf "%s (%05d)", ucfirst(lc($self->{NAME})), $self->{IDNUM}; }When you use built-in types, certain operators apply, like+for addition or . for string concatenation. With theuseoverloadpragma, you can customize these operators so they do something special on your own objects.This pragma takes a list of operator/function call pairs, such as:package TimeNumber; use overload '+' => \&my_plus, '-' => \&my_minus, '*' => \&my_star, '/' => \&my_slash;Those four operators can now be used with objects of class TimeNumber, and the listed functions will be called as method invocations. These functions can do anything you'd like.Here's a simple example of an overload of+for use with an object that holds hours, minutes, and seconds. It assumes that both operands are of a class that has anewmethod that can be invoked as an object method, and that the structure names are as shown:sub my_plus { my($left, $right) = @_; my $answer = $left->new( ); $answer->{SECONDS} = $left->{SECONDS} + $right->{SECONDS}; $answer->{MINUTES} = $left->{MINUTES} + $right->{MINUTES}; $answer->{HOURS} = $left->{HOURS} + $right->{HOURS}; if ($answer->{SECONDS} >= 60) { $answer->{SECONDS} %= 60; $answer->{MINUTES} ++; } if ($answer->{MINUTES} >= 60) { $answer->{MINUTES} %= 60; $answer->{HOURS} ++; } return $answer; }Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Creating Magic Variables with tie
- InhaltsvorschauYou want to add special processing to a variable or handle.Use the
tiefunction to give your ordinary variables object hooks.Anyone who's ever used a DBM file under Perl has already used tied objects. Perhaps the most excellent way of using objects is such that the user need never notice them. Withtie, you can bind a variable or handle to a class, after which all access to the tied variable or handle is transparently intercepted by specially named object methods (see Table 13-2).The most importanttiemethods are FETCH to intercept read access, STORE to intercept write access, and the constructor, which is one of TIESCALAR, TIEARRAY, TIEHASH, or TIEHANDLE.Table 13-2: How tied variables are interpreted User codeExecuted codetie $s, "SomeClass"SomeClass->TIESCALAR( )$p = $s$p = $obj->FETCH( )$s = 10$obj->STORE(10)Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Chapter 14: Database Access
- InhaltsvorschauI only ask for information.—Charles Dickens, David CopperfieldEverywhere you find data, you find databases. At the simplest level, every file can be considered a database. At the most complex level, expensive and complex relational database systems handle thousands of transactions per second. In between are countless improvised schemes for fast access to loosely structured data. Perl can work with all of them.Early in the history of computers, people noticed that flat file databases don't scale to large data sets. Flat files were tamed using fixed-length records or auxiliary indices, but updating became expensive, and previously simple applications bogged down with I/O overhead.After some head-scratching, clever programmers devised a better solution. As hashes in memory provide more flexible access to data than do arrays, hashes on disk offer more convenient kinds of access than array-like text files. These benefits in access time cost you space, but disk space is cheap these days (or so the reasoning goes).The DBM library gives Perl programmers a simple, easy-to-use database. You use the same standard operations on hashes bound to DBM files as you do on hashes in memory. In fact, that's how you use DBM databases from Perl. You use
tieto associate a hash with a class and a file. Then whenever you access the hash, the class consults or changes the DBM database on disk. The olddbmopenfunction also did this, but only let you use one DBM implementation in your program, so you couldn't copy from one format to another.Recipe 14.1 shows how to create a DBM database and gives tips on using it efficiently. Although you can do with DBM files the same things you do with regular hashes, their disk-based nature leads to performance concerns that don't exist with in-memory hashes. Because DBM files are disk-based and can be shared between processors, use a sentinel lock file (see Recipe 7.24) to regulate concurrent access to them.Recipes Recipe 14.2 and Recipe 14.4 explain these concerns and show how to work around them. DBM files also make possible operations that aren't available using regular hashes. Recipe 14.5 explains two of these things.Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Introduction
- InhaltsvorschauEverywhere you find data, you find databases. At the simplest level, every file can be considered a database. At the most complex level, expensive and complex relational database systems handle thousands of transactions per second. In between are countless improvised schemes for fast access to loosely structured data. Perl can work with all of them.Early in the history of computers, people noticed that flat file databases don't scale to large data sets. Flat files were tamed using fixed-length records or auxiliary indices, but updating became expensive, and previously simple applications bogged down with I/O overhead.After some head-scratching, clever programmers devised a better solution. As hashes in memory provide more flexible access to data than do arrays, hashes on disk offer more convenient kinds of access than array-like text files. These benefits in access time cost you space, but disk space is cheap these days (or so the reasoning goes).The DBM library gives Perl programmers a simple, easy-to-use database. You use the same standard operations on hashes bound to DBM files as you do on hashes in memory. In fact, that's how you use DBM databases from Perl. You use
tieto associate a hash with a class and a file. Then whenever you access the hash, the class consults or changes the DBM database on disk. The olddbmopenfunction also did this, but only let you use one DBM implementation in your program, so you couldn't copy from one format to another.Recipe 14.1 shows how to create a DBM database and gives tips on using it efficiently. Although you can do with DBM files the same things you do with regular hashes, their disk-based nature leads to performance concerns that don't exist with in-memory hashes. Because DBM files are disk-based and can be shared between processors, use a sentinel lock file (see Recipe 7.24) to regulate concurrent access to them.Recipes Recipe 14.2 and Recipe 14.4 explain these concerns and show how to work around them. DBM files also make possible operations that aren't available using regular hashes. Recipe 14.5 explains two of these things.Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Making and Using a DBM File
- InhaltsvorschauYou want to create, populate, inspect, or delete values in a DBM database.Use
tieto open the database and make it accessible through a hash. Then use the hash as you normally would. When you're done, calluntie:use DB_File; # load database module tie %HASH, "DB_File", $FILENAME # open database to be accessed or die "Can't open $FILENAME:$!\n"; # through %HASH $V = $HASH{$KEY}; # retrieve from database $HASH{$KEY} = $VALUE; # put value into database if (exists $HASH{$KEY}) { # check whether in database # ... } delete $HASH{$KEY}; # delete from database untie %HASH; # close the databaseAccessing a database as a hash is powerful but easy, giving you a persistent hash that sticks around after the program using it has finished running. It's also much faster than loading in a new hash every time; even if the hash has a million entries, your program starts up virtually instantaneously.The program in Example 14-1 treats the database as though it were a normal hash. You can even callkeysoreachon it. Likewise,existsanddefinedare implemented for tied DBM hashes. Unlike a normal hash, a DBM hash does not distinguish between those two functions.Example 14-1. userstats#!/usr/bin/perl -w # userstats - generates statistics on who is logged in. # call with an argument to display totals use DB_File; $db = "/tmp/userstats.db"; # where data is kept between runs tie(%db, 'DB_File', $db) or die "Can't open DB_File $db : $!\n"; if (@ARGV) { if ("@ARGV" eq "ALL") { @ARGV = sort keys %db; } foreach $user (@ARGV) { print "$user\t$db{$user}\n"; } } else { @who = `who`; # run who(1) if ($?) { die "Couldn't run who: $?\n"; # exited abnormally } # extract username (first thing on the line) and update foreach $line (@who) { $line =~ /^(\S+)/; die "Bad line from who: $line\n" unless $1; $db{$1}++; } } untie %db;Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Emptying a DBM File
- InhaltsvorschauYou want to clear out a DBM file.Open the database and assign
( )to it. Usetie:use DB_File; tie(%HASH, "DB_File", $FILENAME) or die "Can't open FILENAME: $!\n"; %HASH = ( ); untie %HASH;
Alternatively, delete the file and reopen:unlink $FILENAME or die "Couldn't unlink $FILENAME to empty the database: $!\n"; tie(%HASH => "DB_File", $FILENAME) or die "Couldn't create $FILENAME database: $!\n";It may be quicker to delete the file and create a new one than to reset it, but doing so opens you up to a race condition that trips up a careless program or makes it vulnerable to an attacker. The attacker could make a link pointing to the file /etc/precious with the same name as your file between the time when you deleted the file and when you recreated it. When the DBM library opens the file, it clobbers /etc/precious.If you delete a DB_File database and recreate it, you'll lose any customizable settings like page size, fill-factor, and so on. This is another good reason to assign the empty list to the tied hash.The documentation for the standard DB_File module, also in Chapter 32 of Programming Perl; theunlinkfunction in perlfunc(1); Recipe 14.1Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Converting Between DBM Files
- InhaltsvorschauYou have a file in one DBM format, but another program expects input in a different DBM format.Read the keys and values from the initial DBM file and write them to a new file in the different DBM format as in Example 14-2.Example 14-2. db2gdbm
#!/usr/bin/perl -w # db2gdbm: converts DB to GDBM use strict; use DB_File; use GDBM_File; unless (@ARGV = = 2) { die "usage: db2gdbm infile outfile\n"; } my ($infile, $outfile) = @ARGV; my (%db_in, %db_out); # open the files tie(%db_in, 'DB_File', $infile) or die "Can't tie $infile: $!"; tie(%db_out, 'GDBM_File', $outfile, GDBM_WRCREAT, 0666) or die "Can't tie $outfile: $!"; # copy (don't use %db_out = %db_in because it's slow on big databases) while (my($k, $v) = each %db_in) { $db_out{$k} = $v; } # these unties happen automatically at program exit untie %db_in; untie %db_out;Call the program as:% db2gdbm /tmp/users.db /tmp/users.gdbm
When multiple types of DBM file are used in the same program, you have to usetie, not thedbmopeninterface. That's because withdbmopenyou can use only one database format, which is why its use is deprecated.Copying hashes by simple assignment, as in%new=%old, works on DBM files. However, it loads everything into memory first as a list, which doesn't matter with small hashes, but can be prohibitively expensive in the case of DBM files. For database hashes, useeachto iterate through them instead.The documentation for the standard modules GDBM_File, NDBM_File, SDBM_File, DB_File, some of which are in Chapter 32 of Programming Perl; Recipe 14.1Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Merging DBM Files
- InhaltsvorschauYou want to combine two DBM files into a single DBM file with original key-value pairs.Either merge the databases by treating their hashes as lists:
%OUTPUT = (%INPUT1, %INPUT2);
or, more wisely, by iterating over each key-value pair:%OUTPUT = ( ); foreach $href ( \%INPUT1, \%INPUT2 ) { while (my($key, $value) = each(%$href)) { if (exists $OUTPUT{$key}) { # decide which value to use and set $OUTPUT{$key} if necessary } else { $OUTPUT{$key} = $value; } } }This straightforward application of Recipe 5.11 comes with the same caveats. Merging hashes by treating them as lists requires that the hashes be preloaded into memory, creating a potentially humongous temporary list. If you're dealing with large hashes, have little virtual memory, or both, then you want to iterate over the keys witheachto save memory.Another difference between these merging techniques is what to do if the same key exists in both input databases. The blind assignment merely overwrites the first value with the second value. The iterative merging technique lets you decide what to do. Possibilities include issuing a warning or error, choosing the first over the second, choosing the second over the first, or concatenating the new value to the old one. If you're using the MLDBM module, you can even store them both, using an array reference to the two values.Recipe 5.11; Recipe 14.6Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Sorting Large DBM Files
- InhaltsvorschauYou want to process a large dataset you'd like to commit to a DBM file in a particular order.Use the DB_File's B-tree bindings and supply a comparison function of your own devising:
use DB_File; # specify the Perl sub to do key comparison using the # exported $DB_BTREE hash reference $DB_BTREE->{'compare'} = sub { my ($key1, $key2) = @_ ; "\L$key1" cmp "\L$key2" ; }; tie(%hash, "DB_File", $filename, O_RDWR|O_CREAT, 0666, $DB_BTREE) or die "can't tie $filename: $!";An annoyance of hashes, whether in memory or as DBM files, is that they do not maintain proper ordering. The CPAN module Tie::IxHash can make a regular hash in memory maintain its insertion order, but that doesn't help for DBM databases or arbitrary sorting criteria.The DB_File module supports a nice solution to this using a B-tree implementation. One advantage of a B-tree over a regular DBM hash is its ordering. When the user defines a comparison function, all calls tokeys,values, andeachare automatically ordered. For example, Example 14-3 is a program that maintains a hash whose keys will always be sorted case-insensitively.Example 14-3. sortdemo#!/usr/bin/perl # sortdemo - show auto dbm sorting use strict; use DB_File; $DB_BTREE->{'compare'} = sub { my ($key1, $key2) = @_ ; "\L$key1" cmp "\L$key2" ; }; my %hash; my $filename = '/tmp/sorthash.db'; tie(%hash, "DB_File", $filename, O_RDWR|O_CREAT, 0666, $DB_BTREE) or die "can't tie $filename: $!"; my $i = 0; for my $word (qw(Can't you go camp down by Gibraltar)) { $hash{$word} = ++$i; } while (my($word, $number) = each %hash) { printf "%-12s %d\n", $word, $number; }By default, the entries in a B-tree DB_File database are stored alphabetically. Here, though, we provide a case-insensitive comparison function, so usingEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Storing Complex Data in a DBM File
- InhaltsvorschauYou want values in a DBM file to be something other than scalars. For instance, you use a hash of hashes in your program and want to store them in a DBM file for other programs to access, or you want them to persist across process runs.Use the CPAN module MLDBM to store more complex values than strings and numbers.
use MLDBM 'DB_File'; tie(%HASH, 'MLDBM', [... other DBM arguments]) or die $!;
Specify a particular serializing module with:use MLDBM qw(DB_File Storable);
MLDBM uses a serializing module like Storable, Data::Dumper, or FreezeThaw (see Recipe 11.4) to convert data structures to and from strings so that they can be stored in a DBM file. It doesn't store references; instead, it stores the data those references refer to:# %hash is a tied hash $hash{"Tom Christiansen"} = [ "book author", 'tchrist@perl.com' ]; $hash{"Tom Boutell"} = [ "shareware author", 'boutell@boutell.com' ]; # names to compare $name1 = "Tom Christiansen"; $name2 = "Tom Boutell"; $tom1 = $hash{$name1}; # snag local pointer $tom2 = $hash{$name2}; # and another print "Two Toming: $tom1 $tom2\n"; Tom Toming: ARRAY(0x73048) ARRAY(0x73e4c)Each time MLDBM retrieves a data structure from the DBM file, it generates a new copy of that data. To compare data that you retrieve from a MLDBM database, you need to compare the values within the structure:if ($tom1->[0] eq $tom2->[0] && $tom1->[1] eq $tom2->[1]) { print "You're having runtime fun with one Tom made two.\n"; } else { print "No two Toms are ever alike.\n"; }This is more efficient than:if ($hash{$name1}->[0] eq $hash{$name2}->[0] && # INEFFICIENT $hash{$name1}->[1] eq $hash{$name2}->[1]) { print "You're having runtime fun with one Tom made two.\n"; } else { print "No two Toms are ever alike.\n"; }Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Persistent Data
- InhaltsvorschauYou want your variables to retain their values between calls to your program.Use a MLDBM to store the values between calls to your program:
use MLDBM "DB_File"; my ($VARIABLE1,$VARIABLE2); my $Persistent_Store = "/projects/foo/data"; BEGIN { my %data; tie(%data, "MLDBM", $Persistent_Store) or die "Can't tie to $Persistent_Store : $!"; $VARIABLE1 = $data{VARIABLE1}; $VARIABLE2 = $data{VARIABLE2}; # ... untie %data; } END { my %data; tie (%data, "MLDBM", $Persistent_Store) or die "Can't tie to $Persistent_Store : $!"; $data{VARIABLE1} = $VARIABLE1; $data{VARIABLE2} = $VARIABLE2; # ... untie %data; }An important limitation of MLDBM is that you can't add to or alter the structure in the reference without assignment to a temporary variable. We do this in the sample program in Example 14-4, assigning to$array_refbefore wepush. You can't simply do this:push(@{$db{$user}}, $duration);For a start, MLDBM doesn't allow it. Also,$db{$user}might not be in the database (the array reference isn't automatically created as it would be if%dbweren't tied to a DBM file). This is why we testexists$db{$user}when we give$array_refits initial value. We're creating the empty array for the case where it doesn't already exist.Example 14-4. mldbm-demo#!/usr/bin/perl -w # mldbm_demo - show how to use MLDBM with DB_File use MLDBM "DB_File"; $db = "/tmp/mldbm-array"; tie %db, "MLDBM", $db or die "Can't open $db : $!"; while(<DATA>) { chomp; ($user, $duration) = split(/\s+/, $_); $array_ref = exists $db{$user} ? $db{$user} : [ ]; push(@$array_ref, $duration); $db{$user} = $array_ref; } foreach $user (sort keys %db) { print "$user: "; $total = 0; foreach $duration (@{ $db{$user} }) { print "$duration "; $total += $duration; } print "($total)\n"; } _ _END_ _ gnat 15.3 tchrist 2.5 jules 22.1 tchrist 15.9 gnat 8.7Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Saving Query Results to Excel or CSV
- InhaltsvorschauYou want to query a relational database and create a file of the results so that another program or person can use them. The two common formats people want to get data in are CSV and Excel.Use the CPAN module DBIx::Dump to dump the statement handle after the query:
use DBIx::Dump; use DBI; # ... connect to your database as normal $sth = $dbh->prepare("SELECT ..."); # your query here $sth->execute( ); $out = DBIx::Dump->new('format' => $FORMAT, # excel or csv 'output' => $FILENAME, # file to save as 'sth' => $sth); $out->dump( );The CPAN module DBIx::Dump supports Excel and CSV file formats. It uses the CPAN module Spreadsheet::WriteExcel to write Excel files, and the CPAN module Text::CSV_XS to write CSV files.The first row in the output files holds the column names. For example:ID,NAME 1,Nat 2,Tom 4,Larry 5,Damian 6,Jon 7,Dan
The documentation for the CPAN modules DBIx::Dump, Spreadsheet::WriteExcel, and Text::CSV_XS; Recipe 14.17Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Executing an SQL Command Using DBI
- InhaltsvorschauYou want to send SQL queries to a database system such as Oracle, Sybase, mSQL, or MySQL, and process the results.Use the DBI (DataBase Interface) and DBD (DataBase Driver) modules available from CPAN:
use DBI; $dbh = DBI->connect('dbi:driver:database', 'username', 'auth', { RaiseError => 1, AutoCommit => 1}); $dbh->do($NON_SELECT_SQL_STATEMENT); $results = $dbh->selectall_arrayref($SELECT_SQL_STATEMENT); $sth = $dbh->prepare($SQL_SELECT_STATEMENT); $sth->execute( ); while (@row = $sth->fetchrow_array) { # ... } $dbh->disconnect( );The DBI module abstracts away the different database APIs, offering you a single set of functions for accessing every database. The actual work of connecting to a database, issuing queries, parsing results, etc. is done by a DBD module specific to that database (e.g., DBD::mysql, DBD::Oracle, etc.).All work with databases via the DBI is done through handles. A handle is simply an object, created by callingDBI->connect. This is attached to a specific database and driver using theDBI->connectcall.The first argument toDBI->connectis a single string with three colon-separated fields. This DSN (Data Source Name) identifies the database you're connecting to. The first field is alwaysdbi(though this is case-insensitive, soDBIwill do just as well), and the second is the name of the driver you're going to use (Oracle,mysql, etc.). The rest of the string is passed by the DBI module to the requested driver module (DBD::mysql, for example) where it identifies the database.The second and third arguments authenticate the user.The fourth argument is an optional hash reference defining attributes of the connection. PrintError controls whether DBI warns when a DBI method fails (the default is true; setting it to a false value keeps DBI quiet). Setting RaiseError is like PrintError except thatEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Escaping Quotes
- InhaltsvorschauYou want to put Perl values into queries as literal strings, but you're not sure how your database wants strings to be quoted.Use the database handle's
quotemethod:$quoted = $dbh->quote($unquoted);
This$quotedvalue is now suitable for interpolation into queries:$sth->prepare("SELECT id,login FROM People WHERE name = $quoted");Or simply use placeholders in your query and DBI automatically quotes strings for you:$sth->prepare("SELECT id,login FROM People WHERE name = ?"); $sth->execute($unquoted);Each database has its own quoting idiosyncrasies, so leave the quoting to thequotemethod or placeholders rather than trying to roll your own quoting function. Not only is hardcoding quotes into your SQL non-portable, it doesn't take into account the possibility that the strings you're interpolating might have quotes in them. For example, take this:$sth = $dbh->prepare(qq{SELECT id,login FROM People WHERE name="$name"});If$nameisJon "maddog" Orwant, then you are effectively preparing this query, which is invalid SQL:SELECT id,login FROM People WHERE name="Jon "maddog" Orwant"
The only strange quoting behavior fromquoteis this: because the DBI represents NULL values asundef, if you passundeftoquote, it returnsNULLwithout quotes.The documentation with the DBI module from CPAN;http://dbi.perl.org; Programming the Perl DBIEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Dealing with Database Errors
- InhaltsvorschauYou want your program to catch and handle database errors, possibly displaying informative error messages.The best solution is to enable RaiseError when you connect to the database, then wrap database calls in
eval:$dbh = DBI->connect($DSN, $user, $password, { RaiseError => 1 }); eval { $dbh->do($SQL); $sth = $dbh->prepare($SQL2); $sth->execute( ); while (@row = $sth->fetchrow_array) { # ... } }; if ($@) { # recover here using $DBI::lasth->errstr to get # the error message }The logic here is simple: first tell DBI todieif there's a problem with your SQL (otherwise, the database can't do what you wanted it to). Then, wrap the code that mightdieinevalto catch fatal errors. Next, check$@(either the error message you would havedied with or empty if there was no error) to see whether something went wrong. If it did, somehow deal with the error.DBI supplies the$DBI::lasthvariable, containing the last handle acted on. If something went wrong, that's the handle that caused it. You could use the$@error message, but that also has the "died at file ... line ..." text fromdiethat you might not want. To discover the SQL statement that died, use$DBI::lasth->{Statement}. If you are using only the one handle, you can call the methods directly on your handle instead of on$DBI::lasth:$msg = $dbh->errstr; $sql = $dbh->{Statement};An alternative approach is to disable RaiseError and check the return value for each database call. Methods such asdoandexecutereturn a true value if successful, so you can say:$dbh->do($SQL) or die $dbh->errstr; $sth->execute( ) or die $sth->errstr;
Thedomethod returns the number of rows affected, but in such a way that it always returns a true value if successful. (If you're curious how to do this, see the Introduction to Chapter 1 for the gory details of how Perl decides what's true and what's false.)Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Repeating Queries Efficiently
- InhaltsvorschauYou have a query that you want to execute repeatedly, and you'd like to do it as efficiently as possible. Sometimes you have several queries that are similar, but not quite identical, that you'd like to execute efficiently (for example, you have a loop through an array of names and want to
SELECT...WHEREname=$name).Take advantage of the fact that you can repeatedlyexecutea query that you needprepareonly once.$sth = $dbh->prepare($SQL); # execute query ten times for ($i=0; $i < 10; $i++) { $sth->execute( ); while (@row = $sth->fetchrow_array) { # ... } }If you have changing parameters, use the DBI's binding features:$sth = $dbh->prepare('SELECT uid,login FROM People WHERE name = ?'); foreach $person (@names) { $sth->execute($person); while (@row = $sth->fetchrow_array) { # ... } }"Prepare once, execute often" is one secret to DBI success. By separating preparation from execution, the database server can parse and optimize queries once and then execute them many times. Most databases can do this even when the queries contain placeholders for values to be filled when the query is executed.The process of replacing placeholders with actual values is known as binding. The simplest way is to bind when you execute:$sth = $dbh->prepare('SELECT id,login FROM People WHERE middle_initial = ?'); $sth->execute('J');If you have multiple parameters to bind, pass more values toexecute:$sth = $dbh->prepare('SELECT * FROM Addresses WHERE House = ? AND Street LIKE ?'); $sth->execute('221b', 'Baker%');You don't have to do the binding and the execution in one step. Thebind_paramfunction binds without executing:$sth = $dbh->prepare('SELECT id,login FROM People WHERE middle_initial = ?'); $sth->bind_param(1, 'J'); $sth->execute( );Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Building Queries Programmatically
- InhaltsvorschauYou want to create searches at runtime. For example, you want users of your program to be able to specify combinations of columns and allowable ranges of values.Build a list of clauses and
jointhem together to form the SQL WHERE clause:if ($year_min) { push @clauses, "Year >= $year_min" } if ($year_max) { push @clauses, "Year <= $year_max" } if ($bedrooms_min) { push @clauses, "Beds >= $bedrooms_min" } if ($bedrooms_max) { push @clauses, "Beds <= $bedrooms_max" } # ... $clause = join(" AND ", @clauses); $sth = $dbh->prepare("SELECT beds,baths FROM Houses WHERE $clause");Don't try to build up a string in a loop:$where = ''; foreach $possible (@names) { $where .= ' OR Name=' . $dbh->quote($possible); }That code will end up creating a WHERE clause like:OR Name="Tom" OR Name="Nat" OR Name="Larry" OR Name="Tim"
Then you end up having to lop off the leading "OR". It's much cleaner to usemapand never have the extra text at the start:$where = join(" OR ", map { "Name=".$dbh->quote($_) } @names);Themapproduces a list of strings like:Name="Nat" Name="Tom" Name="Larry" Name="Tim"
and then they're joined together with "OR" to create a well-formed clause:Name="Nat" OR Name="Tom" OR Name="Larry" OR Name="Tim"
Unfortunately, you cannot use placeholders here:$sth = $dbh->prepare("SELECT id,login FROM People WHERE ?"); # BAD $sth->bind_param(1, $where);As explained in Recipe 14.12, placeholders can only be used for simple scalar values and not entire clauses. However, there is an elegant solution: construct the clause and the values to be bound in parallel:if ($year_min) { push @clauses, "Year >= ?"; push @bind, $year_min } if ($year_max) { push @clauses, "Year <= ?"; push @bind, $year_max } if ($bedrooms_min) { push @clauses, "Beds >= ?"; push @bind, $bedrooms_min } if ($bedrooms_max) { push @clauses, "Beds <= ?"; push @bind, $bedrooms_max } $clause = join(" AND ", @clauses); $sth = $dbh->prepare("SELECT id,price FROM Houses WHERE $clause"); $sth->execute(@bind);Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Finding the Number of Rows Returned by a Query
- InhaltsvorschauYou want to find out how many rows were returned by a query.For operations that aren't queries (such as INSERTs, UPDATEs, and DELETEs), the
domethod returns the number of rows affected, -1 when it can't determine the right value, or elseundefin case of failure.$rows = $dbh->do("DELETE FROM Conference WHERE Language='REBOL'"); if (! defined $rows) { # failed, but this is not needed if RaiseError is active } else { print "Deleted $rows rows\n"; }You can't reliably get row counts from queries without either fetching all of the results and then counting them, or writing another query.The easiest way to find out how many rows a query will return is to use the COUNT function in SQL. For example, take this query:SELECT id,name FROM People WHERE Age > 30
To find out how many rows it will return, simply issue this query:SELECT COUNT(*) FROM People WHERE Age > 30
If the database is so volatile that you're afraid the number of rows will change between the COUNT query and the data-fetching query, your best option is to fetch the data and then count rows yourself.With some DBD modules,executereturns the number of rows affected. This isn't portable and may change in the future.The documentation with the DBI module from CPAN;http://dbi.perl.org; Programming the Perl DBIEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Using Transactions
- InhaltsvorschauA single change to the database requires several INSERT, UPDATE, or DELETE commands in SQL. For example, you might have to add a person to the People table, add an address to the Address table, and add a link between them to the LivesAt table. The database is in a logically inconsistent state from the first insert until the last completes. If another client queries the database, it will get inconsistent data (e.g., there'll be a Person with no address).You want to perform the update in such a way that another client never sees an inconsistent database—either all of the changes or none of them should be visible in the database during and after the changes, regardless of any client or server failures during processing.Use transactions. The DBI supports these via the
commitandrollbackmethods on a database handle. Use them thus:$dbh->{AutoCommit} = 0; # enable transactions $dbh->{RaiseError} = 1; # die( ) if a query has problems eval { # do inserts, updates, deletes, queries here $dbh->commit( ); }; if ($@) { warn "Transaction aborted: $@"; eval { $dbh->rollback( ) }; # in case rollback( ) fails # do your application cleanup here }The AutoCommit option controls whether the database commits each change as soon as you issue the command. When AutoCommit is disabled, the database won't update until you call thecommitmethod. If midway through the series of updates you change your mind or an error occurs, therollbackmethod undoes all pending changes.You don't have to explicitly set the AutoCommit and RaiseError attributes before each transaction. For convenience, set those attributes in theconnectcall:$dbh = DBI->connect($dsn, $username, $password, { AutoCommit => 0, RaiseError => 1 });Because RaiseError causes DBI to calldiewhenever a database operation fails, you break out of theEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Viewing Data One Page at a Time
- InhaltsvorschauYou want to display the contents of a table or the results of a query one page at a time.Keep track of which record you're starting with, then use that value to decide how many records to skip before you display a page worth. If your database supports a LIMIT clause that takes a range, use this to avoid transferring unnecessary rows into your program.The example code in this recipe pages through the contents of a table. To page through the results of a query, select the data into a temporary table and page through that.In desktop applications (e.g., Tk), you can keep track of the current page number yourself. With web applications, the easiest thing to do is to use query parameters in the URL to indicate where you are. For example:
/users-report/view?start=1
Begin by finding out how many records there are in total:$row = $Dbh->selectrow_arrayref("SELECT COUNT(*) FROM Users"); $count = $row->[0];Find the first record to display by looking at thestartparameter, then calculate the last record from that. You need to know the number of records per page, which here we assume is in the$Page_Sizevariable:$first = param('start') || 1; $last = $first + $Page_Size - 1; $last = $count if $last > $count; # don't go past the endNow fetch the data into an array and display the records you're interested in:$results = $Dbh->selectall_arrayref('SELECT id,lastname,firstname FROM Users ORDER BY lastname,firstname,id'); for (my $i=$first; $i <= $last; $i++) { my $user = $results->[$i-1]; # result 1 is in row 0 printf("%d. %s, %s.<br>\n", $i, $user->[1], $user->[2]); }That will produce output like:1. Brocard, Leon.<br> 2. Cawley, Piers.<br> 3. Christiansen, Tom.<br>Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Querying a CSV File with SQL
- InhaltsvorschauYou want to use SQL to insert, delete, or retrieve data from a comma-separated values (CSV) file.Use the DBD::CSV module from CPAN:
use DBI; $dbh = DBI->connect("dbi:CSV:f_dir=/home/gnat/payroll", "", "", { AutoCommit => 1, RaiseError => 1 }); $dbh->do("UPDATE salaries SET salary = salary * 2 WHERE name = 'Nat'"); $sth = $dbh->prepare("SELECT name,salary FROM salaries WHERE name = 'Nat'"); $sth->execute( ); while (@row = $sth->fetchrow_array) { # ... } $sth->finish( ); $dbh->disconnect( );A "table" in CSV terms is a file (the table name becomes the filename). The tables are kept in the directory specified by thef_dirparameter in theconnectmethod call. The DBD::CSV module supportsCREATEandDROPto make and destroy tables:$dbh->do("CREATE TABLE salaries (salary FLOAT, name CHAR(20))");Valid column types are: TINYINT, BIGINT, LONGVARBINARY, VARBINARY, BINARY, LONGVARCHAR, CHAR, NUMERIC, DECIMAL, INTEGER, SMALLINT, FLOAT, REAL, and DOUBLE.When you access a table, the DBD::CSV module locks the corresponding file with the flock(2) syscall. If flock(2) isn't supported on the filesystem containing the CSV file, two processes will be able to access the file at the same time, possibly leading to incorrect results or lost data.If you're reading or writing an Excel CSV file, you need to tell the DBD::CSV module that the value separator is actually a semicolon:$dbh = DBI->connect('dbi:CSV:f_dir=/home/gnat/payroll;csv_sep_char=\;');We need to quote the semicolon to preventconnectfrom thinking it's separatingcsv_sep_char=from another connection attribute. We use single quotes rather than double quotes to avoid having to backslash the backslash:$dbh = DBI->connect("dbi:CSV:f_dir=/home/gnat/payroll;csv_sep_char=\\;");Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Using SQL Without a Database Server
- InhaltsvorschauYou want to make complex SQL queries but don't want to maintain a relational database server.Use the DBD::SQLite module from CPAN:
use DBI; $dbh = DBI->connect("dbi:SQLite:dbname=/Users/gnat/salaries.sqlt", "", "", { RaiseError => 1, AutoCommit => 1 }); $dbh->do("UPDATE salaries SET salary = 2 * salary WHERE name = 'Nat'"); $sth = $dbh->prepare("SELECT id,deductions FROM salaries WHERE name = 'Nat'"); # ...An SQLite database lives in a single file, specified with thedbnameparameter in the DBI constructor. Unlike most relational databases, there's no database server here—DBD::SQLite interacts directly with the file. Multiple processes can read from the same database file at the same time (with SELECTs), but only one process can make changes (and other processes are prevented from reading while those changes are being made).SQLite supports transactions. That is, you can make a number of changes to different tables, but the updates won't be written to the file until you commit them:use DBI; $dbh = DBI->connect("dbi:SQLite:dbname=/Users/gnat/salaries.sqlt", "", "", { RaiseError => 1, AutoCommit => 0 }); eval { $dbh->do("INSERT INTO people VALUES (29, 'Nat', 1973)"); $dbh->do("INSERT INTO people VALUES (30, 'William', 1999)"); $dbh->do("INSERT INTO father_of VALUES (29, 30)"); $dbh->commit( ); }; if ($@) { eval { $dbh->rollback( ) }; die "Couldn't roll back transaction" if $@; }SQLite is a typeless database system. Regardless of the types specified when you created a table, you can put any type (strings, numbers, dates, blobs) into any field. Indeed, you can even create a table without specifying any types:CREATE TABLE people (id, name, birth_year);
The only time that data typing comes into play is when comparisons occur, either through WHERE clauses or when the database has to sort values. The database ignores the type of the column and looks only at the type of the specific value being compared. Like Perl, SQLite recognizes only strings and numbers. Two numbers are compared as floating-point values, two strings are compared as strings, and a number is always less than a string when values of two different types are compared.Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Program: ggh—Grep Netscape Global History
- InhaltsvorschauThis program divulges the contents of Netscape's history.db file. It can be called with full URLs or with a (single) pattern. If called without arguments, it displays every entry in the history file. The ~/.netscape/history.db file is used unless the -database option is given.Each output line shows the URL and its access time. The time is converted into
localtimerepresentation with -localtime (the default) orgmtimerepresentation with -gmtime—or left in raw form with -epochtime, which is useful for sorting by date.To specify a pattern to match against, give one single argument without a://.To look up one or more URLs, supply them as arguments:% ggh http://www.perl.com/index.html
To find out a link you don't quite recall, use a regular expression (a single argument without a://is a pattern):% ggh perl
To find out everyone you've mailed:% ggh mailto:
To find out the FAQ sites you've visited, use a snazzy Perl pattern with an embedded/imodifier:% ggh -regexp '(?i)\bfaq\b'
If you don't want the internal date converted to localtime, use -epoch:% ggh -epoch http://www.perl.com/perl/
If you prefer gmtime to localtime, use -gmtime:% ggh -gmtime http://www.perl.com/perl/
To look at the whole file, give no arguments (but perhaps redirect to a pager):% ggh | less
If you want the output sorted by date, use the -epoch flag:% ggh -epoch | sort -rn | less
If you want it sorted by date into your local time zone format, use a more sophisticated pipeline:% ggh -epoch | sort -rn | perl -pe 's/\d+/localtime $&/e' | less
The Netscape release notes claim that they're using NDBM format. This is misleading: they're actually using Berkeley DB format, which is why we require DB_File (not supplied standard with all systems Perl runs on) instead of NDBM_File (which is). The program is shown in Example 14-7.Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Chapter 15: Interactivity
- InhaltsvorschauAnd then the Windows failed—and then I could not see to see—Emily Dickinson, "I heard a Fly buzz—when I died"Everything we use has a user interface: VCRs, computers, telephones, even books. Our programs have user interfaces: do we have to supply arguments on the command line? Can we drag and drop files into the program? Do we have to press
Enterafter every response we make, or can the program read a single keystroke at a time?This chapter won't discuss designing user interfaces: entire bookshelves are filled with books written on the subject. Instead, we focus on implementing user interfaces—parsing command-line arguments, reading a character at a time, writing anywhere on the screen, and writing a graphical user interface.The simplest user interface is what we are called line mode interfaces. Line mode programs normally read entire lines and write characters or entire lines. Filters like grep and utilities like mail exemplify this type of interface. We don't really talk much about this type of interface in this chapter, because so much of the rest of the book does.A more complex interface is what is called full-screen mode. Programs such as vi, elm, and lynx have full-screen interfaces. They read single characters at a time and can write to any character position on the screen. We address this type of interface in Recipe 15.4, Recipe 15.6, Recipe 15.9, Recipe 15.10, and Recipe 15.11.Still more complex are the graphical user interfaces (GUIs). Programs with GUIs can address individual pixels, not just characters. GUIs often follow a windowing metaphor, in which a program creates windows that appear on the user's display device. The windows are filled with widgets, which include things like scrollbars to drag or buttons to click. Netscape Navigator provides a full graphical user interface, as does your window manager. Perl can use many GUI toolkits, but here we'll cover the Tk toolkit, since it's the most well-known and portable. See Recipe 15.14, Recipe 15.15, and Recipe 15.22.Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Introduction
- InhaltsvorschauEverything we use has a user interface: VCRs, computers, telephones, even books. Our programs have user interfaces: do we have to supply arguments on the command line? Can we drag and drop files into the program? Do we have to press
Enterafter every response we make, or can the program read a single keystroke at a time?This chapter won't discuss designing user interfaces: entire bookshelves are filled with books written on the subject. Instead, we focus on implementing user interfaces—parsing command-line arguments, reading a character at a time, writing anywhere on the screen, and writing a graphical user interface.The simplest user interface is what we are called line mode interfaces. Line mode programs normally read entire lines and write characters or entire lines. Filters like grep and utilities like mail exemplify this type of interface. We don't really talk much about this type of interface in this chapter, because so much of the rest of the book does.A more complex interface is what is called full-screen mode. Programs such as vi, elm, and lynx have full-screen interfaces. They read single characters at a time and can write to any character position on the screen. We address this type of interface in Recipe 15.4, Recipe 15.6, Recipe 15.9, Recipe 15.10, and Recipe 15.11.Still more complex are the graphical user interfaces (GUIs). Programs with GUIs can address individual pixels, not just characters. GUIs often follow a windowing metaphor, in which a program creates windows that appear on the user's display device. The windows are filled with widgets, which include things like scrollbars to drag or buttons to click. Netscape Navigator provides a full graphical user interface, as does your window manager. Perl can use many GUI toolkits, but here we'll cover the Tk toolkit, since it's the most well-known and portable. See Recipe 15.14, Recipe 15.15, and Recipe 15.22.The final class of UIs is one we won't address here—web user interfaces. Increasingly, people are eschewing the complicated programming of a fully responsive GUI whose every pixel is addressable, preferring relatively clunky and plain-looking HTML pages. After all, everyone has a web browser, but not everyone can figure out how to install Perl/Tk. We cover the Web in Chapter 19, Chapter 20, and Chapter 21.Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Parsing Program Arguments
- InhaltsvorschauYou want to let users change your program's behavior by giving options on the command line. For instance, you want to allow the user to control the level of output that your program produces with a -v (verbose) option.Use the standard Getopt::Std module to permit single-character options:
use Getopt::Std; # -v ARG, -D ARG, -o ARG, sets $opt_v, $opt_D, $opt_o getopt("vDo"); # -v ARG, -D ARG, -o ARG, sets $args{v}, $args{D}, $args{o} getopt("vDo", \%args); getopts("vDo:"); # -v, -D, -o ARG, sets $opt_v, $opt_D, $opt_o getopts("vDo:", \%args); # -v, -D, -o ARG, sets $args{v}, $args{D}, $args{o}Or, use the standard Getopt::Long module to permit named arguments:use Getopt::Long; GetOptions( "verbose" => \$verbose, # --verbose "Debug" => \$debug, # --Debug "output=s" => \$output ); # --output=string or --output stringMost traditional programs like ls and rm take single-character options (also known as flags or switches), such as -l and -r. In the case of ls -l and rm -r, the argument is Boolean: either it is present or it isn't. Contrast this with gcc -o compiledfile source.c, where compiledfile is a value associated with the option -o. We can combine Boolean options into a single option in any order. For example:% rm -r -f /tmp/testdir
Another way of saying this is:% rm -rf /tmp/testdir
The Getopt::Std module, part of the standard Perl distribution, parses these types of traditional options. Itsgetoptfunction takes a single string of characters (each corresponding to an option that takes a value), parses the command-line arguments stored in@ARGV, and sets a global variable for each option. For example, the value for the -D option will be stored inEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Testing Whether a Program Is Running Interactively
- InhaltsvorschauYou want to know whether your program is being called interactively or not. For instance, a user running your program from a shell is interactive, whereas the program being called from cron is not.Use
-tto testSTDINandSTDOUT:sub I_am_interactive { return -t STDIN && -t STDOUT; }If you're on a POSIX system, test process groups:use POSIX qw/getpgrp tcgetpgrp/; sub I_am_interactive { my $tty; open($tty, "<", "/dev/tty") or die "can't open /dev/tty: $!"; my $tpgrp = tcgetpgrp(fileno($tty)); my $pgrp = getpgrp( ); close $tty; return ($tpgrp = = $pgrp); }The-tfile test operator tells whether the filehandle or file is a tty device. Such devices are signs of interactive use. This only tells you whether your program has been redirected. Running your program from the shell and redirectingSTDINandSTDOUTmakes the-tversion ofI_am_interactivereturn false. Called from cron,I_am_interactivealso returns false.The POSIX test tells you whether your program has exclusive control over its tty. A program whose input and output has been redirected still can control its tty if it wants to, so the POSIX version ofI_am_interactivereturns true. A program run from cron has no tty, soI_am_interactivereturns false.WhicheverI_am_interactiveyou choose to use, here's how you'd call it:while (1) { if (I_am_interactive( )) { print "Prompt: "; } $line = <STDIN>; last unless defined $line; # do something with the line }Or, more clearly:sub prompt { print "Prompt: " if I_am_interactive( ) } for (prompt( ); $line = <STDIN>; prompt( )) { # do something with the line }The documentation for the standard POSIX module, also in Chapter 32 ofEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Clearing the Screen
- InhaltsvorschauYou want to clear the screen.Use the Term::Cap module to send the appropriate character sequence. Use POSIX:: Termios to get the output speed of the terminal (or guess 9600 bps). Use
evalto trap any exceptions that arise using POSIX::Termios.use Term::Cap; $OSPEED = 9600; eval { require POSIX; my $termios = POSIX::Termios->new( ); $termios->getattr; $OSPEED = $termios->getospeed; }; $terminal = Term::Cap->Tgetent({OSPEED=>$OSPEED}); $terminal->Tputs('cl', 1, STDOUT);Or, just run the clear command:system("clear");If you clear the screen a lot, cache the return value from the termcap or clear command:$clear = $terminal->Tputs('cl'); $clear = `clear`;Then you can clear the screen a hundred times without running clear a hundred times:print $clear;
Your system's clear(1) and termcap(5) manpages (if you have them); the documentation for the standard module Term::Cap module, also in Chapter 32 of Programming Perl; the documentation for the Term::Lib module from CPANEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Determining Terminal or Window Size
- InhaltsvorschauYou need to know the size of the terminal or window. For instance, you want to format text so that it doesn't pass the righthand boundary of the screen.Either use the
ioctldescribed in Recipe 12.17, or else use the CPAN module Term::ReadKey:use Term::ReadKey; ($wchar, $hchar, $wpixels, $hpixels) = GetTerminalSize( );
GetTerminalSizereturns four elements: the width and height in characters and the width and height in pixels. If the operation is unsupported for the output device (for instance, if output has been redirected to a file), it returns an empty list.Here's how you'd graph the contents of@values, assuming no value is less than 0:use Term::ReadKey; ($width) = GetTerminalSize( ); die "You must have at least 10 characters" unless $width >= 10; $max = 0; foreach (@values) { $max = $_ if $max < $_; } $ratio = ($width-10)/$max; # chars per unit foreach (@values) { printf("%8.1f %s\n", $_, "*" x ($ratio*$_)); }The documentation for the Term::ReadKey module from CPAN; Recipe 12.17Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Changing Text Color
- InhaltsvorschauYou want text to appear in different colors on the screen. For instance, you want to emphasize a mode line or highlight an error message.Use the CPAN module Term::ANSIColor to send the ANSI color-change sequences to the user's terminal:
use Term::ANSIColor; print color("red"), "Danger, Will Robinson!\n", color("reset"); print "This is just normal text.\n"; print colored("<BLINK>Do you hurt yet?</BLINK>", "blink");Or, you can use convenience functions from Term::ANSIColor:use Term::ANSIColor qw(:constants); print RED, "Danger, Will Robinson!\n", RESET;
Term::ANSIColor prepares escape sequences that some (but far from all) terminals will recognize. For example, if you normally launch a color-xterm, this recipe will work. If you normally use the normal xterm program, or have a vt100 in your kitchen, it won't.There are two ways of using the module: either by calling the exported functionscolor($attribute)andcolored($text,$attribute), or by using convenience functions likeBOLD,BLUE, andRESET.Attributes can be a combination of colors and controls. The colors areblack,red,green,yellow,blue,magenta,on_black,on_red,on_green,on_yellow,on_blue,on_magenta,on_cyan, andon_white. (Apparently orange and purple don't matter.) The controls areclear,reset,bold,underline,underscore,blink,reverse, andconcealed.clearandresetare synonyms, as areunderlineandunderscore.resetrestores the colors to the way they were when the program started, andconcealedmakes foreground and background colors the same.You can combine attributes:# rhyme for the deadly coral snake print color("red on_black"), "venom lack\n"; print color("red on_yellow"), "kill that fellow\n"; print color("green on_cyan blink"), "garish!\n"; print color("reset");Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Reading Single Characters from the Keyboard
- InhaltsvorschauYou want to read a single character from the keyboard. For instance, you've displayed a menu of one-character options, and you don't want to require users to press the Enter key to make their selection.Use the CPAN module Term::ReadKey to put the terminal into
cbreakmode, read characters fromSTDIN, and then put the terminal back into its normal mode:use Term::ReadKey; ReadMode 'cbreak'; $key = ReadKey(0); ReadMode 'normal';
Term::ReadKey can put the terminal into many modes—cbreakis just one of them.cbreakmode makes each character available to your program as it is typed (see Example 15-1). It also echoes the characters to the screen; see Recipe 15.10 for an example of a mode that does not echo.Example 15-1. sascii#!/usr/bin/perl -w # sascii - Show ASCII values for keypresses use Term::ReadKey; ReadMode('cbreak'); print "Press keys to see their ASCII values. Use Ctrl-C to quit.\n"; while (1) { $char = ReadKey(0); last unless defined $char; printf(" Decimal: %d\tHex: %x\n", ord($char), ord($char)); } ReadMode('normal');Usingcbreakmode doesn't prevent the terminal's device driver from interpreting end-of-file and flow-control characters. If you want to be able to read a real Ctrl-C (which normally sends aSIGINTto your process) or a Ctrl-D (which indicates end-of-file under Unix), you want to userawmode.An argument of0toReadKeyindicates that we want a normal read usinggetc. If no input is available, the program will pause until there is some. We can also pass-1to indicate a non-blocking read, or a number greater than0to indicate the number of seconds to wait for input to become available; fractional seconds are allowed. Non-blocking reads and timed-out reads return eitherEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Ringing the Terminal Bell
- InhaltsvorschauYou want to sound an alarm on the user's terminal.Print the "
\a" character to sound a bell:print "\aWake up!\n";
Or use the "vb" terminal capability to show a visual bell:use Term::Cap; $OSPEED = 9600; eval { require POSIX; my $termios = POSIX::Termios->new( ); $termios->getattr; $OSPEED = $termios->getospeed; }; $terminal = Term::Cap->Tgetent({OSPEED=>$OSPEED}); $vb = ""; eval { $terminal->Trequire("vb"); $vb = $terminal->Tputs('vb', 1); }; print $vb; # ring visual bellThe "\a" escape is the same as "\cG", "\007", and "\x07". They all correspond to the ASCII BEL character and cause an irritating ding. In a crowded terminal room at the end of the semester, this beeping caused by dozens of vi novices all trying to get out of insert mode at once can be maddening. The visual bell is a workaround to avoid irritation. Based upon the polite principle that terminals should be seen and not heard (at least, not in crowded rooms), some terminals let you briefly reverse the foreground and background colors to give a flash of light instead of an audible ring.Not every terminal supports the visual bell, which is why weevalthe code that finds it. If the terminal doesn't support it,Trequirewilldiewithout having changed the value of$vbfrom "". If the terminal does support it, the value of$vbwill be set to the character sequence to flash the bell.There's a better approach to the bell issue in graphical terminal systems like xterm. Many of these let you enable the visual bell from the enclosing application itself, allowing all programs that blindly output achr(7)to become less noisy.The section on "String Literals" in Chapter 2 of Programming PerlEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Using POSIX termios
- InhaltsvorschauYou'd like to manipulate your terminal characteristics directly.Use the POSIX
termiosinterface.Think of everything you can do with the stty command—you can set everything from special characters to flow control and carriage-return mapping. The standard POSIX module provides direct access to the low-level terminal interface to implement stty-like capabilities in your program.Example 15-2 finds what your tty's erase and kill characters are (probably backspace and Ctrl-U). Then it sets them back to their original values out of antiquity,#and@, and has you type something. It restores them when done.Example 15-2. demo POSIX termios#!/usr/bin/perl -w # demo POSIX termios use POSIX qw(:termios_h); $term = POSIX::Termios->new; $term->getattr(fileno(STDIN)); $erase = $term->getcc(VERASE); $kill = $term->getcc(VKILL); printf "Erase is character %d, %s\n", $erase, uncontrol(chr($erase)); printf "Kill is character %d, %s\n", $kill, uncontrol(chr($kill)); $term->setcc(VERASE, ord('#')); $term->setcc(VKILL, ord('@')); $term->setattr(1, TCSANOW); print("erase is #, kill is @; type something: "); $line = <STDIN>; print "You typed: $line"; $term->setcc(VERASE, $erase); $term->setcc(VKILL, $kill); $term->setattr(1, TCSANOW); sub uncontrol { local $_ = shift; s/([\200-\377])/sprintf("M-%c",ord($1) & 0177)/eg; s/([\0-\37\177])/sprintf("^%c",ord($1) ^ 0100)/eg; return $_; }Here's a module called HotKey that implements areadkeyfunction in pure Perl. It doesn't provide any benefit over Term::ReadKey, but it shows POSIX termios in action:# HotKey.pm package HotKey; @ISA = qw(Exporter); @EXPORT = qw(cbreak cooked readkey); use strict; use POSIX qw(:termios_h); my ($term, $oterm, $echo, $noecho, $fd_stdin); $fd_stdin = fileno(STDIN); $term = POSIX::Termios->new( ); $term->getattr($fd_stdin); $oterm = $term->getlflag( ); $echo = ECHO | ECHOK | ICANON; $noecho = $oterm & ~$echo; sub cbreak { $term->setlflag($noecho); # ok, so i don't want echo either $term->setcc(VTIME, 1); $term->setattr($fd_stdin, TCSANOW); } sub cooked { $term->setlflag($oterm); $term->setcc(VTIME, 0); $term->setattr($fd_stdin, TCSANOW); } sub readkey { my $key = ''; cbreak( ); sysread(STDIN, $key, 1); cooked( ); return $key; } END { cooked( ) } 1;Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Checking for Waiting Input
- InhaltsvorschauYou want to know whether keyboard input is waiting without actually reading it.Use the CPAN module Term::ReadKey, and try to read a key in non-blocking mode by passing it an argument of
-1:use Term::ReadKey; ReadMode ('cbreak'); if (defined ($char = ReadKey(-1)) ) { # input was waiting and it was $char } else { # no input was waiting } ReadMode ('normal'); # restore normal tty settingsThe-1parameter toReadKeyindicates a non-blocking read of a character. If no character is available,ReadKeyreturnsundef.The documentation for the Term::ReadKey module from CPAN; Recipe 15.6Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Reading Passwords
- InhaltsvorschauYou want to read input from the keyboard without the keystrokes being echoed on the screen. For instance, you want to read passwords as passwd does, i.e., without displaying the user's password.Use the CPAN module Term::ReadKey, set the input mode to
noecho, and then useReadLine:use Term::ReadKey; ReadMode('noecho'); $password = ReadLine(0);Example 15-3 shows how to verify a user's password. If your system uses shadow passwords, only the superuser can get the encrypted form of the password withgetpwuid. Everyone else just gets*as the password field of the database, which is useless for verifying passwords.Example 15-3. checkuser#!/usr/bin/perl -w # checkuser - demonstrates reading and checking a user's password use Term::ReadKey; print "Enter your password: "; ReadMode 'noecho'; $password = ReadLine 0; chomp $password; ReadMode 'normal'; print "\n"; ($username, $encrypted) = ( getpwuid $<)[0,1]; if (crypt($password, $encrypted) ne $encrypted) { die "You are not $username\n"; } else { print "Welcome, $username\n"; }The documentation for the Term::ReadKey module from CPAN; thecryptandgetpwuidfunctions in Chapter 29 of Programming Perl and in perlfunc(1), which demonstrate using the stty(1) command; your system's crypt(3) and passwd(5) manpages (if you have them)Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Editing Input
- InhaltsvorschauYou want a user to be able to edit a line before sending it to you for reading.Use the standard Term::ReadLine library along with the Term::ReadLine::Gnu module from CPAN:
use Term::ReadLine; $term = Term::ReadLine->new("APP DESCRIPTION"); $OUT = $term->OUT || *STDOUT; $term->addhistory($fake_line); $line = $term->readline($prompt); print $OUT "Any program output\n";The program in Example 15-4 acts as a crude shell. It reads a line and passes it to the shell to execute. Thereadlinemethod reads a line from the terminal, with editing and history recall. It automatically adds the user's line to the history.Example 15-4. vbsh#!/usr/bin/perl -w # vbsh - very bad shell use strict; use Term::ReadLine; use POSIX qw(:sys_wait_h); my $term = Term::ReadLine->new("Simple Shell"); my $OUT = $term->OUT( ) || *STDOUT; my $cmd; while (defined ($cmd = $term->readline('$ ') )) { my @output = `$cmd`; my $exit_value = $? >> 8; my $signal_num = $? & 127; my $dumped_core = $? & 128; printf $OUT "Program terminated with status %d from signal %d%s\n", $exit_value, $signal_num, $dumped_core ? " (core dumped)" : ""; print @output; $term->addhistory($cmd); }If you want to seed the history with your own functions, use theaddhistorymethod:$term->addhistory($seed_line);
You can't seed with more than one line at a time. To remove a line from the history, use theremove_historymethod, which takes an index into the history list.0is the first (least recent) entry,1the second, and so on up to the most recent history lines.$term->remove_history($line_number);
To get a list of history lines, use theGetHistorymethod, which returns a list of the lines:Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Managing the Screen
- InhaltsvorschauYou want to control the screen layout or highlighting, detect when special keys are pressed, or present full-screen menus, but you don't want to think about what kind of display device the user has.Use the Curses module from CPAN, which makes use of your native curses(3) library.The curses library provides easy access to the full-screen display in an efficient and device-independent fashion. (By display, we mean any cursor-addressable monitor.) With Curses, you write high-level code to put data on the logical display, building it up character by character or string by string. When you want output to show up, call the
refreshfunction. The library generates output consisting only of the changes on the virtual display since the last call torefresh. This is particularly appreciated on a slow connection.The example program in Example 15-5, called rep, demonstrates this. Call it with arguments of the program to run, like any of these:% rep ps aux % rep netstat % rep -2.5 lpq
The rep script will repeatedly call the listed command, printing its output to the screen, updating only what changed since the previous run. This is most effective when the changes between runs are small. It maintains the current date in reverse video at the bottom-right corner of your screen.By default, rep waits 10 seconds before rerunning the command. You can change this delay period by calling it an optional number of seconds (which can be a decimal number) as shown in the previous example when calling lpq. You may also hit any key during the pause for it to run the command right then.Example 15-5. rep#!/usr/bin/perl -w # rep - screen repeat command use strict; use Curses; my $timeout = 10; if (@ARGV && $ARGV[0] =~ /^-(\d+\.?\d*)$/) { $timeout = $1; shift; } die "usage: $0 [ -timeout ] cmd args\n" unless @ARGV; initscr( ); # start screen noecho( ); cbreak( ); nodelay(1); # so getch( ) is non-blocking $SIG{INT} = sub { done("Ouch!") }; sub done { endwin( ); print "@_\n"; exit; } while (1) { while ((my $key = getch( )) ne ERR) { # maybe multiple keys done("See ya") if $key eq 'q' } my @data = `(@ARGV) 2>&1`; # gather output+errors for (my $i = 0; $i < $LINES; $i++) { addstr($i, 0, $data[$i] || ' ' x $COLS); } standout( ); addstr($LINES-1, $COLS - 24, scalar localtime); standend( ); move(0,0); refresh( ); # flush new output to display my ($in, $out) = ('', ''); vec($in,fileno(STDIN),1) = 1; # look for key on stdin select($out = $in,undef,undef,$timeout);# wait up to this long }Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Controlling Another Program with Expect
- InhaltsvorschauYou want to automate interaction with a full-screen program that expects to have a terminal behind
STDINandSTDOUT.Use the Expect module from CPAN:use Expect; $command = Expect->spawn("program to run") or die "Couldn't start program: $!\n"; # prevent the program's output from being shown on our STDOUT $command->log_stdout(0); # wait 10 seconds for "Password:" to appear unless ($command->expect(10, "Password")) { # timed out } # wait 20 seconds for something that matches /[lL]ogin: ?/ unless ($command->expect(20, -re => '[lL]ogin: ?')) { # timed out } # wait forever for "invalid" to appear unless ($command->expect(undef, "invalid")) { # error occurred; the program probably went away } # send "Hello, world" and a carriage return to the program print $command "Hello, world\r"; # if the program will terminate by itself, finish up with $command->soft_close( ); # if the program must be explicitly killed, finish up with $command->hard_close( );This module requires two other modules from CPAN: IO::Pty and IO::Stty. It sets up a pseudo-terminal to interact with programs that insist on talking to the terminal device driver. People often use this for talking to passwd to change passwords. telnet (Net::Telnet, described in Recipe 18.6, is probably more suitable and portable) and ftp are also programs that expect a real tty.Start the program you want to run withExpect->spawn, passing a program name and arguments either in a single string or as a list. Expect starts the program and returns an object representing that program, orundefif the program couldn't be started.To wait for the program to emit a particular string, use theexpectmethod. Its first argument is the number of seconds to wait for the string, orundefto wait forever. To wait for a string, give that string as the second argument toEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Creating Menus with Tk
- InhaltsvorschauYou want to create a window that has a menu bar at the top.Use the Tk Menubutton and Frame widgets:
use Tk; $main = MainWindow->new( ); # Create a horizontal space at the top of the window for the # menu to live in. $menubar = $main->Frame(-relief => "raised", -borderwidth => 2) ->pack (-anchor => "nw", -fill => "x"); # Create a button labeled "File" that brings up a menu $file_menu = $menubar->Menubutton(-text => "File", -underline => 1) ->pack (-side => "left" ); # Create entries in the "File" menu $file_menu->command(-label => "Print", -command => \&Print);This is considerably easier if you use the-menuitemsshortcut:$file_menu = $menubar->Menubutton(-text => "File", -underline => 1, -menuitems => [ [ Button => "Print",-command => \&Print ], [ Button => "Save",-command => \&Save ] ]) ->pack(-side => "left");Menus in applications can be viewed as four separate components working together: Frames, Menubuttons, Menus, and Menu Entries. The Frame is the horizontal bar at the top of the window that the menu resides in (the menubar). Inside the Frame are a set of Menubuttons, corresponding to Menus: File, Edit, Format, Buffers, and so on. When the user clicks on a Menubutton, the Menubutton brings up the corresponding Menu, a vertically arranged list of Menu Entries.Options on a Menu are labels (Open, for example) or separators (horizontal lines dividing one set of entries from another in a single menu).Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Creating Dialog Boxes with Tk
- InhaltsvorschauYou want to create a dialog box, i.e., a new top-level window with buttons to make the window go away. The dialog box might also have other items, such as labels and text entry widgets for creating a fill-out form. You could use such a dialog box to collect registration information, and you want it to go away when registration is sent or if the user chooses not to register.For simple jobs, use the Tk::DialogBox widget:
use Tk::DialogBox; $dialog = $main->DialogBox( -title => "Register This Program", -buttons => [ "Register", "Cancel" ] ); # add widgets to the dialog box with $dialog->Add( ) # later, when you need to display the dialog box $button = $dialog->Show( ); if ($button eq "Register") { # ... } elsif ($button eq "Cancel") { # ... } else { # this shouldn't happen }A DialogBox has two parts: the bottom is a set of buttons, and the top has the widgets of your choosing.Showing a DialogBox pops it up and returns the button the user selected.Example 15-6 contains a complete program demonstrating the DialogBox.Example 15-6. tksample3#!/usr/bin/perl -w # tksample3 - demonstrate dialog boxes use Tk; use Tk::DialogBox; $main = MainWindow->new( ); $dialog = $main->DialogBox( -title => "Register", -buttons => [ "Register", "Cancel" ], ); # the top part of the dialog box will let people enter their names, # with a Label as a prompt $dialog->add("Label", -text => "Name")->pack( ); $entry = $dialog->add("Entry", -width => 35)->pack( ); # we bring up the dialog box with a button $main->Button( -text => "Click Here For Registration Form", -command => \®ister) ->pack(-side => "left"); $main->Button( -text => "Quit", -command => sub { exit } ) ->pack(-side => "left"); MainLoop; # # register # # Called to pop up the registration dialog box # sub register { my $button; my $done = 0; do { # show the dialog $button = $dialog->Show; # act based on what button they pushed if ($button eq "Register") { my $name = $entry->get; if (defined($name) && length($name)) { print "Welcome to the fold, $name\n"; $done = 1; } else { print "You didn't give me your name!\n"; } } else { print "Sorry you decided not to register.\n"; $done = 1; } } until $done; }Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Responding to Tk Resize Events
- InhaltsvorschauYou've written a Tk program, but your widget layout goes awry when the user resizes their window.You can prevent the user from resizing the window by intercepting the Configure event:
use Tk; $main = MainWindow->new( ); $main->bind('<Configure>' => sub { $xe = $main->XEvent; $main->maxsize($xe->w, $xe->h); $main->minsize($xe->w, $xe->h); });Or you can usepackto control how each widget resizes and expands when the user resizes its container:$widget->pack( -fill => "both", -expand => 1 ); $widget->pack( -fill => "x", -expand => 1 );
By default, packed widgets resize if their container changes size—they don't scale themselves or their contents to the new size. This can lead to empty space between widgets, or cropped or cramped widgets if the user resizes the window.One solution is to prevent resizing. Webindto theConfigureevent, which is sent when a widget's size or position changes, registering a callback to reset the window's size. This is how you'd ensure a pop-up error-message box couldn't be resized.You often want to let the user resize the application's windows. You must then define how each widget will react. Do this through the arguments to thepackmethod:-fillcontrols the dimensions the widget will resize in, and-expandcontrols whether the widget's size will change to match available space. The-expandoption takes a Boolean value, true or false. The-filloption takes a string indicating the dimensions the widget can claim space in: "x", "y", "both", or "none".The solution requires both options. Without-fill,-expandwon't claim space to grow into. Without-expand,-fillwill claim empty space but won't expand in it.Different parts of your application will behave differently. The main area of a web browser, for example, should probably change size in both dimensions when the window is resized. You'd pack the widget like this:Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Removing the DOS Shell Window with Windows Perl/Tk
- InhaltsvorschauYou have written a Perl program for the Windows port of Perl and Tk, but you get a DOS shell window every time you start your program.Add this to the start of your program:
BEGIN { if ($^O eq 'MSWin32') { require Win32::Console; Win32::Console::Free( ); } }The Win32::Console module lets you control the terminal window that launched your program. All you need to do is close that window (orFreeit in, in the peculiar parlance of the Windows API) and voilà—no pesky DOS shell window.The documentation for the Win32::Console module, which is included with distributions of Perl destined for Microsoft systemsEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Graphing Data
- InhaltsvorschauYou have numerical data that you want to represent as a bar, pie, or line chart.Use the GD::Graph::* modules from CPAN:
use GD::Graph::lines; # bars, lines, points, pie $chart = GD::Graph::lines->new(480,320); $chart->set(x_label => $X_AXIS_LABEL, # no axes for pie chart y_label => $Y_AXIS_LABEL, title => $GRAPH_TITLE, # ... more options possible ); $plot = $chart->plot($DATA_REF) or die $chart->error; # do something with $plot->png which is the image in PNG formHere is a sample data structure (every row must have the same number of values):$DATA_REF = [ [ 1990, 1992, 1993, 1995, 2002 ], # X values [ 10, 15, 18, 20, 25 ], # first dataset [ 9, undef,17, undef,12 ], # second dataset # ... ];The GD::Graph module requires you to have the GD module installed, which itself depends on a C library available fromhttp://www.boutell.com/gd/. Early versions of this library created GIF images, but since the owners of the GIF patent are cracking down, the library now emits PNG and JPEG images:$png_data = $plot->png; $jpg_data = $plot->jpeg;
The documentation for GD::Graph lists a large number of options you can fine-tune (colors, fonts, placement), but the most important ones are labels and the image title. There are no axes to label in pie charts, so thex_labelandy_labeloptions are not available. By default, pie charts are drawn with a pseudo-3D look, which you can disable by setting the3doption to a false value.Recipe 15.23 contains a program that (crudely) extracts the day of the week on which each mail message in a mailbox was sent, and then graphs that data.Documentation for the GD and GD::Graph modules;Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Thumbnailing Images
- InhaltsvorschauYou have a large image and you want to create a smaller version of that image, the thumbnail. For example, on a web site you might use a thumbnail as a preview to let readers see the basic image before they decide whether to download the larger original.Use the Image::Magick module from CPAN:
use Image::Magick; $image = Image::Magick->new( ); $image->Read($ORIGINAL_FILENAME); $image->Resize(geometry => '120x90'); $image->Write($THUMBNAIL_FILENAME);
The Image::Magick module is a frontend to the ImageMagick suite, available fromhttp://imagemagick.sourceforge.net. It handles many complex and powerful image manipulations, but here we're only concerned with the very simple resizing.TheResizemethod'sgeometryparameter indicates the new geometry (widthxheight). You can also specify percentages: '75%' to resize each axis proportionally to 3/4 of its original size, or '10%x30%' to resize the X axis to 10% of its original value and the Y axis to 30%.You can also specify a filter to use and how much to blur or sharpen the image with that filter:$image->Resize(geometry => '120x90', filter => 'Gaussian', blur => 2);Ablurvalue greater than 1 indicates blurring; a value less than 1 indicates sharpening. The valid filters arePoint,Box,Triangle,Hermite,Hanning,Hamming,Blackman,Gaussian,Quadratic,Cubic,Catrom,Mitchell,Lanczos,Bessel, andSinc.The documentation for the Image::Magick modules; Perl Graphics ProgrammingEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Adding Text to an Image
- InhaltsvorschauYou want to write text onto an existing image. For example, you want to add a small copyright message to all photos on your web site.Use the GD module from CPAN:
use GD; $image = GD::Image->new($FILENAME); $blue = $image->colorAllocate(0,0,255); $image->string(gdTinyFont, 10, 10, "Copyright Me, 2037", $blue); # write $image->png( ) to file
The GD module can load only certain file formats; precisely which depends on the C libraries available when the underlying C library of GD was built. At the time of this writing, GD could read and write PNG, JPEG, XBM, XPM, and WBMP (Windows Bitmap), as well as its own GD2 and GD formats.The arguments to thestringmethod are: the font to use, the x and y coordinates to draw at, the string to draw, and the color to draw the text in.GD comes with five fonts:gdTinyFont,gdSmallFont,gdMediumBoldFont,gdLargeFont, andgdGiantFont. If your GD was compiled to handle TrueType fonts, you can write with a TrueType font using:$image->stringFT($color, $font, $point_size, $angle, $x, $y, $string);
Here,$fontis the absolute pathname of the .ttf file containing the TrueType font. The$point_sizeand$angleparameters indicate the size (in points; fractions are acceptable) and rotation from horizontal (in radians). For example:$image->stringFT($blue, '/Users/gnat/fonts/arial.ttf', 8, 0, 10, 20, 'Copyright Me Me Me');The documentation for the GD module; Perl Graphics ProgrammingEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Program: Small termcap Program
- InhaltsvorschauThis program clears your screen and scribbles all over it until you interrupt it. It shows how to use Term::Cap to clear the screen, move the cursor, and write anywhere on the screen. It also uses Recipe 16.6.The program text is shown in Example 15-8.Example 15-8. tcapdemo
#!/usr/bin/perl -w # tcapdemo - show off direct cursor placement use POSIX; use Term::Cap; init( ); # Initialize Term::Cap. zip( ); # Bounce lines around the screen. finish( ); # Clean up afterward. exit( ); # Two convenience functions. clear_screen is obvious, and # clear_end clears to the end of the screen. sub clear_screen { $tcap->Tputs('cl', 1, *STDOUT) } sub clear_end { $tcap->Tputs('cd', 1, *STDOUT) } # Move the cursor to a particular location. sub gotoxy { my($x, $y) = @_; $tcap->Tgoto('cm', $x, $y, *STDOUT); } # Get the terminal speed through the POSIX module and use that # to initialize Term::Cap. sub init { $| = 1; $delay = (shift( ) || 0) * 0.005; my $termios = POSIX::Termios->new( ); $termios->getattr; my $ospeed = $termios->getospeed; $tcap = Term::Cap->Tgetent ({ TERM => undef, OSPEED => $ospeed }); $tcap->Trequire(qw(cl cm cd)); } # Bounce lines around the screen until the user interrupts with # Ctrl-C. sub zip { clear_screen( ); ($maxrow, $maxcol) = ($tcap->{_li} - 1, $tcap->{_co} - 1); @chars = qw(* - / | \ _ ); sub circle { push(@chars, shift @chars); } $interrupted = 0; $SIG{INT} = sub { ++$interrupted }; $col = $row = 0; ($row_sign, $col_sign) = (1,1); do { gotoxy($col, $row); print $chars[0]; select(undef, undef, undef, $delay); $row += $row_sign; $col += $col_sign; if ($row = = $maxrow) { $row_sign = -1; circle; } elsif ($row = = 0 ) { $row_sign = +1; circle; } if ($col = = $maxcol) { $col_sign = -1; circle; } elsif ($col = = 0 ) { $col_sign = +1; circle; } } until $interrupted; } # Clean up the screen. sub finish { gotoxy(0, $maxrow); clear_end( ); }Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Program: tkshufflepod
- InhaltsvorschauThis short program uses Tk to list the
=head1sections in the file using the Listbox widget, and it lets you drag the sections around to reorder them. When you're done, press "s" or "q" to save or quit. You can even double-click a section to view it with the Pod widget. It writes the section text to a temporary file in /tmp and removes the file when the Pod widget is destroyed.Call it with the name of the Pod file to view:% tkshufflepod chap15.pod
We used this a lot when we wrote this book.The program text is shown in Example 15-9.Example 15-9. tkshufflepod#!/usr/bin/perl -w # tkshufflepod - reorder =head1 sections in a pod file use Tk; use Tk::Pod; use strict; # declare variables my $podfile; # name of the file to open my $m; # main window my $l; # listbox my ($up, $down); # positions to move my @sections; # list of pod sections my $all_pod; # text of pod file (used when reading) # read the pod file into memory, and split it into sections. $podfile = shift || "-"; undef $/; open(F, " < $podfile") or die "Can't open $podfile : $!\n"; $all_pod = <F>; close(F); @sections = split(/(?= =head1)/, $all_pod); # turn @sections into an array of anonymous arrays. The first element # in each of these arrays is the original text of the message, while # the second element is the text following =head1 (the section title). foreach (@sections) { /(.*)/; $_ = [ $_, $1 ]; } # fire up Tk and display the list of sections. $m = MainWindow->new( ); $l = $m->Listbox('-width' => 60)->pack('-expand' => 1, '-fill' => 'both'); foreach my $section (@sections) { $l->insert("end", $section->[1]); } # permit dragging by binding to the Listbox widget. $l->bind( '<Any-Button>' => \&down ); $l->bind( '<Any-ButtonRelease>' => \&up ); # permit viewing by binding double-click $l->bind( '<Double-Button>' => \&view ); # 'q' quits and 's' saves $m->bind( '<q>' => sub { exit } ); $m->bind( '<s>' => \&save ); MainLoop; # down(widget): called when the user clicks on an item in the Listbox. sub down { my $self = shift; $down = $self->curselection;; } # up(widget): called when the user releases the mouse button in the # Listbox. sub up { my $self = shift; my $elt; $up = $self->curselection;; return if $down = = $up; # change selection list $elt = $sections[$down]; splice(@sections, $down, 1); splice(@sections, $up, 0, $elt); $self->delete($down); $self->insert($up, $sections[$up]->[1]); } # save(widget): called to save the list of sections. sub save { my $self = shift; open(F, "> $podfile") or die "Can't open $podfile for writing: $!"; print F map { $_->[0] } @sections; close F; exit; } # view(widget): called to display the widget. Uses the Pod widget. sub view { my $self = shift; my $temporary = "/tmp/$$-section.pod"; my $popup; open(F, "> $temporary") or warn ("Can't open $temporary : $!\n"), return; print F $sections[$down]->[0]; close(F); $popup = $m->Pod('-file' => $temporary); $popup->bind('<Destroy>' => sub { unlink $temporary } ); }Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Program: graphbox
- InhaltsvorschauThe graphbox program shown in Example 15-10 generates a bar graph of how many email messages were sent on each day of the week, using the GD::Graph::Bars module (see Recipe 15.18). It extracts the day of the week from the
Date: headers and then plots the results.Example 15-10. graphbox#!/usr/bin/perl -w # graphbox - graph number of messages by day of week they were sent use GD::Graph::bars; use Getopt::Std; use strict; my %count; # running total of messages for each day of the week my $chart; # the GD::Graph::bars object my $plot; # the GD object containing the actual graph my @DAYS = qw(Mon Tue Wed Thu Fri Sat Sun); my $day_re = join("|", @DAYS); $day_re = qr/$day_re/; # process options my %Opt; getopts('ho:', \%Opt); if ($Opt{h} or !$Opt{o}) { die "Usage:\n\t$0 -o outfile.png < mailbox\n"; } # extract dates from Date headers (guessing!) while (<>) { if (/^Date: .*($day_re)/) { $count{$1}++; } } # build graph $chart = GD::Graph::bars->new(480,320); $chart->set(x_label => "Day", y_label => "Messages", title => "Mail Activity"); $plot = $chart->plot([ [ @DAYS ], [ @count{@DAYS} ], ]); # save it open(F, "> $Opt{o}") or die "Can't open $Opt{o} for writing: $!\n"; print F $plot->png; close F;Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Chapter 16: Process Management and Communication
- InhaltsvorschauIt is quite a three-pipe problem, and I beg that you won't speak to me for fifty minutes.—Sherlock Holmes, The Red-Headed LeaguePerl may be many things to many people, but to most of us it is the glue that connects diverse components. This chapter is about launching commands and connecting separate processes together. It's about managing their creation, communication, and ultimate demise. It's about systems programming.When it comes to systems programming, Perl, as usual, makes easy things easy and hard things possible. If you want to use it as you would the shell, Perl is happy to assist you. If you want to roll up your sleeves for low-level hacking like a hardcore C programmer, you can do that, too.Because Perl lets you get so close to the system, portability issues can sneak in. This chapter is the most Unix-centric chapter of the book. It will be tremendously useful to those on Unix systems, but of limited use to others. (If you're not on Unix, consult the perlport(3) manpage that came with Perl to see which of the techniques we describe are available on other operating systems or emulated by Perl.) We deal with features that aren't as universal as strings and numbers and basic arithmetic. Most basic operations work more or less the same everywhere. But if you're not using some kind of Unix or other POSIX conformant system, most of the interesting features in this chapter may work differently for you—or not at all. Check the documentation that came with your Perl port if you aren't sure.You might even be pleasantly surprised. Windows users, for example, are often astonished to learn that Perl's
forkfunction, long unique to Unix, is supported on their platform. See perlfork(1).In this chapter, we cover the proper care and feeding of your own child processes. Sometimes this means launching a standalone command and letting it have its own way with the world (usingsystem). Other times it means keeping a tight rein on your child, feeding it carefully filtered input or taking hold of its output stream (backticks and pipedEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Introduction
- InhaltsvorschauPerl may be many things to many people, but to most of us it is the glue that connects diverse components. This chapter is about launching commands and connecting separate processes together. It's about managing their creation, communication, and ultimate demise. It's about systems programming.When it comes to systems programming, Perl, as usual, makes easy things easy and hard things possible. If you want to use it as you would the shell, Perl is happy to assist you. If you want to roll up your sleeves for low-level hacking like a hardcore C programmer, you can do that, too.Because Perl lets you get so close to the system, portability issues can sneak in. This chapter is the most Unix-centric chapter of the book. It will be tremendously useful to those on Unix systems, but of limited use to others. (If you're not on Unix, consult the perlport(3) manpage that came with Perl to see which of the techniques we describe are available on other operating systems or emulated by Perl.) We deal with features that aren't as universal as strings and numbers and basic arithmetic. Most basic operations work more or less the same everywhere. But if you're not using some kind of Unix or other POSIX conformant system, most of the interesting features in this chapter may work differently for you—or not at all. Check the documentation that came with your Perl port if you aren't sure.You might even be pleasantly surprised. Windows users, for example, are often astonished to learn that Perl's
forkfunction, long unique to Unix, is supported on their platform. See perlfork(1).In this chapter, we cover the proper care and feeding of your own child processes. Sometimes this means launching a standalone command and letting it have its own way with the world (usingsystem). Other times it means keeping a tight rein on your child, feeding it carefully filtered input or taking hold of its output stream (backticks and pipedopens). Without even starting a new process, you can useEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Gathering Output from a Program
- InhaltsvorschauYou want to run a program and collect its output into a variable.Either use backticks:
$output = `program args`; # collect output into one multiline string @output = `program args`; # collect output into array, one line per element
or use Recipe 16.4:open(my $fh, "-|", "program", @args) or die "Can't run program: $!\n"; while (<$fh>) { $output .= $_; } close $fh;The backticks are a convenient way to run other programs and gather their output. The backticks do not return until the called program exits. Perl goes to some trouble behind the scenes to collect the output, so it is inefficient to use the backticks and ignore their return value:`fsck -y /dev/rsd1a`; # BAD AND SCARY
The backtick operator calls the shell to run the command. This makes it unsafe when used in a program with special privileges, but lets you use shell wildcards in the command:@files = `ls -1 /music/*.mp3`;
If you want to read the output of a wildcarded command line as it's generated (and don't mind the potential security problems), use this form ofopen:open(README, "ls -l /music/*.mp3 |") or die "Can't run program: $!\n"; while(<README>) { # the latest line is in $_ } close(README);In versions of Perl before 5.8, this two-argument form ofopenwas the only one available to you. In those versions of Perl, you wrote the solution as:open(FH, "program @args |") or die "Can't run program: $!\n";
Here's a low-level workaround, usingpipe(to create two connected filehandles),fork(to split off a new process), andexec(to replace the new process with the program to read from):use POSIX qw(:sys_wait_h); my ($readme, $writeme); pipe $readme, $writeme; if ($pid = fork) { # parent $SIG{CHLD} = sub { 1 while ( waitpid(-1, WNOHANG)) > 0 }; close $writeme; } else { die "cannot fork: $!" unless defined $pid; # child open(STDOUT, ">&=", $writeme) or die "Couldn't redirect STDOUT: $!"; close $readme; exec($program, $arg1, $arg2) or die "Couldn't run $program : $!\n"; } while (<$readme>) { $string .= $_; # or push(@strings, $_); } close($readme);Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Running Another Program
- InhaltsvorschauFrom one program you want to run another, pause until it is done, and then continue with the original program. The other program should have the same
STDINandSTDOUTas you have.Callsystemwith a string to have the shell interpret the string as a command line:$status = system("vi $myfile");If you don't want the shell involved, passsystema list:$status = system("vi", $myfile);Thesystemfunction is the simplest and most generic way to run another program in Perl. It doesn't gather the program'sSTDOUTlike backticks oropen. Instead, its return value is (essentially) that program's exit status. While the new program is running, your main program is suspended, so the new program can read from yourSTDINand write to yourSTDOUTso users can interact with it.Likeopen,exec, and backticks,systemuses the shell to start the program whenever it's called with one argument. This is convenient when you want to do redirection or other tricks:system("cmd1 args | cmd2 | cmd3 >outfile"); system("cmd args < infile >outfile 2>errfile");To avoid the shell, callsystemwith a list of arguments:$status = system($program, $arg1, $arg); die "$program exited funny: $?" unless $status = = 0;
The returned status value is not just the exit value: it includes the signal number (if any) that the process died from. This is the same value thatwaitsets$?to. See Recipe 16.19 to learn how to decode this value.Thesystemfunction ignoresSIGINTandSIGQUITwhile child processes are running. That way those signals will kill only the child process. If you want your main program to die as well, check the return value ofsystemor the value of the$?variable.if (($signo = system(@arglist)) &= 127) { die "program killed by signal $signo\n"; }Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Replacing the Current Program with a Different One
- InhaltsvorschauYou want to replace the running program with another, as when checking parameters and setting up the initial environment before running another program.Use the built-in
execfunction. Ifexecis called with a single argument containing metacharacters, the shell will be used to run the program:exec("archive *.data") or die "Couldn't replace myself with archive: $!\n";If you passexecmore than one argument, the shell will not be used:exec("archive", "accounting.data") or die "Couldn't replace myself with archive: $!\n";If called with a single argument containing no shell metacharacters, the argument will be split on whitespace and then interpreted as though the resulting list had been passed toexec:exec("archive accounting.data") or die "Couldn't replace myself with archive: $!\n";Theexecfunction in Perl is a direct interface to the execlp(2) syscall, which replaces the current program with another, leaving the process intact. The program that callsexecgets wiped clean, and its place in the operating system's process table is taken by the program specified in the arguments toexec. As a result, the new program has the same process ID ($$) as the original program. If the specified program couldn't be run,execreturns a false value and the original program continues. Be sure to check for this.As withsystem(see Recipe 16.2), an indirect object identifies the program to be run:exec { '/usr/local/bin/lwp-request' } 'HEAD', $url;The first real argument ('HEAD' here) is what the new program will be told it is. Some programs use this to control their behavior, and others use it for logging. The main use of this, however, is thatexeccalled with an indirect object will never use the shell to run the program.If youEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Reading or Writing to Another Program
- InhaltsvorschauYou want to run another program and either read its output or supply the program with input.Use
openwith a pipe symbol at the beginning or end. To read from a program, put the pipe symbol at the end:$pid = open $readme, "-|", "program", "arguments" or die "Couldn't fork: $!\n"; while (<$readme>) { # ... } close $readme or die "Couldn't close: $!\n";To write to the program, put the pipe at the beginning:$pid = open $writeme, "|-", "program", "arguments" or die "Couldn't fork: $!\n"; print $writeme "data\n"; close $writeme or die "Couldn't close: $!\n";In the case of reading, this is similar to using backticks, except you have a process ID and a filehandle, and the shell is never involved. If you want Perl to use the shell when it sees shell-special characters in its argument—for example, to let the shell do filename wildcard expansion and I/O redirection—then you must use the two-argument form ofopen:open($writeme, "| program args"); open($readme, "program args |");
However, sometimes this isn't desirable. Pipedopens that include unchecked user data would be unsafe while running in taint mode or in untrustworthy situations.Notice how we specifically callcloseon the filehandle. When you useopento connect a filehandle to a child process, Perl remembers this and automatically waits for the child when you close the filehandle. If the child hasn't exited by then, Perl waits until it does. This can be a very, very long wait if your child doesn't exit:$pid = open $f, "-|", "sleep", "100000"; # child goes to sleep close $f; # and the parent goes to lala land
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Filtering Your Own Output
- InhaltsvorschauYou want to postprocess your program's output without writing a separate program to do so.Use the forking form of
opento attach a filter to yourself. For example, this will restrict your program to a hundred lines of output:head(100); while (<>) { print; } sub head { my $lines = shift || 20; return if $pid = open STDOUT, "|-"; die "cannot fork: $!" unless defined $pid; while (<STDIN>) { print; last unless --$lines; } exit; }It's easy to add an output filter. Just use the forking open on your ownSTDOUT, and let the child filterSTDINtoSTDOUT, performing whatever alterations you care about. Notice that we install the output filter before we generate the output. This makes sense—you can't filter your output if it has already left your program. Any such filters should be applied in LIFO order—the last one inserted is the first one run.Here's an example that uses two output filters. One numbers lines; the other quotes the lines like a mail reply. When run on /etc/motd, you get something like:1: > Welcome to Linux, version 2.0.33 on a i686 2: > 3: > "The software required `Windows 95 or better' 4: > so I installed Linux.If you reversed the order of the two filters, you'd get:> 1: Welcome to Linux, Kernel version 2.0.33 on a i686 > 2 > 3: "The software required `Windows 95 or better' > 4: so I installed Linux.The program is in Example 16-1.Example 16-1. qnumcat#!/usr/bin/perl # qnumcat - demo additive output filters number( ); # push number filter on STDOUT quote( ); # push quote filter on STDOUT while (<>) { # act like /bin/cat print; } close STDOUT; # tell kids we're done--politely exit; sub number { my $pid; return if $pid = open STDOUT, "|-"; die "cannot fork: $!" unless defined $pid; while (<STDIN>) { printf "%d: %s", $., $_ } exit; } sub quote { my $pid; return if $pid = open STDOUT, "|-"; die "cannot fork: $!" unless defined $pid; while (<STDIN>) { print "> $_" } exit; }Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Preprocessing Input
- InhaltsvorschauYou'd like your programs to work on files with funny formats, such as compressed files or remote web documents specified with a URL, but your program only knows how to access regular text in local files.Take advantage of Perl's easy pipe handling by changing your input files' names to pipes before opening them.To autoprocess gzipped or compressed files by decompressing them with gzip, use:
@ARGV = map { /\.(gz|Z)$/ ? "gzip -dc $_ |" : $_ } @ARGV; while (<>) { # ....... }To fetch URLs before processing them, use the GET program from LWP (see Chapter 20):@ARGV = map { m#^\w+://# ? "GET $_ |" : $_ } @ARGV; while (<>) { # ....... }You might prefer to fetch just the text, of course, not the HTML. That just means using a different command, perhaps lynx -dump.As shown in Recipe 16.1, Perl's built-inopenfunction is magical: you don't have to do anything special to get Perl to open a pipe instead of a file. (That's why it's sometimes called magic open and, when applied to implicitARGVprocessing, magic ARGV.) If it looks like a pipe, Perl will open it like a pipe. We take advantage of this by rewriting certain filenames to include a decompression or other preprocessing stage. For example, the file "09tails.gz" becomes "gzcat-dc09tails.gz|".This technique has further applications. Suppose you wanted to read /etc/passwd if the machine isn't using NIS, and the output of ypcat passwd if it is. You'd use the output of the domainname program to decide if you're running NIS, and then set the filename to open to be either "</etc/passwd" or "ypcat passwd|":$pwdinfo = `domainname` =~ /^(\(none\))?$/ ? '</etc/passwd' : 'ypcat passwd |'; open(PWD, $pwdinfo) or die "can't open $pwdinfo: $!";Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Reading STDERR from a Program
- InhaltsvorschauYou want to run a program as you would with
system, backticks, oropen, but you don't want itsSTDERRto be sent to yourSTDERR. You would like to be able to either ignore or read theSTDERR.Use the shell's numeric redirection and duplication syntax for file descriptors. (We don't check the return value fromopenhere in order to make the examples easier to read, but you should always check it in your programs!)To capture a command'sSTDERRandSTDOUTtogether:$output = `cmd 2>&1`; # with backticks # or $pid = open(PH, "cmd 2>&1 |"); # with an open pipe while (<PH>) { } # plus a readTo capture a command'sSTDOUTand discard itsSTDERR:$output = `cmd 2>/dev/null`; # with backticks # or $pid = open(PH, "cmd 2>/dev/null |"); # with an open pipe while (<PH>) { } # plus a readTo capture a command'sSTDERRand discard itsSTDOUT:$output = `cmd 2>&1 1>/dev/null`; # with backticks # or $pid = open(PH, "cmd 2>&1 1>/dev/null |"); # with an open pipe while (<PH>) { } # plus a readTo exchange a command'sSTDOUTandSTDERR, i.e., capture theSTDERRbut have itsSTDOUTcome out on our oldSTDERR:$output = `cmd 3>&1 1>&2 2>&3 3>&-`; # with backticks # or $pid = open(PH, "cmd 3>&1 1>&2 2>&3 3>&-|"); # with an open pipe while (<PH>) { } # plus a readTo read both a command'sSTDOUTand itsSTDERRseparately, it's easiest and safest to redirect them separately to files, and then read from those files when the program is done:system("program args 1>/tmp/program.stdout 2>/tmp/program.stderr");When you launch a command with backticks, a pipedEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Controlling Input and Output of Another Program
- InhaltsvorschauYou want to both write to and read from another program. The
openfunction lets you do one or the other, but not both.Use the standard IPC::Open2 module:use IPC::Open2; $pid = open2(*README, *WRITEME, $program); print WRITEME "here's your input\n"; $output = <README>; close(WRITEME); close(README); waitpid($pid, 0);
Wanting simultaneous read and write access to another program is very common, but surprisingly perilous. That's one reason the built-inopendoesn't permit:open(my $double_handle, "| program args |") # WRONG
The big problem here is buffering. Because you can't force the other program to unbuffer its output, you can't guarantee that your reads won't block. If you block trying to read at the same time the other process blocks waiting for you to send something, you've achieved the unholy state of deadlock. There you'll both stay, wedged, until someone kills your process or the machine reboots.If you control the other process's buffering because you wrote the other program and know how it works, then IPC::Open2 may be the module for you. If you pass undefined scalar values as the first two arguments,open2creates new filehandles:use IPC::Open2; $pid = open2(my $reader, my $writer, $program);
Alternatively, you can pass in arguments that look like "<&OTHERFILEHANDLE" or ">&OTHERFILEHANDLE", which specify existing filehandles for the child process to read from or write to. These filehandles don't have to be controlled by your program—they may be connected to other programs, files, or sockets.You can specify the program either as a list (where the first element is the program name and the remaining elements are arguments to the program) or as a single string (which is passed to the shell as a command to start the program). If you also want control over the process's standard error, use the IPC::Open3 module and see the next recipe.Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Controlling the Input, Output, and Error of Another Program
- InhaltsvorschauYou want full control over a command's input, output, and error streams.Carefully use the standard IPC::Open3 module, possibly in conjunction with the standard IO::Select module.If you're interested in only one of the program's
STDIN,STDOUT, orSTDERR, the task is simple. When you want to manage two or more of these, however, it abruptly stops being simple. Multiplexing multiple I/O streams is never a pretty picture. Here's an easy workaround:@all = `($cmd | sed -e 's/^/stdout: /' ) 2>&1`; for (@all) { push @{ s/stdout: // ? \@outlines : \@errlines }, $_ } print "STDOUT:\n", @outlines, "\n"; print "STDERR:\n", @errlines, "\n";If you don't have sed on your system, you'll find that for simple cases like this, perl -pe works just as well as sed -e.However, that's not really simultaneous processing. All we're doing is markingSTDOUTlines with "stdout:" and then stripping them back out once we've read all theSTDOUTandSTDERRthe program produced.You can use the standard IPC::Open3 module for this. Mysteriously, the argument order is different for IPC::Open3 than for IPC::Open2.open3($write_me, $read_me, $errors, "program to run");
Using this has even more potential for chaos than usingopen2. If you're reading the program'sSTDERRas it is trying to write more than one buffer's worth to itsSTDOUT, the program will block on the write because its buffers are full, and you will block on the read because there's nothing available.You can avoid this deadlock by mimickingopen3withfork,open, andexec; making all filehandles unbuffered; and usingsysread,syswrite, andselectto decide which readable filehandle to read a byte from. This makes your program slower and bulkier, though, and it doesn't solve the classicopen2deadlock where each program is expecting the other to say something.Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Communicating Between Related Processes
- InhaltsvorschauYou have two related processes that need to communicate, and you need better control than you can get from
open,system, and backticks.Usepipeand thenfork:my ($reader, $writer); pipe $reader, $writer; if (fork) { # run parent code, either reading or writing, not both } else { # run child code, either reading or writing, not both }Or use a special forking form ofopen:if ($pid = (open $child, "|-")) { # run parent code, writing to child } else { die "cannot fork: $!" unless defined $pid; # otherwise run child code here, reading from parent }Or, going the other way:if ($pid = open ($child, "-|")) { # run parent code, reading from child } else { die "cannot fork: $!" unless defined $pid; # otherwise run child code here, writing to parent }Pipes are simply two connected filehandles, where data written to one filehandle can be read by the other. Thepipefunction creates two filehandles linked in this way, one writable and one readable. Even though you can't take two already existing filehandles and link them,pipecan be used for communication between processes. One process creates a pair of filehandles with thepipefunctions, then forks off a child, resulting in two distinct processes both running in the same program, each with a copy of the connected filehandles. As withopen, ifpipeis passed undefined scalars instead of filehandles, it creates filehandles in those scalars.It doesn't matter which process is the reader and which is the writer, so long as one of them takes one role and its peer process takes the other. You can only have one-way communication. (But read on.)We'll pull in the IO::Handle module so we can call itsautoflushmethod. (You could instead play theselectgames described in Chapter 7, if you prefer a lightweight solution.) If we didn't, our single line of output would get lodged in the pipe and not make it through to the other side until we closed that handle.Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Making a Process Look Like a File with Named Pipes
- InhaltsvorschauYou want a process to intercept all access to a file. For instance, you want to make your ~/.plan file a program that returns a random quote.Use named pipes. First create one, probably from your shell:
% mkfifo /path/to/named.pipe
Here's a reader for it:open($fifo, "<", "/path/to/named.pipe") or die $!; while (<$fifo>) { print "Got: $_"; } close $fifo;Here's a writer for it:open($fifo, ">", "/path/to/named.pipe") or die $!; print $fifo "Smoke this.\n"; close $fifo;
A named pipe, or FIFO as they are also known, is a special file that acts as a buffer to connect processes on the same machine. Ordinary pipes also allow processes to communicate, but those processes must have inherited the filehandles from their parents. To use a named pipe, a process need know only the named pipe's filename. In most cases, processes don't even need to be aware that they're reading from a FIFO.Named pipes can be read from and written to just as though they were ordinary files (unlike Unix-domain sockets as discussed in Chapter 17). Data written into the FIFO is buffered up by the operating system, then read back in the order it was written in. Because a FIFO acts as a buffer to connect processes, opening one for reading will block until another process opens it for writing, and vice versa. If youopenfor read and write using the +< mode toopen, you won't block (on most systems), because your process could be both reader and writer.Let's examine how to use a named pipe so people will get a different file each time they finger you. To create a named pipe, use mkfifo or mknod to create a named pipe called .plan in your home directory:% mkfifo ~/.plan # isn't this everywhere yet? % mknod ~/.plan p # in case you don't have mkfifo
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Sharing Variables in Different Processes
- InhaltsvorschauYou want to share variables across forks or between unrelated processes.Use SysV IPC if your operating system supports it.While SysV IPC (shared memory, semaphores, etc.) isn't as widely used as pipes, named pipes, and sockets for interprocess communication, it still has some interesting properties. Normally, however, you can't expect to use shared memory via
shmgetor the mmap(2) syscall to share a variable among several processes. That's because Perl would reallocate your string when you weren't wanting it to.The CPAN module IPC::Shareable takes care of that. Using a clevertiemodule, SysV shared memory, and the Storable module from CPAN allows data structures of arbitrary complexity to be shared among cooperating processes on the same machine. These processes don't even have to be related to each other.Example 16-11 is a simple demonstration of the module.Example 16-11. sharetest#!/usr/bin/perl # sharetest - test shared variables across forks use IPC::Shareable; $handle = tie $buffer, 'IPC::Shareable', undef, { destroy => 1 }; $SIG{INT} = sub { die "$$ dying\n" }; for (1 .. 10) { unless ($child = fork) { # i'm the child die "cannot fork: $!" unless defined $child; squabble( ); exit; } push @kids, $child; # in case we care about their pids } while (1) { print "Buffer is $buffer\n"; sleep 1; } die "Not reached"; sub squabble { my $i = 0; while (1) { next if $buffer =~ /^$$\b/o; $handle->shlock( ); $i++; $buffer = "$$ $i"; $handle->shunlock( ); } }The starting process creates the shared variable, forks off 10 children, and then sits back and prints out the value of the buffer every second or so, forever, or until you hit Ctrl-C.Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Listing Available Signals
- InhaltsvorschauYou want to know the signals your operating system provides.If your shell has a built-in kill -l command, use it:
% kill -l HUP INT QUIT ILL TRAP ABRT BUS FPE KILL USR1 SEGV USR2 PIPE ALRM TERM CHLD CONT STOP TSTP TTIN TTOU URG XCPU XFSZ VTALRM PROF WINCH POLL PWR
Or using just Perl, print the keys in%SIGif you have release 5.004 or later:% perl -e 'print join(" ", keys %SIG), "\n"' XCPU ILL QUIT STOP EMT ABRT BUS USR1 XFSZ TSTP INT IOT USR2 INFO TTOU ALRM KILL HUP URG PIPE CONT SEGV VTALRM PROF TRAP IO TERM WINCH CHLD FPE TTIN SYSBefore Version 5.004, you had to use the Config module:% perl -MConfig -e 'print $Config{sig_name}' ZERO HUP INT QUIT ILL TRAP ABRT EMT FPE KILL BUS SEGV SYS PIPE ALRM TERM URG STOP TSTP CONT CHLD TTIN TTOU IO XCPU XFSZ VTALRM PROF WINCH INFO USR1 USR2 IOTIf your version of Perl is before 5.004, you have to usesignameandsignoin Config to find the list of available signals, sincekeys%SIGwasn't implemented then.The following code retrieves by name and number the available signals from Perl's standard Config.pm module. Use@signameindexed by number to get the signal name, and%signoindexed by name to get the signal number.use Config; defined $Config{sig_name} or die "No sigs?"; $i = 0; # Config prepends fake 0 signal called "ZERO". foreach $name (split(' ', $Config{sig_name})) { $signo{$name} = $i; $signame[$i] = $name; $i++; }The documentation for the standard Config module, also in Chapter 32 of Programming Perl; the "Signals" sections in Chapter 16 of Programming Perl and in perlipc(1)Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Sending a Signal
- InhaltsvorschauYou want to send a signal to a process. This could be sent to your own process or to another on the same system. For instance, you caught
SIGINTand want to pass it on to your children.Usekillto send a signal by name or number to the process IDs listed in the remaining arguments:kill 9 => $pid; # send $pid a signal 9 kill -1 => $pgrp; # send whole job a signal 1 kill USR1 => $$; # send myself a SIGUSR1 kill HUP => @pids; # send a SIGHUP to processes in @pids
Perl'skillfunction is an interface to the syscall of the same name. The first argument is the signal to send, identified by number or by name; subsequent arguments are process IDs to send the signal to. It returns the count of processes successfully signaled. You can only send signals to processes running under the same real or saved UID as your real or effective UID—unless you're the superuser.If the signal number is negative, Perl interprets remaining arguments as process group IDs and sends that signal to all those groups' processes using the killpg(2) syscall.A process group is essentially a job. It's how the operating system ties related processes together. For example, when you use your shell to pipe one command into another, you've started two processes, but only one job. When you use Ctrl-C to interrupt the current job or Ctrl-Z to suspend it, this sends the appropriate signals to the entire job, which may be more than one process.killcan also check whether a process is alive. Sending the special pseudo-signal number 0 checks whether it's legal for you to send a signal to the process—without actually sending one. If it returns true, the process is still alive. If it returns false, the process either has changed its effective UID (in which caseEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Installing a Signal Handler
- InhaltsvorschauYou want to control how your program responds to signals. You need to do this if you want to catch Ctrl-C, avoid accumulating finished subprocesses, or prevent your process from dying when it writes to a child that has gone away.Use the
%SIGhash to install your own handler by name or by code reference:$SIG{QUIT} = \&got_sig_quit; # call &got_sig_quit for every SIGQUIT $SIG{PIPE} = 'got_sig_pipe'; # call main::got_sig_pipe for every SIGPIPE $SIG{INT} = sub { $ouch++ }; # increment $ouch for every SIGINT%SIGalso lets you ignore a signal:$SIG{INT} = 'IGNORE'; # ignore the signal INTIt also restores handling for that signal to the default:$SIG{STOP} = 'DEFAULT'; # restore default STOP signal handlingPerl uses the%SIGhash to control what happens when signals are received. Each key in%SIGcorresponds to a signal. Each value is the action to take when Perl receives the corresponding signal. Perl provides two special behaviors: "IGNORE" to take no action when a particular signal is received, and "DEFAULT" to perform the default Unix action for that signal.Although a C programmer might think of a signal asSIGINT, Perl uses justINT. Perl figures you only use signal names in functions that deal with signals, so theSIGprefix is redundant. This means that you'll assign to$SIG{CHLD}to change what your process does when it gets aSIGCHLD.If you want to run your own code when a given signal is received, you have two choices of what to put in the hash: either a code reference or a subroutine name. (This means you can't name a signal handler IGNORE or DEFAULT if you store the string, but they'd be mighty strange names for signal handlers anyway.) If you use a subroutine name that isn't qualified by a package, Perl will interpret this name to be a function in theEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Temporarily Overriding a Signal Handler
- InhaltsvorschauYou want to install a signal handler only for a particular subroutine. For instance, your subroutine catches
SIGINT, and you don't want to disturbSIGINThandling outside the subroutine.Uselocalto temporarily override a signal's behavior:# the signal handler sub ding { $SIG{INT} = \&ding; warn "\aEnter your name!\n"; } # prompt for name, overriding SIGINT sub get_name { local $SIG{INT} = \&ding; my $name; print "Kindly Stranger, please enter your name: "; chomp( $name = <> ); return $name; }You must uselocalrather thanmyto save away one value out of%SIG. The change remains in effect throughout the execution of that block, including in anything called from it. In this case, that's theget_namesubroutine. If the signal is delivered while another function that your function calls is running, your signal handler is triggered—unless the called subroutine installs its own signal handler. The previous value of the hash is automatically restored when the block exits. This is one of the (few) places where dynamic scoping is more convenient than confusing.Recipe 10.13; Recipe 16.15; Recipe 16.18Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Writing a Signal Handler
- InhaltsvorschauYou want to write a subroutine that will be called whenever your program receives a signal.A signal handler is just a subroutine. With some risk, you can do anything in a signal handler you'd do in any Perl subroutine, but the more you do, the riskier it gets.Some systems require you to reinstall your signal handler after each signal:
$SIG{INT} = \&got_int; sub got_int { $SIG{INT} = \&got_int; # but not for SIGCHLD! # ... }Some systems restart blocking operations, such as reading data. In such cases, you must calldiewithin the handler and trap it witheval:my $interrupted = 0; sub got_int { $interrupted = 1; $SIG{INT} = 'DEFAULT'; # or 'IGNORE' die; } eval { $SIG{INT} = \&got_int; # ... long-running code that you don't want to restart }; if ($interrupted) { # deal with the signal }At the C level, signals can interrupt just about anything. Unfortunately, this means that signals could interrupt Perl while Perl is changing its own internal data structures, leaving those data structures inconsistent and leading to a core dump. As of Perl 5.8, Perl tries very hard to ensure that this doesn't happen—when you install a signal handler, Perl installs a C-level signal handler that says "Perl received this signal." When Perl's data structures are consistent (after each operation it performs), the Perl interpreter checks to see whether a signal was received. If one was, your signal handler is called.This prevents core dumps, but at the cost of slightly delaying signals in cases where one of Perl's built-in operations takes a long time to finish. For example, building a long list like this:@a = 1..5_000_000;
might take 10 seconds on a heavily loaded system, but you won't be able to interrupt it because Perl will not check whether a signal was received while the list is being built. There are two operations in this statement, list generation and assignment, and Perl checks for signals only after each operation completes.Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Catching Ctrl-C
- InhaltsvorschauYou want to intercept Ctrl-C, which would otherwise kill your whole program. You'd like to ignore it or run your own function when the signal is received.Set a handler for
SIGINT. Set it to "IGNORE" to make Ctrl-C have no effect:$SIG{INT} = 'IGNORE';Or set it to a subroutine of your own devising to respond to Ctrl-C:$SIG{INT} = \&tsktsk; sub tsktsk { $SIG{INT} = \&tsktsk; # See ``Writing A Signal Handler'' warn "\aThe long habit of living indisposeth us for dying.\n"; }Ctrl-C isn't directly affecting your program. The terminal driver processing your keystrokes recognizes the Ctrl-C combination (or whatever you've set your terminal to recognize as the interrupt character), and sends aSIGINTto every process in the foreground process group (foreground job) for that terminal. The foreground job normally comprises all programs started from the shell on a single command line, plus any other programs run by those programs. See Signals in the Introduction to this chapter for details.The interrupt character isn't the only special character interpreted by your terminal driver. Typestty-ato find out your current settings:% stty -a speed 9600 baud; 38 rows; 80 columns; lflags: icanon isig iexten echo echoe -echok echoke -echonl echoctl -echoprt -altwerase -noflsh -tostop -flusho pendin -nokerninfo -extproc iflags: -istrip icrnl -inlcr -igncr ixon -ixoff ixany imaxbel -ignbrk brkint -inpck -ignpar -parmrk oflags: opost onlcr oxtabs cflags: cread cs8 -parenb -parodd hupcl -clocal -cstopb -crtscts -dsrflow -dtrflow -mdmbuf cchars: discard = ^O; dsusp = ^Y; eof = ^D; eol = <undef;> eol2 = <undef; erase = ^H; intr = ^C; kill = ^U; lnext = ^V;>
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Avoiding Zombie Processes
- InhaltsvorschauYour program forks children, but the dead children accumulate, fill up your process table, and aggravate your system administrator.If you don't need to record the children that have terminated, use:
$SIG{CHLD} = 'IGNORE';To keep better track of deceased children, install aSIGCHLDhandler to callwaitpid:use POSIX ":sys_wait_h"; $SIG{CHLD} = \&REAPER; sub REAPER { my $stiff; while (($stiff = waitpid(-1, WNOHANG)) > 0) { # do something with $stiff if you want } $SIG{CHLD} = \&REAPER; # install *after* calling waitpid }When a process exits, the system keeps it in the process table so the parent can check its status—whether it terminated normally or abnormally. Fetching a child's status (thereby freeing it to drop from the system altogether) is rather grimly called reaping dead children. (This entire recipe is full of ways to harvest your dead children. If this makes you queasy, we understand.) It involves a call towaitorwaitpid. Some Perl functions (pipedopens,system, and backticks) will automatically reap the children they make, but you must explicitly wait when you useforkto manually start another process.To avoid accumulating dead children, simply tell the system that you're not interested in them by setting$SIG{CHLD}to "IGNORE". If you want to know which children die and when, you'll need to usewaitpid.Thewaitpidfunction reaps a single process. Its first argument is the process to wait for—use-1to mean any process—and its second argument is a set of flags. We use the WNOHANG flag to makewaitpidimmediately return0if there are no dead children. A flag value of0is supported everywhere, indicating a blocking wait. Callwaitpidfrom aSIGCHLDhandler, as we do in the Solution, to reap the children as soon as they die.Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Blocking Signals
- InhaltsvorschauYou'd like to delay the reception of a signal, possibly to prevent unpredictable behavior from signals that can interrupt your program at any point.Use the POSIX module's interface to the sigprocmask(2) syscall. This is available only if your system is POSIX conformant.To block a signal around an operation:
use POSIX qw(:signal_h); $sigset = POSIX::SigSet->new(SIGINT); # define the signals to block $old_sigset = POSIX::SigSet->new; # where the old sigmask will be kept sigprocmask(SIG_BLOCK, $sigset, $old_sigset) or die "Could not block SIGINT\n";To unblock:defined sigprocmask(SIG_UNBLOCK, $old_sigset) or die "Could not unblock SIGINT\n";The POSIX standard introducedsigactionandsigprocmaskto give you better control over how signals are delivered. Thesigprocmaskfunction controls delayed delivery of signals, andsigactioninstalls handlers. If available, Perl usessigactionwhen you change%SIG.To usesigprocmask, first build a signal set usingPOSIX::SigSet->new. This takes a list of signal numbers. The POSIX module exports functions named after the signals, which return their signal numbers.use POSIX qw(:signal_h); $sigset = POSIX::SigSet->new( SIGINT, SIGKILL );
Pass the POSIX::SigSet object tosigprocmaskwith the SIG_BLOCK flag to delay signal delivery, SIG_UNBLOCK to restore delivery of the signals, or SIG_SETMASK to block only signals in the POSIX::SigSet. The most paranoid of programmers block signals for aforkto prevent a signal handler in the child process being called before Perl can update the child's$$variable, its process id. If the signal handler were called immediately and reported$$in that handler, it could possibly report its parent's$$, not its own. This issue does not arise often.Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Timing Out an Operation
- InhaltsvorschauYou want to make sure an operation doesn't take more than a certain amount of time. For instance, you're running filesystem backups and want to abort if it takes longer than an hour. Or, you want to give the user a limited amount of time to respond to a query.To interrupt a long-running operation, set a
SIGALRMhandler to calldie, in effect transforming the signal into an exception. Set an alarm withalarm, thenevalyour code:eval { local $SIG{ALRM} = sub { die "alarm clock restart" }; alarm 10; # schedule alarm in 10 seconds eval { ######## # long-running operation goes here ######## }; alarm 0; # cancel the alarm }; alarm 0; # race condition protection die if $@ && $@ !~ /alarm clock restart/; # reraiseThealarmfunction takes one argument: the integer number of seconds before the kernel sends your process aSIGALRM, that is, an alarm signal. It may be delivered after that time in busy time-sharing systems. The default action forSIGALRMis to terminate your program, so you should install your own signal handler.Because this example should work no matter what operation is being timed out, we take some special precautions in case your long-running operation contains a slow syscall. Slow syscalls are those that don't return immediately, but await some external event, such as for I/O to happen or some sort of timer to go off. These external events includeread(includingreadline, the<FH>operator),write, andopenon certain devices, fifos, and sockets, as well asaccept,connect,send,recv,flock,wait,waitpid, and of course,sleep. If the alarm hits while you're in a slow syscall and you simply catch the signal and return, you'll go right back into that syscall. That's because Perl automatically restarts syscalls where it's able to. The only way out of them is to raise an exception throughEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Turning Signals into Fatal Errors
- InhaltsvorschauEND blocks aren't run when your program dies from an uncaught signal. Your program gets such signals, and you'd like your END blocks to have a chance to clean up.Use the
sigtrappragma:use sigtrap qw(die untrapped normal-signals);
Untrapped signals cause your program to die without running END blocks. Although you could manually install signal handlers that calldie, this becomes tedious for a lot of signals:$SIG{INT} = $SIG{HUP} = $SIG{PIPE} = $SIG{TERM} = sub { die };Thesigtrappragma provides a convenient shorthand for installing such handlers:use sigtrap qw(die untrapped normal-signals);
Thedieimport tellssigtrapto calldie(you can also importstack-traceto install handlers that trigger stack traces). Theuntrappedimport tellssigtrapto install handlers only for signals that don't already have them, so if you handleSIGPIPEyourself,sigtrapwon't replace your handler.normal-signalsis one of several imports that specify predefined lists of useful signals to trap. The signal lists are given in Table 16-2.Table 16-2: Signal lists ListSignalsnormal-signalsHUP,INT,PIPE,TERMerror-signalsABRT,Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Program: sigrand
- InhaltsvorschauThe following program gives you random signatures by using named pipes. It expects the signatures file to have records in the format of the fortune program—that is, each possible multiline record is terminated with "
%%\n". Here's an example:Make is like Pascal: everybody likes it, so they go in and change it. --Dennis Ritchie %% I eschew embedded capital letters in names; to my prose-oriented eyes, they are too awkward to read comfortably. They jangle like bad typography. --Rob Pike %% God made the integers; all else is the work of Man. --Kronecker %% I'd rather have :rofix than const. --Dennis Ritchie %% If you want to program in C, program in C. It's a nice language. I use it occasionally... :-) --Larry Wall %% Twisted cleverness is my only skill as a programmer. --Elizabeth Zwicky %% Basically, avoid comments. If your code needs a comment to be understood, it would be better to rewrite it so it's easier to understand. --Rob Pike %% Comments on data are usually much more helpful than on algorithms. --Rob Pike %% Programs that write programs are the happiest programs in the world. --Andrew Hume %%We check whether we're already running by using a file with our PID in it. If sending a signal number 0 indicates that PID still exists (or, rarely, that something else has reused it), we just exit. We also look at the current Usenet posting to decide whether to look for a per-newsgroup signature file. That way, you can have different signatures for each newsgroup you post to. For variety, a global signature file is still on occasion used even if a per-newsgroup file exists.You can even use sigrand on systems without named pipes if you remove the code to create a named pipe and extend the sleep interval before file updates. ThenEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Chapter 17: Sockets
- InhaltsvorschauGlendower: I can call spirits from the vasty deep.Hotspur: Why so can I, or so can any man, But will they come when you do call for them?—Shakespeare, Henry IV, part I, Act III scene 1Sockets are endpoints for communication. Some types of sockets provide reliable communications. Others offer few guarantees, but consume low system overhead. Socket communication can be used to let processes talk on just one machine or over the Internet.In this chapter we consider the two most commonly used types of sockets: streams and datagrams. Streams provide a bidirectional, sequenced, and reliable channel of communication—similar to pipes. Datagram sockets do not guarantee sequenced, reliable delivery, but they do guarantee that message boundaries will be preserved when read. Your system may support other types of sockets as well; consult your socket(2) manpage or equivalent documentation for details.We also consider both the Internet and Unix domains. The Internet domain gives sockets two-part names: a host (an IP address in a particular format) and a port number. In the Unix domain, sockets are named using files (e.g., /tmp/mysock).In addition to domains and types, sockets also have a protocol associated with them. Protocols are not very important to the casual programmer, as there is rarely more than one protocol for a given domain and type of socket.Domains and types are normally identified by numeric constants (available through functions exported by the Socket and IO::Socket modules). Stream sockets have the type SOCK_STREAM, and datagram sockets have the type SOCK_DGRAM. The Internet domain is PF_INET, and the Unix domain PF_UNIX. (POSIX uses PF_LOCAL instead of PF_UNIX, but PF_UNIX will almost always be an acceptable constant simply because of the preponderance of existing software that uses it.) You should use these symbolic names instead of numbers because the numbers may change (and historically, have).Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
- Introduction
- InhaltsvorschauSockets are endpoints for communication. Some types of sockets provide reliable communications. Others offer few guarantees, but consume low system overhead. Socket communication can be used to let processes talk on just one machine or over the Internet.In this chapter we consider the two most commonly used types of sockets: streams and datagrams. Streams provide a bidirectional, sequenced, and reliable channel of communication—similar to pipes. Datagram sockets do not guarantee sequenced, reliable delivery, but they do guarantee that message boundaries will be preserved when read. Your system may support other types of sockets as well; consult your socket(2) manpage or equivalent documentation for details.We also consider both the Internet and Unix domains. The Internet domain gives sockets two-part names: a host (an IP address in a particular format) and a port number. In the Unix domain, sockets are named using files (e.g., /tmp/mysock).In addition to domains and types, sockets also have a protocol associated with them. Protocols are not very important to the casual programmer, as there is rarely more than one protocol for a given domain and type of socket.Domains and types are normally identified by numeric constants (available through functions exported by the Socket and IO::Socket modules). Stream sockets have the type SOCK_STREAM, and datagram sockets have the type SOCK_DGRAM. The Internet domain is PF_INET, and the Unix domain PF_UNIX. (POSIX uses PF_LOCAL instead of PF_UNIX, but PF_UNIX will almost always be an acceptable constant simply because of the preponderance of existing software that uses it.) You should use these symbolic names instead of numbers because the numbers may change (and historically, have).Protocols have names such as
tcpandudp, which correspond to numbers that the operating system uses. Thegetprotobynamefunction (built into Perl) returns the number when given a protocol name. Pass protocol numberEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Writing a TCP Client
- InhaltsvorschauYou want to connect to a socket on a remote machine.This solution assumes you're using the Internet to communicate. For TCP-like communication within a single machine, see Recipe 17.6.Either use the standard IO::Socket::INET class:
use IO::Socket; $socket = IO::Socket::INET->new(PeerAddr => $remote_host, PeerPort => $remote_port, Proto => "tcp", Type => SOCK_STREAM) or die "Couldn't connect to $remote_host:$remote_port : $@\n"; # ... do something with the socket print $socket "Why don't you call me anymore?\n"; $answer = <$socket>; # and terminate the connection when we're done close($socket);or create a socket by hand for better control:use Socket; # create a socket socket(TO_SERVER, PF_INET, SOCK_STREAM, getprotobyname('tcp')); # build the address of the remote machine $internet_addr = inet_aton($remote_host) or die "Couldn't convert $remote_host into an Internet address: $!\n"; $paddr = sockaddr_in($remote_port, $internet_addr); # connect connect(TO_SERVER, $paddr) or die "Couldn't connect to $remote_host:$remote_port : $!\n"; # ... do something with the socket print TO_SERVER "Why don't you call me anymore?\n"; # and terminate the connection when we're done close(TO_SERVER);While coding this by hand requires a lot of steps, the IO::Socket::INET class wraps them all in a convenient constructor. The important things to know are where you're going (the PeerAddr and PeerPort parameters) and how you're getting there (the Type parameter). IO::Socket::INET tries to determine these things from what you've given it. It deduces Proto from the Type and Port if possible, and assumestcpotherwise.PeerAddr is a string containing either a hostname ("www.oreilly.comEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Writing a TCP Server
- InhaltsvorschauYou want to write a server that waits for clients to connect over the network to a particular port.This recipe assumes you're using the Internet to communicate. For TCP-like communication within a single Unix machine, see Recipe 17.6.Use the standard IO::Socket::INET class:
use IO::Socket; $server = IO::Socket::INET->new(LocalPort => $server_port, Type => SOCK_STREAM, Reuse => 1, Listen => 10 ) # or SOMAXCONN or die "Couldn't be a tcp server on port $server_port : $@\n"; while ($client = $server->accept( )) { # $client is the new connection } close($server);Or craft it by hand for better control:use Socket; # make the socket socket(SERVER, PF_INET, SOCK_STREAM, getprotobyname('tcp')); # so we can restart our server quickly setsockopt(SERVER, SOL_SOCKET, SO_REUSEADDR, 1); # build up my socket address $my_addr = sockaddr_in($server_port, INADDR_ANY); bind(SERVER, $my_addr) or die "Couldn't bind to port $server_port : $!\n"; # establish a queue for incoming connections listen(SERVER, SOMAXCONN) or die "Couldn't listen on port $server_port : $!\n"; # accept and process connections while (accept(CLIENT, SERVER)) { # do something with CLIENT } close(SERVER);Setting up a server is more complicated than being a client. The optionallistenfunction tells the operating system how many pending, unanswered connections can queue up while waiting for your server. Thesetsockoptfunction used in the Solution allows you to avoid waiting two minutes after killing your server before you restart it again (valuable in testing). Thebindcall registers your server with the kernel so others can find you. Finally,accepttakes the incoming connections one by one.The numeric argument tolistenEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Communicating over TCP
- InhaltsvorschauYou want to read or write data over a TCP connection.This recipe assumes you're using the Internet to communicate. For TCP-like communication within a single machine, see Recipe 17.6.Use
printor<>:print SERVER "What is your name?\n"; chomp ($response = <SERVER>);
Or usesendandrecv:defined (send(SERVER, $data_to_send, $flags)) or die "Can't send : $!\n"; recv(SERVER, $data_read, $maxlen, $flags) or die "Can't receive: $!\n";Or use the corresponding methods on an IO::Socket object:use IO::Socket; $server->send($data_to_send, $flags) or die "Can't send: $!\n"; $server->recv($data_read, $maxlen, $flags) or die "Can't recv: $!\n";To find out whether data can be read or written, use theselectfunction, which is nicely wrapped by the standard IO::Socket class:use IO::Select; $select = IO::Select->new( ); $select->add(*FROM_SERVER); $select->add($to_client); @read_from = $select->can_read($timeout); foreach $socket (@read_from) { # read the pending data from $socket }Sockets handle two completely different types of I/O, each with attendant pitfalls and benefits. The normal Perl I/O functions used on files (except forseekandsysseek) work for stream sockets, but datagram sockets require the system callssendandrecv, which work on complete records.Awareness of buffering issues is particularly important in socket programming. That's because buffering, while designed to enhance performance, can interfere with the interactive feel that some programs require. Gathering input with<>may try to read more data from the socket than is yet available as it looks for a record separator. Bothprintand<>usestdiobuffers, so unless you've changed autoflushing (see the Introduction to Chapter 7) on the socket handle, your data won't be sent to the other end as soon as youEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Setting Up a UDP Client
- InhaltsvorschauYou want to exchange messages with another process using UDP (datagrams).To set up a UDP socket handle, use either the low-level Socket module on your own filehandle:
use Socket; socket(SOCKET, PF_INET, SOCK_DGRAM, getprotobyname("udp")) or die "socket: $!";or else IO::Socket, which returns an anonymous one:use IO::Socket; $handle = IO::Socket::INET->new(Proto => 'udp') or die "socket: $@"; # yes, it uses $@ hereThen to send a message to a machine named$HOSTNAMEon port number$PORTNO, use:$ipaddr = inet_aton($HOSTNAME); $portaddr = sockaddr_in($PORTNO, $ipaddr); send(SOCKET, $MSG, 0, $portaddr) = = length($MSG) or die "cannot send to $HOSTNAME($PORTNO): $!";To receive a message of length no greater than $MAXLEN, use:$portaddr = recv(SOCKET, $MSG, $MAXLEN, 0) or die "recv: $!"; ($portno, $ipaddr) = sockaddr_in($portaddr); $host = gethostbyaddr($ipaddr, AF_INET); print "$host($portno) said $MSG\n";
Datagram sockets are unlike stream sockets. Streams provide sessions, giving the illusion of a stable connection. You might think of them as working like a telephone call—expensive to set up, but once established, reliable and easy to use. Datagrams, though, are more like the postal system—it's cheaper and easier to send a letter to your friend on the other side of the world than to call them on the phone. Datagrams are easier on the system than streams. You send a small amount of information one message at a time. But your messages' delivery isn't guaranteed, and they might arrive in the wrong order. Like a small post box, the receiver's queue might fill up and cause further messages to be dropped.Why then, if datagrams are unreliable, do we have them? Because some applications are most sensibly implemented in terms of datagrams. For instance, in streaming audio, it's more important that the stream as a whole be preserved than that every packet get through, especially if packets are being dropped because there's not enough bandwidth for them all. Another use for datagrams is broadcasting, which corresponds to mass mailing of advertisements in the postal model, and is equally popular in most circles. One use for broadcast packets is to send out a message to your local subnet saying, "Hey, is there anybody around here who wants to be my server?"Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Setting Up a UDP Server
- InhaltsvorschauYou want to write a UDP server.First
bindto the port on which the server is to be contacted. With IO::Socket, this is easily accomplished:use IO::Socket; $server = IO::Socket::INET->new(LocalPort => $server_port, Proto => "udp") or die "Couldn't be a udp server on port $server_port : $@\n";Then, go into a loop receiving messages:while ($him = $server->recv($datagram, $MAX_TO_READ, $flags)) { # do something }Life with UDP is much simpler than life with TCP. Instead of accepting client connections one at a time and committing yourself to a long-term relationship, take messages from clients as they come in. Therecvfunction returns the address of the sender, which you must then decode.Example 17-2 is a small UDP-based server that just sits around waiting for messages. Every time a message comes in, we find out who sent it, send that person a message based on the previous message, and then save the new message.Example 17-2. udpqotd#!/usr/bin/perl -w # udpqotd - UDP message server use strict; use IO::Socket; my ($sock, $oldmsg, $newmsg, $hisaddr, $hishost, $MAXLEN, $PORTNO); $MAXLEN = 1024; $PORTNO = 5151; $sock = IO::Socket::INET->new(LocalPort => $PORTNO, Proto => 'udp') or die "socket: $@"; print "Awaiting UDP messages on port $PORTNO\n"; $oldmsg = "This is the starting message."; while ($sock->recv($newmsg, $MAXLEN)) { my($port, $ipaddr) = sockaddr_in($sock->peername); $hishost = gethostbyaddr($ipaddr, AF_INET); print "Client $hishost said ``$newmsg''\n"; $sock->send($oldmsg); $oldmsg = "[$hishost] $newmsg"; } die "recv: $!";This program is easier using IO::Socket than the raw Socket module. We don't have to say where to send the message, because the library keeps track of who sent the last message and stores that information away on theEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Using Unix Domain Sockets
- InhaltsvorschauYou want to communicate with other processes on only the local machine.Use domain sockets. You can use the code and techniques from the preceding Internet domain recipes, with the following changes:
-
Because the naming system is different, use
sockaddr_uninstead ofsockaddr_in. -
Use IO::Socket::UNIX instead of IO::Socket::INET, and use
PeerandLocalinstead ofPeerAddr/PeerPortandLocalAddr/LocalPort. -
Use PF_UNIX instead of PF_INET, and give PF_UNSPEC as the last argument to
socket. -
SOCK_STREAM clients don't have to
bindto a local address before theyconnect.
Unix domain sockets have names like files on the filesystem. In fact, most systems implement them as special files; that's what Perl's-Sfiletest operator looks for—whether the file is a Unix domain socket.Supply the filename as the Peer argument toIO::Socket::UNIX->new, or encode it withsockaddr_unand pass it toconnect. Here's how to make server and client Unix domain stream sockets with IO::Socket::UNIX:use IO::Socket; unlink "/tmp/mysock"; $server = IO::Socket::UNIX->new(LocalAddr => "/tmp/mysock", Type => SOCK_STREAM, Listen => 5 ) or die $@; $client = IO::Socket::UNIX->new(PeerAddr => "/tmp/mysock", Type => SOCK_STREAM, Timeout => 10 ) or die $@;Here's how to use the traditional functions to make stream sockets:use Socket; socket(SERVER, PF_UNIX, SOCK_STREAM, 0); unlink "/tmp/mysock"; bind(SERVER, sockaddr_un("/tmp/mysock")) or die "Can't create server: $!"; socket(CLIENT, PF_UNIX, SOCK_STREAM, 0); connect(CLIENT, sockaddr_un("/tmp/mysock")) or die "Can't connect to /tmp/mysock: $!";Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. -
- Identifying the Other End of a Socket
- InhaltsvorschauYou have a socket and want to identify the machine at the other end.If you're only interested in the IP address of the remote machine, use:
use Socket; $other_end = getpeername(SOCKET) or die "Couldn't identify other end: $!\n"; ($port, $iaddr) = unpack_sockaddr_in($other_end); $ip_address = inet_ntoa($iaddr);If you want its actual hostname, use:use Socket; $other_end = getpeername(SOCKET) or die "Couldn't identify other end: $!\n"; ($port, $iaddr) = unpack_sockaddr_in($other_end); $actual_ip = inet_ntoa($iaddr); $claimed_hostname = gethostbyaddr($iaddr, AF_INET); @name_lookup = gethostbyname($claimed_hostname) or die "Could not look up $claimed_hostname : $!\n"; @resolved_ips = map { inet_ntoa($_) } @name_lookup[ 4 .. $#name_lookup ];For a long time, figuring out who connected to you was considered more straightforward than it really is. Thegetpeernamefunction returns the IP address of the remote machine in a packed binary structure (orundefif an error occurred). To unpack it, useinet_ntoa. If you want the name of the remote end, callgethostbyaddrto look up the name of the machine in the DNS tables, right?Not really. That's only half the solution. Because a name lookup goes to the name's owner's DNS server and a lookup of an IP address goes to the address's owner's DNS server, you have to contend with the possibility that the machine that connected to you is giving incorrect names. For instance, the machineevil.crackers.orgcould belong to malevolent cyberpirates who tell their DNS server that its IP address (1.2.3.4) should be identified astrusted.dod.gov. If your program truststrusted.dod.gov, a connection fromevil.crackers.orgwill causegetpeernameto return the right IP address (1.2.3.4), butgethostbyaddrwill return the duplicitous name.Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Finding Your Own Name and Address
- InhaltsvorschauYou want to find your (fully qualified) hostname.First, get your (possibly qualified) hostname. Try either the standard Sys::Hostname module:
use Sys::Hostname; $hostname = hostname( );
or POSIX'sunamefunction:use POSIX qw(uname); ($kernel, $hostname, $release, $version, $hardware) = uname( ); $hostname = (uname)[1]; # or just one
Then turn it into an IP address and convert to its canonical name:use Socket; # for AF_INET $address = gethostbyname($hostname) or die "Couldn't resolve $hostname : $!"; $hostname = gethostbyaddr($address, AF_INET) or die "Couldn't re-resolve $hostname : $!";Sys::Hostname tries to be portable by using knowledge about your system to decide how best to find the hostname. It tries many different ways of getting the hostname, but several involve running other programs. This can lead to tainted data (see Recipe 19.1).POSIX::uname, on the other hand, works only on POSIX systems and isn't guaranteed to provide anything useful in thenodenamefield that we are examining. That said, the value is useful on many machines and doesn't suffer from the tainted data problem that Sys::Hostname does.Once you have the name, though, you must consider that it might be missing a domain name. For instance, Sys::Hostname may return youguanacoinstead ofguanaco.camelids.org. To fix this, convert the name back into an IP address withgethostbynameand then back into a name again withgethostbyaddr. By involving the domain name system, you are guaranteed of getting a full name.Thegethostbynameandgethostbyaddrfunctions in Chapter 29 of Programming Perl and in perlfunc(1); the documentation for the standard Net::hostent and Sys::Hostname modulesEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Closing a Socket After Forking
- InhaltsvorschauYour program has forked and you want to tell the other end that you're done sending data. You've tried
closeon the socket, but the remote end never gets an EOF or SIGPIPE.Useshutdown:shutdown(SOCKET, 0); # I/we have stopped reading data shutdown(SOCKET, 1); # I/we have stopped writing data shutdown(SOCKET, 2); # I/we have stopped using this socket
On an IO::Socket object, you could also write:$socket->shutdown(0); # I/we have stopped reading data
When a process forks, the child has copies of the parent's open filehandles, including sockets. When youclosea file or socket, you close only the current process's copy. If another process (parent or child) still has the socket open, the operating system doesn't consider their file or socket closed.Take the case of a socket that data is being sent to. If two processes have this socket open, one can close it but, because the other still has it open, the socket isn't considered closed by the operating system. Until the other process closes the socket, the process reading from the socket won't get an end-of-file. This can lead to confusion and deadlock.To avoid this, eithercloseunused filehandles after aforkor useshutdown. Theshutdownfunction is a more insistent form ofclose—it tells the operating system that even though other processes have copies of this filehandle, it should be marked as closed, and the other end should get an end-of-file if the processes read from it or a SIGPIPE if they write to it.The numeric argument toshutdownlets you specify which sides of the connection are closed. An argument of0says that we're done reading data, so the other end of the socket will get a SIGPIPE if they try writing.1says that we're done writing data, so the other end of the socket will get an end-of-file if they try reading.Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Writing Bidirectional Clients
- InhaltsvorschauYou want set up a fully interactive client so you can type a line, get the answer, type a line, get the answer, etc., somewhat like telnet.Once you've connected, fork off a duplicate process. One twin reads only your input and passes it on to the server, and the other reads only the server's output and sends it to your own output.In a client-server relationship, it is difficult to know whose turn it is to talk. Single-threaded solutions involving the four-argument version of
selectare hard to write and maintain. But there's no reason to ignore multitasking solutions. Theforkfunction dramatically simplifies this problem.Once you've connected to the service you'd like to chat with, callforkto clone a twin. Each of these two (nearly) identical processes has a simple job. The parent copies everything from the socket to standard output, and the child simultaneously copies everything from standard input to the socket.The code is in Example 17-4.Example 17-4. biclient#!/usr/bin/perl -w # biclient - bidirectional forking client use strict; use IO::Socket; my ($host, $port, $kidpid, $handle, $line); unless (@ARGV = = 2) { die "usage: $0 host port" } ($host, $port) = @ARGV; # create a tcp connection to the specified host and port $handle = IO::Socket::INET->new(Proto => "tcp", PeerAddr => $host, PeerPort => $port) or die "can't connect to port $port on $host: $!"; $handle->autoflush(1); # so output gets there right away print STDERR "[Connected to $host:$port]\n"; # split the program into two processes, identical twins die "can't fork: $!" unless defined($kidpid = fork( )); if ($kidpid) { # parent copies the socket to standard output while (defined ($line = <$handle>)) { print STDOUT $line; } kill("TERM" => $kidpid); # send SIGTERM to child } else { # child copies standard input to the socket while (defined ($line = <STDIN>)) { print $handle $line; } } exit;Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Forking Servers
- InhaltsvorschauYou want to write a server that forks a subprocess to handle each new client.Fork in the
acceptloop, and use a$SIG{CHLD}handler to reap the children.# set up the socket SERVER, bind and listen ... use POSIX qw(:sys_wait_h); sub REAPER { 1 until (-1 = = waitpid(-1, WNOHANG)); $SIG{CHLD} = \&REAPER; # unless $]>= 5.002 } $SIG{CHLD} = \&REAPER; while ($hisaddr = accept(CLIENT, SERVER)) { next if $pid = fork; # parent die "fork: $!" unless defined $pid; # failure # otherwise child close(SERVER); # no use to child # ... do something exit; # child leaves } continue { close(CLIENT); # no use to parent }This approach is very common for SOCK_STREAM servers in the Internet and Unix domains. Each incoming connection gets a cloned server of its own. The model is:-
Accept a stream connection.
-
Fork off a duplicate to communicate over that stream.
-
Return to 1.
This technique isn't used with SOCK_DGRAM sockets, because their method of communication is different. The time it takes to fork makes the forking model impractical for UDP-style servers. Instead of working with a series of stateful, long-running connections, SOCK_DGRAM servers work with a bunch of sporadic datagrams, usually statelessly. With them, the model must become:-
Read a datagram.
-
Handle the datagram.
-
Return to 1.
The child process deals with the new connection. Because it will never use the SERVER socket, we immediately close it. This is partly to keep a tidy house, but mainly so that the server socket is closed when the parent (server) process exits. If the children do not close the SERVER socket, the operating system considers the socket still open even when the parent dies. For more on this, see Recipe 17.9.Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. -
- Pre-Forking Servers
- InhaltsvorschauYou want to write a server that concurrently processes several clients (as in Recipe 17.11), but connections are coming in so fast that forking slows the server too much.Have a master server maintain a pool of pre-forked children, as shown in Example 17-5.Example 17-5. preforker
#!/usr/bin/perl # preforker - server who forks first use IO::Socket; use Symbol; use POSIX; # establish SERVER socket, bind and listen. $server = IO::Socket::INET->new(LocalPort => 6969, Type => SOCK_STREAM, Proto => 'tcp', Reuse => 1, Listen => 10 ) or die "making socket: $@\n"; # global variables $PREFORK = 5; # number of children to maintain $MAX_CLIENTS_PER_CHILD = 5; # number of clients each child should process %children = ( ); # keys are current child process IDs $children = 0; # current number of children sub REAPER { # takes care of dead children $SIG{CHLD} = \&REAPER; my $pid = wait; $children --; delete $children{$pid}; } sub HUNTSMAN { # signal handler for SIGINT local($SIG{CHLD}) = 'IGNORE'; # we're going to kill our children kill 'INT' => keys %children; exit; # clean up with dignity } # Fork off our children. for (1 .. $PREFORK) { make_new_child( ); } # Install signal handlers. $SIG{CHLD} = \&REAPER; $SIG{INT} = \&HUNTSMAN; # And maintain the population. while (1) { sleep; # wait for a signal (i.e., child's death) for ($i = $children; $i < $PREFORK; $i++) { make_new_child( ); # top up the child pool } } sub make_new_child { my $pid; my $sigset; # block signal for fork $sigset = POSIX::SigSet->new(SIGINT); sigprocmask(SIG_BLOCK, $sigset) or die "Can't block SIGINT for fork: $!\n"; die "fork: $!" unless defined ($pid = fork); if ($pid) { # Parent records the child's birth and returns. sigprocmask(SIG_UNBLOCK, $sigset) or die "Can't unblock SIGINT for fork: $!\n"; $children{$pid} = 1; $children++; return; } else { # Child can *not* return from this subroutine. $SIG{INT} = 'DEFAULT'; # make SIGINT kill us as it did before # unblock signals sigprocmask(SIG_UNBLOCK, $sigset) or die "Can't unblock SIGINT for fork: $!\n"; # handle connections until we've reached $MAX_CLIENTS_PER_CHILD for ($i=0; $i < $MAX_CLIENTS_PER_CHILD; $i++) { $client = $server->accept( ) or last; # do something with the connection } # tidy up gracefully and finish # this exit is VERY important, otherwise the child will become # a producer of more and more children, forking yourself into # process death. exit; } }Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Non-Forking Servers
- InhaltsvorschauYou want a server to deal with several simultaneous connections, but you don't want to
forka process to deal with each connection.Keep an array of open clients, useselectto read information when it becomes available, and deal with a client only when you have read a full request from it, as shown in Example 17-6.Example 17-6. nonforker#!/usr/bin/perl -w # nonforker - server who multiplexes without forking use POSIX; use IO::Socket; use IO::Select; use Socket; use Fcntl; use Tie::RefHash; $port = 1685; # change this at will # Listen to port. $server = IO::Socket::INET->new(LocalPort => $port, Listen => 10 ) or die "Can't make server socket: $@\n"; # begin with empty buffers %inbuffer = ( ); %outbuffer = ( ); %ready = ( ); tie %ready, 'Tie::RefHash'; nonblock($server); $select = IO::Select->new($server); # Main loop: check reads/accepts, check writes, check ready to process while (1) { my $client; my $rv; my $data; # check for new information on the connections we have # anything to read or accept? foreach $client ($select->can_read(1)) { if ($client = = $server) { # accept a new connection $client = $server->accept( ); $select->add($client); nonblock($client); } else { # read data $data = ''; $rv = $client->recv($data, POSIX::BUFSIZ, 0); unless (defined($rv) && length $data) { # This would be the end of file, so close the client delete $inbuffer{$client}; delete $outbuffer{$client}; delete $ready{$client}; $select->remove($client); close $client; next; } $inbuffer{$client} .= $data; # test whether the data in the buffer or the data we # just read means there is a complete request waiting # to be fulfilled. If there is, set $ready{$client} # to the requests waiting to be fulfilled. while ($inbuffer{$client} =~ s/(.*\n)//) { push( @{$ready{$client}}, $1 ); } } } # Any complete requests to process? foreach $client (keys %ready) { handle($client); } # Buffers to flush? foreach $client ($select->can_write(1)) { # Skip this client if we have nothing to say next unless exists $outbuffer{$client}; $rv = $client->send($outbuffer{$client}, 0); unless (defined $rv) { # Whine, but move on. warn "I was told I could write, but I can't.\n"; next; } if ($rv = = length $outbuffer{$client} || $! = = POSIX::EWOULDBLOCK ) { substr($outbuffer{$client}, 0, $rv) = ''; delete $outbuffer{$client} unless length $outbuffer{$client}; } else { # Couldn't write all the data, and it wasn't because # it would have blocked. Shutdown and move on. delete $inbuffer{$client}; delete $outbuffer{$client}; delete $ready{$client}; $select->remove($client); close($client); next; } } # Out of band data? foreach $client ($select->has_exception(0)) { # arg is timeout # Deal with out-of-band data here, if you want to. } } # handle($socket) deals with all pending requests for $client sub handle { # requests are in $ready{$client} # send output to $outbuffer{$client} my $client = shift; my $request; foreach $request (@{$ready{$client}}) { # $request is the text of the request # put text of reply into $outbuffer{$client} } delete $ready{$client}; } # nonblock($socket) puts socket into nonblocking mode sub nonblock { my $socket = shift; my $flags; $flags = fcntl($socket, F_GETFL, 0) or die "Can't get flags for socket: $!\n"; fcntl($socket, F_SETFL, $flags | O_NONBLOCK) or die "Can't make socket nonblocking: $!\n"; }Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Multitasking Server with Threads
- InhaltsvorschauYou want to write a server that handles multiple clients from within the one process using your operating system's threads.Use Perl v5.8.1 or later and threads.pm:
use threads; use IO::Socket; my $listen = IO::Socket::INET->new( LocalPort => $SERVER_PORT, ReuseAddr => 1, Listen => 10, ); sub handle_connection { my $socket = shift; my $output = shift || $socket; my $exit = 0; while (<$socket>) { # work with $_, # print to $output # set $exit to true when connection is done last if $exit; } } while (my $socket = $listen->accept) { async(\&handle_connection, $socket)->detach; }Threading in Perl is still evolving, but it became functional as of v5.8.1. The code in the Solution will not work in earlier versions of Perl. In particular, earlier versions of Perl implemented an entirely different threading model than the current "interpreter threads" system that threads.pm assumes.The hard work of handling the connection to the client is done in thehandle_connectionsubroutine. It is given the client socket as a parameter, and can call blocking routines like<$socket>because it runs in its own thread. If one thread blocks while reading, other threads can still run.The master thread in the program creates the socket and accepts connections on it. When a new client connects, the master thread spawns a new thread (with theasynccall) to handle the connection. The thread runs until the subroutine it is called with (handle_connectionin this case) returns.Wedetachthe newly created thread to ensure that its variables are garbage collected (closing the socket to the client) when the thread ends. If we didn't calldetach, our process would accumulate dead threads until we could no longer spawn new threads.Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Writing a Multitasking Server with POE
- InhaltsvorschauYou want to write a server that handles multiple clients from within the one process, without using Perl 5.8's threads or the complexity of non-blocking I/O.Use the cooperative multitasking framework POE (available from CPAN) and the accompanying POE::Component::Server::TCP module to create the server for you:
#!/usr/bin/perl use warnings; use strict; use POE qw(Component::Server::TCP); # Start a TCP server. Client input will be logged to the console and # echoed back to the client, one line at a time. POE::Component::Server::TCP->new ( Port => $PORT_NUMBER, # port to listen on ClientInput => \&handle_input, # method to call with input ); # Start the server. $poe_kernel->run( ); exit 0; sub handle_input { my ( $session, $heap, $input ) = @_[ SESSION, HEAP, ARG0 ]; # $session is a POE::Session object unique to this connection, # $heap is this connection's between-callback storage. # New data from client is in $input. Newlines are removed. # To echo input back to the client, simply say: $heap->{client}->put($input); # and log it to the console print "client ", $session->ID, ": $input\n"; }POE is a cooperatively multitasking framework for Perl built entirely out of software components. POE doesn't require you to recompile the Perl interpreter to support threads, but it does require you to design your program around the ideas of events and callbacks. Documentation for this framework is available athttp://poe.perl.org/.It helps to think of POE as an operating system: there's the kernel (an object responsible for deciding which piece of code is run next) and your processes (called sessions, implemented as objects). POE stores the kernel object in the variable$poe_kernel, which is automatically imported into your namespace. Each process in your operating system has a heap, memory where the variables for that process are stored. Sessions have heaps as well. In an operating system, I/O libraries handle buffered I/O. In POE, aEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Writing a Multihomed Server
- InhaltsvorschauYou want to write a server that knows that the machine it runs on has multiple IP addresses, and that it should possibly do different things for each address.Don't bind your server to a particular address. Instead, bind to
INADDR_ANY. Then once you'veaccepted a connection, usegetsocknameon the client socket to find out which address the client connected to:use Socket; socket(SERVER, PF_INET, SOCK_STREAM, getprotobyname('tcp')); setsockopt(SERVER, SOL_SOCKET, SO_REUSEADDR, 1); bind(SERVER, sockaddr_in($server_port, INADDR_ANY)) or die "Binding: $!\n"; # accept loop while (accept(CLIENT, SERVER)) { $my_socket_address = getsockname(CLIENT); ($port, $myaddr) = sockaddr_in($my_socket_address); }Whereasgetpeername(as discussed in Recipe 17.7) returns the address of the remote end of the socket,getsocknamereturns the address of the local end. When we've bound toINADDR_ANY, thus accepting connections on any address the machine has, we need to usegetsocknameto identify which address the client connected to.If you're using IO::Socket::INET, your code will look like this:$server = IO::Socket::INET->new(LocalPort => $server_port, Type => SOCK_STREAM, Proto => 'tcp', Listen => 10) or die "Can't create server socket: $@\n"; while ($client = $server->accept( )) { $my_socket_address = $client->sockname( ); ($port, $myaddr) = sockaddr_in($my_socket_address); # ... }If you don't specify a local port toIO::Socket::INET->new, your socket will be bound toINADDR_ANY.If you want your server to listen only for a particular virtual host, don't useINADDR_ANY. Instead, bind to a specific host address:use Socket; $port = 4269; # port to bind to $host = "specific.host.com"; # virtual host to listen on socket(Server, PF_INET, SOCK_STREAM, getprotobyname("tcp")) or die "socket: $!"; bind(Server, sockaddr_in($port, inet_aton($host))) or die "bind: $!"; while ($client_address = accept(Client, Server)) { # ... }Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Making a Daemon Server
- InhaltsvorschauYou want your program to run as a daemon.If you are paranoid and running as root,
chrootto a safe directory:chroot("/var/daemon") or die "Couldn't chroot to /var/daemon: $!";Fork once, and let the parent exit:$pid = fork; exit if $pid; die "Couldn't fork: $!" unless defined($pid);
Close the three standard filehandles by reopening them to /dev/null:for my $handle (*STDIN, *STDOUT, *STDERR) { open($handle, "+<", "/dev/null") || die "can't reopen $handle to /dev/null: $!"; }Dissociate from the controlling terminal that started us and stop being part of whatever process group we had been a member of:use POSIX; POSIX::setsid( ) or die "Can't start a new session: $!";Trap fatal signals, setting a flag to indicate that we need to gracefully exit:$time_to_die = 0; sub signal_handler { $time_to_die = 1; } $SIG{INT} = $SIG{TERM} = $SIG{HUP} = \&signal_handler; # trap or ignore $SIG{PIPE}Wrap your actual server code in a loop:until ($time_to_die) { # ... }Before POSIX, every operating system had its own way for a process to tell the operating system "I'm going it alone, please interfere with me as little as possible." POSIX makes it much cleaner. That said, you can still take advantage of any operating system-specific calls if you want to.Thechrootcall is one of those non-POSIX calls. It makes a process change where it thinks the directory/is. For instance, afterchroot"/var/daemon", if the process tries to read the file /etc/passwd, it will read /var/daemon/etc/passwd. Achrooted process needs copies of any files it will run made available inside its new/, of course. For instance, our chrooted process would need /var/daemon/bin/csh if it were going to glob files. For security reasons, only the superuser mayEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Restarting a Server on Demand
- InhaltsvorschauYou want your server to shut down and restart when it receives a HUP signal, just like
inetdorhttpd.Catch the SIGHUP. Within the handler, set harmless signal handlers, unblock signals, and re-execute your program:use POSIX qw(:signal_h sigprocmask); my $SELF = "/path/to/my/program"; my @ARGS = @ARGV; # save for later $SIG{HUP} = \&phoenix; # your program sub phoenix { # make signals harmless for my $nal (qw[ALRM CHLD HUP INT PIPE TERM]) { $SIG{$nal} = sub { }; } # reenable them my $s = POSIX::SigSet->new; my $t = POSIX::SigSet->new; sigprocmask(SIG_BLOCK, $s, $t); # and restart print "Restarting\n"; exec $SELF => @ARGS; die "Couldn't exec $SELF => @ARGS\n"; }It sounds simple ("when I get a HUP signal, restart"), but it's tricky. You must know your own program name, and that isn't easy to find out. You could use$0or the FindBin module. For normal programs, this is fine, but critical system utilities must be more cautious, as there's no guarantee that$0is valid. You can hardcode the filename and arguments into your program, as we do here. That's not necessarily the most convenient solution, however, so you might want to read the program and arguments from an external file (using the filesystem's protections to ensure it hasn't been tampered with).Be sure to install your signal handler after you define$SELFand@ARGS; otherwise there's a race condition when a SIGHUP could runrestartbut you don't know the program to run. This would cause your program to die.Signals are tricky beasts. When youexecto restart your program, the reborn version inherits a set of blocked signals from its parent. Inside a signal handler, that signal is blocked. So if your signal handler simply calledexecright away, the new process would have SIGHUP blocked. You could only restart your program once!Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Managing Multiple Streams of Input
- InhaltsvorschauThe next input to your program could be coming from any number of filehandles, but you don't know which. You've tried using
select( ), but the need to then do unbuffered I/O is more than you can deal with (and it's making your code very difficult to follow).Use the IO::Multiplex module from CPAN. It calls amux_input( )function when input is received over a socket, and handles input and output buffering for you:use IO::Multiplex; $mux = IO::Multiplex->new( ); $mux->add($FH1); $mux->add($FH2); # ... and so on for all the filehandles to manage $mux->set_callback_object(_ _PACKAGE_ _); # or an object $mux->Loop( ); sub mux_input { my ($package, $mux, $fh, $input) = @_; # $input is ref to the filehandle's input buffer # ... }Although you can useselectto manage input coming at you from multiple directions, there are many tricks and traps. For example, you can't use<>to read a line of input, because you never know whether the client has sent a full line yet (or will ever finish sending a line). You can'tprintto a socket without the risk of the output buffer being full and your process blocking. You need to use non-blocking I/O and maintain your own buffers, and, consequently, life rapidly becomes unmanageably complex.Fortunately, we have a way of hiding complexity: modules. The IO::Multiplex module from CPAN takes care of non-blocking I/O andselectfor you. You tell it which filehandles to watch, and it tells you when new data arrives. You can evenprintto the filehandles, and it'll buffer and non-blockingly output it. An IO::Multiplex object manages a pool of filehandles.Use theaddmethod to tell IO::Multiplex to manage a filehandle. This enables non-blocking I/O and disables the stdio buffering. When IO::Multiplex receives data on one of its managed filehandles, it calls amux_inputmethod on an object or class of your choosing. Specify whereEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Program: backsniff
- InhaltsvorschauThis program logs attempts to connect to ports. It uses the Sys::Syslog module to log the connection attempt through a Unix socket, with logging level LOG_NOTICE and facility LOG_DAEMON. It uses
getsocknameto find out what port was connected to andgetpeernameto find out what machine made the connection. It usesgetservbyportto convert the local port number (e.g., 7) into a service name (e.g, "echo").It produces entries in the system log file like this:May 25 15:50:22 coprolith sniffer: Connection from 207.46.131.141 to 207.46.130.164:echoInstall it in the inetd.conf file with a line like this:echo stream tcp nowait nobody /usr/scripts/snfsqrd snifferThe program is shown in Example 17-9.Example 17-9. backsniff#!/usr/bin/perl -w # backsniff - log attempts to connect to particular ports use strict; use Sys::Syslog qw(:DEFAULT setlogsock); use Socket; # identify my port and address my $sockname = getsockname(STDIN) or die "Couldn't identify myself: $!\n"; my ($port, $iaddr) = sockaddr_in($sockname); my $my_address = inet_ntoa($iaddr); # get a name for the service my $service = (getservbyport ($port, "tcp"))[0] || $port; # now identify remote address $sockname = getpeername(STDIN) or die "Couldn't identify other end: $!\n"; ($port, $iaddr) = sockaddr_in($sockname); my $ex_address = inet_ntoa($iaddr); # and log the information setlogsock("unix"); openlog("sniffer", "ndelay", "daemon"); syslog("notice", "Connection from %s to %s:%s\n", $ex_address, $my_address, $service); closelog( );Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Program: fwdport
- InhaltsvorschauImagine that you're nestled deep inside a protective firewall. Somewhere in the outside world is a server that you'd like access to, but only processes on the firewall can reach it. You don't want to log into the firewall machine each time to access that service.For example, this might arise if your company's ISP provides a news-reading service that seems to come from your main firewall machine, but rejects any NNTP connections from any other address. As the administrator of the firewall, you don't want dozens of people logging onto it, but you would like to let them read and post news from their own workstations.The program in Example 17-10, fwdport, solves this problem in a generic fashion. You may run as many of these as you like, one per outside service. Sitting on the firewall, it can talk to both worlds. When someone wants to access the outside service, they contact this proxy, which connects on their behalf to the external service. To that outside service, the connection is coming from your firewall, so it lets it in. Then your proxy forks off twin processes, one only reading data from the external server and writing that data back to the internal client, the other only reading data from the internal client and writing that data back to the external server.For example, you might invoke it this way:
% fwdport -s nntp -l fw.oursite.com -r news.bigorg.com
That means that the program will act as the server for the NNTP service, listening for local connections on the NNTP port on the host fw.oursite.com. When one comes in, it contacts news.bigorg.com (on the same port), and then ferries data between the remote server and local client.Here's another example:% fwdport -l myname:9191 -r news.bigorg.com:nntp
This time we listen for local connections on port 9191 of the host myname, and patch those connecting clients to the remote server news.bigorg.com on its NNTP port.In a way, fwdport acts as both a server and a client. It's a server from the perspective of inside the firewall and a client from the perspective of the remote server outside. The program summarizes this chapter well because it demonstrates just about everything we've covered here. It has server activity, client activity, collecting of zombie children, forking and process management, plus much more thrown in.Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Chapter 18: Internet Services
- InhaltsvorschauThis "telephone" has too many shortcomings to be seriously considered as a means of communication. The device is inherently of no value to us.—Western Union internal memo, 1876Correct use of sockets is only part of writing programs that communicate over the network. Once you have a way for two programs to talk, you still need a protocol for communication. This protocol lets each party know when to talk, and it precisely defines who is responsible for which part of the service.Common Internet protocols are listed in Table 18-1.
Table 18-1: Common Internet protocols ProtocolMeaningActionFTPFile Transfer ProtocolCopying files between remote machinestelnetRemote loginrsh and rcpRemote shell and Remote copyRemote login and remote file copyingNNTPNetwork News Transfer ProtocolReading and posting USENET newsHTTPHypertext Transfer ProtocolTransferring documents on the WebSMTPSimple Mail Transfer ProtocolEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Introduction
- InhaltsvorschauCorrect use of sockets is only part of writing programs that communicate over the network. Once you have a way for two programs to talk, you still need a protocol for communication. This protocol lets each party know when to talk, and it precisely defines who is responsible for which part of the service.Common Internet protocols are listed in Table 18-1.
Table 18-1: Common Internet protocols ProtocolMeaningActionFTPFile Transfer ProtocolCopying files between remote machinestelnetRemote loginrsh and rcpRemote shell and Remote copyRemote login and remote file copyingNNTPNetwork News Transfer ProtocolReading and posting USENET newsHTTPHypertext Transfer ProtocolTransferring documents on the WebSMTPSimple Mail Transfer ProtocolSending mailPOP3Post Office ProtocolEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Simple DNS Lookups
- InhaltsvorschauYou want to find the IP address of a host or turn an IP address into a name. Network servers do this to authenticate their clients, and clients do it when the user gives them a hostname. But Perl's socket library requires an IP address. Furthermore, many servers produce log files containing IP addresses, but hostnames are more useful to analysis software and humans.If you have a name like
www.perl.com, usegethostbynameif you want all addresses:use Socket; @addresses = gethostbyname($name) or die "Can't resolve $name: $!\n"; @addresses = map { inet_ntoa($_) } @addresses[4 .. $#addresses]; # @addresses is a list of IP addresses ("208.201.239.48", "208.201.239.49")Or useinet_atonif you need only the first address:use Socket; $address = inet_ntoa(inet_aton($name)); # $address is a single IP address "208.201.239.48"
If you have an IP address like "208.201.239.48", use:use Socket; $name = gethostbyaddr(inet_aton($address), AF_INET) or die "Can't resolve $address: $!\n"; # $name is the hostname ("www.perl.com")This process is complicated because the functions are mere wrappers for C system calls, so you have to convert IP addresses from ASCII strings ("208.146.240.1") into C structures. The standard Socket module providesinet_atonto convert from ASCII to the packed numeric format andinet_ntoato convert back:use Socket; $packed_address = inet_aton("208.146.140.1"); $ascii_address = inet_ntoa($packed_address);Thegethostbynamefunction takes a string containing the hostname (or IP address). In scalar context, it returns the remote IP address suitable for passing toinet_ntoa(orundefon error). In list context, it returns a list of at least five elements (or an empty list on error). The returned list is:Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Being an FTP Client
- InhaltsvorschauYou want to connect to an FTP server and transfer files. For example, you might want to automate the one-time transfer of many files or automatically mirror an entire section of an FTP server.Use the Net::FTP module:
use Net::FTP; $ftp = Net::FTP->new("ftp.host.com") or die "Can't connect: $@\n"; $ftp->login($username, $password) or die "Couldn't login\n"; $ftp->cwd($directory) or die "Couldn't change directory\n"; $ftp->get($filename) or die "Couldn't get $filename\n"; $ftp->put($filename) or die "Couldn't put $filename\n";Using the Net::FTP module is a three-part process: connect to a server, identify and authenticate yourself, and transfer files. All interaction with the FTP server happens through method calls on a Net::FTP object. If an error occurs, methods returnundefin scalar context or the empty list in list context.The connection is established with thenewconstructor. If an error occurs,$@is set to an error message andnewreturnsundef. The first argument is the hostname of the FTP server, optionally followed by named options:$ftp = Net::FTP->new("ftp.host.com", Timeout => 30, Debug => 1) or die "Can't connect: $@\n";TheTimeoutoption gives the number of seconds all operations wait before giving up.Debugsets the debugging level (non-zero sends copies of all commands toSTDERR).Firewalltakes a string as an argument, specifying the machine acting as an FTP proxy.Portlets you select an alternate port number (the default is 21, the standard port for FTP). Finally, if thePassiveoption is set to true, all transfers are done passively (some firewalls and proxies require this). TheFirewallandPassiveoptions override the environment variablesFTP_FIREWALLandFTP_PASSIVE.Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Sending Mail
- InhaltsvorschauYou want your program to send mail. Some programs monitor system resources like disk space and notify appropriate people when disk space becomes dangerously low. CGI script authors may not want programs to report errors like "the database is down" to the user, preferring instead to send mail to the database administrator about the problem.Use the CPAN module Mail::Mailer:
use Mail::Mailer; $mailer = Mail::Mailer->new("sendmail"); $mailer->open({ From => $from_address, To => $to_address, Subject => $subject, }) or die "Can't open: $!\n"; print $mailer $body; $mailer->close( );Or use thesendmailprogram directly:open(SENDMAIL, "|/usr/lib/sendmail -oi -t -odq") or die "Can't fork for sendmail: $!\n"; print SENDMAIL <<"EOF"; From: User Originating Mail <me\@host> To: Final Destination <you\@otherhost> Subject: A relevant subject line Body of the message goes here, in as many lines as you like. EOF close(SENDMAIL) or warn "sendmail didn't close nicely";You have three choices for sending mail from your program. You can call another program normally used to send mail, such as Mail or mailx; these are called MUAs or Mail User Agents. You can use a system-level mail program, such as sendmail; this is an MTA, or Mail Transport Agent. Or you can connect to an Simple Mail Transfer Protocol (SMTP) server. Unfortunately, there's no standard user-level mail program, sendmail doesn't have a standard location, and SMTP isn't particularly simple. The CPAN module Mail::Mailer hides these complexities from you.Create a Mail::Mailer object withMail::Mailer->new. If you don't pass any arguments, it uses the default mail sending method (probably a program like mail). Arguments tonewselect an alternate way to send the message. The first argument is the type of delivery method ("Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Reading and Posting Usenet News Messages
- InhaltsvorschauYou want to connect to a Usenet news server to read and post messages. Your program could send a periodic posting to a newsgroup, summarize a newsgroup, or identify first-time contributors in a newsgroup so you can send them a helpful welcome message.Use the Net::NNTP module:
use Net::NNTP; $server = Net::NNTP->new("news.host.dom") or die "Can't connect to news server: $@\n"; ($narticles, $first, $last, $name) = $server->group( "misc.test" ) or die "Can't select misc.test\n"; $headers = $server->head($first) or die "Can't get headers from article $first in $name\n"; $bodytext = $server->body($first) or die "Can't get body from article $first in $name\n"; $article = $server->article($first) or die "Can't get article $first from $name\n"; $server->postok( ) or warn "Server didn't tell me I could post.\n"; $server->post( [ @lines ] ) or die "Can't post: $!\n";Usenet is a distributed bulletin board system. Servers exchange messages to ensure that each server gets all messages for the newsgroups it carries. Each server sets its own expiration criteria for how long messages stay on the server. Client newsreaders connect to their designated server (usually belonging to their company, ISP, or university) and can read existing postings and contribute new ones.Each message (or article, as they're also known) has a set of headers and a body, separated by a blank line. Articles are identified in two ways: the message ID header and an article number within a newsgroup. An article's message ID is stored in the message itself and is guaranteed unique no matter which news server the article was read from. When an article references others, it does so by message ID. A message ID is a string like:<0401@jpl-devvax.JPL.NASA.GOV>
An article can also be identified by its newsgroup and article number within the group. Each news server assigns its own article numbers, so they're valid only for the news server you got them from.Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Reading Mail with POP3
- InhaltsvorschauYou want to fetch mail from a POP3 server. This lets you write a program to summarize your unread mail, move it from a remote server to a local mailbox, or toggle between Internet and local mail systems.Use the Net::POP3 module:
$pop = Net::POP3->new($mail_server) or die "Can't open connection to $mail_server : $!\n"; defined ($pop->login($username, $password)) or die "Can't authenticate: $!\n"; $messages = $pop->list or die "Can't get list of undeleted messages: $!\n"; foreach $msgid (keys %$messages) { $message = $pop->get($msgid); unless (defined $message) { warn "Couldn't fetch $msgid from server: $!\n"; next; } # $message is a reference to an array of lines $pop->delete($msgid); }Traditionally, mail has been a three-party system: the MTA (Mail Transport Agent, a system program like sendmail) delivers mail to the spool, where it is read by the MUA (Mail User Agent, a program like mail). This dates from the days of big servers holding mail and users reading it through dumb terminals. As PCs and networks entered the picture, the need arose for MUAs like Pine to run on different machines than the one housing the spool. The Post Office Protocol (POP) implements efficient message listing, reading, and deleting over a TCP/IP session.The Net::POP3 module is a POP client. That is, it lets your Perl program act as an MUA. The first step in using Net::POP3 is to create a new Net::POP3 object. Passnewthe name of the POP3 server:$pop = Net::POP3->new( "pop.myisp.com" ) or die "Can't connect to pop.myisp.com: $!\n";All Net::POP3 functions returnundefor the empty list upon error, depending on the calling context. If an error occurs, the fickle$!variable just might contain a meaningful error message—but also might not.Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Simulating Telnet from a Program
- InhaltsvorschauYou want to simulate a telnet connection from your program by logging into a remote machine, issuing commands, and reacting to what is returned. This has many applications, from automating tasks on machines you can telnet to but which don't support scripting or rsh, to simply testing whether a machine's Telnet daemon is still running properly.Use the CPAN module Net::Telnet:
use Net::Telnet; $t = Net::Telnet->new( Timeout => 10, Prompt => '/%/', Host => $hostname ); $t->login($username, $password); @files = $t->cmd("ls"); $t->print("top"); (undef, $process_string) = $t->waitfor('/\d+ processes/'); $t->close;Net::Telnet provides an object-oriented interface to the Telnet protocol. Create a connection withNet::Telnet->new, then interact with the remote machine using method calls on the resulting object.Give thenewmethod a list of named-parameter pairs, much like initializing a hash. We'll cover only a few possible parameters. The most important isHost, the machine you're telnetting to. The default host islocalhost. To connect to a port other than the one Telnet normally uses, specify this in thePortoption. Error handling is done through the function whose reference is specified in theErrmodeparameter.Another important option isPrompt. When you log in or run a command, Net::Telnet uses thePromptpattern to determine when the login or command has completed. The defaultPromptis:/[\$%#>] $/
which matches the common Unix shell prompts. If the prompt on the remote machine doesn't match the default pattern, you have to specify your own. Remember to include slashes.Timeoutlets you control how long (in seconds) network operations wait before they give up. The default is 10 seconds.An error or timeout in the Net::Telnet module raises an exception by default, which, if uncaught, prints a message toEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Pinging a Machine
- InhaltsvorschauYou want to test whether a machine is alive. Network- and system-monitoring software often use the
pingprogram as an indication of availability.Use the standard Net::Ping module:use Net::Ping; $p = Net::Ping->new( ) or die "Can't create new ping object: $!\n"; print "$host is alive" if $p->ping($host); $p->close;Testing whether a machine is up isn't as easy as it sounds. It's not only possible but also unpleasantly common for machines to respond to the ping command when they have no working services. It's better to think of ping as testing whether a machine is reachable, rather than whether the machine is doing its job. To check the latter, you must try its services (Telnet, FTP, web, NFS, etc.).In the form shown in the Solution, Net::Ping attempts to connect to the TCP echo port (port number 7) on the remote machine. Thepingmethod returns true if the connection could be made, false otherwise.You can also ping using other protocols by passing the protocol name tonew. Valid protocols are tcp, udp, syn, and icmp (all lowercase). A UDP ping attempts to connect to the echo port (port 7) on the remote machine, sends a datagram, and attempts to read the response. The machine is considered unreachable if it can't connect, if the reply datagram isn't received, or if the reply differs from the original datagram. An ICMP ping uses the ICMP protocol, just like the ping(8) command. On Unix machines, you must be the superuser to use the ICMP protocol:# use TCP if we're not root, ICMP if we are $pong = Net::Ping->new( $> ? "tcp" : "icmp" ); (defined $pong) or die "Couldn't create Net::Ping object: $!\n"; if ($pong->ping("kingkong.com")) { print "The giant ape lives!\n"; } else { print "All hail mighty Gamera, friend of children!\n"; }A SYN ping is asynchronous: you first send out manypings and then receive responses by repeatedly invoking theEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Accessing an LDAP Server
- InhaltsvorschauYou want to fetch or maintain information from a Lightweight Directory Access Protocol (LDAP) server. For example, you have a list of email addresses in your company, and you want correct names for those people.Use the Net::LDAP module from CPAN. For example, to search, use:
use Net::LDAP; $ldap = Net::LDAP->new("ldap.example.com") or die $@; $ldap->bind( ); $mesg = $ldap->search(base => $base_dn, filter => $FILTER); $mesg->code( ) && die $mesg->error; foreach $result ($mesg->all_entries) { # do something with $result } $ldap->unbind( );The Net::LDAP module manages an LDAP session. It is a pure Perl module, so it doesn't require a C compiler to install. To use it effectively, though, you'll need to know a little about LDAP in general and the query syntax in particular. If you're new to LDAP, you might want to read the articles athttp://www.onlamp.com/topics/apache/ldap.The four steps to working with an LDAP server are connecting, authenticating, interacting, and logging off. Interacting includes searching, adding, deleting, and altering records.Theconnectmethod establishes a connection to the LDAP server and is immediately followed by a call to thebindmethod. If you give no argument tobind, you log into the LDAP server anonymously. You can give a fully qualified Distinguished Name (DN) and password to authenticate yourself:$ldap->bind("cn=directory manager,ou=gurus,dc=oreilly,dc=com", password => "timtoady") or die $@;This sends the username and password unencrypted over the wire. For encrypted access, pass asaslparameter tobindto use an Authen::SASL object to authenticate with.TheEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Sending Attachments in Mail
- InhaltsvorschauYou want to send mail that includes attachments; for example, you want to mail a PDF document.Use the MIME::Lite module from CPAN. First, create a MIME::Lite object representing the multipart message:
use MIME::Lite; $msg = MIME::Lite->new(From => 'sender@example.com', To => 'recipient@example.com', Subject => 'My photo for the brochure', Type => 'multipart/mixed');Then, add content through theattachmethod:$msg->attach(Type => 'image/jpeg', Path => '/Users/gnat/Photoshopped/nat.jpg', Filename => 'gnat-face.jpg'); $msg->attach(Type => 'TEXT', Data => 'I hope you can use this!');Finally, send the message, optionally specifying how to send it:$msg->send( ); # default is to use sendmail(1) # alternatively $msg->send('smtp', 'mailserver.example.com');The MIME::Lite module creates and sends mail with MIME-encoded attachments. MIME stands for Multimedia Internet Mail Extensions, and is the standard way of attaching files and documents. It can't, however, extract attachments from mail messages—for that you need to read Recipe 18.10.When creating and adding to a MIME::Lite object, pass parameters as a list of named parameter pairs. The pair conveys both mail headers (e.g.,From,To,Subject) and those specific to MIME::Lite. In general, mail headers should be given with a trailing colon:$msg = MIME::Lite->new('X-Song-Playing:' => 'Natchez Trace');However, MIME::Lite accepts the headers in Table 18-1 without a trailing colon.*indicates a wildcard, soContent-*includesContent-TypeandContent-IDbut notDis-Content.Table 18-2: Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Extracting Attachments from Mail
- InhaltsvorschauYou have one or more mail messages with MIME attachments, and you want to process these messages with a Perl program to extract files or otherwise manipulate attachments.Use the MIME-Tools bundle from CPAN:
use MIME::Parser; $parser = MIME::Parser->new( ); $parser->output_to_core(1); # don't write attachments to disk $message = $parser->parse_data($MESSAGE); # die( )s if can't parse # OR $message = $parser->parse($FILEHANDLE); # die( )s if can't parse $head = $message->head( ); # object--see docs $preamble = $message->preamble; # ref to array of lines $epilogue = $message->epilogue; # ref to array of lines $num_parts = $message->parts; for (my $i=0; $i < $num_parts; $i++) { my $part = $message->parts($i); my $content_type = $part->mime_type; my $body = $part->as_string; }Formally, a MIME message has only two parts: the head (containing headers such asFromandSubject) and the body (containing the message, rather than its metadata). The body, however, has three parts: the preamble (text before the first attachment), a series of parts (the attachments), and the epilogue (text after the last attachment). This is shown in Figure 18-1.
Figure 18-1: Composition of a MIME messageIn the Solution, we disable the default behavior of MIME::Parser that writes the attachments to disk. Doing so increases memory consumption because now the decoded attachments must be stored in memory, but prevents the need to clean up temporary files and directories once the attachments are no longer needed.To write attachments to a file, replace the call tooutput_to_corewith calls to methods that specify the directory in which to store the attachments and what to name the files. TheEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Writing an XML-RPC Server
- InhaltsvorschauYou want to write a server for an XML-RPC web service.Use the SOAP-Lite distribution from CPAN, which supports XML-RPC. Your server can be either standalone:
use XMLRPC::Transport::HTTP; $daemon = XMLRPC::Transport::HTTP::Daemon ->new(LocalPort => $PORT) ->dispatch_to('ClassName') ->handle( );or a CGI script:use XMLRPC::Transport::HTTP; $daemon = XMLRPC::Transport::HTTP::CGI ->dispatch_to('ClassName') ->handle( );In both cases, incoming methods will be invoked on the class named in thedispatch_tomethod (it will berequired if it is not already loaded):package ClassName; sub handler { my ($class, $arg_hash_ref) = @_; # ... }The SOAP-Lite toolkit's modules take care of translating between Perl's native data structures and the XML representation of those values. However, it's up to you how the server decides which method to invoke when a request arrives. This process of matching an XML-RPC request to a Perl function is known as dispatching.It looks strange to see all those chained method invocation in the Solution. When used to set a value, XMLRPC::Lite methods return their invocation, which lets you chain methods rather than repeat$daemonover and over again:$daemon = XMLRPC::Transport::HTTP::Daemon; $daemon->new(LocalPort => $PORT); $daemon->dispatch_to('ClassName'); $daemon->handle( );Thenewconstructor takes IO::Socket::INET's constructor parameters as well, so you can sayReuseAddr => 1, for example.When you give thedispatch_tomethod a class name argument (as in the Solution), the XML-RPC server looks for methods in that class. If the server in the Solution receives a request for theClassName.hasBeenmethod (XML-RPC methods are typically in IntraCaps), it invokes theClassName->hasBeenEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Writing an XML-RPC Client
- InhaltsvorschauYou want to write a client for an XML-RPC service.Use the XMLRPC::Lite module from the SOAP-Lite distribution:
use XMLRPC::Lite; $server = XMLRPC::Lite->proxy("http://server.example.com/path"); $result = $server->call('ClassName.handler', @ARGS); die $call->faultstring if $call->fault; print $call->result;A single XML-RPC server may run many services, differentiated by their method name:ClassName.handlercorresponds toClassName->handleron the server side;A.B.methodcorresponds toA::B->method; and acalltohandlercorresponds tomain->handler.Theproxyis the actual URL of the server. If you're using a CGI server, theproxymethod looks something like this:$server->proxy("http://server.example.com/path/to/server.cgi")There are three ways to invoke remote methods. The first way is to use thecallmethod on your XMLRPC::Lite object. The first argument tocallis the remote method name, and the remaining arguments are parameters for the remote method:$returned = $server -> call("getRecordByNumber", 12, { format => "CSV" }) -> result;The second way to invoke a remote method is to call that method on the XMLRPC::Lite object. This works only when the remote method name isn't the same as a method provided by the XMLRPC::Lite object. For example:$returned = $server -> getRecordByNumber(12, { format => "CSV" }) -> result;The last way to invoke a remote method is with autodispatch, turning unrequired function calls and method invocations in your Perl program into XML-RPC requests. Enable autodispatch with:use XMLRPC::Lite +autodispatch => proxy => "http://server.example.com/path"; $returned = getRecordByNumber(12, { format => "CSV" });A critical difference between autodispatch and the other styles is that autodispatch automatically decodes the result into a Perl value for you. When you use an XMLRPC::Lite object, you must explicitly invoke theEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Writing a SOAP Server
- InhaltsvorschauYou want to write a web service where SOAP is the transport.Use the SOAP-Lite distribution from CPAN. Your server can be either standalone:
use SOAP::Transport::HTTP; $daemon = SOAP::Transport::HTTP::Daemon ->new(LocalPort => $PORT) ->dispatch_to('ClassName') ->handle( );or a CGI script:use SOAP::Transport::HTTP; $daemon = SOAP::Transport::HTTP::CGI ->dispatch_to('ClassName') ->handle( );In both cases, the only methods that SOAP clients are permitted to invoke are those in the classes named in the argument todispatch_to(those classes will berequired if not already loaded):package ClassName; sub handler { my ($class, $arg_hash_ref) = @_; # ... }The SOAP-Lite toolkit contains SOAP and XML-RPC modules. Writing a SOAP service is similar to writing an XML-RPC service. Control method dispatch in SOAP as in XML-RPC. See Recipe 18.11 for details.Recipe 18.14; Recipe 18.11Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Writing a SOAP Client
- InhaltsvorschauYou want to write a client for a SOAP web service.Use the SOAP::Lite module from the SOAP-Lite distribution:
use SOAP::Lite; $server = SOAP::Lite -> uri("http://localhost/Namespace") -> proxy("http://server.example.com/path"); $result = $server->call('ClassName.handler', @ARGS); die $call->faultstring if $call->fault; print $call->result;A single SOAP server may offer remote access to the methods of many classes. A client identifies the class upon which it wishes to invoke methods with theuriparameter. The hostname in the argument is irrelevant; only the path portion (the class name) matters. For example, these two URIs are equivalent:http://modacrylic.clue.com/GimpyMod http://weenies.mit.edu/GimpyMod
As with XML-RPC, theproxyargument is the server's URL. For example, if your SOAP server is implemented as a CGI script, theproxycall looks like this:$server->proxy("http://server.example.com/path/to/server.cgi");Invoke remote methods as you do with XML-RPC, either with thecallmethod:$returned = $server -> call("getRecordByNumber", 12, { format => "CSV" }) -> result;or by invoking the method on a SOAP::Lite object directly:$returned = $server -> getRecordByNumber(12, { format => "CSV" }) -> result;or usingautodispatch:use SOAP::Lite +autodispatch => uri => "http://identifier.example.com/Namespace", proxy => "http://server.example.com/path"; $returned = getRecordByNumber(12, { format => "CSV" });You can also use this with OO syntax:$returned = Some::Remote::Module->getRecordByNumber(12, { format => "CSV" });There's a lot more to SOAP than we can explain here. The books Programming Web Services with SOAPEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Program: rfrm
- InhaltsvorschauThis program fetches a list of messages waiting on a POP3 server and summarizes that list:
# ./rfrm Nathan Torkington Re: YAPC Rob Brown Re: Net::Ping syn round trip time Rael Dornfest Re: Book Proposal - Blosxom in a Nutshell spam@example.com Extend your ping times 633%!!!!
Tell the program which POP3 server to contact and the username and password to authenticate with using a ~/.rfrmrc file like this:SERVER=pop3.example.com USER=gnat PASS=I(heart)Perl
The program verifies that your .rfrmrc file isn't readable or writable by anyone but you, and stops if it is.The program is shown in Example 18-4.Example 18-4. rfrm#!/usr/bin/perl -w # rfrm - get a list of mail messages waiting on a pop server use Net::POP3; use strict; my ($Pop_host, $Pop_user, $Pop_pass) = read_conf( ) or usage( ); my $pop = Net::POP3->new($Pop_host) or die "Can't connect to $Pop_host: $!\n"; defined ($pop->login($Pop_user, $Pop_pass)) or die "Can't authenticate\n"; my $messages = $pop->list or die "Can't get a list of messages\n"; foreach my $msgid (sort { $a <=> $b } keys %$messages) { my ($msg, $subject, $sender, $from); $msg = $pop->top($msgid, 0); # returns ref to array $msg = join "\n", @$msg; # now it's one big string # extract From and Subject lines, and boil From down $subject = $sender = ''; if ($msg =~ /^Subject: (.*)/m) { $subject = $1 } if ($msg =~ /^From: (.*)/m) { $sender = $1 } ($from = $sender) =~ s{<.*>}{ }; if ($from =~ m{\(.*\)}) { $from = $1 } $from ||= $sender; # print boiled down summary of this message printf("%-20.20s %-58.58s\n", $from, $subject); } sub usage { die <<EOF ; usage: rfrm Configure with ~/.rfrmrc thus: SERVER=pop.mydomain.com USER=myusername PASS=mypassword EOF } sub read_conf { my ($server, $user, $pass, @stat); open(FH, "< $ENV{HOME}/.rfrmrc") or return; # paranoia check @stat = stat(FH) or die "Can't stat ~/.rfrmrc: $!\n"; if ($stat[2] & 177) { die "~/.rfrmrc should be mode 600 or tighter\n"; } # read config file while (<FH>) { if (/SERVER=(.*)/) { $server = $1 } if (/USER=(.*)/) { $user = $1 } if (/PASS=(.*)/) { $pass = $1 } } close FH; # must have something for every value return unless $server && $user && $pass; return ($server, $user, $pass); }Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Program: expn and vrfy
- InhaltsvorschauThis program uses Net::SMTP to talk to an SMTP server and uses the EXPN and VRFY commands to figure out whether an address is going to work. It isn't perfect, because it relies on the remote SMTP giving meaningful information with the EXPN and VRFY commands—they are common ways for spammers to harvest email addresses, and so many servers have disabled these options. It uses Net::DNS if available, but can also work without it.This program inspects
$0(the program name) to see how it was called. If run as expn, it uses the EXPN command; if called as vrfy, it uses the VRFY command. Use links to install it with two names (on a system without links, simply copy the program code in Example 18-5):% cat > expn #!/usr/bin/perl -w ... ^D % ln expn vrfy
When given an email address as an argument, the program reports what the mail server says when you try to EXPN or VRFY the address. If you have Net::DNS installed, it tries all hosts listed as mail exchangers in the DNS entry for the address.Here's what it looks like without Net::DNS:% expn gnat@frii.com Expanding gnat at frii.com (gnat@frii.com): calisto.frii.com Hello coprolith.frii.com [207.46.130.14], pleased to meet you gnat@mail.frii.comAnd here's the same address with Net::DNS installed:% expn gnat@frii.com Expanding gnat at mail.frii.net (gnat@frii.com): deimos.frii.com Hello coprolith.frii.com [207.46.130.14], pleased to meet you Nathan Torkington <gnat@deimos.frii.com> Expanding gnat at mx1.frii.net (gnat@frii.com): phobos.frii.com Hello coprolith.frii.com [207.46.130.14], pleased to meet you gnat@mail.frii.com Expanding gnat at mx2.frii.net (gnat@frii.com): europa.frii.com Hello coprolith.frii.com [207.46.130.14], pleased to meet you gnat@mail.frii.com Expanding gnat at mx3.frii.net (gnat@frii.com): ns2.winterlan.com Hello coprolith.frii.com [207.46.130.14], pleased to meet you 550 gnat... User unknownEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Chapter 19: CGI Programming
- InhaltsvorschauA successful tool is one that was used to do something undreamt of by its author.—Stephen C. JohnsonChanges in the environment or the availability of food can make certain species more successful than others at finding food or avoiding predators. Many scientists believe a comet struck the Earth millions of years ago, throwing an enormous cloud of dust into the atmosphere. Subsequent radical changes to the environment proved too much for some organisms, say dinosaurs, and hastened their extinction. Other creatures, such as mammals, found new food supplies and freshly exposed habitats to compete in.Much as the comet altered the environment for prehistoric species, the Web has altered the environment for modern programming languages. It's opened up new vistas, and although some languages have found themselves eminently unsuited to this new world order, Perl has positively thrived. Because of its strong background in text processing and system glue, Perl has readily adapted itself to the task of providing information using text-based protocols.The Web is driven by plain text. Web servers and web browsers communicate using a text protocol called HTTP, Hypertext Transfer Protocol. Many of the documents exchanged are encoded in a text markup system called HTML, Hypertext Markup Language. This grounding in text is the source of much of the Web's flexibility, power, and success. The only notable exception to the predominance of plain text is the Secure Socket Layer (SSL) protocol that encrypts other protocols like HTTP into binary data that snoopers can't decode.Web pages are identified using the Uniform Resource Locator (URL) naming scheme. URLs look like this:
http://www.perl.com/CPAN/ http://www.perl.com:8001/bad/mojo.html ftp://gatekeeper.dec.com/pub/misc/netlib.tar.Z ftp://anonymous@myplace:gatekeeper.dec.com/pub/misc/netlib.tar.Z file:///etc/motd
The first part (Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Introduction
- InhaltsvorschauChanges in the environment or the availability of food can make certain species more successful than others at finding food or avoiding predators. Many scientists believe a comet struck the Earth millions of years ago, throwing an enormous cloud of dust into the atmosphere. Subsequent radical changes to the environment proved too much for some organisms, say dinosaurs, and hastened their extinction. Other creatures, such as mammals, found new food supplies and freshly exposed habitats to compete in.Much as the comet altered the environment for prehistoric species, the Web has altered the environment for modern programming languages. It's opened up new vistas, and although some languages have found themselves eminently unsuited to this new world order, Perl has positively thrived. Because of its strong background in text processing and system glue, Perl has readily adapted itself to the task of providing information using text-based protocols.The Web is driven by plain text. Web servers and web browsers communicate using a text protocol called HTTP, Hypertext Transfer Protocol. Many of the documents exchanged are encoded in a text markup system called HTML, Hypertext Markup Language. This grounding in text is the source of much of the Web's flexibility, power, and success. The only notable exception to the predominance of plain text is the Secure Socket Layer (SSL) protocol that encrypts other protocols like HTTP into binary data that snoopers can't decode.Web pages are identified using the Uniform Resource Locator (URL) naming scheme. URLs look like this:
http://www.perl.com/CPAN/ http://www.perl.com:8001/bad/mojo.html ftp://gatekeeper.dec.com/pub/misc/netlib.tar.Z ftp://anonymous@myplace:gatekeeper.dec.com/pub/misc/netlib.tar.Z file:///etc/motd
The first part (http,ftp,file) is called the scheme, which identifies how the file is retrieved. The next part (://) means a hostname will follow, whose interpretation depends on the scheme. After the hostname comes theEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Writing a CGI Script
- InhaltsvorschauYou want to write a CGI script to process the contents of an HTML form. In particular, you want to access the form contents and produce valid output in return.A CGI script is a server-side program launched by a web server to generate dynamic content. It receives encoded information from the remote client (user's browser) via
STDINand environment variables, and it must produce a valid HTTP header and body onSTDOUT. The standard CGI module, shown in Example 19-1, painlessly manages input and output encoding.Example 19-1. hiweb#!/usr/bin/perl -w # hiweb - load CGI module to decode information given by web server use strict; use CGI qw(:standard escapeHTML); # get a parameter from a form my $value = param('PARAM_NAME'); # output a document print header( ), start_html("Howdy there!"), p("You typed: ", tt(escapeHTML($value))), end_html( );CGI is just a protocol, a formal agreement between a web server and a separate program. The server encodes the client's form input data, and the CGI program decodes the form and generates output. The protocol says nothing regarding which language the program must be written in; programs and scripts that obey the CGI protocol have been written in C, shell, Rexx, C++, VMS DCL, Smalltalk, Tcl, Python, and of course Perl.The full CGI specification lays out which environment variables hold which data (such as form input parameters) and how it's all encoded. In theory, it should be easy to follow the protocol to decode the input, but in practice, it is surprisingly tricky to get right. That's why we strongly recommend using the CGI module. The hard work of handling the CGI requirements correctly and conveniently has already been done, freeing you to write the core of your program without getting bogged down in network protocols.Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Redirecting Error Messages
- InhaltsvorschauYou're having trouble tracking down your script's warnings and error messages, or your script's
STDERRoutput is confusing your server.Use the CGI::Carp module from the standard Perl distribution to prefix each line onSTDERRwith the program name and current date. You can also send warnings and errors to a file or the browser if you wish.Tracking down error messages from CGI scripts is notoriously annoying. Even if you manage to find the server error log, you still can't determine which message came from which script, or at what time. Some unfriendly web servers even abort the script if it has the audacity to emit anything out itsSTDERRbefore theContent-Typeheader is generated onSTDOUT, so warnings can get you into trouble.Enter the CGI::Carp module. It replaceswarnanddie—plus the normal Carp module'scarp,croak,cluck, andconfessfunctions—with more verbose and safer versions. It still sends them to the normal server error log.use CGI::Carp; warn "This is a complaint"; die "But this one is serious";
The following use of CGI::Carp also redirects errors to a file of your choice, placed in a BEGIN block to catch compile-time warnings as well:BEGIN { use CGI::Carp qw(carpout); open(LOG, ">>/var/local/cgi-logs/mycgi-log") or die "Unable to append to mycgi-log: $!\n"; carpout(*LOG); }You can even arrange for fatal errors to show up at the client browser, which is nice for your own debugging but might confuse the end user.use CGI::Carp qw(fatalsToBrowser); die "Bad error here";
Even if the error happens before you get the HTTP header out, the module will try to detect this and avoid the dreaded500ServerError. Normal warnings still go to the server error log (or wherever you've sent them withcarpout) with the program name and date stamp prepended.Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Fixing a 500 Server Error
- InhaltsvorschauYour CGI script gives you a
500ServerError.Follow the checklist given in the discussion.This checklist is aimed at a Unix audience, but the general principles embodied in the questions apply to all systems.Section 19.3.3.1: Make sure the web server can run the script
Check ownership and permissions with ls -l. The appropriate read and execute bits must be set on the script before the web server can run it. The script should be readable and executable by everyone (or at least by whomever the server runs scripts as). Use chmod 0755 scriptname if it's owned by you, or otherwise chmod 0555 scriptname if owned by the designated anonymous web user, assuming you are running as that user or the superuser. All directories in the path must also have their execute bit set (most FTP clients support changing protections on uploaded files if you don't have shell access to your server).Make sure the script can be identified as a script by the web server. Most web servers have a system-wide cgi-bin, and all files in that directory will be run as scripts. Some servers identify a CGI script as any file whose name ends in a particular extension, such as .cgi or .plx. Some servers have options to permit access via the GET method alone, not through the POST method that your form likely uses. Consult your web server documentation, configuration files, webmaster, and (if all else fails) technical support.If you're running on Unix, do you have the right path to the Perl executable on the#!line? The#!line must be the first line in the script; you can't even have blank lines before the#!line. Some operating systems have ridiculously short limits on the number of characters that can be in this line, so you may need to make a link (e.g., from /home/richh/perl to /opt/installed/third-party/software/perl-5.004/bin/perlEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Writing a Safe CGI Program
- InhaltsvorschauBecause CGI programs allow external users to run programs on systems they would not otherwise have access on, all CGI programs represent a potential security risk. You want to minimize your exposure.
-
Use taint mode (the -T switch on the #! line).
-
Don't blindly untaint data. (See the Discussion.)
-
Sanity-check everything, including all form widget return values, even hidden widgets or values generated by JavaScript code. Many people naïvely assume that just because they tell JavaScript to check the form's values before the form is submitted, the form's values will actually be checked. Not at all! The user can trivially circumvent this by disabling JavaScript in their browser, by downloading the form and altering the JavaScript, or quit by talking HTTP without a browser using any of the examples in Chapter 20 .
-
Check return conditions from system calls.
-
Be conscious of race conditions (described in the Discussion).
-
Run with
use warningsandusestrictto make sure Perl isn't assuming things incorrectly. -
Don't run anything setuid unless you absolutely must. If you must, think about running setgid instead if you can. Certainly avoid setuid root at all costs. If you must run setuid or setgid, use a wrapper unless Perl is convinced your system has secure setuid scripts and you know what this means.
-
Always encode login passwords, credit card numbers, social security numbers, and anything else you'd not care to read pasted across the front page of your local newspaper. Use a secure protocol like SSL when dealing with such data. Ensuring that a CGI only ever runs as HTTPS can be as simple as:
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. -
- Executing Commands Without Shell Escapes
- InhaltsvorschauYou need to use a user's input as part of a command, but you don't want to allow the user to trick the shell into running other commands or looking at other files. If you just blindly call the
systemfunction or backticks on a single string containing a command line derived from untested user input, the shell might be used to run the command. This would be unsafe.Unlike its single-argument version, the list form of thesystemfunction is safe from shell escapes. When the command's arguments involve user input from a form, never use this:system("command $input @files"); # UNSAFEWrite it this way instead:system("command", $input, @files); # saferBecause Perl was designed as a glue language, it's easy to use it to call other programs—too easy, in some cases.If you're merely trying to run a shell command but don't need to capture its output, it's easy enough to callsystemusing its multiple argument form. But what happens if you're using the command in backticks or as part of a piped open? Now you have a real problem, because those don't permit the multiple argument form thatsystemdoes. The solution (prior to v5.8; see later in this Discussion) is to manuallyforkandexecthe child processes on your own. It's more work, but at least stray shell escapes won't be ruining your day.It's safe to use backticks in a CGI script only if the arguments you give the program are internally generated, as in:chomp($now = `date`);
But if the command within the backticks contains user-supplied input, perhaps like this:@output = `grep $input @files`;
you have to be much more careful.die "cannot fork: $!" unless defined ($pid = open(SAFE_KID, "-|")); if ($pid = = 0) { exec('grep', $input, @files) or die "can't exec grep: $!"; } else { @output = <SAFE_KID>; close SAFE_KID; # $? contains status }Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Formatting Lists and Tables with HTML Shortcuts
- InhaltsvorschauYou have several lists and tables to generate and wish they were easier to output.The CGI module provides HTML helper functions that, when passed array references, apply themselves to each element of the referenced array:
print ol( li([ qw(red blue green)]) ); <OL><LI>red</LI><LI>blue</LI><LI>green</LI></OL> @names = qw(Larry Moe Curly); print ul( li({ -TYPE => "disc" }, \@names) ); <UL><LI TYPE="disc">Larry</LI> <LI TYPE="disc">Moe</LI> <LI TYPE="disc">Curly</LI></UL>
The HTML-generating functions in CGI.pm can make it easy to generate lists and tables. Passed a simple string, these functions produce HTML for that string. But passed an array reference, they work on all strings in that array.print li("alpha"); <LI>alpha</LI> print li( [ "alpha", "omega"] ); <LI>alpha</LI> <LI>omega</LI>The shortcut functions for lists are loaded when you use the:standardimport tag, but you need to ask for:html3explicitly to get helper functions for tables. There's also a conflict between the<TR>tag, which would normally make atr( )function, and Perl's built-intr///operator. Therefore, to make a table row, use theTr( )function.This example generates an HTML table starting with a hash of arrays. The keys will be the row headers, and the array of values are the columns.use CGI qw(:standard :html3); %hash = ( "Wisconsin" => [ "Superior", "Lake Geneva", "Madison" ], "Colorado" => [ "Denver", "Fort Collins", "Boulder" ], "Texas" => [ "Plano", "Austin", "Fort Stockton" ], "California" => [ "Sebastopol", "Santa Rosa", "Berkeley" ], ); $\ = "\n"; print "<TABLE><CAPTION>Cities I Have Known</CAPTION>"; print Tr(th [qw(State Cities)]); for $k (sort keys %hash) { print Tr(th($k), td( [ sort @{$hash{$k}} ] )); } print "</TABLE>";Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Redirecting to a Different Location
- InhaltsvorschauYou need to tell the client's browser to look elsewhere for a page.Instead of a normal header, just issue a location redirect and exit. Don't forget the extra newline at the end of the header.
$url = "http://www.perl.com/CPAN/"; print "Location: $url\n\n"; exit;
Sometimes your CGI program doesn't need to generate the document on its own. It only needs to tell the client at the other end to fetch a different document instead. In that case, the HTTP header needs to include this directive as aLocationline followed by the URL you want to send them to. Make sure to use an absolute URL, not a relative one.The direct and literal solution given in the Solution is usually sufficient, but if you already have the CGI module loaded, use theredirectfunction. You might use this code for building and setting a cookie, as shown in Example 19-4.Example 19-4. oreobounce#!/usr/bin/perl -w # oreobounce - set a cookie and redirect the browser use CGI qw(:cgi); use strict; my $oreo = cookie( -NAME => 'filling', -VALUE => "vanilla crème", -EXPIRES => '+3M', # M for month, m for minute -DOMAIN => '.perl.com'); my $whither = "http://somewhere.perl.com/nonesuch.html"; print redirect( -URL => $whither, -COOKIE => $oreo);That would produce:Status: 302 Moved Temporarily Set-Cookie: filling=vanilla%20cr%E4me; domain=.perl.com; expires=Tue, 21-Jul-1998 11:58:55 GMT Date: Tue, 21 Apr 1998 11:55:55 GMT Location: http://somewhere.perl.com/nonesuch.html Content-Type: text/html <<blank line here>>Example 19-5 is a complete program that looks at the client browser name and redirects it to a page in Eric Raymond's edition of theEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Debugging the Raw HTTP Exchange
- InhaltsvorschauYour CGI script is misbehaving strangely with your browser, and you suspect something in the HTTP header is missing. You want to find out exactly what your browser is sending to the server in the HTTP header.Create your own fake web server and point your browser at it, as shown in Example 19-6.Example 19-6. dummyhttpd
#!/usr/bin/perl -w # dummyhttpd - start an HTTP daemon and print what the client sends use strict; use LWP 5.32; # minimal good version use HTTP::Daemon; my $server = HTTP::Daemon->new(Timeout => 60, LocalPort => 8989); print "Please contact me at: <URL:", $server->url, ">\n"; while (my $client = $server->accept) { CONNECTION: while (my $answer = $client->get_request) { print $answer->as_string; $client->autoflush; RESPONSE: while (<STDIN>) { last RESPONSE if $_ eq ".\n"; last CONNECTION if $_ eq "..\n"; print $client $_; } print "\nEOF\n"; } print "CLOSE: ", $client->reason, "\n"; $client->close; undef $client; }It's hard to keep track of which versions of all browsers still have which bugs. The fake server program can save you days of head scratching, because sometimes a misbehaving browser doesn't send the server the right thing. Historically, we have seen aberrant browsers lose their cookies, mis-escape a URL, send the wrong status line, and other blunders even less obvious.The fake server is best run on the same machine as the real server. That way your browser still sends any cookies destined for that domain. Then instead of pointing your browser at:http://somewhere.com/cgi-bin/whatever
use the alternate port given in thenewconstructor in the Solution. You don't need to be the superuser to run the server if you use the alternate port.Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Managing Cookies
- InhaltsvorschauYou want to get or set a cookie to help manage sessions or user preferences.Using CGI.pm, retrieve an existing cookie like this:
$preference_value = cookie("preference name");To prepare a cookie, do this:$packed_cookie = cookie( -NAME => "preference name", -VALUE => "whatever you'd like", -EXPIRES => "+2y");To save a cookie back to the client browser, you must include it in the HTTP header, probably using either theheaderorredirectfunctions:print header(-COOKIE => $packed_cookie);
Cookies store information on the client's browser. If you're using Netscape under Unix, you can inspect your own ~/.netscape/cookies file, although this doesn't show your current cookies. It holds only those cookies present when you last exited the browser. Think of them as per-application user preferences or a way to help with transactions. Benefits of cookies are that they can be shared between several different programs on your server, and they persist even across browser invocations.However, cookies can be used for dubious purposes like traffic analysis and click tracing. This makes some folks very nervous about who is collecting their personal data and what use will be made of their page viewing habits. Cookies don't travel well, either. If you use a browser at home or in someone else's office, it won't have the cookies from the browser at your office. For this reason, do not expect every browser to accept the cookies you give it. As if that weren't bad enough, browsers can't guarantee they'll keep cookies around forever. Here's an excerpt from the HTTP State Management Mechanism RFC (number 2109):Because user agents have finite space in which to store cookies, they may also discard older cookies to make space for newer ones, using, for example, a least-recently-used algorithm, along with constraints on the maximum number of cookies that each origin server may set.Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Creating Sticky Widgets
- InhaltsvorschauYou want form fields to default to the last values submitted. For instance, you want a search form like Google (
http://www.google.com/) where the keywords you searched for appear in the search dialog above the results.Use CGI.pm's HTML shortcuts to create your form, which automatically provides previous values as defaults:print textfield("SEARCH"); # previous SEARCH value is the defaultExample 19-8 is a simple script for producing the list of users currently logged in.Example 19-8. who.cgi#!/usr/bin/perl -wT # who.cgi - run who(1) on a user and format the results nicely $ENV{IFS}=''; $ENV{PATH}='/bin:/usr/bin'; use CGI qw(:standard); # print search form print header( ), start_html("Query Users"), h1("Search"); print start_form( ), p("Which user?", textfield("WHO")); submit( ), end_form( ); # print results of the query if we have someone to look for $name = param("WHO"); if ($name) { print h1("Results"); $html = ''; # call who and build up text of response foreach (`who`) { next unless /^$name\s/o; # only lines matching $name s/&/&/g; # escape HTML s/</</g; s/>/>/g; $html .= $_; } # nice message if we didn't find anyone by that name $html = $html || "$name is not logged in"; print pre($html); } print end_html( );The call totextfieldgenerates HTML for a text entry field whose parameter name isWHO. After printing the form, we check whether we were called with a value for theWHOparameter. If so, we try to find lines in the output from who for that user.The documentation for the standard CGI module; Recipe 19.4; Recipe 19.6Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Writing a Multiscreen CGI Script
- InhaltsvorschauYou want a single CGI script that can return several different pages to the browser. For instance, you want a single script for administering a database of products. The script will be called to display the form to add a product, to process the add-product form, to display a list of products to delete, to process the delete-product form, to display a list of product to edit, to display a form of the product's attributes for the user to change, and to process the edit-product form. You can use these multiscreen CGI scripts to form an elementary shopping-cart application.Use a hidden field to encode the current screen.It is easy to generate sticky hidden fields with the CGI module. The
hiddenfunction returns HTML for a hidden widget and uses the widget's current value if you passhiddenonly the widget name:use CGI qw(:standard); print hidden("bacon");To determine which page ("display product list", "display all items in shopping cart", "confirm order") to display, use another hidden field. We'll call this one.Stateso it won't conflict with any field we might have calledState(for instance, in credit card billing information). To let the user move from page to page, use submit buttons that set.Stateto the name of the page to go to. For instance, to make a button to take the user to the "Checkout" page, use:print submit(-NAME => ".State", -VALUE => "Checkout");
We wrap this in a function to make it easier to type:sub to_page { return submit( -NAME => ".State", -VALUE => shift ) }To decide what code to display, check the.Stateparameter:$page = param(".State") || "Default";Put the code to generate each page in separate subroutines. You could decide which subroutine to call with a longif...elsif...elsif:Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Saving a Form to a File or Mail Pipe
- InhaltsvorschauYour CGI script needs to save or mail the entire form contents to a file.To store a form, use the CGI module's
save_parametersfunction orsavemethod, which take a filehandle argument. You can save to a file:# first open and exclusively lock the file open(FH, ">>/tmp/formlog") or die "can't append to formlog: $!"; flock(FH, 2) or die "can't flock formlog: $!"; # either using the procedural interface use CGI qw(:standard); save_parameters(*FH); # with CGI::save # or using the object interface use CGI; $query = CGI->new( ); $query->save(*FH); close(FH) or die "can't close formlog: $!";
or save to a pipe, such as one connected to a mailer process:use CGI qw(:standard); open(MAIL, "|/usr/lib/sendmail -oi -t") or die "can't fork sendmail: $!"; print MAIL <<EOF; From: $0 (your cgi script) To: hisname\@hishost.com Subject: mailed form submission EOF save_parameters(*MAIL); close(MAIL) or die "can't close sendmail: $!";
Sometimes all you want to do with form data is to save it for later use. Thesave_parametersfunction andsavemethod in CGI.pm write form parameters to an open filehandle. That filehandle can be attached to an open file (preferably one opened in append mode and locked, as in the Solution), or to a pipe whose other end is a mail program.File entries are stored one per line asvariable=valuepairs, with any funny characters URL-escaped. Each record is separated by a line with a single equals sign. These are typically read back by invoking theCGI->newmethod with a filehandle argument that manages all of the unescaping automatically, as described later.If you want to add extra information to your query before you save it, theEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Program: chemiserie
- InhaltsvorschauThe CGI script in Example 19-9 lets people order t-shirts and sweaters over the Web, using techniques described in Recipe 19.11. Its output isn't elegant or beautiful, but illustrating the multiscreen technique in a short program was challenging enough without trying to make it pretty as well.The
shirtandsweatersubroutines check their widget values. If the user somehow submits an invalid color or size, the value is reset to the first in the list of allowable colors or sizes.Example 19-9. chemiserie#!/usr/bin/perl -w # chemiserie - simple CGI shopping for shirts and sweaters use strict; use CGI qw(:standard); use CGI::Carp qw(fatalsToBrowser); my %States; # state table mapping pages to functions my $Current_Screen; # the current screen croak "This CGI works only over HTTPS" if $ENV{'SERVER_PORT'} && !$ENV{'HTTPS'}; # Since we deal with sensitive data like credit card numbers # Hash of pages and functions. %States = ( 'Default' => \&front_page, 'Shirt' => \&shirt, 'Sweater' => \&sweater, 'Checkout' => \&checkout, 'Card' => \&credit_card, 'Order' => \&order, 'Cancel' => \&front_page, ); $Current_Screen = param(".State") || "Default"; die "No screen for $Current_Screen" unless $States{$Current_Screen}; # Generate the current page. standard_header( ); while (my($screen_name, $function) = each %States) { $function->($screen_name eq $Current_Screen); } standard_footer( ); exit; ################################ # header, footer, menu functions ################################ sub standard_header { print header( ), start_html(-Title => "Shirts", -BGCOLOR=>"White"); print start_form( ); # start_multipart_form( ) if file upload } sub standard_footer { print end_form( ), end_html( ) } sub shop_menu { print p(defaults("Empty My Shopping Cart"), to_page("Shirt"), to_page("Sweater"), to_page("Checkout")); } ############################# # subroutines for each screen ############################# # The default page. sub front_page { my $active = shift; return unless $active; print "<H1>Hi!</H1>\n"; print "Welcome to our Shirt Shop! Please make your selection from "; print "the menu below.\n"; shop_menu( ); } # Page to order a shirt from. sub shirt { my $active = shift; my @sizes = qw(XL L M S); my @colors = qw(Black White); my ($size, $color, $count) = (param("shirt_size"), param("shirt_color"), param("shirt_count")); # sanity check if ($count) { $color = $colors[0] unless grep { $_ eq $color } @colors; $size = $sizes[0] unless grep { $_ eq $size } @sizes; param("shirt_color", $color); param("shirt_size", $size); } unless ($active) { print hidden("shirt_size") if $size; print hidden("shirt_color") if $color; print hidden("shirt_count") if $count; return; } print h1("T-Shirt"); print p("What a shirt! This baby is decked out with all the options.", "It comes with full luxury interior, cotton trim, and a collar", "to make your eyes water! Unit price: \$33.00"); print h2("Options"); print p("How Many?", textfield("shirt_count")); print p("Size?", popup_menu("shirt_size", \@sizes ), "Color?", popup_menu("shirt_color", \@colors)); shop_menu( ); } # Page to order a sweater from. sub sweater { my $active = shift; my @sizes = qw(XL L M); my @colors = qw(Chartreuse Puce Lavender); my ($size, $color, $count) = (param("sweater_size"), param("sweater_color"), param("sweater_count")); # sanity check if ($count) { $color = $colors[0] unless grep { $_ eq $color } @colors; $size = $sizes[0] unless grep { $_ eq $size } @sizes; param("sweater_color", $color); param("sweater_size", $size); } unless ($active) { print hidden("sweater_size") if $size; print hidden("sweater_color") if $color; print hidden("sweater_count") if $count; return; } print h1("Sweater"); print p("Nothing implies preppy elegance more than this fine", "sweater. Made by peasant workers from black market silk,", "it slides onto your lean form and cries out \"Take me,", "for I am a god!\". Unit price: \$49.99."); print h2("Options"); print p("How Many?", textfield("sweater_count")); print p("Size?", popup_menu("sweater_size", \@sizes)); print p("Color?", popup_menu("sweater_color", \@colors)); shop_menu( ); } # Page to display current order for confirmation. sub checkout { my $active = shift; return unless $active; print h1("Order Confirmation"); print p("You ordered the following:"); print order_text( ); print p("Is this right? Select 'Card' to pay for the items", "or 'Shirt' or 'Sweater' to continue shopping."); print p(to_page("Card"), to_page("Shirt"), to_page("Sweater")); } # Page to gather credit-card information. sub credit_card { my $active = shift; my @widgets = qw(Name Address1 Address2 City Zip State Phone Card Expiry); unless ($active) { print map { hidden($_) } @widgets; return; } print pre(p("Name: ", textfield("Name")), p("Address: ", textfield("Address1")), p(" ", textfield("Address2")), p("City: ", textfield("City")), p("Zip: ", textfield("Zip")), p("State: ", textfield("State")), p("Phone: ", textfield("Phone")), p("Credit Card #: ", textfield("Card")), p("Expiry: ", textfield("Expiry"))); print p("Click on 'Order' to order the items. Click on 'Cancel' to return shopping."); print p(to_page("Order"), to_page("Cancel")); } # Page to complete an order. sub order { my $active = shift; unless ($active) { return; } # you'd check credit card values here print h1("Ordered!"); print p("You have ordered the following toppings:"); print order_text( ); print p(defaults("Begin Again")); } # Returns HTML for the current order ("You have ordered ...") sub order_text { my $html = ''; if (param("shirt_count")) { $html .= p("You have ordered ", param("shirt_count"), " shirts of size ", param("shirt_size"), " and color ", param("shirt_color"), "."); } if (param("sweater_count")) { $html .= p("You have ordered ", param("sweater_count"), " sweaters of size ", param("sweater_size"), " and color ", param("sweater_color"), "."); } $html = p("Nothing!") unless $html; $html .= p("For a total cost of ", calculate_price( )); return $html; } sub calculate_price { my $shirts = param("shirt_count") || 0; my $sweaters = param("sweater_count") || 0; return sprintf("\$%.2f", $shirts*33 + $sweaters * 49.99); } sub to_page { submit(-NAME => ".State", -VALUE => shift) }Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Chapter 20: Web Automation
- InhaltsvorschauThe web, then, or the pattern, a web at once sensuous and logical, an elegant and pregnant texture: that is style, that is the foundation of the art of literature.—Robert Louis Stevenson, On some Technical Elements of Style in Literature (1885)Chapter 19 concentrated on responding to browser requests and producing documents using CGI. This chapter approaches the Web from the other side: instead of responding to a browser, you pretend to be one, generating requests and processing returned documents. We make extensive use of modules to simplify this process because the intricate network protocols and document formats are tricky to get right. By letting existing modules handle the hard parts, you can concentrate on the interesting part—your own program.The relevant modules can all be found under the following URL:
http://search.cpan.org/modlist/World_Wide_Web
There you'll find modules for computing credit card checksums, interacting with Netscape or Apache server APIs, processing image maps, validating HTML, and manipulating MIME. The largest and most important modules for this chapter, though, are found in the libwww-perl suite of modules, referred to collectively as LWP. Table 20-1 lists just a few modules included in LWP.Table 20-1: LWP modules (continued) Module namePurposeLWP::UserAgentWWW user agent classLWP::RobotUADevelop robot applicationsLWP::ProtocolInterface to various protocol schemesEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Introduction
- InhaltsvorschauChapter 19 concentrated on responding to browser requests and producing documents using CGI. This chapter approaches the Web from the other side: instead of responding to a browser, you pretend to be one, generating requests and processing returned documents. We make extensive use of modules to simplify this process because the intricate network protocols and document formats are tricky to get right. By letting existing modules handle the hard parts, you can concentrate on the interesting part—your own program.The relevant modules can all be found under the following URL:
http://search.cpan.org/modlist/World_Wide_Web
There you'll find modules for computing credit card checksums, interacting with Netscape or Apache server APIs, processing image maps, validating HTML, and manipulating MIME. The largest and most important modules for this chapter, though, are found in the libwww-perl suite of modules, referred to collectively as LWP. Table 20-1 lists just a few modules included in LWP.Table 20-1: LWP modules (continued) Module namePurposeLWP::UserAgentWWW user agent classLWP::RobotUADevelop robot applicationsLWP::ProtocolInterface to various protocol schemesLWP::Authen::BasicHandle 401 and 407 responsesLWP::MediaTypesMIME types configuration (text/html, etc.)Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Fetching a URL from a Perl Script
- InhaltsvorschauYou have a URL whose contents you want to fetch from a script.Use the
getfunction from the CPAN module LWP::Simple, part of LWP.use LWP::Simple; $content = get($URL);
The right library makes life easier, and the LWP modules are the right ones for this task. As you can see from the Solution, LWP makes this task a trivial one.Thegetfunction from LWP::Simple returnsundefon error, so check for errors this way:use LWP::Simple; unless (defined ($content = get $URL)) { die "could not get $URL\n"; }When called that way, however, you can't determine the cause of the error. For this and other elaborate processing, you'll have to go beyond LWP::Simple.Example 20-1 is a program that fetches a remote document. If it fails, it prints out the error status line. Otherwise, it prints out the document title and the number of bytes of content. We use three modules, two of which are from LWP.- LWP::UserAgent
-
This module creates a virtual browser. The object returned from the new constructor is used to make the actual request. We've set the name of our agent to "Schmozilla/v9.14 Platinum" just to give the remote webmaster browser-envy when they see it in their logs. This is useful on obnoxious web servers that needlessly consult the user agent string to decide whether to return a proper page or an infuriating "you need Internet Navigator v12 or later to view this site" cop-out.
- HTTP::Response
-
This is the object type returned when the user agent actually runs the request. We check it for errors and contents.
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Automating Form Submission
- InhaltsvorschauYou want to submit form values to a CGI script from your program. For example, you want to write a program that searches Amazon and notifies you when new books with a particular keyword in the title or new books by a particular author appear.If you're submitting form values with GET, use the
getmethod on an LWP::UserAgent object:use LWP::Simple; use URI::URL; $url = url("http://www.amazon.com/exec/obidos/search-handle-url/index=books"); $url->query_form("field-author" => "Larry Wall"); # more params if needed $page = get($url);If you're using the POST method, create your own user agent and encode the content appropriately:use LWP::UserAgent; $ua = LWP::UserAgent->new( ); $resp = $ua->post("www.amazon.com/exec/obidos/search-handle-form", { "url" => "index-books", "field-keywords" => "perl" }); $content = $resp->content;For simple operations, the procedural interface of the LWP::Simple module is sufficient. For fancier ones, the LWP::UserAgent module provides a virtual browser object, which you manipulate using method calls.The format of a query string is:field1=value1&field2=value2&field3=value3
In GET requests, this is encoded in the URL being requested:script.cgi?field1=value1&field2=value2&field3=value3
Fields must still be properly escaped, so setting theargform parameter to "thisisn't <EASY> & <FUN>" would yield:http://www.site.com/path/to/ script.cgi?arg=%22this+isn%27t+%3CEASY%3E+%26+%3CFUN%3E%22Thequery_formmethod called on a URL object correctly escapes the form values for you, or you could use theURI::Escape::uri_escapeorCGI::escape_htmlfunctions on your own. In POST requests, the query string is in the body of the HTTP document sent to the CGI script.Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Extracting URLs
- InhaltsvorschauYou want to extract all URLs from an HTML file. For example, you have downloaded a page that lists the MP3 files downloadable from some site. You want to extract those MP3s' URLS so you can filter the list and write a program to download the ones you want.Use the HTML::LinkExtor module from CPAN:
use HTML::LinkExtor; $parser = HTML::LinkExtor->new(undef, $base_url); $parser->parse_file($filename); @links = $parser->links; foreach $linkarray (@links) { my @element = @$linkarray; my $elt_type = shift @element; # element type # possibly test whether this is an element we're interested in while (@element) { # extract the next attribute and its value my ($attr_name, $attr_value) = splice(@element, 0, 2); # ... do something with them ... } }You can use HTML::LinkExtor in two different ways: either by callinglinksto get a list of all links in the document once it is completely parsed, or by passing a code reference in the first argument tonew. The referenced function is called on each link as the document is parsed.Thelinksmethod clears the link list, so call it only once per parsed document. It returns a reference to an array of elements. Each element is itself an array reference with an HTML::Element object at the front followed by a list of attribute name and attribute value pairs. For instance, the HTML:<A HREF="http://www.perl.com/">Home page</A> <IMG SRC="images/big.gif" LOWSRC="images/big-lowres.gif">
would return a data structure like this:[ [ a, href => "http://www.perl.com/" ], [ img, src => "images/big.gif", lowsrc => "images/big-lowres.gif" ] ]Here's an example of how to use$elt_typeand$attr_nameto print out and anchor an image:Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Converting ASCII to HTML
- InhaltsvorschauYou want to convert ASCII text to HTML. For example, you have mail you want to display intelligently on a web page.Use the simple little encoding filter in Example 20-3.Example 20-3. text2html
#!/usr/bin/perl -w -p00 # text2html - trivial html encoding of normal text # -p means apply this script to each record. # -00 mean that a record is now a paragraph use HTML::Entities; $_ = encode_entities($_, "\200-\377"); if (/^\s/) { # Paragraphs beginning with whitespace are wrapped in <PRE> s{(.*)$} {<PRE>\n$1</PRE>\n}s; # indented verbatim } else { s{^(>.*)} {$1<BR>}gm; # quoted text s{<URL:(.*?)>} {<A HREF="$1">$1</A>}gs # embedded URL (good) || s{(http:\S+)} {<A HREF="$1">$1</A>}gs; # guessed URL (bad) s{*(\S+)*} {<STRONG>$1</STRONG>}g; # this is *bold* here s{\b_(\S+)\_\b} {<EM>$1</EM>}g; # this is _italics_ here s{^} {<P>\n}; # add paragraph tag }Converting arbitrary plain text to HTML has no general solution because there are too many conflicting ways to represent formatting information. The more you know about the input, the better you can format it.For example, if you knew that you would be fed a mail message, you could add this block to format the mail headers:BEGIN { print "<TABLE>"; $_ = encode_entities(scalar <>); s/\n\s+/ /g; # continuation lines while ( /^(\S+?:)\s*(.*)$/gm ) { # parse heading print "<TR><TH ALIGN='LEFT'>$1</TH><TD>$2</TD></TR>\n"; } print "</TABLE><HR>"; }The CPAN module HTML::TextToHTML has options for headers, footers, indentation, tables, and more.Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Converting HTML to ASCII
- InhaltsvorschauYou want to convert an HTML file into formatted, plain ASCII. For example, you want to mail a web document to someone.If you have an external formatter like lynx, call an external program:
$ascii = `lynx -dump $filename`;
If you want to do it within your program and don't care about the things that the HTML::FormatText formatter doesn't yet handle well (tables and frames):use HTML::FormatText 3; $ascii = HTML::FormatText->format_file( $filename, leftmargin => 0, rightmargin => 50 );
These examples both assume the HTML is in a file. If your HTML is in a variable, you need to write it to a file for lynx to read. With HTML::FormatText, use theformat_string( )method:use HTML::FormatText 3; $ascii = HTML::FormatText->format_string( $filename, leftmargin => 0, rightmargin => 50 );
If you use Netscape, its "Save as" option with the type set to "Text" does the best job with tables.The documentation for the CPAN modules HTML::TreeBuilder and HTML::FormatText; your system's lynx(1) manpage; Recipe 20.6Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Extracting or Removing HTML Tags
- InhaltsvorschauYou want to remove HTML tags from a string, leaving just plain text. For example, you are indexing a document but don't want your index to show "words" like
<B>and<body>.The following oft-cited solution is simple but wrong on all but the most trivial HTML:($plain_text = $html_text) =~ s/<[^>]*>//gs; # WRONG
A correct but slower and slightly more complicated way is to use the technique from Recipe 20.5:use HTML::FormatText 2; $plain_text = HTML::FormatText->format_string($html_text);
As with almost everything else in Perl, there is more than one way to do it. Each solution attempts to strike a balance between speed and flexibility. Occasionally you may find HTML that's simple enough that a trivial command-line call works:% perl -pe 's/<[^>]*>//g' file
However, this breaks with files whose tags cross line boundaries, like this:<IMG SRC = "foo.gif" ALT = "Flurp!">So, you'll see people doing this instead:% perl -0777 -pe 's/<[^>]*>//gs' file
or its scripted equivalent:{ local $/; # temporary whole-file input mode $html = <FILE>; $html =~ s/<[^>]*>//gs; }But even that isn't good enough except for simplistic HTML without any interesting bits in it. This approach fails for the following examples of valid HTML (among many others):<IMG SRC = "foo.gif" ALT = "A > B"> <!-- <A comment> --> <script>if (a<b && a>c)</script> <# Just data #> <![INCLUDE CDATA [ >>>>>>>>>>>> ]]>
If HTML comments include other tags, those solutions would also break on text like this:<!-- This section commented out. <B>You can't see me!</B> -->The only solution that works well here is to use the HTML parsing routines from CPAN. The second code snippet shown in the Solution demonstrates this better technique.Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Finding Stale Links
- InhaltsvorschauYou want to check a document for invalid links.Use the technique outlined in Recipe 20.3 to extract each link, and then use LWP::Simple's
headfunction to make sure that link exists.Example 20-5 is an applied example of the link-extraction technique. Instead of just printing the name of the link, we call LWP::Simple'sheadfunction on it. The HEAD method fetches the remote document's metainformation without downloading the whole document. If it fails, the link is bad, so we print an appropriate message.Because this program uses thegetfunction from LWP::Simple, it is expecting a URL, not a filename. If you want to supply either, use the URI::Heuristic module described in Recipe 20.1.Example 20-5. churl#!/usr/bin/perl -w # churl - check urls use HTML::LinkExtor; use LWP::Simple; $base_url = shift or die "usage: $0 <start_url>\n"; $parser = HTML::LinkExtor->new(undef, $base_url); $html = get($base_url); die "Can't fetch $base_url" unless defined($html); $parser->parse($html); @links = $parser->links; print "$base_url: \n"; foreach $linkarray (@links) { my @element = @$linkarray; my $elt_type = shift @element; while (@element) { my ($attr_name , $attr_value) = splice(@element, 0, 2); if ($attr_value->scheme =~ /\b(ftp|https?|file)\b/) { print " $attr_value: ", head($attr_value) ? "OK" : "BAD", "\n"; } } }Here's an example of a program run:% churl http://www.wizards.com http://www.wizards.com FrontPage/FP_Color.gif: OK FrontPage/FP_BW.gif: BAD #FP_Map: OK Games_Library/Welcome.html: OK
This program has the same limitation as the HTML::LinkExtor program in Recipe 20.3.Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Finding Fresh Links
- InhaltsvorschauGiven a list of URLs, you want to determine which have been modified most recently. For example, you want to sort your bookmarks so those most recently updated are on the top.The program in Example 20-6 reads URLs from standard input, rearranges them by date, and prints them to standard output with those dates prepended.Example 20-6. surl
#!/usr/bin/perl -w # surl - sort URLs by their last modification date use strict; use LWP::UserAgent; use HTTP::Request; use URI::URL qw(url); my %Date; my $ua = LWP::UserAgent->new( ); while ( my $url = url(scalar <>) ) { my $ans; next unless $url->scheme =~ /^(file|https?)$/; $ans = $ua->head($url); if ($ans->is_success) { $Date{$url} = $ans->last_modified || 0; # unknown } else { warn("$url: Error [", $ans->code, "] ", $ans->message, "!\n"); } } foreach my $url ( sort { $Date{$b} <=> $Date{$a} } keys %Date ) { printf "%-25s %s\n", $Date{$url} ? (scalar localtime $Date{$url}) : "<NONE SPECIFIED>", $url; }The surl script works more like a traditional filter program. It reads from standard input one URL per line. (Actually, it usesARGVto read, which defaults toSTDINwhen@ARGVis empty.) The last-modified date on each URL is fetched by a HEAD request. That date is stored in a hash with the URL as key. Then a simple sort by value is run on the hash to reorder the URLs by date. On output, the internal date is converted intolocaltimeformat.Here's an example of using the xurl program from the earlier recipe to extract the URLs, then running that program's output to feed into surl.% xurl http://use.perl.org/~gnat/journal | surl | head Mon Jan 13 22:58:16 2003 http://www.nanowrimo.org/ Sun Jan 12 19:29:00 2003 http://www.costik.com/gamespek.html
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Using Templates to Generate HTML
- InhaltsvorschauYou want to store a parameterized template in an external file, read that template from your CGI script, and substitute your own variables for escapes embedded in the text. This separates your program from the static parts of the document.To expand only variable references, use this
templatefunction:sub template { my ($filename, $fillings) = @_; my $text; local $/; # slurp mode (undef) open(my $fh, "<", $filename) or return; $text = <$fh>; # read whole file close($fh); # ignore retval # replace quoted words with value in %$fillings hash $text =~ s{ %% ( .*? ) %% } { exists( $fillings->{$1} ) ? $fillings->{$1} : "" }gsex; return $text; }on a data file like this:<!-- simple.template for internal template( ) function --> <HTML><HEAD><TITLE>Report for %%username%%</TITLE></HEAD> <BODY><H1>Report for %%username%%</H1> %%username%% logged in %%count%% times, for a total of %%total%% minutes.
If you can guarantee the data file is secure from tampering, use the CPAN module Text::Template to expand full expressions. A data file for Text::Template looks like this:<!-- fancy.template for Text::Template --> <HTML><HEAD><TITLE>Report for {$user}</TITLE></HEAD> <BODY><H1>Report for {$user}</H1> { lcfirst($user) } logged in {$count} times, for a total of { int($total / 60) } minutes.For a complete templating solution, see the Template Toolkit's Template module This offers a scripting language and mod_perl integration, and is covered in Recipe 21.17.Parameterized output for your CGI scripts is a good idea for many reasons. Separating your program from its data lets other people (art directors, for instance) change the HTML but not the program. Even better, two programs can share the same template, so style changes in the template are immediately reflected in both programs' output.Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Mirroring Web Pages
- InhaltsvorschauYou want a local copy of a web page kept up-to-date.Use LWP::Simple's
mirrorfunction:use LWP::Simple; mirror($URL, $local_filename);
Although closely related to thegetfunction discussed in Recipe 20.1, themirrorfunction doesn't download the file unconditionally. It adds theIf-Modified-Sinceheader to the GET request it creates, so the server does not transfer the file unless the file has been updated.Themirrorfunction mirrors only a single page, not a full tree. To mirror a set of pages, use this recipe in conjunction with Recipe 20.3. A good solution to mirroring an entire directory hierarchy can be found in the w3mir program, also found on CPAN, and the wget program from ftp.gnu.org.Be careful! It's possible (and easy) to write programs that run amok and begin downloading all web pages on the net. This is not only poor etiquette, it's also an infinite task, since some pages are dynamically generated. It could also get you into trouble with someone who doesn't want their pages downloaded en masse.The documentation for the CPAN module LWP::Simple; the HTTP specification athttp://www.w3.org/pub/WWW/Protocols/HTTP/Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Creating a Robot
- InhaltsvorschauYou want to create a script that navigates the Web on its own (i.e., a robot), and you'd like to respect the remote sites' wishes.Instead of writing your robot with LWP::UserAgent, use LWP::RobotUA instead:
use LWP::RobotUA; $ua = LWP::RobotUA->new('websnuffler/0.1', 'me@wherever.com');To avoid marauding robots and web crawlers hammering their servers, sites are encouraged to create a file with access rules called robots.txt. If you're fetching only one document, this is no big deal, but if your script fetches many documents from the same server, you could easily exhaust that site's bandwidth.When writing scripts to run around the Web, it's important to be a good net citizen: don't request documents from the same server too often, and heed the advisory access rules in their robots.txt file.The easiest way to handle this is to use the LWP::RobotUA module instead of LWP::UserAgent to create agents. This agent automatically knows to fetch data slowly when calling the same server repeatedly. It also checks each site's robots.txt file to see whether you're trying to grab a file that is off-limits. If you do, you'll get a response like this:403 (Forbidden) Forbidden by robots.txtHere's an example robots.txt file, fetched using the GET program that comes with the LWP module suite:% GET http://www.webtechniques.com/robots.txt User-agent: * Disallow: /stats Disallow: /db Disallow: /logs Disallow: /store Disallow: /forms Disallow: /gifs Disallow: /wais-src Disallow: /scripts Disallow: /config
A more interesting and extensive example is athttp://www.cnn.com/robots.txt. This file is so big, they even keep it under RCS control!Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Parsing a Web Server Log File
- InhaltsvorschauYou want to extract selected information from a web server log file.Pull apart the log file as follows:
while (<LOGFILE>) { my ($client, $identuser, $authuser, $date, $time, $tz, $method, $url, $protocol, $status, $bytes) = /^(\S+) (\S+) (\S+) \[([^:]+):(\d+:\d+:\d+) ([^\]]+)\] "(\S+) (.*?) (\S+)" (\S+) (\S+)$/; # ... }This regular expression pulls apart entries in Common Log Format, an informal standard that most web servers adhere to. The fields are listed in Table 20-1.Table 20-2: Common Log Format fields FieldMeaningclientIP address or hostname of browser's machineidentuserIf IDENT (RFC 1413) was used, what it returnedauthuserIf username/password authentication was used, whom they logged in asdateDate of request (e.g., 01/Mar/1997)timeTime of request (e.g., 12:55:36)tzTime zone (e.g., -0700)methodEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Processing Server Logs
- InhaltsvorschauYou need to summarize your server logs, but you don't have a customizable program to do it.Parse the error log yourself with regular expressions, or use the Logfile modules from CPAN.Example 20-9 is a sample report generator for an Apache weblog.Example 20-9. sumwww
#!/usr/bin/perl -w # sumwww - summarize web server log activity $lastdate = ""; daily_logs( ); summary( ); exit; # read CLF files and tally hits from the host and to the URL sub daily_logs { while (<>) { ($type, $what) = /"(GET|POST)\s+(\S+?) \S+"/ or next; ($host, undef, undef, $datetime) = split; ($bytes) = /\s(\d+)\s*$/ or next; ($date) = ($datetime =~ /\[([^:]*)/); $posts += ($type eq POST); $home++ if m, / ,; if ($date ne $lastdate) { if ($lastdate) { write_report( ) } else { $lastdate = $date } } $count++; $hosts{$host}++; $what{$what}++; $bytesum += $bytes; } write_report( ) if $count; } # use *typeglob aliasing of global variables for cheap copy sub summary { $lastdate = "Grand Total"; *count = *sumcount; *bytesum = *bytesumsum; *hosts = *allhosts; *posts = *allposts; *what = *allwhat; *home = *allhome; write; } # display the tallies of hosts and URLs, using formats sub write_report { write; # add to summary data $lastdate = $date; $sumcount += $count; $bytesumsum += $bytesum; $allposts += $posts; $allhome += $home; # reset daily data $posts = $count = $bytesum = $home = 0; @allwhat{keys %what} = keys %what; @allhosts{keys %hosts} = keys %hosts; %hosts = %what = ( ); } format STDOUT_TOP = @|||||||||| @|||||| @||||||| @||||||| @|||||| @|||||| @||||||||||||| "Date", "Hosts", "Accesses", "Unidocs", "POST", "Home", "Bytes" ----------- ------- -------- -------- ------- ------- -------------- . format STDOUT = @>>>>>>>>>> @>>>>>> @>>>>>>> @>>>>>>> @>>>>>> @>>>>>> @>>>>>>>>>>>>> $lastdate, scalar(keys %hosts), $count, scalar(keys %what), $posts, $home, $bytesum .Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Using Cookies
- InhaltsvorschauYou want to fetch web pages, but the server is using cookies to track you. For example, some sites use a cookie to remember that you've authenticated. If you don't send the right cookie, you'll never get past the login screen.Let LWP::UserAgent handle cookies for you. You can enable cookies for just this program run with:
$ua->cookie_jar({ });Or instead store cookies in a file between invocations with:$ua->cookie_jar({ file => "$ENV{HOME}/.cookies" });The default behavior of LWP::UserAgent is never to send aCookie: header, even when the server offers cookies in a response. To keep track of cookies LWP::UserAgent receives and send them when appropriate, provide the user agent object with a special "cookie jar" object to hold the cookies: an HTTP::Cookies object.Pass thecookie_jarmethod either an HTTP::Cookies object to use that object as the cookie jar, or else a hash reference whose contents go into a new HTTP::Cookies object.Without parameters, an HTTP::Cookies object keeps cookies in memory, so they're no longer available once your program exits. Thefileparameter in thecookie_jarmethod call specifies a filename to use for initializing the cookie jar and for saving updated or new cookies. This is how you give cookies a shelf life beyond a single run of your program.To disable cookies, callcookie_jarwith no parameters:$ua->cookie_jar( );
The documentation for the CPAN modules LWP::UserAgent and HTTP::CookieEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Fetching Password-Protected Pages
- InhaltsvorschauYou want to use LWP to fetch web pages or submit forms, but the web server requires authentication.Set the username and password for a particular realm with the user agent's
credentialsmethod:$ua->credentials('http://www.perlcabal.com/cabal_only/', 'Secret Perl Cabal Files', 'username' => 'password');To access pages protected by basic authentication, a browser must supply the username and password for the realm of the authentication. The realm is just a string that identifies which username and password the user must supply. Thecredentialsmethod tells the user agent to send the username and password for a particular realm.A somewhat kludgey solution is to specify URLs with the username and password in them:http://user:password@www.example.com/private/pages/
This is kludgey because links within the returned document do not have your username and password encoded in them. Solutions that rely entirely on URL-encoded usernames and passwords often quickly degenerate into code that wishes it had usedcredentialsto begin with.The documentation for the CPAN module LWP::UserAgentEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Fetching https:// Web Pages
- InhaltsvorschauYou want to work with a web server over a secure (SSL) connection. For example, you want to automate ordering supplies for your company from an online store, and the online store wisely protects its transactions with SSL.Install Crypt::SSLeay and
https: URLs automatically work with LWP. You do not need to reinstall LWP.When LWP sends a request to anhttpsserver, it checks whether there is a module installed to do SSL. The two modules that LWP can use are, in order of preference, Crypt::SSLeay and IO::Socket::SSL. Of the two, Crypt::SSLeay is the more fully featured, but requires the OpenSSL libraries fromhttp://www.openssl.org.The documentation for the CPAN module Crypt::SSLeay; the README.SSL file in the libwww-perl distributionEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Resuming an HTTP GET
- InhaltsvorschauYou have part of a file and want to download the rest without refetching the content you already have. For example, your initial download was interrupted, so you want to complete it.Use the HTTP 1.1
Rangeheader in your GET request:use LWP; $have = length($file); $response = $ua->get($URL, 'Range', "bytes=$have-"); # $response->content hold the rest of the fileTheRangeheader lets you specify which bytes to fetch. The 0th byte is the first in the file, so the range "bytes=0-" fetches the whole file.You can also specify a range with two endpoints: "0-25", for example, fetches the first 26 bytes of the file. If you want to fetch an interior range, use "26-99".Some servers don't support ranges, even though they claim to understand HTTP 1.1. In this case you'll be sent the whole file, not the range you asked for. To detect this, use HEAD to see the size of the file and then use a GET with a range to fetch the rest. If the content in the GET response is the same length as the original file, your range was ignored.Here is the full list of ranges possible in the HTTP 1.1 specification:[start]-From start on (inclusive)[start]-[end]From start to end (inclusive)-[num]The last num bytesEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Parsing HTML
- InhaltsvorschauYou need to extract complex information from a web page or pages. For example, you want to extract news stories from web sites like
CNN.comornews.bbc.co.uk.Use regular expressions for data that's well identified:# story is everything from <!-- story --> to <!-- /story --> if ($html =~ m{<!-- story -->(.*?)<!-- /story -->}s) { my $story = $1; # ... } else { warn "No story found in the page"; }But for tables and data identifiable only by complex patterns of HTML, use a parser:use HTML::TokeParser; my $parser = HTML::TokeParser->new($FILENAME) or die "Can't open $FILENAME: $!\n"; while (my $token = $parser->get_token( )) { my $type = $token->[0]; if ($type eq 'S') { ... } # start tag elsif ($type eq 'E') { ... } # end tag elsif ($type eq 'T') { ... } # text elsif ($type eq 'C') { ... } # comment elsif ($type eq 'D') { ... } # declaration elsif ($type eq 'PI') { ... } # processing instruction else { die "$type isn't a valid HTML token type" } }Regular expressions are a convenient way to extract information from HTML. However, as the complexity of the HTML and the amount of information to be extracted go up, the maintainability of the regular expressions goes down. For a few well-defined fields, regular expressions are fine. For anything else, use a proper parser.As an example of processing HTML with regular expressions, let's get the list of recent O'Reilly book releases. The list is found onhttp://www.oreilly.com/catalog/new.html, but there's also a navigation bar and a list of upcoming releases, so we can't simply extract all links.The relevant HTML from the page looks like this:<!-- New titles --> <h3>New Titles</h3> <ul><li><a href="netwinformian/">.NET Windows Forms in a Nutshell</a> <em>(March)</em></li><li><a href="actscrptpr/"> ActionScript for Flash MX Pocket Reference</a> <em>(March)</em> </li><li><a href="abcancer/">After Breast Cancer</a> <em>(March) ... <li><a href="samba2/">Using Samba, 2nd Edition</a> <em>(February) </em></li><li><a href="vbscriptian2/">VBScript in a Nutshell, 2nd Edition</a> <em>(March)</em></li><li><a href="tpj2/">Web, Graphics & Perl/Tk</a> <em>(March)</em></li></ul></td> <td valign="top"> <!-- Upcoming titles -->
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Extracting Table Data
- InhaltsvorschauYou have data in an HTML table, and you would like to turn that into a Perl data structure. For example, you want to monitor changes to an author's CPAN module list.Use the HTML::TableContentParser module from CPAN:
use HTML::TableContentParser; $tcp = HTML::TableContentParser->new; $tables = $tcp->parse($HTML); foreach $table (@$tables) { @headers = map { $_->{data} } @{ $table->{headers} }; # attributes of table tag available as keys in hash $table_width = $table->{width}; foreach $row (@{ $tables->{rows} }) { # attributes of tr tag available as keys in hash foreach $col (@{ $row->{cols} }) { # attributes of td tag available as keys in hash $data = $col->{data}; } } }The HTML::TableContentParser module converts all tables in the HTML document into a Perl data structure. As with HTML tables, there are three layers of nesting in the data structure: the table, the row, and the data in that row.Each table, row, and data tag is represented as a hash reference. The hash keys correspond to attributes of the tag that defined that table, row, or cell. In addition, the value for a special key gives the contents of the table, row, or cell. In a table, the value for therowskey is a reference to an array of rows. In a row, thecolskey points to an array of cells. In a cell, thedatakey holds the HTML contents of the data tag.For example, take the following table:<table width="100%" bgcolor="#ffffff"> <tr> <td>Larry & Gloria</td> <td>Mountain View</td> <td>California</td> </tr> <tr> <td><b>Tom</b></td> <td>Boulder</td> <td>Colorado</td> </tr> <tr> <td>Nathan & Jenine</td> <td>Fort Collins</td> <td>Colorado</td> </tr> </table>Theparsemethod returns this data structure:[ { 'width' => '100%', 'bgcolor' => '#ffffff', 'rows' => [ { 'cells' => [ { 'data' => 'Larry & Gloria' }, { 'data' => 'Mountain View' }, { 'data' => 'California' }, ], 'data' => "\n " }, { 'cells' => [ { 'data' => '<b>Tom</b>' }, { 'data' => 'Boulder' }, { 'data' => 'Colorado' }, ], 'data' => "\n " }, { 'cells' => [ { 'data' => 'Nathan & Jenine' }, { 'data' => 'Fort Collins' }, { 'data' => 'Colorado' }, ], 'data' => "\n " } ] } ]Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Program: htmlsub
- InhaltsvorschauThis program makes substitutions in HTML files so changes happen only in normal text. If you had the file scooby.html that contained:
<HTML><HEAD><TITLE>Hi!</TITLE></HEAD> <BODY><H1>Welcome to Scooby World!</H1> I have <A HREF="pictures.html">pictures</A> of the crazy dog himself. Here's one!<P> <IMG SRC="scooby.jpg" ALT="Good doggy!"><P> <BLINK>He's my hero!</BLINK> I would like to meet him some day, and get my picture taken with him.<P> P.S. I am deathly ill. <A HREF="shergold.html">Please send cards</A>. </BODY></HTML>
you could use htmlsub to change every occurrence of the word "picture" in the document text to read "photo". It prints the new document onSTDOUT:% htmlsub picture photo scooby.html <HTML><HEAD><TITLE>Hi!</TITLE></HEAD> <BODY><H1>Welcome to Scooby World!</H1> I have <A HREF="pictures.html">photos</A> of the crazy dog himself. Here's one!<P> <IMG SRC="scooby.jpg" ALT="Good doggy!"><P> <BLINK>He's my hero!</BLINK> I would like to meet him some day and get my photo taken with him.<P> P.S. I am deathly ill. <A HREF="shergold.html">Please send cards</A> </BODY></HTML
The program is shown in Example 20-12.Example 20-12. htmlsub#!/usr/bin/perl -w # htmlsub - make substitutions in normal text of HTML files # from Gisle Aas <gisle@aas.no> sub usage { die "Usage: $0 <from> <to> <file>...\n" } my $from = shift or usage; my $to = shift or usage; usage unless @ARGV; # Build the HTML::Filter subclass to do the substituting. package MyFilter; use HTML::Filter; @ISA=qw(HTML::Filter); use HTML::Entities qw(decode_entities encode_entities); sub text { my $self = shift; my $text = decode_entities($_[0]); $text =~ s/\Q$from/$to/go; # most important line $self->SUPER::text(encode_entities($text)); } # Now use the class. package main; foreach (@ARGV) { MyFilter->new->parse_file($_); }Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Program: hrefsub
- Inhaltsvorschauhrefsub makes substitutions in HTML files, so changes apply only to text in
<AHREF="..." >tags. For instance, if you had the scooby.html file from the previous recipe, and you've moved shergold.html to be cards.html, you need but say:% hrefsub shergold.html cards.html scooby.html <HTML><HEAD><TITLE>Hi!</TITLE></HEAD> <BODY><H1>Welcome to Scooby World!</H1> I have <A HREF="pictures.html">pictures</A> of the crazy dog himself. Here's one!<P> <IMG SRC="scooby.jpg" ALT="Good doggy!"><P <BLINK>He's my hero!</BLINK> I would like to meet him some day and get my picture taken with him.<P> P.S. I am deathly ill. <a href="cards.html">Please send cards</A> </BODY></HTML>
The HTML::Filter manual page has a BUGS section that says:Comments in declarations are removed from the declarations and then inserted as separate comments after the declaration. If you turn onstrict_comment( ), then comments with embedded "-\|-" are split into multiple comments.This version of hrefsub (shown in Example 20-13) always lowercases theaand the attribute names within this tag when substitution occurs. If$foois a multiword string, then the text given toMyFilter->textmay be broken such that these words do not come together; i.e., the substitution does not work. There should probably be a new option to HTML::Parser to make it not return text until the whole segment has been seen. Also, some people may not be happy with having their 8-bit Latin-1 characters replaced by ugly entities, so htmlsub does that, too.Example 20-13. hrefsub#!/usr/bin/perl -w # hrefsub - make substitutions in <A HREF="..."> fields of HTML files # from Gisle Aas <gisle@aas.no> sub usage { die "Usage: $0 <from> <to> <file>...\n" } my $from = shift or usage; my $to = shift or usage; usage unless @ARGV; # The HTML::Filter subclass to do the substitution. package MyFilter; use HTML::Filter; @ISA=qw(HTML::Filter); use HTML::Entities qw(encode_entities); sub start { my($self, $tag, $attr, $attrseq, $orig) = @_; if ($tag eq 'a' && exists $attr->{href}) { if ($attr->{href} =~ s/\Q$from/$to/g) { # must reconstruct the start tag based on $tag and $attr. # wish we instead were told the extent of the 'href' value # in $orig. my $tmp = "<$tag"; for (@$attrseq) { my $encoded = encode_entities($attr->{$_}); $tmp .= qq( $_="$encoded "); } $tmp .= ">"; $self->output($tmp); return; } } $self->output($orig); } # Now use the class. package main; foreach (@ARGV) { MyFilter->new->parse_file($_); }Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Chapter 21: mod_perl
- InhaltsvorschauSpeed is good only when wisdom leads the way.—James PoeThe mod_perl project (
http://perl.apache.org/) integrates Perl with the Apache web server. That way, you can use Perl to configure Apache, manipulate and respond to requests, write to log files, and much more.Most people begin using mod_perl to avoid the performance penalty of CGI. With CGI programs, the web server starts a separate process for each request. This can be a costly business on most operating systems, with lots of kernel data structures to copy and file I/O to load the new process's binary. If you serve a lot of requests, the operating system may be unable to keep up with the demand for new processes, leaving your web server (and indeed the whole machine) unresponsive.By embedding the Perl interpreter within the Apache process, mod_perl removes the need to start a separate process to generate dynamic content. Indeed, the Apache::Registry and Apache::PerlRun modules provide a CGI environment within this persistent Perl interpreter (and form the basis of Recipe 21.12). This gives you an immediate performance boost over CGI (some report 10-100x performance) but doesn't take full advantage of the integration of Perl with Apache. For that, you need to write your own handlers.Because Apache has access to Perl at every step as it processes a request (and vice versa), you can write code (handlers) for every phase of a request-response cycle. There are 13 phases for which you can write handlers, and each phase has a default handler (so you don't have to install a handler for every phase).You must do three things to install a handler for a specific phase: write the code, load the code into mod_perl, and tell mod_perl to call the code.Handlers are simply subroutines. They're passed an Apache request object as the first argument, and through that object they can learn about the request, change Apache's information about the request, log errors, generate the response, and more. The return value of a handler determines whether the current phase continues with other handlers, the current phase ends successfully and execution proceeds to the next phase, or the current phase ends with an error. The return values are constants from the Apache::Constants module.Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Introduction
- InhaltsvorschauThe mod_perl project (
http://perl.apache.org/) integrates Perl with the Apache web server. That way, you can use Perl to configure Apache, manipulate and respond to requests, write to log files, and much more.Most people begin using mod_perl to avoid the performance penalty of CGI. With CGI programs, the web server starts a separate process for each request. This can be a costly business on most operating systems, with lots of kernel data structures to copy and file I/O to load the new process's binary. If you serve a lot of requests, the operating system may be unable to keep up with the demand for new processes, leaving your web server (and indeed the whole machine) unresponsive.By embedding the Perl interpreter within the Apache process, mod_perl removes the need to start a separate process to generate dynamic content. Indeed, the Apache::Registry and Apache::PerlRun modules provide a CGI environment within this persistent Perl interpreter (and form the basis of Recipe 21.12). This gives you an immediate performance boost over CGI (some report 10-100x performance) but doesn't take full advantage of the integration of Perl with Apache. For that, you need to write your own handlers.Because Apache has access to Perl at every step as it processes a request (and vice versa), you can write code (handlers) for every phase of a request-response cycle. There are 13 phases for which you can write handlers, and each phase has a default handler (so you don't have to install a handler for every phase).You must do three things to install a handler for a specific phase: write the code, load the code into mod_perl, and tell mod_perl to call the code.Handlers are simply subroutines. They're passed an Apache request object as the first argument, and through that object they can learn about the request, change Apache's information about the request, log errors, generate the response, and more. The return value of a handler determines whether the current phase continues with other handlers, the current phase ends successfully and execution proceeds to the next phase, or the current phase ends with an error. The return values are constants from the Apache::Constants module.Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Authenticating
- InhaltsvorschauYou want to verify the username and password supplied by users who are authenticating themselves.Get the password with
$r->get_basic_auth_pw, and the username with$r->connection->user. Indicate success by returning OK. Indicate failure by calling$r->note_basic_auth_failureand returning AUTH_REQUIRED.package Your::Authentication::Package; use Apache::Constants ':common'; sub handler { my $r = shift; return OK unless $r->is_main; # skip for subrequests my ($res, $sent_pw) = $r->get_basic_auth_pw; if ($res != OK) { $r->note_basic_auth_failure; return $res; } my $user = $r->user; # check username and password, setting $failed if they don't match if ($failed) { $r->note_basic_auth_failure; return AUTH_REQUIRED; } return OK; }Install the handler for a directory or set of files with:# the realm AuthName "Holiday Photos" # next line shouldn't be changed AuthType Basic PerlAuthenHandler Your::Authentication::Package require valid-user
The realm is what the user sees when their browser prompts for a username and password. If you set the realm to "Holiday Photos", the user is prompted to "enter username and password for Holiday Photos". You need at least onerequiredirective to trigger the call to the authentication handler.When you invoke$r->get_basic_auth_pw, Apache processes any authentication information sent by the client. Therefore you can't call$r->userbefore you call$r->get_basic_auth_pw(well, you can, but you won't get anything back).The call to$r->get_basic_auth_pwreturns two values, a status code and a password. If the status is OK, the browser agreed to authenticate and provided information. If the status is DECLINED, either the area isn't protected by basic authentication or there's no AuthType specified in httpd.conf. If the status is SERVER_ERROR, there's no realm defined for this area. If the status is AUTH_REQUIRED, the browser mangled or omitted basic authentication. If you decide to return AUTH_REQUIRED, first callEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Setting Cookies
- InhaltsvorschauYou want to send a cookie to a client as part of a response.Use the Apache::Cookie module from CPAN. From within your content handler, create a new cookie and attach it to the outgoing headers:
use Apache::Cookie; $cookie = Apache::Cookie->new($r, -name => "cookie name", -value => "its value", -expires => "+1d" ); $cookie->bake;Don't forget to send the headers before generating content:$r->send_http_header; $r->print("...");The Apache::Cookie module builds a string that represents a cookie. To specify an expiration time for your cookie, use one of these formats:+30s30 seconds from now+10m10 minutes from now+1h1 hour from now-1d1 day agonowNow+3MThree months from nowEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Accessing Cookie Values
- InhaltsvorschauYou want to examine values client sent you in a cookie.Use the CPAN module Apache::Cookie to populate a hash of cookie objects derived from the header sent by the client.
use Apache::Cookie; $ac = Apache::Cookie->new($r); %all_cookies = $ac->parse( );
Now each element of that hash is an object representing a single cookie:$one_cookie = $all_cookies{COOKIE_NAME};Interrogate the object to learn about that cookie's values:$one_cookie->value( ) $one_cookie->name( ) $one_cookie->domain( ) $one_cookie->path( ) $one_cookie->expires( ) $one_cookie->secure( )
To test whether a cookie was sent by the browser, useexistson the hash element:unless (exists $all_cookies{chocolate}) { $r->header_out(Location => "http://www.site.com/login"); return REDIRECT; }Don't simply test for truth:unless ($all_cookies{chocolate}) { # BADValid cookie values include the empty string and 0, both false to Perl. See the Introduction to Chapter 1 for more.The CGI::Cookie module is a pure Perl substitute for Apache::Cookie. Its strategy for getting a hash of cookies is slightly different from that of Apache::Cookies:use CGI::Cookie; %all_cookies = CGI::Cookie->fetch;
This hash of cookies works the same as the one by Apache::Cookie.Writing Apache Modules with Perl and C; Recipe 20.14; Recipe 3.7 in mod_perl Developer's Cookbook; the Apache.pm manpage; documentation for the CGI::Cookie and Apache::Cookie modules from CPANEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Redirecting the Browser
- InhaltsvorschauYou want to send a redirection back to the browser.Use
$r->header_outto set the Location header, then return REDIRECT:$r->header_out(Location => "http://www.example.com/somewhere"); return REDIRECT;
If you set the Location header and return REDIRECT, the client knows the address of the new page. This is called an external redirection, because the browser (external to the web server) handles the mechanics of requesting the new page. The URL should be a complete URL (withhttp, etc.), never a partial one.An internal redirection is one where Apache sends back another page from the same site. The browser never knows that the page has changed, which means relative URLs from the page could be broken. Request an internal redirection with:$r->internal_redirect($new_partial_url); return OK;
Apache treats internal redirections almost as though they were new requests: each phase of the request cycle is called again for the new request. Unlike theLocationheader,internal_redirecttakes only a partial URL. You should have no logic after callinginternal_redirectother than to return OK.Writing Apache Modules with Perl and C; Recipe 19.7; Recipe 8.5 in mod_perl Developer's Cookbook; the Apache.pm manpageEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Interrogating Headers
- InhaltsvorschauYou want to learn the value of a header sent by the client.Use the
$r->header_inmethod:$value = $r->header_in("Header-name");For example, suppose you want to discover the client's preferred language (as sent in the Accept-Language header).if ($r->header_in("Accept-Language") !~ /\ben-US\b/i) { $r->print("No furriners!"); return OK; }If you want to access more than one header, use the$r->headers_inmethod, which returns a list of key-value pairs of all clients' request headers, which are typically assigned to a hash:%h = $r->headers_in; if ($h{"Accept-Language"} !~ /\ben-US\b/i) { $r->print("No furriners!"); return OK; }Writing Apache Modules with Perl and C; Recipe 3.4 in mod_perl Developer's Cookbook; the Apache.pm manpageEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Accessing Form Parameters
- InhaltsvorschauYou want the values for form fields submitted by the client.To access the form's various parameters, use
$r->contentto access POSTed parameters and$r->argsto access GET parameters encoded in the URL.%post_parameters = $r->content; %get_parameters = $r->args;
You can call$r->contentonly once per request because the first call consumes all POSTed data.The Apache::Request module from CPAN gives you a$r->parammethod to access specific parameters, regardless of whether they're from GET or POST:use Apache::Request; sub handler { my $r = Apache::Request->instance(shift); my @param_names = $r->param; my $value = $r->param("username"); # single value my @values = $r->param("toppings"); # multiple values # ... }Processing form parameters without Apache::Request is problematic with values that occur multiple times. For example, aSELECTlist withMULTIPLEenabled sends repeated entries for the same parameter name. Putting them into a hash preserves only one of those entries. Apache::Request solves this problem by accumulating multiply-submitted parameters in an array.Form parameters POSTed to your handler can be a problem. The nature of Apache is that once one handler reads the POSTed data, another handler cannot come along later and reread that same information. So if you're going to process POSTed form parameters, you had better keep the decoded parameters around in case another handler wants to access them. Theinstanceconstructor handles this for us. When two handlers both call theinstanceconstructor, the second handler gets back the Apache::Request object populated by the first, with form parameters already decoded.The Apache::Request$r->paraminterface is based on the CGI module's parameter-parsing interface.The Apache.pm manpage;Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Receiving Uploaded Files
- InhaltsvorschauYou want a mod_perl handler that processes an uploaded file. For example, an image gallery might let the owner upload image files to the gallery.Use the Apache::Request module's
$r->uploadand$r->parammethods from within your handler (assuming the file upload field was calledfileParam):use Apache::Request; my $TEN_MEG = 10 * 2 ** 20; # 10 megabytes sub handler { my $r = Apache::Request->new(shift, DISABLE_UPLOADS => 0, POST_MAX => $TEN_MEG); $r->parse; my $uploaded_file = $r->upload("fileParam"); my $filename = $uploaded_file->filename; # filename my $fh = $uploaded_file->fh; # filehandle my $size = $uploaded_file->size; # size in bytes my $info = $uploaded_file->info; # headers my $type = $uploaded_file->type; # Content-Type my $tempname = $uploaded_file->tempname; # temporary name # ... }By default, Apache::Request won't process uploaded file data. This is because the file is read into memory, which might not be released to the operating system once the request is over. If you do enable uploaded files (by settingDISABLE_UPLOADSto false), set an upper limit on the size of the file you will accept. This prevents a malicious attacker from sending an infinite stream of data and exhausting your system's memory. ThePOST_MAXvalue (10M in the Solution code) is that maximum value, specified in bytes.The$r->uploadmethod processes the POSTed file data and returns an Apache::Upload object. This object has the following methods for accessing information on the uploaded file:MethodReturnsEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Speeding Up Database Access
- InhaltsvorschauYou use the DBI module within your mod_perl handler, but connecting to the database server for each request is slowing down your web application unacceptably.To cache database connections transparently, load the Apache::DBI module before the DBI module:
use Apache::DBI; use DBI;
Many sites load the Apache::DBI module from the httpd.conf file to ensure it's loaded before anything else:PerlModule Apache::DBI
The Apache::DBI module intercepts theDBI->connectmethod, returning a previously opened handle if the handle had the same connection parameters as the current request. The module also prevents$dbh->disconnectfrom closing connections. This lets you adduse Apache::DBIto the start of an existing program without having to touch the rest of your code.The Apache::DBI module uses an open database connection for each different database login in each Apache child process. You might need to change your database server's configuration to increase its maximum number of connections. With commercial database systems, you might even need to buy more client licenses.This proliferation of connections can lead to situations where Apache::DBI isn't the best choice. For example, if each user of your site has his own database login, you'll need as many concurrent database connections as the number of active users multiplied by however many httpd processes are running worth of database connections, which might well be more than your server supports! Similarly, if you have many Apache child processes running concurrently, this could open more simultaneous database connections than your server supports.One strategy for optimizing database access is to batch requests where possible. For example, if you're logging to a database, consider accumulating log information and updating the database only after every 5 or 10 hits.Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Customizing Apache's Logging
- InhaltsvorschauYou want to change how Apache logs requests. For example, you want a database of URLs and access counts, or per-user logs.Install a handler with PerlLogHandler:
PerlModule Apache::MyLogger PerlLogHandler Apache::MyLogger
Within the handler, methods on the request object obtain information about the completed request. In the following code,$ris the request object and$cis the connection object obtained from$r->connection:$r->the_request GET /roast/chickens.html HTTP/1.1 $r->uri /roast/chickens.html $r->header_in("User-Agent") Mozilla-XXX $r->header_in("Referer") http://gargle.com/?search=h0t%20chix0rz $r->bytes_sent 1648 $c->get_remote_host 208.201.239.56 $r->status_line 200 OK $r->server_hostname www.myserver.comApache calls logging handlers after sending the response to the client. You have full access to the request and response parameters, such as client IP address, headers, status, and even content. Access this information through method calls on the request object.You'll probably want to escape values before writing them to a text file because spaces, newlines, and quotes could spoil the formatting of the files. Two useful functions are:# return string with newlines and double quotes escaped sub escape { my $a = shift; $a =~ s/([\n\"])/sprintf("%%%02x", ord($1))/ge; return $a; } # return string with newlines, spaces, and double quotes escaped sub escape_plus { my $a = shift; $a =~ s/([\n \"])/sprintf("%%%02x", ord($1))/ge; return $a; }Two prebuilt logging modules on CPAN are Apache::Traffic and Apache::DBILogger. Apache::Traffic lets you assign owner strings (either usernames, UIDs, or arbitrary strings) to your web server's directories inEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Transparently Storing Information in URLs
- InhaltsvorschauYou want to store information like session IDs in the URL, but you don't want to figure out to how work around the extra data when constructing relative URLs.Store the ID at the start of the URL:
http://www.example.com/ID/12345678/path/to/page
Extract it with a PerlTransHandler, and store it in a pnote, a hash entry accessible by other Perl handlers in this request:sub trans { my $r = shift; my $uri = $r->uri( ); if ($uri =~ s{/ID/(\d{8})}{ }) { $r->pnotes("ID", $1); } $r->uri($uri); return DECLINED; }Restore the URL in a PerlFixupHandler:sub fixup { my $r = shift; my $id = $r->pnotes("ID"); if ($id) { $r->uri("/ID/$id" . $r->uri); } return DECLINED; }Consult the pnote in the content handler:use Apache::URI; sub content { my $r = shift; my $id = $r->pnotes("ID"); unless ($id) { join(('', map { int rand 10 } (1..8)); my $uri = Apache::URI->parse($r); $uri->path("ID/$id" . $uri->path); $r->header_out(Location => $uri->unparse); return REDIRECT; } # use $id return OK; }The client thinks your pages have a URL like http://www.example.com/ID/12345678/path/to/page.html. Your PerlTransHandler intercepts the incoming request and removes the /ID/12345678 part before Apache tries to translate the request into a file location. Just before your content handler runs, your PerlFixupHandler reinserts the ID. When your content handler calls$r->uri, it gets a URI that includes the ID.We returned DECLINED from our PerlTransHandler and PerlFixupHandler to indicate that any other translation or fixup handlers that were installed should also be run. If we returned OK in the PerlTransHandler, Apache would not call any subsequent translation handlers. In PerlFixupHandlers, DECLINED and OK both mean a successful fixup, and that other fixup handlers should also run.Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Communicating Between mod_perl and PHP
- InhaltsvorschauYou want to build your site from both mod_perl and PHP. For example, you might want to use mod_perl for authentication and logging, while PHP generates the actual content. However, doing so means that Perl and PHP must share values; for example, so the PHP content handler knows which username successfully authenticated through mod_perl.Use Apache notes. From Perl, you simply say:
$main = $r->main || $r; $main->notes($KEY => $VALUE); $VALUE = $main->notes($KEY);
From PHP, you say:apache_note($KEY, $VALUE); $VALUE = apache_note($KEY);
A note is a string value attached to an Apache request. They're a perfect way to pass information between handlers, even when those handlers are written in different programming languages. Each request has a different set of notes, so from Perl always identify the main request and use it to communicate with PHP code.Don't confuse the$r->notesmethod with the$r->pnotesmethod. The latter is only available to Perl modules.Recipe 21.10Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Migrating from CGI to mod_perl
- InhaltsvorschauYour CGI script is called so often that your web server's performance deteriorates unacceptably. You'd like to use mod_perl to make things faster.Use Apache::Registry or Apache::PerlRun:
PerlModule Apache::Registry # or Apache::PerlRun PerlModule CGI PerlSendHeader On Alias /perl/ /real/path/to/perl/scripts/ <Location /perl> SetHandler perl-script PerlHandler Apache::Registry # or Apache::PerlRun Options ExecCGI </Location>
The Solution tells Apache that requests with URLs starting in /perl/ are in /real/path/to/perl/scripts/ and that Apache::Registry handles them. This module runs them in a CGI environment.PerlModuleCGIpreloads the CGI module, andPerlSendHeaderOnmakes most CGI scripts work out of the box withmod_perl.We have configured /perl/ to work analogously to /cgi-bin/. To make the suffix .perl indicatemod_perlCGI scripts, just as the suffix .cgi indicates regular CGI scripts, use the following in your Apache configuration file:<Files *.perl> SetHandler perl-script PerlHandler Apache::Registry Options ExecCGI </Files>
Because the Perl interpreter that runs your CGI script doesn't shut down when your script is done, as would occur when the web server runs your script as a separate program, you cannot rely on global variables being undefined when the script runs repeatedly. Thewarningsandstrictpragmas check for many bad habits in these kinds of scripts. There are other gotchas, too—see the mod_perl_traps manpage.The Apache::PerlRun handler can work around some of these traps. This is like Apache::Registry, but doesn't cache the compiled module. If your CGI program is sloppy and doesn't initialize variables or close filehandles, you can still gain speed by not starting a new process for every request. To use it, substitute Apache::PerlRun for Apache::Registry.Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Sharing Information Between Handlers
- InhaltsvorschauYou want to share information between handlers, but global variables are global to a process and not automatically cleaned up after every request.Use Apache pnotes (Perl notes):
# in one handler $r->pnotes("Name", $name); # in another handler $name = $r->pnotes("Name");Apache modules communicate with each other using notes (see Recipe 21.11). Apache notes act like a hash attached to a request—one handler stores a value for a key in the hash, so that another handler can read it later. The Perl notes features is also a hash attached to the request object, but it's only for the Perl handlers.To set a pnote, pass a key and a value to the$r->pnotesmethod. To retrieve a pnote, pass only the key. You can store complex data structures:$r->pnotes("Person", { Name => "Nat", Age => 30, Kids => 2 }); # later $person = $r->pnotes("Person");and even objects:$person = new Person; $person->name("Nat"); $person->age(30); $person->kids(2); $r->pnotes(Person => $person); # later $person = $r->pnotes("Person"); # $person is a reference to the same objectRecipe 8.11 in mod_perl Developer's Cookbook; Apache::Table; pnotes method in the Apache manpageEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Reloading Changed Modules
- InhaltsvorschauYou've updated your mod_perl modules, but you have to restart the web server for Apache to notice the change.Use Apache::StatINC (standard with mod_perl) to automatically reload any code when it changes on disk:
PerlModule Apache::StatINC PerlInitHandler Apache::StatINC
Or use the CPAN module Apache::Reload to limit the monitoring to specific modules:PerlModule Apache::Reload PerlInitHandler Apache::Reload PerlSetVar ReloadAll Off PerlSetVar ReloadModules "Example::One Example::Two Example::Three"
Apache::Reload includes the functionality of Apache::StatINC. Simply saying:PerlModule Apache::Reload PerlInitHandler Apache::Reload
is enough to duplicate the functionality of Apache::StatINC. That is, at the start of each request, Apache::Reload goes through all currently loaded modules, checking timestamps to see which have changed. Because checking every module on every request is a burden on popular sites, Apache::Reload also lets you specify which modules to check and reload.The documentation for the Apache::StatINC and Apache::Reload modules; Recipe 8.1 in mod_perl Developer's Cookbook; the mod_perl guide athttp://perl.apache.org/guideEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Benchmarking a mod_perl Application
- InhaltsvorschauYou have an idea to speed up your application, but you're not sure whether your change will help.Use Apache::Timeit to time how long your content handler takes to run:
PerlModule Apache::Timeit PerlFixupHandler Apache::Timeit
For more detailed analysis, use the Apache::DProf module available from CPAN:PerlModule Apache::DProf
The Apache::Timeit module is available athttp://perl.apache.org/dist/contrib/Timeit.pm.This module records in the error log the amount of time that elapsed while the content handler ran. By scanning the logs and averaging these numbers, you can see which pages take longer to generate, and you can then start to figure out why.To drill down into your code and figure out which parts of the content handler are taking the most time, use the Apache::DProf module. This module connects the standard (as of v5.8) Devel::DProf module to Apache and mod_perl.The profiler records time spent in every subroutine the Perl module executes. The record of times is written to a file named dprof/ $$ /tmon.out ($$ is the process ID of the Apache child process), located under the ServerRoot directory. This file corresponds to every Perl subroutine encountered during the lifetime of the Apache child process. To profile just a single request, set the MaxRequestsPerChild directive in the httpd.conf file:MaxRequestsPerChild 1
You must create and chmod this directory yourself:cd $APACHE_SERVER_ROOT mkdir logs/dprof chmod 777 logs/dprof
To analyze the output, use the dprofpp program:dprofpp -r dprof/13169/tmon.out Total Elapsed Time = 89.93962 Seconds Real Time = 89.93962 Seconds Exclusive Times %Time ExclSec CumulS #Calls sec/call Csec/c Name
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Templating with HTML::Mason
- InhaltsvorschauYou want to separate presentation (HTML formatting) from logic (Perl code) in your program. Your web site has a lot of components with only slight variations between them. You'd like to abstract out common elements and build your pages from templates without having a lot of "if I'm in this page, then print this; else if I'm in some other page . . . " conditional statements in a single master template.Use HTML::Mason components and inheritance.HTML::Mason (also simply called Mason) offers the power of Perl in templates. The basic unit of a web site built with Mason is the component—a file that produces output. The file can be HTML, Perl, or a mixture of both. Components can take arguments and execute arbitrary Perl code. Mason has many features, documented at
http://masonhq.comand in Embedding Perl in HTML with Mason by Dave Rolsky and Ken Williams (O'Reilly; online athttp://masonbook.com).Mason works equally well with CGI, mod_perl, and non-web programs. For the purposes of this recipe, however, we look at how to use it with mod_perl. The rest of this recipe contains a few demonstrations to give you a feel for what you can do with Mason and how your site will be constructed. There are more tricks, traps, and techniques for everything we discuss, though, so be sure to visit the web site and read the book for the full story.Section 21.16.3.1: Configuration
Install the HTML-Mason distribution from CPAN and add the following to your httpd.conf:PerlModule HTML::Mason::ApacheHandler <Location /mason> SetHandler perl-script PerlHandler HTML::Mason::ApacheHandler DefaultType text/html </Location>
This tells mod_perl that every URL that starts with/masonis handled by Mason. So if you request/mason/hello.html, the file mason/hello.htmlEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Templating with Template Toolkit
- InhaltsvorschauYou want to separate presentation (HTML formatting) from logic (Perl code) in your program. You want designers and other people who don't speak Perl to be able to edit the templates.Use the Template Toolkit and Apache::Template.The Template Toolkit (TT2) is a general templating system that can be used not just for web pages, but for any kind of templated text. The Apache::Template module is an Apache content handler that uses TT2 to build the returned page. The biggest benefit of TT2 is that it has a simple language for variables, loops, and data structures, which can be used instead of Perl for presentation logic. This simple language can be read and written by people who don't know Perl.This recipe documents Version 2 of the Template Toolkit. As with HTML::Mason, there's far more to TT2 that we can possibly cover here. This recipe is just a tour of some of the highlights of TT2's syntax and functionality. The Template Toolkit is well documented at
http://www.template-toolkit.org, and in the upcoming book Perl Template Toolkit, by Darren Chamberlain, Dave Cross, and Andy Wardley (O'Reilly).Section 21.17.3.1: Configuration
Install the Template and Apache::Template modules from CPAN. Add this to your httpd.conf file:PerlModule Apache::Template TT2EvalPerl On TT2Params all TT2IncludePath /usr/local/apache/htdocs/tt2 <Location /tt2> SetHandler perl-script PerlHandler Apache::Template DefaultType text/html </Location>
The TT2EvalPerl directive lets us embed Perl code in our templates as well as the TT2 language. TT2Params tells Apache::Template to give our templates access to form parameters, Apache environment variables, notes, cookies, and more. TT2IncludePath tells the Template Toolkit where to look for templates that our templates include. Finally, we designate theEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Chapter 22: XML
- InhaltsvorschauI am a little world made cunningly. Of elements, and an angelic sprite—John Donne, Holy SonnetsThe Extensible Markup Language (XML) standard was released in 1998. It quickly became the standard way to represent and exchange almost every kind of data, from books to genes to function calls.XML succeeded where other past "standard" data formats failed (including XML's ancestor, SGML—the Standard Generalized Markup Language). There are three reasons for XML's success: it is text-based instead of binary, it is simple rather than complex, and it has a superficial resemblance to HTML.
- Text
-
Unix realized nearly 30 years before XML that humans primarily interact with computers through text. Thus text files are the only files any system is guaranteed to be able to read and write. Because XML is text, programmers can easily make legacy systems emit XML reports.
- Simplicity
-
As we'll see, a lot of complexity has arisen around XML, but the XML standard itself is very simple. There are very few things that can appear in an XML document, but from those basic building blocks you can build extremely complex systems.
- HTML
-
XML is not HTML, but XML and HTML share a common ancestor: SGML. The superficial resemblance meant that the millions of programmers who had to learn HTML to put data on the web were able to learn (and accept) XML more easily.
Example 22-1 shows a simple XML document.Example 22-1. Simple XML document<?xml version="1.0" encoding="UTF-8"?> <books> <!-- Programming Perl 3ed --> <book id="1"> <title>Programming Perl</title> <edition>3</edition> <authors> <author> <firstname>Larry</firstname> <lastname>Wall</lastname> </author> <author> <firstname>Tom</firstname> <lastname>Christiansen</lastname> </author> <author> <firstname>Jon</firstname> <lastname>Orwant</lastname> </author> </authors> <isbn>0-596-00027-8</isbn> </book> <!-- Perl & LWP --> <book id="2"> <title>Perl & </title> <edition>1</edition> <authors> <author> <firstname>Sean</firstname> <lastname>Burke</lastname> </author> </authors> <isbn>0-596-00178-9</isbn> </book> <book id="3"> <!-- Anonymous Perl --> <title>Anonymous Perl</title> <edition>1</edition> <authors /> <isbn>0-555-00178-0</isbn> </book> </books>Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Introduction
- InhaltsvorschauThe Extensible Markup Language (XML) standard was released in 1998. It quickly became the standard way to represent and exchange almost every kind of data, from books to genes to function calls.XML succeeded where other past "standard" data formats failed (including XML's ancestor, SGML—the Standard Generalized Markup Language). There are three reasons for XML's success: it is text-based instead of binary, it is simple rather than complex, and it has a superficial resemblance to HTML.
- Text
-
Unix realized nearly 30 years before XML that humans primarily interact with computers through text. Thus text files are the only files any system is guaranteed to be able to read and write. Because XML is text, programmers can easily make legacy systems emit XML reports.
- Simplicity
-
As we'll see, a lot of complexity has arisen around XML, but the XML standard itself is very simple. There are very few things that can appear in an XML document, but from those basic building blocks you can build extremely complex systems.
- HTML
-
XML is not HTML, but XML and HTML share a common ancestor: SGML. The superficial resemblance meant that the millions of programmers who had to learn HTML to put data on the web were able to learn (and accept) XML more easily.
Example 22-1 shows a simple XML document.Example 22-1. Simple XML document<?xml version="1.0" encoding="UTF-8"?> <books> <!-- Programming Perl 3ed --> <book id="1"> <title>Programming Perl</title> <edition>3</edition> <authors> <author> <firstname>Larry</firstname> <lastname>Wall</lastname> </author> <author> <firstname>Tom</firstname> <lastname>Christiansen</lastname> </author> <author> <firstname>Jon</firstname> <lastname>Orwant</lastname> </author> </authors> <isbn>0-596-00027-8</isbn> </book> <!-- Perl & LWP --> <book id="2"> <title>Perl & </title> <edition>1</edition> <authors> <author> <firstname>Sean</firstname> <lastname>Burke</lastname> </author> </authors> <isbn>0-596-00178-9</isbn> </book> <book id="3"> <!-- Anonymous Perl --> <title>Anonymous Perl</title> <edition>1</edition> <authors /> <isbn>0-555-00178-0</isbn> </book> </books>Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Parsing XML into Data Structures
- InhaltsvorschauYou want a Perl data structure (a combination of hashes and arrays) that corresponds to the structure and content of an XML file. For example, you have XML representing a configuration file, and you'd like to say
$xml->{config}{server}{hostname}to access the contents of<config><server><hostname>...</hostname>.Use the XML::Simple module from CPAN. If your XML is in a file, pass the filename toXMLin:use XML::Simple; $ref = XMLin($FILENAME, ForceArray => 1);
If your XML is in a string, pass the string toXMLin:use XML::Simple; $ref = XMLin($STRING, ForceArray => 1);
Here's the data structure that XML::Simple produces from the XML in Example 22-1:{'book' => { '1' => { 'authors' => [ { 'author' => [ { 'firstname' => [ 'Larry' ], 'lastname' => [ 'Wall' ] }, { 'firstname' => [ 'Tom' ], 'lastname' => [ 'Christiansen' ] }, { 'firstname' => [ 'Jon' ], 'lastname' => [ 'Orwant' ] } ] } ], 'edition' => [ '3' ], 'title' => [ 'Programming Perl' ], 'isbn' => [ '0-596-00027-8' ] }, '2' => { 'authors' => [ { 'author' => [ { 'firstname' => [ 'Sean' ], 'lastname' => [ 'Burke' ] } ] } ], 'edition' => [ '1' ], 'title' => [ 'Perl & LWP' ], 'isbn' => [ '0-596-00178-9' ] }, '3' => { 'authors' => [ { } ], 'edition' => [ '1' ], 'title' => [ 'Anonymous Perl' ], 'isbn' => [ '0-555-00178-0' ] }, } }The basic function of XML::Simple is to turn an element that contains other elements into a hash. If there are multiple identically named elements inside a single containing element (e.g.,book), they become an array of hashes unless XML::Simple knows they are uniquely identified by attributes (as happens here with theEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Parsing XML into a DOM Tree
- InhaltsvorschauYou want to use the Document Object Model (DOM) to access and perhaps change the parse tree of an XML file.Use the XML::LibXML module from CPAN:
use XML::LibXML; my $parser = XML::LibXML->new( ); my $dom = $parser->parse_string($XML); # or my $dom = $parser->parse_file($FILENAME); my $root = $dom->getDocumentElement;
DOM is a framework of classes for representing XML parse trees. Each element is a node in the tree, with which you can do operations like find its children nodes (the XML elements in this case), add another child node, and move the node somewhere else in the tree. Theparse_string,parse_file, andparse_fh(filehandle) constructors all return a DOM object that you can use to find nodes in the tree.For example, given thebooksXML from Example 22-1, Example 22-2 shows one way to print the titles.Example 22-2. dom-titledumper#!/usr/bin/perl -w # dom-titledumper -- display titles in books file using DOM use XML::LibXML; use Data::Dumper; use strict; my $parser = XML::LibXML->new; my $dom = $parser->parse_file("books.xml") or die; # get all the title elements my @titles = $dom->getElementsByTagName("title"); foreach my $t (@titles) { # get the text node inside the <title> element, and print its value print $t->firstChild->data, "\n"; }ThegetElementsByTagNamemethod returns a list of elements as nodes within the document that have the specific tag name. Here we get a list of thetitleelements, then go through eachtitleto find its contents. We know that eachtitlehas only a single piece of text, so we assume the first child node is text and print its contents.If we wanted to confirm that the node was a text node, we could say:die "the title contained something other than text!" if $t->firstChild->nodeType != 3;
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Parsing XML into SAX Events
- InhaltsvorschauYou want to receive Simple API for XML (SAX) events from an XML parser because event-based parsing is faster and uses less memory than parsers that build a DOM tree.Use the XML::SAX module from CPAN:
use XML::SAX::ParserFactory; use MyHandler; my $handler = MyHandler->new( ); my $parser = XML::SAX::ParserFactory->parser(Handler => $handler); $parser->parse_uri($FILENAME); # or $parser->parse_string($XML);
Logic for handling events goes into the handler class (MyHandler in this example), which you write:# in MyHandler.pm package MyHandler; use base qw(XML::SAX::Base); sub start_element { # method names are specified by SAX my ($self, $data) = @_; # $data is hash with keys like Name and Attributes # ... } # other possible methods include end_element( ) and characters( ) 1;An XML processor that uses SAX has three parts: the XML parser that generates SAX events, the handler that reacts to them, and the stub that connects the two. The XML parser can be XML::Parser, XML::LibXML, or the pure Perl XML::SAX::PurePerl that comes with XML::SAX. The XML::SAX::ParserFactory module selects a parser for you and connects it to your handler. Your handler takes the form of a class that inherits from XML::SAX::Base. The stub is the program shown in the Solution.The XML::SAX::Base module provides stubs for the different methods that the XML parser calls on your handler. Those methods are listed in Table 22-2, and are the methods defined by the SAX1 and SAX2 standards athttp://www.saxproject.org/. The Perl implementation uses more Perl-ish data structures and is described in the XML::SAX::Intro manpage.Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Making Simple Changes to Elements or Text
- InhaltsvorschauYou want to filter some XML. For example, you want to make substitutions in the body of a document, or add a price to every book described in an XML document, or you want to change
<book id="1">to<book> <id>1</id>.Use the XML::SAX::Machines module from CPAN:#!/usr/bin/perl -w use MySAXFilter1; use MySAXFilter2; use XML::SAX::ParserFactory; use XML::SAX::Machines qw(Pipeline); my $machine = Pipeline(MySAXFilter1 => MySAXFilter2); # or more $machine->parse_uri($FILENAME);
Write a handler, inheriting from XML::SAX::Base as in Recipe 22.3, then whenever you need a SAX event, call the appropriate handler in your superclass. For example:$self->SUPER::start_element($tag_struct);
A SAX filter accepts SAX events and triggers new ones. The XML::SAX::Base module detects whether your handler object is called as a filter. If so, the XML::SAX::Base methods pass the SAX events onto the next filter in the chain. If your handler object is not called as a filter, then the XML::SAX::Base methods consume events but do not emit them. This makes it almost as simple to write events as it is to consume them.The XML::SAX::Machines module chains the filters for you. Import itsPipelinefunction, then say:my $machine = Pipeline(Filter1 => Filter2 => Filter3 => Filter4); $machine->parse_uri($FILENAME);
SAX events triggered by parsing the XML file go to Filter1, which sends possibly different events to Filter2, which in turn sends events to Filter3, and so on to Filter4. The last filter should print or otherwise do something with the incoming SAX events. If you pass a reference to a typeglob, XML::SAX::Machines writes the XML to the filehandle in that typeglob.Example 22-5 shows a filter that turns theidattribute inbookelements from the XML document in Example 22-1 into a newEnde der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Validating XML
- InhaltsvorschauYou want to ensure that the XML you're processing conforms to a DTD or XML Schema.To validate against a DTD, use the XML::LibXML module:
use XML::LibXML; my $parser = XML::LibXML->new; $parser->validation(1); $parser->parse_file($FILENAME);
To validate against a W3C Schema, use the XML::Xerces module:use XML::Xerces; my $parser = XML::Xerces::DOMParser->new; $parser->setValidationScheme($XML::Xerces::DOMParser::Val_Always); my $error_handler = XML::Xerces::PerlErrorHandler->new( ); $parser->setErrorHandler($error_handler); $parser->parse($FILENAME);
The libxml2 library, upon which XML::LibXML is based, can validate as it parses. Thevalidationmethod on the parser enables this option. At the time of this writing, XML::LibXML could only validate with DOM parsing—validation is not available with SAX-style parsing.Example 22-7 is a DTD for the books.xml file in Example 22-1.Example 22-7. validating-booksdtd<!ELEMENT books (book*)> <!ELEMENT book (title,edition,authors,isbn)> <!ELEMENT authors (author*)> <!ELEMENT author (firstname,lastname)> <!ELEMENT title (#PCDATA)> <!ELEMENT edition (#PCDATA)> <!ELEMENT firstname (#PCDATA)> <!ELEMENT lastname (#PCDATA)> <!ELEMENT isbn (#PCDATA)> <!ATTLIST book id CDATA #REQUIRED >
To make XML::LibXML parse the DTD, add this line to the books.xml file:<!DOCTYPE books SYSTEM "books.dtd">
Example 22-8 is a simple driver used to parse and validate.Example 22-8. validating-bookchecker#!/usr/bin/perl -w # bookchecker - parse and validate the books.xml file use XML::LibXML; $parser = XML::LibXML->new; $parser->validation(1); $parser->parse_file("books.xml");Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Finding Elements and Text Within an XML Document
- InhaltsvorschauYou want to get to a specific part of the XML; for example, the
hrefattribute of anatag whose contents are animgtag withalttext containing the word "monkey".Use XML::LibXML and construct an XPath expression to find nodes you're interested in:use XML::LibXML; my $parser = XML::LibXML->new; $doc = $parser->parse_file($FILENAME); my @nodes = $doc->findnodes($XPATH_EXPRESSION);
Example 22-9 shows how you would print all the titles in the book XML from Example 22-1.Example 22-9. xpath-1#!/usr/bin/perl -w use XML::LibXML; my $parser = XML::LibXML->new; $doc = $parser->parse_file("books.xml"); # find title elements my @nodes = $doc->findnodes("/books/book/title"); # print the text in the title elements foreach my $node (@nodes) { print $node->firstChild->data, "\n"; }The difference between DOM'sgetElementsByTagNameandfindnodesis that the former identifies elements only by their name. An XPath expression specifies a set of steps that the XPath engine takes to find nodes you're interested in. In Example 22-9 the XPath expression says "start at the top of the document, go into thebookselement, go into thebookelement, and then go into thetitleelement."The difference is important. Consider this XML document:<message> <header><to>Tom</to><from>Nat</from></header> <body> <order><to>555 House St, Mundaneville</to> <product>Fish sticks</product> </order> </body> </message>There are twotoelements here: one in the header and one in the body. If we said$doc->getElementsByTagName("to"), we'd get bothtoelements. The XPath expression "/message/header/to" restricts output to thetoelement in the header.XPath expressions are like regular expressions that operate on XML structure instead of text. As with regular expressions, there are a lot of things you can specify in XPath expressions—far more than the simple "find this child node and go into it" that we've been doing.Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Processing XML Stylesheet Transformations
- InhaltsvorschauYou have an XML stylesheet that you want to use to convert XML into something else. For example, you want to produce HTML from files of XML using the stylesheet.Use XML::LibXSLT:
use XML::LibXSLT; my $xslt = XML::LibXSLT->new; my $stylesheet = $xslt->parse_stylesheet_file($XSL_FILENAME); my $results = $stylesheet->transform_file($XML_FILENAME); print $stylesheet->output_string($results);
XML::LibXSLT is built on the fast and powerful libxslt library from the GNOME project. To perform a transformation, first build a stylesheet object from the XSL source and then use it to transform an XML file. If you wanted to (for example, your XSL is dynamically generated rather than being stored in a file), you could break this down into separate steps:use XML::LibXSLT; use XML::LibXML; my $xml_parser = XML::LibXML->new; my $xslt_parser = XML::LibXSLT->new; my $xml = $xml_parser->parse_file($XML_FILENAME); my $xsl = $xml_parser->parse_file($XSL_FILENAME); my $stylesheet = $xslt_parser->parse_stylesheet($xsl); my $results = $stylesheet->transform($xml); my $output = $stylesheet->output_string($results);
To save the output to a file, useoutput_file:$stylesheet->output_file($OUTPUT_FILENAME);
Similarly, write the output to an already-opened filehandle withoutput_fh:$stylesheet->output_fh($FILEHANDLE);
It's possible to pass parameters to the transformation engine. For example, your transformation might use parameters to set a footer at the bottom of each page:$stylesheet->transform($xml, footer => "'I Made This!'");
The strange quoting is because the XSLT engine expects to see quoted values. In the preceding example, the double quotes tell Perl it's a string, whereas the single quotes are for the XSLT engine.Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Processing Files Larger Than Available Memory
- InhaltsvorschauYou want to work with a large XML file, but you can't read it into memory to form a DOM or other kind of tree because it's too big.Use SAX (as described in Recipe 22.3) to process events instead of building a tree.Alternatively, use XML::Twig to build trees only for the parts of the document you want to work with (as specified by XPath expressions):
use XML::Twig; my $twig = XML::Twig->new( twig_handlers => { $XPATH_EXPRESSION => \&HANDLER, # ... }); $twig->parsefile($FILENAME); $twig->flush( );You can call a lot of DOM-like functions from within a handler, but only the elements identified by the XPath expression (and whatever those elements enclose) go into a tree.DOM modules turn the entire document into a tree, regardless of whether you use all of it. With SAX modules, there are no trees built—if your task depends on document structure, you must keep track of that structure yourself. A happy middle ground is XML::Twig, which creates DOM trees only for the bits of the file that you're interested in. Because you work with files a piece at a time, you can cope with very large files by processing pieces that fit in memory.For example, to print the titles of books in books.xml (Example 22-1), you could write:use XML::Twig; my $twig = XML::Twig->new( twig_roots => { '/books/book' => \&do_book }); $twig->parsefile("books.xml"); $twig->purge( ); sub do_book { my($title) = $_->find_nodes("title"); print $title->text, "\n"; }For eachbookelement, XML::Twig callsdo_bookon its contents. That subroutine finds thetitlenode and prints its text. Rather than having the entire file parsed into a DOM structure, we keep only onebookelement at a time.Consult the XML::Twig manpages for details on how much DOM and XPath the module supports—it's not complete, but it's growing all the time. XML::Twig uses XML::Parser for its XML parsing, and as a result the functions available on nodes are slightly different from those provided by XML::LibXSLT's DOM parsing.Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Reading and Writing RSS Files
- InhaltsvorschauYou want to create a Rich Site Summary (RSS) file, or read one produced by another application.Use the CPAN module XML::RSS to read an existing RSS file:
use XML::RSS; my $rss = XML::RSS->new; $rss->parsefile($RSS_FILENAME); my @items = @{$rss->{items}}; foreach my $item (@items) { print "title: $item->{'title'}\n"; print "link: $item->{'link'}\n\n"; }To create an RSS file:use XML::RSS; my $rss = XML::RSS->new (version => $VERSION); $rss->channel( title => $CHANNEL_TITLE, link => $CHANNEL_LINK, description => $CHANNEL_DESC); $rss->add_item(title => $ITEM_TITLE, link => $ITEM_LINK, description => $ITEM_DESC, name => $ITEM_NAME); print $rss->as_string;There are at least four variations of RSS extant: 0.9, 0.91, 1.0, and 2.0. At the time of this writing, XML::RSS understood all but RSS 2.0. Each version has different capabilities, so methods and parameters depend on which version of RSS you're using. For example, RSS 1.0 supports RDF and uses the Dublin Core metadata (http://dublincore.org/). Consult the documentation for what you can and cannot call.XML::RSS uses XML::Parser to parse the RSS. Unfortunately, not all RSS files are well-formed XML, let alone valid. The XML::RSSLite module on CPAN offers a looser approach to parsing RSS—it uses regular expressions and is much more forgiving of incorrect XML.Example 22-13 uses XML::RSSLite and LWP::Simple to download The Guardian's RSS feed and print out the items whose descriptions contain the keywords we're interested in.Example 22-13. rss-parser#!/usr/bin/perl -w # guardian-list -- list Guardian articles matching keyword use XML::RSSLite; use LWP::Simple; use strict; # list of keywords we want my @keywords = qw(perl internet porn iraq bush); # get the RSS my $URL = 'http://www.guardian.co.uk/rss/1,,,00.xml'; my $content = get($URL); # parse the RSS my %result; parseRSS(\%result, \$content); # build the regex from keywords my $re = join "|", @keywords; $re = qr/\b(?:$re)\b/i; # print report of matching items foreach my $item (@{ $result{items} }) { my $title = $item->{title}; $title =~ s{\s+}{ }; $title =~ s{^\s+}{ }; $title =~ s{\s+$}{ }; if ($title =~ /$re/) { print "$title\n\t$item->{link}\n\n"; } }Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar. - Writing XML
- InhaltsvorschauYou have a data structure that you'd like to convert to XML.Use XML::Simple's
XMLoutfunction:use XML::Simple qw(XMLout); my $xml = XMLout($hashref);
TheXMLoutfunction takes a data structure and produces XML from it. For example, here's how to generate part of the book data:#!/usr/bin/perl -w use XML::Simple qw(XMLout); $ds = { book => [ { id => 1, title => [ "Programming Perl" ], edition => [ 3 ], }, { id => 2, title => [ "Perl & LWP" ], edition => [ 1 ], }, { id => 3, title => [ "Anonymous Perl" ], edition => [ 1 ], }, ] }; print XMLout($ds, RootName => "books" );This produces:<books> <book id="1"> <edition>3</edition> <title>Programming Perl</title> </book> <book id="2"> <edition>1</edition> <title>Perl & </title </book> <book id="3"> <edition>1</edition> <title>Anonymous Perl</title> </book> </books>The rule is: if you want something to be text data, rather than an attribute value, put it in an array. Notice how we used theRootNameoption toXMLoutto specify thatbooksis the top-level element. Passundefor the empty string to generate an XML fragment with no top-level fragment. The default value isopt.Theidentry in each hash became an attribute because the default behavior ofXMLoutis to do this for theid,key, andnamefields. Prevent this with:XMLout($ds, RootName => "books", KeyAttr => [ ]);
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Zurück zu Perl Cookbook
