Perl Best Practices

First Edition, July 2005
ISBN 978-0-596-00173-5
542 pages
EUR 32.00


Table of Contents

Chapter 1: Best Practices
Content preview
We do not all have to write like Faulkner, or program
like Dijkstra. I will gladly tell people what my
programming style is, and I will even tell them where I
think their own style is unclear or makes me jump
through mental hoops.
But I do this as a fellow programmer, not as the Perl
god ... stylistic limits should be self-imposed, or at most
policed by consensus among your buddies.
—Larry Wall
Natural Language Principles in Perl
Code matters. Analysis, design, decomposition, algorithms, data structures, and control flow mean nothing until they are made real, given form and power in the statements of some programming language. It is code that allows abstractions and ideas to control the physical world, that enables mathematical procedures to govern real-world processes, that converts data into information and information into knowledge.
Code matters. So the way in which you code matters too. Every programmer has a unique approach to writing software; a unique coding style. Programmers' styles are based on their earliest experiences in programming—the linguistic idiosyncrasies of their first languages, the way in which code was presented in their initial textbooks, and the stylistic prejudices of their early instructors. That style will develop and change as the programmer's experience and skills increase. Indeed, most programmers' style is really just a collection of coding habits that have evolved in response to the opportunities and pressures they have experienced throughout their careers.
Just as in natural evolution, those opportunities and pressures may lead to a coding style that is fit, strong, and well-adapted to the programmer's needs. Or they may lead to a coding style that is nasty, brutish, and underthought. But what they most often lead to is something even worse: Intuitive Programmer Syndrome.
Many programmers code by instinct. They aren't conscious of the hundreds of choices they make every time they code: how they format their source, the names they use for variables, the kinds of loops they use (
End of preview. The rest of this section is not available here.
Three Goals
A good coding style is one that reduces the costs of your software project. There are three main ways in which a coding style can do that: by producing applications that are more robust, by supporting implementations that are more efficient, and by creating source code that is easier to maintain.
When deciding how you will write code, choose a style that is likely to reduce the number of bugs in your programs. There are several ways that your coding style can do that:
  • A coding style can minimize the chance of introducing errors in the first place. For example, appending _ref to the name of every variable that stores a reference (see Chapter 3) makes it harder to accidentally write $array_ref[$n] instead of $array_ref->[$n], because anything except an arrow after _ref will soon come to look wrong.
  • A coding style can make it easy to detect incorrect edge cases, where bugs often hide. For example, constructing a regular expression from a table (see Chapter 12) can prevent that regex from ever matching a value that the table doesn't cover, or from failing to match a value that it does.
  • A coding style can help you avoid constructs that don't scale well. For example, avoiding a cascaded if-elsif-elsif-elsif-... in favour of table look-ups (see Chapter 6) can ensure that the cost of any selection statement stays nearly constant, rather than growing linearly with the number of alternatives.
  • A coding style can improve how code handles failure. For example, mandating a standard interface for I/O prompting (see Chapter 10) can encourage developers to habitually verify terminal input, rather than simply assuming it will always be correct.
  • A coding style can improve how code reports failure. For example, a rule that every failure must throw an exception, rather than returning an
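As a rough illustration of the table look-up point in the list above, the sketch below replaces a cascaded if-elsif chain with a single hash access. The names (%STATUS_FOR, status_for) and the sample values are invented for this example; they do not come from the book.

```perl
use strict;
use warnings;

# Hypothetical look-up table: numeric codes mapped to messages.
my %STATUS_FOR = (
    200 => 'OK',
    404 => 'Not Found',
    500 => 'Server Error',
);

# One hash access replaces a chain of if/elsif comparisons, so the cost
# stays roughly constant however many entries the table holds.
sub status_for {
    my ($code) = @_;
    return exists $STATUS_FOR{$code} ? $STATUS_FOR{$code} : 'Unknown';
}

print status_for(404), "\n";
print status_for(418), "\n";
```

Adding a new alternative then means adding one line to the table rather than another elsif branch.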
This Book
To help you develop that consistent and coherent approach, the following 18 chapters explore a coordinated set of coding practices that have been specifically designed to enhance the robustness, efficiency, and maintainability of Perl code.
Each piece of advice is framed as a single imperative sentence—a "Thou shalt..." or a "Thou shalt not...", presented like this:
Always code as if the guy who ends up maintaining your code will be a violent psychopath who knows where you live.
Each such admonition is followed by a detailed explanation of the rule, explaining how and when it applies. Every recommendation also includes a summary of the reasoning behind the prescription or proscription, usually in terms of how it can improve the reliability, performance, or comprehensibility of your code.
Almost every guideline also includes at least one example of code that conforms to the rule (set in constant-width bold) as well as counterexamples that break it (set in constant-width regular). These code fragments aim to demonstrate the advantages of following the suggested practice, and the problems that can occur if you don't. All of these examples are also available for you to download and reuse from http://www.oreilly.com/catalog/perlbp.
The guidelines are organized by topic, not by significance. For example, some readers will wonder why use strict and use warnings aren't mentioned on page 1. But if you've already seen the light on those two, they don't need to be on page 1. And if you haven't seen the light yet, Chapter 18 is soon enough. By then you'll have discovered several hundred ways in which code can go horribly wrong, and will be better able to appreciate these two ways in which Perl can help your code go right.
Other readers may object to "trivial" code layout recommendations appearing so early in the book. But if you've ever had to write code as part of a group, you'll know that layout is where most of the arguments start. Code layout is the medium in which all other coding practices are practised, so the sooner everyone can admit that code layout
Rehabiting
People cling to their current coding habits even when those habits are manifestly making their code buggy, slow, and incomprehensible to others. They cling to those habits because it's easier to live with their deficiencies than it is to fix them. Not thinking about how you code requires no effort. That's the whole point of a habit. It's a skill that has been compiled down from a cerebral process and then burnt into muscle memory; a microcoded reflex that your fingers can perform without your conscious control.
For example, if you're an aficionado of the BSD style of bracketing (see Chapter 2), then it's likely that your fingers can type Closingparen-Return-Openingcurly-Return-Tab without your ever needing to think about it—which makes it especially hard if your development team decides to adopt K&R bracketing instead, because now you have to type Closingparen-Return-Openingcurly-Return-dammit!-Backspace-Backspace-Backspace-Space-Openingcurly-Return-Tab for a couple of months until your fingers learn the new sequence.
Likewise, if you're used to writing Perl like this:

     @tcmd= grep /^.*;$/ => @cmd;

then abiding by the guidelines in this book and writing this instead:

            

    @terminated_commands
        = grep { m/ \A [^\n]* ; \n? \z /xms } @raw_commands;

will be deeply onerous. At least, it will be at first, until you break your existing habits and develop new ones.
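To see the two styles side by side, here is a small runnable sketch. The sample commands are invented for illustration; on this data both versions select the same elements, so the longer form costs nothing in behaviour.

```perl
use strict;
use warnings;

# Sample data invented for this sketch.
my @raw_commands = ( "ls -l;", "cd /tmp", "make;\n", "echo a;\necho b" );

# Terse style: expression form of grep, default regex flags.
my @tcmd = grep /^.*;$/ => @raw_commands;

# Guideline style: block form of grep, explicit anchors, /xms flags.
my @terminated_commands
    = grep { m/ \A [^\n]* ; \n? \z /xms } @raw_commands;

print scalar(@terminated_commands), " terminated commands\n";
```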
But that's the great thing about programming habits: they're incredibly easy to change. All you have to do is consciously practise things the new way for long enough, and eventually your coding habits will automatically re-formulate themselves around that new behaviour.
So, if you decide to adopt the recommendations in the following chapters, try to adopt them zealously. See how often you can catch yourself (or others in your team) breaking the new rules. Stop letting your fingers do the programming. Recorrect each old habit the instant you notice yourself backsliding. Be strict with your hands. Rather than letting them type what feels good, force them to type what works well.
Chapter 2: Code Layout
Most people's [...] programs should be indented six feet downward and covered with dirt.
—Blair P. Houghton
Formatting. Indentation. Style. Code layout. Whatever you choose to call it, it's one of the most contentious aspects of programming discipline. More and bloodier wars have been fought over code layout than over just about any other aspect of coding.
So what is the best practice here? Should you use classic Kernighan & Ritchie (K&R) style? Or go with BSD code formatting? Or adopt the layout scheme specified by the GNU project? Or conform to the Slashcode coding guidelines?
Of course not! Everyone knows that [insert your personal coding style here] is the One True Layout Style, the only sane choice, as ordained by [insert your favorite Programming Deity here] since Time Immemorial! Any other choice is manifestly absurd, willfully heretical, and self-evidently a Work of Darkness!!!
And that's precisely the problem. When deciding on a layout style, it's hard to decide where rational choices end and rationalized habits begin.
Adopting a coherently designed approach to code layout, and then applying that approach consistently across all your coding, is fundamental to best practice programming. Good layout can improve the readability of a program, help detect errors within it, and make the structure of your code much easier to comprehend. Layout matters.
But most coding styles—including the four mentioned earlier—confer those benefits almost equally well. So while it's true that having a consistent code layout scheme matters very much indeed, the particular code layout scheme you ultimately decide upon... does not matter at all!
All that matters is that you adopt a single, coherent style; one that works for your entire programming team. And, having agreed upon that style, that you then apply it consistently across all your development.
The layout guidelines suggested in this chapter have been carefully and consciously selected from many alternatives, in a deliberate attempt to construct a coding style that is self-consistent and concise, that improves the readability of the resulting code, that makes it easy to detect coding mistakes, and that works well for a wide range of programmers in a wide range of development environments.
Bracketing
Brace and parenthesize in K&R style.
When setting out a code block, use the K&R style of bracketing. That is, place the opening brace at the end of the construct that controls the block. Then start the contents of the block on the next line, and indent those contents by one indentation level. Finally, place the closing brace on a separate line, at the same indentation level as the controlling construct.
Likewise, when setting out a parenthesized list over multiple lines, put the opening parenthesis at the end of the controlling expression; arrange the list elements on the subsequent lines, indented by one level; and place the closing parenthesis on its own line, outdenting it back to the level of the controlling expression. For example:

            

    my @names = (
        'Damian',    # Primary key
        'Matthew',   # Disambiguator
        'Conway',    # General class or category
    );

    for my $name (@names) {
        for my $word ( anagrams_of(lc $name) ) {
            print "$word\n";
        }
    }

Don't place the opening brace or parenthesis on a separate line, as is common under the BSD and GNU styles of bracketing:

    # Don't use BSD style...
    my @names =
    (
        'Damian',    # Primary key
        'Matthew',   # Disambiguator
        'Conway',    # General class or category
    );

    for my $name (@names)
    {
        for my $word (anagrams_of(lc $name))
        {
            print "$word\n";
        }
    }

    # And don't use GNU style either...

    for my $name (@names)
      {
        for my $word (anagrams_of(lc $name))
          {
            print "$word\n";
          }
      }

The K&R style has one obvious advantage over the other two styles: it requires one fewer line per block, which means one more line of actual code will be visible at any time on your screen. If you're looking at a series of blocks, that might add up to three or four extra code lines per screen.
Keywords
Separate your control keywords from the following opening bracket.
Control structures regulate the dynamic behaviour of a program, so the keywords of control structures are amongst the most critical components of a program. That's why it's important that those keywords stand out clearly in the source code.
In Perl, most control structure keywords are immediately followed by an opening parenthesis, which can make it easy to confuse them with subroutine calls. It's important to distinguish the two. To do this, use a single space between a keyword and the following brace or parenthesis:

            

    for my $result (@results) {
        print_sep();
        print $result;
    }

    while ($min < $max) {
        my $try = ($max - $min) / 2;
        if ($value[$try] < $target) {
            $max = $try;
        }
        else {
            $min = $try;
        }
    }

Without the intervening space, it's harder to pick out the keyword, and easier to mistake it for the start of a subroutine call:

    for(@results) {
        print_sep();
        print;
    }

    while($min < $max) {
        my $try = ($max - $min) / 2;
        if($value[$try] < $target) {
            $max = $try;
        }
        else{
            $min = $try;
        }
    }

Subroutines and Variables
Don't separate subroutine or variable names from the following opening bracket.
In order for the previous rule to work properly, it's important that subroutines and variables not have a space between their names and any following brackets. Otherwise, it's too easy to mistake a subroutine call for a control structure, or misread the initial part of an array element as an independent scalar variable.
So cuddle subroutine calls and variable names against their trailing parentheses or braces:

            

    my @candidates = get_candidates($marker);

    CANDIDATE:
    for my $i (0..$#candidates) {
        next CANDIDATE if open_region($i);

        $candidates[$i]
            = $incumbent{ $candidates[$i]{region} };
    }

Spacing them out only makes them harder to recognize:

    my @candidates = get_candidates ($marker);

    CANDIDATE:
    for my $i (0..$#candidates) {
        next CANDIDATE if open_region ($i);

        $candidates [$i]
            = $incumbent {$candidates [$i] {region}};
    }

Builtins
Don't use unnecessary parentheses for builtins and "honorary" builtins.
Perl's many built-in functions are effectively keywords of the language, so they can legitimately be called without parentheses, except where it's necessary to enforce precedence.
Calling builtins without parentheses reduces clutter in your code, and thereby enhances readability. The lack of parentheses also helps to visually distinguish between subroutine calls and calls to builtins:

            

    while (my $record = <$results_file>) {
        chomp $record;
        my ($name, $votes) = split "\t", $record;
        print 'Votes for ',
              substr($name, 0, 10),       # Parens needed for precedence
              ": $votes (verified)\n";
    }

Certain imported subroutines, usually from modules in the core distribution, also qualify as "honorary" builtins, and may be called without parentheses. Typically these will be subroutines that provide functionality that ought to be in the language itself but isn't. Examples include carp and croak (from the standard Carp module—see Chapter 13), first and max (from the standard List::Util module—see Chapter 8), and prompt (from the IO::Prompt CPAN module—see Chapter 10).
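A minimal sketch of how such "honorary" builtins read without parentheses follows. It uses only croak from Carp and first and max from List::Util; the vote data and variable names are invented for this example.

```perl
use strict;
use warnings;
use Carp       qw( croak );
use List::Util qw( first max );

# Sample vote counts, invented for this sketch.
my %votes_for = ( Smith => 1042, Jones => 977, Nguyen => 1310 );

# max and first are called without parentheses, just like builtins.
my $highest = max values %votes_for;
my $winner  = first { $votes_for{$_} == $highest } sort keys %votes_for;

# croak likewise reads naturally without parentheses.
croak 'No winner found' if !defined $winner;

print "$winner won with $highest votes\n";
```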
Note, however, that whenever you do need parentheses with a builtin, they should follow the rules for subroutines, not those for control keywords. That is, treat the builtin as a subroutine, with no space between its name and the opening parenthesis:

            

    while (my $record = <$results_file>) {
        chomp( $record );
        my ($name, $votes) = split("\t", $record);
        print(
            'Votes for ',
            substr($name, 0, 10),
            ": $votes (verified)\n"
        );
    }
Keys and Indices
Separate complex keys or indices from their surrounding brackets.
When accessing elements of nested data structures (hashes of hashes of arrays of whatever), it's easy to produce a long, complex, and visually dense expression, such as:

    $candidates[$i] = $incumbent{$candidates[$i]{get_region()}};
That's especially true when one or more of the indices are themselves indexed variables. Squashing everything together without any spacing doesn't help the readability of such expressions. In particular, it can be difficult to detect whether a given pair of brackets is part of the inner or outer index.
Unless an index is a simple constant or scalar variable, it's much clearer to put spaces between the indexing expression and its surrounding brackets:

            

    $candidates[$i] = $incumbent{ $candidates[$i]{ get_region() } };

         
Note that the determining factors here are both the complexity and the overall length of the index. Occasionally, "spacing-out" an index makes sense even if that index is just a single constant or scalar. For example, if that simple index is unusually long, it's better written as:

            

    print $incumbent{ $largest_gerrymandered_constituency };

         
rather than:

    print $incumbent{$largest_gerrymandered_constituency};

Operators
Use whitespace to help binary operators stand out from their operands.
Long expressions can be hard enough to comprehend without adding to their complexity by jamming their various components together:

    my $displacement=$initial_velocity*$time+0.5*$acceleration*$time**2;

    my $price=$coupon_paid*$exp_rate+(($face_val+$coupon_paid)*$exp_rate**2);
Give your binary operators room to breathe, even if it requires an extra line to do so:

            

    my $displacement
        = $initial_velocity * $time  +  0.5 * $acceleration * $time**2;

    my $price
        = $coupon_paid * $exp_rate  +  ($face_val + $coupon_paid) * $exp_rate**2;

Choose the amount of whitespace according to the precedence of the operators, to help the reader's eyes pick out the natural groupings within the expression. For example, you might put additional spaces on either side of the lower-precedence + to visually reinforce the higher precedence of the two multiplicative subexpressions surrounding it. On the other hand, it's quite appropriate to sandwich the ** operator tightly between its operands, given its very high precedence and its longer, more easily identified symbol.
A single space is always sufficient whenever you're also using parentheses to emphasize (or to vary) precedence:

            

    my $velocity
        = $initial_velocity + ($acceleration * ($time + $delta_time));

    my $future_price
        = $current_price * exp($rate - $dividend_rate_on_index) * ($delivery - $now);

Symbolic unary operators should always be kept with their operands:

            

    my $spring_force = !$hyperextended ? -$spring_constant * $extension : 0;

    my $payoff = max(0, -$asset_price_at_maturity + $strike_price);

Named unary operators should be treated like builtins, and spaced from their operands appropriately:

            

    # Parentheses required: named unary operators bind more loosely than /
    my $tan_theta = sin($theta) / cos($theta);

    my $forward_differential_1_year = $delivery_price * exp -$interest_rate;
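One caveat applies when named unary operators are written without parentheses: according to perlop, they have lower precedence than the arithmetic operators, so an unparenthesized operand can swallow more of the expression than intended. The sketch below, with variable names invented for this example, shows the effect on a quotient of sin and cos.

```perl
use strict;
use warnings;

my $theta = 1;

# Named unary operators such as sin and cos bind more loosely than / and *,
# so without parentheses the whole quotient becomes the argument of sin.
my $unparenthesized = sin $theta / cos $theta;      # sin( $theta / cos($theta) )
my $parenthesized   = sin($theta) / cos($theta);    # the actual tangent

printf "%.4f vs %.4f\n", $unparenthesized, $parenthesized;
```

A trailing operand, as in exp -$interest_rate, is safe because nothing follows it that the operator could absorb.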
Semicolons
Place a semicolon after every statement.
In Perl, semicolons are statement separators, not statement terminators, so a semicolon isn't required after the very last statement in a block. Put one in anyway, even if there's only one statement in the block:

            

    while (my $line = <>) {
        chomp $line;

        if ( $line =~ s{\A (\s*) -- (.*)}{$1#$2}xms ) {
            push @comments, $2;
        }

        print $line;
    }

The extra effort to do this is negligible, and that final semicolon confers two very important advantages. It signals to the reader that the preceding statement is finished, and (perhaps more importantly) it signals to the compiler that the statement is finished. Telling the compiler is more important than telling the reader, because the reader can often work out what you really meant, whereas the compiler reads only what you actually wrote.
Leaving out the final semicolon usually works fine when the code is first written (i.e., when you're still paying proper attention to the entire piece of code):

    while (my $line = <>) {
        chomp $line;

        if ( $line =~ s{\A (\s*) -- (.*)}{$1#$2}xms ) {
            push @comments, $2
        }

        print $line
    }
But, without the semicolons, there's nothing to prevent later additions to the code from causing subtle problems:

    while (my $line = <>) {
        chomp $line;

        if ( $line =~ s{\A (\s*) -- (.*)}{$1#$2}xms ) {
            push @comments, $2
            /shift/mix
        }

        print $line
        $src_len += length;
    }
The problem is that those two additions don't actually add new statements; they just absorb the existing ones. So the previous code actually means:

    while (my $line = <>) {
        chomp $line;

        if ( $line =~ s{\A (\s*) -- (.*)}{$1#$2}xms ) {
            push @comments, $2 / shift() / mix()
        }

        print $line ($src_len += length);
    }
Commas
Place a comma after every value in a multiline list.
Just as semicolons act as separators in a block of statements, commas act as separators in a list of values. That means that exactly the same arguments apply in favour of treating them as terminators instead.
Adding an extra trailing comma (which is perfectly legal in any Perl list) also makes it much easier to reorder the elements of the list. For example, it's much easier to convert:

            

    my @dwarves = (
        'Happy',
        'Sleepy',
        'Dopey',
        'Sneezy',
        'Grumpy',
        'Bashful',
        'Doc',
    );

to:

            

    my @dwarves = (
        'Bashful',
        'Doc',
        'Dopey',
        'Grumpy',
        'Happy',
        'Sleepy',
        'Sneezy',
    );

You can manually cut and paste lines or even feed the list contents through sort.
Without that trailing comma after 'Doc', reordering the list would introduce a bug:

    my @dwarves = (
        'Bashful',
        'Doc'
        'Dopey',
        'Grumpy',
        'Happy',
        'Sleepy',
        'Sneezy',
    );
Of course, that's a trivial mistake to find and fix, but why not adopt a coding style that eliminates the very possibility of such problems?
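If you want to convince yourself that the trailing comma is harmless, a quick check along the following lines may help. It is only a sketch: it reuses the list from the example above and does the reordering with Perl's own sort rather than an editor command.

```perl
use strict;
use warnings;

my @dwarves = (
    'Happy',
    'Sleepy',
    'Dopey',
    'Sneezy',
    'Grumpy',
    'Bashful',
    'Doc',
);

# The trailing comma does not create an extra (empty) element.
print scalar(@dwarves), " dwarves\n";

# Reordering in code rather than by hand: sort returns a new list.
my @sorted_dwarves = sort @dwarves;
print "$sorted_dwarves[0] ... $sorted_dwarves[-1]\n";
```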
Line Lengths
Use 78-column lines.
In these modern days of high-resolution 30-inch screens, anti-aliased fonts, and laser eyesight correction, it's entirely possible to program in a terminal window that's 300 columns wide.
Please don't.
Given the limitations of printed documents, legacy VGA display devices, presentation software, and unreconstructed managerial optics, it isn't reasonable to format code to a width greater than 80 columns. And even an 80-column line width is not always safe, given the text-wrapping characteristics of some terminals, editors, and mail systems.
Setting your right margin at 78 columns maximizes the usable width of each code line whilst ensuring that those lines appear consistently on the vast majority of display devices.
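Editor settings only help with code you are typing now. For code that already exists, a small checker along these lines can report the offending lines. This is a sketch, not part of the book: the subroutine name overlong_lines is invented here, and it counts characters, so a tab counts as one column.

```perl
use strict;
use warnings;

my $MAX_COLUMNS = 78;

# Return the 1-based numbers of the lines that exceed the limit.
sub overlong_lines {
    my ($max_columns, @lines) = @_;

    my @offenders;
    for my $index (0..$#lines) {
        my $line = $lines[$index];
        chomp $line;
        push @offenders, $index + 1 if length($line) > $max_columns;
    }

    return @offenders;
}

# Sample input: a 78-column line, a 79-column line, and a short line.
my @sample = ( 'x' x 78, 'x' x 79, "short line\n" );
print join(', ', overlong_lines($MAX_COLUMNS, @sample)), "\n";
```

To check a real file, read its lines into an array and pass them in place of @sample.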
In vi, you can set your right margin appropriately by adding:

            

    set textwidth=78

         
to your configuration file. For Emacs, use:

            

    (setq-default fill-column 78)
    (add-hook 'prog-mode-hook 'turn-on-auto-fill)

         
Another advantage of this particular line width is that it ensures that any code fragment sent via email can be quoted at least once without wrapping:

            

    From: boss@headquarters
    To: you@saltmines
    Subject: Please explain

    I came across this chunk of code in your latest module.
    Is this your idea of a joke???

    > $;=$/;seek+DATA,undef$/,!$s;$_=<DATA>;$s&&print||(*{q;::\;
    > ;}=sub{$d=$d-1?$d:$0;s;';\t#$d#;,$_})&&$g&&do{$y=($x||=20)*($y||8);sub
    > i{sleep&f}sub'p{print$;x$=,join$;,$b=~/.{$x}/g,$;}sub'f{pop||1}sub'n{substr($b
    > ,&f%$y,3)=~tr,O,O,}sub'g{@_[@_]=@_;--($f=&f);$m=substr($b,&f,1);($w,$w,$m,O)
    > [n($f-$x)+n($x+$f)-(${m}eq+O=>)+n$f]||$w}$w="\40";$b=join'',@ARGV?<>:$_,$w
    > x$y;$b=~s).)$&=~/\w/?O:$w)gse;substr($b,$y)=q++;$g='$i=0;$i?$b:$c=$b;
    > substr+$c,$i,1,g$i;$g=~s?\d+?($&+1)%$y?e;$i-$y+1?eval$g:do{$b=$c;p;i}';
    > sub'e{eval$g;&e};e}||eval||die+No.$;
Indentation
Use four-column indentation levels.
Indentation depth is far more controversial than line width. Ask four programmers the right number of columns per indentation level and you'll get four different answers: two-, three-, four-, or eight-column indents. You'll usually also get a heated argument.
The ancient coding masters, who first cut code on teletypes or hardware terminals with fixed tabstops, will assert that eight columns per level of indentation is the only acceptable ratio, and support that argument by pointing out that most printers and software terminals still default to eight-column tabs. Eight columns per indentation level ensures that your code looks the same everywhere:

    while (my $line = <>) {
            chomp $line;
            if ( $line =~ s{\A (\s*) -- ([^\n]*) }{$1#$2}xms ) {
                    push @comments, $2;
            }
            print $line;
    }
Yes (agree many younger hackers), eight-column indents ensure that your code looks equally ugly and unreadable everywhere! Instead, they insist on no more than two or three columns per indentation level. Smaller indents maximize the number of levels of nesting available across a fixed-width display: about a dozen levels under a two- or three-column indent, versus only four or five levels with eight-column indents. Shallower indentation also reduces the horizontal distance the eye has to track, thereby keeping indented code in the same vertical sight-line and making the context of any line of code easier to ascertain:

    while (my $line = <>) {
      chomp $line;
      if ( $line =~ s{\A (\s*) -- ([^\n]*) }{$1#$2}xms ) {
        push @comments, $2;
      }
      print $line;
    }
The problem with this approach (cry the ancient masters) is that it can make indentations impossible to detect for anyone whose eyes are older than 30, or whose vision is worse than 20/20. And that's the crux of the problem. Deep indentation enhances structural readability at the expense of contextual readability; shallow indentation, vice versa. Neither is ideal.
Tabs
Indent with spaces, not tabs.
Tabs are a bad choice for indenting code, even if you set your editor's tabspacing to four columns. Tabs do not appear the same when printed on different output devices, or pasted into a word-processor document, or even just viewed in someone else's differently tabspaced editor. So don't use tabs alone or (worse still) intermix tabs with spaces:

    sub addarray_internal {
    »   my ($var_name, $need_quotemeta) = @_;

    »   $raw .= $var_name;

    »   my $quotemeta = $need_quotemeta ? q{ map {quotemeta $_} }
    »   »   »   »   » :                   $EMPTY_STR
    »   ··············;

    ····my $perl5pat
    ····»   = qq{(??{join q{|}, $quotemeta \@{$var_name}})};

    »   push @perl5pats, $perl5pat;

    »   return;
    }
The only reliable, repeatable, transportable way to ensure that indentation remains consistent across viewing environments is to indent your code using only spaces. And, in keeping with the previous rule on indentation depth, that means using four space characters per indentation level:

            

    sub addarray_internal {
    ····my ($var_name, $need_quotemeta) = @_;

    ····$raw .= $var_name;

    ····my $quotemeta = $need_quotemeta ? q{ map {quotemeta $_} }
    ··················:···················$EMPTY_STR
    ··················;

    ····my $perl5pat
    ········= qq{(??{join q{|}, $quotemeta \@{$var_name}})};

    ····push @perl5pats, $perl5pat;

    ····return;
    }

         
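For source files that already contain tabs, one possible clean-up is to expand them programmatically. The sketch below uses the core Text::Tabs module, which exports expand() and the $tabstop variable; the sample lines are invented, and a four-column tab stop is assumed to match the indentation rule above.

```perl
use strict;
use warnings;
use Text::Tabs;    # core module; exports expand() and $tabstop

$tabstop = 4;      # treat each tab stop as four columns

# Sample tab-indented source lines, invented for this sketch.
my @tabbed   = ( "sub example {", "\tmy (\$arg) = \@_;", "\treturn;", "}" );
my @expanded = expand(@tabbed);

print "$_\n" for @expanded;
```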
Note that this rule doesn't mean you can't use the Tab key to indent your code; only that the result of pressing that key can't actually be a tab. That's usually very easy to ensure under modern editors, most of which can easily be configured to convert tabs to spaces. For example, if you use vim, you can include the following directives in your
Blocks
Never place two statements on the same line.
If two or more statements share one line, each of them becomes harder to comprehend:

    RECORD:
    while (my $record = <$inventory_file>) {
        chomp $record; next RECORD if $record eq $EMPTY_STR;
        my @fields = split $FIELD_SEPARATOR, $record; update_sales(\@fields);$count++;
    }
You're already saving vertical space by using K&R bracketing; use that space to improve the code's readability, by giving each statement its own line:

            

    RECORD:
    while (my $record = <$inventory_file>) {
        chomp $record;
        next RECORD if $record eq $EMPTY_STR;
        my @fields = split $FIELD_SEPARATOR, $record;
        update_sales(\@fields);
        $count++;
    }

Note that this guideline applies even to map and grep blocks that contain more than one statement. You should write:

            

    my @clean_words

        = map {

              my $word = $_;

              $word =~ s/$EXPLETIVE/[DELETED]/gxms;

              $word;

          } @raw_words;

         
not:

    my @clean_words

        = map { my $word = $_; $word =~ s/$EXPLETIVE/[DELETED]/gxms; $word } @raw_words;
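For readers who want to run the preferred layout, here is a minimal, self-contained sketch. The $EXPLETIVE pattern and the contents of @raw_words are made up for the purpose; the original text does not define them.

```perl
use strict;
use warnings;

# Hypothetical pattern; the original text does not define $EXPLETIVE
my $EXPLETIVE = qr/\b (?: darn | heck ) \b/ixms;

my @raw_words = ('well', 'darn', 'that', 'Heck', 'hurt');

# One statement per line, even inside the map block
my @clean_words
    = map {
          my $word = $_;
          $word =~ s/$EXPLETIVE/[DELETED]/gxms;
          $word;
      } @raw_words;

print "@clean_words\n";
```

Because the block works on a copy of $_, the elements of @raw_words are left untouched.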
End of preview. The rest of this section is not available here.
Chunking
Code in paragraphs.
A paragraph is a collection of statements that accomplish a single task: in literature, it's a series of sentences conveying a single idea; in programming, it's a series of instructions implementing a single step of an algorithm.
Break each piece of code into sequences that achieve a single task, placing a single empty line between each sequence. To further improve the maintainability of the code, place a one-line comment at the start of each such paragraph, describing what the sequence of statements does. Like so:

            

               

    # Process an array that has been recognized...
    sub addarray_internal {
        my ($var_name, $needs_quotemeta) = @_;

        # Cache the original...
        $raw .= $var_name;

        # Build meta-quoting code, if requested...
        my $quotemeta = $needs_quotemeta ? q{map {quotemeta $_} } : $EMPTY_STR;

        # Expand elements of variable, conjoin with ORs...
        my $perl5pat = qq{(??{join q{|}, $quotemeta \@{$var_name}})};

        # Insert debugging code if requested...
        my $type = $quotemeta ? 'literal' : 'pattern';
        debug_now("Adding $var_name (as $type)");
        add_debug_mesg("Trying $var_name (as $type)");

        return $perl5pat;
    }

         
Paragraphs are useful because humans can focus on only a few pieces of information at once. Paragraphs are one way of aggregating small amounts of related information, so that the resulting "chunk" can fit into a single slot of the reader's limited short-term memory. Paragraphs enable the physical structure of a piece of writing to reflect and emphasize its logical structure. Adding comments at the start of each paragraph further enhances the chunking by explicitly summarizing the purpose of each chunk.
End of preview. The rest of this section is not available here.
Elses
Don't cuddle an else.
A "cuddled" else looks like this:

    } else {

An uncuddled else looks like this:

            

    }

    else {

         
Cuddling saves an additional line per alternative, but ultimately it works against the readability of code in other ways, especially when that code is formatted using K&R bracketing. A cuddled else keyword is no longer in vertical alignment with its controlling if, nor with its own closing bracket. This misalignment makes it harder to visually match up the various components of an if-else construct.
More importantly, the whole point of an else is to distinguish an alternate course of action. But cuddling the else makes that distinction less distinct. For a start, it removes the near-empty line provided by the closing brace of the preceding if, which reduces the visual gap between the if and else blocks. Squashing the two blocks together in that way undermines the paragraphing inside the two blocks (see the previous guideline, "Chunking"), especially if the contents of the blocks are themselves properly paragraphed with empty lines between chunks.
Cuddling also moves the else from the leftmost position on its line, which means that the keyword is harder to locate when you are scanning down the code. On the other hand, an uncuddled else improves both the vertical separation of your code and the identifiability of the keyword:

            

    if ($sigil eq '$') {

        if ($subsigil eq '?') {

            $sym_table{ substr($var_name,2) } = delete $sym_table{$var_name};



            $internal_count++;

            $has_internal{$var_name}++;

        }

        else {

            ${$var_ref} = q{$sym_table{$var_name}};



            $external_count++;

            $has_external{$var_name}++;

        }

    }

    elsif ($sigil eq '@' && $subsigil eq '?') {

        @{ $sym_table{$var_name} }

            = grep {defined $_} @{$sym_table{$var_name}};

    }

    elsif ($sigil eq '%' && $subsigil eq '?') {

        delete $sym_table{$var_name}{$EMPTY_STR};

    }

    else {

        ${$var_ref} = q{$sym_table{$var_name}};

    }
End of preview. The rest of this section is not available here.
Vertical Alignment
Align corresponding items vertically.
Tables are another familiar means of chunking related information, and of using physical layout to indicate logical relationships. When setting out code, it's often useful to align data in a table-like series of columns. Consistent indentation can suggest equivalences in structure, usage, or purpose.
For example, initializers for non-scalar variables are often much more readable when laid out neatly using extra whitespace. The following array and hash initializations are very readable in tabular layout:

            

    my @months = qw(

        January   February   March

        April     May        June

        July      August     September

        October   November   December

    );



    my %expansion_of = (

        q{it's}    => q{it is},

        q{we're}   => q{we are},

        q{didn't}  => q{did not},

        q{must've} => q{must have},

        q{I'll}    => q{I will},

    );

         
Compressing them into lists saves lines, but also significantly reduces their readability:

    my @months = qw(

        January February March April May June July August September

        October November December

    );



    my %expansion_of = (

        q{it's} => q{it is}, q{we're} => q{we are}, q{didn't} => q{did not},

        q{must've} => q{must have}, q{I'll} => q{I will},

    );
Take a similar tabular approach with sequences of assignments to related variables, by aligning the assignment operators:

            

    $name   = standardize_name($name);

    $age    = time - $birth_date;

    $status = 'active';

         
rather than:

    $name = standardize_name($name);

    $age = time - $birth_date;

    $status = 'active';
Alignment is even more important when assigning to a hash entry or an array element. In such cases, the keys (or indices) should be aligned in a column, with the surrounding braces (or square brackets) also aligned. That is:
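The preview stops before the book's own example appears. As a stand-in, here is a minimal sketch of the alignment just described. The names %ident and @rank and all of the values are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical containers, used only to illustrate the layout
my (%ident, @rank);

# Keys in one column, closing braces in another, assignment operators aligned
$ident{ name   } = 'Leela';
$ident{ rank   } = 'Captain';
$ident{ serial } = '1BDI';

# The same idea applied to array indices and square brackets
$rank[  1] = 'first';
$rank[ 10] = 'tenth';
$rank[100] = 'hundredth';

print "$ident{name}: $ident{rank}\n";
```

The padding inside the braces and brackets is ignored by Perl, so it changes only how the code looks, not what it does.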
End of preview. The rest of this section is not available here.
Breaking Long Lines
Break long expressions before an operator.
When an expression at the end of a statement gets too long, it's common practice to break that expression after an operator and then continue the expression on the following line, indenting it one level. Like so:

    push @steps, $steps[-1] +

        $radial_velocity * $elapsed_time +

        $orbital_velocity * ($phase + $phase_shift) -

        $DRAG_COEFF * $altitude;
The rationale is that the operator that remains at the end of the line acts like a continuation marker, indicating that the expression continues on the following line.
Using the operator as a continuation marker seems like an excellent idea, but there's a serious problem with it: people rarely look at the right edge of code. Most of the semantic hints in a program—such as keywords—appear on the left side of that code. More importantly, the structural cues for understanding code—for example, indenting—are predominantly on the left as well (see the upcoming "Keep Left" sidebar). This means that indenting the continued lines of the expression actually gives a false impression of the underlying structure, a misperception that the eye must travel all the way to the right margin to correct.
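By contrast, breaking before each operator puts the continuation cue on the left, where the eye already is. The sketch below lays out the same expression that way. All the numeric values are invented so that the fragment can be run on its own.

```perl
use strict;
use warnings;

# Hypothetical values, chosen only so that the sketch runs
my @steps            = (100);
my $radial_velocity  = 2;
my $elapsed_time     = 3;
my $orbital_velocity = 4;
my $phase            = 1;
my $phase_shift      = 2;
my $DRAG_COEFF       = 2;
my $altitude         = 5;

# Each continuation line starts with its operator, so the left edge
# shows at a glance that the expression is still going
push @steps, $steps[-1]
             + $radial_velocity * $elapsed_time
             + $orbital_velocity * ($phase + $phase_shift)
             - $DRAG_COEFF * $altitude
             ;

print "$steps[-1]\n";
```

The computation is identical to the break-after-operator version; only the position of the operators on the page has changed.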
End of preview. The rest of this section is not available here.
Non-Terminal Expressions
Factor out long expressions in the middle of statements.
The previous guideline applies only if the long expression to be broken is the last value in a statement. If the expression appears in the middle of a statement, it is better to factor that expression out into a separate variable assignment. For example:

            

    my $next_step = $steps[-1]

                    + $radial_velocity * $elapsed_time

                    + $orbital_velocity * ($phase + $phase_shift)

                    - $DRAG_COEFF * $altitude

                    ;

    add_step( \@steps, $next_step, $elapsed_time);

         
rather than:

    add_step( \@steps, $steps[-1]

                       + $radial_velocity * $elapsed_time

                       + $orbital_velocity * ($phase + $phase_shift)

                       - $DRAG_COEFF * $altitude

                       , $elapsed_time);

End of preview. The rest of this section is not available here.
Breaking by Precedence
Always break a long expression at the operator of the lowest possible precedence.
As the examples in the previous two guidelines show, when breaking an expression across several lines, each line should be broken before a low-precedence operator. Breaking at operators of higher precedence encourages the unwary reader to misunderstand the computation that the expression performs. For example, the following layout might surreptitiously suggest that the additions and subtractions happen before the multiplications:

    push @steps, $steps[-1] + $radial_velocity

                 * $elapsed_time + $orbital_velocity

                 * ($phase + $phase_shift) - $DRAG_COEFF

                 * $altitude

                 ;

If you're forced to break on an operator of less-than-minimal precedence, indent the broken line one additional level relative to the start of the expression, like so:

            

    push @steps, $steps[-1]

                 + $radial_velocity * $elapsed_time

                 + $orbital_velocity

                     * ($phase + $phase_shift)

                 - $DRAG_COEFF * $altitude

                 ;

         
This strategy has the effect of keeping the subexpressions of the higher precedence operation visually "together".
End of preview. The rest of this section is not available here.
Assignments
Break long assignments before the assignment operator.
Often, the long statement that needs to be broken will be an assignment. The preceding rule does work in such cases, but leads to code that's unaesthetic and hard to read:

    $predicted_val = $average

                     + $predicted_change * $fudge_factor

                     ;

A better approach when breaking assignment statements is to break before the assignment operator itself, leaving only the variable being assigned to on the first line. Then indent one level, and place the assignment operator at the start of the next line—once again indicating a continued statement:

            

    $predicted_val

        = $average + $predicted_change * $fudge_factor;

         
Note that this approach often allows the entire righthand side of an assignment to be laid out on a single line, as in the preceding example. However, if the righthand expression is still too long, break it again at a low-precedence operator, as suggested in the previous guideline:

            

    $predicted_val

        = ($minimum + $maximum) / 2

          + $predicted_change * max($fudge_factor, $local_epsilon);

         
A commonly used alternative layout for broken assignments is to break after the assignment operator, like so:

    $predicted_val =

        $average + $predicted_change * $fudge_factor;
This approach suffers from the same difficulty described earlier: it's impossible to detect the line continuation without scanning all the way to the right of the code, and the "unmarked" indentation of the second line can mislead the casual reader. This problem of readability is most noticeable when the variable being assigned to is itself quite long:

    $predicted_val{$current_data_set}[$next_iteration] =

        $average + $predicted_change * $fudge_factor;
which, of course, is precisely when such an assignment would most likely need to be broken. Breaking before the assignment operator makes long assignments much easier to identify, by keeping the assignment operator visually close to the start of the variable being assigned to:
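The book's own example is cut off in this preview. As a stand-in, here is a minimal sketch of the break-before-the-operator layout applied to that long assignment target. The values are hypothetical and exist only to make the fragment runnable.

```perl
use strict;
use warnings;

# Hypothetical values, chosen only so that the sketch runs
my %predicted_val;
my $current_data_set = 'trial_a';
my $next_iteration   = 3;
my $average          = 10;
my $predicted_change = 4;
my $fudge_factor     = 0.5;

# The "=" leads the continuation line, close to the start of the target
$predicted_val{$current_data_set}[$next_iteration]
    = $average + $predicted_change * $fudge_factor;

print "$predicted_val{$current_data_set}[$next_iteration]\n";
```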
End of preview. The rest of this section is not available here.
Ternaries
Format cascaded ternary operators in columns.
One operator that is particularly prone to creating long expressions is the ternary operator. Because the ? and : of a ternary have very low precedence, a straightforward interpretation of the expression-breaking rule doesn't work well in this particular case, since it produces something like:

    my $salute = $name eq $EMPTY_STR ? 'Customer'

                 : $name =~ m/\A((?:Sir|Dame) \s+ \S+)/xms ? $1

                 : $name =~ m/(.*), \s+ Ph[.]?D \z/xms ? "Dr $1" : $name;
which is almost unreadable.
The best way to lay out a series of ternary selections is in two columns, like so:

            

               

                 # When their name is...                    Address them as...
    my $salute = $name eq $EMPTY_STR                      ? 'Customer'
               : $name =~ m/\A((?:Sir|Dame) \s+ \S+) /xms ? $1
               : $name =~ m/(.*), \s+ Ph[.]?D \z     /xms ? "Dr $1"
               :                                            $name
               ;

         
In other words, break a series of ternary operators before every colon, aligning the colons with the operator preceding the first conditional. Doing so will cause the conditional tests to form a column. Then align the question marks of the ternaries so that the various possible results of the ternary also form a column. Finally, indent the last result (which has no preceding question mark) so that it too lines up in the results column.
This special layout converts the typical impenetrably obscure ternary sequence into a simple look-up table: for a given condition in column one, use the corresponding result from column two.
You can use the tabular layout even if you have only a single ternary:

            

    my $name = defined $customer{name} ? $customer{name}

             :                           'Sir or Madam'

             ;
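To see the look-up table behave as described, the cascaded ternary can be wrapped in a small subroutine and tried on a few names. The wrapper salute_for and the sample names are hypothetical; the conditions and results are the ones from the table above.

```perl
use strict;
use warnings;

my $EMPTY_STR = q{};

# Hypothetical wrapper around the two-column ternary table shown earlier
sub salute_for {
    my ($name) = @_;

           # When their name is...                    Address them as...
    return $name eq $EMPTY_STR                      ? 'Customer'
         : $name =~ m/\A((?:Sir|Dame) \s+ \S+) /xms ? $1
         : $name =~ m/(.*), \s+ Ph[.]?D \z     /xms ? "Dr $1"
         :                                            $name
         ;
}

for my $name ($EMPTY_STR, 'Dame Judi Dench', 'Jane Smith, PhD', 'Alice') {
    my $salute = salute_for($name);
    print "$salute\n";
}
```

Each row is tried in order, and the first condition that holds selects the result beside it, which is why the final row needs no condition at all.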
End of preview. The rest of this section is not available here.
Lists
Parenthesize long lists.
The comma operator is really an operator only in scalar contexts. In lists, the comma is an item separator. Consequently, commas in multiline lists are best treated as item terminators. Moreover, multiline lists are particularly easy to confuse with a series of statements, as there is very little visual difference between a , and a ;.
Given the potential for confusion, it's important to clearly mark a multiline list as being a list. So, if you need to break a list across multiple lines, place the entire list in parentheses. The presence of an opening parenthesis highlights the fact that the subsequent expressions form a list, and the closing parenthesis makes it immediately apparent that the list is complete.
When laying out a statement containing a multiline list, place the opening parenthesis on the same line as the preceding portion of the statement. Then break the list after every comma, placing the same number of list elements on each separate line and indenting those lines one level deeper than the surrounding statement. Finally, outdent the closing parenthesis back to the same level as the statement. Like so:

            

    my @months = qw(

        January   February   March

        April     May        June

        July      August     September

        October   November   December

    );



    for my $item (@requested_items) {

        push @items, (

            "A brand new $item",

            "A fully refurbished $item",

            "A ratty old $item",

        );

    }



    print (

        'Processing ',

        scalar(@items),

        ' items at ',

        time,

        "\n",

    );

         
Note that the final item in the list should still have a comma, even though it isn't required syntactically.
When writing multiline lists, always use parentheses (with K&R-style bracketing), keep to the same number of items on each line, and remember that in list contexts a comma isn't an operator, so the "break-before-an-operator rule" doesn't apply. In other words, not like this:
End of preview. The rest of this section is not available here.
Automated Layout
Enforce your chosen layout style mechanically.
In the long term, it's best to train yourself and your team to code in a consistent, rational, and readable style such as the one suggested earlier. However, the time and commitment necessary to accomplish that isn't always available. In such cases, a reasonable compromise is to prescribe a standard code-formatting tool that must be applied to all code before it's committed, reviewed, or otherwise displayed in public.
There is now an excellent code formatter available for Perl: perltidy. It's freely available from SourceForge at http://perltidy.sourceforge.net and provides an extensive range of user-configurable options for indenting, block delimiter positioning, column-like alignment, and comment positioning.
Using perltidy, you can convert code like this:

    if($sigil eq '$'){

        if($subsigil eq '?'){

            $sym_table{substr($var_name,2)}=delete $sym_table{locate_orig_var($var)};

            $internal_count++;$has_internal{$var_name}++

        } else {

            ${$var_ref} =

                q{$sym_table{$var_name}}; $external_count++; $has_external{$var_name}++;

    }} elsif ($sigil eq '@'&&$subsigil eq '?') {

        @{$sym_table{$var_name}} = grep

            {defined $_} @{$sym_table{$var_name}};

    } elsif ($sigil eq '%' && $subsigil eq '?') {

    delete $sym_table{$var_name}{$EMPTY_STR}; } else

    {

    ${$var_ref}

    =

    q{$sym_table{$var_name}}

    }

into something readable:

            

    if ( $sigil eq '$' ) {

        if ( $subsigil eq '?' ) {

            $sym_table{ substr( $var_name, 2 ) }

                = delete $sym_table{ locate_orig_var($var) };

            $internal_count++;

            $has_internal{$var_name}++;

        }

        else {

            ${$var_ref} = q{$sym_table{$var_name}};

            $external_count++;

            $has_external{$var_name}++;

        }

    }

    elsif ( $sigil eq '@' && $subsigil eq '?' ) {

        @{ $sym_table{$var_name} }

            = grep {defined $_} @{ $sym_table{$var_name} };

    }

    elsif ( $sigil eq '%' && $subsigil eq '?' ) {

        delete $sym_table{$var_name}{$EMPTY_STR};

    }

    else {

        ${$var_ref} = q{$sym_table{$var_name}};

    }
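A configuration along the following lines might be a starting point for such a style. It is only a sketch of a .perltidyrc file, not the book's own settings: it covers just the four-column indentation and uncuddled elses discussed in this chapter, and the choice of four columns for continuation lines is an assumption.

```
# Four columns per indentation level
-i=4

# Four columns of extra indentation for continuation lines (an assumption)
-ci=4

# Do not cuddle else (this is perltidy's default, stated here explicitly)
-nce
```

Placing the file in a project's top-level directory, or in your home directory, lets everyone on the team run perltidy with the same options.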
End of preview. The rest of this section is not available here.
Chapter 3: Naming Conventions
Names are but noise and smoke,
Obscuring heavenly light
—Johann Wolfgang von Goethe
Faust: Part I
Consistent and coherent code layout is vital, because it determines what the reader of your code sees. But naming conventions are even more important, because they determine how the reader thinks about your program.
Well-chosen identifier names convey to the reader the meaning of the data in variables, the behaviour and results of subroutines, and the features and purpose of classes and other data types. They can help to make the data structures and algorithms used in a program explicit and unambiguous. They can also function as a reliable form of documentation, and as a powerful debugging aid.
Best practice in naming consists of finding a consistent way of assigning identifiers to variables, subroutines, and types. There are two principal components of this method: syntactic consistency and semantic consistency.
Syntactic consistency means that all identifiers should conform to a predictable and recognizable grammatical structure. That is, you should not name one variable $max_velocity and then name another $displacementMax, or $mxdsp, or $Xmaximal. In other words, if one variable name has an adjective_noun structure, all variable names should be adjective_noun. Similarly, if one variable uses underscores to separate components of the name, then others shouldn't omit similar separators elsewhere, or use interCapStyle instead. Your approach to abbreviation—both what to abbreviate and how to abbreviate it—has to be consistent as well.
Semantic consistency means that the names you choose should clearly and accurately reflect the purpose, usage, and significance of whatever you're naming. In other words, a name like @data is a poor choice (compared to, say, @sales_records) because it fails to tell the reader anything important about the contents of the array or their significance in your program. Likewise, naming an indexing variable $i or $n doesn't serve to make the meaning of
End of preview. The rest of this section is not available here.
Identifiers
Use grammatical templates when forming identifiers.
The single most important practice when creating names is to devise a set of grammar rules to which all names must conform. A grammar rule specifies one or more templates (e.g., Noun :: Adjective :: Adjective) that describe how to form the entity on the left of the arrow (e.g., namespace). Placeholders in templates, such as Noun and Adjective, are replaced by the corresponding parts of speech: nouns like "Disk" and adjectives like "Audio". For a Perlish introduction to the concepts of grammars, see the tutorial.html file that accompanies the Parse::RecDescent CPAN module.
Develop a set of "name templates" for your packages' subroutines and variables, learn them by heart, and use them consistently. This practice will ensure that you always generate names that have a standard internal structure.
A suitable grammar rule for naming packages and classes is:

            

               

    namespace  →  Noun :: Adjective :: Adjective
               |  Noun :: Adjective
               |  Noun

               

            

         
This rule might produce package names such as:

            

    package Disk;

    package Disk::Audio;

    package Disk::DVD;

    package Disk::DVD::Rewritable;
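One way such a family of names might map onto code is sketched below. It assumes that each more specific namespace is implemented as a subclass of the more general one. The text does not say so, and the describe method is invented purely for illustration.

```perl
use strict;
use warnings;

# Assumption: each longer name is a subclass of the shorter name it extends
{
    package Disk;

    sub new {
        my ($class) = @_;
        return bless {}, $class;
    }

    sub describe {
        return 'generic disk';
    }
}

{
    package Disk::DVD;
    our @ISA = ('Disk');

    sub describe {
        return 'DVD';
    }
}

{
    package Disk::DVD::Rewritable;
    our @ISA = ('Disk::DVD');

    sub describe {
        return 'rewritable DVD';
    }
}

my $disk = Disk::DVD::Rewritable->new();
print $disk->describe(), "\n";
```

Under that assumption, reading a name from left to right walks from the most general class to the most specific one.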

         
In this scheme, specialized versions of an existing namespace are named by adding adjectives to the name of the more general namespace. Hence you would expect that
End of preview. The rest of this section is not available here.
Booleans
Name booleans after their associated test.
A special case can be made for subroutines that return boolean values, and for variables that store them. These should be named for the properties or predicates they test, in such a way that the resulting conditional expressions read naturally. Often that rule will mean they begin with is_ or has_, but not always. For example:

            

    sub is_valid;
    sub metadata_available_for;
    sub has_end_tag;

    my $loading_finished;
    my $has_found_bad_record;

    # and later...

    if (is_valid($next_record) && !$loading_finished) {
        METADATA:
        while (metadata_available_for($next_record)) {
            push @metadata, get_metadata_for($next_record);
            last METADATA if has_end_tag($next_record);
        }
    }
    else {
        $has_found_bad_record = 1;
    }

         
Again, explicit and longer names are strongly preferred. Compare the readability of the previous code with the following:

    sub ok;

    sub metadata;

    sub end_tag;



    my $done;

    my $bad;



    # and later...



    if (ok($next_record) && !$done) {               # Ok in what sense? What is done?

        METADATA:

        while (metadata($next_record)) {            # Metadata exists? Defined? True?

            push @metadata, get_metadata_for($next_record);

            last METADATA if end_tag($next_record); # Does this set an end tag?

        }

    }

    else {

        $bad = 1;                                   # What's bad? In what way?

    }
End of preview. The rest of this section is not available here.
Reference Variables
Mark variables that store references with a _ref suffix.
In Perl, you can't give a variable a specific type to ensure that it's able to store only particular kinds of values (integer, string, reference, and so on). That's usually not a problem, because Perl's automatic type conversions paper over most of the cracks very neatly.
Except when it comes to references.
It's an all-too-common mistake to put a reference into a scalar, and then subsequently forget to use the all-important dereferencing arrow:

    sub pad_str {

        my ($text, $opts) = @_;



        my $gap   = $opts{cols} - length $text;        # Oops! Should be: $opts->{cols}

        my $left  = $opts{centred} ? int($gap/2) : 0;  # Should be: $opts->{centred}

        my $right = $gap - $left;



        return $SPACE x $left . $text . $SPACE x $right;

    }
Of course, use strict qw( vars ) (see Chapter 18) is supposed to pick up precisely this transgression. And it usually will. Unless, of course, there also happens to be a valid %opts hash in the same scope.
You can minimize the chances of making this mistake in the first place by always appending the suffix _ref to any variable that is supposed to store a reference. Of course, naming reference variables this way doesn't prevent this particular mistake, or even catch it for you when you do. But it does make the error much more visually obvious:

            

    sub pad_str {

        my ($text, $opts_ref) = @_;



        my $gap   = $opts_ref{cols} - length $text;

        my $left  = $opts_ref{centred} ? int($gap/2) : 0;

        my $right = $gap - $left;



        return $SPACE x $left . $text . $SPACE x $right;

    }

         
If you adopt this coding practice, your eyes will soon come to expect an arrow after any occurrence of _ref, and the absence of such a dereferencer will become glaringly obvious.
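For comparison, here is a runnable sketch of the same subroutine with the dereferencing arrows in place. The definition of $SPACE and the sample call are assumptions added so that the fragment stands alone.

```perl
use strict;
use warnings;

# Assumed definition; the original text uses $SPACE without showing it
my $SPACE = q{ };

sub pad_str {
    my ($text, $opts_ref) = @_;

    # Every _ref variable is followed by an arrow before its subscript
    my $gap   = $opts_ref->{cols} - length $text;
    my $left  = $opts_ref->{centred} ? int($gap/2) : 0;
    my $right = $gap - $left;

    return $SPACE x $left . $text . $SPACE x $right;
}

print '[', pad_str('Perl', { cols => 10, centred => 1 }), "]\n";
```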
You could also write a very short Perl script to detect and correct such mistakes:
End of preview. The rest of this section is not available here.
Arrays and Hashes
Name arrays in the plural and hashes in the singular.
A hash is a mapping from distinct keys to individual values, and is most commonly used as a random-access look-up table. On the other hand, arrays are usually ordered sequences of multiple values, and are most commonly processed collectively or iteratively.
Because hash entries are typically accessed individually, it makes sense for the hash itself to be named in the singular. That convention causes the individual accesses to read more naturally in the code. Moreover, because hashes often store a property that's related to their key, it's often even more readable to name a hash with a singular noun followed by a preposition. For example:

            

    my %option;
    my %title_of;
    my %count_for;
    my %is_available;

    # and later...

    if ($option{'count_all'} && $title_of{$next_book} =~ m/$target/xms) {
        $count_for{$next_book}++;
        $is_available{$next_book} = 1;
    }

         
On the other hand, array values are more often processed collectively, in loops or in map or grep operations. So it makes sense to name them in the plural, after the group of items they store:

            

    my @events;
    my @handlers;
    my @unknowns;

    # and later...

    for my $event (@events) {
        push @unknowns, grep { ! $_->handle($event) } @handlers;
    }

    print map { $_->err_msg } @unknowns;

         
If, however, an array is to be used as a random-access look-up table, name it in the singular, using the same conventions as for a hash:

            

               

    # Initialize table of factorials
    my @factorial = (1);
    for my $n (1..$MAX_FACT) {
        $factorial[$n] = $n * $factorial[$n-1];
    }

    # Check availability and look up in table
End of preview. The rest of this section is not available here.
Underscores
Use underscores to separate words in multiword identifiers.
In English, when a name consists of two or more words, those words are typically separated by spaces or hyphens—for example, "input stream", "key pressed", "end-of-file", "double-click".
Since neither spaces nor hyphens are valid characters in Perl identifiers, use the next closest available alternative: the underscore. Underscores correspond better to the default natural-language word separator (a space) because they impose a visual gap between the words in an identifier. For example:

            

    FORM:

    for my $tax_form (@tax_form_sequence) {

        my $notional_tax_paid

            = $tax_form->{reported_income} * $tax_form->{effective_tax_rate};



        next FORM if $notional_tax_paid  < $MIN_ASSESSABLE;



        $total_paid

            += $notional_tax_paid - $tax_form->{allowed_deductions};

    }

         
TheAlternativeInterCapsApproachIsHarderToReadAndInParticularDoesn'tGeneralizeWellToALLCAPSCONSTANTS:

    FORM:

    for my $taxForm (@taxFormSequence) {

        my $notionalTaxPaid

            = $taxForm->{reportedIncome} * $taxForm->{effectiveTaxRate};



        next FORM if $notionalTaxPaid  < $MINASSESSABLE;



        $totalPaid

            += $notionalTaxPaid - $taxForm->{allowedDeductions};

    }
End of preview. The rest of this section is not available here.
Capitalization
Distinguish different program components by case.
In a Perl program, an identifier might refer to a variable, a subroutine, a class or package name, an I/O stream, a format, or a typeglob. More importantly, sometimes the same identifier can refer to two or more of those in the same scope:

    # Print command line files, prefixing each line with the filename...

    if (@ARGV) {

        while (my $line = <ARGV>) {

            print "$ARGV: $line";

        }

    }
To help make it clear what kind of referent an identifier is naming:
  • Use lowercase only for the names of subroutines, methods, variables, and labeled arguments ($controller, new(), src=>$fh).
  • Use mixed-case for package and class names (IO::Controller).
  • Use uppercase for constants ($SRC, $NODE).
For example:

            

    my $controller

        = IO::Controller->new(src=>$fh,  mode=>$SRC|$NODE);

         
These case distinctions can then serve as useful clues to the purpose and role of each identifier, with visual differences reinforcing semantic differences. In contrast, it's much harder to distinguish between the variables, constants, methods, and classes in any of the following:

    my $controller
        = io::controller->new(src=>$fh, mode=>$src|$node);

    my $Controller
        = Io::Controller->New(Src=>$Fh, Mode=>$Src|$Node);

    my $CONTROLLER
        = IO::CONTROLLER->NEW(SRC=>$FH, MODE=>$SRC|$NODE);
Of course, the approach suggested here is by no means the only possible set of conventions. But they are the same conventions (adapted for Perl's unique syntax) that are already applied in many languages and software libraries. In addition, they are already widely used throughout the Perl community, and therefore familiar to many programmers.
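The three conventions can be seen working together in a small, self-contained sketch. The Traffic::Light class, its methods, and the $RED and $GREEN values are hypothetical names invented purely for illustration, and plain lexicals stand in for true read-only constants to keep the sketch free of dependencies:

```perl
use strict;
use warnings;

# Mixed-case package name (hypothetical class, for illustration only)
package Traffic::Light;

# Uppercase for constants (plain lexicals here, not enforced as read-only)
my $RED   = 'red';
my $GREEN = 'green';

# Lowercase for subroutines, methods, variables, and named arguments
sub new {
    my ($class, %arg_for) = @_;
    return bless { colour => $arg_for{colour} || $RED }, $class;
}

sub switch_to_green {
    my ($self) = @_;
    $self->{colour} = $GREEN;
    return $self;
}

sub colour {
    my ($self) = @_;
    return $self->{colour};
}

package main;

my $light = Traffic::Light->new(colour => 'red');
$light->switch_to_green();
print 'Light is now ', $light->colour(), "\n";
```

Even without knowing the class, a reader can tell at a glance which names are classes, which are constants, and which are ordinary variables or methods.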
End of preview. The rest of this section is not available here.
Abbreviations
Abbr idents by prefx.
If you choose to abbreviate an identifier, abbreviate it by retaining the start of each word. This generally produces much more readable names than other approaches. Fr xmpl, bbrvtng wrds by rmvng vwls cn prdc mch lss rdbl nms.
This example is easily comprehended:

            

    use List::Util qw( max );

    DESC:
    for my $desc (@orig_strs) {
        my $len = length $desc;
        next DESC if $len > $UPPER_LIM;
        $max_len = max($max_len, $len);
    }

         
This usage is not nearly as simple to decipher:

    use List::Util qw( max );

    DSCN:
    for my $dscn (@rgnl_strgs) {
        my $lngh = length $dscn;
        next DSCN if $lngh > $UPPR_LMT;
        $mx_lngh = max($mx_lngh, $lngh);
    }
Note that, when you're abbreviating identifiers by prefixing, it's acceptable—and often desirable—to keep the last consonant as well ($orig_strs, prefx(), and so on), especially if that consonant is a plural suffix.
This rule need not be applied to identifiers that are well-known standard abbreviations. In such cases, it's better to use the "native" abbreviation strategy:

            

    $ctrl_char = "\N{ESCAPE}";

    $connection_Mbps  = get_bitrate() / 1e6;

    $is_tty = -t $msg_src;

         
"Ctrl" is preferable because it appears on most keyboards, whereas $con_char could be misread as "continuation character". "Mbps" is the standard unit, and the alternative ($connection_Mbits_per_sec) is far too unwieldy. As for "tty", "src", and "msg" (or "mesg"), they're all in common use, and the alternatives—"term", "sou", or "mess"—are either ambiguous, obscure, or just plain silly.
End of preview. The rest of this section is not available here.
Ambiguous Abbreviations
Abbreviate only when the meaning remains unambiguous.
Well-chosen abbreviations can improve the readability of code by reducing the length of identifiers, which can then be recognized as a single visual chunk. Abbreviation is, in effect, a form of visual hashing.
Unfortunately, as with most other hashing schemes, abbreviation suffers from the problem of collisions. When a single abbreviation could be the shortened form of two or more common words, the few characters saved by abbreviating will be paid for over and over again in time lost deciphering the resulting code:

    $term_val      # terminal value or termination valid?
        = $temp    # temperature or temporary?
          * $dev;  # device or deviation?

         
On the other hand, abbreviating down to even a single character can occasionally be appropriate:

            

               

    # Run the standard dynamic and kinematic calculations...
    $a = $f / $m;
    $v = $u + $a*$t;
    $s = $u*$t + 0.5*$a*$t**2;

         
The standard single letter iterator variables—$i, $j, $k, $n, $x, $y, $z—are also often acceptable in nested loops, especially when the indices are coordinates of some kind:

            

    sub swap_domain_and_range_of {
        my ($table_ref) = @_;

        my @pivotted_table;
        for my $x (0..$#{$table_ref}) {
            for my $y (0..$#{$table_ref->[$x]}) {
                $pivotted_table[$y][$x] = $table_ref->[$x][$y];
            }
        }

        return \@pivotted_table;
    }

         
End of preview. The rest of this section is not available here.
Ambiguous Names
Avoid using inherently ambiguous words in names.
It's not only abbreviations that can introduce ambiguities in your identifiers. Complete words often have two or more homonyms, in which case any name containing them will be inherently ambiguous.
One of the worst offenders in this respect is the word "last". A variable named $last_record might refer to the record that was most recently processed (in which case it should be called $prev_record), or it might refer to the ultimate record in a list (in which case it should be called $final_record).
The word "set" is another major stumbling block. A subroutine named get_set() might retrieve a collection of values (in which case, call it retrieve_collection()), or it might test whether the "get" option has been enabled (in which case, call it get_is_enabled()), or it might mediate both fetch and store operations of some value (in which case, call it fetch_or_store()).
Other commonly used words to avoid include:
  • "left" (the direction vs what remains)
  • "right" (the other direction vs being correct vs an entitlement)
  • "no" (the negative vs the abbreviation for number)
  • "abstract" (theoretical vs a précis vs to summarize)
  • "contract" (make smaller vs a legal agreement)
  • "record" (an extreme outcome vs a data aggregation vs to log)
  • "second" (the ordinal position vs the unit of time)
  • "close" (near vs to shut)
  • "bases" (more than one base vs more than one basis)
Any homograph can potentially cause difficulties if it has a distinct, non-programming-related sense that is relevant to your particular problem domain.
End of preview. The rest of this section is not available here.
Utility Subroutines
Prefix "for internal use only" subroutines with an underscore.
A utility subroutine exists only to simplify the implementation of a module or class. It is never supposed to be exported from its module, nor ever to be used in client code.
Always use an underscore as the first "letter" of any utility subroutine's name. A leading underscore is ugly and unusual and reserved (by ancient C/Unix convention) for non-public components of a system. The presence of a leading underscore in a subroutine call makes it immediately obvious when part of the implementation has been mistaken for part of the interface.
For example, if you had a function fib() for computing Fibonacci numbers (like the one shown in Example 3-1), then it would be an error to call:

    print "Fibonacci($n) = ", _find_fib($n), "\n";
because _find_fib() doesn't return a useful value. You almost certainly wanted:

            

    print "Fibonacci($n) = ", fib($n), "\n";

         
By naming _find_fib() with an initial underscore, the call to it stands out far more clearly, and the misuse is brought immediately to the attention of anyone familiar with the convention.
Example 3-1. Iterative on-demand Fibonacci computations

               

                  

# Cache of previous results, minimally initialized...
my @fib_for = (1,1);

# Extend cache when needed...
sub _find_fib {
    my ($n) = @_;

    # Walk up cache from last known value, applying Fn = Fn-1 + Fn-2...
    for my $i (@fib_for..$n) {
        $fib_for[$i] = $fib_for[$i-1] + $fib_for[$i-2];
    }

    return;
}

# Return Fibonacci number N
sub fib {
    my ($n) = @_;

    # Verify argument in computable range...
    croak "Can't compute fib($n)" if $n < 0;
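
The preview cuts the listing off at this point. The following is a minimal runnable sketch of how the pair of subroutines might fit together, not the book's own text: the last two statements of fib() (extending the cache on demand, then returning the cached value) are an assumption inferred from the comments above, and `use Carp` is added because croak() needs it.

```perl
use strict;
use warnings;
use Carp;

# Cache of previous results, minimally initialized...
my @fib_for = (1, 1);

# Utility subroutine: extends the cache and returns nothing useful
sub _find_fib {
    my ($n) = @_;

    # Walk up cache from last known value, applying Fn = Fn-1 + Fn-2...
    for my $i (scalar(@fib_for) .. $n) {
        $fib_for[$i] = $fib_for[$i-1] + $fib_for[$i-2];
    }

    return;
}

# Public interface: return Fibonacci number N
sub fib {
    my ($n) = @_;

    # Verify argument in computable range...
    croak "Can't compute fib($n)" if $n < 0;

    # Assumed completion: extend the cache only when needed...
    _find_fib($n) if $n > $#fib_for;

    # ...then look the value up in the cache
    return $fib_for[$n];
}

print "Fibonacci(10) = ", fib(10), "\n";
```

Because _find_fib() ends with a bare return, calling it directly in place of fib() yields no value at all, which is exactly the misuse the leading underscore is meant to flag.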



    
End of preview. The rest of this section is not available here.
Chapter 4: Values and Expressions
Data is semi-animate...sort of like programmers.
—Arthur Norman
Constructing and using values ought to be trivial. After all, there are very few components of a Perl program simpler than a character string or a number or a + operator.
Unfortunately, the syntax of Perl's literal values is so rich that there are plenty of ways to mess them up. Variables can interpolate unexpectedly, or fail to interpolate at all. Character escape codes and literal numbers can mysteriously appear in the wrong base. Delimiters can be just about anything you like.
And Perl's operators are even worse. Several of them are polymorphic: silently changing their behaviour depending on the type of argument they're applied to. Others are monomorphic: silently changing their arguments to fit their behaviour. Others are just plain inefficient in some usages.
This chapter suggests some appropriate coding habits that can help you avoid the pitfalls associated with creating values and manipulating them in expressions.
End of preview. The rest of this section is not available here.
String Delimiters
Use interpolating string delimiters only for strings that actually interpolate.
Unexpectedly interpolating a variable in a character string is a common source of errors in Perl programs. So is unexpected non-interpolation. Fortunately, Perl provides two distinct types of strings that make it easy to specify exactly what you want.
If you're creating a literal character string and you definitely intend to interpolate one or more variables into it, use a double-quoted string:

            

    my $spam_name = "$title $first_name $surname";
    my $pay_rate  = "$minimal for maximal work";

If you're creating a literal character string and not intending to interpolate any variables into it, use a single-quoted string:

    my $spam_name = 'Dr Lawrence Mwalle';
    my $pay_rate  = '$minimal for maximal work';

If your uninterpolated string includes a literal single quote, use the q{...} form instead:

    my $spam_name = q{Dr Lawrence ('Larry') Mwalle};
    my $pay_rate  = q{'$minimal' for maximal work};

Don't use backslashes as quote delimiters; they only make it harder to distinguish the content from the container:

    my $spam_name = 'Dr Lawrence (\'Larry\') Mwalle';
    my $pay_rate  = '\'$minimal\' for maximal work';

If your uninterpolated string includes both a literal single quote and an unbalanced brace, use square brackets as delimiters instead:

    my $spam_name = q[Dr Lawrence }Larry{ Mwalle];
    my $pay_rate  = q['$minimal' for warrior's work {{:-)];

         
Reserving interpolating quoters for strings that actually do interpolate something can help you avoid unintentional interpolations, because the presence of a $ or @ in a single-quoted string then becomes a sign that something might be amiss. Likewise, once you become used to seeing double quotes only on interpolated strings, the absence of any variable in a double-quoted string becomes a warning sign. So these rules also help highlight missing intentional interpolations.
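The difference between the three quoting forms is easy to confirm with a short script. This is only an illustrative sketch; the value given to $minimal is borrowed from a later example in this chapter, and the other variable names are made up for the demonstration:

```perl
use strict;
use warnings;

my $minimal = '4.99 an hour';

# Double quotes: the variable is interpolated
my $interpolated = "$minimal for maximal work";

# Single quotes: the text is taken literally, dollar sign and all
my $literal = '$minimal for maximal work';

# q{...}: literal text that itself contains single quotes
my $quoted = q{'$minimal' for maximal work};

print "$interpolated\n";    # 4.99 an hour for maximal work
print $literal, "\n";       # $minimal for maximal work
print $quoted,  "\n";       # '$minimal' for maximal work
```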
End of preview. The rest of this section is not available here.
Empty Strings
Don't use "" or '' for an empty string.
An important exception to the preceding rules is the empty string. You can't use "", as an empty string doesn't interpolate anything. It doesn't contain a literal quote or brace either, so the previous rules call for it to be written like so:

    $error_msg = '';
But that's not a good choice. In many display fonts, it's far too easy to mistake '' (single-quote, single-quote) for " (a lone double-quote), which means that you need to apply the second rule for non-interpolated strings, and write each empty string like so, preferably with a comment highlighting it:

            

    $error_msg = q{};  # Empty string

               

            

         
Also see the "Constants" guideline later in this chapter.
End of preview. The rest of this section is not available here.
Single-Character Strings
Don't write one-character strings in visually ambiguous ways.
Character strings that consist of a single character can present a variety of problems, all of which make code harder to maintain.
A single space in quotes is easily confused with an empty string:

    $separator = ' ';
Like an empty string, it should be specified more verbosely:

            

    $separator = q{ };   # Single space

Literal tabs are even worse (and not just in single-character strings):

    $separator  = ' ';         # Empty string, single space, or single tab???
    $column_gap = '         '; # Spaces? Tabs? Some combination thereof?

Always use the interpolated \t form instead:

    $separator  = "\t";
    $column_gap = "\t\t\t";

         
Literal single-quote and double-quote characters shouldn't be specified in quotation marks either, for obvious aesthetic reasons: '"', "\"", '\'', "'". Use q{"} and q{'} instead.
You should also avoid using quotation marks when specifying a single comma character. The most common use of a comma string is as the first argument to a join:

    my $printable_list = '(' . join(',', @list) . ')';
The ',', sequence is unnecessarily hard to decipher, especially when:

            

    my $printable_list = '(' . join(q{,}, @list) . ')';

         
is just as easy to write, and stands out more clearly as being a literal. See the "Constants" guideline later in this chapter for an even cleaner solution.
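One way to picture that cleaner solution is to give each awkward string a name. The sketch below is an assumption about what such names might look like ($EMPTY_STR, $SPACE, $COMMA, and $TAB are hypothetical), and it uses ordinary lexicals to stay dependency-free, whereas the "Constants" guideline would make them read-only:

```perl
use strict;
use warnings;

# Hypothetical named values; the guideline would make these read-only
my $EMPTY_STR = q{};
my $SPACE     = q{ };
my $COMMA     = q{,};
my $TAB       = "\t";

my @list = qw( alpha beta gamma );

# The separator is now impossible to misread
my $printable_list = '(' . join($COMMA, @list) . ')';
my $tabbed_row     = join $TAB, @list;

print "$printable_list\n";    # (alpha,beta,gamma)
```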
End of preview. The rest of this section is not available here.
Escaped Characters
Use named character escapes instead of numeric escapes.
Some ASCII characters that might appear in a string—such as DEL or ACK or CAN—don't have a "native" Perl representation. When one or more of those characters is required, the standard solution is to use a numeric escape: a backslash followed by the character's ASCII value inside double-quotes. For example, using octal escapes:

    $escape_seq = "\127\006\030Z";       # DEL-ACK-CAN-Z

         
or hexadecimal escapes:

    $escape_seq = "\x7F\x06\x22Z";       # DEL-ACK-CAN-Z

         
But not everyone who subsequently reads your code will be familiar with the ASCII values for these characters, which means they will have to rely on the associated comments. That's a real shame too, because both of the previous examples are wrong! The correct sequence was:

    $escape_seq = "\177\006\030Z";       # Octal DEL-ACK-CAN-Z

         
or:

    $escape_seq = "\x7F\x06\x18Z";       # Hexadecimal DEL-ACK-CAN-Z

         
Errors like that are particularly hard to track down. Even if you do know the ASCII table by heart, it's still easy to mistakenly type "\127" for DEL because the ASCII code for DEL is 127. At least, in base 10 it is. Unfortunately, backslashed escapes in strings are specified in base 8. And once your brain has accepted the 127-is-DEL relationship, it becomes exceptionally hard to see the mistake. After all, it looks right.
That's why it's better to use named escapes for those characters that have no explicit Perl representation. Named escapes are available in Perl 5.6 and later, and are enabled via the use charnames pragma. Once they're operational, instead of using a numeric escape you can put the name of the required character inside a \N{...} sequence within any double-quoted string. For example:
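
The preview ends before the book's own example, so the following is only a sketch of the idea. It assumes the control-character names DELETE, ACKNOWLEDGE, and CANCEL (the same names used in the "Fat Commas" guideline later in this chapter), checks them against the corrected numeric forms shown above, and demonstrates what the tempting "\127" actually produces:

```perl
use strict;
use warnings;
use charnames qw( :full );

# Named escapes say what they mean...
my $escape_seq = "\N{DELETE}\N{ACKNOWLEDGE}\N{CANCEL}Z";

# ...and match the correct numeric forms
print "hex matches\n"   if $escape_seq eq "\x7F\x06\x18Z";
print "octal matches\n" if $escape_seq eq "\177\006\030Z";

# The tempting "\127" is octal 127, i.e. decimal 87: the letter W, not DEL
print 'ord("\127") is ', ord("\127"), "\n";
```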
End of preview. The rest of this section is not available here.
Constants
Use named constants, but don't use constant.
Raw numbers that suddenly appear in the middle of a program are often mysterious, frequently confusing, and always a potential source of errors. Certain types of unprintable character strings—for example, initialization strings for modems—are similarly awkward.
A line like this:

    print $count * 42;
is unsatisfactory, as the reader may have no idea from the context why the variable is being multiplied by that particular number. Is it 42: the number of dots on a pair of dice? Or 42: the decimal ASCII value of asterisk? Or 42: the number of chromosomes in common wheat? Or 42: the angular spread of a rainbow? Or 42: the number of lines per page in the Gutenberg Bible? Or 42: the number of gallons per barrel of oil?
Replace these kinds of raw literals with a read-only lexical variable whose name explains the meaning of the number:

            

    use Readonly;
    Readonly my $MOLYBDENUM_ATOMIC_NUMBER => 42;

    # and later...

    print $count * $MOLYBDENUM_ATOMIC_NUMBER;

         
The Readonly CPAN module exports a single subroutine (Readonly()) that expects two arguments: a scalar, array, or hash variable, and a value. The value is assigned to the variable, and then the variable's "read-only" flag is set, to prevent any further assignments. Note the use of all-uppercase in the variable name (in accordance with the guideline in Chapter 3) and the use of the fat comma (because the constant name and its value form a natural pair—see "Fat Commas" later in this chapter).
If you accidentally try to assign a new value to a constant:

    $MOLYBDENUM_ATOMIC_NUMBER = $CARBON_ATOMIC_NUMBER * $NITROGEN_ATOMIC_NUMBER;
End of preview. The rest of this section is not available here.
Leading Zeros
Don't pad decimal numbers with leading zeros.
Several of the guidelines in this book recommend laying out data in table format, and aligning that data vertically. For example:

            

    use Readonly;

    Readonly my %ATOMIC_NUMBER => (
        NITROGEN   =>    7,
        NIOBIUM    =>   41,
        NEODYMIUM  =>   60,
        NOBELIUM   =>  102,
    );

But sometimes the desire to make columns line up cleanly can be counterproductive. For example, you might be tempted to pad the atomic number values with zeros to make them uniform:

    use Readonly;

    Readonly my %ATOMIC_NUMBER => (
        NITROGEN   =>  007,
        NIOBIUM    =>  041,
        NEODYMIUM  =>  060,
        NOBELIUM   =>  102,
    );
Unfortunately, that also makes them wrong. Even though leading zeros aren't significant in mathematics, they are significant in Perl. Any integer that begins with a zero is interpreted as an octal number, not a decimal. So the example zero-padded version is actually equivalent to:

    use Readonly;

    Readonly my %ATOMIC_NUMBER => (
        NITROGEN   =>   7,
        NIOBIUM    =>  33,
        NEODYMIUM  =>  48,
        NOBELIUM   => 102,
    );
To avoid this covert transmutation of the numbers, never start a literal integer with zero. Even if you do intend to specify octal numbers, don't use a leading zero, as that may still mislead inattentive future readers of your code.
If you need to specify octal values, use the built-in oct function, like so:

            

    use Readonly;

    Readonly my %PERMISSIONS_FOR => (
        USER_ONLY     => oct(600),
        NORMAL_ACCESS => oct(644),
        ALL_ACCESS    => oct(666),
    );
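
The effect is easy to verify. The short sketch below uses hypothetical variable names; it shows a zero-padded literal being read as octal, the explicit oct() alternative, and (as a suggestion that goes beyond the guideline itself) sprintf as a way to pad a number for display without touching the literal:

```perl
use strict;
use warnings;

# A leading zero makes an integer literal octal...
my $padded = 041;
print "041 as a literal is $padded\n";                 # 33, not 41

# ...whereas oct() states the octal intent explicitly
my $normal_access = oct(644);
printf "oct(644) is %d decimal\n", $normal_access;     # 420

# If zero-padding is wanted for display, pad the output, not the literal
my $label = sprintf '%03d', 41;
print "Padded for display: $label\n";                  # 041
```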

         
End of preview. The rest of this section is not available here.
Long Numbers
Use underscores to improve the readability of long numbers.
Large numbers can be difficult to sanity check:

    $US_GDP              = 10990000000000;
    $US_govt_revenue     =  1782000000000;
    $US_govt_expenditure =  2156000000000;
Those figures are supposed to be in the trillions, but it's very hard to tell if they have the right number of zeros. So Perl provides a convenient mechanism for making large numbers easier to read: you can use underscores to "separate your thousands":

    # In the US they use thousands, millions, billions, trillions, etc...
    $US_GDP              = 10_990_000_000_000;
    $US_govt_revenue     =  1_782_000_000_000;
    $US_govt_expenditure =  2_156_000_000_000;

Prior to Perl 5.8, these separators could only be placed in front of every third digit of an integer (i.e., to separate the thousands, millions, billions, etc.). From 5.8 onwards, underscores can be placed between any two digits. For example:

    # In India they use lakhs, crores, arabs, kharabs, etc...
    $India_GDP              = 30_33_00_00_00_000;
    $India_govt_revenue     =    86_69_00_00_000;
    $India_govt_expenditure =  1_14_60_00_00_000;

Separators can also now be used in floating-point numbers and non-decimals, to make them easier to comprehend as well:

    use bignum;
    $PI = 3.141592_653589_793238_462643_383279_502884_197169_399375;

    $subnet_mask = 0xFF_FF_FF_80;
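
A quick check confirms that the underscores are purely visual. The sketch below uses hypothetical variable names and also illustrates one caveat worth knowing: the separators are understood only in source-code literals, so a string containing them does not convert to the same number.

```perl
use strict;
use warnings;

my $with_separators    = 10_990_000_000_000;
my $without_separators = 10990000000000;

print "Same value\n" if $with_separators == $without_separators;

my $subnet_mask = 0xFF_FF_FF_80;
print "Mask: $subnet_mask\n";    # 4294967168

# A string holding underscores is converted only up to the first underscore
my $from_string = do { no warnings 'numeric'; '1_000' + 0 };
print "String '1_000' numifies to $from_string\n";    # 1
```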

         
End of preview. The rest of this section is not available here.
Multiline Strings
Lay out multiline strings over multiple lines.
If a string has embedded newline characters, but the entire string won't fit on a single source line, then break the string after each newline and concatenate the pieces:

            

    $usage = "Usage: $0 <file> [-full]\n"
             . "(Use -full option for full dump)\n"
             ;

         
In other words, the internal appearance of the string should mirror its external (printed) appearance as closely as possible.
Don't, however, be tempted to make the newline implicit, by wrapping a single string across multiple lines, like so:

    $usage = "Usage: $0 <file> [-full]
    (Use -full option for full dump)
    ";
Even though actual line breaks inside such a string do become newline characters within the string, the readability of such code suffers severely. It's harder to verify the line structure of the resulting string, because the first line is indented whilst the remaining lines have to be fully left-justified. That justification can also compromise your code's indentation structure.
End of preview. The rest of this section is not available here.
Here Documents
Use a heredoc when a multiline string exceeds two lines.
The "break-after-newlines-and-concatenate" approach is fine for a small number of lines, but it starts to become inefficient—and ugly—for larger chunks of text.
For multiline strings that exceed two lines, use a heredoc:

            

    $usage = <<"END_USAGE";
    Usage: $0 <file> [-full] [-o] [-beans]
    Options:
        -full  : produce a full dump
        -o     : dump in octal
        -beans : source is Java
    END_USAGE

instead of:

    $usage = "Usage: $0 <file> [-full] [-o] [-beans]\n"
             . "Options:\n"
             . "    -full  : produce a full dump\n"
             . "    -o     : dump in octal\n"
             . "    -beans : source is Java\n"
             ;
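
The two forms build exactly the same string, which a short script can confirm. This is a trimmed-down sketch with a hypothetical $prog_name variable standing in for $0 so that the output is predictable:

```perl
use strict;
use warnings;

my $prog_name = 'qdump';    # hypothetical program name

# Heredoc form: the text looks the way it will print
my $heredoc_usage = <<"END_USAGE";
Usage: $prog_name <file> [-full]
Options:
    -full  : produce a full dump
END_USAGE

# Concatenated form: same content, more punctuation
my $concat_usage = "Usage: $prog_name <file> [-full]\n"
                 . "Options:\n"
                 . "    -full  : produce a full dump\n"
                 ;

print "Identical strings\n" if $heredoc_usage eq $concat_usage;
```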
End of preview. The rest of this section is not available here.
Heredoc Indentation
Use a "theredoc" when a heredoc would compromise your indentation.
Of course, even if your lines are all simple strings, the problem with using a heredoc in the middle of code is that its contents must be left-justified, regardless of the indentation level of the code it's in:

    if ($usage_error) {
        warn <<'END_USAGE';
    Usage: qdump <file> [-full] [-o] [-beans]
    Options:
        -full  : produce a full dump
        -o     : dump in octal
        -beans : source is Java
    END_USAGE
    }
A better practice is to factor out any such heredoc into a predefined constant or a subroutine (a "theredoc"):

            

    use Readonly;
    Readonly my $USAGE => <<'END_USAGE';
    Usage: qdump file [-full] [-o] [-beans]
    Options:
        -full  : produce a full dump
        -o     : dump in octal
        -beans : source is Java
    END_USAGE

    # and later...

    if ($usage_error) {
        warn $USAGE;
    }

         
If the heredoc needs to interpolate variables whose values are not known at compile time, use a subroutine instead, and parameterize the variables:

            

    sub build_usage {
        my ($prog_name, $filename) = @_;

        return <<"END_USAGE";
    Usage: $prog_name $filename [-full] [-o] [-beans]
    Options:
        -full  : produce a full dump
        -o     : dump in octal
        -beans : source is Java
    END_USAGE
    }

    # and later...

    if ($usage_error) {
        warn build_usage($PROGRAM_NAME, $requested_file);
    }

         
The heredoc does compromise the indentation of the subroutine, but that's now a small and isolated section of the code, so it doesn't significantly impair the overall readability of your program.
End of preview. The rest of this section is not available here.
Heredoc Terminators
Make every heredoc terminator a single uppercase identifier with a standard prefix.
You can use just about anything you like as a heredoc terminator. For example:

    print <<'end list';          # Prints 3 lines then [DONE]
    get name
    set size
    put next
    end list

    print "[DONE]\n";
or:

    print <<'';                  # Prints 4 lines (up to the empty line) then [DONE]
    get name
    set size
    put next
    end list

    print "[DONE]\n";
or even:

    print <<'print "[DONE]\n";'; # Prints 5 lines but no [DONE]!
    get name
    set size
    put next
    end list

    print "[DONE]\n";
Please don't. Heredocs are tough enough to understand as it is. Using bizarre terminators only makes them more difficult. It's a far better practice to stick with terminators that are capitalized (so they stand out better in mixed-case code) and free of whitespace (so only a single visual token has to be recognized).
For example, compared to the previous examples, it's much easier to tell what the contents of the following heredoc are:

            

    print <<'END_LIST';
    get name
    set size
    put next
    END_LIST

         
But even with a single identifier as terminator, both the contents and the termination marker of a heredoc still have to be left-justified. So it can still be difficult to detect the end of a heredoc. By naming every heredoc marker with a standard, easily recognized prefix, you can make them much easier to pick out.
'END_...' is the recommended choice for this prefix. That is, instead of:

    Readonly my $USAGE => <<"USAGE";
    Usage: $0 <file> [-full] [-o] [-beans]
    Options:
        -full  : produce a full dump
        -o     : dump in octal
        -beans : source is Java
    USAGE
delimit your heredocs like so:
End of preview. The rest of this section is not available here.
Heredoc Quoters
When introducing a heredoc, quote the terminator.
Notice that all the heredoc examples in the previous guidelines used either single or double quotes after the <<. Single-quoting the marker forces the heredoc to not interpolate variables. That is, it acts just like a single-quoted string:

            

    Readonly my $GRIPE => <<'END_GRIPE';
    $minimal for maximal work
    END_GRIPE

    print $GRIPE;    # Prints: $minimal for maximal work

Double-quoting the marker ensures that the heredoc string is interpolated, just like a double-quoted string:

    Readonly my $GRIPE => <<"END_GRIPE";
    $minimal for maximal work
    END_GRIPE

    print $GRIPE;    # Prints: 4.99 an hour for maximal work

Most people aren't sure what the default interpolation behaviour is if you don't use any quotes on the marker:

    Readonly my $GRIPE => <<END_GRIPE;
    $minimal for maximal work
    END_GRIPE

    print $GRIPE;    # ???

         
Do you know? Are you sure? And even if you are sure you know, are you sure that your colleagues all know?
And that's the whole point. Heredocs aren't used as frequently as other types of strings, so their default interpolation behaviour isn't as familiar to most Perl programmers. Adding the explicit quotes around the heredoc marker takes almost no extra effort, but it relieves every reader of the considerable extra effort of having to remember the default behaviour. Or, more commonly, of having to look up the default behaviour every time.
It's always best practice to say precisely what you mean, and to record as much of your intention as possible in the actual source code—even if saying what you mean makes the code a little more verbose.
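For readers who would rather check than guess, the sketch below settles the question: Perl treats an unquoted marker the same way as a double-quoted one. The variable names are hypothetical, and plain lexicals are used in place of Readonly to keep the example dependency-free:

```perl
use strict;
use warnings;

my $minimal = '4.99 an hour';

# No quotes on the marker: behaves like a double-quoted heredoc
my $bare = <<END_GRIPE;
$minimal for maximal work
END_GRIPE

my $double = <<"END_GRIPE";
$minimal for maximal work
END_GRIPE

my $single = <<'END_GRIPE';
$minimal for maximal work
END_GRIPE

print "Unquoted marker interpolates\n"       if $bare eq $double;
print "Single-quoted marker stays literal\n" if $single =~ m/\A\$minimal/xms;
```

That the answer needs a test script at all is a fair argument for always writing the quotes.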
End of preview. The rest of this section is not available here.
Barewords
Don't use barewords.
In Perl, any identifier that the compiler doesn't recognize as a subroutine (or as a package name or filehandle or label or built-in function) is treated as an unquoted character string. For example:

    $greeting = Hello . World;
    print $greeting, "\n";                # Prints: HelloWorld

         
Barewords are fraught with peril. They're inherently ambiguous, because their meaning can be changed by the introduction or removal of seemingly unrelated code. In the previous example, a Hello() subroutine might somehow come to be defined before the assignment, perhaps when a new version of some module started to export that subroutine by default. If that were to happen, the former Hello bareword would silently become a zero-argument Hello() subroutine call.
Even without such pre-emptive predeclarations, barewords are unreliable. If someone refactored the previous example into a single print statement:

    print Hello, World, "\n";
then you'd suddenly get a compile-time error:

    No comma allowed after filehandle at demo.pl line 1
That's because Perl always treats the first bareword after a print as a named filehandle, rather than as a bareword string value to be printed.
Barewords can also crop up accidentally, like this:

    my @sqrt = map {sqrt $_} 0..100;
    for my $N (2,3,5,8,13,21,34,55) {
        print $sqrt[N], "\n";
    }
And your brain will "helpfully" gloss over the critical difference between $sqrt[$N] and $sqrt[N]. The latter is really $sqrt['N'], which in turn becomes $sqrt[0] in the numeric context of the array index; unless, of course, there's a sub N() already defined, in which case anything might happen.
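Both hazards can be observed directly. The sketch below is an illustration rather than the book's own code: it compiles each snippet at run time with a string eval so that the outcome can be inspected, and it relies on the core strict pragma, whose "subs" stricture rejects barewords at compile time.

```perl
use strict;
use warnings;

# Under strictures, the bareword version does not even compile
my $ok = eval q{
    use strict;
    my $greeting = Hello . World;
    1;
};
my $error = $@;
print "Rejected: $error" if !$ok;

# Without strictures, the bareword index silently becomes element 0
my $first = eval q{
    no strict;
    no warnings;
    my @sqrt = map { sqrt $_ } 0..100;
    $sqrt[N];
};
print "\$sqrt[N] quietly gave $first\n";
```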
All in all, barewords are far too error-prone to be relied upon. So don't use them at all. The easiest way to accomplish that is to put a
End of preview. The rest of this section is not available here.
Fat Commas
Reserve => for pairs.
Whenever you are creating a list of key/value or name/value pairs, use the "fat comma" (=>) to connect the keys to their corresponding values. For example, use it when constructing a hash:

            

    %default_service_record  = (

        name   => '<unknown>',

        rank   => 'Recruit',

        serial => undef,

        unit   => ['Training platoon'],

        duty   => ['Basic training'],

    );

         
or when passing named arguments to a subroutine (see Chapter 9):

            

    $text = format_text(src=>$raw_text,  margins=>[1,62], justify=>'left');

         
or when creating a constant:

            

    Readonly my $ESCAPE_SEQ => "\N{DELETE}\N{ACKNOWLEDGE}\N{CANCEL}Z";

         
The fat comma visually reinforces the connection between the name and the following value. It also removes the need to quote the key string, as long as you use only valid Perl identifiers as keys. Compare the readability of the previous examples with the following comma-only versions:

    %default_service_record  = (

        'name',   '<unknown>',

        'rank',   'Recruit',

        'serial', undef,

        'unit',   ['Training platoon'],

        'duty',   ['Basic training'],

    );



    $text = format_text('src', $raw_text, 'margins', [1,62], 'justify', 'left');



    Readonly my $ESCAPE_SEQ, "\N{DELETE}\N{ACKNOWLEDGE}\N{CANCEL}Z";
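As a minimal sketch of the auto-quoting behaviour described above, the following compares a hash built with fat commas against the same hash built with plain commas. The hash names and the final 'rainy day' key are chosen purely for illustration.

```perl
use strict;
use warnings;

# Keys that are valid identifiers are quoted automatically by =>
my %with_fat_commas = ( name => '<unknown>', rank => 'Recruit' );

# The comma-only form needs every key quoted by hand
my %with_commas = ( 'name', '<unknown>', 'rank', 'Recruit' );

# A key that is not a valid identifier still needs its quotes
my %needs_quotes = ( 'rainy day' => 1 );

my $same = join(q{,}, map { "$_=$with_fat_commas{$_}" } sort keys %with_fat_commas)
        eq join(q{,}, map { "$_=$with_commas{$_}" }     sort keys %with_commas);

print $same ? "Identical hashes\n" : "Different hashes\n";
```

Both forms build the same data; the difference is only in how easily a reader can pair each key with its value.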
An alternative criterion that is sometimes used when considering a => is whether you can pronounce the symbol as some kind of process verb, such as "becomes" or "produces" or "implies" or "goes into" or "is sent to". For example:

    # The substring of $name becomes whatever's in $new_name

    substr $name, $from, $len => $new_name;



    # Send this signal to this process
End of preview. The rest of this section is not viewable here.
Thin Commas
Preview
Don't use commas to sequence statements.
Perl programmers from a C/C++ background are used to writing C-style for loops in Perl:

    # Binary chop search...

    SEARCH:

    for ($min=0,$max=$#samples, $found_target=0; $min<=$max; ) {

        $pos = int(($max+$min)/2);

        my $test_val = $samples[$pos];



        if ($target == $test_val) {

            $found_target = 1;

            last SEARCH;

        }

        elsif ($target < $test_val) {

            $max = $pos-1;

        }

        else {

            $min = $pos+1;

        }

    }
Each comma within the for initialization acts as a kind of "junior semicolon", separating substatements within the first compartment of the for.
After seeing commas used that way, people sometimes think that it's also possible to use "junior semicolons" within a list:

    print 'Sir ',

          (check_name($name), $name),

          ', KBE';
The intent seems to be to check the person's name just before it's printed, with check_name() throwing an exception if the name is wrong (see Chapter 13). The underlying assumption is that using a comma would mean that only the final value in the parentheses was passed on to print.
Unfortunately, that's not what happens. The comma actually has two distinct roles in Perl. In a scalar context, it is (as those former C programmers expect) a sequencing operator: "do this, then do that". But in a list context, such as the argument list of a print, the comma is a list separator, not technically an operator at all.
The subexpression (check_name($name), $name) is merely a sublist. And a list context automatically flattens any sublists into the main list. That means that the previous example is the same as:

    print 'Sir ',

          check_name($name),

          $name,

          ', KBE';
which will probably not produce the desired effect:

    Sir 1Tim Berners-Lee, KBE
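The flattening can be observed directly by building the argument list into an array first. In this sketch check_name() is a hypothetical stand-in that simply returns 1 for any non-blank name, and the do block is shown only as one possible way to sequence the check, not necessarily the approach the original text goes on to recommend.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the check_name() used in the text:
# dies on a blank name, otherwise returns a true value
sub check_name {
    my ($name) = @_;
    die "Invalid name\n" if $name !~ m{\S}xms;
    return 1;
}

my $name = 'Tim Berners-Lee';

# The parenthesized sublist is flattened into the surrounding list...
my @flattened = ( 'Sir ', (check_name($name), $name), ', KBE' );

# ...whereas a do block yields only the value of its final statement
my @sequenced = ( 'Sir ', do { check_name($name); $name }, ', KBE' );

print join(q{}, @flattened), "\n";
print join(q{}, @sequenced), "\n";
```

The first list has four elements, including the return value of check_name(); the second has the intended three.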
End of preview. The rest of this section is not viewable here.
Low-Precedence Operators
Preview
Don't mix high- and low-precedence booleans.
Perl's low-precedence logical not reads much better than its corresponding high-precedence ! operator. So it's tempting to write:

    next CLIENT if not $finished;    # Much nicer than: if !$finished

         
However, the extremely low precedence of not can lead to problems if that condition is later extended:

    next CLIENT if not $finished || $result < $MIN_ACCEPTABLE;
It's likely that at least some readers of your code will mistake the behaviour of that statement and assume that it's equivalent to:

    next CLIENT if (not $finished) || $result < $MIN_ACCEPTABLE;
It's not. It actually means:

    next CLIENT if not( $finished || $result < $MIN_ACCEPTABLE );
Even if the choice of || was deliberate, and implements the desired test correctly, there is nothing in the code to indicate that the mixing of precedence was intentional. So, while the novice reader is left to wonder about the meaning of the expression, the more experienced reader is left to wonder about its correctness.
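The difference between the two readings shows up as soon as concrete values are plugged in. The values below are illustrative assumptions: a client that is not finished, with a result below the minimum.

```perl
use strict;
use warnings;

# Illustrative values: not finished, and a result below the minimum
my ($finished, $result, $MIN_ACCEPTABLE) = (0, 1, 5);

# As written in the text: || binds tighter than not
my $as_written   = ( not $finished || $result < $MIN_ACCEPTABLE )    ? 1 : 0;

# How a reader might misread it
my $misreading   = ( (not $finished) || $result < $MIN_ACCEPTABLE )  ? 1 : 0;

# What Perl actually does
my $actual_parse = ( not( $finished || $result < $MIN_ACCEPTABLE ) ) ? 1 : 0;

print "as written: $as_written, misreading: $misreading, actual: $actual_parse\n";
```

With these values the statement as written agrees with the fully parenthesized form and disagrees with the misreading, so the loop would not skip to the next client even though a casual reader might expect it to.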
Replacing the || with an or would solve the precedence problem (if indeed there were one), since or is even lower precedence than not:

    next CLIENT if not $finished or $result < $MIN_ACCEPTABLE;
And then adding a pair of parentheses would explicitly indicate whether the intention was:

    next CLIENT if not($finished or $result < $MIN_ACCEPTABLE);
or:

    next CLIENT if not($finished) or $result < $MIN_ACCEPTABLE;
On the other hand, the high-precedence boolean operators don't seem to invoke the same levels of fear, uncertainty, or doubt, probably because they're used much more frequently. It's safer and more comprehensible to use only high-precedence booleans in conditional expressions:
End of preview. The rest of this section is not viewable here.
Lists
Preview
Parenthesize every raw list.
The precedence of the comma operator is so low that, even when it's in a list context, it may not act the way that a casual reader expects. For example, the following assignment:

    @todo = 'Patent concept of 1 and 0', 'Sue Microsoft and IBM', 'Profit!';
is identical to:

    @todo = 'Patent concept of 1 and 0';

    'Sue Microsoft and IBM';

    'Profit!';
That's because the precedence of the comma is less than that of assignment, so the previous example is really a set of "junior semicolons":

    (@todo = 'Patent concept of 1 and 0'), 'Sue Microsoft and IBM', 'Profit!';
For that reason it's a good practice to ensure that comma-separated lists of values are always safely enclosed in parentheses, to boost the precedence of the comma-separators appropriately:

            

    @todo = ('Patent concept of 1 and 0', 'Sue Microsoft and IBM', 'Profit!');

         
But be careful to avoid the all-too-common error of using square brackets instead of parentheses:

    @todo = ['Patent concept of 1 and 0', 'Sue Microsoft and IBM', 'Profit!'];
This example produces a @todo array with only a single element, which is a reference to an anonymous array containing the three strings.
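The three variants can be compared by counting the elements each one produces. This is a minimal sketch; the array names are invented for the comparison, and the "void" warning is silenced only because the unparenthesized form is being demonstrated deliberately.

```perl
use strict;
use warnings;

my (@unparenthesized, @parenthesized, @bracketed);

{
    # Perl warns about the discarded constants; silence that for the demonstration
    no warnings 'void';
    @unparenthesized = 'Patent concept of 1 and 0', 'Sue Microsoft and IBM', 'Profit!';
}

@parenthesized = ('Patent concept of 1 and 0', 'Sue Microsoft and IBM', 'Profit!');
@bracketed     = ['Patent concept of 1 and 0', 'Sue Microsoft and IBM', 'Profit!'];

printf "%d, %d, %d\n",
    scalar @unparenthesized, scalar @parenthesized, scalar @bracketed;
```

Only the parenthesized version stores all three strings as separate elements.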
End of preview. The rest of this section is not viewable here.
List Membership
Preview
Use table-lookup to test for membership in lists of strings; use any() for membership of lists of anything else.
Like grep, the any() function from List::MoreUtils (see "Utilities" in Chapter 8) takes a block of code followed by a list of values. Like grep, it applies the code block to each value in turn, passing them as $_. But, unlike grep, any() returns a true value as soon as any of the values causes its test block to succeed. If none of the values ever makes the block true, any() returns false.
This behaviour makes any() an efficient general solution for testing list membership, because you can put any kind of equivalence test in the block. For example:

            

               

    # Is the index number already taken?

               

    if ( any { $requested_slot == $_ } @allocated_slots ) {

        print "Slot $requested_slot is already taken. Please select another: ";

        redo GET_SLOT;

    }

         
or:

            

               

    # Is the bad guy at the party under an assumed name?

               

    if ( any { $fugitive->also_known_as($_) } @guests ) {

        stay_calm();

        dial(911);

        do_not_approach($fugitive);

    }

         
But don't use any() if your list membership test uses eq:

    Readonly my @EXIT_WORDS => qw(

        q  quit  bye  exit  stop  done  last  finish  aurevoir

    );



    # and later...



    if ( any { $cmd eq $_ } @EXIT_WORDS ) {

        abort_run();

    }
In such cases it's much better to use a look-up table instead:

            

    Readonly my %IS_EXIT_WORD

        => map { ($_ => 1) } qw(

               q  quit  bye  exit  stop  done  last  finish  aurevoir

           );



    

    # and later...

               



    if ( $IS_EXIT_WORD{$cmd} ) {

        abort_run();

    }

         
The hash access is faster than a linear search through an array, even if that search can short-circuit. The code implementing the test is far more readable as well.
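A self-contained sketch of both techniques follows. It makes two assumptions that differ from the text: it imports any() from the core List::Util module (version 1.33 or later) rather than from List::MoreUtils, and it uses a plain hash instead of a Readonly one, so that no non-core modules are needed. The $tests_run counter is added only to show the short-circuiting.

```perl
use strict;
use warnings;
use List::Util qw( any );    # Assumption: List::Util 1.33+ (older versions lack any())

my @allocated_slots = (3, 7, 12);

# any() stops at the first value that satisfies the block
my $tests_run  = 0;
my $slot_taken = any { $tests_run++; $_ == 3 } @allocated_slots;

# For eq tests, build the look-up table once and reuse it
my %IS_EXIT_WORD = map { ($_ => 1) } qw(
    q  quit  bye  exit  stop  done  last  finish  aurevoir
);

for my $cmd (qw( quit continue )) {
    print "$cmd: ", ($IS_EXIT_WORD{$cmd} ? 'exit word' : 'not an exit word'), "\n";
}
print "Slot test ran $tests_run time(s)\n";
```

The numeric test needs any() because == cannot be expressed as a hash key comparison, while the string test reduces to a single hash access.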
End of preview. The rest of this section is not viewable here.
Chapter 5: Variables
Preview
Variables won't. Constants aren't.
—Osborn's Law
Compared to most mainstream languages, Perl has an embarrassingly rich variety of built-in variables. The largest group of these are the global punctuation variables— $_, $/, $|, @_, @+, %!, %^H—which control a wide range of fundamental program behaviours, and which are largely responsible for Perl's unwarranted reputation as "executable line-noise". Other standard variables have more obvious names—@ARGV, %SIG, ${^TAINT}—but are still global in their scope, and in their effects as well.
Perl also provides self-declaring package variables. These will silently spring into existence the first time they're referred to, helpfully converting typos into valid, but incorrect, code.
This chapter presents a series of coding practices that can minimize the problems associated with Perl's sometimes over-helpful built-in variables. It also offers some techniques for making the most efficient use of variables you create yourself.
End of preview. The rest of this section is not viewable here.
Lexical Variables
Preview
Avoid using non-lexical variables.
Stick to using only lexical variables (my), unless you genuinely need the functionality that only a package or punctuation variable can provide.
Using non-lexical variables increases the "coupling" of your code. If two otherwise unrelated sections of code both use a package variable, those two pieces of code can interact with each other in very subtle ways, just by the way they each interact with that shared variable. In other words, without full knowledge of every other piece of code that is called from a particular statement, it is impossible to know whether the value of a given non-lexical variable will somehow be changed by executing that statement.
Some of Perl's built-in non-lexical variables, such as $_, @ARGV, $AUTOLOAD, or $a and $b, are impossible to avoid. But most of the rest are not required in general programming, and there are usually better alternatives. Table 5-1 lists the commonly used Perl built-in variables and what you should use instead. Note that prior to Perl 5.8, you may need to specify use IO::Handle explicitly before using the suggestions that involve method calls on filehandles.
Table 5-1: Alternatives to built-in variables

Variable:     $1, $2, $3, etc.
Purpose:      Store substrings captured from the previous regex match
Alternative:  Assign captures directly using list context regex matching, or unpack them into lexical variables immediately after the match (see Chapter 12). Note that these variables are still acceptable in the replacement string of a substitution, because there is no alternative. For example:
End of preview. The rest of this section is not viewable here.
Package Variables
Preview
Don't use package variables in your own development.
Even if you're occasionally forced to use Perl's built-in non-lexical variables, there's no reason to use ordinary package variables in your own development.
For example, don't use package variables to store state inside a module:

    package Customer;



    use Perl6::Export::Attrs;    # See Chapter 17

            



            

    # State variables...

    our %customer;

    our %opt;



    sub list_customers : Export {

        for my $id (sort keys %customer) {

            if ($opt{terse}) {

                print "$customer{$id}{name}\n";

            }

            else {

                print $customer{$id}->dump();

            }

        }

        return;

    }





    # and later in...

    package main;

    use Customer qw( list_customers );



    $Customer::opt{terse} = 1;



    list_customers();
Lexical variables are a much better choice. And if they need to be accessed outside the package, provide a separate subroutine to do that:

            

    package Customer;



    use Perl6::Export::Attrs;



    

    # State variables...

               

    my %customer;

    my %opt;



    sub set_terse {

        $opt{terse} = 1;

        return;

    }



    sub list_customers : Export {

        for my $id (sort keys %customer) {

            if ($opt{terse}) {

                print "$customer{$id}{name}\n";

            }

            else {

                print $customer{$id}->dump();

            }

        }

        return;

    }



    

    # and elsewhere...

               



    package main;

    use Customer qw( list_customers );



    Customer::set_terse();



    list_customers();

         
If you never use package variables, there's no possibility that people using your module could accidentally corrupt its internal state. Developers who are using your code simply cannot access the lexical state variables outside your module, so there is no possibility of incorrectly assigning to them.
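The isolation can be demonstrated in a single file. This sketch omits the Perl6::Export::Attrs export machinery and calls the subroutines by their full names; is_terse() is a hypothetical accessor added only so the hidden state can be inspected.

```perl
use strict;
use warnings;

{
    package Customer;

    # Lexical state: invisible outside this block
    my %opt;

    sub set_terse { $opt{terse} = 1; return; }
    sub is_terse  { return $opt{terse} ? 1 : 0; }    # Hypothetical accessor, for illustration
}

package main;

{
    # This creates an unrelated package variable; the lexical %opt is untouched
    no warnings 'once';
    $Customer::opt{terse} = 1;
}
my $before = Customer::is_terse();

Customer::set_terse();
my $after = Customer::is_terse();

print "before: $before, after: $after\n";
```

Assigning to $Customer::opt{terse} from outside has no effect on the module's behaviour; only the provided subroutine can change the setting.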
End of preview. The rest of this section is not viewable here.
Localization
Preview
If you're forced to modify a package variable, localize it.
Occasionally you will have no choice but to use a package variable, usually because some other developer has made it part of the module's public interface. But if you change the value of that variable, you're making a permanent decision for every other piece of code in your program that uses the module:

    use YAML;

    $YAML::Indent = 4;       # Indent hereafter 4 everywhere that YAML is used

         
By using a local declaration when making that change, you restrict its effects to the dynamic scope of the declaration:

            

    use YAML;

    local $YAML::Indent = 4;    # Indent is 4 until control exits current scope

               

            

         
That is, by prefacing the assignment with the word local, you can temporarily replace the package variable $YAML::Indent until control reaches the end of the current scope. So any calls to the various subroutines in the YAML package from within the current scope will see an indent value of 4. And after the scope is exited, the previous indent value (whatever it was) will be restored.
This is much more neighbourly behaviour. Rather than imposing your personal preferences on the rest of the program, you're imposing them only on your small corner of the code.
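The dynamic scoping of local can be seen without installing YAML. In this sketch $Indent is a hypothetical package variable standing in for $YAML::Indent, and current_indent() stands in for any subroutine that reads it.

```perl
use strict;
use warnings;

# Hypothetical package variable standing in for $YAML::Indent
our $Indent = 2;

sub current_indent { return $Indent; }

sub indent_during_local_change {
    local $Indent = 4;           # In effect until this subroutine returns
    return current_indent();     # Called code sees the temporary value
}

my $inside = indent_during_local_change();
my $after  = current_indent();

print "inside: $inside, after: $after\n";
```

Code called from within the localizing scope sees 4, and the earlier value of 2 is back in place as soon as that scope is left.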
End of preview. The rest of this section is not viewable here.
Initialization
Preview
Initialize any variable you localize.
Many people seem to think that a localized variable keeps its pre-localization value. It doesn't. Whenever a variable is localized, its value is reset to undef.
So this probably won't work as expected:

    use YAML;

    # Localize the current value...    (No it doesn't!)

    local $YAML::Indent;



    # Then change it, if necessary...

    if (defined $config{indent}) {

        $YAML::Indent = $config{indent};

    }
Unless the if statement executes, the localized copy of $YAML::Indent will retain its post-localization value of undef.
To correctly localize a package variable but still retain its pre-localization value, you need to write this instead:

            

    use YAML;

    

    # Localize the current value...

               

    local $YAML::Indent = $YAML::Indent;



    

    # Then change it, if necessary...

               

    if (defined $config{indent}) {

        $YAML::Indent = $config{indent};

    }

         
This version might look odd, redundant, and very possibly wrong, but it's actually both correct and necessary. As with any other assignment, the righthand side of the localized assignment is evaluated first, yielding the original value of $YAML::Indent. Then the variable is localized, which installs a new container inside $YAML::Indent. Finally, the assignment—of the old value to the new container—is performed.
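The two behaviours can be compared side by side. As a sketch, $Indent is again a hypothetical package variable standing in for $YAML::Indent, and the two subroutine names are invented for the comparison.

```perl
use strict;
use warnings;

# Hypothetical package variable standing in for $YAML::Indent
our $Indent = 2;

sub bare_local {
    local $Indent;                 # Reset to undef, not copied
    return $Indent;
}

sub self_assigned_local {
    local $Indent = $Indent;       # Righthand side is read before localization
    return $Indent;
}

my $bare   = bare_local();
my $copied = self_assigned_local();

print 'bare local: ', (defined $bare ? $bare : 'undef'), "\n";
print "self-assigned local: $copied; afterwards: $Indent\n";
```

The bare declaration yields undef, whereas the self-assigning form carries the original value into the localized scope.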
Of course, you may not have wanted to preserve the former indentation value, in which case you probably needed something like:

            

    Readonly my $DEFAULT_INDENT => 4;



    

    # and later...

               



    use YAML;

    local $YAML::Indent = $DEFAULT_INDENT;

         
Even if you specifically did want that variable to be undefined, it's better to say so explicitly:

            

    use YAML;

    local $YAML::Indent = undef;
End of preview. The rest of this section is not viewable here.
Punctuation Variables
Preview
Use English for the less familiar punctuation variables.
Avoiding punctuation variables completely is, unfortunately, not a realistic option. For a few of the less commonly used variables, there is no good alternative. Or you may be maintaining code that is already structured around the extensive use of these variables, and reworking that code is impractical.
For example:

    local $| = 1;        # Autoflush output

    local $" = qq{\0};   # Hash subscript separator

    local $; =  q{, };   # List separator

    local $, =  q{, };   # Output field separator

    local $\ = qq{\n};   # Output record separator



    eval {

        open my $pipe, '-|', '/cdrom/install'

            or croak "open failed: $!";



        @external_results = <$pipe>;



        close $pipe

            or croak "close failed: $?, $!";

    };



    carp "Internal error: $@" if $@;
In such cases, the best practice is to use the "long" forms of the variables instead, as provided by use English. The English.pm module gives readable identifiers to most of the punctuation variables. With it, you could greatly improve the readability and robustness of the previous example:

            

    use English qw( -no_match_vars );   # See the "Match Variables" guideline later

               



    local $OUTPUT_AUTOFLUSH         = 1;

    local $SUBSCRIPT_SEPARATOR      = qq{\0};

    local $LIST_SEPARATOR           =  q{, };

    local $OUTPUT_FIELD_SEPARATOR   =  q{, };

    local $OUTPUT_RECORD_SEPARATOR  = qq{\n};



    eval {

        open my $pipe, '-|', '/cdrom/install'

            or croak "open failed: $OS_ERROR";



        @external_results = <$pipe>;



        close $pipe

            or croak "close failed: $CHILD_ERROR, $OS_ERROR";

    };



    carp "Internal error: $EVAL_ERROR"

        if $EVAL_ERROR;

         
The readability improvement is easy to see, but the greater robustness is perhaps less obvious. Take another look at the localization of the five variables:
End of preview. The rest of this section is not viewable here.
Localizing Punctuation Variables
Preview
If you're forced to modify a punctuation variable, localize it.
The problems described earlier under "Localization" can also crop up whenever you're forced to change the value in a punctuation variable (often in I/O operations). All punctuation variables are global in scope. They provide explicit control over what would be completely implicit behaviours in most other languages: output buffering, input line numbering, input and output line endings, array indexing, et cetera.
It's usually a grave error to change a punctuation variable without first localizing it. Unlocalized assignments can potentially change the behaviour of code in entirely unrelated parts of your system, even in modules you did not write yourself but are merely using.
Using local is the cleanest and most robust way to temporarily change the value of a global variable. It should always be applied in the smallest possible scope, so as to minimize the effects of any "ambient behaviour" the variable might control:

            

    Readonly my $SPACE => q{ };



    if (@ARGV) {

        local $INPUT_RECORD_SEPARATOR  = undef;   # Slurp mode
        local $OUTPUT_RECORD_SEPARATOR = $SPACE;  # Autoappend a space to every print
        local $OUTPUT_AUTOFLUSH        = 1;       # Flush buffer after every print

        # Slurp, mutilate, and spindle...

               

        $text = <>;

        $text =~ s/\n/[EOL]/gxms;

        print $text;

    }

         
A common mistake is to use unlocalized global variables, saving and restoring their original values at either end of the block, like so:

    Readonly my $SPACE => q{ };



    if (@ARGV) {

        my $prev_irs = $INPUT_RECORD_SEPARATOR;

        my $prev_ors = $OUTPUT_RECORD_SEPARATOR;

        my $prev_af  = $OUTPUT_AUTOFLUSH;



        $INPUT_RECORD_SEPARATOR  = undef;

        $OUTPUT_RECORD_SEPARATOR = $SPACE;

        $OUTPUT_AUTOFLUSH        = 1;



        $text = <>;

        $text =~ s/\n/[EOL]/gxms;

        print $text;



        $INPUT_RECORD_SEPARATOR  = $prev_irs;

        $OUTPUT_RECORD_SEPARATOR = $prev_ors;

        $OUTPUT_AUTOFLUSH        = $prev_af;



    }
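One hazard of the manual save-and-restore pattern can be sketched as follows, under the assumption that the code between the save and the restore might exit early by throwing an exception. The subroutine names are invented for the illustration, and only the output field separator is changed to keep the sketch short.

```perl
use strict;
use warnings;
use English qw( -no_match_vars );

sub manual_restore {
    my $prev_ofs = $OUTPUT_FIELD_SEPARATOR;
    $OUTPUT_FIELD_SEPARATOR = q{, };
    die "Something failed\n";              # Early exit: the restore below never runs
    $OUTPUT_FIELD_SEPARATOR = $prev_ofs;
    return;
}

sub localized {
    local $OUTPUT_FIELD_SEPARATOR = q{, };
    die "Something failed\n";              # local undoes the change anyway
}

eval { localized() };
my $after_localized = $OUTPUT_FIELD_SEPARATOR;

eval { manual_restore() };
my $after_manual = $OUTPUT_FIELD_SEPARATOR;

$OUTPUT_FIELD_SEPARATOR = undef;           # Clean up before printing

print 'after localized(): ',      (defined $after_localized ? "'$after_localized'" : 'undef'), "\n";
print 'after manual_restore(): ', (defined $after_manual    ? "'$after_manual'"    : 'undef'), "\n";
```

After the localized version fails, the separator is back to its original undefined state; after the manual version fails, the changed value leaks into the rest of the program.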
End of preview. The rest of this section is not viewable here.
Match Variables
Preview
Don't use the regex match variables.
Whenever you use English, it's important to load the module with a special argument:

            

    use English qw( -no_match_vars );

         
This argument prevents the module from creating the three "match variables": $PREMATCH (or $`), $MATCH (or $&), and $POSTMATCH (or $'). Whenever these variables appear anywhere in a program, they force every regular expression in that program to save three extra pieces of information: the substring the match initially skipped (the "prematch"), the substring it actually matched (the "match"), and the substring that followed the match (the "postmatch").
Every regex has to do this every time any pattern match succeeds, because these punctuation variables are global in scope, and hence available everywhere. So the regex that sets them might not be in the same lexical scope, the same package, or even the same file as the code that next uses them. The compiler can't know which regex will have been the most recently successful at any point, so it has to play it safe and set the match variables every time any regex anywhere matches, in case that particular match is the one that precedes the use of one of the match variables.
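One possible way to obtain the same three pieces of information without touching the match variables is to capture them explicitly in list context. This is offered as an illustrative sketch only; the sample string and variable names are assumptions, and it is not necessarily the alternative the original text goes on to describe.

```perl
use strict;
use warnings;

my $text = 'the quick brown fox';

# Capture the three pieces explicitly, in list context, instead of
# relying on the global match variables
my ($prematch, $match, $postmatch)
    = $text =~ m{\A (.*?) (quick) (.*) \z}xms;

print "[$prematch] [$match] [$postmatch]\n";
```

Because the captures go straight into lexical variables, no other regex in the program can overwrite them.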
This particular problem neatly illustrates why all non-lexical variables cause difficulties. The presence of $`, $&, or $' immediately couples a particular piece of code to (potentially) every single regex in your program. Leaving aside the extra workload that connection imposes on every pattern match, this also means that debugging pattern matches can be potentially much more difficult. If one of the match variables doesn't contain what you expected, it's possible that's because it was actually set by some pattern match other than the one you thought was setting it. And that pattern match could be
End of preview. The rest of this section is not viewable here.
Dollar-Underscore
Preview
Beware of any modification via $_.
One particularly easy way to introduce subtle bugs is to forget that $_ is often an alias for some other variable. Any assignment to $_ or any other form of transformation on it, such as a substitution or transliteration, is probably changing some other variable. So any change applied to $_ needs to be scrutinized particularly carefully.
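The aliasing can be seen in a few lines. In this sketch the sample strings and the @originals and @trimmed arrays are invented for the illustration.

```perl
use strict;
use warnings;

my @diagnostics = ( '  disk full', '  no route to host' );
my @originals   = @diagnostics;

# $_ is an alias for each element, so the substitution edits @diagnostics itself
for (@diagnostics) {
    s{\A \s+}{}xms;
}

# Copying $_ into a lexical first leaves the source array alone
my @trimmed = map { my $copy = $_; $copy =~ s{\A \s+}{}xms; $copy } @originals;

print "diagnostics: '$diagnostics[0]'\n";
print "originals:   '$originals[0]'\n";
print "trimmed:     '$trimmed[0]'\n";
```

The loop changes the array it iterates over, even though the array is never mentioned inside the loop body, while the copying version leaves its input intact.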
This problem can be especially insidious when $_ isn't actually being named explicitly. For example, suppose you needed a subroutine that would return a copy of any string passed to it, with the leading and trailing whitespace trimmed from the copy. And suppose you also want that subroutine to default to trimming $_ if no explicit argument is provided (just as the built-in chomp does). You might write such a subroutine like this:

    sub trimmed_copy_of {

        # Trim explicit arguments...

        if (@_ > 0) {

            my ($string) = @_;

            $string =~ s{\A \s* (.*?) \s* \z}{$1}xms;

            return $string;

        }

        # Otherwise, trim the default argument (i.e. $_)...

        else {

            s{\A \s* (.*?) \s* \z}{$1}xms;

            return $_;

        }

    }
and then use it like so:

    print trimmed_copy_of($error_mesg);



    for (@diagnostics) {

        print trimmed_copy_of;

    }
Unfortunately, that implementation of trimmed_copy_of() is fatally flawed. After using the function in the previous code, the contents of $error_mesg are unchanged (as they should be), but each of the elements of @diagnostics has been unexpectedly shaved. That's because trimmed_copy_of() correctly deals with explicit arguments by copying them into a separate variable and then changing that copy:

        if (@_ > 0) {

            my ($string) = @_;

            $string =~ s{\A \s* (.*?) \s* \z}{$1}xms;

            return $string;

        }
But the subroutine applies its substitution directly to the (implicit) $_, without first copying its contents:
End of preview. The rest of this section is not viewable here.
Array Indices
Preview
Use negative indices when counting from the end of an array.
The last, second last, third last, nth last elements of an array can be accessed by counting backwards from the length of the array, like so:

    # Replace broken frames...

    $frames[@frames-1] = $active{top};         # Final frame

    $frames[@frames-2] = $active{prev};        # Penultimate frame

    $frames[@frames-3] = $active{backup};      # Prepenultimate frame

         
Alternatively, you can work backwards from the final index ($#array_name), like so:

    # Replace broken frames...

    $frames[$#frames  ] = $active{top};        # Final frame

    $frames[$#frames-1] = $active{prev};       # Penultimate frame

    $frames[$#frames-2] = $active{backup};     # Prepenultimate frame

         
However, Perl provides a much cleaner notation for accessing the terminal elements of an array. Whenever an array access is specified with a negative number, that number is taken as an ordinal position in the array, counting backwards from the last element.
The preceding assignments are much better written as:

            

               

    # Replace broken frames...
    $frames[-1] = $active{top};                # 1st-last frame (i.e., final frame)
    $frames[-2] = $active{prev};               # 2nd-last frame
    $frames[-3] = $active{backup};             # 3rd-last frame

               

            

         
Using negative indices is good practice, because the leading minus sign makes the index stand out as unusual, forcing the reader to think about what that index means and marking any "from the end" indices with an obvious prefix.
Equally importantly, the negative indices are unobscured by any repetition of the variable name within the square brackets. In the previous two versions, notice how similar the three indices are (in that all three start with either
End of preview. The rest of this section is not viewable here.
Slicing
Preview
Take advantage of hash and array slicing.
The previous examples would be even less cluttered (and hence more readable) using an array slice and a hash slice:

            

    @frames[-1,-2,-3]

        = @active{'top', 'prev', 'backup'};

         
An array slice is a syntactic shortcut that allows you to specify a list of array elements, without repeating the array name for each one. A slice looks similar to a regular array access, except that the array keeps its leading @ and you're then allowed to specify more than one index in the square brackets. An array slice like:

            

    @frames[-1,-2,-3]

         
is exactly the same as:

            

    ($frames[-1], $frames[-2], $frames[-3])

         
just much less work to type in, or read. There's a similar syntax for accessing several elements of a hash: you change the leading $ of a regular hash access to @, then add as many keys as you like. The slice:

            

    @active{'top', 'prev', 'backup'}

         
is exactly the same as:

            

    ($active{'top'}, $active{'prev'}, $active{'backup'})

         
The sliced version of the frames assignment will be marginally faster than three separate scalar assignments, though the difference in performance is probably not significant unless you're doing hundreds of millions of repetitions. The real benefit is in comprehensibility and extensibility.
Be careful, though. This version:

    @frames[-1..-3]

        = @active{'top', 'prev', 'backup'};
is not identical in behaviour. In fact it's a no-op, since the -1..-3 range generates an empty list, just like any other range whose final value is less than its initial value. So the "negative range" actually selects an empty slice, which makes the previous code equivalent to:

    () = @active{'top', 'prev', 'backup'};
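The difference between the index list and the "negative range" can be checked directly. The sample data below is invented, and reversing an ascending range is shown as one possible way to generate descending indices.

```perl
use strict;
use warnings;

my %active = ( top => 'T', prev => 'P', backup => 'B' );

# An explicit index list assigns the last three elements
my @frames = (0..5);
@frames[-1,-2,-3] = @active{'top', 'prev', 'backup'};

# A descending range is empty, so this slice assignment changes nothing
my @untouched = (0..5);
@untouched[-1..-3] = @active{'top', 'prev', 'backup'};

# One way to generate the descending indices: reverse an ascending range
my @indices = reverse -3..-1;

print "frames:    @frames\n";
print "untouched: @untouched\n";
print "indices:   @indices\n";
```

Only the explicit list (or the reversed range) reaches the final three elements; the descending range silently does nothing.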
End of preview. The rest of this section is not viewable here.
Slice Layout
Preview
Use a tabular layout for slices.
A slice-to-slice assignment like:

            

    @frames[-1,-2,-3]

        = @active{'top', 'prev', 'backup'};

         
can also be written as:

            

      @frames[ -1,    -2,     -3     ]

    = @active{'top', 'prev', 'backup'};

         
This second version makes it immediately apparent which hash entry is being assigned to which array element. Unfortunately, this approach is useful only when the number of keys/indices in the slices is small. As soon as either list exceeds a single line, the readability of the resulting code is made much worse by vertical alignments:

      @frames[ -1,    -2,     -3,      -4,          -5,      -6,
               -7,          -8     ]
    = @active{'top', 'prev', 'backup', 'emergency', 'spare', 'rainy day',
              'alternate', 'default'};
End of preview. The remainder of this section is not available here.
Slice Factoring
Factor large key or index lists out of their slices.
As the final example in the previous guideline demonstrates, slices can quickly become unwieldy as the number of indices/keys increases.
A more readable and more scalable approach in such cases is to factor out the index/key equivalences in a separate tabular data structure:

            

    Readonly my %CORRESPONDING => (
      # Key of         Index of
      # %active...     @frames...
        'top'        =>  -1,
        'prev'       =>  -2,
        'backup'     =>  -3,
        'emergency'  =>  -4,
        'spare'      =>  -5,
        'rainy day'  =>  -6,
        'alternate'  =>  -7,
        'default'    =>  -8,
    );

    @frames[ values %CORRESPONDING ] = @active{ keys %CORRESPONDING };

         
Each key in %CORRESPONDING is one of the keys of %active, and each value in %CORRESPONDING is the corresponding index of @frames. So the righthand side of the assignment (@active{ keys %CORRESPONDING }) is a hash slice of %active that includes all the entries whose keys are listed in %CORRESPONDING. Similarly, @frames[ values %CORRESPONDING ] is an array slice of @frames that includes all the corresponding indices listed in %CORRESPONDING. That means that the assignment copies entries from %active to the corresponding elements of @frames, with the correspondence being specified by the key/value pairs in %CORRESPONDING.
Storing that key/value correspondence in a hash works because the values and keys functions always traverse the entries of a hash in the same order, so the Nth value returned by values will always be the value of the Nth key returned by keys. Because the two builtins preserve the order of the entries of %CORRESPONDING, the assignment between the two slices copies $active{'top'} into $frames[-1], $active{'prev'} into $frames[-2], $active{'backup'} into $frames[-3], etc.
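
The ordering guarantee can be checked with a short sketch. It uses a plain lexical hash instead of Readonly so that it runs without any CPAN modules, and the contents of %active are invented for illustration:

```perl
use strict;
use warnings;

# Plain hash standing in for the Readonly %CORRESPONDING table
my %CORRESPONDING = (
    'top'    => -1,
    'prev'   => -2,
    'backup' => -3,
);

# Hypothetical data to copy
my %active = ( top => 'main', prev => 'help', backup => 'log' );
my @frames = (undef) x 3;

# keys and values walk the (unmodified) hash in the same order,
# so position N in one list corresponds to position N in the other
my @keys    = keys   %CORRESPONDING;
my @indices = values %CORRESPONDING;

# Count positions where the Nth value is not the value of the Nth key
my $mismatches
    = grep { $CORRESPONDING{ $keys[$_] } != $indices[$_] } 0..$#keys;

@frames[ @indices ] = @active{ @keys };

print "mismatches: $mismatches\n";   # mismatches: 0
print "@frames\n";                   # log help main
```

The order in which the pairs come back is unspecified, but because both lists share it, the final contents of @frames do not depend on it.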
End of preview. The remainder of this section is not available here.
Chapter 6: Control Structures
Nothing is more difficult,
and therefore more precious,
than to be able to decide.
—Napoleon I
Control structures are all about choosing: choosing whether to do something, choosing between two or more alternatives, choosing how often to repeat something. As in real life, much programming grief springs either from making the wrong choice or from using the wrong approach when making a choice.
This chapter looks at a range of programming practices that can help to make your code's decision making less error-prone, more efficient, and easier to verify.
The basic principles are simple: make the decision stand out; make the consequences of any decision stand out; base the decision on as few criteria as possible; don't phrase the decision negatively; avoid flag variables and count variables; and make it very easy to detect variations in the flow of control.
If Blocks
Use block if, not postfix if.
One of the most effective ways to make decisions and their consequences stand out is to avoid using the postfix form of if. For example, it's easier to detect the decision and consequences in:

            

    if (defined $measurement) {
        $sum += $measurement;
    }

         
than in:

    $sum += $measurement if defined $measurement;
Moreover, postfix tests don't scale well as the consequences increase. For example:

    $sum += $measurement
    and $count++
    and next SAMPLE
        if defined $measurement;
and:

    do {
        $sum += $measurement;
        $count++;
        next SAMPLE;
    } if defined $measurement;
are both much harder to comprehend than:

            

    if (defined $measurement) {
        $sum += $measurement;
        $count++;
        next SAMPLE;
    }

         
So always use the block form of if.
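
The and-chained form has a further hazard beyond readability: each link runs only if the previous expression was true, and $count++ returns the old value of $count, which is 0 (false) the first time through. A small sketch, with invented sample data and an extra @fell_through array added purely to record which iterations failed to reach next:

```perl
use strict;
use warnings;

my @samples = (3, undef, 4);     # invented data
my ($sum, $count) = (0, 0);
my @fell_through;                # records iterations that did not "next"

SAMPLE:
for my $measurement (@samples) {
    $sum += $measurement
    and $count++
    and next SAMPLE
        if defined $measurement;

    # Reached for undef (as intended), but also for the first defined
    # sample, because $count++ evaluated to 0 and stopped the chain
    push @fell_through, $measurement;
}

print scalar(@fell_through), "\n";   # 2
```

With the block form, only the undefined sample would reach the push, because the three statements no longer depend on each other's truth values.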
End of preview. The remainder of this section is not available here.
Postfix Selectors
Reserve postfix if for flow-of-control statements.
The only exception to the previous guideline comes about because of another of the principles enumerated at the start of this chapter: "make it very easy to detect variations in the flow of control".
Such variations come about when a next, last, redo, return, goto, die, croak, or throw occurs in the middle of other code. These commands break up the orderly downward flow of execution, so it is critical that they are easy to detect. And, although they are usually associated with some conditional test, the fact that they may potentially interrupt the control flow is more important than the conditions under which they are doing so.
Hence it's better to place the next, last, redo, return, goto, die, croak, and throw keywords in the most prominent position on their code line. In other words, they should appear as far to the left as possible (as discussed in the "Keep Left" sidebar in Chapter 2).
If an if is being used solely to determine whether to invoke a flow-control statement, use the postfix form. Don't hide the action over on the right:

    sub find_anomolous_sample_in {
        my ($samples_ref) = @_;

        MEASUREMENT:
        for my $measurement (@{$samples_ref}) {
            if ($measurement < 0) { last MEASUREMENT; }

            my $floor = int($measurement);
            if ($floor == $measurement) { next MEASUREMENT; }

            my $allowed_inaccuracy = scale($EPSILON, $floor);
            if ($measurement-$floor > $allowed_inaccuracy) {
                return $measurement;
            }
        }
        return;
    }
Be "up front" about it:

            

    sub find_anomolous_sample_in {
        my ($samples_ref) = @_;

        MEASUREMENT:
        for my $measurement (@{$samples_ref}) {
            last MEASUREMENT if $measurement < 0;

            my $floor = int($measurement);
            next MEASUREMENT if $floor == $measurement;

            my $allowed_inaccuracy = scale($EPSILON, $floor);
            return $measurement
                if $measurement-$floor > $allowed_inaccuracy;
        }
        return;
    }
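
To try the "up front" version, it needs definitions for $EPSILON and scale(), which the text does not give. The sketch below supplies hypothetical stand-ins for both (a fixed tolerance scaled by magnitude) and invented sample lists; only the loop itself comes from the example above:

```perl
use strict;
use warnings;

# Hypothetical stand-ins: the text does not define $EPSILON or scale()
my $EPSILON = 0.001;

sub scale {
    my ($epsilon, $magnitude) = @_;
    return $epsilon * ($magnitude || 1);
}

sub find_anomolous_sample_in {
    my ($samples_ref) = @_;

    MEASUREMENT:
    for my $measurement (@{$samples_ref}) {
        last MEASUREMENT if $measurement < 0;

        my $floor = int($measurement);
        next MEASUREMENT if $floor == $measurement;

        my $allowed_inaccuracy = scale($EPSILON, $floor);
        return $measurement
            if $measurement-$floor > $allowed_inaccuracy;
    }
    return;
}

# 3.5 is the first non-integral sample, so it is returned
my $anomaly   = find_anomolous_sample_in([ 1, 2, 3.5, 4 ]);

# The negative sample ends the search before 3.5 is reached
my $cut_short = find_anomolous_sample_in([ 1, 2, -1, 3.5 ]);
```

Reading down the left margin of the loop shows all three exits (last, next, return) at a glance.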
End of preview. The remainder of this section is not available here.
Other Postfix Modifiers
Don't use postfix unless, for, while, or until.
The special dispensation to use postfix if in flow-control statements doesn't extend to any other types of statements. Nor does it extend to any of the other postfix statement modifiers.
The postfix looping modifiers create particular maintenance problems because they place the control flow (i.e., the loop specifier) to the right of the statement it controls. For example, a loop like:

    print for grep {defined $_} @generated_lines;
makes it harder to notice the looped flow of control, especially if you also have statements like:

    print $fh grep {defined $_} @generated_lines;
A proper for loop makes the iteration much more obvious:

            

    for my $line (grep {defined $_} @generated_lines) {
        print $line;
    }

         
Note too that it's not possible to give a readable name to the iterator variable of a postfix loop, nor to easily nest conditional tests inside such a loop. Instead of being able to write the code in a straightforward, explicit, easy-to-follow, and extensible way:

            

    for my $line (@generated_lines) {
        if (defined $line) {
            print lc $line;
        }
    }

         
you're forced to rely on boolean operations, and tempted by default behaviours:

    defined and print lc for @generated_lines;
Worse still, using a postfix loop will sometimes make it necessary to use explicit $_, which makes the resulting code much harder to understand:

    $_ = lc for @generated_lines;
The same code is much clearer in block form:

            

    for my $line (@generated_lines) {
        $line = lc $line;
    }

         
This disparity in readability grows greater as the number of statements to be iterated increases:
End of preview. The remainder of this section is not available here.
Negative Control Statements
Don't use unless or until at all.
Perl is unusual amongst programming languages in that it provides not only positive conditional tests (if and while), but also their negative counterparts (unless and until). Some people find that these keywords can make certain control structures read more naturally to them:

    RANGE_CHECK:
    until ($measurement > $ACCEPTANCE_THRESHOLD) {
        $measurement = get_next_measurement();
        redo RANGE_CHECK unless defined $measurement;
        # etc.
    }
However, for many other developers, the relative unfamiliarity of these negated tests actually makes the resulting code harder to read than the equivalent "positive" version:

            

    RANGE_CHECK:
    while ($measurement <= $ACCEPTANCE_THRESHOLD) {
        $measurement = get_next_measurement();
        redo RANGE_CHECK if !defined $measurement;
        # etc.
    }

         
More importantly, the negative tests don't scale well. They almost always become much harder to comprehend as soon as their condition has two or more components, especially if any of those components is itself expressed negatively. For example, most people have significantly more difficulty understanding the double negatives in:

    VALIDITY_CHECK:
    until ($measurement > $ACCEPTANCE_THRESHOLD && ! $is_exception{$measurement}) {
        $measurement = get_next_measurement();
        redo VALIDITY_CHECK unless defined $measurement && $measurement ne '[null]';
        # etc.
    }
So unless and until are inherently harder to maintain. In particular, whenever a negative control statement is extended to include a negative operator, it will have to be re-cast as a positive control, which requires you to change both the keyword and the conditional:

    VALIDITY_CHECK:
    while ($measurement < $ACCEPTANCE_THRESHOLD && $is_exception{$measurement}) {
        $measurement = get_next_measurement();
        redo VALIDITY_CHECK if !defined $measurement || $measurement eq '[null]';

        
End of preview. The remainder of this section is not available here.
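
Recasting an until as a while means negating the whole condition, and by De Morgan's law that negation also turns && into ||. The following sketch illustrates how easy it is to get this wrong. The variable names are invented: $above stands for the threshold test and $exception for the %is_exception look-up in the until condition above. It brute-forces all four combinations and compares the until condition with two candidate while conditions:

```perl
use strict;
use warnings;

my @or_disagreements;     # cases where the (!A || B) recasting differs
my @and_disagreements;    # cases where a (!A && B) recasting differs

for my $above (0, 1) {
    for my $exception (0, 1) {
        # "until (A && !B)" keeps looping while (A && !B) is false
        my $until_loops = !( $above && !$exception ) ? 1 : 0;

        # De Morgan: !(A && !B) is the same as (!A || B)
        my $or_loops  = ( !$above || $exception ) ? 1 : 0;

        # Negating each component but keeping && is NOT equivalent
        my $and_loops = ( !$above && $exception ) ? 1 : 0;

        push @or_disagreements,  "$above/$exception"
            if $until_loops != $or_loops;
        push @and_disagreements, "$above/$exception"
            if $until_loops != $and_loops;
    }
}

print scalar(@or_disagreements),  "\n";   # 0
print scalar(@and_disagreements), "\n";   # 2
```

The negation of a greater-than test is also less-than-or-equal, not strictly less-than, which is a second place where a hand-converted condition can silently change meaning.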
C-Style Loops
Avoid C-style for statements.
The three-part for statements that Perl inherits from C are needed only for unusual loop control behaviour, such as iterating by twos, or in an irregular sequence. But even in such cases, these C-style loops provide that unusual behaviour in an obscure and harder-to-maintain way.
That's because the iterative behaviour of a three-part for statement is emergent, rather than explicit. In other words, the only way to know what a loop like:

    for (my $n=4; $n<=$MAX; $n+=2) {
        print $result[$n];
    }
is going to do is to sit down and work out the abstract logic of the three components:
"Let's see: n starts at 4, and continues up to MAX, incrementing by two each time. So the sequence is 4, 6, 8, etc. So the loop iterates through all the even n's from 4 up to and including MAX (if MAX itself is even)."
But you could write the same loop without the C-style for, like this:

            

    RESULT:
    for my $n (4..$MAX) {
        next RESULT if odd($n);
        print $result[$n];
    }

         
The advantage with this version is that subsequent readers of the code no longer have to work out the logic of the loop. The code itself says explicitly:
"n from 4 to MAX, skipping values that are odd."
The code is clearer, which means it's more maintainable and less susceptible to subtle bugs or nasty edge-cases.
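
A short sketch can confirm that the two loops visit the same elements. The odd() helper is not defined in the text, so the version here is an assumption, and $MAX and @result hold invented data; the loops collect values instead of printing them so that the results can be compared:

```perl
use strict;
use warnings;

my $MAX    = 10;
my @result = map { $_ * $_ } 0..$MAX;    # invented data: squares of 0..10

# Hypothetical helper; the text uses odd() without defining it
sub odd {
    my ($n) = @_;
    return $n % 2 != 0;
}

# C-style loop: the sequence of indices is implied by three expressions
my @from_c_style;
for (my $n=4; $n<=$MAX; $n+=2) {
    push @from_c_style, $result[$n];
}

# Range loop: the sequence and the exclusion are both stated explicitly
my @from_range;
RESULT:
for my $n (4..$MAX) {
    next RESULT if odd($n);
    push @from_range, $result[$n];
}

print "@from_c_style\n";    # 16 36 64 100
print "@from_range\n";      # 16 36 64 100
```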
The usual counter-argument is that this second example has to iterate twice as many times for the same effect, and has to call a subroutine (odd()) each of those times. Should $MAX become large, that additional cost might become prohibitive.
In practice, many loops don't iterate enough times for those overheads to matter. And often the actual work done by the loop will swamp the costs of iteration anyway. But, if benchmarking indicates that the clearer-but-slower code
End of preview. The remainder of this section is not available here.
Unnecessary Subscripting
Avoid subscripting arrays or hashes within loops.
Unless you actually need to know the indices of the array elements you're processing, iterate the values of an array directly:

            

    for my $client (@clients) {
        $client->tally_hours();
        $client->bill_hours();
        $client->reset_hours();
    }

         
Iterating the indices and then doing repeated array accesses is significantly slower, and less readable:

    for my $n (0..$#clients) {
        $clients[$n]->tally_hours();
        $clients[$n]->bill_hours();
        $clients[$n]->reset_hours();
    }
Repeated indexing is repeated computation; duplicated effort that incurs an extra cost but provides no added benefit. Iterating indices is also prone to off-by-one errors. For example:

    for my $n (1..@clients) {
        $clients[$n]->tally_hours();
        $clients[$n]->bill_hours();
        $clients[$n]->reset_hours();
    }
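
To see exactly what goes wrong with 1..@clients, here is a small sketch using plain strings in place of client objects (the names are invented):

```perl
use strict;
use warnings;

my @clients = qw( Alice Bob Carol );    # invented data

# Buggy: 1..@clients is 1..3, so index 0 is skipped
# and index 3 is one past the end of the array
my @seen_by_index;
for my $n (1..@clients) {
    push @seen_by_index, defined $clients[$n] ? $clients[$n] : '(undef)';
}

# Iterating the values directly cannot skip or overrun
my @seen_directly;
for my $client (@clients) {
    push @seen_directly, $client;
}

print "@seen_by_index\n";    # Bob Carol (undef)
print "@seen_directly\n";    # Alice Bob Carol
```

With real objects, the final iteration of the buggy loop would attempt a method call on an undefined value and die at runtime.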
Likewise, if you're processing the entries of a hash and you need only the values of those entries, don't iterate the keys and then look up the values repeatedly:

    for my $original_word (keys %translation_for) {
       if ( $translation_for{$original_word} =~ m/ $PROFANITY /xms) {
           $translation_for{$original_word} = '[DELETED]';
       }
    }
Repeated hash look-ups are even more costly than repeated array indexing. Just iterate the hash values directly:

            

    for my $translated_word (values %translation_for) {
       if ( $translated_word =~ m/ $PROFANITY /xms) {
           $translated_word = '[DELETED]';
       }
    }

         
Note that this last example works correctly because, in Perl 5.6 and later, the values function returns a list of aliases to the actual values of the hash, rather than just a list of copies (see "Hash Values" in Chapter 8). So if you change the iterator variable (for example, assigning
End of preview. The remainder of this section is not available here.
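
The aliasing behaviour of values can be checked directly. In this sketch the hash contents and the $PROFANITY pattern are invented for illustration; the loop is the one from the example above:

```perl
use strict;
use warnings;

my %translation_for = ( hello => 'bonjour', darn => 'zut' );   # invented data
my $PROFANITY       = qr/ zut /xms;                             # hypothetical pattern

# Each $translated_word is an alias for a hash value, not a copy
for my $translated_word (values %translation_for) {
    if ( $translated_word =~ m/ $PROFANITY /xms) {
        $translated_word = '[DELETED]';
    }
}

print "$translation_for{darn}\n";     # [DELETED]
print "$translation_for{hello}\n";    # bonjour
```

The assignment inside the loop changed the hash itself, even though no key was ever used to look anything up.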
Necessary Subscripting
Never subscript more than once in a loop.
Sometimes you have no choice: you really do need to know the index of each value you're iterating over, as well as the value itself. But, even when it is necessary to iterate indices or keys, be sure to extract the value only once:

            

    for my $agent_num (0..$#operatives) {                        # Iterate indices
        my $agent = $operatives[$agent_num];                     # Extract value once

        print "Checking agent $agent_num\n";                     # Use index
        if ($on_disavowed_list{$agent}) {                        # Use value
            print "\t...$agent disavowed!\n";                    # Use value again
        }
    }

         
Never extract it repeatedly in the same iteration:

    for my $agent_num (0..$#operatives) {                        # Iterate indices
        print "Checking agent $agent_num\n";                     # Use index
        if ($on_disavowed_list{$operatives[$agent_num]}) {       # Extract value
            print "\t...$operatives[$agent_num] disavowed!\n";   # Extract value again
        }
    }
Apart from the fact that repeated array look-ups are repeatedly expensive, they also clutter the code, and increase the maintenance effort if either the array name or the name of the iterator variable subsequently has to be changed.
Occasionally a mere copy of the value won't do, because you need to iterate both indices and values, and still be able to modify the values. It's easy to do that too—just use the Data::Alias CPAN module:

            

    use Data::Alias;

    for my $agent_num (0..$#operatives) {

                  
End of preview. The remainder of this section is not available here.
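
Where adding a CPAN dependency is not an option, one core-Perl alternative is to take a reference to the element once per iteration and modify the element through that reference. This is a different technique from the Data::Alias approach the text introduces, shown here only as a minimal sketch with invented data:

```perl
use strict;
use warnings;

my @operatives = qw( bond bourne hunt );    # invented data

for my $agent_num (0..$#operatives) {
    # Subscript once, keeping a reference to the array slot itself
    my $agent_ref = \$operatives[$agent_num];

    # Changes made through the reference modify the array element
    ${$agent_ref} = ucfirst ${$agent_ref};
    ${$agent_ref} .= " (#$agent_num)";
}

print "@operatives\n";    # Bond (#0) Bourne (#1) Hunt (#2)
```

The explicit dereferencing is noisier than a true alias, which is presumably part of the appeal of a dedicated aliasing module.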
Iterator Variables
Use named lexicals as explicit for loop iterators.
From a readability standpoint, $_ is a terrible name for a variable, especially for an iterator variable. It conveys nothing about the nature or purpose of the values it stores, except that they're currently being iterated in the innermost enclosing loop. For example:

    for (@candidates) {
        if (m/\[ NO \] \z/xms) {
            $_ = reconsider($_);

            $have_reconsidered{lc()}++;
        }
        else {
            print "New candidate: $_\n";

            $_ .= accept_or_reject($_);

            $have_reconsidered{lc()} = 0;
        }
    }
This piece of code starts off well enough: "For each of these candidates, if it matches a certain pattern...". But things go downhill very quickly from there.
On the fourth line of code, the call to lc has its argument omitted, so the function defaults to using $_. And the maintainability of the code immediately suffers. Whoever wrote the code obviously knew that lc defaults to $_ in this way; in fact, that's probably part of the reason they used $_ as the loop iterator in the first place. But will future maintainers of the code know about that default behaviour? If not, they'll have to look up lc to check, which makes their job just a little harder. Unnecessarily harder.
The usual reply at this point is that those maintainers should know Perl well enough to know that lc defaults to lowercasing $_. But that's the crumbling edge of a very slippery slope. Which of the following built-in functions also default to $_?

            

    abs          close           printf        sleep
    chdir        die             require       -t
    chroot       localtime       select        -T

         
Even if you knew, and were confident that you knew, are you equally confident that your teammates know?
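
For comparison, here is a sketch of how the same loop might read with a named lexical iterator, so that nothing depends on a default argument. The reconsider() and accept_or_reject() subroutines are hypothetical stubs and the candidate data is invented; only the loop structure mirrors the example above:

```perl
use strict;
use warnings;

# Hypothetical stubs; the text does not show these subroutines
sub reconsider {
    my ($candidate) = @_;
    $candidate =~ s/\[ NO \] \z/[MAYBE]/xms;
    return $candidate;
}

sub accept_or_reject {
    return ' [NO]';
}

my @candidates = ('Smith [NO]', 'Jones');    # invented data
my %have_reconsidered;

for my $candidate (@candidates) {
    if ($candidate =~ m/\[ NO \] \z/xms) {
        $candidate = reconsider($candidate);

        $have_reconsidered{lc $candidate}++;
    }
    else {
        print "New candidate: $candidate\n";

        $candidate .= accept_or_reject($candidate);

        $have_reconsidered{lc $candidate} = 0;
    }
}
```

Every use of the iterator is now visible at the point of use, so a maintainer does not need to know which builtins default to $_.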
End of preview. The remainder of this section is not available here.
Non-Lexical Loop Iterators
Always declare a for loop iterator variable with my.
When using an explicit iterator variable in a for loop, make sure it's explicitly declared as a lexical variable, using the my keyword. That is, never write a for loop like this:

    my $client;

    SEARCH:
    for $client (@clients) {
        last SEARCH if $client->holding();
    }

    if ($client) {
        $client->resume_conversation();
    }
If you leave off the my, Perl doesn't reuse the lexical variable declared above the loop. Instead, it silently declares a new lexical variable (which is also named $client) as the iterator variable. That new lexical is always scoped to the loop block, and it hides any variable of the same name from any outer scope.
This behaviour is contrary to all reasonable expectation. Everywhere else in Perl, when you declare a lexical variable, it's visible throughout the remainder of its scope, unless another explicit my declaration hides it. So it's natural to expect that the $client variable used in the for loop is the same lexical $client variable that was declared before the loop.
But it isn't. The previous example is actually equivalent to:

    my $client;

    SEARCH:
    for my $some_other_variable_also_named_client (@clients) {
        last SEARCH if $some_other_variable_also_named_client->holding();
    }

    if ($client) {
        $client->resume_conversation();
    }
Writing it that way makes the logical error in the code much more obvious. The loop isn't setting the outermost lexical $client to the first client who's on hold. It's setting an inner lexical variable (which is also named $client in the original version). Then it's throwing that variable away at the end of the loop. The outer lexical $client retains its original undefined value, and the if block is never executed.
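
The effect is easy to reproduce. In this sketch the clients are plain strings and a %holding hash (both invented) stands in for the holding() method. The second loop is one possible way to get the intended result, offered as an illustration and not as the text's own fix:

```perl
use strict;
use warnings;

my @clients = qw( alice bob carol );    # invented data
my %holding = ( bob => 1 );             # stand-in for ->holding()

# The iterator is localized to the loop, so the outer $client
# has its original (undefined) value again once the loop exits
my $client;
SEARCH:
for $client (@clients) {
    last SEARCH if $holding{$client};
}

# Declaring the iterator with my and copying the match out explicitly
my $found;
FIND:
for my $candidate (@clients) {
    next FIND if !$holding{$candidate};
    $found = $candidate;
    last FIND;
}

print defined $client ? "outer: $client\n" : "outer: (undef)\n";   # outer: (undef)
print "found: $found\n";                                           # found: bob
```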
Unfortunately, the first version shown doesn't make that error obvious at all. It looks like it ought to work. It
End of preview. The remainder of this section is not available here.
List Generation
Use map instead of for when generating new lists from old.
A for loop is so convenient that it's natural to reach for it in any situation where a fixed number of list elements is to be processed. For example:

    my @sqrt_results;
    for my $result (@results) {
        push @sqrt_results, sqrt($result);
    }
But code like that can be very inefficient, because it has to perform a separate push for every transformed element. Those pushes usually require a series of internal memory reallocations, as the @sqrt_results array repeatedly fills up. It is possible to preallocate space in @sqrt_results, but the syntax to do that is a little obscure, which doesn't help readability:

    my @sqrt_results;

    # Preallocate as many elements as @results already has...
    $#sqrt_results = $#results;

    for my $next_sqrt_result (0..$#results) {

        $sqrt_results[$next_sqrt_result] = sqrt $results[$next_sqrt_result];
    }
You also have to use an explicit counter if you preallocate. You can't use push, because you just gave the array some number of preallocated elements, so push would put each new value after them.
The alternative is to use Perl's built-in map function. This function is specifically aimed at those situations when you want to process a list of values, to create some kind of related list. For example, to produce a list of square roots from a list of numbers:

            

    my @sqrt_results = map { sqrt $_ } @results;

         
Some of the benefits of this approach are very obvious. For a start, there's less code, so (provided you know what map does) the code is significantly easier to understand. Less code also means there are likely to be fewer bugs, as there are fewer places for things to go wrong.
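
As a quick check that the three approaches are interchangeable in their results, the following sketch runs each of them over the same invented input and compares the output lists:

```perl
use strict;
use warnings;

my @results = (1, 4, 9, 16);    # invented data

# 1. Explicit loop with push
my @by_push;
for my $result (@results) {
    push @by_push, sqrt($result);
}

# 2. Preallocation plus explicit indexing
my @by_index;
$#by_index = $#results;
for my $n (0..$#results) {
    $by_index[$n] = sqrt $results[$n];
}

# 3. map: one expression, no counter, no intermediate bookkeeping
my @by_map = map { sqrt $_ } @results;

print "@by_map\n";    # 1 2 3 4
```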
There are a couple of other advantages that aren't quite as obvious. For example, when you use map
End of preview. The remainder of this section is not available here.
List Selections
Use grep and first instead of for when searching for values in a list.
The same principles apply when you want to refine a list by removing unwanted elements. Instead of a for loop:

            

    # Identify candidates who are unfit for the cut-and-thrust of politics...
    my @disqualified_candidates;
    for my $name (@candidates) {
        if (cannot_tell_a_lie($name)) {
            push @disqualified_candidates, $name;
        }
    }
just use a grep:

            

               

                  

    # Identify candidates who are unfit for the cut-and-thrust of politics...
    my @disqualified_candidates
        = grep {cannot_tell_a_lie($_)} @candidates;

         
Likewise, don't use a for when you're searching a list for a particular element:

            

    # Victimize someone at random...
    my $scapegoat = $disqualified_candidates[rand @disqualified_candidates];

    # Unless there's a juicier story...
    SEARCH:
    for my $name (@disqualified_candidates) {
        if (chopped_down_cherry_tree($name)) {
            $scapegoat = $name;
            last SEARCH;
        }
    }

    # Publish and be-damn...
    print {$headline} "Disgraced $scapegoat Disqualified From Election!!!\n";
Using the first function often results in code that is both more comprehensible and more efficient:

            

    use List::Util qw( first );

    # Find a juicy story...
    my $scapegoat
        = first { chopped_down_cherry_tree($_) }  @disqualified_candidates;

    # Otherwise victimize someone at random...
    if (!defined $scapegoat) {
        $scapegoat = $disqualified_candidates[rand @disqualified_candidates];
    }

    # Publish and be-damn...
    print {$headline} "Disgraced $scapegoat Disqualified From Election!!!\n";
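
To run the grep and first examples end to end, the two predicates need definitions. The versions below are hypothetical stubs backed by invented look-up tables, and the headline is printed to standard output instead of a $headline filehandle:

```perl
use strict;
use warnings;
use List::Util qw( first );

my @candidates   = qw( George Abe Dick Bill );    # invented data
my %cannot_lie   = ( George => 1, Abe => 1 );     # invented data
my %chopped_tree = ( George => 1 );               # invented data

# Hypothetical stubs for the predicates used in the text
sub cannot_tell_a_lie        { my ($name) = @_; return $cannot_lie{$name};   }
sub chopped_down_cherry_tree { my ($name) = @_; return $chopped_tree{$name}; }

# grep keeps every element for which the block is true
my @disqualified_candidates
    = grep { cannot_tell_a_lie($_) } @candidates;

# first stops at the first element for which the block is true
my $scapegoat
    = first { chopped_down_cherry_tree($_) } @disqualified_candidates;

# first returns undef when nothing matches, so fall back to a random pick
if (!defined $scapegoat) {
    $scapegoat = $disqualified_candidates[rand @disqualified_candidates];
}

print "Disgraced $scapegoat Disqualified From Election!!!\n";
```

Because first returns as soon as it finds a match, it does no more work than the hand-written loop with last.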
End of preview. The remainder of this section is not available here.
List Transformation
Use for instead of map when transforming a list in place.
There is, however, a particular case where map and grep are not better than an explicit for loop: when you're transforming an array in situ. In other words, when you have an array of elements or a list of lvalues and you want to replace each of them with a transformed version of the original.
For example, suppose you have a series of temperature measurements in Fahrenheit, and you need them in Kelvin instead. You could accomplish that transformation by applying a map to the data and then assigning it back to the original container:

    @temperature_measurements = map { F_to_K($_) } @temperature_measurements;
But the map statement has to allocate extra memory to store the transformed values and then assign that temporary list back to the original array. That process could become expensive if the list is large or the transformation is repeated many times.
In contrast, the equivalent for block can simply reuse the existing memory in the array:

            

    for my $measurement (@temperature_measurements) {
        $measurement = F_to_K($measurement);
    }

         
Note that this second version also makes it slightly more obvious that elements of the array are being replaced. To detect that fact in the map version, you have to compare the array names at both ends of a long assignment statement. In the for-loop version, the more compact statement:

            

        $measurement = F_to_K($measurement);

         
makes it easier to see that each measurement is being replaced with some transformed version of its original value.
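
Here is a runnable sketch of both versions. The text does not show F_to_K(), so the definition below is an assumption based on the standard Fahrenheit-to-Kelvin conversion, and the measurements are invented:

```perl
use strict;
use warnings;

# Assumed implementation: K = (F - 32) * 5/9 + 273.15
sub F_to_K {
    my ($fahrenheit) = @_;
    return ($fahrenheit - 32) * 5 / 9 + 273.15;
}

my @temperature_measurements = (32, 212);    # invented data

# map builds a separate list (assigned here to a new array for comparison)
my @via_map = map { F_to_K($_) } @temperature_measurements;

# for aliases each element, so the assignment updates the array in place
for my $measurement (@temperature_measurements) {
    $measurement = F_to_K($measurement);
}

print "@temperature_measurements\n";    # 273.15 373.15
```

Both versions produce the same values; the difference lies in whether a temporary list has to be built first.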
End of preview. The remainder of this section is not available here.
Complex Mappings
Use a subroutine call to factor out complex list transformations.
When a map, grep, or first is applied to a list, the block performing the transformation or conditional test can sometimes become quite complex. For example:

    use List::Util qw( max );

    Readonly my $JITTER_FACTOR => 0.01;   # Jitter by a maximum of 1%

    my @jittered_points
        = map { my $x = $_->{x};
                my $y = $_->{y};

                my $max_jitter = max($x, $y) / $JITTER_FACTOR;

                { x => $x + gaussian_rand({mean=>0, dev=>0.25, scale=>$max_jitter}),
                  y => $y + gaussian_rand({mean=>0, dev=>0.25, scale=>$max_jitter}),
                }
              } @points;
This large block is very hard to read, especially since the final anonymous hash constructor looks more like a nested block. So the temptation is to use a for instead:

    my @jittered_points;
    for my $point (@points) {
        my $x = $point->{x};
        my $y = $point->{y};

        my $max_jitter = max($x, $y) / $JITTER_FACTOR;

        my $jittered_point = {
            x => $x + gaussian_rand({ mean=>0, dev=>0.25, scale=>$max_jitter }),
            y => $y + gaussian_rand({ mean=>0, dev=>0.25, scale=>$max_jitter }),
        };

        push @jittered_points, $jittered_point;
    }
That certainly does help the overall readability, but it's still far from optimal. A better solution is to factor out the complex calculation into a separate subroutine, then call that subroutine within a now much simpler and more readable map expression:

            

    my @jittered_points = map { jitter($_) } @points;

    # and elsewhere...

    # Add a random Gaussian perturbation to a point...
    sub jitter {
        my ($point) = @_;
        my $x = $point->{x};
        my $y = $point->{y};

        my $max_jitter = max($x, $y) / $JITTER_FACTOR;

        return {
            x => $x + gaussian_rand({ mean=>0, dev=>0.25, scale=>$max_jitter }),
            y => $y + gaussian_rand({ mean=>0, dev=>0.25, scale=>$max_jitter }),
        };
    }
End of preview. The remainder of this section is not available here.
List Processing Side Effects
Never modify $_ in a list function.
One particular feature of the way the map, grep, and first functions work can easily become a source of subtle errors. These functions all use the $_ variable to pass each list element into their associated block. But, for better efficiency, these functions alias $_ to each list value they're iterating, rather than copying each value into $_.
You probably don't often think of map, grep, and first as creating aliases. You probably just think of those functions as taking a list and returning a second, independent list. And, most importantly, you almost certainly don't expect them to change the original list.
However, if the block you give to a map, grep, or first modifies $_ in any way, then it's actually modifying an alias to some element of the function's list. That means it's actually modifying the original element itself, which is almost certainly an error.
This kind of mistake commonly occurs in code like this:

            

    # Select .pm files for which no corresponding .pl file exists...
    @pm_files_without_pl_files
        = grep { s/.pm\z/.pl/xms && !-e } @pm_files;
The intention here is almost certainly virtuous. The thought process was probably something like:
The implicit $_ successively holds a copy of each of the filenames in @pm_files. I'll replace the .pm suffix of that copy with .pl, then see if the resulting file exists. If it doesn't, then the original (.pm) filename will be passed through the grep to be collected in @pm_files_without_pl_files.
The mistake is simple, but deadly: $_ doesn't successively hold a copy of anything. It successively holds aliases. So the actual effect of the grep is far more sinister. $_ is an alias—that is, just another name—for each of the filenames in
End of preview. The remainder of this section is not available here.
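
The aliasing effect can be demonstrated without touching the filesystem. In the sketch below, an %exists hash (invented) stands in for the -e file test, and the buggy grep is run on a copy of the list so that the damage can be compared with the untouched original. The final grep shows one possible way to avoid the problem by working on an explicit copy of $_. This is an illustration, not necessarily the fix the text goes on to recommend:

```perl
use strict;
use warnings;

my @pm_files = qw( Foo.pm Bar.pm );    # invented data
my %exists   = ( 'Foo.pl' => 1 );      # stand-in for the -e file test

# Buggy: the substitution acts on an alias, so it rewrites the input list,
# and the names that pass through are the already-modified ones
my @clobbered = @pm_files;
my @wrong     = grep { s/[.]pm\z/.pl/xms && !$exists{$_} } @clobbered;

# Safer: copy $_ into a lexical and modify only the copy
my @right = grep {
    my $pl_file = $_;
    $pl_file =~ s/[.]pm\z/.pl/xms;
    !$exists{$pl_file};
} @pm_files;

print "@clobbered\n";    # Foo.pl Bar.pl
print "@wrong\n";        # Bar.pl
print "@right\n";        # Bar.pm
print "@pm_files\n";     # Foo.pm Bar.pm
```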
Multipart Selections
Avoid cascading an if.
Avoid cascades of if-elsif-elsif-else statements wherever possible. They tend to produce code with poor readability that is also expensive to execute.
The readability of an if cascade suffers because the blocks associated with each alternative have to be placed between the alternatives themselves. That can easily cause the entire construct to expand beyond a single screen or page. Any kind of code that extends over a visual boundary is very much more difficult to understand, because the reader is then forced to mentally cache parts of the construct as they scroll through it.
Even if the code doesn't cause a mental page fault, the alternation of condition-action-condition-action-condition-action can make it difficult to compare the conditions and hence to verify that the logic you're implementing is correct. For example, it can be hard to verify that, collectively, your conditions cover all the important alternatives. It can also be difficult to ensure that they are mutually exclusive.
Likewise, if the actions are very similar (e.g., assigning different values to the same variable), it's relatively easy to induce errors (mistyping the variable name in one branch, for example) or to introduce subtleties (such as deliberately using a different variable name in one branch).
The performance of an if cascade can also be suboptimal. Unless you are able to put the most common cases first, a cascaded if is going to have to test, on average, one-half of its alternative conditions before it can execute any of its blocks. And often it's simply not possible to put the common cases first, either because you don't know which cases will be the common ones or because you specifically need to check the special cases first.
The following guidelines examine specific types of cascaded if, and suggest alternative code structures that are more robust, readable, and efficient.
End of preview. The rest of this section is not available here.
Value Switches
Preview
Use table look-up in preference to cascaded equality tests.
Sometimes an if cascade selects its action by testing the same variable against a fixed number of predefined values. For example:

    sub words_to_num {
        my ($words) = @_;

        # Treat each sequence of non-whitespace as a word...
        my @words = split /\s+/, $words;

        # Translate each word to the appropriate number...
        my $num = $EMPTY_STR;
        for my $word (@words) {
            if ($word =~ m/ zero | zéro /ixms) {
                $num .= '0';
            }
            elsif ($word =~ m/ one | un | une /ixms) {
                $num .= '1';
            }
            elsif ($word =~ m/ two | deux /ixms) {
                $num .= '2';
            }
            elsif ($word =~ m/ three | trois /ixms) {
                $num .= '3';
            }
            # etc. etc. until...
            elsif ($word =~ m/ nine | neuf /ixms) {
                $num .= '9';
            }
            else {
                # Ignore unrecognized words
            }
        }

        return $num;
    }

    # and later...

    print words_to_num('one zero eight neuf');    # prints: 1089
A cleaner and more efficient solution is to use a hash as a look-up table, like so:

    my %num_for = (
    #   English        Français       Française
       'zero' => 0,   'zéro' => 0,
        'one' => 1,     'un' => 1,    'une' => 1,
        'two' => 2,   'deux' => 2,
      'three' => 3,  'trois' => 3,
    #        etc.           etc.
       'nine' => 9,   'neuf' => 9,
    );

    sub words_to_num {
        my ($words) = @_;

        # Treat each sequence of non-whitespace as a word...
        my @words = split /\s+/, $words;

        # Translate each word to the appropriate number...
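Since the preview stops before the look-up itself, the following is only a sketch of how a table-driven translation might be completed, not the book's own listing. The table is abbreviated, the 'eight' entry is added purely so that the sample call from the text works, and the lc normalization and the WORD label are assumptions. Note that an exact hash look-up is stricter than the unanchored pattern matches in the cascade above:

```perl
use strict;
use warnings;
use utf8;

# Abbreviated table; 'eight' is an assumption added for the sample call...
my %num_for = (
    'zero'  => 0,   'zéro'  => 0,
    'one'   => 1,   'un'    => 1,   'une' => 1,
    'two'   => 2,   'deux'  => 2,
    'three' => 3,   'trois' => 3,
    'eight' => 8,
    'nine'  => 9,   'neuf'  => 9,
);

sub words_to_num {
    my ($words) = @_;

    # Treat each sequence of non-whitespace as a word...
    my @words = split /\s+/, $words;

    # Translate each word to the appropriate number...
    my $num = q{};
    WORD:
    for my $word (@words) {
        my $digit = $num_for{lc $word};

        # Ignore unrecognized words...
        next WORD if !defined $digit;

        $num .= $digit;
    }

    return $num;
}

print words_to_num('one zero eight neuf'), "\n";    # prints: 1089
```

Each word now costs a single hash look-up, regardless of how many entries the table holds.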

               
End of preview. The rest of this section is not available here.
Tabular Ternaries
Preview
When producing a value, use tabular ternaries.
Hash-based table look-ups aren't always feasible. Sometimes decisions have to be made based on a series of tests, rather than on a particular value. However, if each alternative course of action results in a simple value, then it's still possible to avoid explicit cascaded ifs and preserve a tabular layout in your code. The trick is to use the ternary operator (?:) instead.
For example, to produce a suitable string for a salutation in a form letter, you might write something like:

    my $salute;
    if ($name eq $EMPTY_STR) {
        $salute = 'Dear Customer';
    }
    elsif ($name =~ m/\A ((?:Sir|Dame) \s+ \S+)/xms) {
        $salute = "Dear $1";
    }
    elsif ($name =~ m/([^\n]*), \s+ Ph[.]?D \z/xms) {
        $sa1ute = "Dear Dr $1";
    }
    else {
        $salute = "Dear $name";
    }
The repeated assignments to $salute suggest that a cleaner solution, using only a single assignment, may be possible. Indeed, you could build a simple tabular structure to determine the correct salutation, by cascading ternaries instead of ifs, like so:

               # Name format...                            # Salutation...
    my $salute = $name eq $EMPTY_STR                       ? 'Dear Customer'
               : $name =~ m/ \A((?:Sir|Dame) \s+ \S+) /xms ? "Dear $1"
               : $name =~ m/ (.*), \s+ Ph[.]?D \z     /xms ? "Dear Dr $1"
               :                                             "Dear $name"
               ;
The efficiency of this series of tests will be exactly the same as the preceding cascaded-if version, so there's no advantage in that respect. The advantages of this approach are in terms of readability and comprehensibility. For a start, it's very obvious that this extended construct is, despite the many alternatives it considers, really just a single assignment statement. And it's very easy to confirm that the correct variable is being assigned to.
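As a runnable illustration, the same table can be wrapped in a subroutine and tried against a few names. The subroutine name salutation_for() and the sample names are assumptions made for this sketch:

```perl
use strict;
use warnings;

# Hypothetical wrapper around the tabular ternary, so it can be called repeatedly...
sub salutation_for {
    my ($name) = @_;

         # Name format...                              # Salutation...
    return $name eq q{}                                ? 'Dear Customer'
         : $name =~ m/ \A((?:Sir|Dame) \s+ \S+) /xms   ? "Dear $1"
         : $name =~ m/ (.*), \s+ Ph[.]?D \z     /xms   ? "Dear Dr $1"
         :                                               "Dear $name"
         ;
}

# Prints: Dear Customer, Dear Dame Judi, Dear Dr Alan Turing, Dear Ada Lovelace
for my $name (q{}, 'Dame Judi Dench', 'Alan Turing, PhD', 'Ada Lovelace') {
    print salutation_for($name), "\n";
}
```

Because the whole construct is one expression, it can serve directly as a return value, which a cascaded if cannot do without a temporary variable.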
End of preview. The rest of this section is not available here.
do-while Loops
Preview
Don't use do...while loops.
Like any other postfix looping construct, a do...while loop is intrinsically hard to read, because it places the controlling condition at the end of the loop, rather than at the beginning.
More importantly, in Perl a do...while loop isn't a "first-class" loop at all. Specifically, you can't use the next, last, or redo commands within a do...while. Or, worse still, you can use those control directives; they just won't do what you expect.
For example, the following code looks like it should work:

    sub get_big_int {
        my $int;

        TRY:
        do {
            # Request an integer...
            print 'Enter a large integer: ';
            $int = <>;

            # That's not an integer!...
            next TRY if $int !~ /\A [-+]? \d+ \n? \z/xms;

            # Otherwise tidy it up a little...
            chomp $int;
        } while $int < 10;   # Until the input is more than a single digit

        return $int;
    }

    # and later...

    for (1..$MAX_NUMBER_OF_ATTEMPTS) {
        print sqrt get_big_int(), "\n";
    }
That looks okay, but it isn't. Specifically, if a non-integer is ever entered and the next TRY command is invoked, that next starts looking for an appropriately labeled loop to re-iterate. But the do...while isn't actually a loop; it's a postfix-modified do block. So the next ignores the TRY: label attached to the do. Control passes out of the do block, and then out of the subroutine call (a subroutine isn't a loop either), until it returns to the for loop. But the for loop isn't labeled TRY:, so control passes outwards again, this time right out of the program.
In other words, if the user ever enters a value that isn't a pure integer, the entire application will immediately terminate—not a very robust or graceful way to respond to errors. That kind of bug is particularly hard to find too, because it's one of those rare cases of a Perl construct not doing what you mean. It looks right, but it works wrong.
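One possible rewrite, offered as a sketch rather than as the book's own solution, uses a genuine while loop, so that next has a real loop to act on. The filehandle parameter and the in-memory sample input are assumptions, introduced so that the subroutine can be exercised without a terminal:

```perl
use strict;
use warnings;

# Hypothetical variant of get_big_int() that reads from a supplied filehandle...
sub get_big_int {
    my ($in_fh) = @_;

    INPUT:
    while (defined(my $int = <$in_fh>)) {
        # That's not an integer!...
        next INPUT if $int !~ m/\A [-+]? \d+ \n? \z/xms;

        # Otherwise tidy it up a little...
        chomp $int;

        # Accept only input of more than a single digit...
        return $int if $int >= 10;
    }

    # Input was exhausted without a suitable integer...
    return;
}

my $sample_input = "seven\n7\n1234\n";
open my $in_fh, '<', \$sample_input
    or die "Can't open in-memory input: $!";

print get_big_int($in_fh), "\n";    # prints: 1234
```

Here the label is attached to a real loop, so next INPUT simply requests the next line instead of propagating out of the subroutine.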
End of preview. The rest of this section is not available here.
Linear Coding
Preview
Reject as many iterations as possible, as early as possible.
Chapter 2 recommends the practice of "coding in paragraphs" as a way to chunk code and improve its comprehensibility. Taking this idea one step further, it is also good practice to "process in paragraphs". That is, don't wait until you have all your data assembled before you start checking it. It's more efficient, and often more comprehensible, to verify as you go.
Checking data as soon as it's available means that you can short-circuit sooner if the data is unacceptable. More importantly, the resulting "paragraphs" of code are then specific to each piece of data, rather than to one phase of the processing. That means your code chunks are better focused on the distinct elements of the problem domain, rather than on the more complex interactions between those elements.
For example, instead of:

    for my $client (@clients) {
        # Compute current and future client value...
        my $value     = $client->{volume} * $client->{rate};
        my $projected = $client->{activity} * $value;

        # Verify client is active, worth watching, and worth keeping...
        if ($client->{activity}
            && $value >= $WATCH_LEVEL
            && $projected >= $KEEP_LEVEL
        ) {
            # If so, add in the client's expected contribution...
            $total += $projected * $client->{volatility};
        }
    }
you can generate-and-test each datum sequentially, like so:

    CLIENT:
    for my $client (@clients) {
        # Verify active client...
        next CLIENT if !$client->{activity};

        # Compute current client value and verify client is worth watching...
        my $value = $client->{volume} * $client->{rate};
        next CLIENT if $value < $WATCH_LEVEL;

        # Compute likely client future value and verify client is worth keeping...
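The listing above is cut off by the preview. A self-contained sketch of how such a generate-and-test loop might run end to end is shown below. The thresholds and client records are invented for the illustration, and the last two paragraphs of the loop are inferred from the earlier all-at-once version rather than taken from the book:

```perl
use strict;
use warnings;

# Hypothetical thresholds and client records, invented for this sketch...
my $WATCH_LEVEL = 20;
my $KEEP_LEVEL  = 50;
my @clients = (
    { activity => 0, volume => 90, rate => 9, volatility => 0.9 },   # inactive
    { activity => 1, volume =>  5, rate => 2, volatility => 0.5 },   # value too low
    { activity => 2, volume => 10, rate => 3, volatility => 0.5 },   # passes every test
);

my $total = 0;

CLIENT:
for my $client (@clients) {
    # Verify active client...
    next CLIENT if !$client->{activity};

    # Compute current client value and verify client is worth watching...
    my $value = $client->{volume} * $client->{rate};
    next CLIENT if $value < $WATCH_LEVEL;

    # Compute likely client future value and verify client is worth keeping...
    my $projected = $client->{activity} * $value;
    next CLIENT if $projected < $KEEP_LEVEL;

    # Add in the client's expected contribution...
    $total += $projected * $client->{volatility};
}

print "$total\n";    # prints: 30
```

Only the third record survives all three checks, and the first two are rejected before any further values are computed for them.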
End of preview. The rest of this section is not available here.
Distributed Control
Preview
Don't contort loop structures just to consolidate control.
The bloated conditional tests mentioned in the previous guideline can also appear in the conditions of loop structures, where they usually indicate the (mis)application of structured programming techniques.
Proponents of structured programming usually insist that every loop should have only a single exit point: the conditional expression that's controlling the loop. The very laudable intent of that rule is to make it easier to determine the correctness of the loop by consolidating all information about its termination behaviour in a single place.
Unfortunately, blind adherence to this principle frequently produces code that looks like this:

    Readonly my $INTEGER => qr/\A [+-]? \d+ \n? \z/xms;

    my $int   = 0;
    my $tries = 0;
    my $eof   = 0;

    while (!$eof
           && $tries < $MAX_TRIES
           && ( $int !~ $INTEGER || $int < $MIN_BIG_INT )
    ) {
        print 'Enter a big integer: ';
        $int = <>;
        if (defined $int) {
            chomp $int;

            if ($int eq $EMPTY_STR) {
                $int = 0;
                $tries--;
            }
        }
        else {
            $eof = 1;
        }
        $tries++;
    }
The loop conditional typically contains a mixture of positive and negative tests on several flag variables. The block itself then contains multiple nested if tests, mainly to set the termination flags or to pre-empt further execution if an exit condition is encountered within the block.
When a loop has been contorted in this manner, it's often extremely difficult to understand. Take a moment to work through the previous example code and determine exactly what it does.
Now compare that convoluted code with the following version (which provides exactly the same behaviour):

    Readonly my $INTEGER => qr/\A [+-]? \d+ \n? \z/xms;

    my $int;

    INPUT:
    for my $attempt (1..$MAX_TRIES) {
        print 'Enter a big integer: ';
        $int = <>;

        last INPUT if not defined $int;
        redo INPUT if $int eq "\n";
        next INPUT if $int !~ $INTEGER;

        chomp $int;
        last INPUT if $int >= $MIN_BIG_INT;
    }
End of preview. The rest of this section is not available here.
Redoing
Preview
Use for and redo instead of an irregularly counted while.
In the final version of the input code shown in the previous guideline, a while loop plus a count variable ($tries) was replaced by a for loop. This is a good practice in any situation where a while loop is controlled by a variable that is linearly incremented on each iteration. Using a for makes explicit your intention to loop a fixed number of times. It also eliminates both the count variable and the need to explicitly test that variable against some maximal value. That, in turn, removes the possibility of forgetting to increment the variable and the risk of off-by-one errors in the explicit test.
However, this kind of loop refactoring is satisfactory only when the count variable is uniformly incremented on every iteration. There are plenty of situations where that is not quite the case; where the count is usually incremented each time, but not always. Such exceptions obviously create a serious problem in a fixed-repetition for loop.
For example, the previous example didn't count an empty input line as a legitimate "try". That was easy to accommodate in the "while ($tries < $MAX_TRIES)" version; you simply don't increment $tries in that case. But, in a for loop, the expected number of iterations is fixed before the loop even starts, and you have no control over the incrementing of the loop variable. So it would seem that a for loop is contraindicated whenever the iteration-counting is irregular.
Fortunately, the redo statement allows a loop to have its cake (i.e., be a for loop instead of a while) and eat it too (by still discounting certain iterations). That's because a redo sends the execution back to the start of the current iteration of the loop block: "Do not pass for. Do not collect another iterated value."
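The effect is easy to observe in a small sketch. The sample input, the value of $MAX_TRIES, and the @attempts_seen log are assumptions made for the demonstration. Blank lines trigger a redo, so they never use up an attempt:

```perl
use strict;
use warnings;

# Hypothetical limit and canned input, so the loop can run unattended...
my $MAX_TRIES    = 3;
my $sample_input = "\n\nabc\n\n42\n99\n";
open my $in_fh, '<', \$sample_input
    or die "Can't open in-memory input: $!";

my @attempts_seen;
my $int;

INPUT:
for my $attempt (1..$MAX_TRIES) {
    $int = <$in_fh>;

    last INPUT if not defined $int;

    # A blank line restarts the block with the same value of $attempt...
    redo INPUT if $int eq "\n";

    chomp $int;
    push @attempts_seen, "$attempt:$int";

    last INPUT if $int =~ m/\A [+-]? \d+ \z/xms;
}

print "@attempts_seen\n";    # prints: 1:abc 2:42
```

Three blank lines were read, yet only two attempts were counted before the integer was accepted.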
Using a redo allows you to take advantage of the fixed-iteration semantics of a for
End of preview. The rest of this section is not available here.
Loop Labels
Preview
Label every loop that is exited explicitly, and use the label with every next, last, or redo.
The next, last, and redo statements make it much easier to specify sophisticated flow of control in a readable manner. And that readability is further enhanced if the reader doesn't have to puzzle out which particular loop a given next, last, or redo is controlling.
The easiest way to accomplish that is to label every loop in which a next, last, or redo is used. Then use the same label on each next, last, and redo in that loop. The reader can then match up the name on the keyword against the labels on the surrounding loops to determine which loop's flow of control is being altered.
So you should write:

    INPUT:
    for my $try (1..$MAX_TRIES) {
        print 'Enter an integer: ';
        $int = <>;

        last INPUT if not defined $int;
        redo INPUT if $int eq "\n";

        chomp $int;
        last INPUT if $int =~ $INTEGER;
    }
instead of:

    for my $try (1..$MAX_TRIES) {
        print 'Enter an integer: ';
        $int = <>;

        last if not defined $int;
        redo if $int eq "\n";

        chomp $int;

        last if $int =~ $INTEGER;
    }
Another, less obvious benefit of following this guideline is that the presence of the label at the start of any loop alerts the reader to the fact that the loop has embedded flow control.
Place the label on the line preceding the loop keyword, at the same level of indentation, and with an empty line (or a paragraph comment) above it. That way, the label helps the loop stand out, but leaves the actual loop keyword on the left margin, where it's easy to see.
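Labels matter most when loops are nested, because an unlabeled next always applies to the innermost enclosing loop. The following sketch (the data and the ROW label are hypothetical) abandons an entire row from inside the inner loop:

```perl
use strict;
use warnings;

# Hypothetical data: rows of readings, where a negative value spoils its row...
my @rows = ([1, 2, 3], [4, -1, 6], [7, 8, 9]);
my @kept;

ROW:
for my $row_ref (@rows) {
    for my $cell (@{$row_ref}) {
        # Abandon the whole row, not just this cell...
        next ROW if $cell < 0;
    }

    push @kept, $row_ref;
}

print scalar(@kept), "\n";    # prints: 2
```

The inner loop is never exited explicitly on its own account, so under the guideline only the outer loop needs a label.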
When you're labeling a loop, choose a label that helps to document the purpose of the loop, and of the flow control statements. In particular, don't name loops
End of preview. The rest of this section is not available here.
Chapter 7: Documentation
Preview
Documentation is like sex: when it's good, it's very, very
good; and when it's bad, it's still better than nothing.
—Dick Brandon
Documentation: for most development programmers it's a millstone, but for maintenance programmers it's a life-line. More importantly, very few programmers are exclusively in one role or the other. Most developers write code that they then have to maintain themselves. Or else they have to maintain other people's code in order to develop their own.
The problem is that any code of your own that you haven't looked at for six or more months might as well have been written by someone else. The young, smart, optimistic you—who's creating the code—will undoubtedly find it tedious to document your understanding of what that code does and how it does it. But the older, wiser, sadder you—who later has to fix, extend, and adapt that code—will treasure the long-forgotten insights that your documentation preserves.
In that sense, documentation is a love letter that you write to your future self.
End of preview. The rest of this section is not available here.
Types of Documentation
Preview
Distinguish user documentation from technical documentation.
End users will rarely read your code, or your comments. If they read anything at all, they'll run your module or application through perldoc and read whatever emerges. On the other hand, maintainers and other developers may also read your POD, but they'll spend far more of their time looking directly at your code.
So it makes sense to put user documentation in the "public" sections of your code's POD (i.e., in the =head1, =head2, and =over/=item/=back sections), and relegate the technical documentation to "non-public" places (i.e., to the =for and =begin/=end POD sections and to comments).
More importantly, distinguish between the content of user and technical documentation. In particular, don't put implementation details in user documentation. It wastes your time and it annoys the user. Tell the user what the code does, not how the code does it, unless those details are somehow relevant to the users' use of that code.
For example, when documenting a set of list operations for users, tell them that pick() takes a list and selects one element at random, that shuffle() takes a list and returns a randomly reordered version of that list, and that zip() takes two or more array references and produces a single list that interleaves the array values. You may choose to mention that pick() and shuffle() do their jobs in a genuinely random and unbiased manner, but there's no need to explain how that miracle is achieved.
On the other hand, your module may also provide a set of specialist sorting routines: sort_radix(), sort_shell(), sort_pigeonhole(). When documenting these, you will obviously need to at least mention the different algorithms they employ, and the conditions under which each might be a superior choice.
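The split between the two audiences might look like this in a source file. This is only a sketch: the module name List::Sketch and the one-line implementation of pick() are hypothetical, and =for comment is used as the "non-public" POD section because formatters such as perldoc skip it:

```perl
use strict;
use warnings;

# Technical documentation lives in comments like this one:
# pick() indexes the list with a random offset from the built-in rand().
sub pick {
    my @candidates = @_;
    return $candidates[ rand @candidates ];
}

=head1 NAME

List::Sketch - hypothetical module used only for this illustration

=head1 SUBROUTINES

=head2 pick(@list)

Takes a list and returns one of its elements, selected at random.

=for comment
Maintainers only: this paragraph is not rendered as user documentation.

=cut
```

The =head1 and =head2 sections say what pick() does. How it does so is confined to the comment and the =for paragraph.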
End of preview. The rest of this section is not available here.
Boilerplates
Preview
Create standard POD templates for modules and applications.
One of the main reasons documentation can often seem so unpleasant is the "blank page effect". Many programmers simply don't know how to start, or what to say.
One of the best ways to make writing documentation less forbidding (and hence more likely to actually occur) is to circumvent that initial empty screen by providing a template that developers can cut and paste into their code.
For a module, that documentation template might look something like Example 7-1. For an application, the variation shown in Example 7-2 is more appropriate. Of course, the specific details that your templates provide may vary from those shown here, according to your other coding practices. The most likely variation will be in the licence and copyright, but you may also have specific in-house conventions regarding version numbering (see Chapter 17), or the grammar of diagnostic messages (see Chapter 13), or the attribution of authorship.
Example 7-1. User documentation template for modules

=head1 NAME

<Module::Name> - <One-line description of module's purpose>

=head1 VERSION

The initial template usually just has:

This documentation refers to <Module::Name> version 0.0.1.

=head1 SYNOPSIS

    use <Module::Name>;
    # Brief but working code example(s) here showing the most common usage(s)

    # This section will be as far as many users bother reading,
    # so make it as educational and exemplary as possible.
End of preview. The rest of this section is not available here.
Extended Boilerplates
Preview
Extend and customize your standard POD templates.
The two templates recommended in the previous section represent only the minimum amount of information that should be provided to the user. There are many more possibilities that your team might choose to add to its standard template, such as:
=head1 EXAMPLES
Many people learn better by example than by explanation, and most learn better by a combination of the two. Providing a /demo directory stocked with well-commented examples is an excellent idea, but your users might not have access to the original distribution, and the demos are unlikely to have been installed for them. Adding a few illustrative examples in the documentation itself can greatly increase the "learnability" of your code.
=head1 FREQUENTLY ASKED QUESTIONS
Incorporating a list of correct answers to common questions may seem like extra work (especially when it comes to maintaining that list), but in many cases it actually saves time. Frequently asked questions are frequently emailed questions, and you already have too much email to deal with. If you find yourself repeatedly answering the same question by email, in a newsgroup, on a web site, or in person, answer that question in your documentation as well. Not only is this likely to reduce the number of queries on that topic you subsequently receive, it also means that anyone who does ask you directly can simply be directed to read the fine manual.
=head1 COMMON USAGE MISTAKES
This section is really "Frequently Unasked Questions". With just about any kind of software, people inevitably misunderstand the same concepts and misuse the same components. By drawing attention to these common errors, explaining the misconceptions involved, and pointing out the correct alternatives, you can once again pre-empt a large amount of unproductive correspondence. Perl itself provides documentation of this kind, in the form of the
End of preview. The rest of this section is not available here.
Location
Preview
Put user documentation in source files.
Having decided what to provide as user documentation, the next question is where to provide it. The answer is: put the documentation in the same file as the module or application itself (i.e., in the relevant .pm or .pl file).
The other common alternative is to put the documentation in its own separate .pod file. This is possible because perldoc is smart enough to look for POD files as well as source files when searching for documentation. The problem is that this approach works only if the appropriate .pod document has been installed along with the module or application, and has been installed somewhere in perldoc's search path, which is unlikely.
In contrast, if the user documentation is placed directly in the appropriate .pm or .pl file, it will automatically be available anywhere the module or application itself is.
End of preview. The rest of this section is not available here.
Contiguity
Preview
Keep all user documentation in a single place within your source file.
Even though Perl allows you to interleave POD sections between chunks of source code, don't.
User documentation that is fragmented into numerous small pieces distributed throughout the code is much harder to maintain in a consistent state, because you have to sift through the intervening code fragments to find it or compare it.
It is sometimes argued that having documentation near the code that it documents can help maintain consistency between the two. In practice, the opposite often seems to be the case: the necessity to go elsewhere in a file in order to update documentation after a code change actually seems to make it more likely that developers will do so. When the documentation is right on hand it's somehow easier to overlook or ignore. Of course, that's not going to be the case for everyone. Many people do find documenting a subroutine easier when the documentation is immediately to hand.
A more important reason not to intersperse code and documentation is that doing so usually produces either contorted code or confused documentation. Keeping documentation near the code it explains will frequently force you to lay the code out in an unnatural order, so as to ensure sensible exposition in the documentation. Or else it will force you to present your documentation in an unnatural order, so as to ensure a sensible layout of the code. Neither of these outcomes is desirable, and both can be avoided by keeping the documentation in its own separate, coherent section of the source file.
End of preview. The rest of this section is not available here.
Position
Preview
Place POD as close as possible to the end of the file.
Having decided to keep the documentation together, the obvious question is whether to place it at the start or the end of the file.
There seems to be no particular reason to place it at the beginning. Anyone who is looking at the source is presumably most interested in the code itself, and will appreciate seeing it immediately when they open the file, rather than having to wade through several hundred lines of user documentation first. Moreover, the compiler is able to do a slightly more efficient job if it doesn't have to skip POD sections before it finds any code to compile.
So place your POD at the end of the file, preferably after the __END__ marker so that the compiler doesn't have to look at it at all. Or, if you're using a __DATA__ section in your implementation, wrap the documentation in =pod/=cut directives and place it just before the __DATA__ marker.
End of preview. The rest of this section is not available here.
Technical Documentation
Preview
Subdivide your technical documentation appropriately.
When it comes to technical documentation, use separate .pod or plain-text files for your external documentation, design documents, data dictionaries, algorithm overviews, change log, and so on. Make sure that the "See Also" section of your user documentation refers to these extra files.
Use comments (and "invisible" POD directives) for internal documentation, explanations of implementation, maintenance notes, et cetera. The following guidelines give details on each of these points.
End of preview. The rest of this section is not available here.
Comments
Preview
Use block templates for major comments.
Create comment templates that are suitable for your team. For example, to internally document a subroutine or method, you might use something like:

    ############################################
    # Usage      : ????
    # Purpose    : ????
    # Returns    : ????
    # Parameters : ????
    # Throws     : no exceptions
    # Comments   : none
    # See Also   : n/a
which might be filled in like so:

    ############################################
    # Usage      : Config::Auto->get_defaults()
    # Purpose    : Defaults for 'new'
    # Returns    : A hash of defaults
    # Parameters : none
    # Throws     : no exceptions
    # Comments   : No corresponding attribute,
    #            : gathers data from each
    #            : attr_def attribute
    # See Also   : $self->set_default()
Structured comments like that are usually better than free-form comments:

    # This method returns a hash containing the defaults currently being
    # used to initialize configuration objects. It takes no arguments.
    # There isn't a corresponding class attribute; instead it collects
    # the necessary information from the various attr_def attributes. There's
    # also a set_default() method.
Templates produce commenting that is more consistent and easier to read. They're also much more coder-friendly because they allow developers to simply "fill in a form". Comment templates also make it more feasible to ensure that all essential information is provided, and to identify missing information easily, by searching for any field that still has a ???? in its "slot".
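Searching for unfilled slots is simple to automate. The following is a minimal sketch, in which the helper name unfilled_fields() and the sample source text are hypothetical, that reports every template line still containing ???? as its value:

```perl
use strict;
use warnings;

# Hypothetical helper: return each template line whose slot is still '????'...
sub unfilled_fields {
    my ($source) = @_;

    return grep { m/\A \s* [#] [^:]* : \s* [?]{4} \s* \z/xms }
           split /\n/, $source;
}

# Sample source text, invented for the demonstration...
my $source = <<'END_SOURCE';
    # Usage      : Config::Auto->get_defaults()
    # Purpose    : ????
    # Returns    : A hash of defaults
    # Parameters : ????
END_SOURCE

# Reports the Purpose and Parameters lines...
print "$_\n" for unfilled_fields($source);
```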
Your team might prefer to use some other template for structured comments—maybe even just this:

            

               

                  
End of preview. The rest of this section is not available here.
Algorithmic Documentation
Preview
Use full-line comments to explain the algorithm.
Chapter 2 recommends coding in paragraphs. Part of that advice is to prefix each paragraph with a single-line comment.
That comment should explain at a high level what the associated paragraph contributes to the overall process implemented by the code. Ideally, if all the paragraph comments were to be extracted, they should summarize the algorithm by which the code performs its task.
Keep each such comment strictly to a single line. Any more than that interrupts the code excessively, making it harder to follow. If the paragraph is doing something too complicated to be explained in a single line, that is a sign that the code either needs to be split into several paragraphs, or else refactored out into a subroutine (which can then be given a more expansive block comment).
For example:

    sub addarray_internal {
        my ($var_name, $needs_quotemeta) = @_;

        # Record original...
        $raw .= $var_name;

        # Build meta-quoting code, if required...
        my $quotemeta = $needs_quotemeta ? 'map {quotemeta $_}'
                                         : $EMPTY_STR
                                         ;

        # Expand elements of variable, conjoin with ORs...
        my $perl5pat
            = qq{(??{join q{|}, $quotemeta \@{$var_name}})};

        # Insert debugging code if requested...
        my $type = length $quotemeta ? 'literal' : 'pattern';
        debug_now("Adding $var_name (as $type)");
        add_debug_mesg("Trying $var_name (as $type)");

        # Add back-translation...
        push @perl5pats, $perl5pat;

        return;
    }
End of preview. The rest of this section is not available here.
Elucidating Documentation
Content preview
Use end-of-line comments to point out subtleties and oddities.
The guidelines in this book aim to help you write code that's self-documenting, so most lines within a single paragraph shouldn't require extra "hints" in order to understand them.
But self-documentation is always in the eye of the original author, and code that seemed perfectly clear when it was written may be somewhat less intelligible when it's re-read six months later.
Comprehensibility can suffer particularly badly when the code incorporates jargon from the problem domain. Terms that were extremely familiar to the original designers and implementers might mean nothing to those who later have to maintain the source. For example, you could inherit code like this:

    my $QFETM_func_ref;

    if ($QFETM_func_ref  = get_GET()) {
        make_futtock($QFETM_func_ref);
    }

    $build_mode = oct $arg{mode};
in which case, the judicious application of trailing comments is appropriate:

            

    my $QFETM_func_ref;  # stores Quantum Field Effect Transfer Mode function

    # Build futtock representation if remote data is available...
    if ($QFETM_func_ref  = get_GET()) {    # instead of get_POST()
        make_futtock($QFETM_func_ref);     # futtock: a rib of a ship's frame
    }

    $build_mode = oct $arg{mode};   # *From* octal, not *to* octal

               

            

         
End-of-line comments should be kept pithy. If you feel that an elucidating comment needs more than the remainder of the current line, then use a discursive comment instead (see "Discursive Documentation" later in this chapter).
End of preview. The rest of this section is not available here.
Defensive Documentation
Content preview
Comment anything that has puzzled or tricked you.
The final line in the previous example demonstrates the use of an in-line comment to overcome a maintainer's personal stumbling block:

            

    $build_mode = oct $arg{mode};   # *From* octal, not *to* octal

               

            

         
Many programmers mistakenly assume that the oct builtin returns the octal version of its argument, when it actually converts its argument from an octal representation to decimal. That comment may have been added when the code was originally written (presumably in a d'oh! moment after several hours of fruitless debugging), or it may have been appended by a subsequent maintainer (to immortalize their own Homeric realization). Either way, by commenting it explicitly, that same false expectation will thereafter be averted every time someone new reads the code.
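The direction of the conversion is easy to check for yourself. The following is only a minimal sketch: the mode string '755' and the variable names $mode_str and $octal_repr are made up for illustration and do not come from the code above. It shows oct reading its argument as octal, and sprintf with %o going the other way.

```perl
use strict;
use warnings;

# Hypothetical mode string, standing in for $arg{mode}
my $mode_str = '755';

# oct() interprets its argument *as* octal and returns the numeric value
my $build_mode = oct $mode_str;
print "$build_mode\n";                  # prints 493

# To go *to* an octal representation, use sprintf with %o instead
my $octal_repr = sprintf '%o', $build_mode;
print "$octal_repr\n";                  # prints 755
```

If a maintainer expected the second behaviour from oct, the trailing comment is what stops them "fixing" the line.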
An in-line comment is appropriate whenever you encounter a subtle bug, or whenever you write some subtle code. "Subtle" has a very precise definition here: it means that you either had to look something up in a manual, or had to spend more than five seconds thinking about it before you understood its syntax or semantics.
For example, this:

    @options = map +{ $_ => 1 }, @flags;
needs to be commented:

    @options = map +{ $_ => 1 }, @flags;    # Anon hash ctor, not map block!

         
In general, if it puzzled or tricked you once, it will puzzle or trick you—or whoever comes after you—again. To avoid that, leave a Hyre Be Dragones comment in the code.
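To see why the map example above deserves its warning, compare what the two spellings return. This is a small sketch under assumptions: the flag names are hypothetical, and @pairs is an extra variable introduced only for the comparison.

```perl
use strict;
use warnings;

my @flags = qw( debug verbose );        # hypothetical flag names

# Unary plus forces the braces to be an anonymous hash constructor,
# so map returns one hash reference per flag
my @options = map +{ $_ => 1 }, @flags;

# Without the plus (and without the comma), the braces are a map block,
# which returns a flat list of key/value pairs instead
my @pairs = map { $_ => 1 } @flags;

print scalar(@options), " hash refs, ", scalar(@pairs), " flat elements\n";
```

The two statements differ by only two characters, which is exactly the kind of subtlety that merits a comment.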
End of preview. The rest of this section is not available here.
Indicative Documentation
Content preview
Consider whether it's better to rewrite than to comment.
More often than not, the need to leave hints in the code indicates that the code itself is in need of reworking. For example, if the final example of the previous section had used a map block (as suggested in the "Mapping and Grepping" guideline in Chapter 8), then it would look like this instead:

            

    @options = map { {$_ => 1} } @flags;

         
in which case the trailing comment would probably not be necessary. The outer braces after the map would obviously be block delimiters, because under the Chapter 8 guideline every map is followed by a block. The inner braces might still be slightly disconcerting, but as the map block is expected to return a value, it would be easy enough to deduce that those inner brackets must be producing a value, and hence must be a hash constructor.
Of course, if that still weren't obvious enough, a trailing comment would be appropriate. But now it could be much more to the point:

            

    @options = map { {$_ => 1} } @flags;   # map block returns hash ref

               

            

         
End of preview. The rest of this section is not available here.
Discursive Documentation
Content preview
Use "invisible" POD sections for longer technical discussions.
The =for and =begin/=end POD directives provide an easy way to create large blocks of text that are ignored by the compiler and don't produce any visible output when the surrounding file is processed by a POD formatter. So these directives provide an easy way to embed extended pieces of internal documentation within your source.
The =for directive is identical to a =begin/=end pair, except that it allows only a single paragraph of content, terminated by an empty line. This might well be construed as a feature, in that it encourages conciseness. But note that you still have to provide a trailing =cut, to switch the compiler back from skipping documentation to compiling Perl code.
Both these forms of block commenting take a "format name" after the keyword. Normally this name would be used to indicate which formatting tool the documentation is intended for (e.g., =for html ..., =for groff ..., =for LaTeX ...), but it is far more useful as a means to specify the kind of internal documentation you are writing. Then, provided the description you choose doesn't match the name of one of the standard POD formatters, the resulting POD block will be effectively invisible outside the source code. An easy way to ensure that invisibility is to capitalize the description and put a colon at the end of it.
For example, you can use this approach to record your rationale for unusual design or implementation decisions:

            

    =for Rationale:
         We chose arrays over hashes here because profiling indicated over
         99% of accesses were iterated over the entire set, rather than being
         random. The dataset is expected to grow big enough that the better
         access performance and smaller memory footprint of a big array will
         outweigh the awkwardness of the occasional binary-chop search.

    =cut
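When a note needs more than one paragraph, the =begin/=end form described above can be used in the same way. The following is a sketch only: the wording of the note and the @steps variable are invented for illustration, and the POD directives are written at the left margin, as they must be in a real source file. The surrounding statements show that the compiler skips everything from =begin to =cut.

```perl
use strict;
use warnings;

my @steps = ('before');

=begin Rationale:

This first paragraph is a hypothetical design note. It is skipped by
the compiler and, because no formatter is named "Rationale:", it should
also be ignored when the file is rendered as documentation.

Unlike =for, a =begin block may span several paragraphs.

=end Rationale:

=cut

push @steps, 'after';
print "@steps\n";       # prints: before after
```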

         
You can make notes on possible improvements that you don't currently have time to design or implement:
End of preview. The rest of this section is not available here.
Proofreading
Content preview
Check the spelling, syntax, and sanity of your documentation.
The point of all documentation is communication: either with the users of your code, or with those who maintain it. To be effective, documentation must communicate effectively. It must be without distractions (like spelling mistakes), it must be comprehensible (i.e., syntactically correct), it must be unambiguous, and it must make sense.
So, although it's important to write your documentation, it's far more important to read it after it's written, to make sure it will do the job you created it to do.
The best way to proofread a document is to look at a "rendered" version of it. That is, don't simply reread the POD source you just wrote. Instead, convert that POD to plain text (using perldoc) or to HTML (via pod2html) or even to LaTeX (with pod2latex), and then read through it using the appropriate display tool.
Better still, have someone who's unfamiliar with the code read through your documentation. A new reader will be far better able to recognize when some part of your explanation is confusing, ambiguous, or otherwise unenlightening.
End of preview. The rest of this section is not available here.
Chapter 8: Built-in Functions
Content preview
Bloody instructions which, being taught,
return to plague their inventor
—William Shakespeare
Macbeth, Act 1, Scene 7
The single most important recommendation about Perl's built-in functions is also the simplest: use them.
If Perl already provides a way to solve your problem, and that way is integrated into the language itself, then it doesn't make sense to reinvent it. It's likely that Perl's built-in solution is faster and far better debugged than anything you'll have time to write yourself.
However, some of Perl's built-in functions are sufficiently complex, and their behaviour sufficiently subtle, that there are still right and wrong ways to use them. This chapter explores some of these ways.
End of preview. The rest of this section is not available here.
Sorting
Content preview
Don't recompute sort keys inside a sort.
Doing expensive computations inside the block of a sort is inefficient. By default, the Perl interpreter now uses merge-sorting to implement sort, which means that every sort will call the sort block O(N log N) times. For example, suppose you needed to set up a collection of script files for binary-chop searching. In that case, you might need to sort a set of scripts by their SHA-512 digests. Doing that the obvious way is needlessly slow, because each script is likely to be re-SHA'd several times:

    use Digest::SHA qw( sha512 );

    # Sort by SHA-512 digest of scripts
    @sorted_scripts
        = sort { sha512($a) cmp sha512($b) } @scripts;
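One way to avoid the repeated digests is to compute each key once, up front, and have the sort block compare the stored keys. The sketch below is an illustration under assumptions, not necessarily the approach the rest of this section takes: the %sha512_of look-up table, the two counters, and the stand-in @scripts data are all hypothetical.

```perl
use strict;
use warnings;
use Digest::SHA qw( sha512 );

# Stand-in data: fifty tiny "scripts"
my @scripts = map { "print $_;\n" } 1..50;

# Obvious way: two digests per comparison
my $naive_calls = 0;
my @sorted_naive
    = sort { $naive_calls += 2; sha512($a) cmp sha512($b) } @scripts;

# Cached way: exactly one digest per script, computed before sorting
my $cached_calls = 0;
my %sha512_of;
for my $script (@scripts) {
    $sha512_of{$script} = sha512($script);
    $cached_calls++;
}
my @sorted_cached
    = sort { $sha512_of{$a} cmp $sha512_of{$b} } @scripts;

print "naive: $naive_calls digests, cached: $cached_calls digests\n";
```

Both sorts produce the same order; only the number of digest computations differs.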
End of preview. The rest of this section is not available here.
Reversing Lists
Content preview
Use reverse to reverse a list.
By default, the sort builtin sorts strings by ascending ASCII sequence. To make it sort by descending sequence instead, you might write:

    @sorted_results = sort { $b cmp $a } @unsorted_results;
But the operation would be much more comprehensible if you wrote:

            

    @sorted_results = reverse sort @unsorted_results;

         
That is, if you sorted using the default ordering and then reversed the sorted results afterwards.
Interestingly, in many versions of Perl, it's just as fast (or occasionally even faster) to use an explicitly reversed sort. In recent releases, the reverse sort sequence is recognized and optimized. In older releases, sorting with any explicit block was not optimized, so calling sort without a block is significantly faster, even when the extra cost of the reverse is taken into account.
Another situation in which reversing a list can significantly improve maintainability, without seriously compromising performance, is when you need to iterate "downwards" in a for loop. Instead of writing:

    for (my $remaining=$MAX; $remaining>=$MIN; $remaining--) {
        print "T minus $remaining, and counting...\n";
        sleep $INTERVAL;
    }
write:

            

    for my $remaining (reverse $MIN..$MAX) {
        print "T minus $remaining, and counting...\n";
        sleep $INTERVAL;
    }

         
This approach makes it clear that you intended to count in reverse, as well as making the precise range of $remaining much easier to determine. And, once again, the difference in iteration speed is usually not even noticeable.
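Both uses of reverse can be checked with a few lines of throwaway code. This is a minimal sketch: the sample data and the variable names @by_block, @by_reverse, and @countdown are assumptions made for the illustration.

```perl
use strict;
use warnings;

my @unsorted_results = qw( pear apple fig cherry );   # hypothetical data

# The two spellings of a descending string sort give the same result
my @by_block   = sort { $b cmp $a } @unsorted_results;
my @by_reverse = reverse sort @unsorted_results;
print "@by_reverse\n";      # prints: pear fig cherry apple

# A reversed range visits exactly $MIN..$MAX, from the top down
my ($MIN, $MAX) = (1, 5);
my @countdown = reverse $MIN..$MAX;
print "@countdown\n";       # prints: 5 4 3 2 1
```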
The loop itself is also more robust. In the first version, the C-like for relies on correct coordination among its three components to achieve the appropriate iteration behaviour. But in the second version, the Perl-like
End of preview. The rest of this section is not available here.
Reversing Scalars
Content preview
Use scalar reverse to reverse a scalar.
The reverse function can also be called in scalar context to reverse the characters in a single string:

    my $visible_email_address = reverse $actual_email_address;
However, it's better to be explicit that a string reversal is intended there, by writing:

            

    my $visible_email_address = scalar reverse $actual_email_address;

         
Both of these examples happen to work correctly, but leaving off the scalar specifier can cause problems in code like this:

    add_email_addr(reverse $email_address);
which will not reverse the string inside $email_address. That particular call to reverse is in the argument list of a subroutine. That means it's in list context, so it reverses the order of the (one-element) list that it's passed. Reversing a one-element list gives you back the same list, in the same order, with the same single element unaltered by the reordering.
In such cases, you're working against the native context, so you have to be explicit:

            

    add_email_addr(scalar reverse $email_address);

         
Rather than having to puzzle out contexts every time you want to reverse a string, it's much easier—and more reliable—to develop the habit of always explicitly specifying a scalar reverse when that's what you want.
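The difference between the two contexts shows up as soon as the reversed value is passed to a subroutine. In this sketch, first_arg() is a hypothetical stand-in for add_email_addr(), and the address is made up.

```perl
use strict;
use warnings;

# Hypothetical stand-in for add_email_addr(): returns its first argument
sub first_arg {
    my ($arg) = @_;
    return $arg;
}

my $email_address = 'user@example.com';

# Argument lists are list context: a one-element list is "reversed"
my $list_context = first_arg(reverse $email_address);

# An explicit scalar forces a string reversal
my $scalar_context = first_arg(scalar reverse $email_address);

print "$list_context\n";        # prints: user@example.com
print "$scalar_context\n";      # prints: moc.elpmaxe@resu
```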
End of preview. The rest of this section is not available here.
Fixed-Width Data
Content preview
Use unpack to extract fixed-width fields.
Fixed-width text data:

            

    X123-S000001324700000199
    SFG-AT000000010200009099
    Y811-Q000010030000000033

         
is still widely used in many data processing applications. The obvious way to extract this kind of data is with Perl's built-in substr function. But the resulting code is unwieldy and surprisingly slow:

            

    # Specify field locations...
    Readonly my %FIELD_POS => (ident=>0,  sales=>6,   price=>16);
    Readonly my %FIELD_LEN => (ident=>6,  sales=>10,  price=>8);

    # Grab each line/record...
    while (my $record = <$sales_data>) {

        # Extract each field...
        my $ident = substr($record, $FIELD_POS{ident}, $FIELD_LEN{ident});
        my $sales = substr($record, $FIELD_POS{sales}, $FIELD_LEN{sales});
        my $price = substr($record, $FIELD_POS{price}, $FIELD_LEN{price});

        # Append each record, translating ID codes and
        # normalizing sales (which are stored in 1000s)...
        push @sales, {
            ident => translate_ID($ident),
            sales => $sales * 1000,
            price => $price,
        };
    }
Using regexes to capture the various fields produces slightly cleaner code, but the matches are still not optimally fast:

            

    # Specify order and lengths of fields...
    Readonly my $RECORD_LAYOUT
        => qr/\A (.{6}) (.{10}) (.{8}) /xms;

    # Grab each line/record...
    while (my $record = <$sales_data>) {

        # Extract all fields...
        my ($ident, $sales, $price)
            = $record =~ m/ $RECORD_LAYOUT /xms;

        # Append each record, translating ID codes and
        # normalizing sales (which are stored in 1000s)...
        push @sales, {
            ident => translate_ID($ident),
            sales => $sales * 1000,
            price => $price,
        };
    }
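For comparison, here is a minimal sketch of the unpack approach that the guideline recommends. The template string 'A6 A10 A8' is an assumption derived from the field widths used above (6, 10, and 8 characters), and the records are held in an array rather than read from a filehandle so that the example is self-contained.

```perl
use strict;
use warnings;

# Assumed layout: three ASCII fields of 6, 10, and 8 characters
my $RECORD_LAYOUT = 'A6 A10 A8';

my @records = qw(
    X123-S000001324700000199
    SFG-AT000000010200009099
    Y811-Q000010030000000033
);

my @fields_of_first;
for my $record (@records) {
    # Extract all fields in one call...
    my ($ident, $sales, $price) = unpack $RECORD_LAYOUT, $record;
    print "$ident | $sales | $price\n";

    # Remember the first record's fields for inspection
    @fields_of_first = ($ident, $sales, $price) if !@fields_of_first;
}
```

The layout is stated once, in a single string, instead of being spread across two hashes or a regex.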
End of preview. The rest of this section is not available here.
Separated Data
Content preview
Use split to extract simple variable-width fields.
For data that is laid out in fields of varying width, with defined separators (such as tabs or commas) between the fields, the most efficient way to extract those fields is using a split. For example, if a single comma is the field separator:

            

               

                  

    # Specify field separator...
    Readonly my $RECORD_SEPARATOR => q{,};
    Readonly my $FIELD_COUNT      => 3;

    # Grab each line/record...
    while (my $record = <$sales_data>) {
        chomp $record;

        # Extract all fields...
        my ($ident, $sales, $price)
            = split $RECORD_SEPARATOR, $record, $FIELD_COUNT+1;

        # Append each record, translating ID codes and
        # normalizing sales (which are stored in 1000s)...
        push @sales, {
            ident => translate_ID($ident),
            sales => $sales * 1000,
            price => $price,
        };
    }

         
Note the use of the third argument to split. Typically, split is called with only two arguments: the separator itself ($RECORD_SEPARATOR), and then the string from which the fields are to be split out ($record). If a third argument is provided, however, it specifies the maximum number of distinct fields that the split should return.
It's good practice to always provide this extra information if it's known, because otherwise split splits its input as many times as possible, builds a (potentially very long) list of the results, and returns it. The assignment would then throw away all but the first three elements of the returned list, so it's a (potentially very expensive) waste of time to create them in the first place.
In some circumstances, the optimizer can work out how many return values you were expecting, and will automatically supply the third argument itself. However, being explicit is still the better practice, because your code will stay efficient when someone later modifies your statement to something that isn't automatically optimized.
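The effect of the third argument is easiest to see on a record that has more fields than the code needs. This is a small sketch with a made-up record; the arrays @limited and @unlimited exist only for the comparison.

```perl
use strict;
use warnings;

my $RECORD_SEPARATOR = q{,};
my $FIELD_COUNT      = 3;

# Hypothetical record with more fields than the code needs
my $record = 'X123-S,13247,199,unused,also unused';

# With a limit, splitting stops after $FIELD_COUNT separators;
# the unsplit remainder becomes the final element
my @limited = split $RECORD_SEPARATOR, $record, $FIELD_COUNT+1;

# Without a limit, every field is split out, needed or not
my @unlimited = split $RECORD_SEPARATOR, $record;

print scalar(@limited),   " elements with limit\n";      # 4
print scalar(@unlimited), " elements without limit\n";   # 5
print "remainder: $limited[-1]\n";       # remainder: unused,also unused
```

The limit is $FIELD_COUNT+1 rather than $FIELD_COUNT so that any surplus data ends up in a separate final element instead of being glued onto the third field.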
End of preview. The rest of this section is not available here.
Variable-Width Data
Content preview
Use Text::CSV_XS to extract complex variable-width fields.
Perl's built-in functions aren't always the right answer. Using split to extract variable-width fields is efficient and easy, provided those fields really are always delimited by a simple separator. More often though, even if your records start out as purely comma-delimited:

            

    Readonly my $RECORD_SEPARATOR => q{,};
    Readonly my $FIELD_COUNT      => 3;

    my ($ident, $sales, $price) = split $RECORD_SEPARATOR, $record, $FIELD_COUNT+1;

         
it soon becomes necessary to extend the format rules to cope with human vagaries (such as ignoring whitespace around commas):

    Readonly my $RECORD_SEPARATOR => qr/\s* , \s*/xms;
    Readonly my $FIELD_COUNT      => 3;

    my ($ident, $sales, $price) = split $RECORD_SEPARATOR, $record, $FIELD_COUNT+1;
Or else someone will need to include a comma in a field and will decide to escape it with a backslash, in which case you'll need:

    Readonly my $RECORD_SEPARATOR => qr/ \s* (?<!\\) , \s* /xms;  # Unbackslashed comma

         
And from there it's "Oh, we ought to be able to backslash a backslash too" and then "Hey, let's allow double-quoted fields so we don't have to backslash any of the commas in them". At which point your attempts to write a suitable separator regex for split have become a whirling vortex of pain, as you struggle to reinvent the "Comma-Separated Values" encoding. Badly.
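The first step into that vortex can be demonstrated with the "unbackslashed comma" separator shown above. In this sketch, both sample records are invented, and the second assumes a convention in which a doubled backslash stands for one literal backslash.

```perl
use strict;
use warnings;

# The look-behind separator from above: a comma not preceded by a backslash
my $RECORD_SEPARATOR = qr/ \s* (?<!\\) , \s* /xms;

# Hypothetical record with a backslashed comma inside the first field
my $escaped_comma = 'Smith\, J., 13247, 199';
my @ok = split $RECORD_SEPARATOR, $escaped_comma;
print scalar(@ok), " fields\n";         # 3 fields, as intended

# Hypothetical record whose first field ends in a backslashed backslash.
# The comma after it is a genuine separator, but the look-behind sees
# a backslash and refuses to split there
my $escaped_backslash = 'dir\\\\,13247,199';
my @broken = split $RECORD_SEPARATOR, $escaped_backslash;
print scalar(@broken), " fields\n";     # only 2 fields
```

Handling that case correctly means counting backslashes, which a fixed look-behind cannot do.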
The split function is ideal for simple cases, but scales very poorly when some variant of CSV is being parsed. As soon as your record format goes beyond a simple separator that can be recognized with a (non-lookbehind) regex, consider whether you can respecify your data format and rewrite your code to use the Text::CSV_XS module instead:
End of preview. The rest of this section is not available here.
String Evaluations
Content preview
Avoid string eval.
There are numerous reasons why the string form of eval:

    use English qw( -no_match_vars );

    eval $source_code;
    croak $EVAL_ERROR if $EVAL_ERROR;
    # ALWAYS check for an error after any eval

         
is better avoided. For a start, it has to re-invoke the parser and the compiler every time you call it, so it can be expensive and can cause unexpected processing delays, especially if the eval is inside a loop.
More importantly, a string eval doesn't provide compile-time warnings on the code that it creates. It does produce run-time warnings, of course, but encountering those warnings then depends on the thoroughness of your testing regime (see Chapter 18).
This is a serious problem, because writing code that generates other code that is then eval'd is typically much harder (and therefore more error-prone) than writing normal code. And code-generating code is likewise very much harder to maintain.
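The late discovery of problems is easy to reproduce. In the following sketch, the string in $source_code is a made-up piece of "generated" code containing a deliberate syntax error. The enclosing program compiles cleanly, and the mistake only surfaces when the eval is actually executed.

```perl
use strict;
use warnings;
use English qw( -no_match_vars );

# Hypothetical generated code with a deliberate syntax error
my $source_code = 'my $total = 1 +;';

# Nothing is reported until this statement is reached at run time
eval $source_code;
my $error = $EVAL_ERROR;

print "eval failed at run time: $error" if $error;
```

If the eval sat on a rarely used branch, the error would stay hidden until a test (or a user) happened to exercise that branch.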
Perhaps the most common rationale for using a string eval is to create new subroutines that are built around some expression the user supplies. For example, you might need to generate a range of sorting routines using different, user-provided keys. Example 8-1 demonstrates how to do that with a string eval.
Example 8-1. Creating subroutines via run-time compilation

sub make_sorter {
    my ($subname, $key_code) = @_;
    my $package = caller();

    # Create and compile the source of a new subroutine in the caller's namespace
    eval qq{
        # Go to the caller's namespace...
        package $package;

        # Define a subroutine of the specified name...
        sub $subname {

            # That subroutine does a Schwartzian transform...
            return map  { \$_->[0] }                    # 3. Return original value
                   sort { \$a->[1] cmp \$b->[1] }
End of preview. The rest of this section is not available here.
Automating Sorts
Content preview
Consider building your sorting routines with Sort::Maker.
Using a subroutine like make_sorter() to create efficient sorts is a very good practice. It allows you to focus on specifying your sort criteria correctly, instead of on the mechanics of sorting. It also factors out the comparatively large amounts of coding infrastructure needed to optimize your sorts.
You don't even have to write make_sorter() yourself. The Sort::Maker CPAN module provides a very sophisticated implementation of the subroutine. It has options for building sorting subroutines using Orcish or Schwartzian optimizations, as well as the more advanced Guttman-Rosler Transform.
Using the module, Example 8-2 could be simplified to:

            

    use Sort::Maker;

    # Create sort subroutines (ST flag enables Schwartzian transform)...
    make_sorter(name => 'sort_sha', code => sub{ sha512($_)    }, ST => 1 );
    make_sorter(name => 'sort_ids', code => sub{ /ID:(\d+)/xms }, ST => 1 );
    make_sorter(name => 'sort_len', code => sub{ length        }, ST => 1 );

    # and later...

    @names_shortest_first = sort_len(@names);
    @names_digested_first = sort_sha(@names);
    @names_identity_first = sort_ids(@names);

         
Note that, unlike the version shown in Example 8-2, the make_sorter() subroutine provided by Sort::Maker supports a large set of options, and so uses labeled arguments (see Chapter 9).
The module even has a declarative syntax for creating commonly needed sorts. For example, to create a sort_max_first() subroutine that sorts its argument list in descending numeric order:

            

    make_sorter( name => 'sort_max_first', qw( plain number descending ) ) ;

         
The Sort::Maker module is highly recommended.
End of preview. The rest of this section is not available here.
Substrings
Content preview
Use 4-arg substr instead of lvalue substr.
The substr builtin is unusual in that it can be used as an lvalue (i.e., a target of assignment). So you can write things like:

    substr($addr, $country_pos, $COUNTRY_LEN)
        = $country_name{$country_code};
This statement first locates the substring of the string in $addr which starts at $country_pos and runs for $COUNTRY_LEN characters. Then that substring is replaced with the string in $country_name{$country_code}. Effectively, it's an assignment into part of the string value in a variable.
But to readers who are unused to this particular feature, an assignment to a function call can be confusing, or even scary, and therefore less comprehensible. So substr assignments become an issue of maintainability.
Of course, it's not hard to look up the perlfunc manual and learn about the special semantics of substr assignments, so their impact on maintainability is marginal. Then again, almost every maintainability issue is, by itself, marginal. It's only collectively that subtleties, clevernesses, and esoterica begin to sabotage comprehensibility. And it's only collectively that obviousness, straightforwardness, and conformity to standards can help to enhance it. Every small choice when coding contributes in one direction or the other.
However you choose to assess their cognitive load, there is another problem with assignments to substrings: they're relatively slow. The call to substr has to locate the required substring, create an interim representation of it, return that interim representation, perform the assignment to it, re-identify the required substring, and then replace it.
To avoid those extra steps, in Perl 5.6.1 and later substr also comes in a four-argument model. That is, if you provide a fourth argument to the function, that argument is used as the string with which to replace the substring identified by the first three arguments. So the previous example could be rewritten more efficiently as:
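A minimal sketch of the four-argument form, set beside the lvalue form, is shown below. The address, position, and country table are hypothetical values chosen for the illustration, and two copies of the string are used so that the results can be compared.

```perl
use strict;
use warnings;

# Hypothetical data for the illustration
my $addr         = 'Canberra, AU';
my $country_pos  = 10;
my $COUNTRY_LEN  = 2;
my $country_code = 'AU';
my %country_name = ( AU => 'Australia' );

# Lvalue form: assign to the result of substr
my $addr_lvalue = $addr;
substr($addr_lvalue, $country_pos, $COUNTRY_LEN)
    = $country_name{$country_code};

# Four-argument form: pass the replacement as the fourth argument
my $addr_4arg = $addr;
substr $addr_4arg, $country_pos, $COUNTRY_LEN, $country_name{$country_code};

print "$addr_4arg\n";       # prints: Canberra, Australia
```

Both statements leave the same string behind; the second simply says "replace" in a single function call.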
End of preview. The rest of this section is not available here.
Hash Values
Content preview
Make appropriate use of lvalue values.
Another builtin that can sometimes be used in an lvalue manner is the values function for hashes, though only in Perl 5.005_04 and later. Specifically, in recent Perls the values function returns a list of the original values of the hash, not a list of copies (as it did in Perl 5.005_03 and earlier).
This list of lvalues cannot be used in direct assignments:

    values(%seen_files) = ();    # Compile-time error

         
but it can be used indirectly: in a for loop. That is, if you need to transform every value of a hash in some generic fashion, you don't have to index repeatedly within a loop:

    for my $party (keys %candidate_for) {
        $candidate_for{$party} =~ s{($MATCH_ANY_NAME)}
                                   {\U$1}gmxs;
    }
You can just use the result of values as individual lvalues:

            

    for my $candidate (values %candidate_for) {
        $candidate =~ s{($MATCH_ANY_NAME)}
                       {\U$1}gxms;
    }

         
The performance of the values-based version is also better. The loop's iterator variable is directly aliased to each hash value, so there's no need for (expensive) hash look-ups inside the loop.
Stick with the indexing approach, however, if your code also has to support pre-5.6 compilers.
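The aliasing is easy to confirm: changes made through the loop variable show up in the hash itself. This sketch uses a made-up %candidate_for and a simple ucfirst in place of the substitution above.

```perl
use strict;
use warnings;

# Hypothetical data for the illustration
my %candidate_for = ( left => 'smith', right => 'jones' );

# Each $candidate is an alias for a value stored in the hash,
# so assigning to it updates the hash in place
for my $candidate (values %candidate_for) {
    $candidate = ucfirst $candidate;
}

print "$candidate_for{left}, $candidate_for{right}\n";   # prints: Smith, Jones
```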
End of preview. The rest of this section is not available here.
Globbing
Content preview
Use glob, not <...>.
The <...> syntax is heavily associated with I/O in most people's minds. So something like this:

    my @files = <*.pl>;
is easy to mistake for a normal readline operation:

            

    my @files = <$fh>;

         
Unfortunately, the first version isn't an input operation at all. Angle brackets are input operators only when they're empty (<>), or when they contain a bareword identifier (<DATA>), or when they contain a simple scalar variable (<$input_file>). If anything else appears inside the angles, they perform shell-based directory look-up instead.
In other words, the <*.pl> operation takes the contents of the angle brackets (i.e., *.pl), passes them to the csh system shell, collects the list of filenames that match this shell pattern, and returns those names.
It's not bad enough that this "file glob" is easy to confuse with a popular I/O operation. Far worse, if you apply other best practices when writing it—such as factoring the fixed shell pattern out into a named constant—it suddenly transforms into the very I/O operation it previously only looked like:

    Readonly my $FILE_PATTERN => '*.pl';

    # and later...

    my @files = <$FILE_PATTERN>;    # KABOOM! (probably)

         
As mentioned earlier, a scalar variable in angles is one of the three valid forms that invoke a readline call in Perl, which means that the refactored operation isn't a file glob specification any more. Instead, the angles attempt to do a readline, discover that $FILE_PATTERN contains the string '*.pl', and head straight off to the symbol table looking for a filehandle of that name. Unless the coder has been truly evil, there won't be such a filehandle and, instead of the expected file list appearing in @files, a 'readline() on unopened filehandle' exception will be thrown.
A construct that breaks when you attempt to improve its readability is, by definition, unmaintainable. The file globbing operation has a proper name:
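That name is glob, as the guideline at the start of this section says. What follows is a sketch under assumptions rather than the original example: it builds a throwaway directory with File::Temp, creates three hypothetical files in it, and shows that the built-in glob function keeps working when the pattern lives in a variable. It also assumes the temporary directory's path contains no whitespace, because glob treats whitespace as a separator between patterns.

```perl
use strict;
use warnings;
use File::Temp qw( tempdir );

# Set up a scratch directory containing two .pl files and one other file
my $dir = tempdir( CLEANUP => 1 );
for my $name (qw( a.pl b.pl notes.txt )) {
    open my $fh, '>', "$dir/$name"
        or die "Can't create $dir/$name: $!";
    close $fh
        or die "Can't close $dir/$name: $!";
}

# The pattern can be factored out into a variable without changing meaning
my $FILE_PATTERN = "$dir/*.pl";

my @files = glob $FILE_PATTERN;

print scalar(@files), " Perl files found\n";    # 2 Perl files found
```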
End of preview. The rest of this section is not available here.
Sleeping
Content preview
Avoid a raw select for non-integer sleeps.
Perl's built-in sleep function will only pause your program for an integer number of seconds, even if you give it a floating-point duration:

    sleep 1.5;          # same as sleep(int(1.5)), so sleeps 1 second

         
Worse still, if you ask it to sleep only a fraction of a second, it's effectively a no-op:

    sleep 0.5;          # same as sleep(int(0.5)), so sleeps 0 seconds

         
Some systems are not capable of sleeping for fractions of a second, but if yours is, the easiest way to achieve that is to use the Time::HiRes module (which comes standard in Perl 5.8 and later):

            

    use Time::HiRes qw( sleep );
    sleep 0.5;          # now sleeps half a second

               

            

         
For even more accuracy (within the limitations of your underlying platform), you can import the Time::HiRes::usleep() function instead and specify the length of your nap as an integral number of microseconds:

            

    use Time::HiRes qw( usleep );
    usleep 500_001;     # now sleeps just over half a second
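If you want to confirm that fractional sleeps really happen on your platform, Time::HiRes can also supply a floating-point time. The sketch below is only an illustration: the quarter-second duration and the $start and $elapsed variables are arbitrary choices, and the measured figure will vary from run to run.

```perl
use strict;
use warnings;
use Time::HiRes qw( sleep time );

# Both sleep and time now work in fractions of a second
my $start = time();
sleep(0.25);
my $elapsed = time() - $start;

printf "slept for about %.2f seconds\n", $elapsed;
```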

               

            

         
Prior to the availability of the Time::HiRes module, the usual way to sleep for fractions of seconds was to use a side effect of Perl's built-in select function. The select function is supposed to poll sets of I/O streams to determine which of them are ready for reading or writing, and which have exceptions pending.
But the most useful part of this builtin turned out to be its fourth argument, which is supposed to tell select how long to conduct its poll before timing out. It was quickly realized that because this timeout value could be specified in fractions of a second, if select was called with a timeout value but without any streams to poll, like so:
[End of preview. The rest of this section is not available here.]
Mapping and Grepping
Always use a block with a map and grep.
The map and grep builtins each have two valid syntaxes:

    map BLOCK LIST           grep BLOCK LIST

    map EXPR, LIST           grep EXPR, LIST

         
That is, the code that tells map how to transform a list, or tells grep how to filter it, can be specified either as a single expression or in a block.
But when the first argument to a map or grep is specified as an expression, it becomes harder to distinguish from the remaining arguments:

    print grep valid($_), @candidates;



    @args = map substr($_, 0, 1), @flags, @files, @options;
The block form makes the transform or filter stand out more clearly:

            

    print grep { valid($_) } @candidates;



    @args = map {substr $_, 0, 1} @flags, @files, @options;

         
Using a block also avoids mistakes like:

    @args = map substr $_, 0, 1, @flags, @files, @options;
Here the programmer seems to have thought that substr would somehow work out that it should consume only the first three arguments ($_, 0, 1), to magically produce an "extract the first character" expression that the map can then apply to the remaining arguments. Unfortunately, what happens instead is that the compiler notices that substr was given six arguments and complains:

    Too many arguments for substr at demo.pl line 42, near "@options;"
Using the block form instead:

            

    @args = map {substr $_, 0, 1} @flags, @files, @options;

         
makes the intent clear—both to the compiler and to subsequent readers.
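A small runnable sketch of the block forms, using made-up sample data (the contents of @flags and @files are assumptions for illustration):

```perl
use strict;
use warnings;

my @flags = qw( -verbose -debug );
my @files = qw( alpha.txt beta.txt );

# The block clearly marks where the transform ends and the list begins...
my @initials = map { substr $_, 0, 1 } @flags, @files;

# Likewise for a filter...
my @long_names = grep { length($_) > 6 } @flags, @files;

print "Initials:   @initials\n";
print "Long names: @long_names\n";
```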
More importantly, the expression forms of map and grep don't scale well as their transforms or filters become more complicated. If additional statements need to be added to a
[End of preview. The rest of this section is not available here.]
Utilities
Use the "non-builtin builtins".
This guideline covers a number of common wheels that ought not be re-invented. Perl itself encourages the re-use of existing wheels by providing so many built-in functions in the first place. But there are a few gaps in its coverage; a few common tasks that it doesn't provide a convenient builtin to handle.
That's where the Scalar::Util, List::Util, and List::MoreUtils modules can help. They provide commonly needed list and scalar processing functions, which are implemented in C for performance. Scalar::Util and List::Util are part of the Perl standard library (since Perl 5.8), and all three are also available on CPAN.
The Scalar::Util module provides the following functions:
blessed $scalar
If $scalar contains a reference to an object, blessed() returns a true value (specifically, the name of the class). Otherwise, it returns undef.
refaddr $scalar
If $scalar contains a reference, refaddr() returns an integer representing the memory address that reference points to. If $scalar doesn't contain a reference, the subroutine returns undef. This result is useful for generating unique identifiers for variables or objects (see Chapter 15).
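The two functions described so far can be tried out with a short sketch like this one. The class name Demo::Widget and the variable names are hypothetical.

```perl
use strict;
use warnings;
use Scalar::Util qw( blessed refaddr );

my $object    = bless {}, 'Demo::Widget';   # hypothetical class name
my $array_ref = [1, 2, 3];
my $alias     = $array_ref;                 # second reference to the same array

# blessed() gives the class name for objects, undef for anything else...
my $class_of_object = blessed($object);
my $class_of_array  = blessed($array_ref);

# refaddr() gives the same integer for references to the same thing,
# and undef for non-references...
my $same_address  = refaddr($array_ref) == refaddr($alias);
my $addr_of_plain = refaddr('not a reference');

print "Object class: $class_of_object\n";
print "Array ref is ", (defined $class_of_array ? 'blessed' : 'not blessed'), "\n";
```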
[End of preview. The rest of this section is not available here.]
Chapter 9: Subroutines
If you have a procedure with ten parameters,
you probably missed some.
—Alan Perlis
Subroutines are one of the two primary problem-decomposition tools available in Perl, modules being the other. They provide a convenient and familiar way to break a large task down into pieces that are small enough to understand, concise enough to implement, focused enough to test, and simple enough to debug.
In effect, subroutines allow programmers to extend the Perl language, creating useful new behaviours with sensible names. Having written a subroutine, you can immediately forget about its internals, and focus solely on the abstracted process or function it implements.
So the extensive use of subroutines helps to make a program more modular, which in turn makes it more robust and maintainable. Subroutines also make it possible to structure the actions of programs hierarchically, at increasingly high levels of abstraction, which improves the readability of the resulting code.
That's the theory, at least. In practice, there are plenty of ways that using subroutines can make code less robust, buggier, less concise, slower, and harder to understand. The guidelines in this chapter focus on avoiding those outcomes.
[End of preview. The rest of this section is not available here.]
Call Syntax
Call subroutines with parentheses but without a leading &.
It's possible to call a subroutine without parentheses, if it has already been declared in the current namespace:

    sub coerce;





    

    # and later...



    my $expected_count = coerce $input, $INTEGER, $ROUND_ZERO;
But that approach can quickly become much harder to understand:

    fix my $gaze, upon each %suspect;
More importantly, leaving off the parentheses on subroutines makes them harder to distinguish from builtins, and therefore increases the mental search space when the reader is confronted with either type of construct. Your code will be easier to read and understand if the subroutines always use parentheses and the built-in functions always don't:

            

    my $expected_count = coerce($input, $INTEGER, $ROUND_ZERO);



    fix(my $gaze, upon(each %suspect));

         
Some programmers still prefer to call a subroutine using the ancient Perl 4 syntax, with an ampersand before the subroutine name:

    &coerce($input, $INTEGER, $ROUND_ZERO);



    &fix(my $gaze, &upon(each %suspect));
Perl 5 does support that syntax, but nowadays it's unnecessarily cluttered. Barewords are forbidden under use strict, so there are far fewer situations in which a subroutine call has to be disambiguated.
On the other hand, the ampersand itself is visually ambiguous; it can also signify a bitwise AND operator, depending on context. And context can be extremely subtle:

    $curr_pos  = tell &get_mask();    # means: tell(get_mask())

    $curr_time = time &get_mask();    # means: time() & get_mask()
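The second case can be checked directly. In this sketch get_mask() is a hypothetical stand-in that returns 0xFF, so if the ampersand is parsed as a bitwise AND, the result can never exceed 255:

```perl
use strict;
use warnings;

# Hypothetical stand-in: a mask that keeps only the low eight bits...
sub get_mask { return 0xFF; }

# Looks like a call to get_mask() passed to time, but 'time' takes no
# arguments, so Perl parses this as: time() & get_mask()
my $low_bits = time &get_mask();

print "Low eight bits of the current time: $low_bits\n";
```

Writing the call as get_mask() without the ampersand, and the operator with spaces around it, removes the visual ambiguity.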

         
Prefixing with & can also lead to other subtle (but radical) differences in behaviour:

    sub fix {

        my (@args) = @_ ? @_ : $_;    # Default to fixing $_ if no args provided



        # Fix each argument by grammatically transforming it and then printing it...
[End of preview. The rest of this section is not available here.]
Homonyms
Don't give subroutines the same names as built-in functions.
If you declare a subroutine with the same name as a built-in function, subsequent invocations of that name will still call the builtin...except when occasionally they don't. For example:

    sub lock {

        my ($file) = @_;

        return flock $file, LOCK_SH;

    }



    sub link {

        my ($text, $url) = @_;

        return qq{<a href="$url">$text</a>};

    }



    lock($file);                   # Calls 'lock' subroutine; built-in 'lock' hidden

    print link($text, $text_url);  # Calls built-in 'link'; 'link' subroutine hidden

         
Perl considers some of its builtins (like link) to be "more built-in" than others (like lock), and chooses accordingly whether to call your subroutine of the same name. If the builtin is "strongly built-in", an ambiguous call will invoke it, in preference to any subroutine of the same name. On the other hand, if the builtin is "weakly built-in", an ambiguous call will invoke the subroutine of the same name instead.
Even if these subroutines did always work as expected, it's simply too hard to maintain code where the program-specific subroutines and the language's keywords overlap:

    sub crypt { return "You're in the tomb of @_\n"   }

    sub map   { return "You have found a map of @_\n" }

    sub chop  { return "You have chopped @_\n"        }

    sub close { return "The @_ is now closed\n"       }

    sub hex   { return "A hex has been cast on @_\n"  }



    print crypt( qw( Vlad Tsepes ) );             # Subroutine or builtin?



    for my $reward (qw( treasure danger) ) {

         print map($reward, 'in', $location);     # Subroutine or builtin?

    }



    print hex('the Demon');                       # Subroutine or builtin?

    print chop('the Demon');                      # Subroutine or builtin?

         
There is an inexhaustible supply of subroutine names available; names that are more descriptive and unambiguous. Use them:
[End of preview. The rest of this section is not available here.]
Argument Lists
Always unpack @_ first.
Subroutines always receive their arguments in the @_ array. But accessing them via $_[0], $_[1], etc. directly is almost always a Very Bad Idea. For a start, it makes the code far less self-documenting:

    Readonly my $SPACE => q{ };



    # Pad a string with whitespace...

    sub padded {

        # Compute the left and right indents required...

        my $gap   = $_[1] - length $_[0];

        my $left  = $_[2] ? int($gap/2) : 0;

        my $right = $gap - $left;



        # Insert that many spaces fore and aft...

        return $SPACE x $left

             . $_[0]

             . $SPACE x $right;

    }
Using "numbered parameters" like this makes it difficult to determine what each argument is used for, whether they're being used in the correct order, and whether the computation they're used in is algorithmically sane. Compare the previous version to this one:

            

    sub padded {
        my ($text, $cols_count, $want_centering) = @_;

        # Compute the left and right indents required...
        my $gap   = $cols_count - length $text;
        my $left  = $want_centering ? int($gap/2) : 0;
        my $right = $gap - $left;

        # Insert that many spaces fore and aft...
        return $SPACE x $left
             . $text
             . $SPACE x $right;
    }

         
Here the first line unpacks the argument array to give each parameter a sensible name. In the process, that assignment also documents the expected order and intended purpose of each parameter. The sensible parameter names also make it easier to verify that the computation of $left and $right is correct.
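To see the named-parameter version in action, here is a self-contained sketch. It swaps the Readonly constant for a plain lexical $SPACE so that it runs with core Perl only, and the sample calls are assumptions for illustration.

```perl
use strict;
use warnings;

my $SPACE = q{ };   # plain lexical stand-in for the Readonly constant

sub padded {
    my ($text, $cols_count, $want_centering) = @_;

    # Compute the left and right indents required...
    my $gap   = $cols_count - length $text;
    my $left  = $want_centering ? int($gap/2) : 0;
    my $right = $gap - $left;

    # Insert that many spaces fore and aft...
    return $SPACE x $left
         . $text
         . $SPACE x $right;
}

my $centered   = padded('abc', 7, 1);   # two spaces either side
my $left_flush = padded('abc', 7, 0);   # four trailing spaces

print "[$centered]\n[$left_flush]\n";
```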
A mistake when using numbered parameters:

        my $gap   = $_[1] - length $_[2];

        my $left  = $_[0] ? int($gap/2) : 0;

        my $right = $gap - $left;
[End of preview. The rest of this section is not available here.]
Named Arguments
Use a hash of named arguments for any subroutine that has more than three parameters.
Better still, use named arguments for any subroutine that is ever likely to have more than three parameters.
Named arguments replace the need to remember an ordering (which humans are comparatively poor at) with the need to remember names (which humans are relatively good at). Names are especially advantageous when a subroutine has many optional arguments—such as flags or configuration switches—only a few of which may be needed for any particular invocation.
Named arguments should always be passed to a subroutine inside a single hash, like so:

            

    sub padded {
        my ($arg_ref) = @_;

        my $gap   = $arg_ref->{cols} - length $arg_ref->{text};
        my $left  = $arg_ref->{centered} ? int($gap/2) : 0;
        my $right = $gap - $left;

        return $arg_ref->{filler} x $left
               . $arg_ref->{text}
               . $arg_ref->{filler} x $right;
    }

    # and then...
    for my $line (@lines) {
        $line = padded({ text=>$line, cols=>20, centered=>1, filler=>$SPACE });
    }

         
As tempting as it may be, don't pass them as a list of raw name/value pairs:

    sub padded {

        my %arg = @_;



        my $gap   = $arg{cols} - length $arg{text};

        my $left  = $arg{centered} ? int($gap/2) : 0;

        my $right = $gap - $left;



        return $arg{filler} x $left

               . $arg{text}

               . $arg{filler} x $right;

    }





    

    # and then...

    for my $line (@lines) {

        $line = padded( text=>$line, cols=>20, centered=>1, filler=>$SPACE );

    }
Requiring the named arguments to be specified inside a hash ensures that any mismatch, such as:

            

    $line = padded({text=>$line, cols=>20..21, centered=>1, filler=>$SPACE});
[End of preview. The rest of this section is not available here.]
Missing Arguments
Use definedness or existence to test for missing arguments.
It's a common mistake to use a boolean test to probe for missing arguments:

    Readonly my $FILLED_USAGE => 'Usage: filled($text, $cols, $filler)';



    sub filled {

        my ($text, $cols, $filler) = @_;



        croak $FILLED_USAGE

            if !$text || !$cols || !$filler;



        # [etc.]

    }
The problem is that this approach can fail in subtle ways. If, for example, the filler character is '0' or the text to be padded is an empty string, then an exception will incorrectly be thrown.
A much more robust approach is to test for definedness:

            

    use List::MoreUtils qw( any );

    sub filled {
        my ($text, $cols, $filler) = @_;

        croak $FILLED_USAGE
            if any {!defined $_} $text, $cols, $filler;

        # [etc.]
    }

         
Or, if a particular number of arguments is required, and undef is an acceptable value for one of them, test for mere existence:

            

    sub filled {
        croak $FILLED_USAGE if @_ != 3;   # All three args must be supplied

        my ($text, $cols, $filler) = @_;

        # etc.
    }

         
Existence tests are particularly efficient because they can be applied before the argument list is even unpacked. Testing for the existence of arguments also promotes more robust coding, in that it prevents callers from carelessly omitting a required argument, and from accidentally providing any extras.
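The following sketch exercises an existence test on the edge cases mentioned earlier (an empty text and a '0' filler). The body of filled() is elided in the text above, so the right-filling logic here is a hypothetical stand-in, and a plain lexical replaces the Readonly constant.

```perl
use strict;
use warnings;
use Carp qw( croak );

my $FILLED_USAGE = 'Usage: filled($text, $cols, $filler)';

sub filled {
    croak($FILLED_USAGE) if @_ != 3;   # All three args must be supplied

    my ($text, $cols, $filler) = @_;

    # Hypothetical body: pad $text on the right out to $cols columns...
    return $text . $filler x ($cols - length $text);
}

# An empty string and a '0' filler are false, but they are still arguments...
my $zero_filled = filled(q{}, 3, '0');

# Omitting an argument is caught before @_ is even unpacked...
my $call_survived = eval { filled('abc', 5); 1 };
my $error_message = $@;

print "Zero-filled: '$zero_filled'\n";
print "Two-argument call was rejected\n" if !$call_survived;
```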
Note that existence tests can also be used when some arguments are optional, because the recommended practice for this case—passing options in a hash—ensures that the actual number of arguments passed is fixed (or fixed-minus-one, if the options hash happens to be omitted entirely):

            
[End of preview. The rest of this section is not available here.]
Default Argument Values
Resolve any default argument values as soon as @_ is unpacked.
The fundamental rule of argument processing is: nothing happens in the subroutine until all the arguments are stable. Don't, for example, add in defaults on the fly:

    Readonly my $DEF_PAGE_WIDTH => 78;

    Readonly my $SPACE          => q{ };



    sub padded {

        my ($text, $arg_ref) = @_;



        # Compute left and right spacings...

        my $gap   = ($arg_ref->{cols}||$DEF_PAGE_WIDTH) - length($text||=$EMPTY_STR);

        my $left  = $arg_ref->{centered} ? int($gap/2) : 0;

        my $right = $gap - $left;



        # Prepend and append space...

        my $filler = $arg_ref->{filler} || $SPACE;

        return $filler x $left . $text . $filler x $right;

    }
Apart from making the gap computation much harder to read and to verify, using the || and ||= operators to select default values is equivalent to testing for truth, and therefore much more prone to error on the edge cases (such as a '0' fill character).
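The edge case is easy to reproduce. In this sketch the caller explicitly asks for a '0' filler; the hash contents and variable names are assumptions for illustration.

```perl
use strict;
use warnings;

my $SPACE   = q{ };                 # plain lexical stand-in for the Readonly constant
my $arg_ref = { filler => '0' };    # caller explicitly asks for a '0' filler

# A truth test silently discards the caller's '0'...
my $filler_by_truth = $arg_ref->{filler} || $SPACE;

# An existence test keeps whatever the caller supplied...
my $filler_by_exists = exists $arg_ref->{filler} ? $arg_ref->{filler} : $SPACE;

print "truth test:     '$filler_by_truth'\n";
print "existence test: '$filler_by_exists'\n";
```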
If default values are needed, set them up first. Separating out any initialization will make your code more readable, and simplifying the computational statements is likely to make them less buggy too:

            

    sub padded {
        my ($text, $arg_ref) = @_;

        # Set defaults...
        #            If option given...          Use option           Else default
        my $cols   = exists $arg_ref->{cols}   ? $arg_ref->{cols}   : $DEF_PAGE_WIDTH;
        my $filler = exists $arg_ref->{filler} ? $arg_ref->{filler} : $SPACE;

        # Compute left and right spacings...
        my $gap   = $cols - length $text;
        my $left  = $arg_ref->{centered} ? int($gap/2) : 0;
        my $right = $gap - $left;

        # Prepend and append space...
        return $filler x $left . $text . $filler x $right;
    }
[End of preview. The rest of this section is not available here.]
Scalar Return Values
Always return scalar in scalar returns.
One of the more subtle features of Perl subroutines is the way that their call context propagates to their return statements. In most places in Perl, the context (list, scalar, or void) can be deduced at compile time. One place where it can't be determined in advance is to the right of a return. The argument of a return is evaluated in whatever context the subroutine itself was called.
That's a very handy feature, which makes it easy to factor out or rename specific uses of built-in functions. For example, if you found yourself repeatedly filtering undefined and negative values out of lists:

            

    @valid_samples = grep {defined($_) && $_ >= 0} @raw_samples;

         
it would be better to encapsulate that complex filter and rename it more meaningfully:

            

    sub valid_samples_in {
        return grep {defined($_) && $_ >= 0} @_;
    }

    # and then...

    @valid_samples = valid_samples_in(@raw_samples);

         
Because the return expression is always evaluated in the same context as the surrounding call, it's also still okay to use this subroutine in scalar context:

            

    if (valid_samples_in(@raw_samples) < $MIN_SAMPLE_COUNT) {

        report_sensor_malfunction();

    }

         
When the subroutine is called in scalar context, its return statement imposes scalar context on the grep, which then returns the total number of valid samples—just as a raw grep would do in the same position.
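Both behaviours can be observed side by side in a short sketch; the sample readings in @raw_samples are made up for illustration.

```perl
use strict;
use warnings;

sub valid_samples_in {
    return grep {defined($_) && $_ >= 0} @_;
}

my @raw_samples = (3, undef, -1, 0, 7);

# List context: the grep returns the surviving values...
my @valid_samples = valid_samples_in(@raw_samples);

# Scalar context: the same grep returns how many values survived...
my $valid_count = valid_samples_in(@raw_samples);

print "Valid samples: @valid_samples\n";
print "Valid count:   $valid_count\n";
```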
Unfortunately, it's easy to forget about the contextual lycanthropy of a return, especially when you write a subroutine that is "only ever going to be used one way". For example:

    sub how_many_defined {

        return grep {defined $_} @_;

    }



    # and "always" thereafter:
[End of preview. The rest of this section is not available here.]
Contextual Return Values
Make list-returning subroutines return the "obvious" value in scalar context.
There is only one kind of list in Perl, so returning in a list context is easy—you just return all the values you produced:

            

    sub defined_samples_in {

        return grep {defined $_} @_;

    }

         
But what should that subroutine return in a scalar context? It might legitimately return an integer count (like grep itself does), in which case the subroutine stays exactly the same:

            

    sub defined_samples_in {

        return grep {defined $_} @_;

    }

         
Or it might instead return some serialized string representation of the list (like localtime does in scalar context):

            

    sub defined_samples_in {
        my @defined_samples = grep {defined $_} @_;

        # Return all defined args in list context...
        if (wantarray) {
            return @defined_samples;
        }

        # Otherwise a serialized version in scalar context...
        return join($COMMA, @defined_samples);
    }

         
Or it might return the "next" value in a series (like readline does):

            

    use List::Util qw( first );

    sub defined_samples_in {
        # Return all defined args in list context...
        if (wantarray) {
            return grep {defined $_} @_;
        }

        # Or, in scalar context, extract the first defined arg...
        return first {defined $_} @_;
    }
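A brief usage sketch of this "next value" variant, calling it in both contexts; the argument values are arbitrary sample data.

```perl
use strict;
use warnings;
use List::Util qw( first );

sub defined_samples_in {
    # Return all defined args in list context...
    if (wantarray) {
        return grep {defined $_} @_;
    }

    # Or, in scalar context, extract the first defined arg...
    return first {defined $_} @_;
}

my @all_defined   = defined_samples_in(undef, 4, undef, 9);   # list context
my $first_defined = defined_samples_in(undef, 4, undef, 9);   # scalar context

print "All defined:   @all_defined\n";
print "First defined: $first_defined\n";
```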

         
It might try to preserve as much information as possible and return the full list of values using an array reference (which no Perl 5 builtin does):

            

    sub defined_samples_in {

        my @defined_samples = grep {defined $_} @_;



        
[End of preview. The rest of this section is not available here.]
Multi-Contextual Return Values
When there is no "obvious" scalar context return value, consider Contextual::Return instead.
Sometimes no single scalar return value is appropriate for a list-returning subroutine. Your play-testers simply can't agree: different developers consistently expect different behaviours in different scalar contexts.
For example, suppose you're implementing a get_server_status() subroutine that normally returns its information as a heterogeneous list:

            

               

                  

    # In list context, return all the available information...

               

    my ($name, $uptime, $load, $users) = get_server_status($server_ID);

         
You may find that, in scalar contexts, some programmers expected it to return its numeric load value:

            

               

                  

    # Total load is sum of individual server loads...

               

    $total_load += get_server_status($server_ID);

         
Others assumed it would return a boolean value indicating whether the server is up:

            

               

                  

    # Skip inactive servers...

               

    next SERVER if ! get_server_status($server_ID);

         
Still others anticipated a string summarizing the current status:

            

               

                  

    # Compile report on all servers...

               

    $servers_summary .= get_server_status($server_ID) . "\n";

         
While a fourth group hoped for a hash-reference, to give them convenient named access to the particular server information they wanted:

            

               

                  

    # Total users is sum of users on each server...
[End of preview. The rest of this section is not available here.]
Prototypes
Don't use subroutine prototypes.
Subroutine prototypes allow you to make use of more sophisticated argument-passing mechanisms than Perl's "usual list-of-aliases" behaviour. For example:

    sub swap_arrays (\@\@) {

        my ($array1_ref, $array2_ref) = @_;



        my @temp_array = @{$array1_ref};

        @{$array1_ref} = @{$array2_ref};

        @{$array2_ref} = @temp_array;



        return;

    }



    # and later...



    swap_arrays(@sheep, @goats);      # Implicitly pass references

         
The problem is that anyone who uses swap_arrays(), and anyone who subsequently has to maintain that code, has to know about that subroutine's special magic. Otherwise, they will quite naturally assume that the two arrays will be flattened into a single list and slurped up by the subroutine's @_, because that's what happens in just about every other subroutine they ever use.
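To see the effect described above, this sketch records how many arguments actually reach the subroutine. The $args_received counter and the array contents are illustrative additions, not part of the original example.

```perl
use strict;
use warnings;

my $args_received;

sub swap_arrays (\@\@) {
    my ($array1_ref, $array2_ref) = @_;

    $args_received = @_;    # illustrative: count what was actually passed

    my @temp_array = @{$array1_ref};
    @{$array1_ref} = @{$array2_ref};
    @{$array2_ref} = @temp_array;

    return;
}

my @sheep = qw( Dolly Shaun );
my @goats = qw( Billy );

# Reads like an ordinary call that would flatten three elements into @_...
swap_arrays(@sheep, @goats);

print "Arguments received: $args_received\n";
print "Sheep: @sheep\nGoats: @goats\n";
```

Nothing at the call site hints that two references, rather than three flattened elements, were passed.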
Using prototypes makes it impossible to deduce the argument-passing behaviour of a subroutine call simply by looking at the call. They also make it impossible to deduce the context in which particular arguments are evaluated. A subtle but common mistake is to "improve" the robustness of an existing library by putting prototype specifiers on all the subroutines. So a subroutine that used to be defined:

            

    use List::Util qw( min max );



    sub clip_to_range {

        my ($min, $max, @data) = @_;



        return map { max( $min, min($max, $_) ) } @data;

    }

         
is updated to:

    sub clip_to_range($$@) {  # takes two scalars and an array

        my ($min, $max, @data) = @_;



        return map { max($min, min($max, $_)) } @data;

    }
The problem is that clip_to_range() was being used with an elegant table-lookup scheme:

            

    my %range = (

        normalized => [-0.5,0.5],

        greyscale  => [0,255],

        percentage => [0,100],

        weighted   => [0,1],

    );



    
[End of preview. The rest of this section is not available here.]
Implicit Returns
Always return via an explicit return.
If a subroutine "falls off the end" without ever encountering an explicit return, the value of the last expression evaluated in a subroutine is returned. That can lead to completely unexpected return values.
For example, consider this subroutine, which is supposed to return the second odd number in its argument list, or undef if there isn't a second odd number in the list:

    sub find_second_odd {

        my $prev_odd_found = 0;



        # Check through args...

        for my $num (@_) {

            # Find an odd number...

            if (odd($num)) {

                # Return it if it's not the first (must be the second)...

                return $num if $prev_odd_found;



                # Otherwise, remember it's been seen...

                $prev_odd_found = 1;

            }

        }

        # Otherwise, fail

    }
When that subroutine is used, strange things happen:

    if (defined find_second_odd(2..6)) {

        # find_second_odd() returns 5

        # so the if block does execute as expected

    }

    if (defined find_second_odd(2..1)) {

        # find_second_odd() returns undef

        # so the if block is skipped as expected

    }



    if (defined find_second_odd(2..4)) {

        # find_second_odd() returns an empty string (!)

        # so the if block is unexpectedly executed

    }



    if (defined find_second_odd(2..3)) {

        # find_second_odd() returns an empty string again (!)

        # so the if block is unexpectedly executed again

    }
The subroutine works correctly when there is a second odd number to be found, and when there are no numbers at all to be considered, but it behaves—there's no other word for it—oddly for the in-between cases. That anomalous empty string is returned because that's what a failed boolean test evaluates to in Perl. And a failed boolean test is the last expression evaluated in the loop. No, not the conditional in:
[End of preview. The rest of this section is not available here.]
Returning Failure
Use a bare return to return failure.
Notice that each final return statement in the examples of the previous guideline used a return keyword with no argument, rather than a more-explicit return undef.
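Those examples fall outside this preview. As a minimal sketch of the idea, here is one way find_second_odd() from the previous guideline could end with a bare return; the odd() helper is a hypothetical stand-in, since its definition is not shown.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the number is odd...
sub odd {
    my ($num) = @_;
    return $num % 2;
}

sub find_second_odd {
    my $prev_odd_found = 0;

    # Check through args...
    for my $num (@_) {
        # Find an odd number...
        if (odd($num)) {
            # Return it if it's not the first (must be the second)...
            return $num if $prev_odd_found;

            # Otherwise, remember it's been seen...
            $prev_odd_found = 1;
        }
    }

    # Otherwise, fail explicitly...
    return;
}

my $second_odd = find_second_odd(2..6);
my $no_second  = find_second_odd(2..4);

print "Second odd in 2..6: $second_odd\n";
print "No second odd in 2..4\n" if !defined $no_second;
```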
Normally, relying on default behaviour is not best practice. But in the case of a return statement, relying on the default return value actually prevents a particularly nasty bug.
The problem with returning an explicit return undef is that—contrary to most people's expectations—a returned undef isn't always false.
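The difference shows up as soon as the result is captured in an array. In this sketch the two subroutine names are hypothetical and do nothing except fail in the two different ways:

```perl
use strict;
use warnings;

sub fail_with_undef  { return undef; }   # explicit undef
sub fail_with_return { return;       }   # bare return

# In list context, 'return undef' yields a one-element list: (undef)...
my @from_undef = fail_with_undef();

# ...whereas a bare return yields an empty list...
my @from_bare = fail_with_return();

print 'return undef gave ', scalar(@from_undef), " element(s)\n";
print 'bare return gave ',  scalar(@from_bare),  " element(s)\n";

# In scalar context both are undefined, and therefore false...
my $scalar_from_bare = fail_with_return();
```

An array holding one element is true in a boolean test, even when that element is undef, which is why the first form can make a failure look like a success.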
Consider a simple subroutine like this:

    use Contextual::Return;
    use List::Util qw( sum );



    sub guesstimate {

        my ($criterion) = @_;



        my @estimates;

        my $failed = 0;



        # [Acquire data for specified criterion]



        return undef if $failed;



        # [Do guesswork based on the acquired data]



        # Return all guesses in list context or average guess in scalar context...

        return (

            LIST   { @estimates                  }

            SCALAR { sum(@estimates)/@estimates; }

        );

    }
The successful return values are both fine, and completely appropriate for the two contexts in which the subroutine might be called. But the failure value is a serious problem. Since guesstimate() specifically tests for calls in list context, it's obvious that the subroutine is expected to be called in list contexts:

            

    if (my @melt_rates = guesstimate('polar melting')) {

        my $model = Std::Climate::Model->new({ polar_melting => \@melt_rates });



        for my $interval (1,2,5,10,50,100,500) {

            print $model->predict({ year => $interval })

        }

    }

         
But if the guesstimate() subroutine fails, it returns a single scalar value: undef. And in a list context (such as the assignment to @melt_rates), that single scalar undef value becomes a one-element list: (undef). So @melt_rates is assigned that one-element list and then evaluated in the overall
[End of preview. The rest of this section is not available here.]
Chapter 10: I/O
On two occasions I have been asked [by members
of Parliament], "Pray, Mr. Babbage, if you put into
the machine wrong figures, will
the right answers come out?"
I am not able rightly to apprehend the kind of
confusion of ideas that could provoke such a question.
—Charles Babbage
Input and output are critical in any design, because they mediate the interface of an application or library. To most users of your software, what your I/O components do is their entire experience of what the software is. So good I/O practices are essential to usability.
I/O operations are also particularly susceptible to inefficiencies, especially on large data sets. I/O is frequently the bottleneck in a system, and usually doesn't scale well. So good I/O practices are essential to performance too.
Yet another concern is that I/O deals with the software's external environment, which is typically less reliable than its own internals. Dealing successfully with the multiple failure modes of operating systems, filesystems, network connections, and human beings requires careful and conservative programming. So good I/O practices are essential to robustness as well.
[End of preview. The rest of this section is not available here.]
Filehandles
Don't use bareword filehandles.
One of the most efficient ways for Perl programmers to bring misery and suffering upon themselves and their colleagues is to write this:

    open FILE, '<', $filename

        or croak "Can't open '$filename': $OS_ERROR";
Using a bareword like that as a filehandle causes Perl to store the corresponding input stream descriptor in the symbol table of the current package. Specifically, the stream descriptor is stored in the symbol table entry whose name is the same as the bareword; in this case, it's *FILE. By using a bareword, the author of the previous code is effectively using a package variable to store the filehandle.
If that symbol has already been used as a filehandle anywhere else in the same package, executing this open statement will close that previous filehandle and replace it with the newly opened one. That's going to be a nasty surprise for any code that was already relying on reading input with <FILE>.
The writer of this particular code also chose the imaginative name FILE for this particular filehandle. That's one of the commonest names used for package filehandles, so the chances of colliding with someone else's open filehandle are greatly enhanced.
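The collision is easy to reproduce. The following is a minimal sketch rather than a listing from the original text: it uses in-memory filehandles (Perl 5.8 or later) so that no files on disk are needed, and the names read_header(), $main_data, and $other_data are hypothetical.

```perl
use strict;
use warnings;
use Carp    qw( croak );
use English qw( -no_match_vars );

# Hypothetical data sources; in-memory files stand in for real ones...
my $main_data  = "alpha\nbeta\n";
my $other_data = "gamma\n";

# A helper elsewhere in the same package that also happens to use FILE...
sub read_header {
    open FILE, '<', \$other_data
        or croak "Can't open in-memory file: $OS_ERROR";
    my $header = <FILE>;
    return $header;
}

# The main code opens FILE and reads its first line...
open FILE, '<', \$main_data
    or croak "Can't open in-memory file: $OS_ERROR";
my $first = <FILE>;

# ...calls the helper, which silently re-opens FILE on different data...
my $header = read_header();

# ...and then finds that the rest of its own stream is gone
my $second = <FILE>;

print defined $second ? "Got: $second" : "Lost the rest of the input\n";
```

The second read in the main code returns undef, because FILE now refers to the helper's stream, which is already at end-of-file.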
As if these pitfalls with bareword filehandles weren't bad enough, barewords are even more unreliable if there's a subroutine of the same name currently in scope. And worse still, under those circumstances they may fail silently. For example:

            

    # Somewhere earlier in the same package (but perhaps in a different file)...
    use POSIX;

    # and later...

    # Open filehandle to the external device...
    open EXDEV, '<', $filename
        or croak "Can't open '$filename': $OS_ERROR";

    # And process data stream...
    while (my $next_reading = <EXDEV>) {
        process_reading($next_reading);
    }
The POSIX module will have quietly exported a subroutine representing the POSIX error-code
End of content preview. The rest of this section is not viewable here.
Indirect Filehandles
Content preview
Use indirect filehandles.
Indirect filehandles provide a much cleaner and less error-prone alternative to bareword filehandles, and from Perl 5.6 onwards they're as easy to use as barewords. Whenever you call open with an undefined scalar variable as its first argument, open creates an anonymous filehandle (i.e., one that isn't stored in any symbol table), opens it, and puts a reference to it in the scalar variable you passed.
So you can open a file and store the resulting filehandle in a lexical variable, all in one statement, like so:

            

    open my $FILE, '<', $filename
        or croak "Can't open '$filename': $OS_ERROR";
The my $FILE embedded in the open statement first declares a new lexical variable in the current scope. That variable is created in an undefined state, so the open fills it with a reference to the filehandle it's just created, as described earlier.
Under versions of Perl prior to 5.6, open isn't able to create the necessary filehandle automatically, so you have to do it yourself, using the gensym() subroutine from the standard Symbol module:

            

    use Symbol qw( gensym );

    # and later...

    my $FILE = gensym();
    open $FILE, '<', $filename
        or croak "Can't open '$filename': $OS_ERROR";
Either way, once the open filehandle is safely stored in the variable, you can read from it like so:

            

    $next_line = <$FILE>;

         
And now it doesn't matter that the name of that filehandle is $FILE (at least, not from the point of view of code robustness). Sure, it's still a lousy, lazy, unimaginative, uninformative name, but now it's a lousy, lazy, unimaginative, uninformative, lexical name, so it won't sabotage anyone else's lousy, lazy, unimaginative, uninformative name.
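To see that two lexical filehandles with the same name really are independent, consider this minimal sketch. It is an illustration under stated assumptions, not a listing from the original text: in-memory filehandles (Perl 5.8 or later) replace real files, and read_header(), $main_data, and $other_data are hypothetical names.

```perl
use strict;
use warnings;
use Carp    qw( croak );
use English qw( -no_match_vars );

# Hypothetical data sources; in-memory files stand in for real ones...
my $main_data  = "alpha\nbeta\n";
my $other_data = "gamma\n";

# A helper that uses its own lexical $FILE...
sub read_header {
    open my $FILE, '<', \$other_data
        or croak "Can't open in-memory file: $OS_ERROR";
    my $header = <$FILE>;
    close $FILE
        or croak "Can't close in-memory file: $OS_ERROR";
    return $header;
}

# The main code has a different lexical that is also named $FILE...
open my $FILE, '<', \$main_data
    or croak "Can't open in-memory file: $OS_ERROR";
my $first  = <$FILE>;
my $header = read_header();
my $second = <$FILE>;       # Unaffected by the helper's open and close
close $FILE
    or croak "Can't close in-memory file: $OS_ERROR";

print $first, $header, $second;
```

Each $FILE lives only in its own scope, so the helper's open and close cannot disturb the caller's stream.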
End of content preview. The rest of this section is not viewable here.
Localizing Filehandles
Content preview
If you have to use a package filehandle, localize it first.
Very occasionally, you simply have to use a package filehandle, rather than a lexical. For example, you might have existing code that relies on hard-wired bareword filehandle names.
In such cases, make sure that the symbol table entry involved is always referred to explicitly, with a leading asterisk. And, more importantly, always localize that typeglob within the smallest possible scope. For example:

            

               

                  

    # Wrap the Bozo::get_data() subroutine cleanly.
    # (Apparently this subroutine is hard-wired to only read from a filehandle
    #  named DATA::SRC. And it's used in hundreds of places throughout our
    #  buffoon-monitoring system, so we can't change it. At least we fired the
    #  clown that wrote this, didn't we???)...

    sub get_fool_stats {
        my ($filename) = @_;

        # Create a temporary version of the hardwired filehandle...
        local *DATA::SRC;

        # Open it to the specified file...
        open *DATA::SRC, '<', $filename
            or croak "Can't open '$filename': $OS_ERROR";

        # Call the legacy subroutine...
        return Bozo::get_data();
    }
Applying local to the *DATA::SRC typeglob temporarily replaces that entry in the symbol table. Thereafter, the filehandle that is opened is stored in the temporary replacement typeglob, not in the original. And it's the temporary *DATA::SRC that Bozo::get_data() sees when it's called. Then, when the results of that call are returned, control passes back out of the body of get_fool_stats(), at which point any localization within that scope is undone, and any pre-existing *DATA::SRC filehandle is restored.
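The restoration step can be checked with a small experiment. This is a minimal sketch under stated assumptions, not code from the original text: legacy_get_line() is a hypothetical stand-in for a subroutine hard-wired to a package filehandle (here *SRC), and in-memory filehandles (Perl 5.8 or later) replace real files.

```perl
use strict;
use warnings;
use Carp    qw( croak );
use English qw( -no_match_vars );

# Hypothetical data sources...
my $outer_data = "outer line\n";
my $inner_data = "inner line\n";

# Stand-in for legacy code that can only read from the package filehandle SRC...
sub legacy_get_line {
    return scalar <SRC>;
}

# Wrapper that temporarily redirects SRC to another data source...
sub get_line_from {
    my ($data_ref) = @_;

    local *SRC;
    open *SRC, '<', $data_ref
        or croak "Can't open in-memory file: $OS_ERROR";

    return legacy_get_line();
}

# The surrounding program already has SRC open on something else...
open *SRC, '<', \$outer_data
    or croak "Can't open in-memory file: $OS_ERROR";

my $inner = get_line_from(\$inner_data);    # Read via the temporary SRC
my $outer = <SRC>;                          # The original SRC is back, untouched

print $inner, $outer;
```

The wrapper's read comes from the temporary typeglob, and the later read in the main code still finds the first line of the original stream.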
Localization prevents most of the usual problems with bareword filehandles, because it ensures that the original
End of content preview. The rest of this section is not viewable here.
Opening Cleanly
Content preview
Use either the IO::File module or the three-argument form of open.
You may have noticed that all of the examples so far use the three-argument form of open. This variant was introduced in Perl 5.6 and is more robust than the older two-argument version, which is susceptible to very rare, but subtle, failures:

            

    # Log system uses a weird but distinctive naming scheme...
    Readonly my $ACTIVE_LOG => '>temp.log<';
    Readonly my $STATIC_LOG => '>perm.log<';

    # and later...

    open my $active,  "$ACTIVE_LOG"  or croak "Can't open '$ACTIVE_LOG': $OS_ERROR";
    open my $static, ">$STATIC_LOG"  or croak "Can't open '$STATIC_LOG': $OS_ERROR";
This code executes successfully, but it doesn't do what it appears to. The $active filehandle is opened for output to a file named temp.log<, not for input from a file named >temp.log<. And the $static filehandle is opened for appending to a file named perm.log<, rather than overwriting a file named >perm.log<. That's because the two open statements are equivalent to:

    open my $active, '>temp.log<'   or croak "Can't open '>temp.log<': $OS_ERROR";
    open my $static, '>>perm.log<'  or croak "Can't open '>perm.log<': $OS_ERROR";
and the '>' and '>>' prefixes on the second arguments tell open to open the files whose names appear after the prefixes in the corresponding output modes.
Using a three-argument open instead ensures that the specified opening mode can never be subverted by bizarre filenames, since the second argument now specifies only the opening mode, and the filename is supplied separately and doesn't have to be decoded at all:

            

               

                  

    # Log system uses a weird but distinctive naming scheme...
    Readonly my $ACTIVE_LOG => '>temp.log<';
    Readonly my $STATIC_LOG => '>perm.log<';
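The listing breaks off at this point in the preview. As a stand-in, here is a minimal sketch of the idea just described; it is not the book's own continuation. It assumes a filesystem that permits '<' and '>' in filenames (most Unix filesystems do), works in a temporary directory, and uses the hypothetical variable $active_log.

```perl
use strict;
use warnings;
use Carp       qw( croak );
use English    qw( -no_match_vars );
use File::Temp qw( tempdir );
use File::Spec;

# Work in a scratch directory that is removed when the program exits...
my $dir        = tempdir( CLEANUP => 1 );
my $active_log = File::Spec->catfile($dir, '>temp.log<');

# The mode is the second argument; the filename is passed through untouched...
open my $out, '>', $active_log
    or croak "Can't open '$active_log': $OS_ERROR";
print {$out} "started\n"
    or croak "Can't write '$active_log': $OS_ERROR";
close $out
    or croak "Can't close '$active_log': $OS_ERROR";

# Reading it back uses exactly the same name, with a different mode...
open my $in, '<', $active_log
    or croak "Can't open '$active_log': $OS_ERROR";
my $entry = <$in>;
close $in
    or croak "Can't close '$active_log': $OS_ERROR";

print $entry;
```

Because the angle brackets in the name are never interpreted as mode characters, the file that is written is the same file that is read back.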



    

                  
End of content preview. The rest of this section is not viewable here.
Error Checking
Content preview
Never open, close, or print to a file without checking the outcome.
These three I/O functions are probably the ones that fail most often. They can fail because a path is bad, or a file is missing, or inaccessible, or has the wrong permissions, or a disk crashes, or the network fails, or the process runs out of file descriptors or memory, or the filesystem is read-only, or any of a dozen other problems.
So writing unguarded I/O statements like this:

    open my $out,  '>', $out_file;
    print {$out} @results;
    close $out;
is sheer optimism, especially when it's not significantly harder to check that everything went to plan:

            

    open my $out,  '>', $out_file  or croak "Couldn't open '$out_file': $OS_ERROR";
    print {$out} @results          or croak "Couldn't write '$out_file': $OS_ERROR";
    close $out                     or croak "Couldn't close '$out_file': $OS_ERROR";
Or, more forgivingly, as part of a larger interactive process:

            

    SAVE:
    while (my $save_file = prompt 'Save to which file? ') {
        # Open specified file and save results...
        open my $out, '>', $save_file  or next SAVE;
        print {$out} @results          or next SAVE;
        close $out                     or next SAVE;

        # Save succeeded, so we're done...
        last SAVE;
    }
Also see the "Builtin Failures" guideline in Chapter 13 for a less intrusive way to ensure that every open, print, and close is properly checked.
Checking every print to a terminal device is also laudable, but not essential. Failure in such cases is much rarer, and usually self-evident. Besides, if your print statements can't reach the terminal, it's unlikely that your warnings or exceptions will either.
End of content preview. The rest of this section is not viewable here.
Cleanup
Content preview
Close filehandles explicitly, and as soon as possible.
Lexical filehandles, and even localized package filehandles, automatically close as soon as their variable or localization goes out of scope. But, depending on the structure of your code, that can still be suboptimal:

    sub get_config {
        my ($config_file) = @_;

        # Access config file or signal failure...
        open my $fh, '<', $config_file
            or croak "Can't open config file: $config_file";

        # Load file contents...
        my @lines = <$fh>;

        # Storage for config data...
        my %config;
        my $curr_section = $EMPTY_STR;

        # Decode config data...
        CONFIG:
        for my $line (@lines) {
            # Section markers change the second-level hash destination...
            if (my ($section_name) = $line =~ m/ \A \[ ([^]]+) \] /xms) {
                $curr_section = $section_name;
                next CONFIG;
            }

            # Key/value pairs are stored in the current second-level hash...
            if (my ($key, $val) = $line =~ m/\A \s* (.*?) \s* : \s* (.*?) \s* \z/xms) {
                $config{$curr_section}{$key} = $val;
                next CONFIG;
            }

            # Ignore everything else
        }

        return \%config;
    }
The problem here is that the input file remains open after it's used, and stays open for however long the decoding of the data takes.
The sooner a filehandle is closed, the sooner the internal and external resources it controls are freed up. The sooner it's closed, the less chance there is for accidental reuse or misuse. The sooner an output filehandle is closed, the sooner the written file is in a stable state.
The previous example would be more robust if it didn't rely on the scope boundary to close the lexical filehandle when the subroutine returns. It should have been written:
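A minimal sketch of that rewrite follows. It assumes that the only change needed is an explicit, checked close directly after the read; the remainder mirrors the listing above, with q{} standing in for $EMPTY_STR. It is an illustration, not the book's own listing.

```perl
use strict;
use warnings;
use Carp    qw( croak );
use English qw( -no_match_vars );

sub get_config {
    my ($config_file) = @_;

    # Access config file or signal failure...
    open my $fh, '<', $config_file
        or croak "Can't open config file '$config_file': $OS_ERROR";

    # Load file contents...
    my @lines = <$fh>;

    # Close the file as soon as the data is in memory...
    close $fh
        or croak "Can't close config file '$config_file': $OS_ERROR";

    # Storage for config data...
    my %config;
    my $curr_section = q{};

    # Decode config data (the file is no longer open during this step)...
    CONFIG:
    for my $line (@lines) {
        # Section markers change the second-level hash destination...
        if (my ($section_name) = $line =~ m/ \A \[ ([^]]+) \] /xms) {
            $curr_section = $section_name;
            next CONFIG;
        }

        # Key/value pairs are stored in the current second-level hash...
        if (my ($key, $val) = $line =~ m/\A \s* (.*?) \s* : \s* (.*?) \s* \z/xms) {
            $config{$curr_section}{$key} = $val;
            next CONFIG;
        }

        # Ignore everything else
    }

    return \%config;
}
```

With the close moved up, however long the decoding loop takes, the file descriptor has already been released before it starts.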
End of content preview. The rest of this section is not viewable here.
Input Loops
Content preview
Use while (<>), not for (<>).
Programmers are occasionally tempted to write input loops using a for, like this:

    use Regexp::Common;
    Readonly my $EXPLETIVE => $RE{profanity};

    for my $line (<>) {
        $line =~ s/$EXPLETIVE/[DELETED]/gxms;
        print $line;
    }
That's presumably because for loops are inherently finite in their number of iterations, and hence intrinsically more robust. Or perhaps it's just that the keyword is two characters shorter.
Whatever the reason, using a for loop to iterate input is a very inefficient and brittle solution. The iteration list of a for loop is (obviously) a list context. So in the example, the <> operator is called in a list context. Evaluating <> in list context causes it to read in every line it can, building a temporary list as it does. Once the input is complete, that list becomes the list to be iterated by the for.
There are several problems with that approach. For a start, it means the for loop won't start to iterate until the entire input stream has been read and an end-of-file encountered. This means that the previous code can't be used interactively. Moreover, constructing a (potentially very long) list of the input lines is expensive, both in terms of the memory required to store the entire list and in terms of the time required to allocate that memory and to actually build the list.
Worst of all, the for input loop doesn't scale well. Its memory requirements are linearly proportional to the total size of the input, with something like a 200% overhead. That means that a sufficiently large input might actually break the input loop with a memory allocation failure (Out of memory!), or at least slow it down intolerably with excessive memory allocation and swapping overheads.
In contrast, an equivalent while loop:

            

    while (my $line = <>) {
        $line =~ s/$EXPLETIVE/[DELETED]/gxms;
        print $line;
    }
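The difference in read behaviour between the two loops can be observed directly. The following is a minimal sketch, not a listing from the original text: it reads from an in-memory filehandle (Perl 5.8 or later), and the subroutine names for_loop_reads_ahead() and while_loop_reads_ahead() are hypothetical. Each subroutine reports whether the input was already exhausted by the time the first iteration ran.

```perl
use strict;
use warnings;
use Carp    qw( croak );
use English qw( -no_match_vars );

# Hypothetical three-line input...
my $input = "one\ntwo\nthree\n";

# With for, the whole stream is read before the first iteration starts...
sub for_loop_reads_ahead {
    open my $fh, '<', \$input
        or croak "Can't open in-memory file: $OS_ERROR";
    for my $line (<$fh>) {
        return eof($fh) ? 1 : 0;
    }
    return;
}

# With while, only one line has been read when the first iteration starts...
sub while_loop_reads_ahead {
    open my $fh, '<', \$input
        or croak "Can't open in-memory file: $OS_ERROR";
    while (my $line = <$fh>) {
        return eof($fh) ? 1 : 0;
    }
    return;
}

print 'for   read everything first: ', for_loop_reads_ahead(),   "\n";
print 'while read everything first: ', while_loop_reads_ahead(), "\n";
```

The for version reports 1 (the handle is already at end-of-file inside the first iteration), whereas the while version reports 0.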
End of content preview. The rest of this section is not viewable here.
Line-Based Input
Content preview
Prefer line-based I/O to slurping.
Reading in an entire file in a single <> operation is colloquially known as "slurping". But the considerations of memory allocation discussed in the previous section mean that slurping the contents of a file and then manipulating those contents monolithically, like so:

            

    # Slurp the entire file (see the next guideline)...
    my $text = do { local $/; <> };

    # Wash its mouth out...
    $text =~ s/$EXPLETIVE/[DELETED]/gxms;

    # Print it all back out...
    print $text;
is generally slower, less robust, and less scalable than processing the contents a line at a time:

            

    while (my $line = <>) {
        $line =~ s/$EXPLETIVE/[DELETED]/gxms;
        print $line;
    }
Reading an entire file into memory makes sense only when the file is unstable in some way, or is being updated asynchronously and you need a "snapshot", or if your planned text processing is likely to cross line boundaries:

            

    sub get_C_code {
        my ($filename) = @_;

        # Get a handle on the code...
        open my $in, '<', $filename
            or croak "Can't open C file '$filename': $OS_ERROR";

        # Read it all in...
        my $code = do { local $/; <$in> };

        # Convert any C-style comment to a single space...
        use Regexp::Common;   # See Chapter 12
        $code =~ s{ $RE{comment}{C} }{$SPACE}gxms;

        return $code;
    }
Because C comments can span multiple lines, it's necessary to load the entire file into memory at once so the pattern can detect such cases.
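The effect is easy to demonstrate on a two-line comment. This is a minimal sketch under stated assumptions: $C_COMMENT is a deliberately simplified, hypothetical pattern (it is not the Regexp::Common pattern, and it ignores comment-like text inside string literals), and the input comes from an in-memory filehandle (Perl 5.8 or later).

```perl
use strict;
use warnings;
use Carp    qw( croak );
use English qw( -no_match_vars );

# Hypothetical C source with a comment that spans two lines...
my $source = "int x; /* start\n   end */ int y;\n";

# Simplified stand-in for a C-comment pattern...
my $C_COMMENT = qr{ /\* .*? \*/ }xms;

# Line-at-a-time processing never sees both ends of the comment at once...
open my $line_in, '<', \$source
    or croak "Can't open in-memory file: $OS_ERROR";
my $by_line = q{};
while (my $line = <$line_in>) {
    $line =~ s{$C_COMMENT}{ }gxms;
    $by_line .= $line;
}
close $line_in
    or croak "Can't close in-memory file: $OS_ERROR";

# Slurping lets the pattern match across the line boundary...
open my $slurp_in, '<', \$source
    or croak "Can't open in-memory file: $OS_ERROR";
my $slurped = do { local $/; <$slurp_in> };
close $slurp_in
    or croak "Can't close in-memory file: $OS_ERROR";
$slurped =~ s{$C_COMMENT}{ }gxms;

print "Line by line:\n$by_line";
print "Slurped:\n$slurped";
```

The line-by-line result is identical to the input, comment and all, whereas the slurped version has the comment replaced by a single space.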
End of content preview. The rest of this section is not viewable here.
Simple Slurping
Content preview
Slurp a filehandle with a do block for purity.
Whenever you do need to read in an entire file at once, the syntax shown in the final example of the previous guideline is the right way to do it:

            

    my $code = do { local $/; <$in> };

         
Localizing the global $/ variable (a.k.a. $RS or $INPUT_RECORD_SEPARATOR, under use English) temporarily replaces it with a version whose value is undef. But, if the input record separator is undefined, there is effectively no input record separator, so Perl treats the input as a single, unseparated record, and the single <> (or readline) reads in the entire input stream as a single "line".
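The temporary nature of that change can be verified with a short experiment. This is a minimal sketch, not a listing from the original text; it uses in-memory filehandles (Perl 5.8 or later), and the variable names are hypothetical.

```perl
use strict;
use warnings;
use Carp    qw( croak );
use English qw( -no_match_vars );

# Hypothetical two-line input...
my $data = "first\nsecond\n";

# Inside the do block, $/ is undef, so a single read returns everything...
open my $in, '<', \$data
    or croak "Can't open in-memory file: $OS_ERROR";
my $everything = do { local $/; <$in> };
close $in
    or croak "Can't close in-memory file: $OS_ERROR";

# Outside the block, $/ has its previous value again, so reads are line-based...
open my $again, '<', \$data
    or croak "Can't open in-memory file: $OS_ERROR";
my $one_line = <$again>;
close $again
    or croak "Can't close in-memory file: $OS_ERROR";

print "Slurped:  $everything";
print "One line: $one_line";
```

The first read returns both lines as one string; the second, made after the do block has ended, returns only the first line.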
Reading in a complete file or stream this way is much more efficient than "concatenative" approaches such as:

    my $code;
    while (my $line = <$in>) {
        $code .= $line;
    }
or:

    my $code = join $EMPTY_STR, <$in>;
That second alternative is particularly bad because, like the for (<>) discussed earlier, the join evaluates the read operation in a list context, constructs a list of individual lines, and then joins them back together to create a single string. This process requires about three times as much memory as:

            

    my $code = do { local $/; <$in> };

         
It's also appreciably slower, and doesn't scale nearly as well as the size of the input text increases.
Note that it's important to put that localization-and-read inside a do {...} or in some other small block. A common mistake is to write this instead:

    $/ = undef;
    my $text = <$in>;
That works perfectly well, in itself, but it also undefines the global input record separator, rather than its temporary localized replacement. But the global input record separator controls the read behaviour of every filehandle—even those that are lexically scoped, or in other packages. So, if you don't localize the change in
End of content preview. The rest of this section is not viewable here.
Power Slurping
Content preview
Slurp a stream with Perl6::Slurp for power and simplicity.
Reading in an entire input stream is common enough, and the do {...} idiom is ugly enough, that the next major version of Perl (Perl 6) will provide a built-in function to handle it directly. Appropriately, that builtin will be called slurp.
Perl 5 doesn't have an equivalent builtin, and there are no plans to add one, but the future functionality is available in Perl 5 today, via the Perl6::Slurp CPAN module. Instead of:

            

    my $text = do { local $/; <$file_handle> };

         
you can just write:

            

    use Perl6::Slurp;

    my $text = slurp $file_handle;
which is cleaner, clearer, more concise, and consequently less error-prone.
The slurp() subroutine is also much more powerful. For example, if you have only the file's name, you would have to write:

    my $text = do {
        open my $fh, '<', $filename or croak "$filename: $OS_ERROR";
        local $/;
        <$fh>;
    };
which almost seems more trouble than it's worth. Or you can just give slurp() the filename directly:

            

    my $text = slurp $filename;

         
and it will open the file and then read in its full contents for you.
In a list context, slurp() acts like a regular <> or readline, reading in every line separately and returning them all in a list:

            

    my @lines = slurp $filename;

         
The slurp() subroutine also has a few useful features that <> and readline lack. For example, you can ask it to automatically chomp each line before it returns:

            

    my @lines = slurp $filename, {chomp => 1};

         
or, instead of removing the line-endings, it can convert each one to some other character sequence (say, '[EOL]'):

            

    my @lines = slurp $filename, {chomp => '[EOL]'};

         
End of content preview. The rest of this section is not viewable here.
Standard Input
Content preview
Avoid using *STDIN, unless you really mean it.
The *STDIN stream doesn't always mean "...from the tty". And it never means "...from the files specified on the command line", unless you go out of your way to arrange for it to mean that:

    close *STDIN or croak "Can't close STDIN: $OS_ERROR";
    for my $filename (@ARGV) {
        open *STDIN, '<', $filename or croak "Can't open STDIN: $OS_ERROR";
        while (<STDIN>) {
            print substr($_,2);
        }
    }
which is, of course, so complicated and ugly that it constitutes its own punishment.
*STDIN is always attached to the zeroth file descriptor of your process. By default, that's bound to the terminal (if any), but you certainly can't rely on that default. For example, if data is being piped into your process, then *STDIN will be connected, via the pipe, to file descriptor number 1 (the standard output) of the previous process in the pipeline. Or if the input to your process is being redirected from a file, then *STDIN will be connected to that file.
To cope with these diverse possibilities and the possibility that the user just typed the desired input file(s) on the command line without bothering with any redirection arrows, it's much safer to use Perl's vastly cleverer alternative: *ARGV. The *ARGV stream is connected to wherever *STDIN is connected, unless there are filenames on the command line, in which case it's connected to the concatenation of those files.
So you can allow your program to cope with interactive input, shell-level pipes, file redirections, and command-line file lists by writing this instead:

            

    while (my $line = <ARGV>) {
        print substr($line, 2);
    }
In fact, you use this magic filehandle all the time, possibly without even realizing it. *ARGV is the filehandle that's used when you don't specify any other:

            

    while (my $line = <>) {
        print substr($line, 2);
    }
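The concatenating behaviour of *ARGV can be tried out without touching the real command line. The following is a minimal sketch under stated assumptions, not code from the original text: strip_prefixes() is a hypothetical helper that installs a temporary @ARGV with local, and the two input files are scratch files created with File::Temp.

```perl
use strict;
use warnings;
use Carp       qw( croak );
use English    qw( -no_match_vars );
use File::Temp qw( tempfile );

# Read every line of the named files via the magic <> and drop two characters.
# (With an empty file list, <> would fall back to reading standard input.)
sub strip_prefixes {
    my @filenames = @_;

    # Pretend these names were given on the command line...
    local @ARGV = @filenames;

    my @stripped;
    while (my $line = <>) {
        push @stripped, substr($line, 2);
    }
    return @stripped;
}

# Create two scratch input files...
my ($fh_a, $file_a) = tempfile( UNLINK => 1 );
print {$fh_a} "> one\n> two\n" or croak "Can't write '$file_a': $OS_ERROR";
close $fh_a                    or croak "Can't close '$file_a': $OS_ERROR";

my ($fh_b, $file_b) = tempfile( UNLINK => 1 );
print {$fh_b} "> three\n"      or croak "Can't write '$file_b': $OS_ERROR";
close $fh_b                    or croak "Can't close '$file_b': $OS_ERROR";

# Both files are read as a single concatenated stream...
my @lines = strip_prefixes($file_a, $file_b);
print @lines;
```

The single while loop sees the lines of both files in order, exactly as if they had been listed on the command line.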
End of content preview. The rest of this section is not viewable here.
Printing to Filehandles
Content preview
Always put filehandles in braces within any print statement.
It's easy to lose a lexical filehandle that's being used in the argument list of a print:

    print $file $name, $rank, $serial_num, "\n";
Putting braces around the filehandle helps it stand out clearly:

            

    print {$file} $name, $rank, $serial_num, "\n";

         
The braces also convey your intentions regarding that variable; namely, that you really did mean it to be treated as a filehandle, and didn't just forget a comma.
You should also use the braces if you need to print to a package-scoped filehandle:

            

    print {*STDERR} $name, $rank, $serial_num, "\n";

         
Another acceptable alternative is to load the IO::Handle module and then use Perl's object-oriented I/O interface:

            

    use IO::Handle;

    $file->print( $name, $rank, $serial_num, "\n" );

    *STDERR->print( $name, $rank, $serial_num, "\n" );
End of content preview. The rest of this section is not viewable here.
Simple Prompting
Content preview
Always prompt for interactive input.
There are few things more frustrating than firing up a program and then sitting there waiting for it to complete its task, only to realize after a few minutes that it's actually been just sitting there too, silently waiting for you to start interacting with it:

            

    # The quit command is case-insensitive and may be abbreviated...
    Readonly my $QUIT => qr/\A q(?:uit)? \z/ixms;

    # No command entered yet...
    my $cmd = $EMPTY_STR;

    # Until the q[uit] command is entered...
    CMD:
    while ($cmd !~ $QUIT) {
        # Get the next command...
        $cmd = <>;
        last CMD if not defined $cmd;

        # Clean it up and run it...
        chomp $cmd;
        execute($cmd)
            or carp "Unknown command: $cmd";
    }
Interactive programs should always prompt for interaction whenever they're being run interactively:

            

               

    # Until the q[uit] command is entered...
    CMD:
    while ($cmd !~ $QUIT) {
        # Prompt if we're running interactively...
        if (is_interactive()) {
            print get_prompt_str();
        }

        # Get the next command...
        $cmd = <>;
        last CMD if not defined $cmd;

        # Clean it up and run it...
        chomp $cmd;
        execute($cmd)
            or carp "Unknown command: $cmd";
    }
End of content preview. The rest of this section is not viewable here.
Interactivity
Content preview
Don't reinvent the standard test for interactivity.
The is_interactive() subroutine used in the previous guideline is surprisingly difficult to implement. It sounds simple enough: just confirm that both input and output filehandles are connected to the terminal. If the input isn't, there's no need to prompt, as the user won't be entering the data directly. And if the output isn't, there's no need to prompt, because the user wouldn't see the prompt message anyway.
So most people just write:

    sub is_interactive {
        return -t *ARGV && -t *STDOUT;
    }

    # and later...

    if (is_interactive()) {
        print $PROMPT;
    }
Unfortunately, even with the use of *ARGV instead of *STDIN (in accordance with the earlier "Standard Input" guideline), that implementation of is_interactive() doesn't work.
For a start, the *ARGV filehandle has the special property that it only opens the files in @ARGV when the filehandle is actually first read. So you can't just use the -t builtin on *ARGV:

    -t *ARGV
*ARGV won't be opened until you read from it, and you can't read from it until you know whether to prompt; and to know whether to prompt, you have to check where *ARGV was opened to, but *ARGV won't be opened until you read from it.
Several other magical properties of *ARGV can also prevent simple -t tests on the filehandle from providing the correct answer, even if the input stream is already open. In order to cope with all the special cases, you have to write:

            

    use Scalar::Util qw( openhandle );

    sub is_interactive {
        # Not interactive if output is not to terminal...
        return 0 if not -t *STDOUT;

        # If *ARGV is opened, we're interactive if...
        if (openhandle *ARGV) {
End of content preview. The rest of this section is not viewable here.
Power Prompting
Content preview
Use the IO::Prompt module for prompting.
Because programs so often need to prompt for interactive input and then read that input, it's probably not surprising that there would be a CPAN module to make that process easier. It's called IO::Prompt and it exports only a single subroutine: prompt(). At its simplest, you can just write:

            

    use IO::Prompt;

    my $line = prompt 'Enter a line: ';
The specified string will be printed (but only if the program is interactive), and then a single line will be read in. That line will also be automatically chomped, unless you specifically request it not be.
The prompt() subroutine can also control the echoing of characters. For example:

            

    my $password = prompt 'Password: ', -echo => '*';

         
which echoes an asterisk for each character typed in:

            

    > Password: ***********
You can even prevent echoing entirely (by echoing an empty string in place of each character):

            

    my $password = prompt 'Password: ', -echo => $EMPTY_STR;

         
prompt() can return a single key-press (without requiring the Return key to be pressed as well):

            

    my $choice = prompt 'Enter your choice [a-e]: ', -onechar;

         
It can ignore inputs that are not acceptable:

            

    my $choice = prompt 'Enter your choice [a-e]: ', -onechar,
                        -require=>{ 'Must be a, b, c, d, or e: ' => qr/[a-e]/xms };
It can be restricted to certain kinds of common inputs (e.g., only integers, only valid filenames, only 'y' or 'n'):

            

    CODE:
    while (my $ord = prompt -integer, 'Enter a code (zero to quit): ') {
        if ($ord == 0) {
            exit if prompt -yn, 'Really quit? ';
            next CODE;
        }
        print qq{Character $ord is: '}, chr($ord), qq{'\n};
    }
End of content preview. The rest of this section is not viewable here.
Progress Indicators
Content preview
Always convey the progress of long non-interactive operations within interactive applications.
As annoying as it is to sit like a mushroom whilst some mute program waits for your unprompted input, it's even more frustrating to tentatively start typing something into an interactive program, only to discover that the program is still busy initializing, or calculating, or connecting to a remote device:

            

    # Initialize from any config files...
    for my $possible_config ( @CONFIG_PATHS ) {
        init_from($possible_config);
    }

    # Connect to remote server...
    my $connection;
    TRY:
    for my $try (1..$MAX_TRIES) {
        # Retry connection with increasingly tolerant timeout intervals...
        $connection = connect_to($REMOTE_SERVER, { timeout => fibonacci($try) });
        last TRY if $connection;
    }
    croak "Can't contact server ($REMOTE_SERVER)"
        if !$connection;

    # Interactive portion of the program starts here...
    while (my $cmd = prompt($prompt_str, -fail_if=>$QUIT)) {
        remote_execute($connection, $cmd)
            or carp "Unknown command: $cmd";
    }
It's much better—and not much more onerous—to give an active indication that an interactive program is busy doing something non-interactive:

            

               

                  

    # Initialize from any config files...
    print {*STDERR} 'Initializing...';
    for my $possible_config ( @CONFIG_PATHS ) {
        print {*STDERR} '.';
        init_from($possible_config);
    }
    print {*STDERR} "done\n";

    # Connect to remote server...
    print {*STDERR} 'Connecting to server...';
    my $connection;

    TRY:
    for my $try (1..$MAX_TRIES) {
        print {*STDERR} '.';
        $connection = connect_to($REMOTE_SERVER, { timeout => fibonacci($try) });
        last TRY if $connection;
    }
    croak "Can't contact server ($REMOTE_SERVER)"
        if not $connection;
    print {*STDERR} "done\n";
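The repeated label, dot, and "done" pattern in this listing lends itself to a small helper. The following is a hypothetical sketch, not a subroutine from the original text: with_progress() and its argument order are assumptions made for illustration, and the demonstration writes to an in-memory filehandle (Perl 5.8 or later) instead of *STDERR so that the output can be inspected.

```perl
use strict;
use warnings;
use Carp    qw( croak );
use English qw( -no_match_vars );

# Print a label, then one dot per item processed, then "done".
# The output handle is optional and defaults to *STDERR.
sub with_progress {
    my ($label, $items_ref, $action, $out) = @_;
    $out = \*STDERR if !defined $out;

    print {$out} $label;
    for my $item ( @{$items_ref} ) {
        print {$out} '.';
        $action->($item);
    }
    print {$out} "done\n";

    return;
}

# Demonstration: capture the indicator in a string instead of the terminal...
my $report = q{};
open my $log, '>', \$report
    or croak "Can't open in-memory file: $OS_ERROR";

my @initialized;
with_progress(
    'Initializing...',
    [ 'a.cfg', 'b.cfg' ],
    sub { push @initialized, $_[0] },
    $log,
);

close $log
    or croak "Can't close in-memory file: $OS_ERROR";

print $report;
```

Passing the handle as an argument keeps the helper testable; in a real program the fourth argument would simply be omitted so that the indicator goes to *STDERR.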



    
End of content preview. The rest of this section is not viewable here.
Automatic Progress Indicators
Content preview
Consider using the Smart::Comments module to automate your progress indicators.
As an alternative to coding the inline progress indicators or writing utility subroutines (as suggested in the previous guideline), you might prefer to use the Smart::Comments CPAN module, which keeps the comments about phases, and dispenses with the indicator code instead:

            

    use Smart::Comments;

    for my $possible_config ( @CONFIG_PATHS ) {  ### Initializing...  done
        init_from($possible_config);
    }

    my $connection;
    TRY:
    for my $try (1..$MAX_TRIES) {                ### Connecting to server...  done
        $connection = connect_to($REMOTE_SERVER, {timeout=>$TIMEOUT});
        last TRY if $connection;
    }
    croak "Can't contact server ($REMOTE_SERVER)"
        if not $connection;

    # Interactive portion of the program starts here...
Smart::Comments allows you to put a specially marked comment (###) on the same line as any for or while loop. It then uses that comment as a template, from which it builds an automatic progress indicator for the loop. Other useful features of the Smart::Comments module are described under "Semi-Automatic Debugging" in Chapter 18.
End of preview. The rest of this section is not shown here.
Autoflushing
Avoid a raw select when setting autoflushes.
When it comes to maintainable code, it doesn't get much worse than this commonly used Perl idiom:

    select((select($fh), $|=1)[0]);
The evil one-argument form of select takes a filehandle and makes it the (global!) default destination for print statements from that point onwards. That is, after a select, instead of writing to *STDOUT, any print statement that isn't given an explicit filehandle will now write to the filehandle that was select'd.
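To see what the idiom is doing, it can help to unpack it into separate statements. The following is only a minimal sketch: the in-memory filehandle is a stand-in for a real file, and the autoflush method of the core IO::Handle module is shown as one possible way to get the same setting without touching the global default.

```perl
use strict;
use warnings;
use IO::Handle;    # core module; provides the autoflush() method

# Stand-in output handle (an in-memory file, purely for illustration)
open my $fh, '>', \my $buffer
    or die "Can't open in-memory file: $!";

# What select((select($fh), $|=1)[0]) does, one step at a time:
my $previous_default = select $fh;    # 1. make $fh the default, remember the old one
$| = 1;                               # 2. turn on autoflush for the current default ($fh)
select $previous_default;             # 3. put the old default back

my $default_after_idiom = select;     # name of the handle that is now the default

# The same setting, expressed without changing the global default at all:
$fh->autoflush(1);
```

The three numbered steps are exactly what the one-liner packs into a list slice; the final line asks the handle itself for the behaviour instead.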
This change of default happens even if the newly selected filehandle was formerly confined to a lexical scope:

    for my $filename (@files) {
        # Open a lexical handle (will be automatically closed at end of iteration)
        open my $fh, '>', $filename
            or next;

        # Make it the default print target...
        select $fh;

        # Print to it...
        print "[This file intentionally left blank]\n";
    }
In actual applications, that last print statement would probably be replaced by a long series of separate print statements, controlled by some complex text-generation algorithm. Hence the desire to make the current $fh the default output filehandle, so as to avoid having to explicitly specify the filehandle in every print statement.
Unfortunately, because select makes its argument the global default for print, when the final iteration of the loop is finished, the last file that was successfully opened will remain the global print default. That filehandle won't be garbage-collected and auto-closed like all the other filehandles were, because the global default still refers to it. And for the remainder of your program, every print that isn't given an explicit filehandle will print to that final iterated filehandle, rather than to
End of preview. The rest of this section is not shown here.
Chapter 11: References
Pointers are like jumps, leading wildly from one part of
the data structure to another. Their introduction into
high-level languages has been a step backwards
from which we may never recover.
—Charles Hoare
References in Perl are much safer than raw pointers (such as those available in C or C++). Perl references cannot be left dangling towards a scalar that has been garbage-collected, and they cannot be coerced into pretending that a hash is an array.
Semantically they're very robust, but sometimes their syntax lets them down, making code that uses references confusing or misleading. In certain configurations, they can also interfere with the garbage collector.
Symbolic references have far more problems. It's entirely possible for them to dangle, and they can easily be used to access the wrong type of referent. They also subvert the pre-eminence of lexically scoped variables. All in all, they're more trouble than they're worth.
Fortunately, every one of these problems can be avoided by following a small number of simple guidelines . . .
End of preview. The rest of this section is not shown here.
Dereferencing
Wherever possible, dereference with arrows.
Use the -> notation in preference to "circumfix" dereferencing. In other words, when you're accessing references to containers, use the arrow syntax:

            

    print 'Searching from ', $list_ref->[0] ,  "\n",
          '            to ', $list_ref->[-1] , "\n";
This style results in much cleaner code than explicit wrap-and-prefix dereferencing:

    print 'Searching from ', ${$list_ref}[0],  "\n",
          '            to ', ${$list_ref}[-1], "\n";
Note that the arrow syntax also interpolates correctly into strings, so the previous example would be better written:

    print "Searching from $list_ref->[0]\n",
          "            to $list_ref->[-1]\n";
Explicit dereferencing is prone to two specific mistakes, which can be hard to detect if use strict is not in effect. The first error is simply forgetting to wrap-and-prefix at all:

    print 'Searching from ', $list_ref[0],  "\n",
          '            to ', $list_ref[-1], "\n";
The second mistake is wrapping-and-prefixing correctly, but accidentally leaving off the reference variable's own sigil (i.e., the one inside the braces):

    print 'Searching from ', ${list_ref}[0],  "\n",
          '            to ', ${list_ref}[-1], "\n";
In both cases, the array accesses are accessing the variable @list_ref instead of the array referred to by the reference in $list_ref.
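With use strict in effect, both mistakes are caught at compile time, because no array named @list_ref has been declared. The sketch below is only an illustration (the sample data is invented): it compiles each mistaken access inside a string eval so that the error can be captured rather than stopping the program.

```perl
use strict;
use warnings;

my $list_ref = [ 'aardvark', 'zebra' ];    # hypothetical sample data

# Compile each mistaken access in a string eval, so the compile-time
# error raised under "use strict" can be captured and inspected.
my @errors;
for my $mistake ( q{ $list_ref[0] }, q{ ${list_ref}[0] } ) {
    my $value = eval $mistake;
    push @errors, $@;
}

# The arrow form compiles cleanly and finds the intended element.
my $first = $list_ref->[0];

print "strict reported: $_" for @errors;
print "arrow form found: $first\n";
```

Both captured messages complain about the undeclared @list_ref, which is the variable the mistaken code actually refers to.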
Of course, if you need to access more than one element of a container (i.e., to slice it) via a reference to that container, there's no choice except to use the wrap-and-prefix syntax:

            

    my ($from, $to) = @{$list_ref}[0, -1];

         
Attempting to use the arrow notation to achieve the same effect doesn't work:

    my ($from, $to) = $list_ref->[0, -1];
Because the access expression ($list_ref->[0, -1]) begins with a
End of preview. The rest of this section is not shown here.
Braced References
Where prefix dereferencing is unavoidable, put braces around the reference.
You can dereference a reference without first putting it in braces:

    push @$list_ref, @results;

    print substr($$str_ref, 0, $max_cols);

    my $first = $$list_ref[0];
    my @rest  = @$list_ref[1..$MAX];

    my $first_name = $$name_ref{$first};
    my ($initial, $last_name) = @$name_ref{$middle, $last};

    print @$$ref_to_list_ref[1..$MAX];
All of these work correctly, but they may also produce intense uncertainty and anxiety on the part of future readers of your code, who will fret about the relative precedences of the multiple sigils, and of the indexing brackets and braces. Or they will misread the leading $$... sequence as being related to the $$ (a.k.a. $PID) variable—especially in string interpolations:

    print "Your current ID is: JAPH_$$_ID_REF\n";
Braced references are always visually unambiguous:

            

    print "Your current ID is: JAPH_${$_ID_REF}\n";

         
And they give the reader better clues as to the internal structure of dereference:

            

    push @{$list_ref}, @results;

    print substr(${$str_ref}, 0, $max_cols);

    my $first = ${$list_ref}[0];
    my @rest  = @{$list_ref}[1..$MAX];

    my $first_name = ${$name_ref}{$first};
    my ($initial, $last_name) = @{$name_ref}{$middle, $last};

    print @{${$ref_to_list_ref}}[1..$MAX];
In some cases, bracketing can prevent subtle errors caused by the ambiguity of human expectations:

    my $result = $$$stack_ref[0];
By which the programmer may have intended:

    my $result = ${${$stack_ref[0]}};
or:

    my $result = ${${$stack_ref}[0]};
or:

    my $result = ${${$stack_ref}}[0];
If you're not entirely sure which of those three alternatives the unbracketed $$$stack_ref[0] is actually equivalent to, that illustrates precisely how important it is to use the explicit braces. Or, better still, to unpack the reference in stages:
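A minimal sketch of that idea, using a hypothetical data structure (a reference to a reference to an array; the variable names other than $stack_ref are invented for illustration). It also shows which of the three readings Perl itself chooses for the unbracketed form:

```perl
use strict;
use warnings;

# Hypothetical data: $stack_ref refers to a reference to @stack
my @stack     = ( 10, 20, 30 );
my $stack_ref = \\@stack;

# The unbracketed form, and the braced form it is equivalent to:
# the sigils are applied first, the [0] subscript last.
my $unbraced = $$$stack_ref[0];
my $braced   = ${${$stack_ref}}[0];

# Unpacked in stages, each step has only one possible reading
my $array_ref = ${$stack_ref};
my $staged    = $array_ref->[0];

print "$unbraced $braced $staged\n";
```

All three variables end up holding the first element of @stack; only the staged version makes that obvious without knowing the precedence rules.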
End of preview. The rest of this section is not shown here.
Symbolic References
Never use symbolic references.
If use strict 'refs' isn't in effect, a string containing the name of a variable can be used to access that variable:

    my $hash_name = 'tag';

    ${$hash_name}{nick}   = ${nick};
    ${$hash_name}{rank}   = ${'rank'}[-1];     # Most recent rank
    ${$hash_name}{serial} = ${'serial_num'};
You can even use the arrow notation on a plain string to get the same effect:

    my $hash_name = 'tag';

    $hash_name->{nick}   = ${nick};
    $hash_name->{rank}   = 'rank'->[-1];
    $hash_name->{serial} = ${'serial_num'};
A string used in this way is known as a symbolic reference. It's called that because when Perl encounters a string where it was expecting a reference, it uses the string to look up the local symbol table and find an entry for the relevant variable of the same name.
Hence the previous examples (assuming they are in package main) are both equivalent to:

    (*{$main::{$hash_name}}{HASH})->{nick}   = ${*{$main::{'nick'}}{SCALAR}};
    (*{$main::{$hash_name}}{HASH})->{rank}   = *{$main::{'rank'}}{ARRAY}->[-1];
    (*{$main::{$hash_name}}{HASH})->{serial} = ${*{$main::{'serial_num'}}{SCALAR}};
(For the viewers at home, the breakdown of that first line is shown in Figure 11-1. "Breakdown" being the operative word here.)
Figure 11-1: Symbolic reference breakdown
You'd never willingly write complex, unreadable code like that. So don't write code that's surreptitiously equivalent to it.
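With use strict in effect, Perl refuses to treat a string as a reference at all. The following sketch is an illustration only: the %record_for hash is a hypothetical stand-in showing one way to get name-based lookup from an ordinary lexical hash instead of the symbol table.

```perl
use strict;
use warnings;

my $hash_name = 'tag';

# Under strict 'refs', using the string as a hash reference is a
# run-time error; the eval block captures it instead of dying.
my $succeeded = eval { ${$hash_name}{nick} = 'Maxwell'; 1 };
my $error     = $@;

# Hypothetical alternative: look the name up in a lexical hash of hashes
my %record_for = ( tag => {} );
$record_for{$hash_name}{nick} = 'Maxwell';

print "symbolic access failed: $error" if !$succeeded;
print "explicit lookup stored: $record_for{$hash_name}{nick}\n";
```

The explicit table keeps the data in a lexical variable, so no package variable is ever involved.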
The example deconstruction illustrates that a symbolic reference looks up the name of a variable in the current package's symbol table. That means that a symbolic reference can only ever refer to a package variable. And since you won't be using package variables in your own development (see Chapter 5), that will only lead to confusion. For example:

            
End of preview. The rest of this section is not shown here.
Cyclic References
Use weaken to prevent circular data structures from leaking memory.
Actual circular linked lists are quite rare in most Perl applications, mainly because they're generally not an efficient solution. Nor are they particularly easy to implement. Generally speaking, a standard Perl array with a little added "modulo length" logic is a cleaner, simpler, and more robust solution. For example:

            

    {
        use List::Util qw( max );    # max() is not built in; it comes from this core module

        # Make variables "private" by declaring them in a limited scope
        my @buffer;
        my $next = -1;

        # Get the next element stored in our cyclic buffer...
        sub get_next_cyclic {
            $next++;                   # ...increment cursor
            $next %= @buffer;          # ...wrap around if at end of array
            return $buffer[$next];     # ...return next element
        }

        # Grow the cyclic buffer by inserting new element(s)...
        sub insert_cyclic {
            # At next pos (or start): remove zero elems, then insert args...
            splice @buffer, max(0,$next), 0, @_;

            return;
        }

        # etc.
    }
However, circular data structures are still surprisingly easy to create. The commonest way is to have "owner" back-links in a hierarchical data structure. That is, if container nodes have references to the data nodes they own, and each data node has a reference back to the node that owns it, then you have cyclic references.
Non-hierarchical data can also easily develop circularities. Many kinds of bidirectional data relationships (such as peer/peer, supplier/consumer, client/server, or event callbacks) are modeled with links in both directions, to provide convenient and efficient navigation within the data structure.
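As a rough sketch of how the guideline applies to an "owner" back-link, the example below uses weaken and isweak from the core Scalar::Util module. The container and item hashes are hypothetical, invented purely for illustration.

```perl
use strict;
use warnings;
use Scalar::Util qw( weaken isweak );

# Hypothetical container/item pair with an "owner" back-link
my $container = { name => 'inbox', items => [] };
my $item      = { name => 'message 1', owner => $container };
push @{ $container->{items} }, $item;

# Without this line, $container and $item would keep each other alive
weaken( $item->{owner} );

my $back_link_is_weak = isweak( $item->{owner} );

# Dropping the last strong reference now frees the container,
# and the weak back-link is reset to undef automatically.
undef $container;
my $owner_after_release = $item->{owner};

print 'back-link is weak: ', ( $back_link_is_weak ? 'yes' : 'no' ), "\n";
print 'owner after release: ',
    ( defined $owner_after_release ? 'still defined' : 'undef' ), "\n";
```

Weakening the back-link rather than the forward link means the container still owns its items, while the items no longer keep the container alive.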
End of preview. The rest of this section is not shown here.
Chapter 12: Regular Expressions
Some people, when confronted with a problem, think:
"I know, I'll use regular expressions".
Now they have two problems.
—Jamie Zawinski
Regular expressions are one of the signature features of Perl, providing it with most of the practical extraction facilities for which it is famous. Many of those who are new to Perl (and many who aren't so new) approach regexes with mistrust, trepidation, or outright fear.
And with some justification. Regexes are specified in a compact and sometimes baroque syntax that is, all by itself, responsible for much of Perl's "executable line noise" reputation. Moreover, in the right hands, patterns are capable of performing mystifying feats of text recognition, analysis, transformation, and computation.
It's no wonder they scare so many otherwise stalwart Perl hackers.
And no surprise that they also figure heavily in many suboptimal programming practices, especially of the "cut-and-paste" variety. Or, more often, of the "cut-and-paste-and-modify-slightly-and-oh-now-it-doesn't-work-at-all-so-let's-modify-it-some-more-and-see-if-that-helps-no-it-didn't-but-we're-committed-now-so-maybe-if-we-change-that-bit-instead-hmmmm-that's-closer-but-still-not-quite-right-maybe-if-I-made-that-third-repetition-non-greedy-instead-oops-now-it's-back-to-not-matching-at-all-perhaps-I-should-just-post-it-to-PerlMonks.org-and-see-if-they-know-what's-wrong" variety.
Yet the secret to taming regular expressions is remarkably simple. You merely have to recognize them for what they really are, and treat them accordingly.
And what are regular expressions really? They're subroutines. Text-matching subroutines. Text-matching subroutines that are coded in an embedded programming language that's nearly entirely unrelated to Perl.
Once you realize that regexes are just code, it becomes obvious that regex best practices will, for the most part, simply be adaptations of the universal coding best practices described in other chapters: consistent and readable layout, sensible naming conventions, decomposition of complex code, refactoring of commonly used constructs, choosing robust defaults, table-based techniques, code reuse, and test-driven development.
End of preview. The rest of this section is not shown here.
Extended Formatting
Always use the /x flag.
Because regular expressions are really just programs, all the arguments in favour of careful code layout that were advanced in Chapter 2 must apply equally to regexes. And possibly more than equally, since regexes are written in a language much "denser" than Perl itself.
At the very least, it's essential to use whitespace to make the code more readable, and comments to record your intent. Writing a pattern like this:

    m{'[^\\']*(?:\\.[^\\']*)*'}
is no more acceptable than writing a program like this:

    sub'x{local$_=pop;sub'_{$_>=$_[0
    ]?$_[1]:$"}_(1,'*')._(5,'-')._(4
    ,'*').$/._(6,'|').($_>9?'X':$_>8
    ?'/':$")._(8,'|').$/._(2,'*')._(
    7,'-')._(3,'*').$/}print$/x($=).
    x(10)x(++$x/10).x($x%10)while<>;
And no more readable, or maintainable.
The /x mode allows regular expressions to be laid out and annotated in a maintainable manner. Under /x mode, whitespace in your regex is ignored (i.e., it no longer matches the corresponding whitespace character), so you're free to use spaces and newlines for indentation and layout, as you do in regular Perl code. The # character is also special under /x. Instead of matching a literal '#', it introduces a normal Perl comment.
For example, the pattern shown previously could be rewritten like so:

            

               

                  

    # Match a single-quoted string efficiently...

    m{ '             # an opening single quote
       [^\\']*       # any non-special chars (i.e., not backslash or single quote)
       (?:           # then all of...
           \\ .      #    any explicitly backslashed char
           [^\\']*   #    followed by any non-special chars
End of preview. The rest of this section is not shown here.
Line Boundaries
Always use the /m flag.
In addition to always using the /x flag, always use the /m flag. In every regular expression you ever write.
The normal behaviour of the ^ and $ metacharacters is unintuitive to most programmers, especially if they're coming from a Unix background. Almost all of the Unix utilities that feature regular expressions (e.g., sed, grep, awk) are intrinsically line-oriented. So in those utilities, ^ and $ naturally mean "match at the start of any line" and "match at the end of any line", respectively.
But they don't mean that in Perl.
In Perl, ^ and $ mean "match at the start of the entire string" and "match at the end of the entire string". That's a crucial difference, and one that leads to a very common type of mistake:

            

    # Find the end of a Perl program...

    $text =~ m{ [^\0]*?       # match the minimal number of non-null chars
                ^__END__$     # until a line containing only an end-marker
              }x;
In fact, what that code really does is:

    $text =~ m{ [^\0]*?       # match the minimal number of non-null chars
                ^             # until the start of the string
                __END__       # then match the end-marker
                $             # then match the end of the string
              }x;
The minimal number of characters until the start of the string is, of course, zero. Then the regex has to match '__END__'. And then it has to be at the end of the string. So the only strings that this pattern matches are those that consist of '__END__'. That is clearly not what was intended.
The /m mode makes ^ and $ work "naturally". Under /m, ^ no longer means "match at the start of the string"; it means "match at the start of any line". Likewise, $ no longer means "at end of string"; it means "at end of any line".
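The difference is easy to check with a short experiment. The sketch below is only an illustration; the three-line program text is invented sample data.

```perl
use strict;
use warnings;

# Hypothetical program text with an end-marker on a line of its own
my $text = "print 'hello';\n__END__\nNotes to self\n";

# Without /m: ^ and $ anchor to the whole string, so this cannot match
my $found_without_m = $text =~ m{ ^ __END__ $ }x  ? 1 : 0;

# With /m: ^ and $ anchor to any line, so the marker line is found
my $found_with_m    = $text =~ m{ ^ __END__ $ }xm ? 1 : 0;

# prints "without /m: 0, with /m: 1"
print "without /m: $found_without_m, with /m: $found_with_m\n";
```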
End of preview. The rest of this section is not shown here.
String Boundaries
Use \A and \z as string boundary anchors.
Even if you don't adopt the previous practice of always using /m, using ^ and $ with their default meanings is a bad idea. Sure, you know what ^ and $ actually mean in a Perl regex. But will those who read or maintain your code know? Or is it more likely that they will misinterpret those metacharacters in the ways described earlier?
Perl provides markers that always—and unambiguously—mean "start of string" and "end of string": \A and \z (capital A, but lowercase z). They mean "start/end of string" regardless of whether /m is active. They mean "start/end of string" regardless of what the reader thinks ^ and $ mean.
They also stand out well. They're unusual. They're likely to be unfamiliar to the readers of your code, in which case those readers will have to look them up, rather than blithely misunderstanding them.
So rather than:

            

    # Remove leading and trailing whitespace...
    $text =~ s{^ \s* | \s* $}{}gx;
use:

    # Remove leading and trailing whitespace...
    $text =~ s{\A \s* | \s* \z}{}gxm;
And when you later need to match line boundaries as well, you can just use ^ and $ "naturally":

    # Remove leading and trailing whitespace, and any -- line...
    $text =~ s{\A \s* | ^-- [^\n]* $ | \s* \z}{}gxm;
The alternative (in which ^ and $ each have three distinct meanings in different contexts) is unnecessarily cruel:

            

    # Remove leading and trailing whitespace, and any -- line...
    $text =~ s{^ \s* | (?m: ^-- [^\n]* $) | \s* $}{}gx;
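The division of labour recommended in this guideline (\A and \z for the string, ^ and $ for lines) can be checked with a small sketch. The sample text and variable names are invented for illustration.

```perl
use strict;
use warnings;

my $text = "alpha\nbeta\ngamma\n";    # hypothetical three-line input

# Under /m, ^ matches at the start of every line...
my @line_starts  = $text =~ m{ ^  (\w+) }gxms;

# ...whereas \A matches only at the very start of the string
my @string_start = $text =~ m{ \A (\w+) }gxms;

# Likewise, \z matches only at the very end of the string
my ($last_word)  = $text =~ m{ (\w+) \n \z }xms;

print "line starts:  @line_starts\n";
print "string start: @string_start\n";
print "last word:    $last_word\n";
```

Because \A and \z ignore the /m flag, adding or removing that flag later cannot silently change what they anchor to.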
End of preview. The rest of this section is not shown here.
End of String
Use \z, not \Z, to indicate "end of string".
Perl provides a variant of the \z marker: \Z. Whereas lowercase \z means "match at end of string", capital \Z means "match an optional newline, then at end of string". This variant can occasionally be convenient, if you're working with line-based input, as you don't have to worry about chomping the lines first:

            

    # Print contents of lines starting with --...
    LINE:
    while (my $line = <>) {
        next LINE if $line !~ m/ \A -- ([^\n]+) \Z/xm;
        print $1;
    }
But using \Z introduces a subtle distinction that can be hard to detect when displayed in some fonts. It's safer to be more explicit: to stick with using \z, and say precisely what you mean:

    # Print contents of lines starting with --...
    LINE:
    while (my $line = <>) {
        next LINE if $line !~ m/ \A -- ([^\n]+) \n? \z/xm;  # Might be newline at end
        print $1;
    }
especially if what you actually meant was:

    # Print contents of lines starting with -- (including any trailing newline!)...
    LINE:
    while (my $line = <>) {
        next LINE if $line !~ m/ \A -- ([^\n]* \n?) \z/xm;
        print $1;
    }
Using \n? \z instead of \Z forces you to decide whether the newline is part of the output or merely part of the scenery.
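The four variations can be compared side by side on a single unchomped line. This is a sketch with an invented sample line, intended only to make the distinction concrete.

```perl
use strict;
use warnings;

my $line = "--verbose\n";    # hypothetical unchomped input line

# \Z tolerates one trailing newline before the end of the string
my $matches_with_Z = $line =~ m/ \A -- ([^\n]+) \Z /xms ? 1 : 0;

# \z does not: the newline still lies between the capture and the end
my $matches_with_z = $line =~ m/ \A -- ([^\n]+) \z /xms ? 1 : 0;

# Saying what is meant: newline allowed, but kept out of the capture...
my ($without_newline) = $line =~ m/ \A -- ([^\n]+) \n? \z /xms;

# ...or newline allowed, and included in the capture
my ($with_newline)    = $line =~ m/ \A -- ([^\n]* \n?) \z /xms;
```

Here $matches_with_Z is 1 and $matches_with_z is 0, while the two explicit patterns capture 'verbose' and "verbose\n" respectively.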
End of preview. The rest of this section is not shown here.
Matching Anything
Always use the /s flag.
At this point, you might be starting to detect a pattern. Once again, the problem is that the dot metacharacter (.) doesn't mean what most people think it means. Most people—even those who actually know better—habitually think of it as meaning: "match any character".
It's easy to forget that it doesn't really mean that, and accidentally write something like:

            

    # Capture the source of a Perl program...

    $text =~ m{\A                # From start of string...
               (.*?)             # ...match and capture any characters
               ^__END__$         # ...until the first __END__ line
              }xm;
But the dot metacharacter doesn't match newlines, so the only strings this regex will match are those that start with '__END__'. That's because the ^ (start-of-line) metacharacter can match only at the start of the string or after a newline. But the preceding dot metacharacter can never match a newline, so the only way the ^ can match is if the preceding dot matches a start-of-string. But the dot metacharacter never matches start-of-string, because dot always matches exactly one character and start-of-string isn't a character.
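The failure is easy to reproduce. The sketch below uses an invented four-line program text and runs the same pattern twice, once as written above and once with the /s flag added, which lets the dot match newlines as well.

```perl
use strict;
use warnings;

# Hypothetical program text with an end-marker on a line of its own
my $text = "print 'hello';\nexit;\n__END__\nNotes to self\n";

# Without /s the dot cannot cross a newline, so the match fails
# and the capture variable is left undefined.
my ($source_without_s) = $text =~ m{\A (.*?) ^ __END__ $ }xm;

# With /s the dot also matches newlines, so everything before
# the marker line is captured.
my ($source_with_s)    = $text =~ m{\A (.*?) ^ __END__ $ }xms;

print 'without /s: ', ( defined $source_without_s ? 'matched' : 'no match' ), "\n";
print "with /s captured:\n$source_with_s";
```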
In other words, as with ^ and $, the default behaviour of the dot metacharacter fails to be unreasonable (i.e., to be what most people expect). Fortunately, however, dot can be made to conform to the typical programmer's unreasonable expectations, simply by adding the /s flag. Under /s mode, a dot really does match every character, including newline:

            

               

                  

    # Capture the source of a Perl program...

    $text =~ m{\A                # From start of string...
               (.*?)             # ...match and capture any characters (including newlines!)
               ^__END__$
End of preview. The rest of this section is not shown here.
Lazy Flags
Consider mandating the Regexp::Autoflags module.
It takes about a week to accustom your fingers to automatically typing /xms at the end of every regex. But, realistically, some programmers will still not have the discipline required to develop and foster that good habit.
An alternative is to allow (that is, require) them to use the Regexp::Autoflags CPAN module instead, at the start of every source file they create. That module will then automatically turn on /xms mode in every regex they write.
That is, if they put:

            

    use Regexp::Autoflags;

         
at the start of their file, from that point on they can write regexes like:

            

    $text =~ m{\A                # From start of string...
               (.*?)             # ...match and capture any characters (including newlines!)
               ^__END__$         # ...until the first __END__ line
              };
and:

    $source_code =~ s{               # Substitute...
                       \#            # ...a literal octothorpe
                       [^\n]*        # ...followed by any number of non-newlines
                     }
                     {$SPACE}g;      # Replacing it with a single space
They won't have to remember to append the all-important /xms flags, because the Regexp::Autoflags module will have automatically applied them.
Of course, this merely replaces the need for one kind of discipline (always use /xms) with the requirement for another (always use Regexp::Autoflags). However, it's much easier to check whether a single module has been loaded at least once, than it is to verify that the right regex flags have been used everywhere.
End of preview. The rest of this section is not shown here.
Brace Delimiters
Use m{...} in preference to /.../ in multiline regexes.
You might have noticed that every regex in this book that spans more than a single line is delimited with braces rather than slashes. That's because it's much easier to identify the boundaries of the brace-delimited form, both by eye and from within an editor.
That ability is especially important in regexes where you need to match a literal slash, or in regexes which use many escape characters. For example, this:

            

    Readonly my $C_COMMENT => qr{
        / \*   # Opening C comment delimiter
        .*?    # Smallest number of characters (C comments don't nest)
        \* /   # Closing delimiter
    }xms;
is a little easier to read than the more heavily backslashed:

    Readonly my $C_COMMENT => qr/
        \/ \*  # Opening C comment delimiter
        .*?    # Smallest number of characters (delims don't nest)
        \* \/  # Closing delimiter
    /xms;
Using braces as delimiters can also be advantageous in single-line regexes that are heavily laden with slash characters. For example:

    $source_code =~ s/ \/ \* (.*?) \* \/ //gxms;
is considerably harder to unravel than:

            

    $source_code =~ s{ / \* (.*?) \* / }{}gxms;

         
In particular, a final empty {} as the replacement text is much easier to detect and decipher than a final empty //. Though, of course, it would be better still to write that substitution as:

            

    $source_code =~ s{$C_COMMENT}{$EMPTY_STR}gxms;

         
to ensure maximum maintainability.
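Put together, the pieces above might be exercised as follows. This is a sketch only: plain lexical variables stand in for the Readonly constants (Readonly is a CPAN module), and the C source text is invented sample data.

```perl
use strict;
use warnings;

# Plain lexicals stand in for the Readonly constants used in the text
my $EMPTY_STR = q{};
my $C_COMMENT = qr{
    / \*   # Opening C comment delimiter
    .*?    # Smallest number of characters
    \* /   # Closing delimiter
}xms;

# Hypothetical C source with a one-line and a two-line comment
my $source_code = "int x; /* counter */\nint y; /* spans\n   two lines */\n";

$source_code =~ s{$C_COMMENT}{$EMPTY_STR}gxms;

print $source_code;
```

Because the pattern was compiled with /s, the second comment is removed even though it spans a line break.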
Using braces as regex delimiters has two other advantages. Firstly, in a substitution, the two "halves" of the operation can be placed on separate lines, to further distinguish them from each other. For example:
End of preview. The rest of this section is not shown here.
Other Delimiters
Don't use any delimiters other than /.../ or m{...}.
Although Perl allows you to use any non-whitespace character you like as a regex delimiter, don't. Because leaving some poor maintenance programmer to take care of (valid) code like this:

    last TRY if !$!!~m!/pattern/!;
or this:

    $same=m={===m=}=;
or this:

    harry s truman was the 33rd u.s. president;
is just cruel.
Even with more reasonable delimiter choices:

    last TRY if !$OS_ERROR !~ m!/pattern/!;

    $same = m#{# == m#}#;

    harry s|ruman was |he 33rd u.s. presiden|;
the boundaries of the regexes don't stand out well.
By sticking with the two recommended delimiters (and other best practices), you make your code more predictable, so it is easier for future readers to identify and understand your regexes:

            

    last TRY if !$OS_ERROR !~ m{ /pattern/ }xms;

    $same = ($str =~ m/{/xms  ==  $str =~ m/}/xms);

    harry( $str =~ s{ruman was }{he 33rd u.s. presiden}xms );
Note that the same advice also applies to substitutions and transliterations: stick to s/.../.../xms or s{...}{...}xms, and tr/.../.../ or tr{...}{...}.
End of preview. The rest of this section is not shown here.
Metacharacters
Prefer singular character classes to escaped metacharacters.
Escaped metacharacters are harder to decipher, and harder to distinguish from their unescaped originals:

    m/ \{ . \. \d{2} \} /xms;
The alternative is to put each metacharacter in its own tiny, one-character character class, like so:

            

    m/ [{] . [.] \d{2} [}] /xms;

         
Once you're familiar with this convention, it's very much easier to see the literal metacharacters when they're square-bracketed. That's particularly true for spaces under the /x flag. For example, the literal spaces to be matched in:

            

    $name =~ m{ harry [ ] s [ ] truman
              | harry [ ] j [ ] potter
              }ixms;
stand out much better than those in:

    $name =~ m{ harry \ s \ truman
              | harry \ j \ potter
              }ixms;
Note, however, that this approach can reduce the optimizer's ability to accelerate pattern matching under some versions of Perl. If benchmarking (see Chapter 19) indicates that this may be a problem for you, try the alternative approach suggested in the next guideline.
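The two notations are interchangeable in what they match, which a short sketch can confirm. The candidate strings are invented, and \A and \z have been added to both patterns so that each candidate must match in full.

```perl
use strict;
use warnings;

# The same pattern written both ways (anchored for a whole-string test)
my $escaped   = qr/ \A \{  . \.  \d{2} \}  \z /xms;
my $bracketed = qr/ \A [{] . [.] \d{2} [}] \z /xms;

# Record, for each hypothetical candidate, whether each form matched
my %result_for;
for my $candidate ( '{a.42}', '{a-42}', '{a.4}' ) {
    $result_for{$candidate} = {
        escaped   => ( $candidate =~ $escaped   ? 1 : 0 ),
        bracketed => ( $candidate =~ $bracketed ? 1 : 0 ),
    };
}

# A bracketed space is matched literally, even under /x
my $space_matched
    = 'Harry S Truman' =~ m{ \A harry [ ] s [ ] truman \z }ixms ? 1 : 0;
```

Only '{a.42}' matches, and it does so under both spellings; the other two candidates are rejected by both.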
End of preview. The rest of this section is not shown here.
Named Characters
Prefer named characters to escaped metacharacters.
As an alternative to the previous guideline, Perl 5.6 (and later) supports named characters in regexes. As previously discussed, this mechanism is much better for "unprintable" components of a regex. For example, instead of:

    if ($escape_seq =~ /\177 \006 \030 Z/xms) {   # Octal DEL-ACK-CAN-Z
        blink(182);
    }
use:

    use charnames qw( :full );

    if ($escape_seq =~ m/\N{DELETE} \N{ACKNOWLEDGE} \N{CANCEL} Z/xms) {
        blink(182);
    }
Note, however, that named whitespace characters are treated like ordinary whitespace (i.e., they're ignored) under the /x flag:

    use charnames qw( :full );

    # and later...

    $name =~ m{ harry \N{SPACE} s \N{SPACE} truman     # harrystruman
              | harry \N{SPACE} j \N{SPACE} potter     # harryjpotter
              }ixms;
You would still need to put them in character classes to make them match:

    use charnames qw( :full );

    # and later...

    $name =~ m{ harry [\N{SPACE}] s [\N{SPACE}] truman     # harry s truman
              | harry [\N{SPACE}] j [\N{SPACE}] potter     # harry j potter
              }ixms;
Properties
Prefer properties to enumerated character classes.
Explicit character classes are frequently used to match character ranges, especially alphabetics. For example:

            

    # Alphabetics-only identifier...
    Readonly my $ALPHA_IDENT => qr/ [A-Z] [A-Za-z]* /xms;

However, a character class like that doesn't actually match all possible alphabetics. It matches only ASCII alphabetics. It won't recognize the common Latin-1 variants, let alone the full gamut of Unicode alphabetics.
That result might be okay, if you're sure your data will never be other than parochial, but in today's post-modern, multicultural, outsourced world it's rather déclassé for an überhacking rōnin to create identifier regexes that won't even match 'déclassé' or 'überhacking' or 'rōnin'.
Regular expressions in Perl 5.6 and later support the use of the \p{...} escape, which allows you to use full Unicode properties. Properties are Unicode-compliant named character classes and are both more general and more self-documenting than explicit ASCII character classes. The perlunicode manpage explains the mechanism in detail and lists the available properties.
So, if you're ready to concede that ASCII-centrism is a naïve façade that's gradually fading into Götterdämmerung, you might choose to bid it adiós and open your regexes to the full Unicode smörgåsbord, by changing the previous identifier regex to:

            

    Readonly my $ALPHA_IDENT => qr/ \p{Uppercase}  \p{Alphabetic}* /xms;

         
There are even properties to help create identifiers that follow the normal Perl conventions but are still language-independent. Instead of:

    Readonly my $PERL_IDENT => qr/ [A-Za-z_] \w*/xms;
you can use:

            

    Readonly my $PERL_IDENT => qr/ \p{ID_Start} \p{ID_Continue}* /xms;
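
The difference between the ASCII class and the property-based version can be checked with a small sketch. It assumes plain lexicals in place of Readonly, adds \A and \z anchors so the whole word must match, and uses a capitalised 'Rōnin' as test data; none of those details come from the original.

```perl
use strict;
use warnings;

# Plain lexicals stand in for the Readonly declarations used in the text.
my $ASCII_IDENT   = qr/ \A [A-Z]          [A-Za-z]*       \z /xms;
my $UNICODE_IDENT = qr/ \A \p{Uppercase}  \p{Alphabetic}* \z /xms;

# "Ronin" spelled with U+014D (o with macron), written as an escape so the
# example does not depend on the encoding of this source file.
my $word = "R\x{14D}nin";

my $ascii_ok   = $word =~ $ASCII_IDENT   ? 1 : 0;
my $unicode_ok = $word =~ $UNICODE_IDENT ? 1 : 0;

print "ASCII class: $ascii_ok, Unicode properties: $unicode_ok\n";
```

The ASCII class rejects the word at the macron, whereas the property-based pattern accepts it.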

         
One other particularly useful property is \p{Any}
End of preview. The rest of this section is not available here.
Whitespace
Consider matching arbitrary whitespace, rather than specific whitespace characters.
Unless you're matching regular expressions against fixed-format machine-generated data, avoid matching specific whitespace characters exactly. Because if humans were directly involved anywhere in the data acquisition, then the notion of "fixed" will probably have been more honoured in the breach than in the observance.
If, for example, the input is supposed to consist of a label, followed by a single space, followed by an equals sign, followed by a single space, followed by a value...don't bet on it. Most users nowadays will—quite reasonably—assume that whitespace is negotiable; nothing more than an elastic formatting medium. So, in a configuration file, you're just as likely to get something like:

    name       = Yossarian, J
    rank       = Captain
    serial_num = 3192304

The whitespaces in that data might be single tabs, multiple tabs, multiple spaces, single spaces, or any combination thereof. So matching that data with a pattern that insists on exactly one space character at the relevant points is unlikely to be uniformly successful:

    $config_line =~ m{ ($IDENT)  [\N{SPACE}]  =  [\N{SPACE}]  (.*) }xms
Worse still, it's also unlikely to be uniformly unsuccessful. For instance, in the example data, it might only match the serial number. And that kind of intermittent success will make your program much harder to debug. It might also make it difficult to realize that any debugging is required.
Unless you're specifically vetting data to verify that it conforms to a required fixed format, it's much better to be very liberal in what you accept when it comes to whitespace. Use \s+ for any required whitespace and \s* for any optional whitespace. For example, it would be far more robust to match the example data against:

            

    $config_line =~ m{ ($IDENT)  \s*  =  \s*  (.*) }xms
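
The "intermittent success" described above is easy to reproduce. The following is a sketch only: the definition of $IDENT is assumed (the excerpt never shows it), [ ] is used in place of [\N{SPACE}] to keep the example short, and parse_config() is a hypothetical helper.

```perl
use strict;
use warnings;

# $IDENT is not defined in this excerpt; a simple identifier pattern is assumed.
my $IDENT = qr/ [A-Za-z_] \w* /xms;

my @config_lines = (
    'name       = Yossarian, J',
    'rank       = Captain',
    'serial_num = 3192304',
);

# Collect label => value pairs from every line the pattern accepts.
sub parse_config {
    my ($pattern, @lines) = @_;
    my %value_of;
    for my $line (@lines) {
        if ($line =~ $pattern) {
            $value_of{$1} = $2;
        }
    }
    return \%value_of;
}

my $strict  = parse_config(qr{ ($IDENT)  [ ]  =  [ ]  (.*) }xms, @config_lines);
my $liberal = parse_config(qr{ ($IDENT)  \s*  =  \s*  (.*) }xms, @config_lines);

print 'strict  found: ', join(', ', sort keys %{$strict}),  "\n";
print 'liberal found: ', join(', ', sort keys %{$liberal}), "\n";
```

With this data the exact-space pattern should pick up only serial_num, while the \s* version finds all three settings.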

         
Unconstrained Repetitions
Be specific when matching "as much as possible".
The .* construct is a particularly blunt and ponderous weapon, especially under /s. For example, consider the following parser for some very simple language, in which source code, data, and configuration information are separated by % and & characters (which are otherwise illegal):

            

    # Format is: <statements> % <data> & <config>...

    if ($source =~ m/\A  (.*)  %  (.*)  &  (.*) /xms) {
        my ($statements, $data, $config) = ($1, $2, $3);

        my $prog = compile($statements, {config=>$config});
        my $res  = execute($prog, {data=>$data, config=>$config});
    }
    else {
        croak 'Invalid program';
    }

Under /s, the first .* will successfully match the entire string in $source. Then it will attempt to match a %, and immediately fail (because there's none of the string left to match). At that point the regex engine will backtrack one character from the end of the string and try to match a % again, which will probably also fail. So it will backtrack one more character, try again, backtrack once more, try again, et cetera, et cetera, et cetera.
Eventually it will backtrack far enough to successfully match %, whereupon the second .* will match the remainder of the string, then fail to match &, backtrack one character, try again, fail again, and the entire "one-step-forward-two-steps-back" sequence will be played out again. Sequences of unconstrained matches like this can easily cause regular expression matches to become unacceptably slow.
Using a .*? can help in such cases:

    if ($source =~ m/\A  (.*?)  %  (.*?)  &  (.*) /xms) {
        my ($statements, $data, $config) = ($1, $2, $3);

        my $prog = compile($statements, {config=>$config});
        my $res  = execute($prog, {data=>$data, config=>$config});
    }
    else {
        croak 'Invalid program';
    }

since the "parsimonious repetitions" will then consume as little of the string as possible. But, to do this, they effectively have to do a look-ahead at every character they match, which can also become expensive if the terminator is more complicated than just a single character.
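One common way to be more specific is a negated character class that simply cannot run past the separator. It is offered here as an assumption, not as the text's own prescription, and it relies on the stated rule that % and & are otherwise illegal. The sample $source and the helper split_program() are hypothetical.

```perl
use strict;
use warnings;

# Sample input in the <statements> % <data> & <config> format.
my $source = "print total\n% 1 2 3\n& verbose=1\n";

# Split $source with the given pattern, returning the three fields
# (or an empty list if the pattern does not match).
sub split_program {
    my ($pattern) = @_;
    return $source =~ $pattern;
}

my @greedy   = split_program(qr/\A  (.*)     %  (.*)     &  (.*) /xms);
my @lazy     = split_program(qr/\A  (.*?)    %  (.*?)    &  (.*) /xms);
my @specific = split_program(qr/\A  ([^%]*)  %  ([^&]*)  &  (.*) /xms);

print "statements: $specific[0]";
print "data:      $specific[1]";
print "config:    $specific[2]";
```

All three patterns should extract the same fields from well-formed input. What differs is how much backtracking or per-character look-ahead the engine needs to get there.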
Capturing Parentheses
Use capturing parentheses only when you intend to capture.
It's a waste of processor cycles to capture a substring you don't need. More importantly, it's misleading to do so. When the unfortunates who have to maintain the following code see:

    if ( $cmd =~ m/\A (q | quit | bye | exit) \n? \z/xms ) {
        perform_cleanup();
        exit;
    }

they will almost certainly start casting around to determine where $1 is used (perhaps for an exit confirmation request, or inside perform_cleanup()).
They'll be rightly annoyed when they eventually discover that $1 isn't used anywhere. Because now they can't be sure whether that indicates a bug, or was just laziness on the part of the original coder. Hence, they'll probably have to re-examine the logic of perform_cleanup() to determine whether that unused capture is actually A.W.O.L. And that's a waste of maintainer cycles.
Perl provides a form of regex parentheses that deliberately don't capture: the (?:...) parentheses. If the previous example had been written:

            

    if ( $cmd =~ m/\A (?:q | quit | bye | exit) \n? \z/xms ) {
        perform_cleanup();
        exit;
    }

         
then there would be no doubt that the parentheses were being used simply to group the four alternative "exit" commands, rather than to capture the particular "exit" command used.
Use non-capturing parentheses by default, and reserve capturing parentheses for when you need to make use of some part of a matched string. That way, your coded instructions will also encode your intentions, which is a much more robust and effective style of programming.
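The difference is visible if you ask each match what it captured. This is a minimal sketch, with the sample command and array names chosen purely for illustration:

```perl
use strict;
use warnings;

my $cmd = 'quit';

# In list context a successful match returns its captures,
# or the single value 1 if the pattern has no capturing parentheses.
my @with_capture    = $cmd =~ m/\A (q | quit | bye | exit) \n? \z/xms;
my @without_capture = $cmd =~ m/\A (?:q | quit | bye | exit) \n? \z/xms;

print "capturing:     (@with_capture)\n";
print "non-capturing: (@without_capture)\n";
```

The first line should show the captured command, the second just the success value, which makes it plain that the (?:...) version groups without recording anything.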
Captured Values
Use the numeric capture variables only when you're sure that the preceding match succeeded.
Pattern matches that fail never assign anything to $1, $2, etc., nor do they leave those variables undefined. After an unsuccessful pattern match, the numeric capture variables remain exactly as they were before the match was attempted. Often, that means that they retain whatever values some earlier successful pattern match gave them.
So you can't test whether a pattern has matched by testing the numeric capture variables directly. A common mistake along those lines is to write something like:

    $full_name =~ m/\A (Mrs?|Ms|Dr) \s+ (\S+) \s+ (\S+) \z/xms;

    if (defined $1) {
        ($title, $first_name, $last_name) = ($1, $2, $3);
    }

The problem is that, if the match fails, $1 may still have been set by some earlier successful match in the same scope, in which case the three variables would be assigned capture values left over from that previous match.
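A short sketch makes the leftover values visible. The two sample names and the $NAME_PATTERN variable are assumptions made for the demonstration.

```perl
use strict;
use warnings;

my $NAME_PATTERN = qr/\A (Mrs?|Ms|Dr) \s+ (\S+) \s+ (\S+) \z/xms;

my ($title, $first_name, $last_name);

# This match succeeds, so $1, $2, and $3 are set...
my $first_matched = 'Dr Henry Jones' =~ $NAME_PATTERN ? 1 : 0;

# ...and this one fails, which leaves $1, $2, and $3 exactly as they were.
my $second_matched = 'Madonna' =~ $NAME_PATTERN ? 1 : 0;

# Still true, but only because of the earlier match.
if (defined $1) {
    ($title, $first_name, $last_name) = ($1, $2, $3);
}

print "second match succeeded: $second_matched\n";
print "title: $title, last name: $last_name\n";
```

Even though the second match fails, the defined test passes and the variables end up holding the doctor's details.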
Captured values should be used only when it's certain they actually were captured. The easiest way to ensure that is to always put capturing matches inside some kind of preliminary boolean test. For example:

            

    if ($full_name =~ m/\A (Mrs?|Ms|Dr) \s+ (\S+) \s+ (\S+) \z/xms) {
        ($title, $first_name, $last_name) = ($1, $2, $3);
    }

or:

    next NAME if $full_name !~ m/\A (Mrs?|Ms|Dr) \s+ (\S+) \s+ (\S+) \z/xms;

    ($title, $first_name, $last_name) = ($1, $2, $3);

         
Capture Variables
Always give captured substrings proper names.
$1, $2, etc. are dreadful names for variables. Like the parameter variables $_[0], $_[1], etc. (see "Named Arguments" in Chapter 9), they convey absolutely nothing about the values they store, except the order in which they occurred. They produce unreadable code like this:

    CONFIG_LINE:
    while (my $config = <>) {
        # Ignore lines that are unrecognisable...
        next CONFIG_LINE
            if $config !~ m/ \A  (\S+)  \s* = \s*  ([^;]+) ;  \s* \# (.*)/xms;

        # Verify the option makes sense...
        debug($3);
        croak "Unknown option ($1)"
            if not exists $option{$2};

        # Record the configuration option...
        $option{$2} = $1;
    }

As the capture variables don't have meaningful names, it's much harder to work out what this code is actually doing, and to verify that it's correct. (It's not.)
Because numbered variables suffer from the same drawbacks as numbered arguments, it's not surprising that the solution is the same, too: simply unpack $1, $2, etc. into sensibly named variables immediately after a successful match. Doing that makes the purpose—and the errors—much more obvious:

            

    CONFIG_LINE:
    while (my $config = <>) {
        # Ignore lines that are unrecognisable...
        next CONFIG_LINE
            if $config !~ m/ \A  (\S+)  \s* = \s*  ([^;]+) ;  \s* \# (.*)/xms;

        # Name captured components...
        my ($opt_name, $opt_val, $comment) = ($1, $2, $3);

        # Verify the option makes sense...
        debug($comment);
        croak "Unknown option ($opt_name)"
            if not exists $option{$opt_val};   # Oops: value used as key

        # Record the configuration option...
        $option{$opt_val} = $opt_name;         # Oops*2: value as key; name as value
    }

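
Once the captures have names, the two slips flagged in the comments are straightforward to repair. The following is a sketch under stated assumptions: the contents of %option are invented, the lines arrive as a list instead of from <>, load_config() is a hypothetical wrapper, and the debug() call is left out.

```perl
use strict;
use warnings;
use Carp;

# Options this program understands (illustrative values).
my %option = ( verbose => 0, colour => 'auto' );

# Apply "name = value; # comment" lines to %option.
sub load_config {
    my @config_lines = @_;

    CONFIG_LINE:
    for my $config (@config_lines) {
        # Ignore lines that are unrecognisable...
        next CONFIG_LINE
            if $config !~ m/ \A  (\S+)  \s* = \s*  ([^;]+) ;  \s* \# (.*)/xms;

        # Name captured components...
        my ($opt_name, $opt_val, $comment) = ($1, $2, $3);

        # Verify the option makes sense (the name is the key)...
        croak "Unknown option ($opt_name)"
            if not exists $option{$opt_name};

        # Record the configuration option (name => value)...
        $option{$opt_name} = $opt_val;
    }

    return;
}

load_config(
    'verbose = 1; # be chatty',
    'this line is not a setting',
);

print "verbose is now $option{verbose}\n";
```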
Piecewise Matching
Tokenize input using the /gc flag.
The typical approach to breaking an input string into individual tokens is to "nibble" at it, repeatedly biting off the start of the input string with successive substitutions:

    while (length $input > 0) {
        if ($input =~ s{\A ($KEYWORD)}{}xms) {
            my $keyword = $1;
            push @tokens, start_cmd($keyword);
        }
        elsif ($input =~ s{\A ($IDENT)}{}xms) {
            my $ident = $1;
            push @tokens, make_ident($ident);
        }
        elsif ($input =~ s{\A ($BLOCK)}{}xms) {
            my $block = $1;
            push @tokens, make_block($block);
        }
        else {
            my ($context) = $input =~ m/ \A ([^\n]*) /xms;
            croak "Error near: $context";
        }
    }

But this approach requires a modification to the $input string on every successful match, which makes it expensive to start with, and then causes it to scale badly as well. Nibbling away at strings is slow and gets slower as the strings get bigger.
In Perl 5.004 and later, there's a much better way to use regexes for tokenizing an input: you can just "walk" the string, using the /gc flag. The /gc flag tells a regex to track where each successful match finishes matching. You can then access that "end-of-the-last-match" position via the built-in pos() function. There is also a \G metacharacter, which is a positional anchor, just like \A is. However, whereas \A tells the regex to match only at the start of the string, \G tells it to match only where the previous successful /gc match finished. If no previous /gc match was successful, \G acts like a \A and matches only at the start of the string.
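A minimal sketch of that walking style follows. The three token patterns are hypothetical stand-ins for $KEYWORD, $IDENT, and $BLOCK, the whitespace-skipping branch is an addition of this sketch, and tokens are recorded as plain strings rather than via start_cmd() and friends.

```perl
use strict;
use warnings;
use Carp;

# Hypothetical token patterns, standing in for $KEYWORD, $IDENT, and $BLOCK.
my $KEYWORD = qr/ (?: print | while | if ) \b /xms;
my $IDENT   = qr/ [A-Za-z_] \w*            /xms;
my $BLOCK   = qr/ \{ [^{}]* \}             /xms;

# Walk the string with \G and /gc instead of modifying it.
sub tokenize {
    my ($input) = @_;
    my @tokens;

    while ((pos($input) || 0) < length $input) {
        if ($input =~ m{ \G \s+ }gcxms) {
            # Skip whitespace between tokens
        }
        elsif ($input =~ m{ \G ($KEYWORD) }gcxms) {
            push @tokens, "keyword:$1";
        }
        elsif ($input =~ m{ \G ($IDENT) }gcxms) {
            push @tokens, "ident:$1";
        }
        elsif ($input =~ m{ \G ($BLOCK) }gcxms) {
            push @tokens, "block:$1";
        }
        else {
            my $context = substr $input, pos($input) || 0, 20;
            croak "Error near: $context";
        }
    }

    return @tokens;
}

print join(' | ', tokenize('while ready { step }')), "\n";
```

Because a failed /gc match leaves pos() where it was, each elsif simply retries from the same position, and the string itself is never modified.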
All of which means that, instead of using a regex substitution to lop each token off the start of the string (s{\A...}{}), you can simply use a regex match to start looking for the next token at the point where the previous token match finished (m{\G...}gc
End of preview. The rest of this section is not available here.
Tabular Regexes
Build regular expressions from tables.
Tables like the one shown at the end of the previous guideline are a cleaner way of structuring regex matches, but they can also be a cleaner way of building a regex in the first place—especially when the resulting regex will be used to extract keys for the table.
Don't duplicate existing table information as part of a regular expression:

            

    # Table of irregular plurals...
    my %irregular_plural_of = (
        'child'       => 'children',
        'brother'     => 'brethren',
        'money'       => 'monies',
        'mongoose'    => 'mongooses',
        'ox'          => 'oxen',
        'cow'         => 'kine',
        'soliloquy'   => 'soliloquies',
        'prima donna' => 'prime donne',
        'octopus'     => 'octopodes',
        'tooth'       => 'teeth',
        'toothfish'   => 'toothfish',
    );

    # Pattern matching any of those irregular plurals...
    my $has_irregular_plural = qr{
        child     | brother     | mongoose
      | ox        | cow         | monkey
      | soliloquy | prima donna | octopus
      | tooth(?:fish)?
    }xms;

    # Form plurals...
    while (my $word = <>) {
        chomp $word;

        if ($word =~ m/ ($has_irregular_plural) /xms) {
            print $irregular_plural_of{$word}, "\n";
        }
        else {
            print form_regular_plural_of($word), "\n";
        }
    }

Apart from the annoying redundancy of specifying each key twice, this kind of duplication is a prime opportunity for mistakes to creep in. As they did—twice—in the previous example.
It's much easier to ensure consistency between a look-up table and the regex that feeds it if the regex is automatically constructed from the table itself. That's relatively easy to achieve, by replacing the regex definition with:

            

               

                  

    # Build a pattern matching any of those irregular plurals...
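
One way such a pattern might be derived from the table's keys is sketched here. This is an assumption about how it could be done, not necessarily the code the text goes on to show. The table is abbreviated, and the longest-first ordering and the \A ... \z anchors are design choices of this sketch.

```perl
use strict;
use warnings;

# A few entries from the table above.
my %irregular_plural_of = (
    'child'       => 'children',
    'money'       => 'monies',
    'ox'          => 'oxen',
    'prima donna' => 'prime donne',
    'tooth'       => 'teeth',
    'toothfish'   => 'toothfish',
);

# Longest keys first, so that 'toothfish' is tried before 'tooth'.
my @keys_longest_first
    = sort { length($b) <=> length($a) || $a cmp $b } keys %irregular_plural_of;

# quotemeta protects the space in 'prima donna' from the /x flag.
my $alternatives         = join '|', map { quotemeta $_ } @keys_longest_first;
my $has_irregular_plural = qr{ (?: $alternatives ) }xms;

for my $word ('ox', 'prima donna', 'monkey') {
    if ($word =~ m/ \A ($has_irregular_plural) \z /xms) {
        print "$word -> $irregular_plural_of{$1}\n";
    }
    else {
        print "$word -> (regular)\n";
    }
}
```

Since the alternatives come straight from the hash, a key can no longer be in the table but missing from the pattern, or the other way round.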

               
End of preview. The rest of this section is not available here.
Constructing Regexes
Build complex regular expressions from simpler pieces.
Building a regular expression from the keys of a hash is a special case of a much more general best practice. Most worthwhile regexes—even those for simple tasks—are still too tedious or too complicated to code directly. For example, to extract the components of a number, you could write:

    my ($number, $sign, $digits, $exponent)
        = $input =~ m{ (                          # Capture entire number
                         ( [+-]? )                # Capture leading sign (if any)
                         ( \d+ (?: [.] \d*)?      # Capture mantissa: NNN.NNN
                         | [.] \d+                #               or:    .NNN
                         )
                         ( (?:[Ee] [+-]? \d+)? )  # Capture exponent (if any)
                       )
                     }xms;

Even with the comments, that pattern is bordering on unreadable. And checking that it works as advertised is highly non-trivial.
But a regular expression is really just a program, so all the arguments in favour of program decomposition (see Chapter 9) apply to regexes too. In particular, it's often better to decompose a complex regular expression into manageable (named) fragments, like so:

            

               

                  

    # Build a regex that matches floating point representations...
    Readonly my $DIGITS    => qr{ \d+ (?: [.] \d*)? | [.] \d+         }xms;
    Readonly my $SIGN      => qr{ [+-]                                }xms;
    Readonly my $EXPONENT  => qr{ [Ee] $SIGN? \d+                     }xms;
    Readonly my $NUMBER    => qr{ ( ($SIGN?) ($DIGITS) ($EXPONENT?) ) }xms;

    # and later...

    my ($number, $sign, $digits, $exponent)
        = $input =~ $NUMBER;

         
Here, the full $NUMBER regex is built up from simpler components ($DIGITS, $SIGN, and $EXPONENT), much in the same way that a full Perl program is built from simpler subroutines. Notice that, once again, refactoring cleans up both the refactored code itself and the place that code is later used.
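A further benefit of the decomposition is that each fragment can be exercised on its own. This sketch reuses the same four components as plain lexicals (Readonly is a CPAN module, so it is left out here). The sample inputs are assumptions.

```perl
use strict;
use warnings;

# Same components as above, as plain lexicals.
my $DIGITS   = qr{ \d+ (?: [.] \d*)? | [.] \d+         }xms;
my $SIGN     = qr{ [+-]                                }xms;
my $EXPONENT = qr{ [Ee] $SIGN? \d+                     }xms;
my $NUMBER   = qr{ ( ($SIGN?) ($DIGITS) ($EXPONENT?) ) }xms;

# Each piece can be checked in isolation before the whole is trusted.
for my $candidate ('e10', 'E-3') {
    print "'$candidate' is a valid exponent\n"
        if $candidate =~ m{\A $EXPONENT \z}xms;
}

my ($number, $sign, $digits, $exponent) = '-12.5e+3' =~ $NUMBER;

print "number=$number sign=$sign digits=$digits exponent=$exponent\n";
```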
Canned Regexes
Consider using Regexp::Common instead of writing your own regexes.
Regular expressions are wonderfully easy to code wrongly: to miss edge-cases, to include unexpected (and incorrect) matches, or to create a pattern that's correct but hopelessly inefficient. And even when you get your regex right, you still have to maintain the code that you used to build it.
It's a drag. Worse, it's everybody's drag. All around the world there are thousands of Perl programmers continually reinventing the same regexes: to match numbers, and URLs, and quoted strings, and programming language comments, and IP addresses, and Roman numerals, and zip codes, and Social Security numbers, and balanced brackets, and credit card numbers, and email addresses.
Fortunately there's a CPAN module named Regexp::Common, whose entire purpose is to generate these kinds of everyday regular expressions for you. The module installs a single hash (%RE), through which you can create thousands of commonly needed regexes.
For example, instead of building yourself a number-matcher:

            

    # Build a regex that matches floating point representations...
    Readonly my $DIGITS    => qr{ \d+ (?: [.] \d*)? | [.] \d+         }xms;
    Readonly my $SIGN      => qr{ [+-]                                }xms;
    Readonly my $EXPONENT  => qr{ [Ee] $SIGN? \d+                     }xms;
    Readonly my $NUMBER    => qr{ ( ($SIGN?) ($DIGITS) ($EXPONENT?) ) }xms;

    # and later...

    my ($number)
        = $input =~ $NUMBER;

you can ask Regexp::Common to do it for you:

    use Regexp::Common;

    # Build a regex that matches floating point representations...
    Readonly my $NUMBER => $RE{num}{real}{-keep};

    # and later...

    my ($number)
        = $input =~ $NUMBER;

         
Alternations
Always use character classes instead of single-character alternations.
Individually testing for single character alternatives:

    if ($cmd !~ m{\A (?: a | d | i | q | r | w | x ) \z}xms) {
        carp "Unknown command: $cmd";
        next COMMAND;
    }

may make your regex slightly more readable. But that gain isn't sufficient to compensate for the heavy performance penalty this approach imposes. Furthermore, the cost of testing separate alternatives this way increases linearly with the number of alternatives to be tested.
The equivalent character class:

    if ($cmd !~ m{\A [adiqrwx] \z}xms) {
        carp "Unknown command: $cmd";
        next COMMAND;
    }

         
does exactly the same job, but 10 times faster. And it costs the same no matter how many characters are later added to the set.
Sometimes a set of alternatives will contain both single- and multicharacter alternatives:

    if ($quotelike !~ m{\A (?: qq | qr | qx | q | s | y | tr ) \z}xms) {
        carp "Unknown quotelike: $quotelike";
        next QUOTELIKE;
    }

In that case, you can still improve the regex by aggregating the single characters:

    if ($quotelike !~ m{\A (?: qq | qr | qx | [qsy] | tr ) \z}xms) {
        carp "Unknown quotelike: $quotelike";
        next QUOTELIKE;
    }

Sometimes you can then factor out the commonalities of the remaining multicharacter alternatives into an additional character class:

    if ($quotelike !~ m{\A (?: q[qrx] | [qsy] | tr ) \z}xms) {
        carp "Unknown quotelike: $quotelike";
        next QUOTELIKE;
    }
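
When refactoring a pattern this way, it is worth confirming that every version still accepts exactly the same strings. The following is a small sketch of such a check, with a candidate list chosen for illustration:

```perl
use strict;
use warnings;

my @patterns = (
    qr{\A (?: qq | qr | qx | q | s | y | tr ) \z}xms,   # plain alternation
    qr{\A (?: qq | qr | qx | [qsy] | tr )     \z}xms,   # single characters aggregated
    qr{\A (?: q[qrx] | [qsy] | tr )           \z}xms,   # common prefix factored out
);

# Candidates that should be accepted, rejected, or are near misses.
my @candidates = qw( q qq qr qx s y tr m qw t r qqq trs );

my $all_agree = 1;
for my $candidate (@candidates) {
    my @verdicts = map { $candidate =~ $_ ? 1 : 0 } @patterns;
    $all_agree = 0 if grep { $_ != $verdicts[0] } @verdicts;
}

print $all_agree ? "All three patterns agree\n" : "Patterns disagree\n";
```

The relative speeds can then be measured separately, for instance with the standard Benchmark module, since they are likely to vary between Perl versions.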

         
Factoring Alternations
Factor out common affixes from alternations.
It's not just single character alternatives that are slow. Any alternation of subpatterns can be expensive. Especially if the resulting set of alternatives involves a repetition.
Every alternative that has to be tried requires the regex engine to backtrack up the string and re-examine the same sequence of characters it just rejected. And, if the alternatives are inside a repeated subpattern, the repetition itself may have to backtrack and retry every alternative from a different starting point. That kind of nested backtracking can easily produce an exponential increase in the time the complete match requires.
As if those problems weren't bad enough, alternations aren't very smart either. If one alternative fails, the matching engine just backs up and tries the next possibility, with absolutely no forethought as to whether that next alternative can possibly match.
For example, when a regular expression like:

    m{
       with \s+ each \s+ $EXPR \s* $BLOCK
     | with \s+ each \s+ $VAR  \s* in \s* [(] $LIST [)] \s* $BLOCK
     | with \s+ [(] $LIST [)] \s* $BLOCK
    }xms

is matching a string, it obviously tries the first alternative first. Suppose the string begins 'with er go est...'. In that case, the first alternative will successfully match with, then successfully match \s+, then successfully match e, but will then fail to match r (since it expected ach at that point). So the regex engine will backtrack to the start of the string and try the second alternative instead. Once again, it will successfully match with and \s+ and e, but then once again fail to match r. So it will backtrack to the start of the string once more and try the third alternative. Yet again it will successfully match with, then \s+, before failing to match the [(].
That's much less efficient than it could be. The engine had to backtrack twice and, in doing so, it had to retest and rematch the same
End of preview. The rest of this section is not available here.
Backtracking
Prevent useless backtracking.
In the final example of the previous guideline:

            

    qr{
       with \s+
       (?: each \s+
           (?:$EXPR
             | $VAR  \s* in \s* [(] $LIST [)]
           )
         | [(] $LIST [)]
       )
       \s* $BLOCK
    }xms

         
if the match successfully reaches the shared \s* $BLOCK suffix but subsequently fails to match the trailing block, then the regex engine will immediately backtrack. That backtracking will cause it to reconsider the various (nested) alternatives: first by backtracking within the previous successful alternative, and then by trying any remaining unexamined alternatives. That's potentially a lot of expensive matching, all of which is utterly useless. For a start, the syntaxes of the various options are mutually exclusive, so if one of them already matched, none of the subsequent candidates ever will.
Even if that weren't the case, the regex is backtracking only because there wasn't a valid block at the end of the loop specification. But backtracking and messing around with the other alternatives won't change that fact. Even if the regex does find another way to match the first part of the loop specification, there still won't be a valid block at the end of the string when matching reaches that point again.
This particular situation arises every time an alternation consists of mutually exclusive alternatives. The "dumb but fast" behaviour of the regex engine forces it to go back and mindlessly try every other possibility, even when—to an outside observer—that's provably a complete waste of time and the engine would do much better to just forget about backtracking into the alternation.
As before, you have to explicitly point that optimization out to Perl. In this case, that's done by enclosing the alternation in a special form of parentheses: (?>...). These are Perl's "don't-ever-backtrack-into-me" markers. They tell the regex engine that the enclosed subpattern can safely be skipped over during backtracking, because you're confident that re-matching the contents either won't succeed or, if it does succeed, won't help the overall match.
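The effect of (?>...) is easiest to see on a deliberately tiny pattern. This is a simplified illustration of the semantics, not the loop-specification regex itself. The digit string is an assumption.

```perl
use strict;
use warnings;

my $digits = '1234';

# Ordinary grouping: \d+ first grabs all four digits, then gives one back
# so that the final \d can match.
my $with_backtracking = $digits =~ m{\A (?: \d+ ) \d \z}xms ? 1 : 0;

# Atomic grouping: once \d+ has matched, the engine may not re-enter the
# group to give a digit back, so the final \d can never match.
my $without_backtracking = $digits =~ m{\A (?> \d+ ) \d \z}xms ? 1 : 0;

print "(?:...) matched: $with_backtracking\n";
print "(?>...) matched: $without_backtracking\n";
```

The second match fails precisely because backtracking into the group is forbidden. That is why the construct is appropriate only where, as stipulated above, re-matching the group's contents could not help the overall match.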
String Comparisons
Prefer fixed-string eq comparisons to fixed-pattern regex matches.
If you're trying to compare a string against a fixed number of fixed keywords, the temptation is to put them all inside a single regex, as anchored alternatives:

            

    # Quit command has several variants...
    last COMMAND if $cmd =~ m{\A (?: q | quit | bye ) \z}xms;

The usual rationale for this is that a single, highly optimized regex match must surely be quicker than three separate eq tests:

            

               

                  

    # Quit command has several variants...
    last COMMAND if $cmd eq 'q'
                 || $cmd eq 'quit'
                 || $cmd eq 'bye';

         
Unfortunately, that's not the case. Regex-matching against a series of fixed alternations is at least 20% slower than individually eq-matching the same strings—not to mention the fact that the eq-based version is significantly more readable.
Likewise, if you're doing a pattern match merely to get case insensitivity:

            

    # Quit command is case-insensitive...
    last COMMAND if $cmd =~ m{\A quit \z}ixms;

then it's more efficient, and arguably more readable, to write:

    # Quit command is case-insensitive...
    last COMMAND if lc($cmd) eq 'quit';
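
Swapping a regex for string comparisons is only safe if both accept exactly the same inputs, so a quick equivalence check is worthwhile. In this sketch the four subroutine names and the list of test commands are hypothetical.

```perl
use strict;
use warnings;

# The three-way quit test, written both ways.
sub is_quit_by_regex {
    my ($cmd) = @_;
    return $cmd =~ m{\A (?: q | quit | bye ) \z}xms ? 1 : 0;
}

sub is_quit_by_eq {
    my ($cmd) = @_;
    return 1 if $cmd eq 'q' || $cmd eq 'quit' || $cmd eq 'bye';
    return 0;
}

# The case-insensitive test, written both ways.
sub is_quit_nocase_by_regex {
    my ($cmd) = @_;
    return $cmd =~ m{\A quit \z}ixms ? 1 : 0;
}

sub is_quit_nocase_by_lc {
    my ($cmd) = @_;
    return lc($cmd) eq 'quit' ? 1 : 0;
}

my $disagreements = 0;
for my $cmd ('q', 'quit', 'bye', 'exit', 'Quit', 'QUIT', 'quit ', "quit\n", q{}) {
    $disagreements++ if is_quit_by_regex($cmd)        != is_quit_by_eq($cmd);
    $disagreements++ if is_quit_nocase_by_regex($cmd) != is_quit_nocase_by_lc($cmd);
}

print "Disagreements between regex and string versions: $disagreements\n";
```

Note that the equivalence depends on the \z anchor. A pattern ending in $ would also accept a trailing newline, which eq would not.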

         
Sometimes, if there are a large number of possibilities to test:

            

    Readonly my @EXIT_WORDS => qw(
        q  quit  bye  exit  stop  done  last  finish  aurevoir
    );

or the number of possibilities is indeterminate at compile time:

    Readonly my @EXIT_WORDS
        => slurp $EXIT_WORDS_FILE, {chomp=>1};

         
then a regex might seem like a better alternative, because it can easily be built on the fly:
End of preview. The rest of this section is not available here.
Chapter 13: Error Handling
Several recent languages have adopted an
Intercal-like, asynchronous, computed
COME-FROM concept. Only they
refer to it with funny terms like
"exception handling".
—Hans Mulder
The two central difficulties of programming are the same as the two central difficulties of road safety: like cars, programs are built by humans; and, like cars, programs are driven by humans.
Debugging (see Chapter 18) is the art of overcoming the fallibility of those who create software systems. Error handling is the art of surviving the fallibility of those who drive such systems.
Effective and maintainable error handling is one of the keys to creating software that can be considered robust. Even a program with no internal bugs must still interact with the environment in which it executes: at very least, the operating system, filesystem, terminal I/O, hardware devices, and network connections.
That environment must be treated as hostile, because any or all of its components may fail in some unpredictable manner. Robust software must allow for that possibility, detect when it occurs, and either overcome the problem, if possible, or report it and fail gracefully. All of which comes under the mantle of error handling.
This chapter suggests several coding practices that can help. Those practices are all based on two fundamental principles. The first is that all detectable run-time errors must be detected, classified, and reported. The second is that it should not be possible to ignore any detected error without a conscious and visible effort.
The important—though perhaps not obvious—consequence of these two principles is that detectable errors must be allowed to propagate only upwards (to callers), not laterally (to other statements within the same scope), and certainly never downwards (into subsequent subroutine calls).
Exceptions
Throw exceptions instead of returning special values or setting flags.
Returning a special error value on failure, or setting a special error flag, is a very common error-handling technique. Collectively, they're the basis for virtually all error notification from Perl's own built-in functions.
Error notification via flags and return values has a serious flaw: flags and return values can be silently ignored. And ignoring them requires absolutely no effort on the part of the programmer. In fact, in a void context, ignoring return values is Perl's default behaviour. Ignoring an error flag that has suddenly appeared in a special variable is just as easy: you simply don't bother to check the variable.
Moreover, because ignoring a return value is the void-context default, there's no syntactic marker for it. So there's no way to look at a program and immediately see where a return value is deliberately being ignored, which means there's also no way to be sure that it's not being ignored accidentally.
The bottom line: regardless of the programmer's (lack of) intention, an error indicator is being ignored. That's not good programming.
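The contrast can be sketched with a hypothetical validator written in both styles. The subroutine names and the port-number check are invented for illustration.

```perl
use strict;
use warnings;
use Carp;

# Hypothetical validator, written twice: once reporting failure through its
# return value, once by throwing an exception.
sub port_or_undef {
    my ($text) = @_;
    return if $text !~ m{\A \d+ \z}xms;
    return $text;
}

sub port_or_exception {
    my ($text) = @_;
    croak "Invalid port: '$text'" if $text !~ m{\A \d+ \z}xms;
    return $text;
}

# The failure is returned, but nothing obliges the caller to look at it.
port_or_undef('eighty');
print "Carried on after ignoring a failure\n";

# The exception cannot be overlooked; surviving it takes a visible eval.
my $survived = eval { port_or_exception('eighty'); 1 };
print "Caught: $@" if !$survived;
```

Ignoring the first failure takes no code at all, whereas ignoring the second requires an explicit eval that any reader can see.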
Ignoring error indicators frequently causes programs to propagate errors in entirely the wrong direction, as happens in Example 13-1.
Example 13-1. Returning special error values

# Find and open a file by name, returning the filehandle
# or undef on failure...
sub locate_and_open {
    my ($filename) = @_;

    # Check acceptable directories in order...
    for my $dir (@DATA_DIRS) {
        my $path = "$dir/$filename";

        # If file exists in an acceptable directory, open and return it...
End of preview. The rest of this section is not shown here.
Builtin Failures
Make failed builtins throw exceptions too.
Given that exceptions are the recommended way of signaling and handling errors, Perl's own builtins pose something of a problem: they rely on special return values or flag variables instead.
Ignoring the return values of builtins makes for prettier, but much less robust, code:

    open my $fh, '>', $filename;
    print {$fh} $results;
    close $fh;
As it turns out, though, it's much easier to change how Perl's builtins fail than it is to change how Perl programmers code. You just need to use the standard Fatal module:

            

    use Fatal qw( open close );

    open my $fh, '>', $filename;
    print {$fh} $results;
    close $fh;

         
The Fatal module is passed a list of builtins and, by the use of dark and terrible magics, it transforms those functions so that they no longer return false on failure; they now throw an exception instead. This means that the last three untested lines of the previous example are now perfectly acceptable. Either each builtin will succeed, or one will fail, at which point that builtin will throw an exception.
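A small sketch of that behaviour follows. The path is hypothetical and is simply assumed not to exist; the eval is there only so that the exception can be inspected rather than terminating the program:

```perl
use strict;
use warnings;
use English qw( -no_match_vars );
use Fatal qw( open close );

# Hypothetical path that is assumed not to exist...
my $missing_file = '/no/such/directory/results.txt';

# No explicit test on open(), yet its failure can't slip past unnoticed...
my $opened = eval {
    open my $fh, '<', $missing_file;
    close $fh;
    1;
};
my $error = $EVAL_ERROR;

print "open() threw: $error\n" if !$opened;
```

Without the eval, the failed open would simply end the program with that message, instead of letting the subsequent statements run on an unopened filehandle.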
use Fatal can also be applied to subroutines, to convert them from return-false-on-failure to throw-exception-on-failure. For example, in the previous guideline, instead of rewriting locate_and_open(), you could have Fatal'd it:

            

    # Load subroutine to find and open a file by name
    # (Unfortunately, we're stuck with using the original version,
    #  which returns false on failure.)
    use Our::Corporate::File::Utilities qw( locate_and_open );

    # So change that unacceptable failure behaviour to throw exceptions instead...
    use Fatal qw( locate_and_open );

    # and later...

    for my $filename (@source_files) {
        my $fh = locate_and_open($filename);
End of preview. The rest of this section is not shown here.
Contextual Failure
Make failures fatal in all contexts.
The Fatal pragma can also be invoked with the special marker :void. Loading Fatal with this extra marker causes it to rewrite builtins and subroutines in a slightly different way, such that they throw a failure exception only if they were called in a void context. Under :void, they continue to silently return false in non-void contexts. That is:

    use Fatal qw( :void open close );

    if (open my $out, '>', $filename) {         # Call to open() in non-void context so
                                                #     open() returns false on failure

        open my $in, '<', "$filename.dat";      # Call to open() in void context so
                                                #     open() throws exception on failure

        print {$out} <$in>;

        close $out                              # Call close() in non-void context so
            or carp "close failed: $OS_ERROR";  #     close() returns false on failure

        close $in;                              # Call close() in void context so
                                                #     close() throws exception on failure
    }
While this may seem like an improvement (more flexible, more Perlish), it's actually a step backwards in terms of code reliability. The problem is that it's far too easy to call a subroutine or function in a non-void context and still not actually test it. For example:

            

    # Change unacceptable failure behaviour to throw exceptions instead...
    use Fatal qw( :void locate_and_open );

    # and later...

    for my $filename (@source_files) {
        my $fh = locate_and_open($filename);
        my $head = load_header_from($fh);
        print $head;
    }
Here, locate_and_open() is upgraded to throw exceptions on void-context failure. Unfortunately, it isn't called in a void context. It's called in scalar context, so it still returns its usual
End of preview. The rest of this section is not shown here.
Systemic Failure
Be careful when testing for failure of the system builtin.
The system command is a particularly nasty case. Unlike most other Perl builtins, it returns false on success and true on failure. Fatal doesn't work on it either, so most people give up and write something like:

    system $cmd
        and croak "Couldn't run: $cmd ($OS_ERROR)";
The flow-of-control there is highly counterintuitive unless you're familiar with system's unusual failure return value.
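The inverted truth value arises because system returns the child's wait status: zero for a clean exit, -1 if the command could not be started, and otherwise a value encoding a signal number or exit status. One way to make the test explicit is to decode that status by hand, as in the sketch below. Note that run_or_croak() is a hypothetical helper written for this illustration, not a standard function:

```perl
use strict;
use warnings;
use Carp;
use English qw( -no_match_vars );

# Run a command, throwing an exception unless it ran and exited with status 0
# (run_or_croak() is a hypothetical helper, not a standard function)...
sub run_or_croak {
    my @command = @_;

    my $status = system @command;

    # -1 means the command could not be started at all...
    croak "Couldn't run: @command ($OS_ERROR)"
        if $status == -1;

    # The low seven bits hold the number of any signal that killed the child...
    croak "Command '@command' died from signal " . ($status & 127)
        if $status & 127;

    # The high byte holds the child's own exit status...
    croak "Command '@command' exited with status " . ($status >> 8)
        if $status >> 8;

    return;
}

# Use the running perl itself as a conveniently portable test command...
run_or_croak($EXECUTABLE_NAME, '-e', 'exit 0');
```

Because every failure mode becomes an exception, callers never need to remember which truth value means success.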
A cleaner approach is to use the WIFEXITED ("if-exited") subroutine from the standard POSIX module:

            

    use POSIX qw( WIFEXITED );

    # And later...

    WIFEXITED(system $cmd)
        or croak "Couldn't run: $cmd ($OS_ERROR)";

         
Note that this particular return value anomaly will be fixed in Perl 6. The revised system function will still return an integer status value as in Perl 5, but the boolean value of that status will be "reversed": true if the status is zero and false otherwise. Those new semantics are already available in Perl 5, via the Perl6::Builtins CPAN module:

            

    use Perl6::Builtins qw( system );

    # and later...

    system $cmd
        or croak "Couldn't run: $cmd ($OS_ERROR)";

         
End of preview. The rest of this section is not shown here.
Recoverable Failure
Throw exceptions on all failures, including recoverable ones.
All of the examples so far in this chapter have dealt with unrecoverable errors. If a file doesn't exist, can't be found, or can't be created, then there's not much more that a program can do except give up and throw an exception.
However, there are other kinds of resource acquisition failures—such as failing to open a file that's currently locked by someone else, or being unable to fork a new process when your process limit has been reached—that are not always hanging offenses. If the resource is likely to become available later, your application might choose to idle for a short period and try to acquire it again. It might even try that several times before giving up.
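An idle-and-retry policy of that kind might be sketched as follows. Everything here is an assumption made for illustration: acquire_resource() is a stand-in that fails on its first two calls, it signals failure by throwing, and the delay is a simple linear back-off rather than anything tuned:

```perl
use strict;
use warnings;
use Carp;
use English qw( -no_match_vars );

my $MAX_TRIES = 5;

# Hypothetical stand-in for a contended resource: busy on the first two attempts...
my $attempts = 0;
sub acquire_resource {
    my ($resource_id) = @_;
    $attempts++;
    croak "Resource $resource_id is busy" if $attempts < 3;
    return "handle for $resource_id";
}

my $resource;

TRY:
for my $try (1..$MAX_TRIES) {
    # Record success explicitly, so that success never depends on the value returned...
    my $acquired = eval {
        $resource = acquire_resource('printer');
        1;
    };
    last TRY if $acquired;

    # Give up, rethrowing the last failure, once the tries are exhausted...
    croak $EVAL_ERROR if $try == $MAX_TRIES;

    # Otherwise idle briefly (a little longer each time) before trying again...
    select undef, undef, undef, 0.01 * $try;
}

print "Acquired '$resource' after $attempts attempts\n";
```

The loop is left from outside the eval block, which avoids Perl's "Exiting eval via last" warning.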
In such cases, it's tempting to report failure by returning undef:

    TRY:
    for my $try (1..$MAX_TRIES) {
        # Take care of locking of, and connection to, resource...
        $resource = acquire_resource($resource_id);

        # Got it...
        last TRY if defined $resource;

        # Report non-recoverable failure if no more tries
        croak 'Could not acquire resource' if $try == $MAX_TRIES;

        # Else wait for increasing random intervals to help resolve contention...
        nap( rand fibonacci($try) );
    }

    do_something_using($resource);
But, even when the expected failures are recoverable like this, it's still better to throw exceptions:

            

    TRY:
    for my $try (1..$MAX_TRIES) {
        # If resource successfully acquired, we're done...
        eval {
            $resource = acquire_resource($resource_id);
            last TRY;
        };

        # Report non-recoverable failure if no more tries
        croak( $EVAL_ERROR ) if $try == $MAX_TRIES;

        # Otherwise, try again after an increasing randomized interval...
End of preview. The rest of this section is not shown here.
Reporting Failure
Have exceptions report from the caller's location, not from the place where they were thrown.
If someone is using a subroutine you wrote:

            

    use Data::Checker qw( check_in_range );

    for my $measurement ( @remote_samples ) {
        check_in_range($measurement, {min => 0, max => $INSTRUMENT_MAX_VAL});
    }

         
they're not going to want to encounter an exception like this:

    Value 24536526 is out of range (0..99) at /usr/lib/perl/Data/Checker.pm line 1345
The message itself is fine, but the location information is close to useless. Developers who are using your code don't care where your code detected a problem; all they care about is where their code caused the problem. They want to see something like:

            

    Value 24536526 is out of range (0..99) at reactor_check.pl line 23

         
That is, they want to be told the location where the fatal subroutine was called, not the internal location where it actually threw the exception.
And, of course, that's the whole purpose of the standard Carp module: to report exceptions from the caller's point of view. So never use die to throw an exception:

    die "Value $val is out of range ($min..$max)"
        if $val < $min || $val > $max;
Always use croak() instead:

            

    use Carp;

    # and later...

    croak( "Value $val is out of range ($min..$max)" )
        if $val < $min || $val > $max;
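To see the caller's-eye view in action, here is a self-contained sketch. The body of check_in_range() is a guess at what such a subroutine might look like (the original implementation is not shown), and the module is defined inline so the example fits in a single file:

```perl
use strict;
use warnings;

package Data::Checker;
use Carp;

# Hypothetical minimal version of the range check used in the example above...
sub check_in_range {
    my ($val, $arg_ref) = @_;
    my ($min, $max) = @{$arg_ref}{qw( min max )};

    croak( "Value $val is out of range ($min..$max)" )
        if $val < $min || $val > $max;

    return $val;
}

package main;
use English qw( -no_match_vars );

# Remember the line on which the failing call is made (two lines below)...
my $call_line = __LINE__ + 2;
my $checked = eval {
    Data::Checker::check_in_range(24536526, {min => 0, max => 99});
    1;
};
my $message = $EVAL_ERROR;

print $message;
```

The line number in the printed message is the one recorded in $call_line, in the calling code, rather than the line inside Data::Checker where croak() was executed.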

         
The only situation when die could reasonably be used instead of croak() is if the error is a purely internal problem within your code, and not the caller's fault in any way. For example, if your subroutine is supposed to generate a result within a certain range using a very complicated process, you might choose to test whether the result was valid before returning it, like so:
End of preview. The rest of this section is not shown here.
Error Messages
Compose error messages in the recipient's dialect.
An error message is nearly useless if it's unintelligible to those who encounter it. For example, someone who uses a subroutine to load DAXML data:

            

    use XML::Parser::DAXML qw( load_DAXML );

    my $DAXML_root = load_DAXML($source_file);

         
will want to see an error message like this:

            

    File 'index.html' is not valid DAXML.
    Missing "</BLINK>" tag
    Problem detected near "</BLINK</HEAD>".
    Failed at 'DAXML_to_PDF.pl', line 3

         
An error message like that indicates what the overall problem is (not valid DAXML), why it was considered a problem (Missing "</BLINK>" tag), where the problem occurred (File 'index.html', near "</BLINK</HEAD>"), and which line in the caller's source failed ('DAXML_to_PDF.pl', line 3).
Collectively this information—what's wrong, why it's wrong, where in the data, and whence in the code—makes it easy for those who are using your utility to locate and correct their own problems.
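One way to keep all four parts present is to assemble the message in a single place. The helper below is purely hypothetical (its name, its arguments, and its use of caller are assumptions, not part of any real DAXML module), but it shows the shape such code might take:

```perl
use strict;
use warnings;

# Hypothetical helper: assemble the four parts of a user-oriented error message...
sub invalid_data_message {
    my ($arg_ref) = @_;

    # Whence: the file and line from which this helper was called
    # (a real module would probably need to look further up the call stack)...
    my ($caller_file, $caller_line) = (caller)[1, 2];

    return "File '$arg_ref->{file}' is not valid $arg_ref->{format}.\n"   # What
         . "$arg_ref->{reason}\n"                                          # Why
         . qq{Problem detected near "$arg_ref->{near}".\n}                 # Where
         . "Failed at '$caller_file', line $caller_line\n";                # Whence
}

my $message = invalid_data_message({
    file   => 'index.html',
    format => 'DAXML',
    reason => 'Missing "</BLINK>" tag',
    near   => '</BLINK</HEAD>',
});

print $message;
```

Because the message ends in a newline, passing it to die would not append a second, implementation-level location to it.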
Unfortunately, most exception messages are written by developers, and for developers (i.e., themselves). Most often, they're written during the testing or debugging process, so they tend to be written in the language of the developers, using the vocabulary of the implementation. So someone who is using your utility is likely to be confronted with an error message like:

    Invalid token ('<') at 'Acquisition.pm', line 2637
This is very concise (one-fifth the size of the error message suggested earlier) and completely accurate (the problem is indeed the unexpected < of the </HEAD> tag buried in the incomplete </BLINK tag). But it's likely to be of very little help to those who are using your module. They may have no idea what a parser token is, they're faced with thousands of angle-brackets in their data, and they certainly don't want to look through several thousand lines of your module source to try and work out what they did wrong.
End of preview. The rest of this section is not shown here.
Documenting Errors
Document every error message in the recipient's dialect.
It's important to document every exception (or warning) your code may ever generate (see Chapter 7), but it's vital to do so in a way that will be comprehensible to the likely recipient of these messages.
For example, suppose someone uses your new Random::Utils module:

            

    use Random::Utils qw( pick_from );

    # and later...

    $random_item = pick_from(@items);

         
And suppose that call to pick_from() causes their program to terminate unexpectedly with the message:

            

    Can't pick a random element from an empty list at monte_carlo.pl line 42

         
If they're not familiar with your module, they may be unsure what the problem is, or what caused it, or what to do about it. In which case, you'd hope that they'll try and work out what to do by reading the fine Random::Utils manual.
That kind of self-help will be far more likely to happen if your documentation actually does help readers solve their problems. To achieve that goal, you first need to explain the problem more fully, in one or more complete sentences; sentences that are longer—and written in less dense language—than the error message itself. You should then describe the most common causes of the problem, and finally suggest how the offending code might be fixed. For example:

            

    =head1 DIAGNOSTICS

    =over

    =item Can't pick a random element from an empty list

    The C<pick_from()> subroutine was called without any arguments, which
    meant it had no values to choose amongst. Perhaps you forgot to supply
    an argument to C<pick_from()>. Alternatively, maybe you passed an
    array to the subroutine, but that array was empty at the time.

    If you need to pass C<pick_from()> an array that might sometimes have no
    elements, try using the C<pick_with_default_from()> subroutine instead
    (see L<Picking randomly with a fall-back value>)
End of preview. The rest of this section is not shown here.
OO Exceptions
Use exception objects whenever failure data needs to be conveyed to a handler.
Since Perl 5.005 it has been possible to pass a single blessed reference to die or croak. Suppose, for example, you've created an exception class named X::TooBig. Then you can create an X::TooBig object and pass it straight to die or croak:

            

    croak( X::TooBig->new( {value=>$num, range=>[0,$MAX_ALLOWED_VALUE]} ) )
        if $num > $MAX_ALLOWED_VALUE;

         
Using objects as an exception has two important advantages: exception objects can be detected by type (using the exception classes' caught() methods), and they can ferry complex data structures back to an exception handler, carrying them inside the exception objects. For example:

            

               

                  

    # Get the next number...
    my $value = eval { get_number() };

    # If the attempt fails...
    if ($EVAL_ERROR) {
        # If the candidate was considered too big, go with the maximum allowed...
        if ( X::TooBig->caught() ) {
            my @range = $EVAL_ERROR->get_range();
            $value = $range[-1];
        }
        # If the candidate was deemed too small, try it anyway...
        elsif ( X::TooSmall->caught() ) {
            $value = $EVAL_ERROR->get_value();
        }
        # Otherwise, rethrow the exception...
        else {
            croak( $EVAL_ERROR );
        }
    }
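The handler above relies on an X::TooBig class that provides caught(), get_range(), and get_value(). The preview does not show how that class is built, so the following is only a hand-rolled guess at the minimum it would need. The throw() method and the get_number() stub are likewise assumptions made for illustration:

```perl
use strict;
use warnings;

package X::TooBig;
use Scalar::Util qw( blessed );
use English qw( -no_match_vars );

# Create an exception object from a hash of failure data...
sub new {
    my ($class, $arg_ref) = @_;
    return bless { %{$arg_ref} }, $class;
}

# Throw a new exception object of this class...
sub throw {
    my ($class, $arg_ref) = @_;
    die $class->new($arg_ref);
}

# Return the current exception if it belongs to this class (or a subclass)...
sub caught {
    my ($class) = @_;
    return if !blessed($EVAL_ERROR) || !$EVAL_ERROR->isa($class);
    return $EVAL_ERROR;
}

# Give access to the failure data carried by the exception...
sub get_value { my ($self) = @_; return $self->{value};      }
sub get_range { my ($self) = @_; return @{ $self->{range} }; }

package main;
use English qw( -no_match_vars );

my $MAX_ALLOWED_VALUE = 100;

# Hypothetical number source that always produces an oversized candidate...
sub get_number {
    my $num = 1234;
    X::TooBig->throw({ value => $num, range => [0, $MAX_ALLOWED_VALUE] })
        if $num > $MAX_ALLOWED_VALUE;
    return $num;
}

my $value = eval { get_number() };

# Too big? Then go with the maximum allowed...
if ( X::TooBig->caught() ) {
    my @range = $EVAL_ERROR->get_range();
    $value = $range[-1];
}

print "Using value: $value\n";
```

The handler never inspects the text of an error message; it identifies the failure by class and recovers the range from the exception object itself.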

         
Here, the exception coming back from get_number() is an object, so you can check it against each exception class to which it might belong:

            

    if ( X::TooBig->caught() ) {
        # [Handle "Too Big" problem]
End of preview. The rest of this section is not shown here.
Volatile Error Messages
Use exception objects when error messages may change.
In a string-based exception, the error message is the exception. That can lead to problems during development or maintenance, because it means that any exception handler's ability to recognize a string-based exception is inextricably tied to the structure of the error message itself.
If you ever need to change an exception message in any way, you're going to have to check and update every place that error might ever be caught. In practice, that means that you can never change the text of any exception once the code that throws it is in production.
In contrast, the error message of an object-oriented exception is merely one attribute of that object. More importantly, that message no longer defines the identity and type of the exception. That defining role is now played by the class into which the exception is blessed, or more particularly, by the caught() method that the class provides.
So the error message of an exception object can be rewritten whenever necessary. Provided the class of the exception remains the same, any exception handlers that catch it will be unaffected by the change of message.
End of preview. The rest of this section is not shown here.
Exception Hierarchies
Use exception objects when two or more exceptions are related.
Another problem with using raw strings as exceptions is that string-based exceptions offer no easy way to create new and specialized forms of existing exceptions that existing code can still catch and handle.
Consider the string-based exception for reporting integers outside a given range, as shown previously in the "OO Exceptions" guideline:

    croak( "Numeric value $num too big (must be $MAX_ALLOWED_VALUE or less)" )
        if $num > $MAX_ALLOWED_VALUE;
Suppose you also need to provide a special version of that exception for reporting integers that are so big that they're outside the range that Perl can represent exactly:

    croak( "Numeric value $num waaaaay too big (must be $MAX_INT or less)" )
        if $num > $MAX_INT;
The original test for catching the string-based "big number" exception was:

            

    # If the candidate was considered too big, go with the maximum allowed...
    if ($EVAL_ERROR =~ m{\A Numeric [ ] value [ ] \S+ [ ] too [ ] big}xms) {
Unfortunately, that regex won't match the error message of this new exception, so the handler will completely ignore it. Unless, of course, it was changed to:

            

    # If the candidate was considered too big, go with the maximum allowed...
    if ($EVAL_ERROR =~ m{\A Numeric [ ] value [ ] \S+ [ ] (wa+y [ ])? too [ ] big}xms) {
But that change makes the code that catches these exceptions more complex, harder to read, and less maintainable. Worse still, you're also going to have to change every other similar regex anywhere else that the original exception was being caught.
In contrast, suppose you had originally used an object-oriented exception:

            

    croak( X::TooBig->new( {num=>$num, limit=>$MAX_ALLOWED_VALUE} ) )
        if $num > $MAX_ALLOWED_VALUE;

         
with a correspondingly object-oriented test in the exception handler:
End of preview. The rest of this section is not shown here.
Processing Exceptions
Catch exception objects in most-derived-first order.
The only drawback to using method calls to detect particular types of exceptions:

            

    if ( X::TooBig->caught() ) {

         
is that you have to be careful about the order in which you try your alternatives. For example, if X::WaaaaayTooBig inherits from X::TooBig, the following code won't work correctly:

            

               

                  

    # If the attempt fails...
    if ($EVAL_ERROR) {
        # If the candidate was considered too big, go with the maximum allowed...
        if ( X::TooBig->caught() ) {
            my @range = $EVAL_ERROR->get_range();
            $value = $range[-1];
        }
        # If the candidate was considered waaaaay too big, rethrow the exception...
        elsif ( X::WaaaaayTooBig->caught() ) {
            $EVAL_ERROR->rethrow();
        }
        # etc.
    }

         
The problem is that if an X::WaaaaayTooBig exception is thrown, $EVAL_ERROR will refer to an X::WaaaaayTooBig object. But the X::WaaaaayTooBig class inherits from the X::TooBig class, so an X::WaaaaayTooBig object is an X::TooBig object. That means the first if test will succeed, and the specialized derived-class exception will be treated like a generic base-class exception instead.
The solution is simple: whenever you're determining the type of an exception you just caught, test for the most-derived classes first.
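The effect of test order can be demonstrated with two bare-bones classes. In the sketch below the classes are empty stand-ins, and classify() is a hypothetical helper that simply reports which of the candidate classes matches first:

```perl
use strict;
use warnings;
use Scalar::Util qw( blessed );

# Hypothetical bare-bones exception classes: a base class and a specialization...
package X::TooBig;
sub new { my ($class) = @_; return bless {}, $class; }

package X::WaaaaayTooBig;
our @ISA = ('X::TooBig');

package main;

# Report the first class, in the order given, to which the exception belongs...
sub classify {
    my ($exception, @classes) = @_;
    for my $class (@classes) {
        return $class if blessed($exception) && $exception->isa($class);
    }
    return 'unknown';
}

my $exception = X::WaaaaayTooBig->new();

my $base_first    = classify($exception, 'X::TooBig', 'X::WaaaaayTooBig');
my $derived_first = classify($exception, 'X::WaaaaayTooBig', 'X::TooBig');

print "Base class tested first:    $base_first\n";
print "Derived class tested first: $derived_first\n";
```

With the base class tested first, the specialized exception is misreported as a plain X::TooBig. Only the derived-first ordering identifies it correctly.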
End of preview. The rest of this section is not shown here.
Exception Classes
Build exception classes automatically.
As the preceding guidelines illustrate, using objects as exceptions can significantly improve the robustness and future maintainability of your error-handling code. There is, however, a downside: you have to build the exception classes to instantiate those exceptions. And those exception classes need to be reasonably sophisticated in order to work correctly.
For example, they need to provide for throwing, rethrowing, and identifying exceptions; they need to provide the appropriate internal storage for preserving the error information and context; and they need some kind of stringification overloading (see Chapter 16) to ensure that they still produce sensible error messages in string contexts: for example, when they're printed out as they terminate a program. A minimal hash-based implementation of the X::EOF class used in the previous guidelines of this chapter is shown in Example 13-2.
Example 13-2. Minimal X::EOF exception class

# Define the class representing end-of-file exceptions...
package X::EOF;
use Carp;

# Make X::EOF objects stringify to the same message used previously...
use overload (
    q{""} => sub {
        my ($self) = @_;
        return "Filehandle $self->{handle} at EOF $self->{caller_location}";
    },
    fallback => 1,
);

# Create a X::EOF exception...
sub new {
    my ($class, $args_ref) = @_;

    # Allocate memory for the object and initialize it...
    my %self = %{$args_ref};

    # If no filehandle is passed, indicate that it's unknown...
    if (! exists $self{handle}) {
        $self{handle} = '(unknown)';
    }

    # Ask Carp::shortmess() where croak() would report the error occurring...
    if (!exists $self{caller_location}) {
        $self{caller_location} = Carp::shortmess();
    }

    # Add it to the class and send it on its way...
    return bless \%self, $class;
}

# Give access to the handle that was passed into the constructor...
End of preview. The rest of this section is not shown here.
Unpacking Exceptions
Unpack the exception variable in extended exception handlers.
If an exception handler becomes long or complex, you may need to refactor parts of it. For example, consider the X::EOF handler inside try_next_line() from the "OO Exceptions" guideline:

            

    sub try_next_line {
        # Give get_next_line() two chances...
        for my $already_retried (0..1) {

            # Return immediately on success, but catch any failure...
            eval {
                return get_next_line()
            };

            # Rethrow the caught exception if it isn't an EOF problem...
            croak $EVAL_ERROR
                if !X::EOF->caught();

            # Also rethrow the caught exception
            # if we've already tried rewinding the filehandle...
            croak $EVAL_ERROR
                if $already_retried;

            # Otherwise, try rewinding the filehandle...
            seek $EVAL_ERROR->handle(), 0, 0;
        }
    }

         
This code would seem to be cleaner and easier to extend if the separate rethrows were refactored like this:

    sub try_next_line {
        # Give get_next_line() two chances...
        for my $already_retried (0..1) {

            # Return immediately on success, but catch any failure...
            eval {
                return get_next_line()
            };

            # If we can handle this exception...
            if (X::EOF->caught() ) {
                # Fail on irremediably bad cases...
                fail_if_incorrigible($EVAL_ERROR, $already_retried);

                # Otherwise, try rewinding the filehandle...
                seek $EVAL_ERROR->handle(), 0, 0;
            }

            
End of preview. The rest of this section is not shown here.
Chapter 14: Command-Line Processing
The demiurge sits at his teletype, pounding out one command line after another,
specifying the values of fundamental constants of physics:
universe -G 6.672e-11 -e 1.602e-19 -h 6.626e-34....
and when he's finished typing out the command line,
his right pinky hesitates above the ENTER key for an aeon or two,
wondering what's going to happen; then down it comes—
and the WHACK you hear is another Big Bang.
—Neal Stephenson
In the Beginning was the Command Line
Perl started out as a language that was "also good for many system administration tasks". In the beginning, Larry created it to help him write utility programs for data mining, report generation, text munging, stream filtering, and pattern matching; as an easy way to build new command-line tools, without the constraints of shell scripting or the burdens of C programming.
Nearly two decades on, Perl is still beloved by sysadmins, toolsmiths, and other denizens of the shell, as a fast and powerful way to create new testaceous utilities. And for most of these utility programs, the command line is still the primary user interface.
If you're designing a new tool, script, utility, application, or suite, chances are it will need some kind of command-line interface. If it does, make sure that interface is convenient, powerful, flexible, mnemonic, consistent, and predictable.
Sounds difficult? It is. In fact, it's even more difficult than it sounds. But this chapter provides some guidelines that can help.
End of preview. The rest of this section is not shown here.
Command-Line Structure
Enforce a single consistent command-line structure.
Command-line interfaces have a strong tendency to grow over time, accreting new options as features are added to the application. Unfortunately, the evolution of such interfaces is rarely designed, managed, or controlled, so the set of flags, options, and arguments that a given application accepts are likely to be ad hoc and unique.
This also means they're likely to be inconsistent with the unique, ad hoc sets of flags, options, and arguments that other related applications provide. The result is inevitably a suite of programs, each of which is driven in a distinct and idiosyncratic way. For example:

    > orchestrate source.txt -to interim.orc

    > remonstrate +interim.rem -interim.orc

    > fenestrate  --src=interim.rem --dest=final.wdw
    Invalid input format

    > fenestrate --help
    Unknown option: --help.
    Type 'fenestrate -hmo' for help
Here, the orchestrate utility expects its input file as its first argument, while its output file is specified using the -to flag. But the related remonstrate tool uses -infile and +outfile options instead, with the output file coming first. And the fenestrate program seems to require GNU-style "long options": --src=infile and --dest=outfile. Except, apparently, for its oddly named help flag. All in all, it's a mess.
When you're providing a suite of programs, all of them should appear to work the same way, using the same flags and options for the same features across all applications. This enables your users to take advantage of existing knowledge instead of asking you.
Those three programs should work like this:

    > orchestrate -i source.txt -o dest.orc

    > remonstrate -i source.orc -o dest.rem

    > fenestrate  -i source.rem -o dest.wdw

    Input file ('source.rem') not a valid Remora file
    (type "fenestrate --help" for help)

    > fenestrate --help

End of preview. The rest of this section is not shown here.
Command-Line Conventions
Adhere to a standard set of conventions in your command-line syntax.
A large part of making interfaces consistent is being consistent in the way individual components of those interfaces are specified. Some conventions that may help to design consistent and predictable interfaces include:
Require a flag preceding every piece of command-line data, except filenames
The arguments advanced in Chapter 9 against passing subroutine arguments positionally apply equally well to entire applications. Users don't want to have to remember that your application requires "input file, output file, block size, operation, fallback strategy"...and requires them in that precise order:

    > lustrate sample_data proc_data 1000 normalize log

                  
They want to be able to say explicitly what they mean, in any order that suits them:

    > lustrate sample_data proc_data -op=normalize -b1000 --fallback=log

Provide a flag for each filename, too, especially when a program can be given files for different purposes
Users might also not want to remember the order of the two positional filenames, so let them label those arguments as well, and specify them in whatever order they prefer:

    > lustrate -i sample_data -op normalize -b1000 --fallback log -o proc_data

Use a single - prefix for short-form flags, up to three letters (-v, -i, -rw, -in, -out)
Short-form flags are appreciated by experienced users, as a way of reducing typing and limiting command-line clutter. So don't make them type two dashes in these shortcuts.
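A command line in that style can be parsed with the standard Getopt::Long module, as in the sketch below. The choice of Getopt::Long is an assumption made for this illustration, as is the parse_command_line() wrapper. Under the module's default configuration the glued form -b1000 is not recognized, so the block size is passed as a separate argument here:

```perl
use strict;
use warnings;
use Getopt::Long qw( GetOptionsFromArray );

# Parse a lustrate-style command line (option names are taken from the
# examples above; parse_command_line() itself is a hypothetical wrapper)...
sub parse_command_line {
    my @args = @_;

    my %opt;
    GetOptionsFromArray(
        \@args,
        \%opt,
        'i=s',          # -i <infile>
        'o=s',          # -o <outfile>
        'op=s',         # -op <operation>
        'b=i',          # -b <block size>
        'fallback=s',   # --fallback <strategy>
    ) or die "Invalid command line\n";

    return \%opt;
}

# Every argument is labelled, so the order in which they appear is irrelevant...
my $opt_ref = parse_command_line(
    qw( -i sample_data -op normalize -b 1000 --fallback log -o proc_data )
);

for my $name (sort keys %{$opt_ref}) {
    print "$name => $opt_ref->{$name}\n";
}
```

Parsing from an explicit array rather than from @ARGV keeps the sketch easy to test, and the same specification could be shared by every program in a suite.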
End of preview. The rest of this section is not shown here.
Meta-options
Standardize your meta-options.
Meta-options are those command-line flags that tell the user how to use the application, rather than telling the application how to behave. They're the "What are my options?" options.
Every program you write should provide (at least) four of these, all of which print to standard output and then terminate the program immediately. Those four meta-options are:
--usage
This option should print a concise usage line.
--help
This option should print the --usage line, followed by a one-line summary of each available option.
--version
This option should print the program's version number.
--man
This option should print the complete documentation for the program, paging it out if necessary.
Note that the names of the four options are not negotiable. That's what "standardized" means.
And, yes, those standardized names are much longer than -u, -h, -v, and -m. That's also intentional. Meta-options should need to be called only relatively infrequently, especially if your other options have been designed carefully and consistently, so they're easy to remember. And, because they'll be infrequent choices, meta-options ought to have longer invocations, leaving the shorter names available for things that users type all the time.
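A minimal sketch of such a set of meta-options is shown below. The program name, the texts, and the meta_option_text() helper are all invented for illustration. A real program would print the selected text to standard output and exit, and would more likely generate its --man output from its own POD than from a hard-coded string:

```perl
use strict;
use warnings;

my $VERSION = '0.01';
my $USAGE   = "Usage: lustrate -i <infile> -o <outfile> [options]\n";

# The text to be printed for each of the four standard meta-options...
my %text_for = (
    '--usage'   => $USAGE,
    '--help'    => $USAGE
                 . "    -i <file>    Read input from <file>\n"
                 . "    -o <file>    Write output to <file>\n",
    '--version' => "lustrate version $VERSION\n",
    '--man'     => "[Complete documentation would be shown here]\n",
);

# Return the text for the first meta-option requested, or undef if there is none...
sub meta_option_text {
    my @args = @_;
    for my $arg (@args) {
        return $text_for{$arg} if exists $text_for{$arg};
    }
    return;
}

# A meta-option takes effect wherever it appears on the command line...
my $text = meta_option_text(qw( -i sample_data --version ));
print $text if defined $text;
```

Short forms such as -v are deliberately absent from the table, leaving them free for options that users type frequently.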
For example, -h and -v are far more useful as flags to specify horizontality or verticality, or height and velocity, or hairiness and verbosity. But if all your applications already use them to summon help and version information, you'll be stuck with
In-situ Arguments
Allow the same filename to be specified for both input and output.
When users want to do in-situ processing on a file, they often specify it as both the input and output file:

    > lustrate -i sample_data -o sample_data -op=normalize
But if the -i and -o flags are processed independently, the program will usually open the file for input, open it again for output (at which point the file will be truncated to zero length), and then attempt to read in the first line of the now-empty file:

    # Open both filehandles...
    use Fatal qw( open );
    open my $src,  '<', $source_file;
    open my $dest, '>', $destination_file;

    # Read, process, and output data, line-by-line...
    while (my $line = <$src>) {
        print {$dest} transform($line);
    }
Not only does this not perform the requested transformation on the file, it also destroys the original data, which conveniently prevents users from feeling frustrated, by making them irate instead.
Clobbering data files in this way during an in-situ update is perhaps the single commonest command-line interface design error. Fortunately, it's extremely easy to avoid—just make sure that you unlink the output file before you open it:

            

               

    # Open both filehandles...
    use Fatal qw( open );
    open my $src,  '<', $source_file;
    unlink $destination_file;
    open my $dest, '>', $destination_file;

    # Read, process, and output data, line-by-line...
    while (my $line = <$src>) {
        print {$dest} transform($line);
    }

         
If the input and output files are different, unlinking the output file merely removes a file that was about to be rewritten anyway. Then the second open simply recreates the output file, ready for writing.
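The unlink-before-open technique can be tried out on a scratch file. The following is a self-contained sketch, not the book's own listing: the process_file() wrapper, the upper-casing transform(), and the temporary file are assumptions made for the demonstration, and it relies on Unix-like filesystem semantics, where an unlinked file stays readable through a handle that is already open.

```perl
use strict;
use warnings;
use File::Temp qw( tempdir );

# Hypothetical stand-in for the real processing step...
sub transform {
    my ($line) = @_;
    return uc $line;
}

# Process a file line-by-line, even if source and destination are the same...
sub process_file {
    my ($source_file, $destination_file) = @_;

    open my $src, '<', $source_file
        or die "Can't open '$source_file' for reading: $!";

    # Remove the output filename first. If it names the input file, the data
    # is still reachable through $src (on Unix-like filesystems)...
    unlink $destination_file;

    open my $dest, '>', $destination_file
        or die "Can't open '$destination_file' for writing: $!";

    while (my $line = <$src>) {
        print {$dest} transform($line);
    }

    close $dest or die "Can't close '$destination_file': $!";
    close $src  or die "Can't close '$source_file': $!";

    return;
}

# In-situ update of a scratch file...
my $dir  = tempdir( CLEANUP => 1 );
my $file = "$dir/sample_data";

open my $out, '>', $file or die "Can't create '$file': $!";
print {$out} "alpha\nbeta\n";
close $out or die "Can't close '$file': $!";

process_file($file, $file);
```

After the call, the scratch file holds the transformed lines instead of being empty. On platforms that refuse to unlink an open file, a different strategy (such as writing to a temporary file and renaming it) would be needed.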
If the two filenames actually refer to a single in-situ file, unlinking the output filename removes that filename from its directory, but doesn't remove the file itself from the filesystem. The file is already open through the filehandle in
Command-Line Processing
Standardize on a single approach to command-line processing.
Providing a consistent set of command-line arguments across all applications helps the users of the suite, but it can also help the implementers and the maintainers. If a collection of programs all use consistent command-line arguments, then each program can use the same approach to parsing those arguments.
Defining a consistent command-line interface makes the programs easier to write in the first place, because once the command-line processing has been set up for the first application, the universal components of it can be refactored into a separate module and reused by subsequent programs (as described under "Interapplication Consistency" later in this chapter). This approach also makes the suite much more maintainable, as debugging or enhancing that one module automatically fixes or extends the command-line processing of perhaps dozens of individual applications.
There are plenty of inappropriate ways to parse command lines. For example, Perl has a built-in -s option (as documented in the perlrun manpage) that will happily unpack your command line for you, as Example 14-1 demonstrates.
Example 14-1. Command-line parsing via perl -s

#!/usr/bin/perl -s
# Use the -s shebang line option to handle command lines of the form:
#
#     > orchestrate -in=source.txt -out=dest.orc -v
# The -s automatically parses the command line into these package variables...
use vars qw( $in $out $verbose $len);

# Handle meta-options (which will appear in package variables whose names
# start with a dash. Oh, the humanity!!!)...
no strict qw( refs );
X::Version->throw() if ${-version};
X::Usage->throw()   if ${-usage};
X::Help->throw()    if ${-help};
X::Man->throw()     if ${-man};



Interface Consistency
Ensure that your interface, run-time messages, and documentation remain consistent.
Making sure that a program's documentation matches its actual behaviour is a universal problem. And that problem is even tougher for command-line interface code, where the functionality and documentation must also stay consistent with the messages provided by the --usage, --help, and --man flags, as well as with any diagnostics produced by the command-line processor.
The best solutions to this challenge all rely on defining the desired command-line semantics in a single place, then using some tool to generate the actual parsing code, the meta-option responses, the error diagnostics, and the documentation.
For example, a feature of the Getopt::Clade module is that its --man meta-option is context-sensitive. Normally, a call like:

    > illustrate --man
extracts any POD documentation from the illustrate source file, replaces the SYNOPSIS, REQUIRED ARGUMENTS, and OPTIONS sections of that documentation with a description of the actual interface that was defined, feeds the modified POD through a pod-to-text formatter, and displays it. However, if --man is specified when the program's standard output stream is not attached to a terminal:

    > illustrate --man  > illustrate.pod
then Getopt::Clade still extracts and modifies the program's documentation, but doesn't format or page it in any way. The resulting file of raw POD can then be pasted back into the source file to ensure that the documentation is consistent with the interface.
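The terminal test that this kind of context-sensitivity depends on is available to any Perl program through the built-in -t file test. The sketch below shows only that decision; the man_output_mode() helper is hypothetical, and nothing here is taken from Getopt::Clade's actual implementation.

```perl
use strict;
use warnings;

# Decide how documentation should be delivered on a given output handle.
# (A hypothetical helper for this sketch, not part of any Getopt module.)
sub man_output_mode {
    my ($fh) = @_;

    # -t is true only when the filehandle is attached to a terminal...
    my $is_terminal = -t $fh;

    return $is_terminal ? 'formatted and paged' : 'raw POD';
}

print 'Documentation on standard output would be: ',
      man_output_mode(\*STDOUT), "\n";
```

Run interactively, the sketch reports the formatted mode; with standard output redirected to a file or a pipe, it reports raw POD.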
Getopt::Clade even allows this process to be fully automated. If you type:

    > illustrate --man=update
then the module will generate its own
Interapplication Consistency
Factor out common command-line interface components into a shared module.
Tools such as Getopt::Long, Getopt::Clade, and Getopt::Euclid make it easy to follow the advice of the "Command-Line Structure" guideline to enforce a single consistent command-line structure across all of your applications.
If you're using Getopt::Long or Getopt::Clade, you can simply create a module that provides a suitable description of the standard interface. For example, if you're using Getopt::Clade, you might create a module (such as in Example 14-6) that provides the standard interface features that every application is expected to provide:
Example 14-6. Standard interface components for Getopt::Clade

               

package Corporate::Std::Cmdline;
use strict;
use warnings;

use Getopt::Clade q{

    -i[n]  [=] <file:in>    Specify input file  [default: '-']
    -o[ut] [=] <file:out>   Specify output file [default: '-']

    -v                      Print all warnings
    --verbose               [ditto]

};

1;  # Magic true value required at the end of every module

                  

               

            
You could then reuse it in each program you created. For example, you could refactor Example 14-4 to Example 14-7.
Example 14-7. Standardized command-line parsing via Getopt::Clade

               

                  

# Specify and parse valid command-line arguments...
use Corporate::Std::Cmdline plus => q{

    -l[en] [=] <l:+int>     Display length [default: 24 ]
    -w[id] [=] <w:+int>     Display width  [default: 78 ]

};

# Report intended behaviour...
if ($ARGV{-v}) {
    print "Loading first $ARGV{'-l'} chunks of file: $ARGV{'-i'}\n";
}

# etc.
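The same factoring-out can be approximated with the core Getopt::Long module, which the text also mentions. What follows is a rough sketch under stated assumptions: the Corporate::Std::Cmdline::Sketch package, its parse() subroutine, and the way defaults are attached to option specifications are all invented for this illustration, and the module is inlined in one file only to keep the example self-contained.

```perl
use strict;
use warnings;

package Corporate::Std::Cmdline::Sketch;    # Hypothetical module name
use Getopt::Long qw( GetOptionsFromArray );
{
    # Options every application in the suite is expected to accept,
    # each mapped to its default value...
    my %STD_SPEC = (
        'in|i=s'    => '-',     # Input file  (default: standard input)
        'out|o=s'   => '-',     # Output file (default: standard output)
        'verbose|v' => 0,       # Print all warnings
    );

    # Parse the standard options plus any application-specific extras...
    sub parse {
        my ($args_ref, $extra_spec_ref) = @_;
        my %spec = ( %STD_SPEC, %{ $extra_spec_ref || {} } );

        # Start from the defaults, keyed by each option's primary name...
        my %option;
        for my $opt_spec (keys %spec) {
            my ($name) = $opt_spec =~ m{\A ([^|=!+:]+) }xms;
            $option{$name} = $spec{$opt_spec};
        }

        GetOptionsFromArray($args_ref, \%option, keys %spec)
            or die "Invalid command line\n";

        return \%option;
    }
}

package main;

# An individual application adds only its own options...
my $option_ref = Corporate::Std::Cmdline::Sketch::parse(
    [ qw( -in source.txt -len 10 -v ) ],
    { 'len|l=i' => 24, 'wid|w=i' => 78 },
);

# Report intended behaviour...
if ($option_ref->{verbose}) {
    print "Loading first $option_ref->{len} chunks of file: $option_ref->{in}\n";
}
```

Because the standard options live in one place, a fix or extension to them reaches every program that calls parse(). Unlike the Getopt::Clade examples, this sketch does not generate usage messages or documentation from the specification.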

                  
Chapter 15: Objects
Object-oriented programming offers a sustainable
way to write spaghetti code. It lets you accrete
programs as a series of patches.
—Paul Graham
The Hundred-Year Language
Perl's approach to object orientation is almost excessively Perlish: there are far too many ways to do it.
There are at least a dozen different ways to build an object (from a hash, from an array, from a subroutine, from a string, from a database, from a memory-mapped file, from an empty scalar variable, etc., etc.). Then there are scores of ways to implement the behaviour of the associated class. On top of that, there are also hundreds of different techniques for access control, inheritance, method dispatch, operator overloading, delegation, metaclasses, generics, and object persistence. And, of course, many developers also make use of one or more of the over 400 "helper" modules from the CPAN's Class:: and Object:: namespaces.
There are just so many possible combinations of implementation, structure, and semantics that it's quite rare to find two unrelated class hierarchies that use precisely the same style of Perl OO.
That diversity creates a huge problem. The dizzying number of possible OO implementations makes it very much harder to comprehend any particular implementation, because the reader might not encounter a single familiar code structure by which to navigate the class definitions.
There is no guarantee of what a class declaration will look like, nor how it will specify its attributes and methods, nor where it will store its data, nor how its methods will mediate access to that data, nor what the class constructor will be called, nor what a method call will look like, nor how inheritance relationships will be declared, nor just about anything else.
You can't even assume that there will be a class declaration (see the Class::Classless module, for example), or that the attributes or methods are specified at all (as in Class::Tables), or that object data isn't stored outside the program completely (like
Using OO
Make object orientation a choice, not a default.
There are plenty of excellent reasons to use object orientation: to achieve cleaner encapsulation of data; to better decouple the components of a system; to take advantage of hierarchical type relationships using polymorphism; or to ensure better long-term maintainability.
There are also plenty of reasons not to use object orientation: because it tends to result in poorer overall performance; because large numbers of method calls can reduce syntactic diversity and make your code less readable; or just because object orientation is simply a poor fit for your particular problem, which might be better solved using a procedural, functional, data flow, or constraint-based approach.
Make sure you choose to use OO because of the pros and despite the cons, not just because it's the big, familiar, comfortable hammer in your toolset.
Criteria
Choose object orientation using appropriate criteria.
When deciding whether to use object orientation, look for features of the problem—or of the proposed solution—that suggest that OO might be a good fit. For example, object orientation might be the right approach in any of the following situations:
The system being designed is large, or is likely to become large
Object orientation helps in large systems, because it breaks them down into smaller decoupled systems (called "classes"), which are generally still simple enough to fit in a single brain—unlike the large system as a whole.
The data can be aggregated into obvious structures, especially if there's a large amount of data in each aggregate
Object orientation is about classifying data into coherent chunks (called "objects") and then specifying how those chunks can interact and change over time. If there are natural clusterings in the data to be handled by your system, then the natural place for those clusterings is probably inside an object. And the larger the amount of data in each chunk, the more likely it is that you're going to need to think of those chunks at some higher, more abstract level. It's also more likely that you'll need to control access to that data more tightly to ensure it remains consistent.
The various types of data aggregate form a natural hierarchy that facilitates the use of inheritance and polymorphism
Object orientation provides a way to capture, express, and take advantage of the abstract relationships between chunks of data in your code. If one kind of data is a special form of another kind of data (a restriction, or elaboration, or some other variation), then organizing that data into class hierarchies can minimize the amount of nearly identical code that has to be written.
Pseudohashes
Don't use pseudohashes.
Pseudohashes were a mistake. Their goal—better compile-time type-checking, leading to comparatively faster run-time access—was entirely laudable. But they achieved that goal by actually slowing down all normal hash and array accesses.
They can also double both the memory footprint and the access-time for objects, unless they're used in exactly the right way. They're particularly inefficient if you ever forget to give their container variables a type (which is pretty much guaranteed, since you never have to give any other Perl variable a type, so you're not in the habit). Pseudohashes are also prone to very hard-to-fathom errors when used in inheritance hierarchies.
Don't use them. If you're currently using them, plan to remove them from your code. They don't work with Perl releases prior to Perl 5.005, they're deprecated in Perl 5.8, and will be removed from the language entirely in 5.10.
Restricted Hashes
Don't use restricted hashes.
Restricted hashes were developed as a mechanism to partially replace pseudohashes. An ordinary hash can be converted into a restricted hash simply by calling one or more of the lock_keys(), lock_value(), or lock_hash() subroutines provided by the Hash::Util module, which is standard in Perl 5.8 and later.
If the keys of a hash are locked with lock_keys(), that hash is prevented from creating entries for keys other than the keys that existed at the time the hash keys were locked. If a hash value is locked with lock_value(), the value for that particular hash entry is made constant. And if the entire hash is locked with lock_hash(), neither its keys nor their associated values can be altered.
If you build a hash-based object and then lock its keys, no-one can accidentally access $self->{Name} when the object's attribute is supposed to be in $self->{name} instead. That's a valuable form of consistency checking. If you also lock the values before the constructor returns the object, then no-one outside the class can mess with the contents of your object, so you also get encapsulation. And as they're still just regular hashes, you don't lose any appreciable performance.
The problem is that like the now-deprecated pseudohashes, restricted hashes still offer only voluntary security. The Hash::Util module also provides unlock_keys(), unlock_value(), and unlock_hash() subroutines, with which all that pesky consistency checking and annoying attribute encapsulation can be instantly circumvented.
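Both halves of that story, the consistency checking and the ease of circumventing it, fit in a few lines. This is a small illustration using the lock_keys() and unlock_keys() subroutines of Hash::Util; the %dogtag hash and its contents are made up for the example.

```perl
use strict;
use warnings;
use Hash::Util qw( lock_keys unlock_keys );

my %dogtag = ( name => 'MacArthur', rank => 'General' );

# Restrict the hash to its existing keys...
lock_keys(%dogtag);

# A misspelled key is now a fatal error, rather than a silent new entry...
my $typo_caught = !eval { $dogtag{Name} = 'Patton'; 1 };

# ...but anyone can lift the restriction again...
unlock_keys(%dogtag);
$dogtag{Name} = 'Patton';

print 'Typo caught while locked: ', ($typo_caught ? 'yes' : 'no'), "\n";
print 'Keys after unlocking:     ', join(', ', sort keys %dogtag), "\n";
```

The second assignment succeeds because nothing stops client code from calling unlock_keys() itself, which is the sense in which the protection is only voluntary.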
Encapsulation
Always use fully encapsulated objects.
The voluntary nature of the security that restricted hashes offer is a genuine problem. Lack of encapsulation is one of the reasons why plain, unrestricted hashes aren't a suitable basis for objects either. Objects without effective encapsulation are vulnerable. Instead of politely respecting their public interface, like so:

            

               

                  

    # Use our company's proprietary OO file system interface...
    use File::Hierarchy;

    # Make an object representing the user's home directory...
    my $fs = File::Hierarchy->new('~');

    # Ask for the list of files in it...
    for my $file ( $fs->get_files() ) {
        # ...then ask for the name of each file, and print it...
        print $file->get_name(), "\n";
    }

         
some clever client coder inevitably will realize that it's marginally faster to interact directly with the underlying implementation:

            

    # Use our company's proprietary OO file system interface...
    use File::Hierarchy;

    # Make an object representing the user's home directory...
    my $fs = File::Hierarchy->new('~');

    # Then poke around inside the (hash-based) object
    # and pull out its embedded file objects...
    for my $file (@{$fs->{files}}) {
        # Then poke around inside each (hash-based) file object,
        # pull out its name, and print it...
        print $file->{name}, "\n";
    }
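One way to make that kind of poking around impossible is to keep the attribute data out of the object altogether, as the inside-out classes used elsewhere in this chapter do. The following sketch assumes a made-up File::Entry::Sketch class and uses refaddr() from the core Scalar::Util module in place of the ident() utility that appears in the book's examples.

```perl
use strict;
use warnings;

package File::Entry::Sketch;    # Hypothetical class, invented for this sketch
use Scalar::Util qw( refaddr );
{
    # Attribute storage lives in the class, keyed by each object's address...
    my %name_of;

    sub new {
        my ($class, $arg_ref) = @_;

        my $new_object = bless \do { my $anon_scalar }, $class;
        $name_of{refaddr $new_object} = $arg_ref->{name};

        return $new_object;
    }

    sub get_name {
        my ($self) = @_;
        return $name_of{refaddr $self};
    }

    # Clean up the externally stored attribute when the object goes away...
    sub DESTROY {
        my ($self) = @_;
        delete $name_of{refaddr $self};
        return;
    }
}

package main;

my $file = File::Entry::Sketch->new({ name => 'notes.txt' });

# The public interface works...
my $name_via_method = $file->get_name();

# ...but there is no hash inside the object to poke around in...
my $direct_access_failed = !eval { my $name = $file->{name}; 1 };
```

Here the object is just a blessed scalar, so treating it as a hash is a run-time error, and the %name_of hash is a lexical that code outside the class's block cannot see.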
Constructors
Give every constructor the same standard name.
Specifically, name the constructor of every class you write: new(). It's short, accurate, and standard across many OO languages.
If every constructor uses the same name, the developers using your classes will always be able to guess correctly what method they should call to create an object, which will save them time and frustration looking up the fine manual—yet again—to remind themselves which obscurely named method call is required to instantiate objects of each particular class.
More importantly, using a standard constructor will make it easier for the maintainers of your code to understand what a particular method call is doing. Specifically, if the call is to new(), then it will definitely be creating an object.
Constructors with clever names are cute and may sometimes even improve readability:

    my $port = Port->named($url);

    my $connection = Socket->connected_to($port);
But constructors with standard names make the resulting code easier to write correctly, and possible to comprehend in six months' time:

    my $port = Port->new({ name => $url });

    my $connection = Socket->new({ connect_to => $port });

         
Cloning
Don't let a constructor clone objects.
If you overload your constructors to also clone objects, it's too hard to tell the difference between construction and copying in client code:

    $next_obj = $requested->new(\%args);     # New object or copy?

         
Methods that create new objects and methods that clone existing objects have a large amount of overlap in their behaviour. They both have to create a new data structure, bless it into an object, locate and verify the data to initialize its attributes, initialize its attributes, and finally return the new object. The only significant difference between construction and cloning is where the attribute data originates: externally in the case of a constructor, and internally in the case of a clone method.
The natural temptation is to combine the two methods into a single method. And the usual mental leap at that point is that Perl methods can always be called either as class methods or as instance methods. So, hey, why not simply have new() act like a constructor if it's called as a class method:

    $new_queue = Queue::Priority->new({ selector => \&most_urgent });
and then act like a cloning method if it's called on an existing object:

    $new_queue = $curr_queue->new();
Because that can be achieved by adding only a single "paragraph" at the start of the existing constructor, as Example 15-3 illustrates. Cool!
Example 15-3. A constructor that also clones

sub new {
    my ($invocant, $arg_ref) = @_;

    # If method called on an object (i.e., a blessed reference)...
    if (ref $invocant) {
        # ...then build the argument list by copying the data from the object...
        $arg_ref = {
            selector => $selector_of{ident $invocant},
            data     => [ @{$data_of{ident $invocant} } ],
        }
    }

    # Work out the actual class name...
    my $class = ref($invocant)||$invocant;
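The listing above is cut short in this preview, and the book's own alternative is not shown. One plausible way to follow the guideline is to keep construction and copying in two separately named methods, as in this sketch. The Queue::Priority::Sketch class, its clone() and get_data() methods, and the use of refaddr() from Scalar::Util instead of ident() are all assumptions made for the illustration.

```perl
use strict;
use warnings;

package Queue::Priority::Sketch;    # Hypothetical class for this sketch
use Scalar::Util qw( refaddr );
use Carp qw( croak );
{
    my %selector_of;
    my %data_of;

    # Constructor: attribute data comes from the caller...
    sub new {
        my ($class, $arg_ref) = @_;
        croak 'new() must be called as a class method' if ref $class;

        my $new_object = bless \do { my $anon_scalar }, $class;
        $selector_of{refaddr $new_object} = $arg_ref->{selector};
        $data_of{refaddr $new_object}     = [ @{ $arg_ref->{data} || [] } ];

        return $new_object;
    }

    # Cloning method: attribute data comes from an existing object...
    sub clone {
        my ($self) = @_;
        croak 'clone() must be called on an object' if !ref $self;

        return ref($self)->new({
            selector => $selector_of{refaddr $self},
            data     => $data_of{refaddr $self},
        });
    }

    sub get_data {
        my ($self) = @_;
        return @{ $data_of{refaddr $self} };
    }

    sub DESTROY {
        my ($self) = @_;
        delete $selector_of{refaddr $self};
        delete $data_of{refaddr $self};
        return;
    }
}

package main;

my $curr_queue = Queue::Priority::Sketch->new({
    selector => sub { return $_[0] },
    data     => [ 3, 1, 2 ],
});

my $new_queue = $curr_queue->clone();    # Unmistakably a copy
```

With this split, the call site itself says which operation is intended, and an accidental $curr_queue->new() fails loudly instead of quietly producing a copy.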



    
Destructors
Always provide a destructor for every inside-out class.
The many advantages of inside-out classes described earlier come at almost no performance cost. Almost. The one respect in which they are marginally less efficient is their destructor requirements.
Hash-based classes often don't even have a destructor requirement. When the object's reference count decrements to zero, the hash is automatically reclaimed, and any data structures stored inside the hash are likewise cleaned up. This technique works so well that many OO Perl programmers find that they never need to write a DESTROY() method; Perl's built-in garbage collection handles everything just fine.
The only time that hash-based classes do need a destructor is when their objects are managing resources that are external to the objects themselves: databases, files, system processes, hardware devices, and so on. Because the resources aren't inside the objects (or inside the program, for that matter), they aren't affected by the object's garbage collection. Their "owner" has ceased to exist, but they remain: still reserved for the use of the program in question, but now completely unbeknownst to it.
So the general rule for Perl classes is: always provide a destructor for any object that manages allocated resources that are not actually located inside the object.
But the whole point of an inside-out object is that its attributes are stored in allocated hashes that are not actually located inside the object. That's precisely how it achieves secure encapsulation: by not sending the attributes out into the client code.
Unfortunately, that means when an inside-out object is eventually garbage-collected, the only storage that is reclaimed is the single blessed scalar implementing the object. The object's attributes are entirely unaffected by the object's deallocation, because the attributes are not inside the object, nor are they referred to by it in any way.
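The effect is easy to observe by counting what is left behind. The two tiny classes below are hypothetical and differ only in whether they define DESTROY(); the stored_attribute_count() class method exists purely so the demonstration can peek at the otherwise private attribute hash, and refaddr() from Scalar::Util stands in for ident().

```perl
use strict;
use warnings;

package Leaky;      # Hypothetical inside-out class with no destructor
use Scalar::Util qw( refaddr );
{
    my %name_of;

    sub new {
        my ($class, $arg_ref) = @_;
        my $new_object = bless \do { my $anon_scalar }, $class;
        $name_of{refaddr $new_object} = $arg_ref->{name};
        return $new_object;
    }

    sub stored_attribute_count { return scalar keys %name_of; }
}

package Tidy;       # The same class, plus a destructor
use Scalar::Util qw( refaddr );
{
    my %name_of;

    sub new {
        my ($class, $arg_ref) = @_;
        my $new_object = bless \do { my $anon_scalar }, $class;
        $name_of{refaddr $new_object} = $arg_ref->{name};
        return $new_object;
    }

    sub stored_attribute_count { return scalar keys %name_of; }

    # Reclaim the attribute that lives outside the object...
    sub DESTROY {
        my ($self) = @_;
        delete $name_of{refaddr $self};
        return;
    }
}

package main;

for my $class (qw( Leaky Tidy )) {
    {
        my $object = $class->new({ name => 'temporary' });
    }   # $object is garbage-collected here

    print "$class still stores ", $class->stored_attribute_count(),
          " attribute(s)\n";
}
```

The class without a destructor keeps its attribute entry after the object is gone, so a long-running program that creates many such objects would grow steadily. There is a second hazard as well: a later object allocated at the same address would find the stale entry waiting for it.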
Methods
When creating methods, follow the general guidelines for subroutines.
Despite their obvious differences in dispatch semantics, methods and subroutines are similar in most respects. From a coding point of view, about the only significant difference between the two is that methods tend to have fewer parameters.
When you're writing methods, use the same approach to layout (Chapter 2), and the same naming conventions (Chapter 3), and the same argument-passing mechanisms and return behaviours (Chapter 9), and the same error-handling techniques (Chapter 13) as for subroutines.
The only exception to that advice concerns naming. Specifically, the "Homonyms" guideline in Chapter 9 doesn't apply to methods. Unlike subroutines, it's acceptable for a method to have the same name as a built-in function. That's because methods are always called with a distinctive syntax, so there's no possible ambiguity between:

    $size = length $target;     # Stringify target object; take length of string
and:

    $size = $target->length();  # Call length() method on target object

               

            

         
It's important to be able to use built-in names for methods, because one of the commonest uses of object-oriented Perl is to create new data types, which often need to provide the same kinds of behaviours as Perl's built-in data types. If that's the case, then those behaviours ought to be named the same as well. For instance, the class in Example 15-5 is a kind of queue, so code that uses that class will be easier to write, and later comprehend, if the queue objects push and shift data using push() and shift() methods:

    my $waiting_list = FuzzyQueue->new();

    # Load client names...
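Example 15-5 is not included in this preview, so here is a much simpler stand-in showing methods that share names with built-ins. The Simple::Queue::Sketch class is hypothetical (it is an ordinary first-in, first-out queue, not the book's FuzzyQueue), and it uses refaddr() from Scalar::Util rather than ident().

```perl
use strict;
use warnings;

package Simple::Queue::Sketch;   # Hypothetical stand-in, not the book's FuzzyQueue
use Scalar::Util qw( refaddr );
{
    my %items_of;

    sub new {
        my ($class) = @_;
        my $new_object = bless \do { my $anon_scalar }, $class;
        $items_of{refaddr $new_object} = [];
        return $new_object;
    }

    # Methods named after the built-ins they resemble. Inside the class, the
    # built-ins are called as CORE::push and CORE::shift to keep them distinct...
    sub push {
        my ($self, @new_items) = @_;
        CORE::push @{ $items_of{refaddr $self} }, @new_items;
        return $self;
    }

    sub shift {
        my ($self) = @_;
        return CORE::shift @{ $items_of{refaddr $self} };
    }

    sub length {
        my ($self) = @_;
        return scalar @{ $items_of{refaddr $self} };
    }

    sub DESTROY {
        my ($self) = @_;
        delete $items_of{refaddr $self};
        return;
    }
}

package main;

my $waiting_list = Simple::Queue::Sketch->new();

# Load client names...
$waiting_list->push('Jones', 'Smith');

my $count = $waiting_list->length();    # Method call, not the built-in length
my $first = $waiting_list->shift();     # Method call, not the built-in shift
```

Client code is unaffected by the name sharing, because the arrow syntax always selects the method. Only inside the class is some care needed, which is why the sketch qualifies the built-ins with CORE::.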

               
Accessors
Provide separate read and write accessors.
Most developers who write classes in Perl provide access to an object's attributes in the way that's demonstrated in Example 15-6.
That is, they write a single method for each attribute, giving that method the same name as the attribute. Each accessor method always returns the current value of its corresponding attribute, and each can be called with an extra argument, in which case it also updates the attribute to that new value. For example:

            

    # Create the new military record...
    my $dogtag = Dogtag->new({ serial_num => 'AGC10178B' });

    $dogtag->name( 'MacArthur', 'Dee' );    # Called with args, so store name attr
    $dogtag->rank( 'General' );             # Called with arg, so store rank attr

    # Called without arg, so just retrieve attribute values...
    print 'Your new commander is: ',
          $dogtag->rank(), $SPACE, $dogtag->name()->{surname},
          "\n";

    print 'Her serial number is:  ', $dogtag->serial_num(), "\n";
This approach has the advantage of requiring only a single, obviously named method per attribute, which means less code to maintain. It also has the advantage that it's a widely known convention, used both throughout Perl's OO-related manpages and in numerous books.
However, despite those features, it's clearly not the best way to write accessor methods.
Example 15-6. The usual way accessor methods are implemented

package Dogtag;
use Class::Std::Utils;
{
    # Attributes...
    my %name_of;
    my %rank_of;
    my %serial_num_of;

    # The usual inside-out constructor...
    sub new {
        my ($class, $arg_ref) = @_;

        my $new_object = bless anon_scalar(), $class;

        $serial_num_of{ident $new_object} = $arg_ref->{serial_num};

        return $new_object;
    }

    # Control access to the name attribute...
    sub name {
        my ($self, $new_surname, $new_first_name) = @_;
        my $ident = ident($self);
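The preview ends before the recommended alternative appears, so the following is only a guess at its general shape: one method that retrieves and a separate method that stores. The Dogtag::Sketch class and the get_ and set_ method names are assumptions (chosen to match names such as get_name() and set_bit() seen elsewhere in these excerpts), and refaddr() from Scalar::Util again stands in for ident().

```perl
use strict;
use warnings;

package Dogtag::Sketch;     # Hypothetical variant of the Dogtag class
use Scalar::Util qw( refaddr );
use Carp qw( croak );
{
    my %rank_of;
    my %serial_num_of;

    sub new {
        my ($class, $arg_ref) = @_;
        my $new_object = bless \do { my $anon_scalar }, $class;
        $serial_num_of{refaddr $new_object} = $arg_ref->{serial_num};
        return $new_object;
    }

    # Read accessors only ever retrieve...
    sub get_rank {
        my ($self) = @_;
        return $rank_of{refaddr $self};
    }

    sub get_serial_num {
        my ($self) = @_;
        return $serial_num_of{refaddr $self};
    }

    # Write accessors only ever store...
    sub set_rank {
        my ($self, $new_rank) = @_;
        croak 'Usage: $obj->set_rank($new_rank)' if @_ != 2;
        $rank_of{refaddr $self} = $new_rank;
        return;
    }

    # There is deliberately no set_serial_num(): serial numbers are
    # read-only, so an attempt to change one fails with an error...

    sub DESTROY {
        my ($self) = @_;
        delete $rank_of{refaddr $self};
        delete $serial_num_of{refaddr $self};
        return;
    }
}

package main;

my $dogtag = Dogtag::Sketch->new({ serial_num => 'AGC10178B' });
$dogtag->set_rank('General');

# Trying to update a read-only attribute is reported, not silently ignored...
my $update_rejected = !eval { $dogtag->set_serial_num('XYZ00000A'); 1 };
```

With a combined accessor, a call such as serial_num($new_value) on a read-only attribute could simply ignore its argument. With separate accessors, the missing set method makes the mistake visible immediately.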
Lvalue Accessors
Don't use lvalue accessors.
Since Perl 5.6, it has been possible to specify a subroutine that returns a scalar result as an lvalue, which can then be assigned to. So another popular approach to implementing attribute accessor methods has arisen: using lvalue subroutines, as in Example 15-8.
Example 15-8. Another way to implement accessor methods

# Provide access to the name attribute...
sub name :lvalue {
    my ($self) = @_;
    return $name_of{ident $self};
}

sub rank :lvalue {
    my ($self) = @_;
    return $rank_of{ident $self};
}

# Serial numbers are read-only, so not lvalue...
sub serial_num  {
    my ($self) = @_;
    return $serial_num_of{ident $self};
}
The resulting code is certainly much more concise. And, perhaps surprisingly, the return to a single accessor per attribute doesn't reinstate the problems of uncertain intention leading to invisible errors, because the accessors would now be used differently, with a clear syntactic distinction between storing and retrieving:

            

    # Create the new military record...
    my $dogtag = Dogtag->new( {serial_num => 'AGC10178B'} );

    # Store attribute values...
    $dogtag->name = {surname=>'MacArthur', first_name=>'Dee'};
    $dogtag->rank = 'General';

    # Retrieve attribute values...
    print 'Your new commander is: ',
          $dogtag->rank(), $SPACE, $dogtag->name()->{surname}, "\n";

    print 'Her serial number is:  ',
          $dogtag->serial_num(), "\n";
And, now, if overgeneralization again leads to a misguided attempt to update the serial number:

    $dogtag->serial_num() = $division_code . $old_serial_num;
the compiler will again detect and report the problem:

            

    Can't modify non-lvalue subroutine call at rollcall.pl line 99

         
This certainly looks like a viable alternative to separate getting and storing. It requires less code and handles the psychology just as well. Unfortunately, lvalue methods are less reliable and less maintainable.
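The preview does not go on to say why, but one well-known limitation of plain lvalue subroutines is that the method returns its storage before the assignment happens, so the class has no chance to check the incoming value. The sketch below contrasts that with an ordinary write accessor. The Dogtag::Lvalue::Sketch class and its list of valid ranks are invented for the illustration, and the lvalue accessor ends with a bare expression rather than an explicit return, which is the form that also works on older Perl releases.

```perl
use strict;
use warnings;

package Dogtag::Lvalue::Sketch;     # Hypothetical class for this sketch
use Scalar::Util qw( refaddr );
use Carp qw( croak );
{
    my %VALID_RANK = map { $_ => 1 } qw( Private Sergeant General );
    my %rank_of;

    sub new {
        my ($class) = @_;
        return bless \do { my $anon_scalar }, $class;
    }

    # Lvalue accessor: hands back the storage itself, so the class never
    # sees the value that is subsequently assigned to it...
    sub rank :lvalue {
        my ($self) = @_;
        $rank_of{refaddr $self};
    }

    # Separate write accessor: can vet the new value before storing it...
    sub set_rank {
        my ($self, $new_rank) = @_;
        croak "Invalid rank: $new_rank" if !$VALID_RANK{$new_rank};
        $rank_of{refaddr $self} = $new_rank;
        return;
    }

    sub DESTROY {
        my ($self) = @_;
        delete $rank_of{refaddr $self};
        return;
    }
}

package main;

my $dogtag = Dogtag::Lvalue::Sketch->new();

# Accepted without any checking...
$dogtag->rank = 'Generalissimo';

# Rejected by the ordinary write accessor...
my $setter_rejected = !eval { $dogtag->set_rank('Generalissimo'); 1 };
```

The invalid rank is stored through the lvalue accessor without complaint, whereas set_rank() refuses it. Adding validation to an lvalue accessor is possible, but only with considerably more machinery than this sketch shows.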
Indirect Objects
Don't use the indirect object syntax.
Quite simply: indirect object syntax is ambiguous. Whereas an "arrowed" method call is certain to call the corresponding method:

            

    my $female_parent = $family->mom();
    my $male_parent   = $family->pop();

         
with an indirect object call, the outcome is not at all certain:

    my $female_parent = mom $family;    # Sometimes the same as: $family->mom()
    my $male_parent   = pop $family;    # Never the same as: $family->pop()

         
The pop() case is fairly obvious: Perl assumes you're calling the built-in pop function...and then complains that it's not being applied to an array. The potential problem in the mom() case is a little more subtle: if there's a mom() subroutine declared in the package in which mom $family is called, then Perl will interpret that call as mom($family) instead (that is, as a subroutine call, rather than as a method call).
Unfortunately, that particular problem often bites under the most common use of the indirect object syntax: constructor calls. Many programmers who would otherwise never write indirect object method calls will happily call their constructors that way:

    my $uniq_id = new Unique::ID;
The problem is that they often do this kind of thing in the method of some other class. For example, they might decide to improve the Dogtag class by using Unique::ID objects as serial numbers:

    package Dogtag;
    use Class::Std::Utils;
    {
        # Attributes...
        my %name_of;
        my %rank_of;
        my %serial_num_of;

        # The usual inside-out constructor...
        sub new {
            my ($class, $arg_ref) = @_;

            my $new_object = bless anon_scalar(), $class;

            # Now using special objects to ensure serial numbers are unique...
            $serial_num_of{ident $new_object} = new Unique::ID;

            return $new_object;
        }
Class Interfaces
Provide an optimal interface, rather than a minimal one.
When it comes to designing the interface of a class, developers are often advised to follow Occam's Razor and avoid multiplying their methods unnecessarily. The result is all too often a class that offers only the absolute minimal set of functionality, as in Example 15-9.
Example 15-9. A bit-string class with the smallest possible interface

package Bit::String;
use Class::Std::Utils;
use Readonly;
{
    Readonly my $BIT_PACKING => 'b*';    # i.e. vec() compatible binary
    Readonly my $BIT_DENSITY => 1;       # i.e. 1 bit/bit

    # Attributes...
    my %bitset_of;

    # Internally, bits are packed eight-to-the-character...
    sub new {
        my ($class, $arg_ref) = @_;

        my $new_object = bless anon_scalar(), $class;

        # pack 'b*' consumes a single bit string, so join the bits first...
        $bitset_of{ident $new_object}
            = pack $BIT_PACKING, join q{}, map {$_ ? 1 : 0} @{$arg_ref->{bits}};

        return $new_object;
    }

    # Retrieve a specified bit...
    sub get_bit {
        my ($self, $bitnum) = @_;

        return vec($bitset_of{ident $self}, $bitnum, $BIT_DENSITY);
    }

    # Update a specified bit...
    sub set_bit {
        my ($self, $bitnum, $newbit) = @_;

        vec($bitset_of{ident $self}, $bitnum, $BIT_DENSITY) = $newbit ? 1 : 0;

        return 1;
    }
}
Rather than enhancing maintainability, classes like that often reduce it, because they force developers who are using the class to invent their own sets of utility subroutines for frequent tasks:

            

    # Convenience subroutine to flip individual bits...
    sub flip_bit_in {
        my ($bitset_obj, $bitnum) = @_;

        my $bit_val = $bitset_obj->get_bit($bitnum);
        $bitset_obj->set_bit( $bitnum, !$bit_val );

        return;
    }

    # Convenience subroutine to provide a string representation of the bits...
    sub stringify {
        my ($bitset_obj) = @_;

        my $bitstring = $EMPTY_STR;
        my $next_bitnum = 0;

        RETRIEVAL:
        while (1) {
            my $nextbit = $bitset_obj->get_bit($next_bitnum++);
            last RETRIEVAL if !defined $nextbit;

            $bitstring .= $nextbit;
        }

        return $bitstring;
    }
End of content preview. The remaining content of this section is not viewable here.
Operator Overloading
Content preview
Overload only the isomorphic operators of algebraic classes.
Operator overloading is very tempting. It offers the prospect of being able to express operations of your new data type in a compact and syntactically distinctive way. Unfortunately, overloading operators more often produces code that is both hard to comprehend and vastly less maintainable. For example:

            

    # Special string class with useful operators...
    package OpString;
    {
        use overload (
            '+'   => 'concatenate',
            '-'   => 'up_to',
            '/'   => 'either_or',
            '<=>' => 'swap_with',
            '~'   => 'optional',

            # Use Perl standard behaviours for other operations...
            fallback => 1,
        );
    }

    # And later...

    $search_for = $MR/$MRS + ~$first_name + $family_name;

    $allowed_pet_range = $CAT-$DOG;

    $home_phone <=> $work_phone;
Though the resulting client code is compact, the non-standard usages of the various operators make it much harder to understand and maintain, compared to:

            

    package OpString;
    {
        use overload (
            '.'   => 'concatenate',

            # Use Perl standard behaviours for other operations...
            fallback => 1,
        );
    }

    # And later...

    $search_for = $MR->either_or($MRS) . $first_name->optional() . $family_name;

    $allowed_pet_range = $CAT->up_to($DOG);

    $home_phone->swap_with($work_phone);

         
Note that overloading the "dot" operator was perfectly acceptable here, as it (presumably) works just like Perl's built-in string concatenator.
Overloading other operators can make good sense (and good code), provided two conditions are met. First, the operators you choose to overload must match the standard algebraic notation within the problem's native domain: the set of operators that the domain experts routinely use. Second, the standard domain-specific notation you're recreating in your Perl class must conform to the Perlish precedences and associativities of the operators you're overloading.
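As an illustration of an algebraic class whose operators match its domain notation, here is a minimal sketch. The Vector2D class and its method names are hypothetical, and a blessed hash is used only to keep the example short. Vector addition and subtraction share the precedence and associativity of Perl's own + and -, so both conditions above are satisfied.

```perl
use strict;
use warnings;

# Hypothetical algebraic class, invented for this sketch.
# A blessed hash keeps the example short; it is not the inside-out style.
package Vector2D;

use overload (
    '+'   => 'add',
    '-'   => 'subtract',
    '=='  => 'equals',
    q{""} => 'as_str',
);

sub new {
    my ($class, $x, $y) = @_;
    return bless { x => $x, y => $y }, $class;
}

# Component-wise sum of two vectors...
sub add {
    my ($self, $other) = @_;
    return Vector2D->new( $self->{x} + $other->{x}, $self->{y} + $other->{y} );
}

# Component-wise difference (both operands are assumed to be vectors)...
sub subtract {
    my ($self, $other) = @_;
    return Vector2D->new( $self->{x} - $other->{x}, $self->{y} - $other->{y} );
}

sub equals {
    my ($self, $other) = @_;
    return $self->{x} == $other->{x} && $self->{y} == $other->{y};
}

sub as_str {
    my ($self) = @_;
    return sprintf '(%s, %s)', $self->{x}, $self->{y};
}

package main;

my $position = Vector2D->new(1, 2);
my $velocity = Vector2D->new(3, 4);

my $next_position = $position + $velocity;
my $back_again    = $next_position - $velocity;

print "$next_position\n";
```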
End of content preview. The remaining content of this section is not viewable here.
Coercions
Content preview
Always consider overloading the boolean, numeric, and string coercions of objects.
When an object reference is used as a boolean, it always evaluates to true by default, so:

    croak( q{Can't use non-zero value} ) if $fuzzynum;
always throws an exception, even when $fuzzynum contains 0±0.
An even more serious problem arises when object references are treated as numbers: by default, they numerify to the integer value of their memory address. That means that a statement like:

    $constants[$fuzzynum] = 42;
is really something like:

    $constants[0x256ad1f3] = 42;
which is:

    $constants[627757555] = 42;
which will almost certainly segfault when it tries to allocate six hundred million elements in the @constants array.
A similar problem arises if an object is used where a string is expected:

    my $fuzzy_pi = Num::Fuzzy->new({val => 3.1, plus_or_minus => 0.0416});

    # And later...

    print "Pi is $fuzzy_pi\n";     # $fuzzy_pi expected to interpolate a string

         
In a string context, the object's reference is converted to a debugging value that specifies the class of the object, its underlying data type, and its hexadecimal memory address. So the previous print statement would print something like:

    Pi is Num::Fuzzy=SCALAR(0x256ad1f3)
The developer was probably hoping for something more like:

            Pi is 3.1 ± 0.0416

         
All of these problems occur because objects in Perl are almost always accessed via references. And those references behave like objects only when they're specifically used like objects (i.e., when methods are called on them). When they're used like values (as in the examples), they behave like reference values. The resulting bugs can be particularly hard to discover, and even harder to diagnose once they're noticed.
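A minimal sketch of how such coercions might be overloaded follows. This Num::Fuzzy is a hypothetical stand-in for the class used in the examples above: its internals, its method names, and the ASCII "+/-" output format are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the Num::Fuzzy class used in the examples.
# A blessed hash keeps the sketch short.
package Num::Fuzzy;

use overload (
    q{bool} => 'is_nonzero',
    q{0+}   => 'as_num',
    q{""}   => 'as_str',

    fallback => 1,
);

sub new {
    my ($class, $arg_ref) = @_;
    return bless {
        val           => $arg_ref->{val},
        plus_or_minus => $arg_ref->{plus_or_minus},
    }, $class;
}

# True only if the central value is non-zero...
sub is_nonzero {
    my ($self) = @_;
    return $self->{val} != 0;
}

# Numerify to the central value, not the memory address...
sub as_num {
    my ($self) = @_;
    return $self->{val};
}

# Stringify to a readable description, not a debugging value...
sub as_str {
    my ($self) = @_;
    return sprintf '%s +/- %s', $self->{val}, $self->{plus_or_minus};
}

package main;

my $fuzzy_pi   = Num::Fuzzy->new({ val => 3.1, plus_or_minus => 0.0416 });
my $fuzzy_zero = Num::Fuzzy->new({ val => 0,   plus_or_minus => 0      });

my $pi_message = "Pi is $fuzzy_pi";
my $zero_truth = $fuzzy_zero ? 'true' : 'false';
my $doubled    = $fuzzy_pi * 2;

print "$pi_message\n";
```

With these three coercions in place, the object behaves sensibly whether it is tested, indexed with, or interpolated.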
End of content preview. The remaining content of this section is not viewable here.
Chapter 16: Class Hierarchies
Content preview
The ham and cheese omelet class is worth special attention
because it must inherit characteristics from the pork, dairy,
and poultry classes. Thus, we see that the problem cannot be
properly solved without multiple inheritance. At run time, the
program must create the proper object and send a message to
the object that says, "Cook yourself". The semantics of this
message depend, of course, on the kind of object, so they have
a different meaning to a piece of toast than to scrambled eggs.
Reviewing the process so far, we see that the analysis phase
has revealed that the primary requirement is to cook any kind
of breakfast food. In the design phase, we have discovered
some derived requirements. Specifically, we need an object-
oriented language with multiple inheritance. Of course, users
don't want the eggs to get cold while the bacon is frying, so
concurrent processing is required, too.
—Do-While Jones
The Breakfast Food Cooker
The disadvantages of implementing classes via blessed hashes become even more pronounced when those classes are used as the bases of inheritance hierarchies. For example, the lack of encapsulation makes it almost inevitable that base-class attributes will be accessed directly in derived-class methods, thereby strongly coupling the two classes.
This notion that derived classes should have some kind of exemption to the encapsulation of their base class—usually known as "protected access"—certainly seemed like a good idea at the time. But long and bitter experience now strongly suggests that this practice is just as detrimental to the maintainability of class hierarchies as full "public access" is.
Worse still, in a hash-based object, the attributes live in a single namespace (the keys of the hash), so derived classes have to contend with their base classes, and with each other, for ownership of particular attributes.
End of content preview. The remaining content of this section is not viewable here.
Inheritance
Content preview
Don't manipulate the list of base classes directly.
One of the most unusual, and least robust, aspects of Perl's OO mechanism is that each class keeps its inheritance hierarchy information in an ordinary package variable: @ISA. Apart from bringing along all the problems of package variables (see Chapter 5), this approach also means that Perl class hierarchies are typically set up by run-time assignments:

    package Superman;
    our @ISA = qw( Avian Agrarian Alien );
instead of by compile-time declarations.
That arrangement can lead to very obscure compile-time bugs when objects are created and used before the run-time components of their class's code have been executed (for example, in a BEGIN block).
So always define a class's hierarchy declaratively at compile time, using the standard use base pragma:

            

    package Superman;
    use base qw( Avian Agrarian Alien );

         
This ensures that the inheritance relationship is set up as early as possible, and also ensures that the necessary modules (e.g., Avian.pm, Agrarian.pm, Alien.pm) are automatically loaded for you.
Better still, this approach discourages messing about with class hierarchies at run time, by reassigning @ISA. The temptation to modify @ISA at run time is usually a sign that your class might be better implemented as a factory, a façade, or with some other meta-object technique.
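The timing difference between the two approaches can be observed from inside a BEGIN block. The sketch below uses hypothetical Demo:: classes defined in a single file. It assumes no Demo/Alien.pm file exists on disk, in which case use base falls back to the package already compiled above it.

```perl
use strict;
use warnings;

# Hypothetical classes, invented for this sketch...
package Demo::Alien;

sub origin { return 'another planet' }

# Hierarchy set up by a run-time assignment...
package Demo::RuntimeHero;

our @ISA = ('Demo::Alien');

# Hierarchy declared at compile time...
package Demo::CompileTimeHero;

use base qw( Demo::Alien );

package main;

my ($runtime_ready, $compile_time_ready);

# BEGIN blocks run during compilation, before any run-time assignment to @ISA...
BEGIN {
    $runtime_ready      = Demo::RuntimeHero->can('origin')     ? 1 : 0;
    $compile_time_ready = Demo::CompileTimeHero->can('origin') ? 1 : 0;
}

print "run-time \@ISA ready in BEGIN: $runtime_ready\n";
print "use base ready in BEGIN:      $compile_time_ready\n";
```

Inside the BEGIN block only the class declared with use base has inherited origin(); the class that assigns to @ISA gains it later, once the assignment has executed.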
End of content preview. The remaining content of this section is not viewable here.
Objects
Content preview
Use distributed encapsulated objects.
Inside-out classes generalize very cleanly to class hierarchies, even multiple-inheritance hierarchies.
In particular, the inside-out structure neatly avoids the problem of "attribute collisions", in which both the base and derived class wish to use an attribute of the same name, but cannot successfully do so because there's only one key of that name in the object's hash.
Example 16-1 illustrates the problems of using a single, publicly accessible, collision-prone hash as your derived object. The Object class and the Psyche class each think they own the $self->{id} entry in each object's hash. But, because that attribute isn't encapsulated, neither of them can be assured of its contents. Both classes are able to alter it at will, and the attribute is also susceptible to external tampering, as the final line of the example demonstrates.
The describe() method is a particularly disturbing piece of code in this respect. Transcribed from a genuine real-world example, it illustrates how the powerful human ability to recognize intent by context can work against a developer. Within four lines, the programmer has used $self->{id} both as the Object's ID number, and as the Psyche's id...apparently, without the slightest awareness of the fundamental contradiction that represents.
Example 16-1. Making a hash of your psyche

# Generic base class confers an ID number and description attribute
# on all derived classes...
package Object;

# Class attribute...
my $next_id = 1;

# Constructor expects description as argument,
# and automatically allocates ID number...
sub new {
    my ($class, $arg_ref) = @_;

    # Create object representation...
    my $new_object = bless {}, $class;

    # Initialize attributes...
    $new_object->{ id } = $next_id++;
    $new_object->{desc} = $arg_ref->{desc};

    return $new_object;
}



End of content preview. The remaining content of this section is not viewable here.
Blessing Objects
Content preview
Never use the one-argument form of bless.
The built-in bless function associates a referent of some kind (typically a hash, an array, or a scalar) with a particular class, thereby converting the raw data type into an object. Normally, bless takes two arguments: a reference to the referent that is to become the object, and a string naming the desired class of that object. However, the second argument is actually optional, and defaults to the current package name.
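The practical difference between the two forms shows up as soon as a constructor is inherited. Here is a minimal sketch, using hypothetical Demo:: classes and blessed hashes for brevity, with one constructor of each kind:

```perl
use strict;
use warnings;

# Hypothetical classes, invented for this sketch; blessed hashes keep it short...
package Demo::Client;

# One-argument bless: always blesses into the package where bless was compiled...
sub new_one_arg {
    my ($class) = @_;
    return bless {};
}

# Two-argument bless: blesses into whichever class the constructor was called on...
sub new_two_arg {
    my ($class) = @_;
    return bless {}, $class;
}

package Demo::Client::Corporate;

use base qw( Demo::Client );

package main;

my $one_arg_class = ref( Demo::Client::Corporate->new_one_arg() );
my $two_arg_class = ref( Demo::Client::Corporate->new_two_arg() );

print "one-argument bless produced: $one_arg_class\n";
print "two-argument bless produced: $two_arg_class\n";
```

Only the two-argument form honours the class name that the caller actually asked for.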
Developers will occasionally attempt to save a minuscule amount of effort by writing a constructor like so:

    package Client;
    use Class::Std::Utils;
    {
        my %client_num_of;

        sub new {
            my ($class, $arg_ref) = @_;

            my $new_object = bless anon_scalar();
            # (One-arg bless saves typing!)

            $client_num_of{ident $new_object} = $arg_ref->{client_num};

            return $new_object;
        }

        # etc.
    }
Unfortunately, the half a second they save that way can lead to much more substantial amounts of time lost when they have to work out why objects of the following derived class don't work correctly:

    package Client::Corporate;
    use base qw( Client );
    use Class::Std::Utils;
    {
        # Attribute...
        my %corporation_of;

        sub new {
            my ($class, $arg_ref) = @_;

            # Call base class constructor to allocate and initialize object...
            my $new_object = $class->SUPER::new($arg_ref);

            # Initialize derived class's own attributes...
            $corporation_of{ident $new_object} = $arg_ref->{corp};

            return $new_object;
        }

        # etc.
    }
What they will eventually discover is that calls like:

            

    Client::Corporate->new(\%client_data);

         
are actually producing objects of class Client, rather than of the requested subclass. That's because
End of content preview. The remaining content of this section is not viewable here.
Constructor Arguments
Content preview
Pass constructor arguments as labeled values, using a hash reference.
As the examples in the earlier guidelines show, when creating an object of a derived class, the initialization phase of each constructor in the class hierarchy needs to pick out the appropriate initial values for that class's attributes.
This requirement makes positional arguments problematical at best, as the order in which arguments will then need to be passed to the derived constructor will depend on the order in which it inherits from its ancestral classes, as demonstrated in Example 16-3.
Example 16-3. Positional arguments to constructors

package Client;
use Class::Std::Utils;
{
    my %client_num_of;

    sub new {
        my ($class, $client_num) = @_;

        my $new_object = bless anon_scalar(), $class;

        $client_num_of{ident $new_object} = $client_num;

        return $new_object;
    }

    # etc.
}

package Client::Corporate;
use base qw( Client );
use Class::Std::Utils;
{
    my %corporation_of;

    sub new {
        my ($class, $client_num, $corp_name) = @_;

        my $new_object = $class->SUPER::new($client_num);

        $corporation_of{ident $new_object} = $corp_name;

        return $new_object;
    }

    # etc.
}

# and later...

my $new_client
    = Client::Corporate->new( '124C1', 'Florin' );
The real problem with this approach is that any subsequent change in argument ordering (for example, adding an extra argument to either of the classes) will then require that every constructor call be rewritten, or else every derived-class constructor will have to do some sly slicing-and-dicing of the original argument list before passing it on to a base class (as in Example 16-4).
Example 16-4. Adding extra positional arguments to constructors

package Client;
use Class::Std::Utils;
{
    my %client_num_of;
    my %name_of;         # New attribute in base class
End of content preview. The remaining content of this section is not viewable here.
Base Class Initialization
Content preview
Distinguish arguments for base classes by class name as well.
As explained earlier, one of the great advantages of using inside-out classes instead of hashes is that a base class and a derived class can then each have an attribute of exactly the same name. In a single-level hash, that's impossible.
But that very fact also presents something of a problem when constructor arguments are themselves passed by hash. If two or more classes in the same hierarchy do happen to have attributes of the same name, the constructor will need two or more initializers with the same key, which a single hash can't provide.
The solution is to allow initializer values to be partitioned into distinct sets, each uniquely named, which are then passed to the appropriate base class. The easiest way to accomplish that is to pass in a hash of hashes, where each top-level key is the name of one of the base classes, and the corresponding value is a hash of initializers specifically for that base class. Example 16-6 shows how this can be achieved.
Example 16-6. Avoiding name collisions in constructor arguments

               

package Client;
use Class::Std::Utils;
{
    my %client_num_of;    # Every client has an ID number
    my %name_of;

    sub new {
        my ($class, $arg_ref) = @_;

        my $new_object = bless anon_scalar(), $class;

        # Initialize this class's attributes with the appropriate argument set...
        $client_num_of{ident $new_object} = $arg_ref->{'Client'}{client_num};
        $name_of{ident $new_object}       = $arg_ref->{'Client'}{client_name};

        return $new_object;
    }

}

package Client::Corporate;
use base qw( Client );
use Class::Std::Utils;
{
    my %client_num_of;     # Corporate clients have an additional ID number
End of content preview. The remaining content of this section is not viewable here.
Construction and Destruction
Content preview
Separate your construction, initialization, and destruction processes.
Classes that use a single new() method to both create and initialize objects usually don't work well under multiple inheritance. When a class hierarchy offers two or more new() methods (either at different inheritance levels, or in different base classes at the same level), then there is automatically a conflict of control.
Only one of those new() methods can ultimately allocate and bless the storage for the new object, and if there is multiple inheritance anywhere in the class hierarchy you're using, the new() chosen may not be the new() you expected. Even if it is the one you wanted, any constructors on other branches of the inheritance tree will have been pre-empted and the object will not be completely initialized.
Likewise, when the object's destructors are called, only one of the two or more inheritance branches can be followed during destructor look-up, so only one of the several base-class destructors will ever be called. That's particularly bad, because it's critical to call all the destructors of an inside-out object, to ensure that its attribute hashes don't leak memory (see "Destructors" in Chapter 15).
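The destructor problem is easy to observe directly. The following sketch uses hypothetical Demo:: classes (blessed hashes, for brevity) and a shared list that records which DESTROY methods actually ran:

```perl
use strict;
use warnings;

# Records the name of every destructor that runs...
my @destructors_called;

# Hypothetical classes, invented for this sketch...
package Demo::Wax;

sub new {
    my ($class) = @_;
    return bless {}, $class;
}

sub DESTROY {
    push @destructors_called, 'Demo::Wax';
    return;
}

package Demo::Topping;

sub new {
    my ($class) = @_;
    return bless {}, $class;
}

sub DESTROY {
    push @destructors_called, 'Demo::Topping';
    return;
}

# Multiple inheritance: two base classes, each with its own DESTROY...
package Demo::Shimmer;

use base qw( Demo::Wax Demo::Topping );

package main;

{
    my $shimmer = Demo::Shimmer->new();
}   # $shimmer goes out of scope here, so its destructor runs

print "Destructors called: @destructors_called\n";
```

Only the destructor found first in the method look-up is invoked; the one in Demo::Topping never gets the chance to clean up.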
For example, you could create a well-implemented inside-out class like this:

            

    package Wax::Floor;
    use Class::Std::Utils;
    {
        # Attributes...
        my %name_of;
        my %patent_of;

        sub new {
            my ($class, $arg_ref) = @_;

            my %init = extract_initializers_from($arg_ref);

            my $new_object = bless anon_scalar(), $class;

            $name_of{ident $new_object}   = $init{name};
            $patent_of{ident $new_object} = $init{patent};

            return $new_object;
        }

        sub DESTROY {
            my ($self) = @_;

            delete $name_of{ident $self};
            delete $patent_of{ident $self};

            return;
        }
    }
End of content preview. The remaining content of this section is not viewable here.
Automating Class Hierarchies
Content preview
Build the standard class infrastructure automatically.
The universal constructor and destructor demonstrated in the previous guideline are, by definition, supposed to be used for every class hierarchy, in every file of every program within every system you create. So it would make sense to factor them out into a separate module, from which they could then be supplied to every class that needs them.
There is already a CPAN module that does precisely that. It's called Class::Std, and it implements all of the class infrastructure shown in Example 16-9. So classes like Wax::Floor, Topping::Dessert, and Shimmer (Example 16-10 and the code that immediately follows it) could be implemented without having to construct that infrastructure yourself, merely by using Class::Std inside each class:

            

    package Wax::Floor;
    use Class::Std;
    {
        # [Class definition, exactly as in Example 16-10]
    }

         
Loading Class::Std installs a generic constructor that creates and initializes inside-out objects using the approach explained in the preceding guidelines, but with some other convenient shortcuts (described later). The module also installs a destructor (see the next guideline, "Attribute Demolition") that greatly simplifies the cleanup of attributes. Class::Std also exports the ident() utility to your class's namespace.
Class::Std provides all the benefits of inside-out objects, as well as all the benefits of decoupled initialization and cleanup (i.e., it provides full support for BUILD() and DEMOLISH() methods). It is strongly recommended for any object-oriented Perl development.
End of content preview. The remaining content of this section is not viewable here.
Attribute Demolition
Content preview
Use Class::Std to automate the deallocation of attribute data.
As mentioned under "Destructors" in Chapter 15, one of the very few annoyances of using inside-out objects rather than blessed hashes is the inevitable need to write separate clean-up code for every attribute, as in Example 16-11.
Example 16-11. Cleaning up object attributes

               

package Book;
use Class::Std;
{
    # Attributes...
    my %title_of;
    my %author_of;
    my %publisher_of;
    my %year_of;
    my %topic_of;
    my %style_of;
    my %price_of;
    my %rating_of;

    # and then...

    sub DEMOLISH {
        my ($self, $ident) = @_;

        # Update library information...
        Library->remove($self);

        # Clean up attribute hashes...
        delete $title_of{$ident};
        delete $author_of{$ident};
        delete $publisher_of{$ident};
        delete $year_of{$ident};
        delete $topic_of{$ident};
        delete $style_of{$ident};
        delete $price_of{$ident};
        delete $rating_of{$ident};

        return;
    }
}

            
This kind of highly repetitive code structure is inherently error-prone to set up, unbearably tedious to read, and unnecessarily hard to maintain. For example, are you confident that the DEMOLISH() method shown in Example 16-11 actually did clean up every one of the object's attributes?
The goal here is always exactly the same: to iterate through every attribute hash in the class and delete the $ident entry inside it. It would be much better if there were some way for the class itself to keep track of its attribute hashes, so the class itself could automatically step through those attributes and remove the appropriate element from each.
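The idea of a class keeping track of its own attribute hashes can be sketched by hand with core Perl alone. This is not how Class::Std does it: the @attribute_hashes registry, the Demo::Book class, and the _stored_entry_count() helper are all hypothetical, and Scalar::Util's refaddr() stands in for ident().

```perl
use strict;
use warnings;

package Demo::Book;

use Scalar::Util qw( refaddr );

{
    # Attributes...
    my %title_of;
    my %author_of;
    my %year_of;

    # Hypothetical registry: one list of every attribute hash in the class...
    my @attribute_hashes = ( \%title_of, \%author_of, \%year_of );

    sub new {
        my ($class, $arg_ref) = @_;

        # An anonymous scalar serves as the object (stand-in for anon_scalar())...
        my $new_object = bless \do { my $anon_scalar }, $class;
        my $ident      = refaddr $new_object;

        $title_of{$ident}  = $arg_ref->{title};
        $author_of{$ident} = $arg_ref->{author};
        $year_of{$ident}   = $arg_ref->{year};

        return $new_object;
    }

    sub DESTROY {
        my ($self) = @_;
        my $ident = refaddr $self;

        # One loop replaces one delete statement per attribute...
        for my $attr_ref (@attribute_hashes) {
            delete $attr_ref->{$ident};
        }

        return;
    }

    # Hypothetical helper: count the entries still stored across all attributes...
    sub _stored_entry_count {
        my $count = 0;
        for my $attr_ref (@attribute_hashes) {
            $count += scalar keys %{$attr_ref};
        }
        return $count;
    }
}

package main;

my ($entries_while_alive, $entries_after_destruction);

{
    my $book = Demo::Book->new({
        title => 'Example Title', author => 'Example Author', year => 2005,
    });
    $entries_while_alive = Demo::Book::_stored_entry_count();
}

$entries_after_destruction = Demo::Book::_stored_entry_count();

print "$entries_while_alive entries while alive, ",
      "$entries_after_destruction after destruction\n";
```

The weak point of this hand-rolled version is that the registry must still be kept in sync with the attribute declarations by hand, which is the bookkeeping an automated mechanism would remove.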
End of content preview. The remaining content of this section is not viewable here.
Attribute Building
Content preview
Have attributes initialized and verified automatically.
Most of the BUILD() methods shown so far in this chapter do nothing except initialize attributes with values extracted from the constructor's initializer hash. For example:

            

    package Topping::Dessert;
    use Class::Std;
    {
        # Attributes...
        my %name_of     :ATTR;
        my %flavour_of  :ATTR;

        sub BUILD {
            my ($self, $ident, $arg_ref) = @_;

            $name_of{$ident}    = $arg_ref->{name};
            $flavour_of{$ident} = $arg_ref->{flavour};

            return;
        }

        # etc.
    }

               

            

         
Because this is such a common requirement, Class::Std provides a shortcut. When you declare an attribute using the :ATTR marker, you can specify the entry of the constructor's initialization hash that is to be used to initialize it. For example:

            

    package Topping::Dessert;
    use Class::Std;
    {
        # Attributes...
        my %name_of     :ATTR( init_arg => 'name'    );
        my %flavour_of  :ATTR( init_arg => 'flavour' );

        # [No BUILD method required]

        # etc.
    }

               

            

         
This extra specification causes the new() method provided by Class::Std to automatically initialize those attributes with the correspondingly labeled values from the initialization hash it is passed.
More importantly, the approach also solves the problem of misspelled initializer labels (see "Base Class Initialization" earlier). When attributes are declared with :ATTR and an init_arg is specified, the Class::Std constructor will automatically throw an exception if the initialization hash doesn't contain a suitably named initialization value. For example, given the previous definition, a call like:
End of content preview. The remaining content of this section is not viewable here.
Coercions
Content preview
Specify coercions as :STRINGIFY, :NUMERIFY, and :BOOLIFY methods.
In addition to the :ATTR markers for attribute hashes, Class::Std also supplies markers for subroutines that implement conversions to numbers, strings, and booleans:

            

    sub count : NUMERIFY {    # Call count() method whenever object used as number
        my ($self, $ident) = @_;
        return scalar @{ $elements_of{$ident} };
    }

    sub as_str : STRINGIFY {  # Call as_str() method whenever object used as string
        my ($self, $ident) = @_;
        return sprintf '(%s)', join $COMMA, @{ $elements_of{$ident} };
    }

    sub is_okay : BOOLIFY {   # Call is_okay() method whenever object used as boolean
        my ($self) = @_;
        return !$self->Houston_We_Have_A_Problem();
    }

         
This provides a simpler, more convenient, and less repetitive interface than use overload:

            

    sub count {
        my ($self) = @_;
        return scalar @{ $elements_of{ident $self} };
    }

    sub as_str {
        my ($self) = @_;
        return sprintf '(%s)', join $COMMA, @{ $elements_of{ident $self} };
    }

    sub is_okay {
        my ($self) = @_;
        return !$self->Houston_We_Have_A_Problem();
    }

    use overload (
        q{0+}   => 'count',
        q{""}   => 'as_str',
        q{bool} => 'is_okay',

        fallback => 1,
    );

         
End of content preview. The remaining content of this section is not viewable here.
Cumulative Methods
Content preview
Use :CUMULATIVE methods instead of SUPER:: calls.
One of the most important advantages of using the BUILD() and DEMOLISH() mechanisms supplied by Class::Std is that those methods don't require nested calls to their ancestral methods via the SUPER pseudoclass. The constructor and destructor provided by Class::Std take care of the necessary redispatching automatically. Each BUILD() method can focus solely on its own responsibilities; it doesn't have to also help orchestrate the cumulative constructor effects across the class hierarchy by remembering to call $self->SUPER::BUILD().
This approach produces far more reliable class implementations, because forgetting to include the SUPER call in a "chained" constructor or destructor will immediately terminate the chain of calls, disenfranchising all the remaining construction/destruction methods higher up in the class's hierarchy.
Moreover, calls via SUPER can only ever call the method of exactly one ancestral class, which is not sufficient under multiple inheritance. This second problem can be solved in various ways (for example, by using the standard NEXT module), but all those solutions still rely on developers remembering to add the necessary code to every method in every class in order to continue the chain of calls. So all those solutions are inherently fragile.
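The single-ancestor limitation of SUPER can be seen in a small experiment. In the sketch below, the Demo:: classes and the parts_from_all_classes() method are hypothetical. The hand-rolled loop over mro::get_linear_isa() only illustrates what accumulating across a hierarchy means; it is not a description of how Class::Std implements the feature.

```perl
use strict;
use warnings;
use mro;

# Hypothetical classes, invented for this sketch...
package Demo::Base1;

sub parts { return 'Base1' }

package Demo::Base2;

sub parts { return 'Base2' }

package Demo::Derived;

use base qw( Demo::Base1 Demo::Base2 );

sub new {
    my ($class) = @_;
    return bless {}, $class;
}

sub parts { return 'Derived' }

# Chained version: SUPER:: reaches only the first ancestor that has parts()...
sub parts_via_super {
    my ($self) = @_;
    return ( $self->parts(), $self->SUPER::parts() );
}

# Hand-rolled accumulation: call the parts() defined in every class of the hierarchy...
sub parts_from_all_classes {
    my ($self) = @_;

    my @collected;
    for my $class ( @{ mro::get_linear_isa(ref $self) } ) {
        no strict 'refs';

        # Look only at subroutines defined directly in this package...
        my $method_ref = *{"${class}::parts"}{CODE};
        next if !$method_ref;

        push @collected, $self->$method_ref();
    }

    return @collected;
}

package main;

my $object = Demo::Derived->new();

my @via_super = $object->parts_via_super();
my @from_all  = $object->parts_from_all_classes();

print "Via SUPER:        @via_super\n";
print "From all classes: @from_all\n";
```

The SUPER-based call never reaches Demo::Base2, whereas the explicit walk over the linearized hierarchy visits every class.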
Class::Std provides a different way of creating methods whose effects accumulate through a class hierarchy, in the same way as those of BUILD() and DEMOLISH() do. Specifically, the module allows you to define your own cumulative methods. An ordinary non-cumulative method hides any method of the same name inherited from any base class, so when a non-cumulative method is called, only the most-derived version of it is ever invoked. In contrast, a cumulative method doesn't hide ancestral methods of the same name; it
End of content preview. The remaining content of this section is not viewable here.
Autoloading
Content preview
Don't use AUTOLOAD().
Perl provides a mechanism by which you can capture and handle calls to methods that are not defined anywhere in your class hierarchy: the AUTOLOAD method.
Normally when you call a method, the interpreter starts at the class of the object on which the method was called. It then works its way upwards through the class hierarchy until it finds a package with a subroutine of the corresponding name, which it then invokes.
But if this hierarchical search fails to find a suitable method implementation anywhere in the inheritance tree, the interpreter returns to the most derived class and repeats the look-up process. On the second time through, it looks for a subroutine named AUTOLOAD() instead.
That means that the left-most-depth-first AUTOLOAD() that an object inherits will always be called to handle every unknown method call. And that's the problem. If the object's class hierarchy has two or more AUTOLOAD() definitions, it might be that the second one would have been the correct one to handle a particular missing method. But normally, that second one will never get the chance to do so.
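A short experiment makes the look-up order concrete. The Demo:: classes below are hypothetical; each base class defines its own AUTOLOAD(), and a derived class inherits from both.

```perl
use strict;
use warnings;

# Hypothetical classes, invented for this sketch...
package Demo::Left;

sub new {
    my ($class) = @_;
    return bless {}, $class;
}

sub AUTOLOAD {
    our $AUTOLOAD;

    # Destructor calls also arrive here; ignore them...
    return if $AUTOLOAD =~ m/::DESTROY\z/xms;

    return "Demo::Left::AUTOLOAD handled $AUTOLOAD";
}

package Demo::Right;

sub AUTOLOAD {
    our $AUTOLOAD;

    return if $AUTOLOAD =~ m/::DESTROY\z/xms;

    return "Demo::Right::AUTOLOAD handled $AUTOLOAD";
}

# Inherits two AUTOLOAD() methods, one from each base class...
package Demo::Both;

use base qw( Demo::Left Demo::Right );

package main;

my $object = Demo::Both->new();

# No class in the hierarchy defines this method...
my $result = $object->undefined_method();

print "$result\n";
```

The AUTOLOAD() in Demo::Right is never consulted, even if it would have been the appropriate handler for the missing method.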
There are various ways to circumvent that problem. For example, the standard NEXT module can be used to reject a particular AUTOLOAD() invocation and resume the original method look-up; or under Class::Std you can declare each AUTOLOAD() to be :CUMULATIVE and make sure only one of them ever returns a value; or you can dispense with AUTOLOAD() entirely and use Class::Std's AUTOMETHOD() mechanism instead.
However, none of these solutions uses the standard Perl AUTOLOAD() semantics, so all of them will be harder to maintain. And the first two suggestions also require additional vigilance to get right: either making certain that every AUTOLOAD() redispatches on failure via a call to $self->NEXT::AUTOLOAD(); or ensuring that every AUTOLOAD() is marked
End of content preview. The remaining content of this section is not viewable here.
Chapter 17: Modules
Content preview
Any fool can make things bigger, more complex, and
more violent. It takes a touch of genius—and a lot of
courage—to move in the opposite direction.
—Albert Einstein
Code reuse is a core best practice, and modules are Perl's principal large-scale mechanism for code reuse. They are also at the heart of Perl's greatest software asset: the CPAN.
Refactoring source code into modules will not only increase the reusability of that code, it is also likely to make the code cleaner and easier to maintain. If nothing else, the programs from which the original code is removed will become shorter, better abstracted, and consequently more maintainable.
The keys to good module design and implementation are: designing the interface first, keeping that interface small and functional, using a standard implementation template, and not reinventing the wheel. The guidelines in this chapter explore these issues.
End of preview. The rest of this section is not available here.
Interfaces
Preview
Design the module's interface first.
The most important aspect of any module is not how it implements the facilities it provides, but the way in which it provides those facilities in the first place. If the module's API is too awkward, or too complex, or too extensive, or too fragmented, or even just poorly named, developers will avoid using it. They'll write their own code instead.
In that way, a poorly designed module can actually reduce the overall maintainability of a system.
Designing module interfaces requires both experience and creativity. The easiest way to work out how an interface should work is to "play test" it: to write examples of code that will use the module before the module itself is implemented. The key is to write that code as if the module were already available, and write it the way you'd most like the module to work.
Once you have some idea of the interface you want to create, convert your "play tests" into actual tests (see Chapter 18). Then it's just a Simple Matter Of Programming to make the module work the way that the code examples and tests want it to.
Of course, it may not be possible for the module to work the way you'd most like, in which case attempting to implement it that way will help you determine what aspects of your API are not practical, and allow you to work out what might be an acceptable alternative.
For example, when the IO::Prompt module (see Chapter 10) was being designed, having potential clients write hypothetical code fragments quickly made it obvious that what was needed was a drop-in replacement for the <> input operator. That is, to replace:

    CMD:
    while (my $cmd = <>) {
        chomp $cmd;
        last CMD if $cmd =~ m/\A (?: q(?:uit)? | bye ) \z/xms;

        my $args;
        if ($takes_arg{$cmd}) {
            $args = <>;
            chomp $args;
        }

        exec_cmd($cmd, $args);
    }
End of preview. The rest of this section is not available here.
Refactoring
Preview
Place original code inline. Place duplicated code in a subroutine. Place duplicated subroutines in a module.
The first time you're tempted to copy-paste-and-modify a piece of code:

    package Process::Queue;
    use Carp;
    {
        use overload (
            # Type coercions don't make sense for process queues...
            q{""}   => sub {
                croak q{Can't stringify a Process::Queue};
            },
            q{0+}   => sub {
                croak q{Can't numerify a Process::Queue};
            },
            q{bool} => sub {
                croak q{Can't get the boolean value of a Process::Queue};
            },
        );
    }

    # and later...

    package Socket;
    use Carp;
    {
        use overload (
            # Type coercions don't make sense for sockets...
            q{""}   => sub {
                croak q{Can't convert a Socket to a string};
            },
            q{0+}   => sub {
                croak q{Can't convert a Socket to a number};
            },
            q{bool} => sub {
                croak q{Can't get the boolean value of a Socket};
            },
        );
    }
. . . don't do it!
Instead, convert the code into a subroutine, parameterize the parts you would have modified, and then replace both the original and duplicated code with calls to that subroutine:

            

    use Carp;

    sub _Class::cannot {
        # What kind of coercion cannot be done?
        my ($coerce) = @_;

        # Build a subroutine with the corresponding error message...
        return sub {
            my ($self) = @_;
            croak sprintf qq{Can't $coerce}, ref $self;
        };
    }

    # and later...

    package Process::Queue;
    {
        use overload (
            # Type coercions don't make sense for process queues...
End of preview. The rest of this section is not available here.
Version Numbers
Preview
Use three-part version numbers.
When specifying the version number of a module, don't use vstrings:

    our $VERSION = v1.0.3;
They will break your code when it's run under older (pre-5.8.1) versions of Perl. They will also break it under newer versions of Perl, as they're deprecated in the 5.9 development branch and will be removed in the 5.10 release.
They're being removed because they're error-prone; in particular, because they're actually just weirdly specified character strings. For example, v1.0.3 is just shorthand for the character string "\x{1}\x{0}\x{3}". So vstrings don't compare correctly under numeric comparison.
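A short sketch makes the point concrete. The variable names are illustrative only, and the numeric comparison is wrapped in a block that silences the "isn't numeric" warning it would otherwise provoke:

```perl
use strict;
use warnings;

my $old_version = v1.0.3;
my $new_version = v2.5.7;

# A vstring is just a short character string...
print length($old_version), "\n";                                     # 3
print $old_version eq "\x{1}\x{0}\x{3}" ? "same\n" : "different\n";   # same

# ...so a numeric comparison sees two non-numeric strings,
# each of which is treated as zero
my $looks_equal = do {
    no warnings 'numeric';
    $old_version == $new_version;
};
print $looks_equal ? "numerically equal\n" : "numerically different\n";
```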
Don't use floating-point version numbers, either:

    our $VERSION = 1.000_03;
It's too easy to get them wrong, as the preceding example does: it's equivalent to 1.0.30, not 1.0.3.
Instead, use the version CPAN module and the qv(...) version-object constructor:

            

    use version; our $VERSION = qv('1.0.3');

         
The resulting version objects are much more robust. In particular, they compare correctly under either numeric or string comparisons.
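As a rough illustration, the following sketch compares two version objects using both kinds of operator, and contrasts the result with a comparison of plain strings. The particular version numbers are arbitrary choices for the demonstration:

```perl
use version; our $VERSION = qv('1.0.9');

use strict;
use warnings;

my $next_release = qv('1.0.10');

# Version objects compare component by component under either operator...
print $VERSION->normal(), "\n";                                                # v1.0.9
print 'numeric: ', ($VERSION <  $next_release ? 'older' : 'not older'), "\n";  # older
print 'string:  ', ($VERSION lt $next_release ? 'older' : 'not older'), "\n";  # older

# ...whereas plain strings are compared character by character
print 'plain:   ', ('1.0.9' lt '1.0.10' ? 'older' : 'not older'), "\n";        # not older
```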
Note that, in the previous example, the use version statement and the $VERSION assignment were written on the same line. Loading and using the module in a single line is important, because it's likely that many users of your module will install it using either the ExtUtils::MakeMaker module or the Module::Build module. Each of these modules will attempt to extract and then evaluate the $VERSION assignment line in your module's source code, in order to ascertain the module's version number. But neither of them supports qv'd version numbers directly. By placing the $VERSION assignment on the same line as the use version, you ensure that when that line is extracted and executed, the qv() subroutine is correctly loaded from
End of preview. The rest of this section is not available here.
Version Requirements
Preview
Enforce your version requirements programmatically.
Telling future maintainers about a module's version requirements is certainly a good practice:

    package Payload;
    # Only works under 5.6.1 and later

    use IO::Prompt;                 # must be 0.2.0 or better, but not 0.3.1
    use List::Util qw( max );       # must be 1.13 or better
    use Benchmark qw( cmpthese );   # but no later than version 1.52

    # etc.
But telling Perl itself about these constraints is an even better practice, as the compiler can then enforce those requirements.
Perl has a built-in mechanism to do (some of) that enforcement for you. If you call use with a decimal number instead of a module name, the compiler will throw an exception if Perl's own version number is less than you specified:

            

    package Payload;
    use 5.006001;           # Only works under 5.6.1 and later
Unfortunately, that version number has to be an old-style decimal version. You can't use the version module's qv() subroutine (as recommended in the previous guideline), because the compiler interprets the qv identifier as the name of a module to be loaded:

    package Payload;
    use version;
    use qv('5.6.1');        # Tries to load qv.pm
If you load a module with a normal use, but place a decimal version number after its name and before any argument list, then the compiler calls the module's VERSION method, which defaults to throwing an exception if the module's $VERSION variable is less than the version number that was specified:

            

    use IO::Prompt  0.002;              # must be 0.2.0 or better
    use List::Util  1.13   qw( max );   # must be 1.13 or better
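The check that a versioned use performs can also be made by hand, which is a convenient way to see how it behaves. This is only a sketch: it calls the inherited VERSION method directly on the core List::Util module, and the requirement of 9999 is a deliberately impossible value chosen for the demonstration:

```perl
use strict;
use warnings;
use List::Util ();

# Ask the module whether it satisfies a minimum version requirement...
my $recent_enough = eval { List::Util->VERSION(1.13); 1 };
print $recent_enough ? "List::Util is recent enough\n"
                     : "List::Util is too old\n";

# An unsatisfiable requirement makes the same method throw an exception...
my $satisfied = eval { List::Util->VERSION(9999); 1 };
print $satisfied ? "Requirement unexpectedly satisfied\n"
                 : "Requirement rejected: $@";
```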
End of preview. The rest of this section is not available here.
Exporting
Preview
Export judiciously and, where possible, only by request.
As with classes (see Chapter 15), modules should aim for an optimal interface, rather than a minimal one. In particular, you should provide any non-fundamental utility subroutines that client coders will frequently need, and are therefore likely to (re-)write themselves.
On the other hand, it's also important to minimize the number of subroutines that are exported by default. Especially if those subroutines have common names. For example, if you're writing a module to support software testing, then you might want to provide subroutines like ok(), skip(), pass(), and fail():

    package Test::Utils;

    use base qw( Exporter );
    our @EXPORT = qw( ok skip pass fail );    # Will export these by default

    # [subroutine definitions here]
But exporting those subroutines by default can make the module more difficult to use, because the names of those subroutines may collide with subroutine or method definitions in the software you're testing:

    use Perl6::Rules;   # CPAN module implements a subset of Perl 6 regexes
    use Test::Utils;    # Let's test it...

    my ($matched)
        = 'abc' =~ m{ ab {ok 1} d    # Test nested code blocks in regexes
                    | {ok 2; fail}   # Test explicit failure of alternatives
                    | abc {ok 3}     # Test successful matches
                    }xms;

    if ($matched) {
        ok(4);
    }
Unfortunately, both the Perl6::Rules and Test::Utils modules export a fail() subroutine by default. As a result, the example test is subtly broken, because the Test::Utils::fail() subroutine has been exported "over the top of" the previously exported Perl6::Rules::fail() subroutine. So the fail() call inside the regex isn't invoking the expected
End of preview. The rest of this section is not available here.
Declarative Exporting
Preview
Consider exporting declaratively.
The Exporter module has served Perl well over many years, but it's not without its flaws.
For a start, its interface is ungainly and hard to remember, which leads to unsanitary cutting and pasting. That interface also relies on subroutine names stored as strings in package variables. This design imposes all the inherent problems of using package variables, as well as the problems of symbolic references (see Chapters 5 and 11).
It's also redundant: you have to name each subroutine at least twice—once in its declaration and again in one (or more) of the export lists. And if those disadvantages weren't enough, there's also the ever-present risk of not successfully naming a particular subroutine twice, by misspelling it in one of the export lists.
Exporter also allows you to export variables from a module. Using variables as part of your interface is a bad interface practice (see the following guideline, "Interface Variables"), but actually aliasing them into another package is even worse. For a start, exported variables are ignored by use strict, so they may mask other problems in your code. But more importantly, exporting a module's state variables exposes that module's internal state in such a way that it can be modified without the module's name even appearing in the assignment:

    use Serialize ($depth);

    # and much later...

    $depth = -20;        # Change the internal state of the Serialize module
That's neither obvious, nor robust, nor comprehensible, nor easy to maintain.
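The aliasing can be reproduced in a single file. In the sketch below, Serialize::Demo is a hypothetical stand-in for the module being discussed, and the BEGIN blocks merely imitate what a use statement would do if the package lived in its own file:

```perl
use strict;
use warnings;

BEGIN {
    package Serialize::Demo;            # hypothetical module, defined inline
    use Exporter qw( import );
    our @EXPORT_OK = qw( $depth );
    our $depth     = 100;

    sub current_depth { return $depth }
}

# Equivalent to: use Serialize::Demo qw( $depth );
BEGIN { Serialize::Demo->import('$depth') }

# No declaration is needed under use strict, and the module's name
# appears nowhere in the assignment...
$depth = -20;

print Serialize::Demo::current_depth(), "\n";    # -20
```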
To set up a module with a full range of export facilities, including default exports, exports-by-request, and tagged export sets, you have to write something like this:

    package Test::Utils;

    use base qw( Exporter );

    our @EXPORT    = qw( ok );                # Default export
End of preview. The rest of this section is not available here.
Interface Variables
Preview
Never make variables part of a module's interface.
Variables make highly unsatisfactory interface components. They offer no control over who accesses their values, or how those values are changed. They expose part of the module's internal state information to the client code, and they provide no easy way to later impose constraints on how that state is used or modified.
This, in turn, forces every component of the module to re-verify any interface variable whenever it's used. For example, consider the parts of a module for serializing Perl data structures shown in Example 17-1.
Example 17-1. Variables as a module's interface

package Serialize;
use Carp;
use Readonly;
use Perl6::Export::Attrs;
use List::Util qw( max );

Readonly my $MAX_DEPTH => 100;

# Package variables that specify shared features of the module...
our $compaction = 'none';
our $depth      = $MAX_DEPTH;

# Table of compaction tools...
my %compactor = (
   # Value of      Subroutine returning
   # $compaction   compacted form of arg
      none     =>   sub { return shift },
      zip      =>   \&compact_with_zip,
      gzip     =>   \&compact_with_gzip,
      bz       =>   \&compact_with_bz,
      # etc.
);

# Subroutine to serialize a data structure, passed by reference...
sub freeze : Export {
    my ($data_structure_ref) = @_;

    # Check whether the $depth variable has a sensible value...
    $depth = max(0, $depth);

    # Perform actual serialization...
    my $frozen = _serialize($data_structure_ref);

    # Check whether the $compaction variable has a sensible value...
    croak "Unknown compaction type: $compaction"
        if ! exists $compactor{$compaction};

    # Return the compacted form...
    return $compactor{$compaction}->($frozen);
}

# and elsewhere...

use Serialize qw( freeze );

$Serialize::depth      = -20;
End of preview. The rest of this section is not available here.
Creating Modules
Preview
Build new module frameworks automatically.
The "bones" of every new module are basically the same:

            

    package <MODULE NAME>;

    use version; our $VERSION = qv('0.0.1');

    use warnings;
    use strict;
    use Carp;

    # Module implementation here

    1; # Magic true value required at end of module
    __END__

    =head1 NAME

    <MODULE NAME> - [One line description of module's purpose here]

    =head1 VERSION

    This document describes <MODULE NAME> version 0.0.1

    =head1 SYNOPSIS

        use <MODULE NAME>;

        # And the rest of the documentation template here
        # (as described in Chapter 7)
So it makes sense to create each new module automatically, reusing the same templates for each. This rule applies not just to the .pm file itself, but also to the other standard components of a module distribution: the MANIFEST file, the Makefile.PL, the Build.PL, the README, the Changes file, and the lib/ and t/ subdirectories.
The easiest way to create all those components consistently is to use the Module::Starter CPAN module. After installing Module::Starter and setting up a minimal ~/.module-starter/config file:

            

    author:  Yurnaam Heere
    email:   YHEERE@cpan.org
you can then simply type:

            

    > module-starter --module=New::Module::Name
on the command line. Module::Starter will immediately construct a new subdirectory named New-Module-Name/ and populate it with the basic files that are needed to create a complete module.
End of preview. The rest of this section is not available here.
The Standard Library
Preview
Use core modules wherever possible.
It's definitely best practice to avoid unnecessary work, and code reuse is a primary example of that. Perl has two main software libraries of reusable code: the standard Perl library and the CPAN. It's almost always a serious mistake to start hacking on a solution without at least exploring whether your problem has already been solved.
The library of modules that come standard with every Perl distribution is the ideal place to start. There are no issues of availability: if a core module solves your problem, then that solution will already have been installed anywhere that Perl itself is available. There are no issues of authorization either: if Perl has been approved for production use in your organization, the library modules will almost certainly be acceptable too.
Another major advantage is that the standard library contains some of the most heavily used Perl modules available. Frequent use means they're also some of the most strenuously stress-tested—and therefore more likely to be both reliable and efficient.
Perl's standard library contains modules for creating declaration attributes; optimizing the loading of modules; using arbitrary precision numbers, complex numbers, and a full range of trigonometric functions; adding I/O layers to the standard streams; interfacing with flat-file and relational databases; verifying and debugging Perl code; benchmarking and profiling program performance; CGI scripting; accessing the CPAN; serializing and deserializing data structures; calculating message digests; dealing with different character encodings; accessing system error constants; imposing exception semantics on failure-returning functions and subroutines; processing filenames in a filesystem-independent manner; searching for, comparing, and copying files; filtering source code; command-line argument processing; performing common operations on scalars, arrays, and hashes; internationalizing and localizing programs; setting up and using pipes and sockets; interacting with network protocols (including FTP, NNTP, ping, POP3, and SMTP); encoding and decoding MIME; data caching and subroutine memoization; accessing the complete POSIX function library; processing POD documentation; building software test suites; text processing; thread programming; acquiring and manipulating time and date information; and using Unicode.
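As a small taste of that breadth, the sketch below leans on two core modules, List::Util and Digest::MD5, instead of hand-written loops. The data and variable names are invented for the illustration:

```perl
use strict;
use warnings;
use List::Util  qw( max sum first );
use Digest::MD5 qw( md5_hex );

my @response_times = (120, 85, 340, 97);

# Common list operations, without writing any loops...
my $slowest    = max(@response_times);
my $total      = sum(@response_times);
my $first_slow = first { $_ > 100 } @response_times;

print "Slowest:        $slowest\n";       # 340
print "Total:          $total\n";         # 642
print "First over 100: $first_slow\n";    # 120

# A message digest, also straight from the standard library...
my $digest = md5_hex( join q{,}, @response_times );
print "Digest:         $digest\n";
```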
End of preview. The rest of this section is not available here.
CPAN
Preview
Use CPAN modules where feasible.
The Comprehensive Perl Archive Network (CPAN) is often referred to as Perl's killer app, and rightly credited with much of Perl's success in recent years. It is a truly vast repository of code, providing solutions for just about every programming task you might commonly encounter.
As with Perl's standard library, many of the modules on the CPAN are heavily relied-upon—and severely stress-tested—by the global Perl community. This makes CPAN modules like DBI, DateTime, Device::SerialPort, HTML::Mason, POE, Parse::RecDescent, SpreadSheet::ParseExcel, Template::Toolkit, Text::Autoformat, and XML::Parser extremely reliable and powerful tools. Extremely reliable and powerful free tools.
Of course, not all the code archived on CPAN is equally reliable. There is no centralized quality control mechanism for the archive; that's not its purpose. There is an integrated ratings system for CPAN modules, but it is voluntary and many modules remain unrated. So it's important to carefully assess any modules you may be considering.
Nevertheless, if your organization allows it, always check the CPAN (http://search.cpan.org) before you try to solve a new problem yourself. An hour or so of searching, investigation, quality assessment, and prototyping will frequently save days or weeks of development effort. Even if you decide not to use an existing solution, those modules may give you ideas that will help you design and implement your own in-house version.
Of course, many organizations are wary of any external software, especially if it's open source. One way to encourage your organization to allow you to use the enormous resources of the CPAN is to explain it properly. In particular don't characterize your intent as "importing unknown software"; characterize it as "exporting known development delays, testing requirements, and maintenance costs".
End of preview. The rest of this section is not available here.
Chapter 18: Testing and Debugging
Preview
Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it.
—Brian Kernighan
Most people recognize that testing and debugging are somehow related; that debugging is the natural consequence of testing, and that testing is a natural tool during debugging.
But, when used correctly, testing and debugging are actually antagonistic: the better your testing, the less you'll need to debug. Better testing habits repay themselves many times over, by reducing the effort required to diagnose, locate, and fix bugs.
Testing and debugging are huge topics, and a single chapter like this can only outline the simplest and most universal practices. For much deeper explorations of the possibilities, see Perl Testing: A Developer's Notebook (O'Reilly, 2005), Perl Debugged (Addison Wesley, 2001), and Perl Medic (Addison Wesley, 2004).
End of preview. The rest of this section is not available here.
Test Cases
Preview
Write the test cases first.
Probably the single best practice in all of software development is writing your test suite first.
A test suite is an executable, self-verifying specification of the behaviour of a piece of software. If you have a test suite, you can—at any point in the development process—verify that the code works as expected. If you have a test suite, you can—after any changes during the maintenance cycle—verify that the code is still working as expected.
So write the tests first. Write them as soon as you know what your interface will be (see "Interfaces" in Chapter 17). Write them before you start coding your application or module. Because unless you have tests, you have no unequivocal specification of what the software is supposed to do, and no way of knowing whether it does it.
End of preview. The rest of this section is not available here.
Modular Testing
Preview
Standardize your tests with Test::Simple or Test::More.
Writing tests always seems like a chore, and an unproductive chore at that: you don't have anything to test yet, so why write tests? And yet, most developers will—almost automatically—write driver software to test their new module in an ad hoc way:

    > cat try_inflections.pl

    # Test my shiny new English inflections module...
    use Lingua::EN::Inflect qw( inflect );

    # Try some plurals (both standard and unusual inflections)...
    my %plural_of = (
        'house'         => 'houses',
        'mouse'         => 'mice',
        'box'           => 'boxes',
        'ox'            => 'oxen',
        'goose'         => 'geese',
        'mongoose'      => 'mongooses',
        'law'           => 'laws',
        'mother-in-law' => 'mothers-in-law',
    );

    # For each of them, print both the expected result and the actual inflection...
    for my $word ( keys %plural_of ) {
        my $expected = $plural_of{$word};
        my $computed = inflect( "PL_N($word)" );

        print "For $word:\n",
              "\tExpected: $expected\n",
              "\tComputed: $computed\n";
    }
A driver like that is actually harder to write than a test suite, because you have to worry about formatting the output in a way that is easy to read. And it's much harder to use the driver than it would be to use a test suite, because every time you run it you have to wade through that formatted output and verify "by eye" that everything is as it should be:

    > perl try_inflections.pl

    For house:
        Expected: houses
        Computed: houses
    For law:
        Expected: laws
        Computed: laws
    For mongoose:
        Expected: mongooses
        Computed: mongeese
    For goose:
        Expected: geese
        Computed: geese
    For ox:
        Expected: oxen
        Computed: oxen
    For mother-in-law:
        Expected: mothers-in-law
        Computed: mothers-in-laws
    For mouse:
        Expected: mice
        Computed: mice
    For box:
        Expected: boxes
        Computed: boxes
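For comparison, here is a sketch of how the same kind of table might be checked with Test::More, so that the comparison is done by the computer rather than by eye. The naive_plural() subroutine is a hypothetical stand-in, defined inline purely to keep the sketch self-contained; a real test file would load the module under test instead:

```perl
use strict;
use warnings;
use Test::More tests => 4;

# Hypothetical stand-in for the subroutine being tested...
sub naive_plural {
    my ($word) = @_;
    return $word =~ m/(?: s | x | ch | sh ) \z/xms ? "${word}es"
         :                                           "${word}s";
}

# The same kind of table as in the ad hoc driver...
my %plural_of = (
    'house'  => 'houses',
    'box'    => 'boxes',
    'law'    => 'laws',
    'church' => 'churches',
);

# Each comparison is made, and reported, automatically...
for my $word ( sort keys %plural_of ) {
    is( naive_plural($word), $plural_of{$word}, "$word -> $plural_of{$word}" );
}
```

There is no output formatting to design, and a mismatch is reported as a failed test rather than left for the reader to spot.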
End of preview. The rest of this section is not available here.
Test Suites
Preview
Standardize your test suites with Test::Harness.
Once you've written your tests using one of the Test:: modules, in a series of .t files in the t/ subdirectory (as described in the previous guideline, "Modular Testing"), you can use the Test::Harness module to make it easier to run all the test files in your test suite.
The module is specifically designed to understand and summarize the output format used by Test::Simple and Test::More. It comes with an invaluable utility program named prove, which makes it trivially easy to run all the tests in your t/ directory and have the results summarized for you:

            

    > prove -r

    t/articles........ok
    t/inflections.....NOK 3#     Failed test (inflections.t at line 21)
    t/inflections.....NOK 6#     Failed test (inflections.t at line 21)
    t/inflections.....ok 8/0# Looks like you failed 2 tests of 8.
    t/inflections.....dubious
    t/other/conjunctions....ok
    t/verbs/participles.....ok

    Failed 1/4 test scripts, 75.00% okay. 2/119 subtests failed, 98.32% okay.
The -r option tells prove to recursively search through subdirectories looking for .t files to test. You can also specify precisely where to look for tests by explicitly telling prove the directory or file:

            

    > prove t/other

    t/other/conjunctions....ok

    All tests successful.
The utility has many other options that allow you to preview which tests will be run (without actually running them), change the file extension that is searched for, run tests in a random order (to catch any order dependencies), run tests in taint mode (see the perlsec manpage), or see the individual results of every test rather than just a summary.
Using a standard testing setup and a coordinating utility like this, it's trivial to regression test each modification you make to a module or application. Every time you modify the source of your module or application, you simply type
End of preview. The rest of this section is not available here.
Failure
Preview
Write test cases that fail.
Testing is not actually about ensuring correctness; it's about discovering mistakes. The only successful test is one that fails, and thereby reveals a bug.
To use testing effectively, it's vital to get into the right (i.e., slightly counterintuitive) mindset when writing tests. You need to get to the point where you're mildly disappointed if the test suite runs without reporting a problem.
The logic behind that disappointment is simple. All non-trivial software has bugs. Your test suite's job is to find those bugs. If your software passes your test suite, then your test suite isn't doing its job.
Of course, at some point in the development process you have to decide that the code is finally good enough to deploy (or ship). And, at that point, you definitely want that code to pass its test suite before you send it out. But always remember: it's passing the test suite because you decided you'd found all the bugs you cared to test for, not because there were no more bugs to find.
End of preview. The rest of this section is not available here.
What to Test
Preview
Test both the likely and the unlikely.
Having a test suite that fails to fail might not be a major problem, so long as your tests cover the most common ways in which your software will actually be used. The single most important practice here is to run your tests on real-world cases.
That is, if you're building software to handle particular datasets or data streams, test it using actual samples of that data. And make sure those samples are of a similar size to the data on which the software will eventually need to operate.
Play-testing (see Chapter 17) can also come in handy here. If you (or other prospective users) have prototyped the kind of code you expect to write, then you should test the kinds of activities that your exploratory code implements, and the kinds of errors that you made when writing that code. Better yet, just write your hypothetical code as a test suite, using one of the Test:: modules. Then, when you're ready to implement, your test suite will already be in place.
Testing the most likely uses of your software is essential, but it's also vital to write tests that examine both edge-cases (i.e., one parameter with an extreme or unusual value) and corner-cases (i.e., several parameters with an extreme or unusual value).
Good places to hunt for bad behaviour include:
  • The minimum and maximum possible values
  • Slightly less than the minimum possible value and slightly more than the maximum possible value
  • Negative values, positive values, and zero
  • Very small positive and negative values
  • Empty strings and multiline strings
  • Strings with control characters (including "\0")
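As a rough illustration of that checklist, here is a minimal sketch of a table-driven boundary test. The is_valid_percentage() subroutine and its 0-to-100 range are hypothetical, invented purely for this example; only Test::More is real.

```perl
use strict;
use warnings;
use Test::More;

# Hypothetical subroutine under test: accepts plain decimal numbers from 0 to 100
sub is_valid_percentage {
    my ($value) = @_;
    return 0 if !defined $value;
    return 0 if $value !~ m/\A \d+ (?: [.] \d+ )? \z/xms;
    return $value <= 100 ? 1 : 0;
}

# Each case: input, expected result, description
my @cases = (
    [ 0,         1, 'minimum possible value'      ],
    [ 100,       1, 'maximum possible value'      ],
    [ -1,        0, 'slightly below the minimum'  ],
    [ 100.001,   0, 'slightly above the maximum'  ],
    [ 0.0001,    1, 'very small positive value'   ],
    [ '-0.0001', 0, 'very small negative value'   ],
    [ q{},       0, 'empty string'                ],
    [ "50\n60",  0, 'multiline string'            ],
    [ "5\0",     0, 'string with a NUL character' ],
);

for my $case (@cases) {
    my ($input, $expected, $description) = @{$case};
    is( is_valid_percentage($input), $expected, $description );
}

done_testing();
```

Keeping the cases in a table means that a newly discovered edge-case costs one extra line, which makes it more likely that it will actually be added.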
End of preview. The rest of this section is not shown here.
Debugging and Testing
Add new test cases before you start debugging.
The first step in any debugging process is to isolate the incorrect behaviour of the system, by producing the shortest demonstration of it that you reasonably can. If you're lucky, this may even have been done for you:

    To: DCONWAY@cpan.org
    From: sascha@perlmonks.org
    Subject: Bug in inflect module

    Zdravstvuite,

    I have been using your Lingua::EN::Inflect module to normalize terms in a
    data-mining application I am developing, but there seems to be a bug in it,
    as the following example demonstrates:

        use Lingua::EN::Inflect qw( PL_N );

        print PL_N('man'), "\n";       # Prints "men", as expected
        print PL_N('woman'), "\n";     # Incorrectly prints "womans"

Once you have distilled a short working example of the bug, convert it to a series of tests, such as:

    use Lingua::EN::Inflect qw( PL_N );
    use Test::More qw( no_plan );

    is(PL_N('man'),   'men',   'man -> men'     );
    is(PL_N('woman'), 'women', 'woman -> women' );

Don't try to fix the problem straightaway. Instead, immediately add those tests to your test suite. If that testing has been well set up, that can often be as simple as adding a couple of entries to a table:

    my %plural_of = (
        'mouse'         => 'mice',
        'house'         => 'houses',
        'ox'            => 'oxen',
        'box'           => 'boxes',
        'goose'         => 'geese',
        'mongoose'      => 'mongooses',
        'law'           => 'laws',
        'mother-in-law' => 'mothers-in-law',

        # Sascha's bug, reported 27 August 2004...
        'man'           => 'men',
        'woman'         => 'women',
    );
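A table like that only pays off if something iterates over it. The following is a minimal sketch of such a driver loop. So that it runs without Lingua::EN::Inflect installed, it tests a deliberately tiny stand-in named plural_sketch(), which is hypothetical and is not how PL_N() works.

```perl
use strict;
use warnings;
use Test::More;

# Hypothetical stand-in for PL_N(): a few irregular nouns plus two simple rules
my %irregular_plural_of = (
    man   => 'men',
    woman => 'women',
    ox    => 'oxen',
    mouse => 'mice',
    goose => 'geese',
);

sub plural_sketch {
    my ($noun) = @_;
    return $irregular_plural_of{$noun} if exists $irregular_plural_of{$noun};
    return $noun . 'es' if $noun =~ m/(?: x | s | ch | sh ) \z/xms;
    return $noun . 's';
}

# The table of expectations, including the newly reported cases
my %plural_of = (
    'mouse' => 'mice',
    'house' => 'houses',
    'ox'    => 'oxen',
    'box'   => 'boxes',
    'man'   => 'men',
    'woman' => 'women',
);

# One test per table entry, run in a predictable order
for my $singular (sort keys %plural_of) {
    my $expected = $plural_of{$singular};
    is( plural_sketch($singular), $expected, "$singular -> $expected" );
}

done_testing();
```

With a real suite you would call PL_N() in place of plural_sketch(); the loop itself would not need to change when new entries are added.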

         
The point is: if the original test suite didn't report this bug, then that test suite was
End of preview. The rest of this section is not shown here.
Strictures
Always use strict.
Making use strict your default will help perl (the interpreter) pick up a range of frequently made mistakes caused by Perl (the language) being overhelpful. For example, use strict detects and reports—at compile time—the common error of writing:

    my $list = get_list();

    # and later...

    print $list[-1];             # Oops! Wrong variable

instead of:

    my $list_ref = get_list();

    # and later...

    print $list_ref->[-1];

But it's also important not to rely too heavily on use strict, or to assume that it's infallible. For example, it won't pick up that incorrect array access in the following example:

    my @list;

    # and later in the same scope...

    my $list = get_list();

    # and later...

    print $list[-1];
That's because now the problem with $list[-1] isn't just that someone forgot the arrow; it's that they're referring to the wrong (valid) variable.
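The difference between those two cases can be checked directly. The sketch below compiles each snippet (without running it) inside a string eval and reports whether use strict objected. The helper name strict_compile_error() is made up for this illustration.

```perl
use strict;
use warnings;

# Compile a snippet under 'use strict' without executing it.
# Returns the compile-time error message, or an empty string if it compiled.
sub strict_compile_error {
    my ($snippet) = @_;
    my $compiled = eval "use strict; sub { $snippet }";
    return defined $compiled ? q{} : $@;
}

# Case 1: no @list is declared, so strict rejects $list[-1]
my $caught = strict_compile_error(q{
    my $list = [1, 2, 3];
    return $list[-1];
});

# Case 2: an unrelated @list is in scope, so the same mistake compiles
my $missed = strict_compile_error(q{
    my @list;
    my $list = [1, 2, 3];
    return $list[-1];
});

print 'Case 1: ', ($caught ? "rejected: $caught" : "compiled\n");
print 'Case 2: ', ($missed ? "rejected: $missed" : "compiled without complaint\n");
```

In the first case the error names the undeclared @list; in the second there is nothing for strict to object to, which is exactly why that bug survives.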
Similarly, the following code contains both symbolic references and unqualified package variables, both of which use strict is supposed to prevent. Yet it compiles without even a warning:

    use strict;
    use warnings;
    use Data::Dumper;

    use Readonly;
    Readonly my $DUMP => 'Data::Dumper::Dumper';
    Readonly my $MAX  => 10;

    # and later...

    sub dump_a {
        my $dump = \&{$DUMP};                  # Symbolic reference

        my @a = (0..$MAX);

        for my $i (0..$#a) {
            $a->[$MAX-$i] = $a->[$i];          # Oops! Wrong variables
            print $dump->($a[$i]);
        }

        return;
    }
The uncaught symbolic reference is in \&{$DUMP}, where $DUMP contains a string, not a subroutine reference. The symbolic access is ignored because
End of preview. The rest of this section is not shown here.
Warnings
Always turn on warnings explicitly.
If you're developing under Perl 5.6 or later, always use warnings at the start of each file. Under earlier versions of Perl, always use the -w command-line flag, or set the $WARNING variable (available from use English) to a true value.
Perl's warning system is invaluable. It can detect more than 200 different questionable programming practices, including common errors like using the wrong sigil on an array access; trying to read an output stream (or vice versa); leaving parentheses off ambiguous declarations; runaway strings with no closing delimiter; dyslexic assignment operators (=-, =+, etc.); using non-numbers as numbers; using | instead of ||, or || instead of or; misspelling a package or class name; mixing up \1 and $1 in a regex; ambiguous subroutine/function calls; and improbable control flow (e.g., returning from a subroutine via a call to next).
Some of these warnings are enabled by default, but all of them are worth enabling.
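To see one of those warnings in isolation, a small sketch like the following can help. It installs a temporary __WARN__ handler to collect whatever a block of code reports; the warnings_from() helper is an invention for this example, not part of any module.

```perl
use strict;
use warnings;

# Run a block of code and return every warning it emitted
sub warnings_from {
    my ($code_ref) = @_;
    my @warnings;
    local $SIG{__WARN__} = sub { push @warnings, @_ };
    $code_ref->();
    return @warnings;
}

my @reported = warnings_from( sub {
    my $count   = 'lOO';            # The letters l-O-O, not the number 100
    my $padding = "\n" x $count;    # A non-number used as a number
    return;
});

print "Warning: $_" for @reported;
```

Without use warnings in effect the same code runs silently, and the repetition count is quietly treated as zero.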
Not taking advantage of these warnings can result in code like this, which compiles without complaint, even though it has (at least) nineteen distinct problems:

    my $n = 9;
    my $list = (1..$n);

    my $n = <TTY>;

    print ("\n" x lOO, keys %$list), "\n";
    print $list[$i];

    sub keys ($list) {
        $list ||= $_[1], \@default_list;
        push digits, @{$list} =~ m/([A-Za-\d])/g;
        return uc \1;
    }
Under use warnings the awful truth can be revealed:

    "my" variable $n masks earlier declaration in same scope at caveat.pl line 4.
    print (...) interpreted as function at caveat.pl line 6.
    Illegal character in prototype for main::keys : $list at caveat.pl line 9.
    Unquoted string "digits" may clash with future reserved word at caveat.pl line 11.
    False [] range "a-\d" in regex; marked by <-- HERE in m/([A-Za-\d <-- HERE ])/
    at caveat.pl line 11.
    Applying pattern match (m//) to @array will act on scalar(@array) at
    caveat.pl line 11.
    Array @digits missing the @ in argument 1 of push() at caveat.pl line 11.
    Useless use of reference constructor in void context at caveat.pl line 10.
    Useless use of a constant in void context at caveat.pl line 6.
    Name "main::list" used only once: possible typo at caveat.pl line 7.
    Name "main::default_list" used only once: possible typo at caveat.pl line 10.
    Name "main::TTY" used only once: possible typo at caveat.pl line 4.
    Name "main::digits" used only once: possible typo at caveat.pl line 11.
    Name "main::i" used only once: possible typo at caveat.pl line 7.
    Use of uninitialized value in range (or flip) at caveat.pl line 2.
    readline() on unopened filehandle TTY at caveat.pl line 4.
    Argument "lOO" isn't numeric in repeat (x) at caveat.pl line 6.
    Use of uninitialized value in array element at caveat.pl line 7.
    Use of uninitialized value in print at caveat.pl line 7.
End of preview. The rest of this section is not shown here.
Correctness
Never assume that a warning-free compilation implies correctness.
use strict and use warnings are powerful development aids, whose insights into the foibles of the typical programmer sometimes border on the magical. It is a serious mistake not to use them at all times.
But, as the examples in the previous guidelines illustrate, they are neither infallible nor omniscient. It may seem counterintuitive, but Perl's extensive list of warnings and strictures can sometimes result in code that is less robust than it otherwise might have been. The comforting knowledge that "use strict will pick up any problems" often engenders a false sense of security, and promotes the illusion that a silent compilation implies a correct compilation.
But no Perl pragma will ever be able to pick out the serious bug in this subroutine:

    sub is_monotonic_increasing {
        my ($data_ref) = @_;
        for my $i (1..$#{$data_ref}) {
            return 0 unless $data_ref->[$i-1] > $data_ref->[$i];
        }
        return 1;
    }
It's foolish not to make use of the very real protections that use strict and use warnings provide. Just don't let those protections make you complacent.
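What does find that bug is a test. The sketch below runs the subroutine from the text against a few small datasets, alongside a corrected version. The corrected version assumes that "monotonic increasing" is meant strictly (each element greater than its predecessor); that reading, and the _fixed name, are assumptions made for this example.

```perl
use strict;
use warnings;

# As given in the text: compiles silently, but the comparison is inverted
sub is_monotonic_increasing {
    my ($data_ref) = @_;
    for my $i (1..$#{$data_ref}) {
        return 0 unless $data_ref->[$i-1] > $data_ref->[$i];
    }
    return 1;
}

# Assumed intent: every element is strictly greater than the one before it
sub is_monotonic_increasing_fixed {
    my ($data_ref) = @_;
    for my $i (1..$#{$data_ref}) {
        return 0 unless $data_ref->[$i-1] < $data_ref->[$i];
    }
    return 1;
}

# Compare the two versions on rising, falling, and repeated data
for my $data_ref ( [1, 2, 3], [3, 2, 1], [1, 1, 2] ) {
    printf "%-9s original: %d   corrected: %d\n",
        "(@{$data_ref})",
        is_monotonic_increasing($data_ref),
        is_monotonic_increasing_fixed($data_ref);
}
```

The original reports a rising sequence as non-increasing and a falling one as increasing, which is something no stricture or warning could have flagged.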
End of preview. The rest of this section is not shown here.
Overriding Strictures
Turn off strictures or warnings explicitly, selectively, and in the smallest possible scope.
Sometimes you really do need to implement something arcane; something that would cause use strict or use warnings to complain. In this case, because you'll always be using both those pragmas (see the previous three guidelines, "Strictures", "Warnings", and "Correctness"), you'll need to turn them off temporarily.
The key to doing that without compromising the robustness of your code is to turn off warnings and strictures in the smallest possible scope. And to turn off only the particular warnings you intend to cause or those specific strictures that you're intentionally violating.
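A minimal sketch of what "smallest possible scope" can look like in practice follows. It replaces a named subroutine with a logging wrapper, and relaxes only strict 'refs' and the 'redefine' warning, only around the single glob assignment that needs it. The names track_sub_sketch(), greet(), and @call_log are hypothetical, and this is not the implementation used in the Sub::Tracking example.

```perl
use strict;
use warnings;
use Carp;

my @call_log;    # Stand-in for a real logging mechanism

sub greet {
    my ($name) = @_;
    return "Hello, $name";
}

sub track_sub_sketch {
    my ($sub_name) = @_;
    my $qualified_name = "main::$sub_name";

    # Taking a reference via \&{$string} is permitted even under strict 'refs'
    my $orig_sub_ref = \&{$qualified_name};
    croak "No such subroutine: $qualified_name" if !defined &{$orig_sub_ref};

    # Build the wrapper with all strictures and warnings still active
    my $tracker = sub {
        push @call_log, "Called $sub_name(@_)";
        return $orig_sub_ref->(@_);
    };

    # Relax only what the installation needs, and only for this one statement
    {
        no strict 'refs';
        no warnings 'redefine';
        *{$qualified_name} = $tracker;
    }

    return;
}

track_sub_sketch('greet');
print greet('World'), "\n";
print "$_\n" for @call_log;
```

Everything outside the bare block, including the body of the wrapper itself, still gets the full protection of both pragmas.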
Suppose you needed a Sub::Tracking module that, when passed the name of a subroutine, would modify that subroutine so that any subsequent call to it was logged. For example:

    use Digest::SHA qw( sha512_base64 );

    use Sub::Tracking qw( track_sub );
    track_sub('sha512_base64');

    # and later...

    my $text_key
        = sha512_base64($original_text);  # Use of subroutine automatically logged

Such a module might be implemented as in Example 18-1.
Example 18-1. A module for tracking subroutine calls

package Sub::Tracking;

use version; our $VERSION = qv(0.0.1);

use strict;
use warnings;
use Carp;
use Perl6::Export::Attrs;
use Log::Stdlog {level => 'trace'};

# Utility to create a tracked version of an existing subroutine...
sub _make_tracker_for {
    my ($sub_name, $orig_sub_ref) = @_;

    # Return a new subroutine...
    return sub {

        # ...which first determines and logs its call context
        my ($package, $file, $line) = caller;
        print {*STDLOG} trace =>
            "Called $sub_name(@_) from package $package at '$file' line $line";

End of preview. The rest of this section is not shown here.
The Debugger
Learn at least a subset of the perl debugger.
Perl's integrated debugger makes it very easy to watch your program's internal state change as it executes. At the very least, you should be familiar with the basic features summarized in Table 18-1.
Table 18-1: Debugger basics
Debugging task                                                Debugger command
To run a program under the debugger                           > perl -d program.pl
To set a breakpoint at the current line                       DB<1> b
To set a breakpoint at line 42                                DB<1> b 42
To continue executing until the next breakpoint is reached    DB<1> c
To continue executing until line 86                           DB<1> c 86
To continue executing until subroutine foo is called          DB<1> c foo
To execute the next statement                                 DB<1> n
To step into any subroutine call that's part of the next statement
End of preview. The rest of this section is not shown here.
Manual Debugging
Use serialized warnings when debugging "manually".
Many developers prefer not to use the debugger. Maybe they don't like the command-line interface, or the way the debugger slows down the execution of their code, or the fact that it actually changes the code it's debugging. Perhaps they just dislike the tedium of stepping through a program statement by statement.
The most popular alternative to using the debugger is to manually insert print statements at relevant points in the code. This has the distinct advantage of altering the code being debugged in limited and predictable ways.
But, if you're going to debug manually, don't use print for your print statements:

    my $results  = $scenario->project_outcomes();

    print "\$results: $results\n";  # debugging only

Use warn instead:

    my $results  = $scenario->project_outcomes();

    warn "\$results: $results";

Because warn statements will not be used anywhere else in your code (see "Reporting Failure" in Chapter 13), using them for debugging makes it very easy to subsequently find your debugging statements. Using warn also conveniently ensures that debugging messages are printed to *STDERR, rather than *STDOUT.
In addition, it's a good practice always to serialize the data structure you're reporting, using Data::Dumper:

    my $results  = $scenario->project_outcomes();

    use Data::Dumper qw( Dumper );
    warn '$results:', Dumper($results);
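If you find yourself typing that pattern often, it can be wrapped in a small helper. The following is only a sketch: describe_for_debug() is a hypothetical name, and the sample data stands in for whatever project_outcomes() might return. $Data::Dumper::Sortkeys and $Data::Dumper::Indent are real Data::Dumper settings, localized here so that the rest of the program is unaffected.

```perl
use strict;
use warnings;
use Data::Dumper qw( Dumper );

# Build a labelled, serialized description of any value
sub describe_for_debug {
    my ($label, $value) = @_;

    # Stable key order and compact indentation make successive dumps comparable
    local $Data::Dumper::Sortkeys = 1;
    local $Data::Dumper::Indent   = 1;

    return "$label: " . Dumper($value);
}

# Sample data, standing in for a real result
my $results = { passed => 12, failed => [ 'login', 'logout' ] };

warn describe_for_debug('$results', $results);
```

Because the helper returns a string rather than printing it, the actual reporting is still done by a warn at the point of interest, so the debugging statements remain easy to find later.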

         
By printing the value you're reporting in a structured format, you maximize the information that's subsequently available to help you debug. For example, if the project_outcomes() method was expected to return an Achievements object, then debugging with:

    warn "\$results: $results\n";
End of preview. The rest of this section is not shown here.
Semi-Automatic Debugging
Consider using "smart comments" when debugging, rather than warn statements.
Serialized warnings work well for manual debugging, but they can be tedious to code correctly. And, even with the editor macro suggested earlier, the output of a statement like:

            

    warn 'results: ', Dumper($results);

         
still leaves something to be desired in terms of readability:

    results: $VAR1 = bless( do{\(my $o = undef)}, 'Achievements' )
The Smart::Comments module (previously described under "Automatic Progress Indicators" in Chapter 10) supports a form of smart comment that can help your debugging. For example, instead of:

    use Data::Dumper qw( Dumper );

    my $results  = $scenario->project_outcomes();

    warn '$results: ', Dumper($results);
you could just write:

    use Smart::Comments;

    my $results = $scenario->project_outcomes();

    ### $results

which would then output either:

    ### $results: <opaque Achievements object (blessed scalar)>

or:

    ### $results: 'Achievements=SCALAR(0x811130)'

depending on whether $results is an actual object reference or merely its stringification.
Smart::Comments also supports comment-based assertions:

    ### check: @candidates >= @elected

which issue warnings when the specified condition is not met. For example, the previous comment might print:

    ### @candidates >= @elected was not true at ch18/Ch18.049_Best line 23.
    ###     @candidates was: [
    ###                        'Smith',
    ###                        'Nguyen',
    ###                        'Ibrahim'
    ###                      ]
    ###     @elected was: [
    ###                     'Smith',
    ###                     'Nguyen',
    ###                     'Ibrahim',
    ###                     'Nixon'
    ###                   ]
End of preview. The rest of this section is not shown here.
Chapter 19: Miscellanea
Advice is what we ask for when we already know
the answer but wish we didn't.
—Erica Jong
How to Save Your Own Life
This chapter contains a handful of guidelines that do not fit cleanly into any of the previous categories. They cover reasons for revision control, the intricacies of interfacing with other languages, the care and feeding of configuration files, the trouble with tied variables, the complexities of caching, flaws in formats, optimal optimization, and the cunning cruelty of cleverness.
Revision Control
Use a revision control system.
Maintaining control over the creation and modification of your source code is utterly essential for robust team-based development. Just as you wouldn't use an editor without an Undo button or a word processor that can't merge documents, so too you shouldn't use a filesystem you can't rewind, or a development environment that can't integrate the work of many contributors.
Programmers make mistakes, and occasionally those mistakes will be catastrophic. They will reformat the disk with the most recent version of the code. Or they'll mistype an editor macro and write zeros all through the source of a critical core module. Or two developers will unwittingly edit the same file at the same time and half their changes will be lost. Revision control systems can prevent those kinds of problems.
Moreover, occasionally the very best debugging technique is to just give up, stop trying to get yesterday's modifications to work correctly, roll the code back to a known stable state, and start over again. Less drastically, comparing the current condition of your code with the most recent stable version from your repository (even just a line-by-line diff) can often help you isolate your recent "improvements" and work out which of them is the problem.
Revision control systems such as RCS, CVS, Subversion, Monotone, darcs, Perforce, GNU arch, or BitKeeper can protect against calamities, and ensure that you always have a working fallback position if maintenance goes horribly wrong. The various systems have different strengths and limitations, many of which stem from fundamentally different views on what exactly revision control is. So it's a good idea to audition the various revision control systems and find the one that works best for you. Pragmatic Version Control Using Subversion, by Mike Mason (Pragmatic Bookshelf, 2005) and Essential CVS, by Jennifer Vesperman (O'Reilly, 2003) are useful starting points.
After all, rm * is never more than half a dozen keystrokes away.
End of preview. The rest of this section is not shown here.
Other Languages
Integrate non-Perl code into your applications via the Inline:: modules.
Occasionally you may need to use code resources that are not written in Perl. Most often this will be C code, but it might also be C++, Java, Python, Ruby, Tcl, Scheme, AWK, or even Basic.
The CPAN provides interface tools for hooking all of these languages up to a Perl program, but most of those tools are very challenging to use correctly. By far the most frequently used is xsubpp, a compiler for Perl's own "XS" interface description language (see the perlxstut manpage).
Hooking Perl to C using XS requires you to write a shell .pm module to bootstrap an object file that has been compiled from C code, which was in turn generated by xsubpp from a .xs source file containing pseudo-C annotated with an XS interface description. If that sounds horribly complicated, then you have achieved an accurate understanding of the use of xsubpp. Example 19-1 shows just how much work is involved in even a very simple example.
Example 19-1. Creating a fast C-based rounding subroutine using XS

> cat Round.pm

package Round;
use strict;
use warnings;

use base qw( Exporter DynaLoader );
our $VERSION = '0.01';

our @EXPORT = qw( rounded );

bootstrap Round $VERSION;

1;
__END__

> cat rounded.pl

use Round;
use IO::Prompt;

while (my $num = prompt -num => 'Enter a number: ') {
    print rounded($num), "\n";
}

> cat Round.xs

#include "EXTERN.h"
#include "perl.h"
#include "XSUB.h"

MODULE = Round     PACKAGE = Round

int
rounded(arg)
    double  arg
CODE:
    int res;
    /* Round to the nearest integer, halves away from zero... */
    if (arg > 0.0)      { res = floor(arg + 0.5); }
    else if (arg < 0.0) { res = ceil(arg - 0.5); }
    else                { res = 0; }
OUTPUT:
    res

> cat Makefile.PL

use ExtUtils::MakeMaker;
WriteMakefile(
    NAME         => 'Round',
    VERSION_FROM => 'Round.pm',
    LIBS         => ['-lm'],
);

>
End of preview. The rest of this section is not shown here.
Configuration Files
Keep your configuration language uncomplicated.
If you're going to provide a configuration mechanism for your application, make it declarative and minimal. Keep in mind that configuration files are one of the few components of your system that are directly read by end-users, so they need to be simple. They're also one of the few components of your system that are directly written by end-users. So they need to be even simpler.
It's almost always enough to just support some variation on the widely used INI file format: named sections, individual key/value pairs, multiline values, repeated values (or lists), and comments. Example 19-3 shows a typical configuration file with all of those features.
Example 19-3. A simple configuration language

> cat ~/.demorc

[Interface]
# Configurable bits that others will see...

Author: Jan-Yu Eyrie
E-mail: eju@calnet

Disclaimer: This code is provided AS IS, and comes with
          : ABSOLUTELY NO WARRANTY OF ANY KIND WHATSOEVER!

          : It's buggy, slow, and will almost certainly
          : break your computer. Use at your own risk!

[Internals]
# Stuff no-one else sees...

# Look-up path for plug-ins...
lib: ~/lib/perl5
lib: ~/lib/perl
lib: /usr/share/lib/perl

[strict]    # Don't allow malformed inputs
[verbose]   # Report every step
[log]       # And log every transaction

Fancier features like nested or hierarchical data structures, separate syntaxes for lists and scalar values, special notations for boolean configuration variables, or character escapes, are almost always a bad idea. The extra syntax will confuse most users and—worse—make it far more likely that they'll inadvertently type something that's valid, but not what they intended.
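One practical consequence of keeping the language that small is that a reader for it stays small too. The following is a rough sketch of a parser for the subset shown in Example 19-3 (sections, comments, key/value pairs, repeated keys, and ":" continuation lines). The subroutine name parse_config_sketch() and the choice to store every value as a list are assumptions made for this illustration; it is not the mechanism the book goes on to recommend.

```perl
use strict;
use warnings;
use Carp;

sub parse_config_sketch {
    my ($text) = @_;

    my %config;
    my $section = q{};    # Entries before any [Section] go under the empty name
    my $last_key;

    LINE:
    for my $line (split /\n/, $text) {
        # Skip blank lines and comment-only lines
        next LINE if $line =~ m/\A \s* (?: \# .* )? \z/xms;

        # [Section] header, optionally followed by a comment
        if ($line =~ m/\A \[ ( [^\]]+ ) \] /xms) {
            $section = $1;
            $config{$section} ||= {};
            $last_key = undef;
            next LINE;
        }

        # Continuation line: leading whitespace, then a colon
        if (defined $last_key && $line =~ m/\A \s+ : \s? (.*) \z/xms) {
            $config{$section}{$last_key}[-1] .= "\n$1";
            next LINE;
        }

        # Key: value (a repeated key accumulates a list of values)
        if ($line =~ m/\A ( [^:\s] [^:]*? ) \s* : \s* (.*?) \s* \z/xms) {
            my ($key, $value) = ($1, $2);
            push @{ $config{$section}{$key} }, $value;
            $last_key = $key;
            next LINE;
        }

        croak "Unrecognized configuration line: $line";
    }

    return \%config;
}

my $config_ref = parse_config_sketch(<<'END_CONFIG');
[Interface]
# Configurable bits that others will see...
Author: Jan-Yu Eyrie
Disclaimer: This code is provided AS IS, and comes with
          : ABSOLUTELY NO WARRANTY OF ANY KIND WHATSOEVER!

[Internals]
lib: ~/lib/perl5
lib: ~/lib/perl

[strict]    # Don't allow malformed inputs
END_CONFIG

print "Author: $config_ref->{Interface}{Author}[0]\n";
print "Library path: $_\n" for @{ $config_ref->{Internals}{lib} };
```

Every extra feature in the configuration language (nesting, escapes, typed values) would add another branch to that loop, and another way for a user to write something valid but unintended.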
Don't use XML as your configuration file format. It may be human-readable, but it's almost never human-comprehensible, and the ratio of mark-up to content is vastly too high. No-one wants to write or maintain a configuration file that looks like Example 19-4.
End of preview. The rest of this section is not shown here.
Formats
Don't use formats.
The format statement is one of the oldest and most fundamental features of Perl. It implements the original "R" of the "Practical Extraction and Reporting Language".
And even here in the 21st century—where data is more typically restructured, marked-up, CSS'd, JavaScripted, hyperlinked, and finally browsed—a simple text-based report is still often a cleaner and more usable alternative, especially in command-line environments:

            

    > contacts -find 'Damian'

     ==================================
    | NAME           | AGE | ID NUMBER |
    |----------------+-----+-----------|
    | Damian M.      | 40  |    869942 |
    | Conway         |     |           |
    |==================================|
    | COMMENTS                         |
    |----------------------------------|
    | Do not feed after midnight. Do   |
    | not mix with quantum physics. Do |
    | not allow subject to talk for    |
    | "as long as he likes".           |
     ==================================

         
But building such a report with format, as in Example 19-6, has some serious drawbacks, especially in terms of best-practice programming. For a start, formats are statically defined (i.e., specified at compile time), so it's difficult to build a format as your program executes; you have to resort to a string eval (see Chapter 9). Formats rely on global variables for configuration, and on package variables for the data they are to format (see Chapter 5). They also have to write their formatted text to a named filehandle (see Chapter 10). That's three best-practice strikes against formats already.
Example 19-6. Building a report with format

# Predeclare report format with the necessary package variables...
our ($name, $ID, $age, $comments);

format CONTACT =
 ==================================
| NAME           | AGE | ID NUMBER |
|----------------+-----+-----------|
| ^<<<<<<<<<<<<< | ^|| | ^>>>>>>>> |~~
  $name,           $age, $ID,
|==================================|
| COMMENTS                         |
|----------------------------------|
| ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< |~~
  $comments,
 ==================================
.

End of preview. The rest of this section is not shown here.
Ties
Don't tie variables or filehandles.
Ties provide a way of replacing the behaviour of any type of variable, or of a filehandle. The full mechanism is described in the standard perltie documentation.
Tied variables look exactly like ordinary scalars, arrays, or hashes, but they don't act exactly like them. Their whole purpose is to hide special non-standard behaviour inside a familiar interface. As such, they can be wonderfully Perlish and Lazy, making it easy (for example) to create a variable that automatically self-increments every time its value is accessed:

            

    use IO::Prompt;                # Supplies prompt()
    use List::Util qw( sum );      # Supplies sum()

    # Create a variable whose value cycles from zero to five...
    use Tie::Cycle;
    tie my $next_index, 'Tie::Cycle', [0..5];

    # Read in monthly results...
    my @cyclic_buffer;
    while (my $next_val = prompt 'Next: ') {
        # Saving them in a six-month cyclic buffer...
        $cyclic_buffer[$next_index] = $next_val;

        # And printing the moving average each month...
        print 'Half-yearly moving average: ',
              sum(@cyclic_buffer)/@cyclic_buffer, "\n";
    }
Every time $next_index is used as an index into @cyclic_buffer, it moves on to the next value in [0..5]. When there are no more values, it loops back to zero and starts again. So $cyclic_buffer[$next_index] is always the next element in the cyclic buffer, even though $next_index is never explicitly incremented or reset.
And that's the problem. If $next_index had been tied further away from the loop, it might easily seem to some maintainer that every new value is being assigned into the same element of the buffer. Tied variables make any code that uses them less maintainable, because they make normal variable operations behave in unexpected, non-standard ways.
They're also less efficient. A tied variable is actually a wrapper around some blessed object, and so every access on any tied variable requires a method call (instead of being implemented in highly optimized C code).
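For comparison, here is a minimal sketch of the same six-slot moving average written with an ordinary index that is advanced explicitly. The make_moving_average() name and the closure-based design are choices made for this illustration only, and the values are passed in directly rather than prompted for.

```perl
use strict;
use warnings;
use List::Util qw( sum );

my $BUFFER_SIZE = 6;

# Return a subroutine that records a value and reports the current average
sub make_moving_average {
    my @cyclic_buffer;
    my $next_index = 0;

    return sub {
        my ($next_val) = @_;

        # Save the value in the next slot of the buffer...
        $cyclic_buffer[$next_index] = $next_val;

        # ...and advance the index visibly, wrapping back to zero at the end
        $next_index = ($next_index + 1) % $BUFFER_SIZE;

        return sum(@cyclic_buffer) / @cyclic_buffer;
    };
}

my $moving_average = make_moving_average();

for my $monthly_result (1..7) {
    print 'Half-yearly moving average: ', $moving_average->($monthly_result), "\n";
}
```

It costs one extra line, but a maintainer can now see exactly where the index changes, and every access is a plain scalar operation rather than a hidden method call.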
End of preview. The rest of this section is not shown here.
Cleverness
Don't be clever.
Tied variables are a clever idea, but "cleverness" is the natural enemy of maintainable code. Unfortunately, Perl provides endless opportunities for cleverness.
For example, imagine coming across this result selector in production code:

    $optimal_result = [$result1=>$result2]->[$result2<=$result1];
The syntactic symmetry is very elegant, of course, and devising it obviously provided the original developer with a welcome diversion from the tedium of everyday coding. But a clever line of code like that is a (recurring) nightmare to understand and to maintain, and imposes an unnecessary burden on everyone in the development and maintenance teams.
Cleverness doesn't have to be nearly that flagrant either. Having finally deduced that the example expression returns the smaller of the two results, you would almost certainly be tempted to immediately replace it with something like the following:

    $optimal_result = $result1 <= $result2 ? $result1 : $result2;
While that's certainly an improvement in both readability and efficiency, it still requires some careful thought to verify that it's doing the right (i.e., minimizing) thing. And everyone who maintains this code will still have to decode that expression—possibly every time they come across it.
However, it's also possible to write that same expression in a way that's so obvious, straightforward, and plain-spoken that it requires no effort at all to verify that it implements the desired behaviour:

            

    use List::Util qw( min );

    $optimal_result = min($result1, $result2);

         
It's not "clever" and it's even marginally slower, but it is clean, clear, efficient, scalable, and easy to maintain. And that's always a much better choice.
End of preview. The rest of this section is not shown here.
Encapsulated Cleverness
If you must rely on cleverness, encapsulate it.
Very occasionally a genuine need for efficiency may (appear to) make it essential to use non-obvious Perl idioms such as:

            

    # Make sure the requests are unique...
    @requests  = keys %{ {map {$_=>1} @raw_requests} };

This statement takes each request in @raw_requests, converts it to a pair ($_=>1) in which the request is now the key, and uses that list of pairs to initialize an anonymous hash ({map {$_=>1} @raw_requests}), which folds every repeated request into the same hash key. The hash is then dereferenced (%{ {map {$_=>1} @raw_requests} }), and its unique keys are retrieved (keys %{ {map {$_=>1} @raw_requests} }) and finally assigned into @requests.
But an expression that complex should never be left raw in code. If it's kept at all, it should be kept as a dirty little secret, shamefully hidden away in a subroutine in some dark corner of your code:

            

    sub unique {
        return keys %{ { map {$_=>1} @_ } };  # Mea culpa!
    }

    # and later...

    @requests = unique(@raw_requests);

         
Apart from the obvious advantage that the request-handling code becomes vastly more readable, encapsulating the cleverness has another important benefit: when the cleverness proves not to be as clever as you first thought, it's very easy to replace it with something that's both slightly more readable and very much more efficient:

            

    sub unique {
        my %uniq;            # Use keys of this hash to track unique values
        @uniq{@_} = ();      # Use the args as those keys (the values don't matter)
        return keys %uniq;   # Return those unique values
    }

         
In this version, the list of values that's passed in (
End of preview. The rest of this section is not shown here.
Benchmarking
Don't optimize code—benchmark it.
It's natural to think that a single expression like:

    keys %{ { map {$_=>1} @_ } }
will be more efficient than two statements:

            

    my %seen;
    return grep {!$seen{$_}++} @_;

         
But, unless you are deeply familiar with the internals of the Perl interpreter, intuitions about the relative performance of two constructs are exactly that: unconscious guesses.
The only way to know for sure which of two—or more—alternatives will perform better is to actually time each of them. The standard Benchmark module makes that easy, as Example 19-8 illustrates.
Example 19-8. Benchmarking the uniqueness functions

               

                  

# A short list of not-quite-unique values...
our @data = qw( do re me fa so la ti do );

# Various candidates...
sub unique_via_anon {
    return keys %{ { map {$_=>1} @_ } };
}

sub unique_via_grep {
    my %seen;
    return grep { !$seen{$_}++ } @_;
}

sub unique_via_slice {
    my %uniq;
    @uniq{@_} = ();
    return keys %uniq;
}

# Compare the current set of data in @data
sub compare {
    my ($title) = @_;

    print "\n[$title]\n";

    # Create a comparison table of the various timings, making sure that
    # each test runs at least 10 CPU seconds...
    use Benchmark qw( cmpthese );
    cmpthese -10, {
        anon   => 'my @uniq = unique_via_anon(@data)',
        grep   => 'my @uniq = unique_via_grep(@data)',
        slice  => 'my @uniq = unique_via_slice(@data)',
    };

    return;
}

compare('8 items, 10% repetition');

# Two copies of the original data...
@data = (@data) x 2;
compare('16 items, 56% repetition');
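Timings are only meaningful if the candidates being compared compute the same thing, so it can be worth confirming that before benchmarking. The following is a sketch of such a check, not part of Example 19-8; the canonical() helper and the %result_of hash are names invented here.

```perl
use strict;
use warnings;

sub unique_via_anon  { return keys %{ { map {$_=>1} @_ } }; }
sub unique_via_grep  { my %seen; return grep { !$seen{$_}++ } @_; }
sub unique_via_slice { my %uniq; @uniq{@_} = (); return keys %uniq; }

# Hypothetical helper: reduce a list to an order-independent string
sub canonical {
    my @values = @_;
    return join q{ }, sort @values;
}

my @data = qw( do re me fa so la ti do );

# Run each candidate once and record its result in canonical form
my %result_of = (
    anon  => canonical( unique_via_anon(@data)  ),
    grep  => canonical( unique_via_grep(@data)  ),
    slice => canonical( unique_via_slice(@data) ),
);

for my $name (sort keys %result_of) {
    print "$name: $result_of{$name}\n";
}
```

The results are sorted before comparison because the two hash-based candidates return their values in no particular order; only the grep version keeps the values in first-seen order.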



End of preview. The rest of this section is not shown here.
Memory
Don't optimize data structures—measure them.
Intuitions about the relative space efficiency of different data structures aren't very reliable, either. If you are concerned about the memory footprint of a data structure that you are using, the Devel::Size module makes it easy to see how heavy the burden actually is:

            

               

                  

    # This look-up table is handy, but seems to be too bloated...
    my %lookup = load_lookup_table($file);

    # So let's look at how much memory it's using...
    use Devel::Size qw( size total_size );
    use Perl6::Form;

    my $hash_mem  = size(\%lookup);           # Storage overheads only
    my $total_mem = total_size(\%lookup);     # Overheads plus actual data
    my $data_mem  = $total_mem - $hash_mem;   # Data only

    print form(
        'hash alone: {>>>,>>>,>>} bytes',  $hash_mem,
        'data alone: {>>>,>>>,>>} bytes',  $data_mem,
        '==============================',
        'total:      {>>>,>>>,>>} bytes',  $total_mem,
    );

         
That might print something like:

            

    hash alone:    8,704,075 bytes
    data alone:    8,360,250 bytes
    ==============================
    total:        17,064,325 bytes

         
which indicates that storing your 8.36MB of data in a hash has incurred an overhead of an additional 8.70MB for buckets, hash tables, keys, and other internals.
The total_size() subroutine takes a reference to a variable and returns the total number of bytes of memory used by that variable. This includes both:
  • The memory that the variable uses for its own implementation. For example, the buckets that are needed to implement a hash, or the flag bits that are used inside every scalar.
End of preview. The rest of this section is not shown here.
Caching
Look for opportunities to use caches.
It makes sense not to do the same calculation twice, if the result is small enough that it can reasonably be stored for reuse. The simplest form of that is putting a result into an interim variable whenever it will be used more than once. That is, instead of calling the same functions twice on the same data:

    print form(
        'hash alone: {>>>,>>>,>>} bytes', size(\%lookup),
        'data alone: {>>>,>>>,>>} bytes', total_size(\%lookup)-size(\%lookup),
        '==============================',
        'total:      {>>>,>>>,>>} bytes', total_size(\%lookup),
    );
call them once, store the results temporarily, and retrieve them each time they're needed:

            

    my $hash_mem  = size(\%lookup);
    my $total_mem = total_size(\%lookup);
    my $data_mem  = $total_mem - $hash_mem;

    print form(
        'hash alone: {>>>,>>>,>>} bytes',  $hash_mem,
        'data alone: {>>>,>>>,>>} bytes',  $data_mem,
        '==============================',
        'total:      {>>>,>>>,>>} bytes',  $total_mem,
    );

         
This often has the additional benefit of allowing you to name the interim values in ways that make the code more comprehensible.
Subroutines like size() and total_size() and functions like rand() or readline() don't always return the same result when called with the same arguments. Such subroutines are good candidates for temporary and localized reuse of results, but not for longer-term caching.
On the other hand, pure functions like sqrt() and int() and crypt() do always return the same result for the same list of arguments, so their return values can be stored long-term and reused whenever they're needed again. For example, if you have a subroutine that returns a case-insensitive SHA-512 digest:
End of preview. The rest of this section is not shown here.
Memoization
Automate your subroutine caching.
The logic required to implement a caching strategy is always the same: check whether the result is already cached; otherwise, compute and cache it; either way, return the cached result. So, as usual, there's a CPAN module that automates the task: Memoize.
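Spelled out by hand, that three-step logic might look something like the sketch below. It is not the book's own listing: the names lc_digest_cached, %digest_cache, and $computations are invented here, and sha512_hex() is used instead of sha512() purely so that the digests are printable.

```perl
use strict;
use warnings;
use Digest::SHA qw( sha512_hex );

my %digest_cache;       # Hypothetical cache: lowercased text => digest
my $computations = 0;   # Counts real computations, for demonstration only

sub lc_digest_cached {
    my ($text) = @_;
    my $key = lc $text;

    # Not cached yet? Then compute the result and cache it...
    if (!exists $digest_cache{$key}) {
        $computations++;
        $digest_cache{$key} = sha512_hex($key);
    }

    # Either way, return the cached result...
    return $digest_cache{$key};
}

my $first  = lc_digest_cached('Hello, World');
my $second = lc_digest_cached('HELLO, WORLD');   # Same key once lowercased

print 'digests match: ', ($first eq $second ? 'yes' : 'no'), "\n";
print "computed $computations time(s)\n";
```

Every subroutine cached this way needs the same check-compute-store-return scaffolding around its real logic, which is the repetition that an automated approach removes.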
To add caching to a subroutine (a process called memoization), you simply define the subroutine without any caching, and load Memoize. The module will automatically export a memoize() subroutine, which you then call, passing it a string containing the name of the subroutine you want cached. Like so:

            

    sub lc_digest {
        my ($text) = @_;

        use Digest::SHA qw( sha512 );
        return sha512(lc $text);
    }

    use Memoize;
    memoize( 'lc_digest' );

         
Notice how much cleaner this is than the "manually cached" version in Example 19-9.
It's also more reliable, as you can focus on getting the computation correct, and leave the details of the caching strategy to Memoize. For example, the caches that the module installs correctly differentiate between subroutine calls in list and scalar context. This is important, because the same subroutine called with the same arguments might still return different values, depending on whether it was expected to return a list or a single value. Forgetting this distinction is a very common error when implementing caching manually.
The memoize() subroutine has many other options for fine-tuning the kinds of caching it confers. The module documentation provides detailed descriptions of the many possibilities, including caching results in a database so that the cache persists between executions of your program.
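One way to convince yourself that memoization is actually taking effect is to count how often the real subroutine body runs. The following sketch assumes a $call_count counter added purely for demonstration, and uses sha512_hex() rather than sha512() so that the results are printable.

```perl
use strict;
use warnings;
use Digest::SHA qw( sha512_hex );
use Memoize;

my $call_count = 0;     # Counts executions of the real subroutine body

sub lc_digest {
    my ($text) = @_;
    $call_count++;
    return sha512_hex(lc $text);
}

memoize('lc_digest');

my $first  = lc_digest('Hello');
my $second = lc_digest('Hello');    # Same argument: answered from the cache
my $third  = lc_digest('hello');    # Different argument: computed afresh

print "body executed $call_count time(s)\n";
```

By default the cache is keyed on the arguments as passed, so 'Hello' and 'hello' occupy separate cache entries even though lc_digest() returns the same digest for both.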
End of preview. The rest of this section is not shown here.
Caching for Optimization
Benchmark any caching strategy you use.
Caching is a strategy that would seem to have no downside. After all, computing a value only once is obviously always going to be quicker than recomputing it many times. That's true, of course, but it isn't the whole story. Occasionally, caching can backfire and actually make a computation slower.
It's certainly the case that computing once is always quicker than recomputing every time. However, caching isn't quite a case of computing-once; it's actually a case of computing-once-and-forever-after-rechecking-whether-you've-already-computed-and-if-so-then-accessing-the-previously-computed-value. That more complicated process may not always be quicker than recomputing every time. Searching and then accessing a look-up table has an intrinsic cost, which can occasionally be greater than redoing the entire calculation. Especially if the look-up table is a hash.
So, whenever you decide to add caching to a computation, it's essential to benchmark the resulting code, to make sure that the cache look-up costs aren't more expensive than the computation itself. For example, for the pixel square roots from the previous guideline, a simple speed comparison:

            

    use Benchmark qw( cmpthese );

    # A package variable, so the string-evaluated tests below can see it
    our @sqrt_of = map {sqrt $_} 0..255;

    cmpthese -30, {
        recompute      => q{ for my $n (0..255) { my $res = sqrt $n      } },
        look_up_array  => q{ for my $n (0..255) { my $res = $sqrt_of[$n] } },
    };

         
reveals that, in this instance, using a look-up table is only about 9% faster than just calling sqrt directly every time:

            

                     Rate         recompute   look_up_array
    recompute       3951/s            --             -8%
    look_up_array   4291/s            9%             --

         
You then need to decide whether that marginal performance improvement is enough to warrant the additional complexity in the code.
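Benchmark also accepts code references in place of strings, in which case the timed code is an ordinary closure and can see lexical variables such as a my-declared look-up table. A sketch of the same comparison in that form follows. The fixed count of 2,000 iterations is an arbitrary choice to keep the run short, $rows_ref is a name invented here, and the figures printed will vary by machine, so they should not be compared with the rates quoted above.

```perl
use strict;
use warnings;
use Benchmark qw( cmpthese );

# A lexical table is fine here, because the tests below are closures
my @sqrt_of = map { sqrt $_ } 0..255;

# Run each test a fixed number of times and print the comparison table;
# cmpthese() also returns the table as a reference to an array of rows
my $rows_ref = cmpthese( 2_000, {
    recompute     => sub { for my $n (0..255) { my $res = sqrt $n      } },
    look_up_array => sub { for my $n (0..255) { my $res = $sqrt_of[$n] } },
});
```

With so few iterations Benchmark may warn that the count is too low to be reliable; a negative count, as in the text, asks for a minimum number of CPU seconds instead.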
End of preview. The rest of this section is not shown here.
Profiling
Don't optimize applications—profile them.
In the previous guideline, the benchmarked comparison between repeatedly computing sqrt $pixel_value and repeatedly looking up $sqrt_of[$pixel_value] indicated that caching provided a 9% improvement:

            

                     Rate         recompute   look_up_array
    recompute       3951/s            --             -8%
    look_up_array   4291/s            9%             --

         
That sounds impressive, but it's important to keep those numbers in perspective. Each iteration of the test did 256 square root retrievals. So, overall, the test was achieving 1,011,456 (i.e., 3951 × 256) sqrt calls per second, compared to 1,098,496 @sqrt_of look-ups per second.
Suppose you were processing the 786,432 pixels of a typical 1024 × 768 image. Using the example performance figures, the repeated sqrt calls would require around 0.78 seconds to process that many pixels, whereas the look-up table would take only about 0.72 seconds. Adding a cache to this section of your code would save you a grand total of 0.06 seconds per image.
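The arithmetic behind those figures can be reproduced in a few lines. This is just a worked calculation from the rates in the table above; the variable names are invented here.

```perl
use strict;
use warnings;

my $RETRIEVALS_PER_ITERATION = 256;      # Each test loops over 0..255

my %iterations_per_sec = (               # Rates from the benchmark table
    recompute     => 3951,
    look_up_array => 4291,
);

my $pixel_count = 1024 * 768;            # 786,432 pixels per image

# Convert each iteration rate into seconds needed for one image
my %seconds_per_image;
for my $approach (keys %iterations_per_sec) {
    my $retrievals_per_sec
        = $iterations_per_sec{$approach} * $RETRIEVALS_PER_ITERATION;
    $seconds_per_image{$approach} = $pixel_count / $retrievals_per_sec;
}

my $saving
    = $seconds_per_image{recompute} - $seconds_per_image{look_up_array};

printf "recompute:     %.2f seconds\n", $seconds_per_image{recompute};
printf "look_up_array: %.2f seconds\n", $seconds_per_image{look_up_array};
printf "saving:        %.2f seconds\n", $saving;
```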
That's an all-too-common outcome when code is optimized: developers focus their efforts on those components that are easy to optimize, rather than on those components in which improvements will produce the greatest benefit.
How do you find those places where optimization will do the most good? By understanding where your application spends most of its time. And the easiest way to do that is to profile your program using the standard Devel::DProf module, which can determine how long your application spends within each subroutine in your source code. That is, instead of running your program in the usual way:

            

    > perl application.pl datafile
run it under the auspices of the profiler module:

            

    > perl -d:DProf application.pl datafile
The
End of preview. The rest of this section is not shown here.
Enbugging
Be careful to preserve semantics when refactoring syntax.
The guidelines in this book are designed to improve the robustness, efficiency, and maintainability of your code. However, you need to take extra care when applying them retrospectively to the source of existing applications.
If you're rewriting existing code to bring it into line with the practices suggested here, be sure that you preserve the existing behaviour of the code through the changes you make. For example, if you have a loop such as:

    for (@candidates) { next unless m/^Name: (.+?); $/; $target_name = $1 and last }
you might refactor it to:

            

    # Find the first candidate with a valid Name: field...
    CANDIDATE:
    for my $candidate (@candidates) {
        # Extract the contents of the Name: field...
        my ($name)
            = $candidate =~ m/^Name: (.+?); $/xms;

        # ...or try elsewhere...
        next CANDIDATE if !defined $name;

        # If name found, save it and we're done...
        $target_name = $name;
        last CANDIDATE;
    }
However, adding the /xms (as recommended in Chapter 12) will alter the semantics of the pattern inside the regular expression. Specifically, it will change the meaning of ^, $, .+?, and the space character. Even though the pattern's syntax didn't change, its behaviour did. That's a particularly subtle way to break a piece of code.
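The change in behaviour is easy to demonstrate by running the same pattern text with and without the flags. The two input strings in this sketch are invented purely to expose the difference, as are the variable names.

```perl
use strict;
use warnings;

# Hypothetical inputs, invented to expose the difference
my $single_line = 'Name: Damian; ';
my $multi_line  = "Junk\nName:Damian;\nMore junk";

# The original pattern, without flags...
my ($plain_single) = $single_line =~ m/^Name: (.+?); $/;
my ($plain_multi)  = $multi_line  =~ m/^Name: (.+?); $/;

# The same pattern text, with /xms added...
my ($xms_single)   = $single_line =~ m/^Name: (.+?); $/xms;
my ($xms_multi)    = $multi_line  =~ m/^Name: (.+?); $/xms;

# Report what (if anything) each match captured
for my $result_ref (
    [ 'no flags, single line' => $plain_single ],
    [ 'no flags, multi-line'  => $plain_multi  ],
    [ '/xms, single line'     => $xms_single   ],
    [ '/xms, multi-line'      => $xms_multi    ],
) {
    my ($label, $captured) = @{$result_ref};
    printf "%-22s %s\n", $label,
        defined $captured ? "captured '$captured'" : 'no match';
}
```

Under /x the literal spaces in the pattern are ignored, so the well-formed single-line input no longer matches; under /m the ^ and $ anchors match at internal line boundaries, so the multi-line input, which the original pattern rejected, now does.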
In this case, you have to make sure that you apply all of the guidelines in Chapter 12, changing the pattern as well as the flags, so as to preserve the original behaviour:

            

               

                  

    # Find the first candidate with a valid Name: field...
    CANDIDATE:
    for my $candidate (@candidates) {
        # Extract the Name: field...
        my ($name)
            = $candidate =~ m{\A Name: \s+ ([^\n]+) ; \s+ \n? \z}xms;



        
End of preview. The rest of this section is not shown here.
Appendix A: Essential Perl Best Practices
End of preview. The rest of this section is not shown here.
Appendix B: Perl Best Practices
This appendix lists the complete set of 256 guidelines presented in this book. The section heading under which each guideline appears is also provided in square brackets.

Section B.1: Chapter 2, Code Layout

Section B.2: Chapter 3, Naming Conventions

Section B.3: Chapter 4, Values and Expressions

Section B.4: Chapter 5, Variables

Section B.5: Chapter 6, Control Structures

Section B.6: Chapter 7, Documentation

Section B.7: Chapter 8, Built-in Functions

Section B.8: Chapter 9, Subroutines

Section B.9: Chapter 10, I/O

Section B.10: Chapter 11, References

Section B.11: Chapter 12, Regular Expressions

Section B.12: Chapter 13, Error Handling

Section B.13: Chapter 14, Command-Line Processing

Section B.14: Chapter 15, Objects

Section B.15: Chapter 16, Class Hierarchies

Section B.16: Chapter 17, Modules

Section B.17: Chapter 18, Testing and Debugging

Section B.18: Chapter 19, Miscellanea

End of preview. The rest of this section is not shown here.
Section B.1: Chapter 2, Code Layout
  • Brace and parenthesize in K&R style. [Bracketing]
  • Separate your control keywords from the following opening bracket. [Keywords]
  • Don't separate subroutine or variable names from the following opening bracket. [Subroutines and Variables]
  • Don't use unnecessary parentheses for builtins and "honorary" builtins. [Builtins]
  • Separate complex keys or indices from their surrounding brackets. [Keys and Indices]
  • Use whitespace to help binary operators stand out from their operands. [Operators]
  • Place a semicolon after every statement. [Semicolons]
  • Place a comma after every value in a multiline list. [Commas]
  • Use 78-column lines. [Line Lengths]
  • Use four-column indentation levels. [Indentation]
  • Indent with spaces, not tabs. [Tabs]
  • Never place two statements on the same line. [Blocks]
  • Code in paragraphs. [Chunking]
  • Don't cuddle an else. [Elses]
  • Align corresponding items vertically. [Vertical Alignment]
  • Break long expressions before an operator. [Breaking Long Lines]
  • Factor out long expressions in the middle of statements. [Non-Terminal Expressions]
  • Always break a long expression at the operator of the lowest possible precedence. [Breaking by Precedence]
  • Break long assignments before the assignment operator. [Assignments]
  • Format cascaded ternary operators in columns. [
End of preview. The rest of this section is not shown here.
Section B.2: Chapter 3, Naming Conventions
  • Use grammatical templates when forming identifiers. [Identifiers]
  • Name booleans after their associated test. [Booleans]
  • Mark variables that store references with a _ref suffix. [Reference Variables]
  • Name arrays in the plural and hashes in the singular. [Arrays and Hashes]
  • Use underscores to separate words in multiword identifiers. [Underscores]
  • Distinguish different program components by case. [Capitalization]
  • Abbr idents by prefx. [Abbreviations]
  • Abbreviate only when the meaning remains unambiguous. [Ambiguous Abbreviations]
  • Avoid using inherently ambiguous words in names. [Ambiguous Names]
  • Prefix "for internal use only" subroutines with an underscore. [Utility Subroutines]
End of preview. The rest of this section is not shown here.
Section B.3: Chapter 4, Values and Expressions
  • Use interpolating string delimiters only for strings that actually interpolate. [String Delimiters]
  • Don't use "" or '' for an empty string. [Empty Strings]
  • Don't write one-character strings in visually ambiguous ways. [Single-Character Strings]
  • Use named character escapes instead of numeric escapes. [Escaped Characters]
  • Use named constants, but don't use constant. [Constants]
  • Don't pad decimal numbers with leading zeros. [Leading Zeros]
  • Use underscores to improve the readability of long numbers. [Long Numbers]
  • Lay out multiline strings over multiple lines. [Multiline Strings]
  • Use a heredoc when a multiline string exceeds two lines. [Here Documents]
  • Use a "theredoc" when a heredoc would compromise your indentation. [Heredoc Indentation]
  • Make every heredoc terminator a single uppercase identifier with a standard prefix. [Heredoc Terminators]
  • When introducing a heredoc, quote the terminator. [Heredoc Quoters]
  • Don't use barewords. [Barewords]
  • Reserve => for pairs. [Fat Commas]
  • Don't use commas to sequence statements. [Thin Commas]
  • Don't mix high- and low-precedence booleans. [Low-Precedence Operators]
  • Parenthesize every raw list. [Lists]
  • Use table-lookup to test for membership in lists of strings; use any() for membership of lists of anything else. [List Membership]
End of preview. The rest of this section is not shown here.
Section B.4: Chapter 5, Variables
  • Avoid using non-lexical variables. [Lexical Variables]
  • Don't use package variables in your own development. [Package Variables]
  • If you're forced to modify a package variable, localize it. [Localization]
  • Initialize any variable you localize. [Initialization]
  • use English for the less familiar punctuation variables. [Punctuation Variables]
  • If you're forced to modify a punctuation variable, localize it. [Localizing Punctuation Variables]
  • Don't use the regex match variables. [Match Variables]
  • Beware of any modification via $_. [Dollar-Underscore]
  • Use negative indices when counting from the end of an array. [Array Indices]
  • Take advantage of hash and array slicing. [Slicing]
  • Use a tabular layout for slices. [Slice Layout]
  • Factor large key or index lists out of their slices. [Slice Factoring]
End of preview. The rest of this section is not shown here.
Section B.5: Chapter 6, Control Structures
  • Use block if, not postfix if. [If Blocks]
  • Reserve postfix if for flow-of-control statements. [Postfix Selectors]
  • Don't use postfix unless, for, while, or until. [Other Postfix Modifiers]
  • Don't use unless or until at all. [Negative Control Statements]
  • Avoid C-style for statements. [C-Style Loops]
  • Avoid subscripting arrays or hashes within loops. [Unnecessary Subscripting]
  • Never subscript more than once in a loop. [Necessary Subscripting]
  • Use named lexicals as explicit for loop iterators. [Iterator Variables]
  • Always declare a for loop iterator variable with my. [Non-Lexical Loop Iterators]
  • Use map instead of for when generating new lists from old. [List Generation]
  • Use grep and first instead of for when searching for values in a list. [List Selections]
  • Use for instead of map when transforming a list in place. [List Transformation]
  • Use a subroutine call to factor out complex list transformations. [Complex Mappings]
  • Never modify $_ in a list function. [List Processing Side Effects]
  • Avoid cascading an if. [Multipart Selections]
  • Use table look-up in preference to cascaded equality tests. [Value Switches]
  • When producing a value, use tabular ternaries. [Tabular Ternaries]
  • Don't use do...while loops. [do-while Loops]
  • Reject as many iterations as possible, as early as possible. [Linear Coding]
End of preview. The rest of this section is not shown here.
Section B.6: Chapter 7, Documentation
  • Distinguish user documentation from technical documentation. [Types of Documentation]
  • Create standard POD templates for modules and applications. [Boilerplates]
  • Extend and customize your standard POD templates. [Extended Boilerplates]
  • Put user documentation in source files. [Location]
  • Keep all user documentation in a single place within your source file. [Contiguity]
  • Place POD as close as possible to the end of the file. [Position]
  • Subdivide your technical documentation appropriately. [Technical Documentation]
  • Use block templates for major comments. [Comments]
  • Use full-line comments to explain the algorithm. [Algorithmic Documentation]
  • Use end-of-line comments to point out subtleties and oddities. [Elucidating Documentation]
  • Comment anything that has puzzled or tricked you. [Defensive Documentation]
  • Consider whether it's better to rewrite than to comment. [Indicative Documentation]
  • Use "invisible" POD sections for longer technical discussions. [Discursive Documentation]
  • Check the spelling, syntax, and sanity of your documentation. [Proofreading]
End of preview. The rest of this section is not shown here.
Section B.7: Chapter 8, Built-in Functions
  • Don't recompute sort keys inside a sort. [Sorting]
  • Use reverse to reverse a list. [Reversing Lists]
  • Use scalar reverse to reverse a scalar. [Reversing Scalars]
  • Use unpack to extract fixed-width fields. [Fixed-Width Data]
  • Use split to extract simple variable-width fields. [Separated Data]
  • Use Text::CSV_XS to extract complex variable-width fields. [Variable-Width Data]
  • Avoid string eval. [String Evaluations]
  • Consider building your sorting routines with Sort::Maker. [Automating Sorts]
  • Use 4-arg substr instead of lvalue substr. [Substrings]
  • Make appropriate use of lvalue values. [Hash Values]
  • Use glob, not <...>. [Globbing]
  • Avoid a raw select for non-integer sleeps. [Sleeping]
  • Always use a block with a map and grep. [Mapping and Grepping]
  • Use the "non-builtin builtins". [Utilities]
End of preview. The rest of this section is not shown here.
Section B.8: Chapter 9, Subroutines
  • Call subroutines with parentheses but without a leading &. [Call Syntax]
  • Don't give subroutines the same names as built-in functions. [Homonyms]
  • Always unpack @_ first. [Argument Lists]
  • Use a hash of named arguments for any subroutine that has more than three parameters. [Named Arguments]
  • Use definedness or existence to test for missing arguments. [Missing Arguments]
  • Resolve any default argument values as soon as @_ is unpacked. [Default Argument Values]
  • Always return scalar in scalar returns. [Scalar Return Values]
  • Make list-returning subroutines return the "obvious" value in scalar context. [Contextual Return Values]
  • When there is no "obvious" scalar context return value, consider Contextual::Return instead. [Multi-Contextual Return Values]
  • Don't use subroutine prototypes. [Prototypes]
  • Always return via an explicit return. [Implicit Returns]
  • Use a bare return to return failure. [Returning Failure]
End of preview. The rest of this section is not shown here.
Section B.9: Chapter 10, I/O
  • Don't use bareword filehandles. [Filehandles]
  • Use indirect filehandles. [Indirect Filehandles]
  • If you have to use a package filehandle, localize it first. [Localizing Filehandles]
  • Use either the IO::File module or the three-argument form of open. [Opening Cleanly]
  • Never open, close, or print to a file without checking the outcome. [Error Checking]
  • Close filehandles explicitly, and as soon as possible. [Cleanup]
  • Use while (<>), not for (<>). [Input Loops]
  • Prefer line-based I/O to slurping. [Line-Based Input]
  • Slurp a filehandle with a do block for purity. [Simple Slurping]
  • Slurp a stream with Perl6::Slurp for power and simplicity. [Power Slurping]
  • Avoid using *STDIN, unless you really mean it. [Standard Input]
  • Always put filehandles in braces within any print statement. [Printing to Filehandles]
  • Always prompt for interactive input. [Simple Prompting]
  • Don't reinvent the standard test for interactivity. [Interactivity]
  • Use the IO::Prompt module for prompting. [Power Prompting]
  • Always convey the progress of long non-interactive operations within interactive applications. [Progress Indicators]
  • Consider using the Smart::Comments module to automate your progress indicators. [Automatic Progress Indicators]
  • Avoid a raw select when setting autoflushes. [Autoflushing]
End of preview. The rest of this section is not shown here.
Section B.10: Chapter 11, References
  • Wherever possible, dereference with arrows. [Dereferencing]
  • Where prefix dereferencing is unavoidable, put braces around the reference. [Braced References]
  • Never use symbolic references. [Symbolic References]
  • Use weaken to prevent circular data structures from leaking memory. [Cyclic References]
End of preview. The rest of this section is not shown here.
Section B.11: Chapter 12, Regular Expressions
  • Always use the /x flag. [Extended Formatting]
  • Always use the /m flag. [Line Boundaries]
  • Use \A and \z as string boundary anchors. [String Boundaries]
  • Use \z, not \Z, to indicate "end of string". [End of String]
  • Always use the /s flag. [Matching Anything]
  • Consider mandating the Regexp::Autoflags module. [Lazy Flags]
  • Use m{...} in preference to /.../ in multiline regexes. [Brace Delimiters]
  • Don't use any delimiters other than /.../ or m{...}. [Other Delimiters]
  • Prefer singular character classes to escaped metacharacters. [Metacharacters]
  • Prefer named characters to escaped metacharacters. [Named Characters]
  • Prefer properties to enumerated character classes. [Properties]
  • Consider matching arbitrary whitespace, rather than specific whitespace characters. [Whitespace]
  • Be specific when matching "as much as possible". [Unconstrained Repetitions]
  • Use capturing parentheses only when you intend to capture. [Capturing Parentheses]
  • Use the numeric capture variables only when you're sure that the preceding match succeeded. [Captured Values]
  • Always give captured substrings proper names. [Capture Variables]
  • Tokenize input using the /gc flag. [Piecewise Matching]
  • Build regular expressions from tables. [Tabular Regexes]
  • Build complex regular expressions from simpler pieces. [Constructing Regexes]
End of preview. The rest of this section is not shown here.
Section B.12: Chapter 13, Error Handling
  • Throw exceptions instead of returning special values or setting flags. [Exceptions]
  • Make failed builtins throw exceptions too. [Builtin Failures]
  • Make failures fatal in all contexts. [Contextual Failure]
  • Be careful when testing for failure of the system builtin. [Systemic Failure]
  • Throw exceptions on all failures, including recoverable ones. [Recoverable Failure]
  • Have exceptions report from the caller's location, not from the place where they were thrown. [Reporting Failure]
  • Compose error messages in the recipient's dialect. [Error Messages]
  • Document every error message in the recipient's dialect. [Documenting Errors]
  • Use exception objects whenever failure data needs to be conveyed to a handler. [OO Exceptions]
  • Use exception objects when error messages may change. [Volatile Error Messages]
  • Use exception objects when two or more exceptions are related. [Exception Hierarchies]
  • Catch exception objects in most-derived-first order. [Processing Exceptions]
  • Build exception classes automatically. [Exception Classes]
  • Unpack the exception variable in extended exception handlers. [Unpacking Exceptions]
End of preview. The rest of this section is not shown here.
Section B.13: Chapter 14, Command-Line Processing
  • Enforce a single consistent command-line structure. [Command-Line Structure]
  • Adhere to a standard set of conventions in your command-line syntax. [Command-Line Conventions]
  • Standardize your meta-options. [Meta-options]
  • Allow the same filename to be specified for both input and output. [In-situ Arguments]
  • Standardize on a single approach to command-line processing. [Command-Line Processing]
  • Ensure that your interface, run-time messages, and documentation remain consistent. [Interface Consistency]
  • Factor out common command-line interface components into a shared module. [Interapplication Consistency]
End of preview. The rest of this section is not shown here.
Section B.14: Chapter 15, Objects
  • Make object orientation a choice, not a default. [Using OO]
  • Choose object orientation using appropriate criteria. [Criteria]
  • Don't use pseudohashes. [Pseudohashes]
  • Don't use restricted hashes. [Restricted Hashes]
  • Always use fully encapsulated objects. [Encapsulation]
  • Give every constructor the same standard name. [Constructors]
  • Don't let a constructor clone objects. [Cloning]
  • Always provide a destructor for every inside-out class. [Destructors]
  • When creating methods, follow the general guidelines for subroutines. [Methods]
  • Provide separate read and write accessors. [Accessors]
  • Don't use lvalue accessors. [Lvalue Accessors]
  • Don't use the indirect object syntax. [Indirect Objects]
  • Provide an optimal interface, rather than a minimal one. [Class Interfaces]
  • Overload only the isomorphic operators of algebraic classes. [Operator Overloading]
  • Always consider overloading the boolean, numeric, and string coercions of objects. [Coercions]
  • Don't manipulate the list of base classes directly. [Inheritance]
  • Use distributed encapsulated objects. [Objects]
  • Never use the one-argument form of bless. [Blessing Objects]
  • Pass constructor arguments as labeled values, using a hash reference. [Constructor Arguments]
  • Distinguish arguments for base classes by class name as well. [Base Class Initialization]
  • Separate your construction, initialization, and destruction processes. [Construction and Destruction]
  • Build the standard class infrastructure automatically. [Automating Class Hierarchies]
  • Use Class::Std to automate the deallocation of attribute data. [Attribute Demolition]
  • Have attributes initialized and verified automatically. [Attribute Building]
  • Specify coercions as :STRINGIFY, :NUMERIFY, and :BOOLIFY methods. [Coercions]
  • Use :CUMULATIVE methods instead of SUPER:: calls. [Cumulative Methods]
  • Don't use AUTOLOAD(). [Autoloading]
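Several of the object-oriented guidelines above can be seen together in a small hand-written class. This is only a sketch under stated assumptions: the Counter class and its attributes are hypothetical, and it uses refaddr() from the core Scalar::Util module to key the attribute hashes rather than the Class::Std infrastructure the guidelines recommend.

```perl
use strict;
use warnings;

{
    package Counter;    # hypothetical class, for illustration only

    use Carp         qw( croak );
    use Scalar::Util qw( refaddr );

    # Attributes live in lexical hashes keyed by object address, so code
    # outside this block cannot reach the data directly
    my %count_of;
    my %step_of;

    sub new {
        my ($class, $arg_ref) = @_;
        croak 'Usage: Counter->new({ start => $n, step => $n })'
            if ref($arg_ref) ne 'HASH';

        # Two-argument bless, so derived classes are blessed correctly
        my $self = bless \do { my $anon_scalar }, $class;
        my $id   = refaddr $self;

        $count_of{$id} = exists $arg_ref->{start} ? $arg_ref->{start} : 0;
        $step_of{$id}  = exists $arg_ref->{step}  ? $arg_ref->{step}  : 1;

        return $self;
    }

    sub get_count {
        my ($self) = @_;
        return $count_of{ refaddr $self };
    }

    sub increment {
        my ($self) = @_;
        $count_of{ refaddr $self } += $step_of{ refaddr $self };
        return;
    }

    # Attribute data must be released explicitly when the object goes away
    sub DESTROY {
        my ($self) = @_;
        delete $count_of{ refaddr $self };
        delete $step_of{ refaddr $self };
        return;
    }
}

my $counter = Counter->new({ start => 10, step => 5 });
$counter->increment();
print $counter->get_count(), "\n";    # prints 15
```

Passing the arguments as a single hash reference means a missing or misspelled argument list is caught in the constructor, rather than silently misaligning positional parameters.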
  • Design the module's interface first. [Interfaces]
  • Place original code inline. Place duplicated code in a subroutine. Place duplicated subroutines in a module. [Refactoring]
  • Use three-part version numbers. [Version Numbers]
  • Enforce your version requirements programmatically. [Version Requirements]
  • Export judiciously and, where possible, only by request. [Exporting]
  • Consider exporting declaratively. [Declarative Exporting]
  • Never make variables part of a module's interface. [Interface Variables]
  • Build new module frameworks automatically. [Creating Modules]
  • Use core modules wherever possible. [The Standard Library]
  • Use CPAN modules where feasible. [CPAN]
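To make the version-number and exporting guidelines above more concrete, here is a minimal sketch of a module that declares a three-part version and exports only on request. The module name Local::TextUtil and its squeeze_spaces() subroutine are invented for this example, and the module is defined inline only so the fragment runs as a single file.

```perl
use strict;
use warnings;

{
    package Local::TextUtil;    # hypothetical module, defined inline for brevity

    use version; our $VERSION = qv('1.0.3');    # three-part version number

    use Exporter qw( import );
    our @EXPORT_OK = qw( squeeze_spaces );      # nothing is exported by default

    # Collapse runs of whitespace into single spaces and trim both ends
    sub squeeze_spaces {
        my ($text) = @_;
        $text =~ s{\s+}{ }gxms;
        $text =~ s{\A [ ] | [ ] \z}{}gxms;
        return $text;
    }
}

# With the module in its own file, these two lines would normally be written:
#     use Local::TextUtil 1.0.0 qw( squeeze_spaces );
Local::TextUtil->VERSION('1.0.0');              # dies if the module is too old
Local::TextUtil->import(qw( squeeze_spaces ));

my $tidied = squeeze_spaces("  too    many   spaces ");
print "$tidied\n";    # prints "too many spaces"
```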
  • Write the test cases first. [Test Cases]
  • Standardize your tests with Test::Simple or Test::More. [Modular Testing]
  • Standardize your test suites with Test::Harness. [Test Suites]
  • Write test cases that fail. [Failure]
  • Test both the likely and the unlikely. [What to Test]
  • Add new test cases before you start debugging. [Debugging and Testing]
  • Always use strict. [Strictures]
  • Always turn on warnings explicitly. [Warnings]
  • Never assume that a warning-free compilation implies correctness. [Correctness]
  • Turn off strictures or warnings explicitly, selectively, and in the smallest possible scope. [Overriding Strictures]
  • Learn at least a subset of the perl debugger. [The Debugger]
  • Use serialized warnings when debugging "manually". [Manual Debugging]
  • Consider using "smart comments" when debugging, rather than warn statements. [Semi-Automatic Debugging]
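The guideline on overriding strictures is easiest to see in code. The sketch below, which assumes hypothetical log_debug(), log_info(), and log_error() subroutines, disables only the 'refs' stricture and only around the single statement that needs a symbolic reference.

```perl
use strict;
use warnings;

# Generate log_debug(), log_info(), and log_error() (hypothetical names)
for my $level (qw( debug info error )) {
    my $logger_ref = sub {
        my ($message) = @_;
        return uc($level) . ": $message";
    };

    # Only the glob assignment needs a symbolic reference, so only that
    # one statement runs without the 'refs' stricture
    {
        no strict 'refs';
        *{"main::log_$level"} = $logger_ref;
    }
}

my $line = log_info('server started');
print "$line\n";    # prints "INFO: server started"
```

Everything outside the inner block, including the body of each generated subroutine, is still compiled under full strictures.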
  • Use a revision control system. [Revision Control]
  • Integrate non-Perl code into your applications via the Inline:: modules. [Other Languages]
  • Keep your configuration language uncomplicated. [Configuration Files]
  • Don't use formats. [Formats]
  • Don't tie variables or filehandles. [Ties]
  • Don't be clever. [Cleverness]
  • If you must rely on cleverness, encapsulate it. [Encapsulated Cleverness]
  • Don't optimize code—benchmark it. [Benchmarking]
  • Don't optimize data structures—measure them. [Memory]
  • Look for opportunities to use caches. [Caching]
  • Automate your subroutine caching. [Memoization]
  • Benchmark any caching strategy you use. [Caching for Optimization]
  • Don't optimize applications—profile them. [Profiling]
  • Be careful to preserve semantics when refactoring syntax. [Enbugging]
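The caching and benchmarking guidelines above work together: a cache is only worth keeping if measurement shows that it helps. The following sketch uses the core Memoize and Benchmark modules; the Fibonacci subroutines are a deliberately naive stand-in chosen for this example, not code from the book.

```perl
use strict;
use warnings;
use Memoize   qw( memoize );
use Benchmark qw( cmpthese );

# Deliberately naive recursive Fibonacci, used only as a slow example
sub fib_plain {
    my ($n) = @_;
    return $n < 2 ? $n : fib_plain($n - 1) + fib_plain($n - 2);
}

# Identical logic, but wrapped in a cache by memoize() below
sub fib_cached {
    my ($n) = @_;
    return $n < 2 ? $n : fib_cached($n - 1) + fib_cached($n - 2);
}
memoize('fib_cached');

# Run each alternative for at least one CPU second and print a comparison table
cmpthese(-1, {
    plain  => sub { fib_plain(15) },
    cached => sub { fib_cached(15) },
});
```

The actual rates depend on the machine, which is the point of the guideline: the table, not intuition, decides whether the cache stays.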
Appendix C: Editor Configurations
A suitably configured editor can make coding much easier, and code much more robust. Automating common tasks ensures that those tasks are done correctly every time, and automating common formatting requirements means that those requirements can be followed consistently without effort.
The following sections provide additions for the configuration files of five popular text editors. These additions support many of the layout and debugging guidelines recommended in this book.

Section C.1: vim

Section C.2: vile

Section C.3: Emacs

Section C.4: BBEdit

Section C.5: TextWrangler

vim
vim is one of several successors to the classic Unix text editor vi. You can learn about vim and download the latest open source version for all major operating systems from http://www.vim.org.
The following commands might make useful additions to your .vimrc file:

            

set autoindent                    "Preserve current indent on new lines
set textwidth=78                  "Wrap at this column
set backspace=indent,eol,start    "Make backspaces delete sensibly

set tabstop=4                     "Indentation levels every four columns
set expandtab                     "Convert all tabs typed to spaces
set shiftwidth=4                  "Indent/outdent by four columns
set shiftround                    "Indent/outdent to nearest tabstop

set matchpairs+=<:>               "Allow % to bounce between angles too

"Inserting these abbreviations inserts the corresponding Perl statement...
iab phbp  #! /usr/bin/perl -w
iab pdbg  use Data::Dumper 'Dumper';^Mwarn Dumper [];^[hi
iab pbmk  use Benchmark qw( cmpthese );^Mcmpthese -10, {};^[O
iab pusc  use Smart::Comments;^M^M###
iab putm  use Test::More qw( no_plan );

iab papp  ^[:r ~/.code_templates/perl_application.pl^M
iab pmod  ^[:r ~/.code_templates/perl_module.pm^M

         
For many more ways to customize and enhance vim, see http://www.vim.org/tips/.
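As an example of what these abbreviations produce, typing pdbg in insert mode expands to the two Data::Dumper lines and leaves the cursor inside the square brackets, ready for the variables to be inspected. A sketch of the result, with a sample hash invented for this illustration:

```perl
use strict;
use warnings;
use Data::Dumper 'Dumper';

my %setting_for = ( textwidth => 78, shiftwidth => 4 );    # sample data only

# The expansion of "pdbg", after typing \%setting_for between the brackets
warn Dumper [ \%setting_for ];
```

Because warn writes to STDERR, the dump does not interfere with the program's normal output.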
vile
vile is another major successor to vi. For more information about vile, including source code and various precompiled distributions, see http://dickey.his.com/vile/vile.html. The following commands might make useful additions to your .vilerc file:

            

               

;Preserve current indent on new lines
set autoindent

;Wrap at the 78th column
set fillcol=78
set wrapwords

;Use 4-space indents, not tabs
set tabspace=4
set shiftwidth=4
set noti

;Allow % to bounce between angles too
set fence-pairs="()[]{}<>"

;Inserting these abbreviations inserts the corresponding Perl statement...
abb phbp  #! /usr/bin/perl -w
abb pdbg  use Data::Dumper 'Dumper';^Mwarn Dumper [];^[hi
abb pbmk  use Benchmark qw( cmpthese );^Mcmpthese -10, {};^[O
abb pusc  use Smart::Comments;^M^M###
abb putm  use Test::More qw( no_plan );

abb papp  ^[:r ~/.code_templates/perl_application.pl^M
abb pmod  ^[:r ~/.code_templates/perl_module.pm^M

         
Emacs
Emacs is an "extensible, customizable, self-documenting real-time display editor". To learn about Emacs and download its free source code for just about any operating system, see http://www.gnu.org/software/emacs/emacs.html.
The following configuration commands might be useful in your .emacs file:

            

               

;; Use cperl mode instead of the default perl mode
(defalias 'perl-mode 'cperl-mode)

;; Turn autoindenting on
(global-set-key "\r" 'newline-and-indent)

;; Use 4 space indents via cperl mode
(custom-set-variables
 '(cperl-close-paren-offset -4)
 '(cperl-continued-statement-offset 4)
 '(cperl-indent-level 4)
 '(cperl-indent-parens-as-block t)
 '(cperl-tab-always-indent t))

;; Insert spaces instead of tabs
(setq-default indent-tabs-mode nil)

;; Set line width to 78 columns...
;; (setq-default is used because fill-column becomes buffer-local when set,
;; and auto-fill is enabled through a mode hook rather than by setting a variable)
(setq-default fill-column 78)
(add-hook 'cperl-mode-hook 'turn-on-auto-fill)

;; Use % to match various kinds of brackets...
;; See: http://www.lifl.fr/~hodique/uploads/Perso/patches.el
(global-set-key "%" 'match-paren)
(defun match-paren (arg)
  "Go to the matching paren if on a paren; otherwise insert %."
  (interactive "p")
  (let ((prev-char (char-to-string (preceding-char)))
        (next-char (char-to-string (following-char))))
    (cond ((string-match "[[{(<]" next-char) (forward-sexp 1))
          ((string-match "[\]})>]" prev-char) (backward-sexp 1))
          (t (self-insert-command (or arg 1))))))

;; Load an application template in a new unattached buffer...
(defun application-template-pl ()
  "Inserts the standard Perl application template"  ; For help and info.
  (interactive "*")                                 ; Make this user accessible.
  (switch-to-buffer "application-template-pl")
  (insert-file "~/.code_templates/perl_application.pl"))

BBEdit
BBEdit is a popular commercial text editor for Apple computers, considered by many Mac developers to be the best available. You can read about its extensive features, download a demonstration copy of the application, or purchase a full license for the software from http://www.barebones.com/products/bbedit/.
To configure BBEdit with the extra editor features suggested in this book, you might first need to create some local folders (in order to pre-empt the application's default support folder). See the application's user manual for more information.
Then, adjust your preferences settings. In the Preferences > Editor Defaults screen:
  • Turn on Auto-Indent.
  • Turn on Balance While Typing.
  • Turn on Auto-Expand Tabs.
  • Turn on Show Invisibles.
Adjust your tab stops to four spaces. For BBEdit 7, use the configuration panel under Text > Fonts & Tabs. For BBEdit 8, the option is under Text > Show Fonts.
You can create stationery for any boilerplate file templates you wish to be able to load by using BBEdit to create a file containing the desired code. When the code template is ready, select File > Save As... and turn on the "Save as Stationery" option. Save the file to the folder ~/Library/Application Support/BBEdit Support/Stationery/ and it will then be available from the Stationery palette, or via the standard menu item File > New with Stationery. You might, for example, create the stationery files ~/Library/Application Support/BBEdit Support/Stationery/perl application.pl and ~/Library/Application Support/BBEdit Support/Stationery/perl module.pm.
To use an abbreviation in BBEdit, you need to install a Glossary item. First, create the folder ~/Library/Application Support/BBEdit Support/Glossary/Perl.pl/. Then, add a file named debug, with the following contents:

            

use Data::Dumper qw( Dumper );
warn Dumper [ #SELECT##INSERTION# ];
TextWrangler
TextWrangler is a free text editor from the makers of BBEdit. Although it has a comparatively restricted set of features, it is still extremely capable and easy to use. You can download a free copy of it from http://www.barebones.com/products/textwrangler/.
First, adjust your preferences settings. In the Editor Defaults screen under Preferences:
  • Turn on Auto-Indent.
  • Turn on Balance While Typing.
  • Turn on Auto-Expand Tabs.
  • Turn on Show Invisibles.
Adjust your tab stops to four spaces using the option under Text > Show Fonts.
You can create stationery for any boilerplate file templates you wish to load by using TextWrangler to create a file containing the desired code. When the code template is ready, select File > Save As... and turn on the "Save as Stationery" option. Save the file to the folder ~/Library/Application Support/TextWrangler Support/Stationery/. It will then be available from the Stationery palette, or via the standard menu item File > New with Stationery. For example, you might create the stationery files ~/Library/Application Support/TextWrangler Support/Stationery/perl application.pl and ~/Library/Application Support/TextWrangler Support/Stationery/perl module.pm.
To use abbreviations in TextWrangler, you need to write a small Perl script that will generate the text you want by filtering the current selection. First, create the folder ~/Library/Application Support/TextWrangler Support/Unix Support/Unix Filters/. Then, add a file named debug.pl, with the following contents:

            

#! /usr/bin/perl --

# The first string is double-quoted so that \n is emitted as a real newline
print "use Data::Dumper qw( Dumper );\nwarn Dumper [ ", <>, ' ]';

         
You can then assign this filter to a particular keystroke using the palette available from the Window > Palettes > Unix Filters menu. Thereafter, typing that keystroke will take the current selection, pass it to the standard input of
Appendix D: Recommended Modules and Utilities
Recommended Core Modules
Module name | Description | In core since
base | Specifies the base classes of the current package at compile time (see Chapter 16) | 5.005
Benchmark | Provides utilities to time fragments of Perl code (see Chapter 19) | 5.003
Carp | Provides subroutines that warn or throw exceptions, reporting the problem from the caller's location (see Chapter 13) | 5.6
charnames | Enables the use of character names via \N{CHARNAME} string literal escapes (see Chapter 4) | 5.6
CPAN | Simplifies the downloading and installation of CPAN modules | 5.004
Data::Dumper | Converts data structures into string representations of Perl code (see Chapters 15, 17, and 18) | 5.005
Devel::DProf | Profiles Perl code (see Chapter 19) | 5.6
English | Defines readable English names for special variables (see Chapter 5) |
Recommended CPAN Modules
Module name | Description | Recommended version
Attribute::Types | Provides markers that confer type constraints on variables (see Chapter 3) | 0.10 or later
Class::Std | Implements encapsulated class hierarchies (see Chapter 16) | Any
Class::Std::Utils | Provides utility functions for producing unique identifiers for any object, for creating anonymous scalars, and for extracting initialization values from a hierarchical initializer list (see Chapter 15) | Any
Config::General | Reads and writes almost any type of configuration file (see Chapter 19) | 2.27 or later
Config::Std | Reads and writes simple configuration files, preserving their structure and comments (see Chapter 19) | Any
Config::Tiny | Reads and writes simple "INI" format configuration files with as little code as possible (see Chapter 19) | 2.01 or later
Contextual::Return | Simplifies returning different values in different contexts (see Chapter 9) |
Utility Subroutines
Subroutine | Description | Available from
all() | Returns true if all its arguments are true (see Chapter 8) | List::MoreUtils
anon_scalar() | Returns a reference to an anonymous scalar (see Chapters 15 and 16) | Class::Std::Utils
any() | Returns true if any of its arguments are true (see Chapters 4 and 8) | List::MoreUtils
apply() | Applies a transformation to its list of arguments (see Chapter 8) | List::MoreUtils
blessed() | Returns true if its argument is a reference to a blessed object (see Chapter 8) | Scalar::Util
carp() | Prints a warning like warn does, but reports it from the caller's location (see Chapters 2, 6, 9, and 13) | Carp
cmpthese() | Times a set of alternative code fragments and compares the results in a table (see Chapter 19) | Benchmark
croak() | Throws an exception like die does, but reports it from the caller's location (see Chapters 2, 6, 9, and 13) |
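A short sketch may help show how two of these utilities are typically combined. It uses only the core modules Scalar::Util and Carp; the Inspector package, its describe() subroutine, and the Example::Class name are assumptions introduced purely for this illustration.

```perl
use strict;
use warnings;

{
    package Inspector;    # hypothetical package, for illustration only

    use Carp         qw( croak );
    use Scalar::Util qw( blessed );

    # Return a short description of an argument, rejecting undefined input
    sub describe {
        my ($thing) = @_;
        croak 'describe() requires a defined argument' if !defined $thing;

        return blessed($thing) ? 'object of class ' . blessed($thing)
             : ref($thing)     ? 'unblessed ' . ref($thing) . ' reference'
             :                   'plain scalar'
             ;
    }
}

my $object = bless {}, 'Example::Class';    # hypothetical class name

for my $thing ($object, [1, 2, 3], 42) {
    my $description = Inspector::describe($thing);
    print "$description\n";
}

# croak() blames the calling code in main, not the line inside describe()
if (!eval { Inspector::describe(undef); 1 }) {
    print "Caught: $@";
}
```

The loop prints "object of class Example::Class", "unblessed ARRAY reference", and "plain scalar", one per line.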
Bibliography
Perl Coding and Development Practices, Testing and Debugging
Perl Debugged, Peter J. Scott, Ed Wright, Addison-Wesley, 2001, ISBN: 0-201-70054-9
Perl Medic: Transforming Legacy Code, Peter J. Scott, Addison-Wesley, 2004, ISBN: 0-201-79526-4
Perl Testing: A Developer's Notebook, Ian Langworth, chromatic, O'Reilly, 2005, ISBN: 0-596-10092-2
Algorithms and Efficiency
Data Munging with Perl, David Cross, Manning Publications, 2001, ISBN: 1-930110-00-6
Effective Perl Programming: Writing Better Programs with Perl, Joseph N. Hall, Randal Schwartz, Addison-Wesley, 1997, ISBN: 0-201-41975-0
Higher-Order Perl: Transforming Programs with Programs, Mark Jason Dominus, Morgan Kaufmann, 2005, ISBN: 1-55860-701-3
Mastering Algorithms with Perl, Jon Orwant, Jarkko Hietaniemi, John Macdonald, O'Reilly, 1999, ISBN: 1-56592-398-7
Mastering Regular Expressions, Second Edition, Jeffrey E. F. Friedl, O'Reilly, 2002, ISBN: 0-596-00289-0
Object Oriented Perl, Damian Conway, Manning, 1999, ISBN: 1-884777-79-1
Perl Cookbook, Second Edition, Tom Christiansen, Nathan Torkington, O'Reilly, 2003, ISBN: 0-596-00313-7
Coding Style and Common Mistakes
The perlstyle manpage
The perltrap manpage
General Coding and Development Practices, Coding Standards
C Style: Standards and Guidelines, David Straker, Prentice Hall, 1992, ISBN: 0-13-116898-3
The Elements of Programming Style, 2nd edition, Brian W. Kernighan, P. J. Plauger, McGraw-Hill, 1978, ISBN: 0-07-034207-5
Development Practices
The Mythical Man-Month: Essays on Software Engineering, 20th Anniversary Edition, Frederick P. Brooks, Addison-Wesley, 1995, ISBN: 0-201-83595-9
The Practice of Programming, Brian W. Kernighan, Rob Pike, Addison-Wesley, 1999, ISBN: 0-201-61586-X
The Pragmatic Programmer: From Journeyman to Master, Andrew Hunt, David Thomas, Addison-Wesley, 1999, ISBN: 0-201-61622-X
Text Editors