Real World Haskell

First Edition, December 2008
ISBN 978-0-596-51498-3
710 pages
EUR41.00



Table of Contents

Chapter 1: Getting Started
Content preview
As you read the early chapters of this book, keep in mind that we will sometimes introduce ideas in restricted, simplified form. Haskell is a deep language, and presenting every aspect of a given subject all at once is likely to prove overwhelming. As we build a solid foundation in Haskell, we will expand upon these initial explanations.
End of content preview. The remaining content of this section is not viewable here.
Your Haskell Environment
Content preview
Haskell is a language with many implementations, two of which are widely used. Hugs is an interpreter that is primarily used for teaching. For real applications, the Glasgow Haskell Compiler (GHC) is much more popular. Compared to Hugs, GHC is more suited to "real work": it compiles to native code, supports parallel execution, and provides useful performance analysis and debugging tools. For these reasons, GHC is the Haskell implementation that we will be using throughout this book.
GHC has three main components:
ghc
An optimizing compiler that generates fast native code
ghci
An interactive interpreter and debugger
runghc
A program for running Haskell programs as scripts, without needing to compile them first
When we discuss the GHC system as a whole, we will refer to it as GHC. If we are talking about a specific command, we will mention ghc, ghci, or runghc by name.
We assume that you’re using at least version 6.8.2 of GHC, which was released in 2007. Many of our examples will work unmodified with older versions. However, we recommend using the newest version available for your platform. If you’re using Windows or Mac OS X, you can get started easily and quickly using a prebuilt installer. To obtain a copy of GHC for these platforms, visit the GHC download page and look for the list of binary packages and installers.
Many Linux distributors and providers of BSD and other Unix variants make custom binary packages of GHC available. Because these are built specifically for each environment, they are much easier to install and use than the generic binary packages that are available from the GHC download page. You can find a list of distributions that custom-build GHC on the GHC distribution packages page.
For more detailed information about how to install GHC on a variety of popular platforms, we’ve provided some instructions elsewhere in the book.
End of content preview. The remaining content of this section is not viewable here.
Getting Started with ghci, the Interpreter
Content preview
The interactive interpreter for GHC is a program named ghci. It lets us enter and evaluate Haskell expressions, explore modules, and debug our code. If you are familiar with Python or Ruby, ghci is somewhat similar to python and irb, the interactive Python and Ruby interpreters.
We typically cannot copy some code out of a Haskell source file and paste it into ghci. This does not have a significant effect on debugging pieces of code, but it can initially be surprising if you are used to, say, the interactive Python interpreter.
On Unix-like systems, we run ghci as a command in a shell window. On Windows, it’s available via the Start menu. For example, if you install the program using the GHC installer on Windows XP, you should go to All Programs, then GHC; you will see ghci in the list. (The book includes a screenshot of this menu.)
When we run ghci, it displays a startup banner, followed by a prompt. Here, we’re showing version 6.8.3 on a Linux box:
$ ghci
GHCi, version 6.8.3: http://www.haskell.org/ghc/  :? for help
Loading package base ... linking ... done.
Prelude>
The word Prelude in the prompt indicates that Prelude, a standard library of useful functions, is loaded and ready to use. When we load other modules or source files, they will show up in the prompt, too.
If you enter :? at the ghci prompt, it will print a long help message.
The Prelude module is sometimes referred to as "the standard prelude" because its contents are defined by the Haskell 98 standard. Usually, it’s simply shortened to "the prelude."
The prompt displayed by ghci changes frequently depending on what modules we have loaded. It can often grow long enough to leave little visual room on a single line for our input.
For brevity and consistency, we replaced ghci’s default prompts throughout this book with the prompt string ghci>.
If you want to do this yourself, use ghci’s :set prompt directive, as follows:
Prelude> :set prompt "ghci> "
ghci>
The Prelude is always implicitly available; we don’t need to take any actions to use the types, values, or functions it defines. To use definitions from other modules, we must load them into
End of content preview. The remaining content of this section is not viewable here.
Basic Interaction: Using ghci as a Calculator
Content preview
In addition to providing a convenient interface for testing code fragments, ghci can function as a readily accessible desktop calculator. We can easily express any calculator operation in ghci and, as an added bonus, we can add more complex operations as we become more familiar with Haskell. Even using the interpreter in this simple way can help us to become more comfortable with how Haskell works.
We can immediately start entering expressions, in order to see what ghci will do with them. Basic arithmetic works similarly to languages such as C and Python—we write expressions in infix form, where an operator appears between its operands:
ghci> 2 + 2
4
ghci> 31337 * 101
3165037
ghci> 7.0 / 2.0
3.5
The infix style of writing an expression is just a convenience; we can also write an expression in prefix form, where the operator precedes its arguments. To do this, we must enclose the operator in parentheses:
ghci> 2 + 2
4
ghci> (+) 2 2
4
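As a minimal sketch of the same idea in a source file rather than at the ghci prompt, the program below writes a few expressions in both infix and prefix form. The use of zipWith, a standard Prelude function that this section does not introduce, is an assumption added only to show that a parenthesized operator can be passed around like any other function.

```haskell
-- Operators are ordinary functions; wrapping one in parentheses
-- lets us write it in prefix position.
main :: IO ()
main = do
  print (2 + 2)          -- infix form
  print ((+) 2 2)        -- the same expression in prefix form
  print ((*) 31337 101)  -- multiplication in prefix form
  -- A parenthesized operator can be handed to another function.
  print (zipWith (+) [1, 2, 3] [10, 20, 30])
```

Saved to a file and run with runghc, this should print 4, 4, 3165037, and [11,22,33].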
As these expressions imply, Haskell has a notion of integers and floating-point numbers. Integers can be arbitrarily large. Here, (^) provides integer exponentiation:
ghci> 313 ^ 15
27112218957718876716220410905036741257

Haskell presents us with one peculiarity in how we must write numbers: it’s often necessary to enclose a negative number in parentheses. This affects us as soon as we move beyond the simplest expressions.
We’ll start by writing a negative number:
ghci> -3
-3
The - used in the preceding code is a unary operator. In other words, we didn’t write the single number "-3"; we wrote the number "3" and applied the operator - to it. The - operator is Haskell’s only unary operator, and we cannot mix it with infix operators:
ghci> 2 + -3

<interactive>:1:0:
    precedence parsing error
        cannot mix `(+)' [infixl 6] and prefix `-' [infixl 6] in the same infix expression

If we want to use the unary minus near an infix operator, we must wrap the expression that it applies to in parentheses:
ghci> 2 + (-3)
-1
ghci> 3 + (-(13 * 37))
-478
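The parenthesization rule can be sketched in a small program. The functions negate and abs come from the standard Prelude; they are not mentioned in this section and appear here only as an illustration of where the parentheses matter.

```haskell
main :: IO ()
main = do
  print (2 + (-3))          -- parentheses around the negative number
  print (3 + (-(13 * 37)))  -- unary minus applied to a whole subexpression
  print (2 + negate 3)      -- negate is an ordinary function, so no extra parentheses
  -- Without the parentheses, abs -5 would be parsed as "abs minus 5".
  print (abs (-5))
```

This should print -1, -478, -1, and 5.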
End of content preview. The remaining content of this section is not viewable here.
Command-Line Editing in ghci
Content preview
On most systems, ghci has some amount of command-line editing ability. In case you are not familiar with command-line editing, it’s a huge time saver. The basics are common to both Unix-like and Windows systems. Pressing the up arrow key on your keyboard recalls the last line of input you entered; pressing up repeatedly cycles through earlier lines of input. You can use the left and right arrow keys to move around inside a line of input. On Unix (but not Windows, unfortunately), the Tab key completes partially entered identifiers.
We’ve barely scratched the surface of command-line editing here. Since you can work more effectively if you’re familiar with the capabilities of your command-line editing system, you might find it useful to do some further reading.
On Unix-like systems, ghci uses the GNU readline library, which is powerful and customizable. On Windows, ghci’s command-line editing capabilities are provided by the doskey command.
End of content preview. The remaining content of this section is not viewable here.
Lists
Content preview
A list is surrounded by square brackets; the elements are separated by commas:
ghci> [1, 2, 3]
[1,2,3]
Some languages permit the last element in a list to be followed by an optional trailing comma before a closing bracket, but Haskell doesn’t allow this. If you leave in a trailing comma (e.g., [1,2,]), you’ll get a parse error.
A list can be of any length. An empty list is written []:
ghci> []
[]
ghci> ["foo", "bar", "baz", "quux", "fnord", "xyzzy"]
["foo","bar","baz","quux","fnord","xyzzy"]
All elements of a list must be of the same type. Here, we violate this rule. Our list starts with two Bool values, but ends with a string:
ghci> [True, False, "testing"]

<interactive>:1:14:
    Couldn't match expected type `Bool' against inferred type `[Char]'
    In the expression: "testing"
    In the expression: [True, False, "testing"]
    In the definition of `it': it = [True, False, "testing"]

Once again, ghci’s error message is verbose, but it’s simply telling us that there is no way to turn the string into a Boolean value, so the list expression isn’t properly typed.
If we write a series of elements using enumeration notation, Haskell will fill in the contents of the list for us:
ghci> [1..10]
[1,2,3,4,5,6,7,8,9,10]
Here, the .. characters denote an enumeration. We can only use this notation for types whose elements we can enumerate. It makes no sense for text strings, for instance: there is not any sensible, general way to enumerate ["foo".."quux"].
By the way, notice that the preceding use of range notation gives us a closed interval; the list contains both endpoints.
When we write an enumeration, we can optionally specify the size of the step to use by providing the first two elements, followed by the value at which to stop generating the enumeration:
ghci> [1.0,1.25..2.0]
[1.0,1.25,1.5,1.75,2.0]
ghci> [1,4..15]
[1,4,7,10,13]
ghci> [10,9..1]
[10,9,8,7,6,5,4,3,2,1]
In the second case, the list is quite sensibly missing the endpoint of the enumeration, because it isn’t an element of the series we defined.
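Enumeration notation is shorthand for standard Prelude functions: the Haskell report defines [a..b] as enumFromTo a b and [a,b..c] as enumFromThenTo a b c. The following sketch, with the hypothetical name byThrees, checks that correspondence and shows that characters can be enumerated as well.

```haskell
-- A stepped enumeration whose stop value (15) is not part of the series.
byThrees :: [Int]
byThrees = [1, 4 .. 15]

main :: IO ()
main = do
  print byThrees
  -- The bracket notation and the explicit function give the same list.
  print (byThrees == enumFromThenTo 1 4 15)
  print ([1 .. 10] == enumFromTo 1 (10 :: Int))
  -- Char values can be enumerated too.
  print ['a' .. 'e']
```

This should print [1,4,7,10,13], True, True, and "abcde".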
We can omit the endpoint of an enumeration. If a type doesn’t have a natural
End of content preview. The remaining content of this section is not viewable here.
Strings and Characters
Content preview
If you know a language such as Perl or C, you’ll find Haskell’s notations for strings familiar.
A text string is surrounded by double quotes:
ghci> "This is a string."
"This is a string."
As in many languages, we can represent hard-to-see characters by "escaping" them. Haskell’s escape characters and escaping rules follow the widely used conventions established by the C language. For example, '\n' denotes a newline character, and '\t' is a tab character. The book covers the complete details separately.
ghci> putStrLn "Here's a newline -->\n<-- See?"
Here's a newline -->
<-- See?

The putStrLn function prints a string.
Haskell makes a distinction between single characters and text strings. A single character is enclosed in single quotes:
ghci> 'a'
'a'

In fact, a text string is simply a list of individual characters. Here’s a painful way to write a short string, which ghci gives back to us in a more familiar form:
ghci> let a = ['l', 'o', 't', 's', ' ', 'o', 'f', ' ', 'w', 'o', 'r', 'k']
ghci> a
"lots of work"
ghci> a == "lots of work"
True
The empty string is written "", and is a synonym for []:
ghci> "" == []
True
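Because a string is a list of characters, ordinary list functions apply to it. The sketch below assumes the standard Prelude functions length, reverse, null, and head, which this preview does not discuss for strings; the name word is hypothetical.

```haskell
-- A string written out as an explicit list of characters.
word :: String
word = ['l', 'i', 's', 't']

main :: IO ()
main = do
  print (word == "list")  -- both notations denote the same value
  print (length word)     -- number of characters
  print (reverse word)    -- list reversal works on strings
  print (head word)       -- the first element is a Char
  print (null "")         -- the empty string is the empty list
```

This should print True, 4, "tsil", 'l', and True.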

Since a string is a list of characters, we can use the regular list operators to construct new strings:
ghci> 'a':"bc"
"abc"
ghci> "foo" ++ "bar"
"foobar"
End of content preview. The remaining content of this section is not viewable here.
First Steps with Types
Content preview
While we’ve talked a little about types already, our interactions with ghci have so far been free of much type-related thinking. We haven’t told ghci what types we’ve been using, and it’s mostly been willing to accept our input.
Haskell requires type names to start with an uppercase letter, and variable names must start with a lowercase letter. Bear this in mind as you read on; it makes it much easier to follow the names.
The first thing we can do to start exploring the world of types is to get ghci to tell us more about what it’s doing. ghci has a command, :set, that lets us change a few of its default behaviors. We can tell it to print more type information as follows:
ghci> :set +t
ghci> 'c'
'c'
it :: Char
ghci> "foo"
"foo"
it :: [Char]
What the +t does is tell ghci to print the type of an expression after the expression. That cryptic it in the output can be very useful: it’s actually the name of a special variable, in which ghci stores the result of the last expression we evaluated. (This isn’t a Haskell language feature; it’s specific to ghci alone.) Let’s break down the meaning of the last line of ghci output:
  • It tells us about the special variable it.
  • We can read text of the form x :: y as meaning “the expression x has the type y.”
  • Here, the expression "it" has the type [Char]. (The name String is often used instead of [Char]. It is simply a synonym for [Char].)
End of content preview. The remaining content of this section is not viewable here.
A Simple Program
Content preview
Let’s take a small leap ahead and write a small program that counts the number of lines in its input. Don’t expect to understand this yet—it’s just fun to get our hands dirty. In a text editor, enter the following code into a file, and save it as WC.hs:
-- file: ch01/WC.hs
-- lines beginning with "--" are comments.

main = interact wordCount
    where wordCount input = show (length (lines input)) ++ "\n"
Find or create a text file; let’s call it quux.txt:
$ cat quux.txt
Teignmouth, England
Paris, France
Ulm, Germany
Auxerre, France
Brunswick, Germany
Beaumont-en-Auge, France
Ryazan, Russia
From a shell or command prompt, run the following command:
$ runghc WC < quux.txt
7
We have successfully written a simple program that interacts with the real world! In the chapters that follow, we will continue to fill the gaps in our understanding until we can write programs of our own.
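One possible variation on WC.hs, offered only as a sketch, also counts words and characters. The file name WCMore.hs and the function name summarize are hypothetical, and words is a standard Prelude function that the program above does not use.

```haskell
-- file: WCMore.hs (hypothetical name, not part of the book's examples)

main :: IO ()
main = interact summarize

-- Report the number of lines, words, and characters in the input,
-- separated by spaces and followed by a newline.
summarize :: String -> String
summarize input =
  unwords
    [ show (length (lines input))
    , show (length (words input))
    , show (length input)
    ] ++ "\n"
```

It can be run the same way as the original, for example with runghc WCMore < quux.txt.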
End of content preview. The remaining content of this section is not viewable here.
Chapter 2: Types and Functions
Why Care About Types?
Content preview
Every expression and function in Haskell has a type. For example, the value True has the type Bool, while the value "foo" has the type String. The type of a value indicates that it shares certain properties with other values of the same type. For example, we can add numbers and concatenate lists; these are properties of those types. We say an expression has type X, or is of type X.
Before we launch into a deeper discussion of Haskell’s type system, let’s talk about why we should care about types at all—what are they even for? At the lowest level, a computer is concerned with bytes, with barely any additional structure. What a type system gives us is abstraction. A type adds meaning to plain bytes: it lets us say "these bytes are text," "those bytes are an airline reservation," and so on. Usually, a type system goes beyond this to prevent us from accidentally mixing up types. For example, a type system usually won’t let us treat a hotel reservation as a car rental receipt.
The benefit of introducing abstraction is that it lets us forget or ignore low-level details. If I know that a value in my program is a string, I don’t have to know the intimate details of how strings are implemented. I can just assume that my string is going to behave like all the other strings I’ve worked with.
What makes type systems interesting is that they’re not all equal. In fact, different type systems are often not even concerned with the same kinds of problems. A programming language’s type system deeply colors the way we think and write code in that language.
Haskell’s type system allows us to think at a very abstract level, and it permits us to write concise, powerful programs.
End of content preview. The remaining content of this section is not viewable here.
Haskell’s Type System
Content preview
There are three interesting aspects to types in Haskell: they are strong, they are static, and they can be automatically inferred. Let’s talk in more detail about each of these ideas. When possible, we’ll present similarities between concepts from Haskell’s type system and related ideas in other languages. We’ll also touch on the respective strengths and weaknesses of each of these properties.
When we say that Haskell has a strong type system, we mean that the type system guarantees that a program cannot contain certain kinds of errors. These errors come from trying to write expressions that don’t make sense, such as using an integer as a function. For instance, if a function expects to work with integers and we pass it a string, a Haskell compiler will reject this.
We call an expression that obeys a language’s type rules well typed. An expression that disobeys the type rules is ill typed, and it will cause a type error.
Another aspect of Haskell’s view of strong typing is that it will not automatically coerce values from one type to another. (Coercion is also known as casting or conversion.) For example, a C compiler will automatically and silently coerce a value of type int into a float on our behalf if a function expects a parameter of type float, but a Haskell compiler will raise a compilation error in a similar situation. We must explicitly coerce types by applying coercion functions.
Strong typing does occasionally make it more difficult to write certain kinds of code. For example, a classic way to write low-level code in the C language is to be given a byte array and cast it to treat the bytes as if they’re really a complicated data structure. This is very efficient, since it doesn’t require us to copy the bytes around. Haskell’s type system does not allow this sort of coercion. In order to get the same structured view of the data, we would need to do some copying, which would cost a little in performance.
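To make the point about explicit coercion concrete, here is a minimal sketch. The function name average is hypothetical; fromIntegral and truncate are standard Prelude conversion functions that this passage does not name, so treat their use here as one illustrative choice.

```haskell
-- Dividing requires a fractional type, so each Int must be
-- converted explicitly; Haskell will not do it silently.
average :: [Int] -> Double
average xs = fromIntegral (sum xs) / fromIntegral (length xs)

main :: IO ()
main = do
  print (average [1, 2, 3, 4])
  -- Going the other way also takes an explicit function call.
  print (truncate (3.7 :: Double) :: Int)
```

This should print 2.5 and 3. Writing sum xs / length xs without the conversions is rejected by the compiler, because (/) is not defined for Int.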
End of content preview. The remaining content of this section is not viewable here.
What to Expect from the Type System
Content preview
Our exploration of the major capabilities and benefits of Haskell’s type system will span a number of chapters. Early on, you may find Haskell’s types to be a chore to deal with.
For example, instead of simply writing some code and running it to see if it works as you might expect in Python or Ruby, you’ll first need to make sure that your program passes the scrutiny of the type checker. Why stick with the learning curve?
While strong, static typing makes Haskell safe, type inference makes it concise. The result is potent: we end up with a language that’s safer than popular statically typed languages and often more expressive than dynamically typed languages. This is a strong claim to make, and we will back it up with evidence throughout the book.
Fixing type errors may initially feel like more work than using a dynamic language. It might help to look at this as moving much of your debugging up front. The compiler shows you many of the logical flaws in your code, instead of leaving you to stumble across problems at runtime.
Furthermore, because Haskell can infer the types of your expressions and functions, you gain the benefits of static typing without the added burden of "finger typing" imposed by less powerful statically typed languages. In other languages, the type system serves the needs of the compiler. In Haskell, it serves you. The trade-off is that you have to learn to work within the framework it provides.
We will introduce new uses of Haskell’s types throughout this book to help us write and test practical code. As a result, the complete picture of why the type system is worthwhile will emerge gradually. While each step should justify itself, the whole will end up greater than the sum of its parts.
End of content preview. The remaining content of this section is not viewable here.
Some Common Basic Types
Content preview
In "First Steps with Types," we introduced a few types. Here are several more of the most common base types:
A Char value
Represents a Unicode character.
A Bool value
Represents a value in Boolean logic. The possible values of type Bool are True and False.
The Int type
Used for signed, fixed-width integer values. The exact range of values represented as Int depends on the system’s longest "native" integer: on a 32-bit machine, an Int is usually 32 bits wide, while on a 64-bit machine, it is usually 64 bits wide. The Haskell standard guarantees only that an Int is wider than 28 bits. (Numeric types exist that are exactly 8, 16, and so on bits wide, in signed and unsigned flavors; we’ll get to those later.)
An Integer value
A signed integer of unbounded size. Integers are not used as often as Ints, because they are more expensive both in performance and space consumption. On the other hand, Integer computations do not silently overflow, so they give more reliably correct answers.
Values of type Double
Used for floating-point numbers. A Double value is typically 64 bits wide and uses the system’s native floating-point representation. (A narrower type, Float, also exists, but its use is discouraged; Haskell compiler writers concentrate more on making Double efficient, so Float is much slower.)
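The difference between Int and Integer can be sketched as follows. The wraparound shown on the last line is what GHC does in practice; the Haskell standard leaves the result of fixed-width overflow undefined, so that line is an assumption about the implementation rather than a guarantee.

```haskell
main :: IO ()
main = do
  -- The largest Int depends on the platform's native word size.
  print (maxBound :: Int)
  -- Integer has no upper bound, so this value is exact.
  print (2 ^ 64 :: Integer)
  -- 25 factorial is too large for a 64-bit Int but is no problem for Integer.
  print (product [1 .. 25 :: Integer])
  -- Under GHC, Int arithmetic silently wraps around on overflow.
  print ((maxBound :: Int) + 1 == minBound)
```

The second line of output should be 18446744073709551616; the first depends on the machine.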
We have already briefly seen Haskell’s notation for types in "First Steps with Types." When we write a type explicitly, we use the notation expression :: MyType to say that expression has the type MyType. If we omit the :: and the type that follows, a Haskell compiler will infer the type of the expression:
ghci> :type 'a'
'a' :: Char
ghci> 'a' :: Char
'a'
ghci> [1,2,3] :: Int

<interactive>:1:0:
    Couldn't match expected type `Int' against inferred type `[a]'
    In the expression: [1, 2, 3] :: Int
    In the definition of `it': it = [1, 2, 3] :: Int
The combination of :: and the type after it is called a type signature.
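In a source file, type signatures are usually written on their own line above a definition. The sketch below uses hypothetical names (answer, initial, names) and also shows a signature attached to an expression, this time with a type that matches.

```haskell
-- Each definition is preceded by its type signature.
answer :: Int
answer = 42

initial :: Char
initial = 'a'

names :: [String]
names = ["foo", "bar"]

main :: IO ()
main = do
  print answer
  print initial
  print names
  -- A signature on an expression: a list of Int, not a single Int.
  print ([1, 2, 3] :: [Int])
```

This should print 42, 'a', ["foo","bar"], and [1,2,3].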
End of content preview. The remaining content of this section is not viewable here.
Function Application
Content preview
Now that we’ve had our fill of data types for a while, let’s turn our attention to working with some of the types we’ve seen, using functions.
To apply a function in Haskell, we write the name of the function followed by its arguments:
ghci> odd 3
True
ghci> odd 6
False
We don’t use parentheses or commas to group or separate the arguments to a function; merely writing the name of the function, followed by each argument in turn, is enough. As an example, let’s apply the compare function, which takes two arguments:
ghci> compare 2 3
LT
ghci> compare 3 3
EQ
ghci> compare 3 2
GT
If you’re used to function call syntax in other languages, this notation can take a little getting used to, but it’s simple and uniform.
Function application has higher precedence than using operators, so the following two expressions have the same meaning:
ghci> (compare 2 3) == LT
True
ghci> compare 2 3 == LT
True
The parentheses in the preceding code don’t do any harm, but they add some visual noise. Sometimes, however, we must use parentheses to indicate how we want a complicated expression to be parsed:
ghci> compare (sqrt 3) (sqrt 6)
LT
This applies compare to the results of applying sqrt 3 and sqrt 6, respectively. If we omit the parentheses, it looks like we are trying to pass four arguments to compare, instead of the two it accepts.
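The precedence rule is easy to check with a short program. The calls to min, a standard Prelude function not used in this section, are an added illustration of how the grouping changes the result.

```haskell
main :: IO ()
main = do
  print (compare 2 3 == LT)          -- application binds tighter than ==
  print (compare (sqrt 3) (sqrt 6))  -- parentheses group each argument
  print (min 2 3 + 1)                -- parsed as (min 2 3) + 1
  print (min 2 (3 + 1))              -- here the sum is the second argument
```

This should print True, LT, 3, and 2.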
End of content preview. The remaining content of this section is not viewable here.
Useful Composite Data Types: Lists and Tuples
Content preview
A composite data type is constructed from other types. The most common composite data types in Haskell are lists and tuples.
We’ve already seen the list type mentioned earlier in "Strings and Characters," where we found that Haskell represents a text string as a list of Char values, and that the type "list of Char" is written [Char].
The head function returns the first element of a list:
ghci> head [1,2,3,4]
1
ghci> head ['a','b','c']
'a'
Its counterpart, tail, returns all but the head of a list:
ghci> tail [1,2,3,4]
[2,3,4]
ghci> tail [2,3,4]
[3,4]
ghci> tail [True,False]
[False]
ghci> tail "list"
"ist"
ghci> tail []
*** Exception: Prelude.tail: empty list
As you can see, we can apply head and tail to lists of different types. Applying head to a [Char] value returns a Char value, while applying it to a [Bool] value returns a Bool value. The head function doesn’t care what type of list it deals with.
Because the values in a list can have any type, we call the list type polymorphic. When we want to write a polymorphic type, we use a type variable, which must begin with a lowercase letter. A type variable is a placeholder, where we’ll eventually substitute a real type.
We can write the type "list of a" by enclosing the type variable in square brackets: [a]. This amounts to saying, "I don’t care what type I have; I can make a list with it."
We can now see why a type name must start with an uppercase letter: it makes it distinct from a type variable, which must start with a lowercase letter.
When we talk about a list with values of a specific type, we substitute that type for our type variable. So, for example, the type [Int] is a list of values of type Int, because we substituted Int for a. Similarly, the type [MyPersonalType] is a list of values of type MyPersonalType. We can perform this substitution recursively, too: [[Int]] is a list of values of type [Int], i.e., a list of lists of Int.
ghci> :type [[True],[False,False]]
[[True],[False,False]] :: [[Bool]]

The type of this expression is a list of lists of Bool.
Lists are the bread and butter of Haskell collections. In an imperative language, we might perform a task many times by iterating through a loop. This is something that we often do in Haskell by traversing a list, either by recursing or using a function that recurses for us. Lists are the easiest stepping stone into the idea that we can use data to structure our program and its control flow. We’ll be spending a lot more time discussing lists in a later chapter.
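As a sketch of traversing a list by recursion, the hypothetical function myLength below counts elements. Its type uses the type variable a, so it works on a list of any element type. It relies on pattern matching, a feature this preview has not yet introduced, so read it as a look ahead rather than as the book's own example.

```haskell
-- Count the elements of a list of any element type.
myLength :: [a] -> Int
myLength []       = 0                -- an empty list has no elements
myLength (_ : xs) = 1 + myLength xs  -- one for the head, then recurse on the tail

main :: IO ()
main = do
  print (myLength [1, 2, 3, 4 :: Int])
  print (myLength "list")
  print (myLength [[True], [False, False]])
```

This should print 4, 4, and 2.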
End of content preview. The remaining content of this section is not viewable here.
Functions over Lists and Tuples
Content preview
Our discussion of lists and tuples mentioned how we can construct them but little about how we do anything with them afterwards. So far we have only been introduced to two list functions, head and tail.
Two related list functions, take and drop, take two arguments. Given a number n and a list, take returns the first n elements of the list, while drop returns all but the first n elements of the list. (As these functions take two arguments, notice that we separate each function and its arguments using whitespace.)
ghci> take 2 [1,2,3,4,5]
[1,2]
ghci> drop 3 [1,2,3,4,5]
[4,5]
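Since take and drop split a list at the same position, the two can be combined into a pair. The function name splitAtIndex is hypothetical (the Prelude already offers splitAt for this purpose); the sketch also shows that asking for more elements than a list holds is not an error.

```haskell
-- Split a list into its first n elements and the rest.
splitAtIndex :: Int -> [a] -> ([a], [a])
splitAtIndex n xs = (take n xs, drop n xs)

main :: IO ()
main = do
  print (splitAtIndex 2 [1, 2, 3, 4, 5 :: Int])
  print (take 10 "abc")  -- fewer than 10 elements: the whole list comes back
  print (drop 10 "abc")  -- nothing is left after dropping everything
```

This should print ([1,2],[3,4,5]), "abc", and "".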
For tuples, the fst and snd functions return the first and second element of a pair, respectively:
ghci> fst (1,'a')
1
ghci> snd (1,'a')
'a'
If your background is in any of a number of other languages, each of these may look like an application of a function to two arguments. Under Haskell’s convention for function application, each one is an application of a function to a single pair.
If you are coming from the Python world, you’ll probably be used to lists and tuples being almost interchangeable. Although the elements of a Python tuple are immutable, it can be indexed and iterated over using the same methods as a list. This isn’t the case in Haskell, so don’t try to carry that idea with you into unfamiliar linguistic territory.
As an illustration, take a look at the type signatures of fst and snd: they’re defined only for pairs and can’t be used with tuples of other sizes. Haskell’s type system makes it tricky to write a generalized "get the second element from any tuple, no matter how wide" function.
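Because fst and snd accept only pairs, a tuple of another size needs its own accessor. The sketch below defines one for triples under the hypothetical name secondOfThree, using pattern matching, which this preview has not yet covered.

```haskell
-- An accessor that works only on three-element tuples.
secondOfThree :: (a, b, c) -> b
secondOfThree (_, y, _) = y

main :: IO ()
main = do
  print (fst (1 :: Int, 'a'))
  print (snd (1 :: Int, 'a'))
  print (secondOfThree (1 :: Int, "two", 'c'))
```

This should print 1, 'a', and "two". Applying snd to the triple instead would be a type error, since a triple is a different type from a pair.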
In Haskell, function application is left-associative. This is best illustrated by example: the expression a b c d is equivalent to (((a b) c) d). If we want to use one expression as an argument to another, we have to use explicit parentheses to tell the parser what we really mean. Here’s an example:
ghci> head (drop 4 "azerty")
't'

We can read this as “pass the expression drop 4 "azerty" as the argument to head.” If we were to leave out the parentheses, the offending expression would be similar to passing three arguments to
End of content preview. The remaining content of this section is not viewable here.
Function Types and Purity
Content preview
Let’s take a look at a function’s type:
ghci> :type lines
lines :: String -> [String]

We can read the -> as "to," which loosely translates to "returns." The signature as a whole thus reads as "lines has the type String to list-of-String". Let’s try applying the function:
ghci> lines "the quick\nbrown fox\njumps"
["the quick","brown fox","jumps"]

The lines function splits a string on line boundaries. Notice that its type signature gives us a hint as to what the function might actually do: it takes one String, and returns many. This is an incredibly valuable property of types in a functional language.
A side effect introduces a dependency between the global state of the system and the behavior of a function. For example, let’s step away from Haskell for a moment and think about an imperative programming language. Consider a function that reads and returns the value of a global variable. If some other code can modify that global variable, then the result of a particular application of our function depends on the current value of the global variable. The function has a side effect, even though it never modifies the variable itself.
Side effects are essentially invisible inputs to, or outputs from, functions. In Haskell, the default is for functions to not have side effects: the result of a function depends only on the inputs that we explicitly provide. We call these functions pure; functions with side effects are impure.
If a function has side effects, we can tell by reading its type signature—the type of the function’s result will begin with IO:
ghci> :type readFile
readFile :: FilePath -> IO String

Haskell’s type system prevents us from accidentally mixing pure and impure code.
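As a small sketch of how this separation shows up in signatures, compare a pure function with an impure one built on top of it. The file name and the names lineCount and fileLineCount are hypothetical; fmap is used here only to apply the pure function to the result of the IO action.

```haskell
-- file: PureVsIO.hs (hypothetical file name)

-- Pure: the result depends only on the String we pass in.
lineCount :: String -> Int
lineCount text = length (lines text)

-- Impure: the IO in the result type signals that reading a file
-- is a side effect.
fileLineCount :: FilePath -> IO Int
fileLineCount path = fmap lineCount (readFile path)
```

The compiler will not let us use the result of fileLineCount where a plain Int is expected, which is how the types keep the two kinds of code apart.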
End of preview. The rest of this section is not available here.
Haskell Source Files, and Writing Simple Functions
Preview
Now that we know how to apply functions, it’s time we turned our attention to writing them. While we can write functions in ghci, it’s not a good environment for this. It accepts only a highly restricted subset of Haskell—most importantly, the syntax it uses for defining functions is not the same as we use in a Haskell source file. Instead, we’ll finally break down and create a source file.
Haskell source files are usually identified with a suffix of .hs. To write a simple function definition, open up a file named add.hs and add these contents to it:
-- file: ch03/add.hs

add a b = a + b
On the lefthand side of the = is the name of the function, followed by the arguments to the function. On the righthand side is the body of the function. With our source file saved, we can load it into ghci, and use our new add function straightaway (the prompt that ghci displays will change after you load your file):
ghci> :load add.hs
[1 of 1] Compiling Main             ( add.hs, interpreted )
Ok, modules loaded: Main.
ghci> add 1 2
3
When you run ghci, it may not be able to find your source file. It searches for source files in whatever directory it was run from. If this is not the directory that your source file is actually in, you can use ghci’s :cd command to change its working directory:
ghci> :cd /tmp
Alternatively, you can provide the path to your Haskell source file as the argument to :load. This path can be either absolute or relative to ghci’s current directory.
When we apply add to the values 1 and 2, the variables a and b on the lefthand side of our definition are given (or "bound to") the values 1 and 2, so the result is the expression 1 + 2.
Haskell doesn’t have a return keyword, because a function is a single expression, not a sequence of statements. The value of the expression is the result of the function. (Haskell does have a function called return, but we won’t discuss it for a while; it has a different meaning than in imperative languages.)
When you see an = symbol in Haskell code, it represents
End of preview. The rest of this section is not available here.
Understanding Evaluation by Example
Preview
In our description of myDrop, we have so far focused on surface features. We need to go deeper and develop a useful mental model of how function application works. To do this, we’ll first work through a few simple examples, until we can walk through the evaluation of the expression myDrop 2 "abcd".
We’ve talked a lot about substituting an expression for a variable, and we’ll make use of this capability here. Our procedure will involve rewriting expressions over and over, substituting expressions for variables until we reach a final result. This would be a good time to fetch a pencil and paper, so you can follow our descriptions by trying them yourself.
We will begin by looking at the definition of a simple, nonrecursive function:
-- file: ch02/RoundToEven.hs

isOdd n = mod n 2 == 1
Here, mod is the standard modulo function. The first big step to understanding how evaluation works in Haskell is figuring out the result of evaluating the expression isOdd (1 + 2).
Before we explain how evaluation proceeds in Haskell, let us recap the sort of evaluation strategy that more familiar languages use. First, evaluate the subexpression 1 + 2, to give 3. Then apply the isOdd function with n bound to 3. Finally, evaluate mod 3 2 to give 1, and 1 == 1 to give True.
In a language that uses strict evaluation, the arguments to a function are evaluated before the function is applied. Haskell chooses another path: nonstrict evaluation.
In Haskell, the subexpression 1 + 2 is not reduced to the value 3. Instead, we create a "promise" that when the value of the expression is needed, we’ll be able to compute it. The record that we use to track an unevaluated expression is referred to as a thunk. This is all that happens: we create a thunk and defer the actual evaluation until it’s really needed. If the result of this expression is never subsequently used, we will not compute its value at all.
Nonstrict evaluation is often referred to as lazy evaluation.
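One way to observe that an unused thunk is never evaluated is to pass an argument that would abort the program if it were ever computed. This is only a sketch: the file name and the names firstOf and lazyResult are hypothetical, and error stands in for any expression we would not want to evaluate.

```haskell
-- file: LazyDemo.hs (hypothetical file name)

isOdd :: Int -> Bool
isOdd n = mod n 2 == 1

-- The second argument is never used, so its thunk is never evaluated.
firstOf :: a -> b -> a
firstOf x _ = x

-- error would abort evaluation if its thunk were ever forced;
-- here it is not, so lazyResult is simply the value of isOdd (1 + 2).
lazyResult :: Bool
lazyResult = firstOf (isOdd (1 + 2)) (error "this thunk is never forced")
```

Evaluating lazyResult in ghci gives True. Under strict evaluation, the call to error would run before firstOf was applied.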
Let us now look at the evaluation of the expression myDrop 2 "abcd", where we use
End of preview. The rest of this section is not available here.
Polymorphism in Haskell
Preview
When we introduced lists, we mentioned that the list type is polymorphic. We’ll talk about Haskell’s polymorphism in more detail here.
If we want to fetch the last element of a list, we use the last function. The value that it returns must have the same type as the elements of the list, but last operates in the same way no matter what type those elements actually are:
ghci> last [1,2,3,4,5]
5
ghci> last "baz"
'z'
To capture this idea, its type signature contains a type variable:
ghci> :type last
last :: [a] -> a

Here, a is the type variable. We can read the signature as “takes a list, all of whose elements have some type a, and returns a value of the same type a.”
Type variables always start with a lowercase letter. You can always tell a type variable from a normal variable by context, because the languages of types and functions are separate: type variables live in type signatures, and regular variables live in normal expressions.
It’s common Haskell practice to keep the names of type variables very short. One letter is overwhelmingly common; longer names show up infrequently. Type signatures are usually brief; we gain more in readability by keeping names short than we would by making them descriptive.
When a function has type variables in its signature, indicating that some of its arguments can be of any type, we call the function polymorphic.
When we want to apply last to, say, a list of Char, the compiler substitutes Char for each a throughout the type signature. This gives us the type of last with an input of [Char] as [Char] -> Char.
This kind of polymorphism is called parametric polymorphism. The choice of naming is easy to understand by analogy: just as a function can have parameters that we can later bind to real values, a Haskell type can have parameters that we can later bind to other types.
If a type contains type parameters, we say that it is a parameterized type, or a polymorphic type. If a function or value’s type contains type parameters, we call it polymorphic.
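The substitution of a concrete type for a type variable can be sketched with a small polymorphic function of our own. The file name and the names lastOrDefault and lastChar are hypothetical; lastChar is the same function with Char written in place of every a.

```haskell
-- file: Polymorphic.hs (hypothetical file name)

-- Works for a list of any element type a; the code never inspects
-- the elements themselves.
lastOrDefault :: a -> [a] -> a
lastOrDefault def [] = def
lastOrDefault _   xs = last xs

-- The same function with Char substituted for a throughout.
lastChar :: Char -> [Char] -> Char
lastChar = lastOrDefault
```

Here lastOrDefault 0 [1,2,3] evaluates to 3, and lastChar '?' "baz" evaluates to 'z'.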
When we see a parameterized type, we’ve already noted that the code doesn’t care what the actual type is. However, we can make a stronger statement:
End of preview. The rest of this section is not available here.
The Type of a Function of More Than One Argument
Preview
So far, we haven’t looked much at signatures for functions that take more than one argument. We’ve already used a few such functions; let’s look at the signature of one, take:
ghci> :type take
take :: Int -> [a] -> [a]

It’s pretty clear that there’s something going on with an Int and some lists, but why are there two -> symbols in the signature? Haskell groups this chain of arrows from right to left; that is, -> is right-associative. If we introduce parentheses, we can make it clearer how this type signature is interpreted:
-- file: ch02/Take.hs

take :: Int -> ([a] -> [a])
From this, it looks like we ought to read the type signature as a function that takes one argument, an Int, and returns another function. That other function also takes one argument, a list, and returns a list of the same type as its result.
This is correct, but it’s not easy to see what its consequences might be. We’ll return to this topic later, once we’ve spent a bit of time writing functions. For now, we can treat the type following the last -> as being the function’s return type, and the preceding types as those of the function’s arguments.
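As a brief, hedged illustration of the "returns another function" reading, we can supply only the Int argument to take and name what comes back. The file name and the names firstTwo and takeExplicit are hypothetical.

```haskell
-- file: PartialTake.hs (hypothetical file name)

-- Supplying only the Int leaves a function that still expects a list.
firstTwo :: [a] -> [a]
firstTwo = take 2

-- The explicitly parenthesized signature means the same thing as
-- Int -> [a] -> [a].
takeExplicit :: Int -> ([a] -> [a])
takeExplicit n = take n
```

Applying firstTwo to [1,2,3,4,5] gives [1,2], the same result as take 2 [1,2,3,4,5].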
We can now write a type signature for the myDrop function that we defined earlier:
-- file: ch02/myDrop.hs

myDrop :: Int -> [a] -> [a]
End of preview. The rest of this section is not available here.
Why the Fuss over Purity?
Preview
Few programming languages go as far as Haskell in insisting that purity should be the default. This choice has profound and valuable consequences.
Because the result of applying a pure function can only depend on its arguments, we can often get a strong hint of what a pure function does by simply reading its name and understanding its type signature. As an example, let’s look at not:
ghci> :type not
not :: Bool -> Bool

Even if we don’t know the name of this function, its signature alone limits the possible valid behaviors it could have:
  • Ignore its argument and always return either True or False.
  • Return its argument unmodified.
  • Negate its argument.
We also know that this function cannot do some things: it cannot access files, talk to the network, or tell what time it is.
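The behaviors listed above can be written out in full, which shows how small the space of possibilities is. The file name and function names below are hypothetical; on the two ordinary inputs True and False, any pure function of type Bool -> Bool agrees with one of these four.

```haskell
-- file: BoolFunctions.hs (hypothetical file name)

-- Ignore the argument and always return True.
alwaysTrue :: Bool -> Bool
alwaysTrue _ = True

-- Ignore the argument and always return False.
alwaysFalse :: Bool -> Bool
alwaysFalse _ = False

-- Return the argument unmodified.
unchanged :: Bool -> Bool
unchanged b = b

-- Negate the argument, exactly as not does.
negated :: Bool -> Bool
negated b = not b
```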
Purity makes the job of understanding code easier. The behavior of a pure function does not depend on the value of a global variable, or the contents of a database, or the state of a network connection. Pure code is inherently modular: every function is self-contained and has a well-defined interface.
A nonobvious consequence of purity being the default is that working with impure code becomes easier. Haskell encourages a style of programming in which we separate code that must have side effects from code that doesn’t need side effects. In this style, impure code tends to be simple, with the "heavy lifting" performed in pure code.
Much of the risk in software lies in talking to the outside world, be it coping with bad or missing data or handling malicious attacks. Because Haskell’s type system tells us exactly which parts of our code have side effects, we can be appropriately on guard. Because our favored coding style keeps impure code isolated and simple, our "attack surface" is small.
End of preview. The rest of this section is not available here.
Conclusion
Preview
In this chapter, we’ve had a whirlwind overview of Haskell’s type system and much of its syntax. We’ve read about the most common types and discovered how to write simple functions. We’ve been introduced to polymorphism, conditional expressions, purity, and lazy evaluation.
This all amounts to a lot of information to absorb. In Chapter 3, we’ll build on this basic knowledge to further enhance our understanding of Haskell.
End of preview. The rest of this section is not available here.
Chapter 3: Defining Types, Streamlining Functions
Defining a New Data Type
Preview
Although lists and tuples are useful, we’ll often want to construct new data types of our own. This allows us to add structure to the values in our programs. Instead of using an anonymous tuple, we can give a collection of related values a name and a distinct type. Defining our own types also improves the type safety of our code: Haskell will not allow us to accidentally mix values of two types that are structurally similar but have different names.
For motivation, we’ll consider a few kinds of data that a small online bookstore might need to manage. We won’t make any attempt at complete or realistic data definitions, but at least we’re tying them to the real world.
We define a new data type using the data keyword:
-- file: ch03/BookStore.hs

data BookInfo = Book Int String [String]

                deriving (Show)
BookInfo after the data keyword is the name of our new type. We call BookInfo a type constructor. Once we define a type, we will use its type constructor to refer to it. As we’ve already mentioned, a type name, and hence a type constructor, must start with a capital letter.
The Book that follows is the name of the value constructor (sometimes called a data constructor). We use this to create a value of the BookInfo type. A value constructor’s name must also start with a capital letter.
After Book, the Int, String, and [String] that follow are the components of the type. A component serves the same purpose in Haskell as a field in a structure or class would in another language: it’s a "slot" where we keep a value. (We’ll often refer to components as fields.)
In this example, the Int represents a book’s identifier (e.g., in a stock database), String represents its title, and [String] represents the names of its authors.
To make the link to a concept we’ve already seen, the BookInfo type contains the same components as a 3-tuple of type (Int, String, [String]), but it has a distinct type. We can’t accidentally (or deliberately) use one in a context where the other is expected. For instance, a bookstore is also likely to carry magazines:
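As a sketch of the idea, assume a second type with exactly the same components but a different name. MagazineInfo, Magazine, the file name, and the two sample values are illustrative assumptions, not definitions taken from the text above.

```haskell
-- file: BookStoreSketch.hs (hypothetical file name)

data BookInfo = Book Int String [String]
                deriving (Show)

-- Hypothetical type: same components as BookInfo, different name,
-- and therefore a different type.
data MagazineInfo = Magazine Int String [String]
                    deriving (Show)

-- We create values by applying the value constructor to its components.
myInfo :: BookInfo
myInfo = Book 9780135072455 "Algebra of Programming"
         ["Richard Bird", "Oege de Moor"]

-- Illustrative sample value only.
myMagazine :: MagazineInfo
myMagazine = Magazine 42 "Hypothetical Monthly" ["A. Editor"]
```

Even though the two types are structurally identical, the type checker rejects any attempt to use myMagazine where a BookInfo is required.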
End of preview. The rest of this section is not available here.
Type Synonyms
Preview
We can introduce a synonym for an existing type at any time, in order to give a type a more descriptive name. For example, the String in our BookReview type doesn’t tell us what the string is for, but we can clarify this:
-- file: ch03/BookStore.hs

type CustomerID = Int

type ReviewBody = String



data BetterReview = BetterReview BookInfo CustomerID ReviewBody
The type keyword introduces a type synonym. The new name is on the left of the =, with the existing name on the right. The two names identify the same type, so type synonyms are purely for making code more readable.
We can also use a type synonym to create a shorter name for a verbose type:
-- file: ch03/BookStore.hs

type BookRecord = (BookInfo, BookReview)
This states that we can use BookRecord as a synonym for the tuple (BookInfo, BookReview). A type synonym creates only a new name that refers to an existing type. We still use the same value constructors to create a value of the type.
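Because a synonym is only another name, values move freely between the synonym and the type it stands for. The following sketch assumes hypothetical names (nextCustomer, plainInt, result) and a hypothetical file name.

```haskell
-- file: Synonyms.hs (hypothetical file name)

type CustomerID = Int

-- A function written against the synonym...
nextCustomer :: CustomerID -> CustomerID
nextCustomer cid = cid + 1

-- ...accepts a plain Int, because both names denote the same type.
plainInt :: Int
plainInt = 41

result :: Int
result = nextCustomer plainInt
```

This also means a synonym adds no type safety: unlike a type introduced with data, it cannot stop us from passing an unrelated Int where a CustomerID is expected.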
End of preview. The rest of this section is not available here.
Algebraic Data Types
Preview
The familiar Bool is the simplest common example of a category of type called an algebraic data type. An algebraic data type can have more than one value constructor:
-- file: ch03/Bool.hs

data Bool = False | True
The Bool type has two value constructors, True and False. Each value constructor is separated in the definition by a | character, which we can read as "or": we can construct a Bool that has the value True, or the value False. When a type has more than one value constructor, they are usually referred to as alternatives or cases. We can use any one of the alternatives to create a value of that type.
Although the phrase "algebraic data type" is long, we’re being careful to avoid using the acronym "ADT," which is already widely understood to stand for "abstract data type." Since Haskell supports both algebraic and abstract data types, we’ll be explicit and avoid the acronym entirely.
Each of an algebraic data type’s value constructors can take zero or more arguments. As an example, here’s one way we might represent billing information:
-- file: ch03/BookStore.hs

type CardHolder = String

type CardNumber = String

type Address = [String]



data BillingInfo = CreditCard CardNumber CardHolder Address

                 | CashOnDelivery

                 | Invoice CustomerID

                   deriving (Show)
Here, we’re saying that we support three ways to bill our customers. If they want to pay by credit card, they must supply a card number, the holder’s name, and the holder’s billing address as arguments to the CreditCard value constructor. Alternatively, they can pay the person who delivers their shipment. Since we don’t need to store any extra information about this, we specify no arguments for the CashOnDelivery constructor. Finally, we can send an invoice to the specified customer, in which case we need her CustomerID as an argument to the Invoice constructor.
When we use a value constructor to create a value of type BillingInfo, we must supply the arguments that it requires:
ghci> :type CreditCard
CreditCard :: CardNumber -> CardHolder -> Address -> BillingInfo
ghci> CreditCard "2901650221064486" "Thomas Gradgrind" ["Dickens", "England"]
CreditCard "2901650221064486" "Thomas Gradgrind" ["Dickens","England"]
ghci> :type it
it :: BillingInfo
ghci> Invoice

<interactive>:1:0:
    No instance for (Show (CustomerID -> BillingInfo))
      arising from a use of `print' at <interactive>:1:0-6
    Possible fix:
      add an instance declaration for (Show (CustomerID -> BillingInfo))
    In the expression: print it
    In a stmt of a 'do' expression: print it
ghci> :type it
it :: BillingInfo
End of preview. The rest of this section is not available here.
Pattern Matching
Preview
Now that we’ve seen how to construct values with algebraic data types, let’s discuss how we work with these values. If we have a value of some type, there are two things we would like to be able to do:
  • If the type has more than one value constructor, we need to be able to tell which value constructor was used to create the value.
  • If the value constructor has data components, we need to be able to extract those values.
Haskell has a simple, but tremendously useful, pattern matching facility that lets us do both of these things.
A pattern lets us look inside a value and bind variables to the data it contains. Here’s an example of pattern matching in action on a Bool value; we’re going to reproduce the not function:
-- file: ch03/add.hs

myNot True  = False

myNot False = True
It might seem that we have two functions named myNot here, but Haskell lets us define a function as a series of equations: these two clauses are defining the behavior of the same function for different patterns of input. On each line, the patterns are the items following the function name, up until the = sign.
To understand how pattern matching works, let’s step through an example, say, myNot False.
When we apply myNot, the Haskell runtime checks the value we supply against the value constructor in the first pattern. This does not match, so it tries against the second pattern. That match succeeds, so it uses the righthand side of that equation as the result of the function application.
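The same mechanism covers both goals named at the start of this section: telling constructors apart and extracting their components. As a sketch, here is a function over the BillingInfo type; the function name describeBilling and the file name are hypothetical, and the _ in a pattern means "match anything here and ignore it."

```haskell
-- file: BillingPatterns.hs (hypothetical file name)

type CustomerID = Int
type CardHolder = String
type CardNumber = String
type Address = [String]

data BillingInfo = CreditCard CardNumber CardHolder Address
                 | CashOnDelivery
                 | Invoice CustomerID
                   deriving (Show)

-- Each equation names one constructor and binds the components it needs.
describeBilling :: BillingInfo -> String
describeBilling (CreditCard _ holder _) = "card held by " ++ holder
describeBilling CashOnDelivery          = "cash on delivery"
describeBilling (Invoice cid)           = "invoice for customer " ++ show cid
```

For example, describeBilling (Invoice 7) fails to match the first two equations, matches the third with cid bound to 7, and evaluates to "invoice for customer 7".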
Here is a slightly more extended example. This function adds together the elements of a list:
-- file: ch03/add.hs

sumList (x:xs) = x + sumList xs

sumList []     = 0
Let us step through the evaluation of sumList [1,2]. The list notation [1,2] is shorthand for the expression (1:(2:[])). We begin by trying to match the pattern in the first equation of the definition of sumList. In the (x:xs) pattern, the : is the familiar list constructor, (:). We are now using it to match against a value, not to construct one. The value (1:(2:[])) was constructed with (:), so the constructor in the value matches the constructor in the pattern. We say that the pattern
End of preview. The rest of this section is not available here.
Record Syntax
Preview
Writing accessor functions for each of a data type’s components can be repetitive and tedious:
-- file: ch03/BookStore.hs

nicerID      (Book id _     _      ) = id

nicerTitle   (Book _  title _      ) = title

nicerAuthors (Book _  _     authors) = authors
We call this kind of code boilerplate: necessary, but bulky and irksome. Haskell programmers don’t like boilerplate. Fortunately, the language addresses this particular boilerplate problem: we can define a data type, and accessors for each of its components, simultaneously. (The position of the commas here is a matter of preference. If you like, put them at the end of a line instead of the beginning.)
-- file: ch03/BookStore.hs

data Customer = Customer {

      customerID      :: CustomerID

    , customerName    :: String

    , customerAddress :: Address

    } deriving (Show)
This is almost exactly identical in meaning to the following, more familiar form:
-- file: ch03/AltCustomer.hs

data Customer = Customer Int String [String]

                deriving (Show)



customerID :: Customer -> Int

customerID (Customer id _ _) = id



customerName :: Customer -> String

customerName (Customer _ name _) = name



customerAddress :: Customer -> [String]

customerAddress (Customer _ _ address) = address
For each of the fields that we name in our type definition, Haskell creates an accessor function of that name:
ghci> :type customerID
customerID :: Customer -> CustomerID

We can still use the usual application syntax to create a value of this type:
-- file: ch03/BookStore.hs

customer1 = Customer 271828 "J.R. Hacker"

            ["255 Syntax Ct",

             "Milpitas, CA 95134",

             "USA"]
Record syntax adds a more verbose notation for creating a value. This can sometimes make code more readable:
-- file: ch03/BookStore.hs

customer2 = Customer {

              customerID = 271828

            , customerAddress = ["1048576 Disk Drive",

                                 "Milpitas, CA 95134",

                                 "USA"]

            , customerName = "Jane Q. Citizen"

            }
If we use this form, we can vary the order in which we list fields. Here, we moved the name and address fields from their positions in the declaration of the type.
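To see that the generated accessors are ordinary functions, here is a self-contained sketch that repeats the Customer definition and uses two accessors inside another function. The function name mailingLabel and the file name are hypothetical.

```haskell
-- file: CustomerSketch.hs (hypothetical file name)

type CustomerID = Int
type Address = [String]

data Customer = Customer {
      customerID      :: CustomerID
    , customerName    :: String
    , customerAddress :: Address
    } deriving (Show)

customer2 :: Customer
customer2 = Customer {
              customerID = 271828
            , customerAddress = ["1048576 Disk Drive",
                                 "Milpitas, CA 95134",
                                 "USA"]
            , customerName = "Jane Q. Citizen"
            }

-- Accessors are applied like any other function.
mailingLabel :: Customer -> [String]
mailingLabel c = customerName c : customerAddress c
```

Here customerID customer2 evaluates to 271828, regardless of the order in which the fields were listed when the value was built.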
End of preview. The rest of this section is not available here.
Parameterized Types
Preview
We’ve repeatedly mentioned that the list type is polymorphic: the elements of a list can be of any type. We can also add polymorphism to our own types. To do this, we introduce type variables into a type declaration. The Prelude defines a type named Maybe, which we can use to represent a value that could be either present or missing, for example, a field in a database row that could be null:
-- file: ch03/Nullable.hs

data Maybe a = Just a

             | Nothing
Here, the variable a is not a regular variable—it’s a type variable. It indicates that the Maybe type takes another type as its parameter. This lets us use Maybe on values of any type:
-- file: ch03/Nullable.hs

someBool = Just True



someString = Just "something"
As usual, we can experiment with this type in ghci:
ghci> Just 1.5
Just 1.5
ghci> Nothing
Nothing
ghci> :type Just "invisible bike"
Just "invisible bike" :: Maybe [Char]
Maybe is a polymorphic, or generic, type. We give the Maybe type constructor a parameter to create a specific type, such as Maybe Int or Maybe [Bool]. As we might expect, these types are distinct.
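A short sketch of two such specific types in use follows. All of the names here, and the file name, are hypothetical; countOrZero accepts only a Maybe Int, so passing it a Maybe [Bool] is a type error.

```haskell
-- file: MaybeTypes.hs (hypothetical file name)

someCount :: Maybe Int
someCount = Just 3

someFlags :: Maybe [Bool]
someFlags = Just [True, False]

-- Nothing fits any Maybe type, but once we give it the type Maybe Int
-- it cannot be used as a Maybe [Bool].
noCount :: Maybe Int
noCount = Nothing

-- Supplies a fallback when the Int is missing.
countOrZero :: Maybe Int -> Int
countOrZero (Just n) = n
countOrZero Nothing  = 0
```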
We can nest uses of parameterized types inside each other, but when we do, we may need to use parentheses to tell the Haskell compiler how to parse our expression:
-- file: ch03/Nullable.hs

wrapped = Just (Just "wrapped")
To once again extend an analogy to more familiar languages, parameterized types bear some resemblance to templates in C++ and to generics in Java. Just be aware that this is a shallow analogy. Templates and generics were added to their respective languages long after the languages were initially defined, and they have an awkward feel. Haskell’s parameterized types are simpler and easier to use, as the language was designed with them from the beginning.
End of preview. The rest of this section is not available here.
Recursive Types
Preview
The familiar list type is recursive: it’s defined in terms of itself. To understand this, let’s create our own list-like type. We’ll use Cons in place of the (:) constructor, and Nil in place of []:
-- file: ch03/ListADT.hs

data List a = Cons a (List a)

            | Nil

              deriving (Show)
Because List a appears on both the left and the right of the = sign, the type’s definition refers to itself. If we want to use the Cons constructor to create a new value, we must supply one value of type a and another of type List a. Let’s see where this leads us in practice.
The simplest value of type List a that we can create is Nil. Save the type definition in a file, and then load it into ghci:
ghci> Nil
Nil
Because Nil has a List type, we can use it as a parameter to Cons:
ghci> Cons 0 Nil
Cons 0 Nil
And because Cons 0 Nil has the type List a, we can use this as a parameter to Cons:
ghci> Cons 1 it
Cons 1 (Cons 0 Nil)
ghci> Cons 2 it
Cons 2 (Cons 1 (Cons 0 Nil))
ghci> Cons 3 it
Cons 3 (Cons 2 (Cons 1 (Cons 0 Nil)))
We could continue in this fashion indefinitely, creating ever-longer Cons chains, each with a single Nil at the end.
For a third example of what a recursive type is, here is a definition of a binary tree type:
-- file: ch03/Tree.hs

data Tree a = Node a (Tree a) (Tree a)

            | Empty

              deriving (Show)
A binary tree is either a node with two children—which are themselves binary trees—or an empty value.
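As a sketch of how values of this type are built and consumed, here is a small tree together with a function that follows the type’s recursive structure. The names simpleTree and treeSize, and the file name, are hypothetical.

```haskell
-- file: TreeSketch.hs (hypothetical file name)

data Tree a = Node a (Tree a) (Tree a)
            | Empty
              deriving (Show)

-- A parent node with two leaf children; every leaf has Empty subtrees.
simpleTree :: Tree String
simpleTree = Node "parent" (Node "left child" Empty Empty)
                           (Node "right child" Empty Empty)

-- Counts nodes: one equation per constructor, recursing where the
-- type itself recurses.
treeSize :: Tree a -> Int
treeSize Empty               = 0
treeSize (Node _ left right) = 1 + treeSize left + treeSize right
```

For this value, treeSize simpleTree evaluates to 3.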
We can easily prove to ourselves that our List a type has the same shape as the built-in list type [a]. To do this, we write a function that takes any value of type [a] and produces a value of type List a:
-- file: ch03/ListADT.hs

fromList (x:xs) = Cons x (fromList xs)

fromList []     = Nil
By inspection, this clearly substitutes a Cons for every (:) and a Nil for each []. This covers both of the built-in list type’s constructors. The two types are isomorphic; they have the same shape:
ghci> fromList "durian"
Cons 'd' (Cons 'u' (Cons 'r' (Cons 'i' (Cons 'a' (Cons 'n' Nil)))))
ghci> fromList [Just True, Nothing, Just False]
Cons (Just True) (Cons Nothing (Cons (Just False) Nil))
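The conversion can also be run in the other direction, which is what makes the correspondence one-to-one. The function toList below is a hypothetical inverse written for illustration, along with a hypothetical file name; it swaps each Cons back to (:) and Nil back to [].

```haskell
-- file: ListRoundTrip.hs (hypothetical file name)

data List a = Cons a (List a)
            | Nil
              deriving (Show)

fromList :: [a] -> List a
fromList (x:xs) = Cons x (fromList xs)
fromList []     = Nil

-- Hypothetical inverse of fromList.
toList :: List a -> [a]
toList (Cons x xs) = x : toList xs
toList Nil         = []
```

Converting a list with fromList and then back with toList returns the original list; for instance, toList (fromList "durian") evaluates to "durian".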
End of preview. The rest of this section is not available here.
Reporting Errors
Preview
Haskell provides a standard function, error :: String -> a, that we can call when something has gone terribly wrong in our code. We give it a string parameter, which is the error message to display. Its type signature looks peculiar: how can it produce a value of any type a given only a string?
It has a result type of a so that we can call it anywhere and it will always have the right type. However, it does not return a value like a normal function. Instead, it immediately aborts evaluation and prints the error message we give it.
The mySecond function returns the second element of its input list but fails if its input list isn’t long enough:
-- file: ch03/MySecond.hs

mySecond :: [a] -> a



mySecond xs = if null (tail xs)

              then error "list too short"

              else head (tail xs)
As usual, we can see how this works in practice in ghci:
ghci> mySecond "xi"
'i'
ghci> mySecond [2]
*** Exception: list too short
ghci> head (mySecond [[9]])
*** Exception: list too short
Notice the third case, where we try to use the result of the call to mySecond as the argument to another function. Evaluation still terminates and drops us back to the ghci prompt. This is the major weakness of using error: it doesn’t let our caller distinguish between a recoverable error and a problem so severe that it really should terminate our program.
As we have already seen, a pattern matching failure causes a similar unrecoverable error:
ghci> mySecond []
*** Exception: Prelude.tail: empty list
We can use the Maybe type to represent the possibility of an error.
If we want to indicate that an operation has failed, we can use the Nothing constructor. Otherwise, we wrap our value with the Just constructor.
Let’s see how our mySecond function changes if we return a Maybe value instead of calling error:
-- file: ch03/MySecond.hs

safeSecond :: [a] -> Maybe a



safeSecond [] = Nothing

safeSecond xs = if null (tail xs)

                then Nothing

                else Just (head (tail xs))
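The same behavior can be expressed more compactly with pattern matching alone. This is an alternative sketch, not a replacement for the definition above; the name tidySecond and the file name are hypothetical.

```haskell
-- file: TidySecond.hs (hypothetical file name)

-- The first pattern matches only lists with at least two elements;
-- every shorter list falls through to the second equation.
tidySecond :: [a] -> Maybe a
tidySecond (_:x:_) = Just x
tidySecond _       = Nothing
```

With this version, tidySecond "xi" evaluates to Just 'i', while tidySecond [2] and tidySecond [] both evaluate to Nothing.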
If the list we’re passed is too short, we return Nothing to our caller. This lets them decide what to do, while a call to
End of preview. The rest of this section is not available here.
Introducing Local Variables
Preview
Within the body of a function, we can introduce new local variables whenever we need them, using a let expression. Here is a simple function that determines whether we should lend some money to a customer. If our balance meets a reserve of at least 100, we return our new balance after subtracting the amount we have loaned:
-- file: ch03/Lending.hs

lend amount balance = let reserve    = 100

                          newBalance = balance - amount

                      in if balance < reserve

                         then Nothing

                         else Just newBalance
The keywords to look out for here are let, which starts a block of variable declarations, and in, which ends it. Each line introduces a new variable. The name is on the left of the =, and the expression to which it is bound is on the right.
Let us reemphasize our wording: a name in a let block is bound to an expression, not to a value. Because Haskell is a lazy language, the expression associated with a name won’t actually be evaluated until it’s needed. In the previous example, we will not compute the value of newBalance if we do not meet our reserve.
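We can check this claim with a small experiment. The sketch below repeats lend with a type signature added; the Int types and the file name are assumptions made for illustration, since the definition in the text has no signature.

```haskell
-- file: LendingSketch.hs (hypothetical file name)

-- The Int types are an assumption; the original definition is unannotated.
lend :: Int -> Int -> Maybe Int
lend amount balance = let reserve    = 100
                          newBalance = balance - amount
                      in if balance < reserve
                         then Nothing
                         else Just newBalance
```

Here lend 50 200 evaluates to Just 150 and lend 50 80 evaluates to Nothing. More tellingly, lend undefined 80 also evaluates to Nothing: the balance is below the reserve, so the expression bound to newBalance is never evaluated and the undefined amount is never touched.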
When we define a variable in a let block, we refer to it as a let-bound variable. This simply means what it says: we have bound the variable in a let block.
Also, our use of whitespace here is important. We’ll talk in more detail about the layout rules later in this chapter, in "The Offside Rule and Whitespace in an Expression."
We can use the names of the variables in a let block both within the block of declarations and in the expression that follows the in keyword.
In general, we’ll refer to the places within our code where we can use a name as the name’s scope. If we can use a name, it’s in scope; otherwise, it’s out of scope. If a name is visible throughout a source file, we say it’s at the top level.
We can "nest" multiple let blocks inside each other in an expression:
-- file: ch03/NestedLets.hs

foo = let a = 1

      in let b = 2

         in a + b
It’s perfectly legal, but not exactly wise, to repeat a variable name in a nested
The Offside Rule and Whitespace in an Expression
In our definitions of lend and lend2, the left margin of our text wandered around quite a bit. This was not an accident; in Haskell, whitespace has meaning.
Haskell uses indentation as a cue to parse sections of code. This use of layout to convey structure is sometimes called the offside rule. At the beginning of a source file, the first top-level declaration or definition can start in any column, and the Haskell compiler or interpreter remembers that indentation level. Every subsequent top-level declaration must have the same indentation.
Here’s an illustration of the top-level indentation rule; our first file, GoodIndent.hs, is well-behaved:
-- file: ch03/GoodIndent.hs

-- This is the leftmost column.



  -- It's fine for top-level declarations to start in any column...

  firstGoodIndentation = 1



  -- ...provided all subsequent declarations do, too!

  secondGoodIndentation = 2
Our second, BadIndent.hs, doesn’t play by the rules:
-- file: ch03/BadIndent.hs

-- This is the leftmost column.



    -- Our first declaration is in column 4.

    firstBadIndentation = 1



  -- Our second is left of the first, which is illegal!

  secondBadIndentation = 2
Here’s what happens when we try to load the two files into ghci:
:load GoodIndent.hs

[1 of 1] Compiling Main             ( GoodIndent.hs, interpreted )

Ok, modules loaded: Main.

:load BadIndent.hs

[1 of 1] Compiling Main             ( BadIndent.hs, interpreted )



BadIndent.hs:8:2: parse error on input `secondBadIndentation'

Failed, modules loaded: none.
An empty following line is treated as a continuation of the current item, as is a following line indented further to the right.
The rules for let expressions and where clauses are similar. After a let or where keyword, the Haskell compiler or interpreter remembers the indentation of the next token it sees. If the line that follows is empty, or its indentation is further to the right, it is considered as a continuation of the previous line. If the indentation is the same as the start of the preceding item, it is treated as beginning a new item in the same block:
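The following is a minimal sketch of these rules in a let block. The names layoutDemo, first, and second are hypothetical and chosen only for illustration.

```haskell
-- A sketch of the layout rules inside a let block; the names are
-- hypothetical and chosen only for illustration.

layoutDemo :: Int
layoutDemo =
    let first = 1 +
                  2       -- indented further than `first`: continues its definition
        second = 10       -- same column as `first`: begins a new binding
    in first + second

main :: IO ()
main = print layoutDemo
```

The compiler remembers the column of first, the first token after let. The line holding 2 is further to the right, so it continues that binding, while second lines up with first and therefore starts a new one.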
The case Expression
Function definitions are not the only place where we can use pattern matching. The case construct lets us match patterns within an expression. Here’s what it looks like. This function (defined for us in Data.Maybe) unwraps a Maybe value, using a default if the value is Nothing:
-- file: ch03/Guard.hs

fromMaybe defval wrapped =

    case wrapped of

      Nothing     -> defval

      Just value  -> value
The case keyword is followed by an arbitrary expression; the pattern match is performed against the result of this expression. The of keyword signifies the end of the expression and the beginning of the block of patterns and expressions.
Each item in the block consists of a pattern, followed by an arrow (->), followed by an expression to evaluate if that pattern matches. These expressions must all have the same type. The result of the case expression is the result of the expression associated with the first pattern to match. Matches are attempted from top to bottom.
To express "here’s the expression to evaluate if none of the other patterns matches," we just use the wild card pattern _ as the last in our list of patterns. If a pattern match fails, we will get the same kind of runtime error that we saw earlier.
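As a small sketch of a case expression that ends with a wild card, consider the hypothetical function describeCount below. It is not from the original text and exists only to show the top-to-bottom matching order.

```haskell
-- A sketch of a case expression with a wild card fallback.
-- describeCount is a hypothetical function used only for illustration.

describeCount :: Maybe Int -> String
describeCount wrapped =
    case wrapped of
      Just 0 -> "exactly zero"
      Just _ -> "some other number"
      _      -> "no number at all"   -- reached only when nothing above matched

main :: IO ()
main = mapM_ (putStrLn . describeCount) [Just 0, Just 7, Nothing]
```

Because matches are attempted in order, Just 0 is caught by the first alternative even though the second would also accept it.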
Common Beginner Mistakes with Patterns
There are a few ways in which new Haskell programmers can misunderstand or misuse patterns. The following are some attempts at pattern matching gone awry. Depending on what you expect one of these examples to do, there may be some surprises.
Take a look at the following code:
-- file: ch03/BogusPattern.hs

data Fruit = Apple | Orange



apple = "apple"



orange = "orange"        



whichFruit :: String -> Fruit



whichFruit f = case f of

                 apple  -> Apple

                 orange -> Orange
A naive glance suggests that this code is trying to check the value f to see whether it matches the value apple or orange.
It is easier to spot the mistake if we rewrite the code in an equational style:
-- file: ch03/BogusPattern.hs

equational apple = Apple

equational orange = Orange
Now can you see the problem? Here, it is more obvious that apple does not refer to the top-level value named apple; it is a local pattern variable.
We refer to a pattern that always succeeds as irrefutable. Plain variable names and the wild card _ (underscore) are examples of irrefutable patterns.
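We can watch the irrefutable pattern at work with a short sketch. It repeats the bogus definition and adds a deriving clause and a main action, both of which are our additions for the sake of a runnable example.

```haskell
-- A sketch showing the bogus pattern in action. The deriving clause
-- and main are additions made for this illustration.

data Fruit = Apple | Orange
    deriving (Show, Eq)

apple :: String
apple = "apple"

orange :: String
orange = "orange"

whichFruit :: String -> Fruit
whichFruit f = case f of
                 apple  -> Apple    -- a fresh variable: matches any string
                 orange -> Orange   -- can never be reached

main :: IO ()
main = print (whichFruit "orange")
```

This prints Apple, because the first pattern is a plain variable and so always succeeds. GHC is likely to warn that the second alternative is redundant, which is a useful hint that something is wrong.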
Here’s a corrected version of this function:
-- file: ch03/BogusPattern.hs

betterFruit f = case f of

                  "apple"  -> Apple

                  "orange" -> Orange
We fixed the problem by matching against the literal values "apple" and "orange".
What if we want to compare the values stored in two nodes of type Tree, and then return one of them if they’re equal? Here’s an attempt:
-- file: ch03/BadTree.hs

bad_nodesAreSame (Node a _ _) (Node a _ _) = Just a

bad_nodesAreSame _            _            = Nothing
A name can appear only once in a set of pattern bindings. We cannot place a variable in multiple positions to express the notion "this value and that should be identical." Instead, we’ll solve this problem using guards, another invaluable Haskell feature.
Conditional Evaluation with Guards
Pattern matching limits us to performing fixed tests of a value’s shape. Although this is useful, we will often want to make a more expressive check before evaluating a function’s body. Haskell provides a feature called guards that gives us this ability. We’ll introduce the idea with a modification of the function we wrote to compare two nodes of a tree:
-- file: ch03/BadTree.hs

nodesAreSame (Node a _ _) (Node b _ _)

    | a == b     = Just a

nodesAreSame _ _ = Nothing
In this example, we use pattern matching to ensure that we are looking at values of the right shape, and a guard to compare pieces of them.
A pattern can be followed by zero or more guards, each an expression of type Bool. A guard is introduced by a | symbol. This is followed by the guard expression, then an = symbol (or -> if we’re in a case expression), then the body to use if the guard expression evaluates to True. If a pattern matches, each guard associated with that pattern is evaluated in the order in which they are written. If a guard succeeds, the body affiliated with it is used as the result of the function. If no guard succeeds, pattern matching moves on to the next pattern.
When a guard expression is evaluated, all of the variables mentioned in the pattern with which it is associated are bound and can be used.
Here is a reworked version of our lend function that uses guards:
-- file: ch03/Lending.hs

lend3 amount balance

     | amount <= 0            = Nothing

     | amount > reserve * 0.5 = Nothing

     | otherwise              = Just newBalance

    where reserve    = 100

          newBalance = balance - amount
The special-looking guard expression otherwise is simply a variable bound to the value True that aids readability.
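A brief sketch makes the point concrete. The function classify is a hypothetical example of ours; the only thing taken from the Prelude is otherwise itself.

```haskell
-- A sketch showing that otherwise is an ordinary Bool value.
-- classify is a hypothetical function used only for illustration.

classify :: Int -> String
classify n
    | n < 0     = "negative"
    | n == 0    = "zero"
    | otherwise = "positive"   -- otherwise is True, so this guard always succeeds

main :: IO ()
main = do
    print otherwise
    mapM_ (putStrLn . classify) [-5, 0, 5]
```

The first line of output is True, confirming that otherwise is a plain value rather than special syntax.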
We can use guards anywhere that we can use patterns. Writing a function as a series of equations using pattern matching and guards can make it much clearer. Remember the myDrop function we defined in Chapter 2?
-- file: ch02/myDrop.hs

myDrop n xs = if n <= 0 || null xs

              then xs

              else myDrop (n - 1) (tail xs)
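One possible rewrite of myDrop as a series of equations is sketched below. The name dropWithGuards is hypothetical, and this is an illustration of the technique rather than a definitive version.

```haskell
-- One possible rewrite of myDrop using patterns and guards.
-- The name dropWithGuards is hypothetical.

dropWithGuards :: Int -> [a] -> [a]
dropWithGuards n xs | n <= 0 = xs      -- nothing left to drop
dropWithGuards _ []          = []      -- ran out of list
dropWithGuards n (_:xs)      = dropWithGuards (n - 1) xs

main :: IO ()
main = do
    print (dropWithGuards 2 "foobar")
    print (dropWithGuards 0 [1, 2, 3 :: Int])
    print (dropWithGuards 5 [1, 2 :: Int])
```

Each equation states one case on its own line, so we no longer have to untangle a compound condition and calls to null and tail.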
Chapter 4: Functional Programming
Thinking in Haskell
Our early learning of Haskell has two distinct obstacles. The first is coming to terms with the shift in mindset from imperative programming to functional: we have to replace our programming habits from other languages. We do this not because imperative techniques are bad, but because in a functional language other techniques work better.
Our second challenge is learning our way around the standard Haskell libraries. As in any language, the libraries act as a lever, enabling us to multiply our problem-solving ability. Haskell libraries tend to operate at a higher level of abstraction than those in many other languages. We’ll need to work a little harder to learn to use the libraries, but in exchange they offer a lot of power.
In this chapter, we’ll introduce a number of common functional programming techniques. We’ll draw upon examples from imperative languages in order to highlight the shift in thinking that we’ll need to make. As we do so, we’ll walk through some of the fundamentals of Haskell’s standard libraries. We’ll also intermittently cover a few more language features along the way.
A Simple Command-Line Framework
In most of this chapter, we will concern ourselves with code that has no interaction with the outside world. To maintain our focus on practical code, we will begin by developing a gateway between our "pure" code and the outside world. Our framework simply reads the contents of one file, applies a function to the file, and writes the result to another file:
-- file: ch04/InteractWith.hs

-- Save this in a source file, e.g., InteractWith.hs



import System.Environment (getArgs)



interactWith function inputFile outputFile = do

  input <- readFile inputFile

  writeFile outputFile (function input)



main = mainWith myFunction

  where mainWith function = do

          args <- getArgs

          case args of

            [input,output] -> interactWith function input output

            _ -> putStrLn "error: exactly two arguments needed"



        -- replace "id" with the name of our function below

        myFunction = id
This is all we need to write simple, but complete, file-processing programs. We can compile this one to an executable named InteractWith as follows:
ghc --make InteractWith

[1 of 1] Compiling Main             ( InteractWith.hs, InteractWith.o )

Linking InteractWith ...
If we run this program from the shell or command prompt, it will accept two filenames: the name of a file to read, and the name of a file to write:
./InteractWith

error: exactly two arguments needed

./InteractWith hello-in.txt hello-out.txt

cat hello-in.txt

hello world

cat hello-out.txt

hello world
Some of the notation in our source file is new. The do keyword introduces a block of actions that can cause effects in the real world, such as reading or writing a file. The <- operator is the equivalent of assignment inside a do block. This is enough explanation to get us started. We will talk in much more depth about these details of notation, and I/O in general, in a later chapter.
When we want to test a function that cannot talk to the outside world, we simply replace the name id in the preceding code with the name of the function we want to test. Whatever our function does, it will need to have the type
Warming Up: Portably Splitting Lines of Text
Haskell provides a built-in function, lines, that lets us split a text string on line boundaries. It returns a list of strings with line termination characters omitted:
:type lines

lines :: String -> [String]

lines "line 1\nline 2"

["line 1","line 2"]

lines "foo\n\nbar\n"

["foo","","bar"]
While lines looks useful, it relies on us reading a file in "text mode" in order to work. Text mode is a feature common to many programming languages; it provides a special behavior when we read and write files on Windows. When we read a file in text mode, the file I/O library translates the line-ending sequence "\r\n" (carriage return followed by newline) to "\n" (newline alone), and it does the reverse when we write a file. On Unix-like systems, text mode does not perform any translation. As a result of this difference, if we read a file on one platform that was written on the other, the line endings are likely to become a mess. (Both readFile and writeFile operate in text mode.)
lines "a\r\nb"

["a\r","b"]

The lines function splits only on newline characters, leaving carriage returns dangling at the ends of lines. If we read a Windows-generated text file on a Linux or Unix box, we’ll get trailing carriage returns at the end of each line.
We have comfortably used Python’s "universal newline" support for years; this transparently handles Unix and Windows line-ending conventions for us. We would like to provide something similar in Haskell.
Since we are still early in our career of reading Haskell code, we will discuss our Haskell implementation in some detail:
-- file: ch04/SplitLines.hs

splitLines :: String -> [String]
Our function’s type signature indicates that it accepts a single string, the contents of a file with some unknown line-ending convention. It returns a list of strings, representing each line from the file:
-- file: ch04/SplitLines.hs

splitLines [] = []

splitLines cs =

    let (pre, suf) = break isLineTerminator cs

    in  pre : case suf of 

                ('\r':'\n':rest) -> splitLines rest

                ('\r':rest)      -> splitLines rest

                ('\n':rest)      -> splitLines rest

                _                -> []



isLineTerminator c = c == '\r' || c == '\n'
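To check how splitLines treats the different conventions, we can wrap the definition in a small driver. The sample strings and the main action are ours, and the type signature on isLineTerminator is added for clarity; the rest is the definition shown above.

```haskell
-- The splitLines definition from the text, with a small driver added.
-- The sample strings below are made up for this illustration.

splitLines :: String -> [String]
splitLines [] = []
splitLines cs =
    let (pre, suf) = break isLineTerminator cs
    in  pre : case suf of
                ('\r':'\n':rest) -> splitLines rest
                ('\r':rest)      -> splitLines rest
                ('\n':rest)      -> splitLines rest
                _                -> []

isLineTerminator :: Char -> Bool
isLineTerminator c = c == '\r' || c == '\n'

main :: IO ()
main = do
    print (splitLines "line1\r\nline2\nline3\rline4")
    print (lines "a\r\nb")       -- the carriage return stays attached to "a"
    print (splitLines "a\r\nb")  -- both terminator characters are consumed
```

The first call mixes all three line-ending styles in one string and still yields four clean lines, while the last two calls contrast lines with splitLines on the same Windows-style input.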
Infix Functions
Usually, when we define or apply a function in Haskell, we write the name of the function, followed by its arguments. This notation is referred to as prefix, because the name of the function comes before its arguments.
If a function or constructor takes two or more arguments, we have the option of using it in infix form, where we place it between its first and second arguments. This allows us to use functions as infix operators.
To define or apply a function or value constructor using infix notation, we enclose its name in backtick characters (sometimes known as backquotes). Here are simple infix definitions of a function and a type:
-- file: ch04/Plus.hs

a `plus` b = a + b



data a `Pair` b = a `Pair` b

                  deriving (Show)



-- we can use the constructor either prefix or infix

foo = Pair 1 2

bar = True `Pair` "quux"
Since infix notation is purely a syntactic convenience, it does not change a function’s behavior:
1 `plus` 2

3

plus 1 2

3

True `Pair` "something"

True `Pair` "something"

Pair True "something"

True `Pair` "something"
Infix notation can often help readability. For instance, the Prelude defines a function, elem, that indicates whether a value is present in a list. If we employ elem using prefix notation, it is fairly easy to read:
elem 'a' "camogie"

True

If we switch to infix notation, the code becomes even easier to understand. It is now clear that we’re checking to see if the value on the left is present in the list on the right:
3 `elem` [1,2,4,8]

False

We see a more pronounced improvement with some useful functions from the Data.List module. The isPrefixOf function tells us if one list matches the beginning of another:
:module +Data.List

"foo" `isPrefixOf` "foobar"

True
The isInfixOf and isSuffixOf functions match anywhere in a list and at its end, respectively:
"needle" `isInfixOf` "haystack full of needle thingies"

True

"end" `isSuffixOf` "the end"

True
There is no hard-and-fast rule that dictates when you ought to use infix versus prefix notation, although prefix notation is far more common. It’s best to choose whichever makes your code more readable in a specific situation.
Working with Lists
As the bread and butter of functional programming, lists deserve some serious attention. The standard Prelude defines dozens of functions for dealing with lists. Many of these will be indispensable tools, so it’s important that we learn them early on.
For better or worse, this section is going to read a bit like a laundry list of functions. Why present so many functions at once? Because they are both easy to learn and absolutely ubiquitous. If we don’t have this toolbox at our fingertips, we’ll end up wasting time by reinventing simple functions that are already present in the standard libraries. So bear with us as we go through the list; the effort you’ll save will be huge.
The Data.List module is the "real" logical home of all standard list functions. The Prelude merely re-exports a large subset of the functions exported by Data.List. Several useful functions in Data.List are not re-exported by the standard Prelude. As we walk through list functions in the sections that follow, we will explicitly mention those that are only in Data.List:
:module +Data.List
Because none of these functions is complex or takes more than about three lines of Haskell to write, we’ll be brief in our descriptions of each. In fact, a quick and useful learning exercise is to write a definition of each function after you’ve read about it.
The length function tells us how many elements are in a list:
:type length

length :: [a] -> Int

length []

0

length [1,2,3]

3

length "strings are lists, too"

22
If you need to determine whether a list is empty, use the null function:
:type null

null :: [a] -> Bool

null []

True

null "plugh"

False
To access the first element of a list, use the head function:
:type head

head :: [a] -> a

head [1,2,3]

1
The converse, tail, returns all but the head of a list:
:type tail

tail :: [a] -> [a]

tail "foo"

"oo"
Another function, last, returns the very last element of a list:
:type last

last :: [a] -> a

last "bar"

'r'
The converse of last is init, which returns a list of all but the last element of its input:
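Here is a short sketch of init in use, together with a hand-written equivalent in the spirit of the exercise suggested earlier. The name myInit is hypothetical, and its error message is our own choice.

```haskell
-- A sketch of init, alongside a hand-written equivalent.
-- myInit is a hypothetical name used only for this illustration.

myInit :: [a] -> [a]
myInit []     = error "myInit: empty list"   -- like init, undefined on []
myInit [_]    = []                           -- drop the final element
myInit (x:xs) = x : myInit xs

main :: IO ()
main = do
    print (init "bar")
    print (myInit "bar")
    print (init [1, 2, 3 :: Int])
```

Both versions give the same answer for "bar", keeping every element except the last.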
How to Think About Loops
Unlike traditional languages, Haskell has neither a for loop nor a while loop. If we’ve got a lot of data to process, what do we use instead? There are several possible answers to this question.
A straightforward way to make the jump from a language that has loops to one that doesn’t is to run through a few examples, looking at the differences. Here’s a C function that takes a string of decimal digits and turns them into an integer:
#include <ctype.h> /* for isdigit */

int as_int(char *str)

{

    int acc; /* accumulate the partial result */



    for (acc = 0; isdigit(*str); str++) {

	acc = acc * 10 + (*str - '0');

    }



    return acc;

}
Given that Haskell doesn’t have any looping constructs, how should we think about representing a fairly straightforward piece of code such as this?
We don’t have to start off by writing a type signature, but it helps to remind us of what we’re working with:
-- file: ch04/IntParse.hs

import Data.Char (digitToInt) -- we'll need ord shortly



asInt :: String -> Int
The C code computes the result incrementally as it traverses the string; the Haskell code can do the same. However, in Haskell, we can express the equivalent of a loop as a function. We’ll call ours loop just to keep things nice and explicit:
-- file: ch04/IntParse.hs

loop :: Int -> String -> Int



asInt xs = loop 0 xs
That first parameter to loop is the accumulator variable we’ll be using. Passing zero into it is equivalent to initializing the acc variable in C at the beginning of the loop.
Rather than leap into blazing code, let’s think about the data we have to work with. Our familiar String is just a synonym for [Char], a list of characters. The easiest way for us to get the traversal right is to think about the structure of a list: it’s either empty or a single element followed by the rest of the list.
We can express this structural thinking directly by pattern matching on the list type’s constructors. It’s often handy to think about the easy cases first; here, that means we will consider the empty list case:
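One way the loop could be completed is sketched below. This is an illustration under our own assumptions, not necessarily the definition the original text goes on to give.

```haskell
-- One way the loop could be completed; a sketch, not necessarily the
-- definition the original text goes on to give.

import Data.Char (digitToInt)

asInt :: String -> Int
asInt xs = loop 0 xs

loop :: Int -> String -> Int
loop acc []     = acc                        -- empty list: nothing left to consume
loop acc (x:xs) = loop (acc * 10 + digitToInt x) xs

main :: IO ()
main = do
    print (asInt "33")
    print (asInt "")
```

Note one difference from the C version: this sketch does not stop at the first non-digit character. Because digitToInt also accepts hexadecimal digits and raises an error on anything else, input should be restricted to decimal digits.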
Anonymous (lambda) Functions
In many of the function definitions we’ve seen so far, we’ve written short helper functions:
-- file: ch04/Partial.hs

import Data.List (isInfixOf)

isInAny needle haystack = any inSequence haystack

    where inSequence s = needle `isInfixOf` s
Haskell lets us write completely anonymous functions, which we can use to avoid the need to give names to our helper functions. Anonymous functions are often called "lambda" functions, in a nod to their heritage in lambda calculus. We introduce an anonymous function with a backslash character (\), pronounced lambda. This is followed by the function’s arguments (which can include patterns), and then an arrow (->) to introduce the function’s body.
Lambdas are most easily illustrated by example. Here’s a rewrite of isInAny using an anonymous function:
-- file: ch04/Partial.hs

isInAny2 needle haystack = any (\s -> needle `isInfixOf` s) haystack
We’ve wrapped the lambda in parentheses here so that Haskell can tell where the function body ends.
In every respect, anonymous functions behave identically to functions that have names, but Haskell places a few important restrictions on how we can define them. Most importantly, while we can write a normal function using multiple clauses containing different patterns and guards, a lambda can have only a single clause in its definition.
The limitation to a single clause restricts how we can use patterns in the definition of a lambda. We’ll usually write a normal function with several clauses to cover different pattern matching possibilities:
-- file: ch04/Lambda.hs

safeHead (x:_) = Just x

safeHead _ = Nothing
But as we can’t write multiple clauses to define a lambda, we must be certain that any patterns we use will match:
-- file: ch04/Lambda.hs

unsafeHead = \(x:_) -> x
This definition of unsafeHead will explode in our faces if we call it with a value on which pattern matching fails:
:type unsafeHead

unsafeHead :: [t] -> t

unsafeHead [1]

1

unsafeHead []

*** Exception: Lambda.hs:7:13-23: Non-exhaustive patterns in lambda
The definition typechecks, so it will compile, and the error will occur at runtime. The moral of this story is to be careful in how you use patterns when defining an anonymous function: make sure your patterns can’t fail!
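As a sketch of how to stay on the safe side, the two hypothetical functions below use lambdas whose patterns cannot fail. The first relies on a tuple pattern, which matches every pair; the second moves the risky match into a case expression that covers both list constructors.

```haskell
-- A sketch of lambda patterns that cannot fail. The function names and
-- sample data are made up for this illustration.

-- A tuple pattern matches every pair, so this lambda is safe.
sumPairs :: [(Int, Int)] -> [Int]
sumPairs = map (\(a, b) -> a + b)

-- When a pattern could fail, a case expression inside the lambda lets
-- us cover every constructor.
headOr :: a -> [a] -> a
headOr def = \xs -> case xs of
                      (x:_) -> x
                      []    -> def

main :: IO ()
main = do
    print (sumPairs [(1, 2), (3, 4)])
    print (headOr 0 [])
    print (headOr 0 [5, 6, 7])
```

Unlike unsafeHead, headOr returns its default when given an empty list instead of raising an exception.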
Partial Function Application and Currying
You may wonder why the -> arrow is used for what seems to be two purposes in the type signature of a function:
:type dropWhile

dropWhile :: (a -> Bool) -> [a] -> [a]

It looks like the -> is separating the arguments to dropWhile from each other, but that it also separates the arguments from the return type. In fact, -> has only one meaning: it denotes a function that takes an argument of the type on the left and returns a value of the type on the right.
The implication here is very important. In Haskell, all functions take only one argument. While dropWhile looks like a function that takes two arguments, it is actually a function of one argument, which returns a function that takes one argument. Here’s a perfectly valid Haskell expression:
:module +Data.Char

:type dropWhile isSpace

dropWhile isSpace :: [Char] -> [Char]
Well, that looks useful. The value dropWhile isSpace is a function that strips leading whitespace from a string. How is this useful? As one example, we can use it as an argument to a higher-order function:
map (dropWhile isSpace) [" a","f","   e"]

["a","f","e"]
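The one-argument-at-a-time reading of a type can be written out explicitly. In the sketch below, add3, add3Explicit, and addToTen are hypothetical names of ours; the point is only that the two type signatures mean the same thing.

```haskell
-- A sketch of the "one argument at a time" reading of a function type.
-- add3 and its variants are hypothetical names used for illustration.

add3 :: Int -> Int -> Int -> Int
add3 a b c = a + b + c

-- The same type with the implicit parentheses written out: -> groups
-- to the right.
add3Explicit :: Int -> (Int -> (Int -> Int))
add3Explicit = \a -> \b -> \c -> a + b + c

-- Supplying one argument leaves a function of the remaining two.
addToTen :: Int -> Int -> Int
addToTen = add3 10

main :: IO ()
main = do
    print (add3 1 2 3)
    print (add3Explicit 1 2 3)
    print (addToTen 20 30)
    print (map (add3 1 2) [10, 20, 30])
```

The last line passes add3 1 2, a function still waiting for its third argument, to map in the same way that dropWhile isSpace was passed above.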

Every time we supply an argument to a function, we can "chop" an element off the front of its type signature. Let’s take zip3 as an example to see what we mean; this is a function that zips three lists into a list of three-tuples:
:type zip3

zip3 :: [a] -> [b] -> [c] -> [(a, b, c)]

zip3 "foo" "bar" "quux"

[('f','b','q'),('o','a','u'),('o','r','u')]
If we apply zip3 with just one argument, we get a function that accepts two arguments. No matter what arguments we supply to this compound function, its first argument will always be the fixed value we specified:
:type zip3 "foo"

zip3 "foo" :: [b] -> [c] -> [(Char, b, c)]

let zip3foo = zip3 "foo"

:type zip3foo

zip3foo :: [b] -> [c] -> [(Char, b, c)]

(zip3 "foo") "aaa" "bbb"

[('f','a','b'),('o','a','b'),('o','a','b')]

zip3foo "aaa" "bbb"

[('f','a','b'),('o','a','b'),('o','a','b')]

zip3foo [1,2,3] [True,False,True]

[('f',1,True),('o',2,False),('o',3,True)]
When we pass fewer arguments to a function than the function can accept, we call it
As-patterns
Haskell’s tails function, in the Data.List module, generalizes the tail function we introduced earlier. Instead of returning one "tail" of a list, it returns all of them:
:m +Data.List

tail "foobar"

"oobar"

tail (tail "foobar")

"obar"

tails "foobar"

["foobar","oobar","obar","bar","ar","r",""]
Each of these strings is a suffix of the initial string, so tails produces a list of all suffixes, plus an extra empty list at the end. It always produces that extra empty list, even when its input list is empty:
tails []

[[]]

What if we want a function that behaves like tails but only returns the nonempty suffixes? One possibility would be for us to write our own version by hand. We’ll use a new piece of notation, the @ symbol:
-- file: ch04/SuffixTree.hs

suffixes :: [a] -> [[a]]

suffixes xs@(_:xs') = xs : suffixes xs'

suffixes _ = []
The pattern xs@(_:xs') is called an as-pattern, and it means "bind the variable xs to the value that matches the right side of the @ symbol."
In our example, if the pattern after the @ matches, xs will be bound to the entire list that matched, and xs' will be bound to all but the head of the list (we used the wild card pattern to indicate that we’re not interested in the value of the head of the list):
tails "foo"

["foo","oo","o",""]

suffixes "foo"

["foo","oo","o"]
The as-pattern makes our code more readable. To see how it helps, let us compare a definition that lacks an as-pattern:
-- file: ch04/SuffixTree.hs

noAsPattern :: [a] -> [[a]]

noAsPattern (x:xs) = (x:xs) : noAsPattern xs

noAsPattern _ = []
Here, the list that we’ve deconstructed in the pattern match just gets put right back together in the body of the function.
As-patterns have a more practical use than simple readability: they can help us to share data instead of copying it. In our definition of noAsPattern, when we match (x:xs), we construct a new copy of it in the body of our function. This causes us to allocate a new list node at runtime. That may be cheap, but it isn’t free. In contrast, when we defined suffixes
Code Reuse Through Composition
It seems a shame to introduce a new function, suffixes, that does almost the same thing as the existing tails function. Surely we can do better?
Recall the init function we introduced in "Working with Lists," which returns all but the last element of a list:
-- file: ch04/SuffixTree.hs

suffixes2 xs = init (tails xs)
This suffixes2 function behaves identically to suffixes, but it’s a single line of code:
suffixes2 "foo"

["foo","oo","o"]

If we take a step back, we see the glimmer of a pattern. We’re applying a function, then applying another function to its result. Let’s turn that pattern into a function definition:
-- file: ch04/SuffixTree.hs

compose :: (b -> c) -> (a -> b) -> a -> c

compose f g x = f (g x)
We now have a function, compose, that we can use to "glue" two other functions together:
-- file: ch04/SuffixTree.hs

suffixes3 xs = compose init tails xs
Haskell’s automatic currying lets us drop the xs variable, so we can make our definition even shorter:
-- file: ch04/SuffixTree.hs

suffixes4 = compose init tails
Fortunately, we don’t need to write our own compose function. Plugging functions into each other like this is so common that the Prelude provides function composition via the (.) operator:
-- file: ch04/SuffixTree.hs

suffixes5 = init . tails
The (.) operator isn’t a special piece of language syntax—it’s just a normal operator:
:type (.)

(.) :: (b -> c) -> (a -> b) -> a -> c

:type suffixes

suffixes :: [a] -> [[a]]

:type suffixes5

suffixes5 :: [a] -> [[a]]

suffixes5 "foo"

["foo","oo","o"]
We can create new functions at any time by writing chains of composed functions, stitched together with (.), so long (of course) as the result type of the function on the right of each (.) matches the type of parameter that the function on the left can accept.
As an example, let’s solve a simple puzzle. Count the number of words in a string that begin with a capital letter:
:module +Data.Char

let capCount = length . filter (isUpper . head) . words

capCount "Hello there, Mom!"

2
We can understand what this composed function does by examining its pieces. The
End of the content preview. The rest of this section cannot be viewed here.
Tips for Writing Readable Code
Content preview
So far in this chapter, we’ve come across two tempting features of Haskell: tail recursion and anonymous functions. As nice as these are, we don’t often want to use them.
Many list manipulation operations can be most easily expressed using combinations of library functions such as map, take, and filter. Without a doubt, it takes some practice to get used to using these. In return for our initial investment, we can write and read code more quickly, and with fewer bugs.
The reason for this is simple. A tail recursive function definition has the same problem as a loop in an imperative language: it’s completely general. It might perform some filtering, some mapping, or who knows what else. We are forced to look in detail at the entire definition of the function to see what it’s really doing. In contrast, map and most other list manipulation functions do only one thing. We can take for granted what these simple building blocks do and can focus on the idea the code is trying to express, not the minute details of how it’s manipulating its inputs.
Two folds lie in the middle ground between tail recursive functions (with complete generality) and our toolbox of list manipulation functions (each of which does one thing). A fold takes more effort to understand than, say, a composition of map and filter that does the same thing, but it behaves more regularly and predictably than a tail recursive function. As a general rule, don’t use a fold if you can compose some library functions, but otherwise try to use a fold in preference to a hand-rolled tail recursive loop.
As for anonymous functions, they tend to interrupt the "flow" of reading a piece of code. It is very often as easy to write a local function definition in a let or where clause and use that as it is to put an anonymous function into place. The relative advantages of a named function are twofold: we don’t need to understand the function’s definition when we’re reading the code that uses it, and a well-chosen function name acts as a tiny piece of local documentation.
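As a rough illustration of these guidelines, the sketch below writes one small task (summing the squares of the odd numbers in a list) in all three styles. The function names are hypothetical and chosen only for this comparison.

```haskell
import Data.List (foldl')

-- Hand-rolled tail recursive loop: completely general, so we must read
-- every line to learn that it filters, maps, and sums.
sumOddSquaresLoop :: [Int] -> Int
sumOddSquaresLoop = go 0
  where
    go acc []     = acc
    go acc (x:xs)
      | odd x     = go (acc + x * x) xs
      | otherwise = go acc xs

-- A fold: the traversal is fixed, and only the step function varies.
sumOddSquaresFold :: [Int] -> Int
sumOddSquaresFold = foldl' step 0
  where
    step acc x
      | odd x     = acc + x * x
      | otherwise = acc

-- Composed library functions, with a named local function in a where
-- clause standing in for an anonymous function.
sumOddSquares :: [Int] -> Int
sumOddSquares = sum . map square . filter odd
  where
    square x = x * x

main :: IO ()
main = mapM_ (print . ($ [1 .. 5]))
             [sumOddSquaresLoop, sumOddSquaresFold, sumOddSquares]
```

All three compute the same value; the last version is the one whose intent can be read off directly from the names of its building blocks.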
End of the content preview. The rest of this section cannot be viewed here.
Space Leaks and Strict Evaluation
Content preview
The foldl function that we discussed earlier is not the only place where space leaks can happen in Haskell code. We will use it to illustrate how nonstrict evaluation can sometimes be problematic and how to solve the difficulties that can arise.
It is perfectly reasonable to skip this section until you encounter a space leak "in the wild." Provided you use foldr if you are generating a list, and foldl' instead of foldl otherwise, space leaks are unlikely to bother you in practice for a while.
We refer to an expression that is not evaluated lazily as strict, so foldl' is a strict left fold. It bypasses Haskell’s usual nonstrict evaluation through the use of a special function named seq:
-- file: ch04/Fold.hs
foldl' _    zero []     = zero
foldl' step zero (x:xs) =
    let new = step zero x
    in  new `seq` foldl' step new xs
This seq function has a peculiar type, hinting that it is not playing by the usual rules:
ghci> :type seq
seq :: a -> t -> t
It operates as follows: when a seq expression is evaluated, it forces its first argument to be evaluated, and then returns its second argument. It doesn’t actually do anything with the first argument. seq exists solely as a way to force that value to be evaluated. Let’s walk through a brief application to see what happens:
-- file: ch04/Fold.hs
foldl' (+) 1 (2:[])
This expands as follows:
-- file: ch04/Fold.hs
let new = 1 + 2
in new `seq` foldl' (+) new []
The use of seq forcibly evaluates new to 3 and returns its second argument:
-- file: ch04/Fold.hs
foldl' (+) 3 []
We end up with the following result:
-- file: ch04/Fold.hs
3
Thanks to seq, there are no thunks in sight.
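The walkthrough above can be tried on a longer input. This is a minimal sketch that reuses the same definition under the hypothetical name strictFoldl, chosen so that it cannot clash with a library export of foldl'.

```haskell
-- Same shape as the foldl' definition above, under a hypothetical name.
strictFoldl :: (a -> b -> a) -> a -> [b] -> a
strictFoldl _    zero []     = zero
strictFoldl step zero (x:xs) =
    let new = step zero x
    -- seq forces the new accumulator before the recursive call, so no
    -- chain of unevaluated additions builds up.
    in  new `seq` strictFoldl step new xs

main :: IO ()
main = do
  print (strictFoldl (+) 1 (2:[]))
  print (strictFoldl (+) 0 [1 .. 1000000 :: Integer])
```

The first call mirrors the expansion traced above; the second sums a million numbers while forcing the accumulator at every step.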
Without some direction, there is an element of mystery to using seq effectively. Here are some useful rules for using it well.
To have any effect, a seq expression must be the first thing evaluated in an expression:
-- file: ch04/Fold.hs
-- incorrect: seq is hidden by the application of someFunc
-- since someFunc will be evaluated first, seq may occur too late
hiddenInside x y = someFunc (x `seq` y)

-- incorrect: a variation of the above mistake
hiddenByLet x y z = let a = x `seq` someFunc y
                    in anotherFunc a z

-- correct: seq will be evaluated first, forcing evaluation of x
onTheOutside x y = x `seq` someFunc y
End of the content preview. The rest of this section cannot be viewed here.
Chapter 5: Writing a Library: Working with JSON Data
A Whirlwind Tour of JSON
Content preview
In this chapter, we’ll develop a small, but complete, Haskell library. Our library will manipulate and serialize data in a popular form known as JSON (JavaScript Object Notation).
The JSON language is a small, simple representation for storing and transmitting structured data, for example, over a network connection. It is most commonly used to transfer data from a web service to a browser-based JavaScript application. The JSON format is described at http://www.json.org/, and in greater detail by RFC 4627.
JSON supports four basic types of value: strings, numbers, Booleans, and a special value named null:
"a string"
12345
true
null
The language provides two compound types: an array is an ordered sequence of values, and an object is an unordered collection of name/value pairs. The names in an object are always strings; the values in an object or array can be of any type:
[-3.14, true, null, "a string"]
{"numbers": [1,2,3,4,5], "useful": false}
End of the content preview. The rest of this section cannot be viewed here.
Representing JSON Data in Haskell
Content preview
To work with JSON data in Haskell, we use an algebraic data type to represent the range of possible JSON types:
-- file: ch05/SimpleJSON.hs
data JValue = JString String
            | JNumber Double
            | JBool Bool
            | JNull
            | JObject [(String, JValue)]
            | JArray [JValue]
              deriving (Eq, Ord, Show)
For each JSON type, we supply a distinct value constructor. Some of these constructors have parameters: if we want to construct a JSON string, we must provide a String value as an argument to the constructor.
To start experimenting with this code, save the file SimpleJSON.hs in your editor, switch to a ghci window, and load the file into ghci:
ghci> :load SimpleJSON
[1 of 1] Compiling SimpleJSON       ( SimpleJSON.hs, interpreted )
Ok, modules loaded: SimpleJSON.
ghci> JString "foo"
JString "foo"
ghci> JNumber 2.7
JNumber 2.7
ghci> :type JBool True
JBool True :: JValue
We can see how to use a constructor to take a normal Haskell value and turn it into a JValue. To do the reverse, we use pattern matching. Here’s a function that we can add to SimpleJSON.hs that will extract a string from a JSON value for us. If the JSON value actually contains a string, our function will wrap the string with the Just constructor; otherwise, it will return Nothing:
-- file: ch05/SimpleJSON.hs
getString :: JValue -> Maybe String
getString (JString s) = Just s
getString _           = Nothing
When we save the modified source file, we can reload it in ghci and try the new definition. (The :reload command remembers the last source file we loaded, so we do not need to name it explicitly.)
ghci> :reload
Ok, modules loaded: SimpleJSON.
ghci> getString (JString "hello")
Just "hello"
ghci> getString (JNumber 3)
Nothing
A few more accessor functions and we’ve got a small body of code to work with:
-- file: ch05/SimpleJSON.hs
getInt (JNumber n) = Just (truncate n)
getInt _           = Nothing

getDouble (JNumber n) = Just n
getDouble _           = Nothing

getBool (JBool b) = Just b
getBool _         = Nothing

getObject (JObject o) = Just o
getObject _           = Nothing

getArray (JArray a) = Just a
getArray _          = Nothing

isNull v            = v == JNull
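To show how these accessors might be combined, here is a small self-contained sketch. The helpers field, fieldString, and fieldInt and the sample value are hypothetical; the type signature given to getInt fixes its result to Int, which is an assumption made here for simplicity (the definition above leaves the numeric type open).

```haskell
data JValue = JString String
            | JNumber Double
            | JBool Bool
            | JNull
            | JObject [(String, JValue)]
            | JArray [JValue]
              deriving (Eq, Ord, Show)

getString :: JValue -> Maybe String
getString (JString s) = Just s
getString _           = Nothing

getInt :: JValue -> Maybe Int
getInt (JNumber n) = Just (truncate n)
getInt _           = Nothing

-- Hypothetical helper: look up a named field of a JSON object.
field :: String -> JValue -> Maybe JValue
field name (JObject pairs) = lookup name pairs
field _    _               = Nothing

-- Hypothetical helpers: fetch a field and require a particular type.
fieldString :: String -> JValue -> Maybe String
fieldString name v = case field name v of
                       Just j  -> getString j
                       Nothing -> Nothing

fieldInt :: String -> JValue -> Maybe Int
fieldInt name v = case field name v of
                    Just j  -> getInt j
                    Nothing -> Nothing

sample :: JValue
sample = JObject [("name", JString "ghc"), ("major", JNumber 6)]

main :: IO ()
main = do
  print (fieldString "name" sample)
  print (fieldInt "major" sample)
  print (fieldInt "name" sample)      -- wrong type
  print (fieldString "missing" sample) -- no such field
```

Because every accessor returns a Maybe, a missing field and a field of the wrong type are both reported as Nothing instead of causing a crash.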
End of the content preview. The rest of this section cannot be viewed here.
The Anatomy of a Haskell Module
Content preview
A Haskell source file contains a definition of a single module. A module lets us determine which names inside the module are accessible from other modules.
A source file begins with a module declaration. This must precede all other definitions in the source file:
-- file: ch05/SimpleJSON.hs
module SimpleJSON
    (
      JValue(..)
    , getString
    , getInt
    , getDouble
    , getBool
    , getObject
    , getArray
    , isNull
    ) where
The word module is reserved. It is followed by the name of the module, which must begin with a capital letter. A source file must have the same base name (the component before the suffix) as the name of the module it contains. This is why our file SimpleJSON.hs contains a module named SimpleJSON.
Following the module name is a list of exports, enclosed in parentheses. The where keyword indicates that the body of the module follows.
The list of exports indicates which names in this module are visible to other modules. This lets us keep private code hidden from the outside world. The special notation (..) that follows the name JValue indicates that we are exporting both the type and all of its constructors.
It might seem strange that we can export a type’s name (i.e., its type constructor), but not its value constructors. The ability to do this is important: it lets us hide the details of a type from its users, making the type abstract. If we cannot see a type’s value constructors, we cannot pattern match against a value of that type, nor can we construct a new value of that type. Later in this chapter, we’ll discuss some situations in which we might want to make a type abstract.
If we omit the exports (and the parentheses that enclose them) from a module declaration, every name in the module will be exported:
-- file: ch05/Exporting.hs
module ExportEverything where
To export no names at all (which is rarely useful), we write an empty export list using a pair of parentheses:
-- file: ch05/Exporting.hs
module ExportNothing () where
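As a sketch of the abstract-type idea described above, the following module exports a type's name but not its value constructor. The module Temperature and the names Celsius, mkCelsius, and degrees are all hypothetical and exist only for this illustration. It is a library module with no main function.

```haskell
-- file: Temperature.hs (hypothetical module, for illustration only)
module Temperature
    (
      Celsius      -- the type name only: its value constructor stays private
    , mkCelsius
    , degrees
    ) where

data Celsius = Celsius Double
               deriving (Eq, Ord, Show)

-- The only way for other modules to build a Celsius value. Readings
-- below absolute zero are rejected.
mkCelsius :: Double -> Maybe Celsius
mkCelsius d
    | d < -273.15 = Nothing
    | otherwise   = Just (Celsius d)

-- The only way for other modules to look inside a Celsius value.
degrees :: Celsius -> Double
degrees (Celsius d) = d
```

A module that imports Temperature can neither pattern match on Celsius nor construct one directly, so every value it sees has passed the check in mkCelsius. Writing Celsius(..) in the export list instead would expose the constructor and lose that guarantee.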
End of the content preview. The rest of this section cannot be viewed here.
Compiling Haskell Source
Content preview
In addition to the ghci interpreter, the GHC distribution includes a compiler, ghc, that generates native code. If you are already familiar with a command-line compiler such as gcc or cl (the C++ compiler component of Microsoft’s Visual Studio), you’ll immediately be at home with ghc.
To compile a source file, we first open a terminal or command prompt window, and then invoke ghc with the name of the source file to compile:
        ghc -c SimpleJSON.hs
The -c option tells ghc to generate only object code. If we were to omit the -c option, the compiler would attempt to generate a complete executable. That would fail, because we haven’t written a main function, which GHC calls to start the execution of a standalone program.
After ghc completes, if we list the contents of the directory, it should contain two new files: SimpleJSON.hi and SimpleJSON.o. The former is an interface file, in which ghc stores information about the names exported from our module in machine-readable form. The latter is an object file, which contains the generated machine code.
End of the content preview. The rest of this section cannot be viewed here.
Generating a Haskell Program and Importing Modules
Content preview
Now that we’ve successfully compiled our minimal library, we’ll write a tiny program to exercise it. Create the following file in your text editor and save it as Main.hs:
-- file: ch05/Main.hs
-- The Haskell Report requires the Main module to export main
module Main (main) where

import SimpleJSON

main = print (JObject [("foo", JNumber 1), ("bar", JBool False)])
Notice the import directive that follows the module declaration. This indicates that we want to take all of the names that are exported from the SimpleJSON module and make them available in our module. Any import directives must appear in a group at the beginning of a module, after the module declaration, but before all other code. We cannot, for example, scatter them throughout a source file.
Our choice of naming for the source file and function is deliberate. To create an executable, ghc expects a module named Main that contains a function named main (the main function is the one that will be called when we run the program once we’ve built it).
        ghc -o simple Main.hs SimpleJSON.o
This time around, we omit the -c option when we invoke ghc, so it will attempt to generate an executable. The process of generating an executable is called linking. As our command line suggests, ghc is perfectly able to both compile source files and link an executable in a single invocation.
We pass ghc a new option, -o, which takes one argument: the name of the executable that ghc should create. Here, we’ve decided to name the program simple. On Windows, the program will have the suffix .exe, but on Unix variants, there will not be a suffix.
Finally, we supply the name of our new source file, Main.hs, and the object file we already compiled, SimpleJSON.o. We must explicitly list every one of our files that contains code that should end up in the executable. If we forget a source or object file, ghc will complain about undefined symbols, which indicates that some of the definitions that it needs are not provided in the files we supplied.
When compiling, we can pass ghc any mixture of source and object files. If
End of the content preview. The rest of this section cannot be viewed here.
Printing JSON Data
Content preview
Now that we have a Haskell representation for JSON’s types, we’d like to be able to take Haskell values and render them as JSON data.
There are a few ways we could go about this. Perhaps the most direct would be to write a rendering function that prints a value in JSON form. Once we’re done, we’ll explore some more interesting approaches.
-- file: ch05/PutJSON.hs
module PutJSON where

import Data.List (intercalate)
import SimpleJSON

renderJValue :: JValue -> String

renderJValue (JString s)   = show s
renderJValue (JNumber n)   = show n
renderJValue (JBool True)  = "true"
renderJValue (JBool False) = "false"
renderJValue JNull         = "null"

renderJValue (JObject o) = "{" ++ pairs o ++ "}"
  where pairs [] = ""
        pairs ps = intercalate ", " (map renderPair ps)
        renderPair (k,v)   = show k ++ ": " ++ renderJValue v

renderJValue (JArray a) = "[" ++ values a ++ "]"
  where values [] = ""
        values vs = intercalate ", " (map renderJValue vs)
Good Haskell style involves separating pure code from code that performs I/O. Our renderJValue function has no interaction with the outside world, but we still need to be able to print a JValue:
-- file: ch05/PutJSON.hs
putJValue :: JValue -> IO ()
putJValue v = putStrLn (renderJValue v)
Printing a JSON value is now easy.
Why should we separate the rendering code from the code that actually prints a value? This gives us flexibility. For instance, if we wanted to compress the data before writing it out, and we had intermixed rendering with printing, it would be much more difficult to adapt our code to that change in circumstances.
This idea of separating pure from impure code is powerful, and it is pervasive in Haskell code. Several Haskell compression libraries exist, all of which have simple interfaces: a compression function accepts an uncompressed string and returns a compressed string. We can use function composition to render JSON data to a string, and then compress to another string, postponing any decision on how to actually display or transmit the data.
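The composition described here might look like the following sketch. It uses a cut-down JValue so that it stands alone, and rle is a toy run-length encoder standing in for a real compression function; neither rle nor renderCompressed comes from an actual compression library.

```haskell
import Data.List (group, intercalate)

-- Cut-down JValue, just enough for this example.
data JValue = JNumber Double
            | JBool Bool
            | JNull
            | JArray [JValue]

renderJValue :: JValue -> String
renderJValue (JNumber n)   = show n
renderJValue (JBool True)  = "true"
renderJValue (JBool False) = "false"
renderJValue JNull         = "null"
renderJValue (JArray a)    = "[" ++ intercalate ", " (map renderJValue a) ++ "]"

-- Toy stand-in for a compression function: like the real ones, it is a
-- pure function from an uncompressed string to a "compressed" string.
rle :: String -> String
rle = concatMap encode . group
  where encode run@(c:_) = show (length run) ++ [c]
        encode []        = ""

-- Pure pipeline: render, then compress. No I/O is involved yet.
renderCompressed :: JValue -> String
renderCompressed = rle . renderJValue

main :: IO ()
main = putStrLn (renderCompressed (JArray [JNull, JNull]))
```

Only main touches the outside world. Whether the result is printed, written to a file, or sent over a network is decided after the pure pipeline has done its work.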
End of the content preview. The rest of this section cannot be viewed here.
Type Inference Is a Double-Edged Sword
Content preview
A Haskell compiler’s ability to infer types is powerful and valuable. Early on, you’ll probably face a strong temptation to take advantage of type inference by omitting as many type declarations as possible. Let’s simply make the compiler figure the whole lot out!
Skimping on explicit type information has a downside, one that disproportionately affects new Haskell programmers. As such a programmer, we’re extremely likely to write code that will fail to compile due to straightforward type errors.
When we omit explicit type information, we force the compiler to figure out our intentions. It will infer types that are logical and consistent, but perhaps not at all what we meant. If we and the compiler unknowingly disagree about what is going on, it will naturally take us longer to find the source of our problem.
Suppose, for instance, that we write a function that we believe returns a String, but we don’t write a type signature for it:
-- file: ch05/Trouble.hs
upcaseFirst (c:cs) = toUpper c -- forgot ":cs" here
Here, we want to uppercase the first character of a word, but we’ve forgotten to append the rest of the word onto the result. We think our function’s type is String -> String, but the compiler will correctly infer its type as String -> Char. Let’s say we then try to use this function somewhere else:
-- file: ch05/Trouble.hs
camelCase :: String -> String
camelCase xs = concat (map upcaseFirst (words xs))
When we try to compile this code or load it into ghci, we won’t necessarily get an obvious error message:
ghci> :load Trouble
[1 of 1] Compiling Main             ( Trouble.hs, interpreted )

Trouble.hs:9:27:
    Couldn't match expected type `[Char]' against inferred type `Char'
      Expected type: [Char] -> [Char]
      Inferred type: [Char] -> Char
    In the first argument of `map', namely `upcaseFirst'
    In the first argument of `concat', namely
        `(map upcaseFirst (words xs))'
Failed, modules loaded: none.
Notice that the error is reported where we use the upcaseFirst function. If we’re erroneously convinced that our definition and type for
End of the content preview. The rest of this section cannot be viewed here.
A More General Look at Rendering
Content preview
Our JSON rendering code is narrowly tailored to the exact needs of our data types and the JSON formatting conventions. The output it produces can be unfriendly to human eyes. We will now look at rendering as a more generic task: how can we build a library that is useful for rendering data in a variety of situations?
We would like to produce output that is suitable either for human consumption (e.g., for debugging) or for machine processing. Libraries that perform this job are referred to as pretty printers. Several Haskell pretty-printing libraries already exist. We are creating one of our own not to replace them, but for the many useful insights we will gain into both library design and functional programming techniques.
We will call our generic pretty-printing module Prettify, so our code will go into a source file named Prettify.hs.
In our Prettify module, we will base our names on those used by several established Haskell pretty-printing libraries, which will give us a degree of compatibility with existing mature libraries.
To make sure that Prettify meets practical needs, we write a new JSON renderer that uses the Prettify API. After we’re done, we’ll go back and fill in the details of the Prettify module.
Instead of rendering straight to a string, our Prettify module will use an abstract type that we’ll call Doc. By basing our generic rendering library on an abstract type, we can choose an implementation that is flexible and efficient. If we decide to change the underlying code, our users will not be able to tell.
We will name our new JSON rendering module PrettyJSON.hs and retain the name renderJValue for the rendering function. Rendering one of the basic JSON values is straightforward:
-- file: ch05/PrettyJSON.hs
renderJValue :: JValue -> Doc
renderJValue (JBool True)  = text "true"
renderJValue (JBool False) = text "false"
renderJValue JNull         = text "null"
renderJValue (JNumber num) = double num
renderJValue (JString str) = string str
Our Prettify module provides the text, double, and string functions.
End of the content preview. The rest of this section cannot be viewed here.
Developing Haskell Code Without Going Nuts
Content preview
Early on, as we come to grips with Haskell development, we have so many new concepts to keep track of at one time that it can be a challenge to write code that compiles at all.
As we write our first substantial body of code, it’s a huge help to pause every few minutes and try to compile what we’ve produced so far. Because Haskell is so strongly typed, if our code compiles cleanly, we’re assured that we’re not wandering too far off into the programming weeds.
One useful technique for quickly developing the skeleton of a program is to write placeholder, or stub, versions of types and functions. For instance, we just mentioned that our string, text and double functions would be provided by our Prettify module. If we don’t provide definitions for those functions or the Doc type, our attempts to "compile early, compile often" with our JSON renderer will fail, as the compiler won’t know anything about those functions. To avoid this problem, we write stub code that doesn’t do anything:
-- file: ch05/PrettyStub.hs
import SimpleJSON

data Doc = ToBeDefined
         deriving (Show)

string :: String -> Doc
string str = undefined

text :: String -> Doc
text str = undefined

double :: Double -> Doc
double num = undefined
The special value undefined has the type a, so it always typechecks, no matter where we use it. If we attempt to evaluate it, it will cause our program to crash:
ghci> :type undefined
undefined :: a
ghci> undefined
*** Exception: Prelude.undefined
ghci> :type double
double :: Double -> Doc
ghci> double 3.14
*** Exception: Prelude.undefined
Even though we can’t yet run our stubbed code, the compiler’s type checker will ensure that our program is sensibly typed.
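A small sketch can show both halves of this idea: stubbed code typechecks, and a stub crashes only if it is actually evaluated. The renderBool function is a hypothetical caller written against the text stub.

```haskell
data Doc = ToBeDefined
         deriving (Show)

-- Stub: typechecks everywhere, crashes only if its result is evaluated.
text :: String -> Doc
text _ = undefined

-- Hypothetical caller written against the stub.
renderBool :: Bool -> Doc
renderBool True  = text "true"
renderBool False = text "false"

main :: IO ()
main = do
  let docs = map renderBool [True, False]
  -- length walks the list structure but never inspects the Doc values,
  -- so the undefined results are not evaluated here.
  print (length docs)
```

Replacing print (length docs) with print docs would force the stubbed values and raise the Prelude.undefined exception shown above.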
End of the content preview. The rest of this section cannot be viewed here.
Pretty Printing a String
Content preview
When we must pretty print a string value, JSON has moderately involved escaping rules that we must follow. At the highest level, a string is just a series of characters wrapped in quotes:
-- file: ch05/PrettyJSON.hs
string :: String -> Doc
string = enclose '"' '"' . hcat . map oneChar
This style of writing a definition exclusively as a composition of other functions is called point-free style. The use of the word point is not related to the "." character used for function composition. The term point is roughly synonymous (in Haskell) with value, so a point-free expression makes no mention of the values that it operates on.
Contrast this point-free definition of string with this "pointy" version, which uses a variable, s, to refer to the value on which it operates:
-- file: ch05/PrettyJSON.hs
pointyString :: String -> Doc
pointyString s = enclose '"' '"' (hcat (map oneChar s))
The enclose function simply wraps a Doc value with an opening and closing character:
-- file: ch05/PrettyJSON.hs
enclose :: Char -> Char -> Doc -> Doc
enclose left right x = char left <> x <> char right
We provide a (<>) function in our pretty-printing library. It appends two Doc values, so it’s the Doc equivalent of (++):
-- file: ch05/PrettyStub.hs
(<>) :: Doc -> Doc -> Doc
a <> b = undefined

char :: Char -> Doc
char c = undefined
Our pretty-printing library also provides hcat, which concatenates multiple Doc values into one; it’s the analogue of concat for lists:
-- file: ch05/PrettyStub.hs
hcat :: [Doc] -> Doc
hcat xs = undefined
Our string function applies the oneChar function to every character in a string, concatenates the lot, and encloses the result in quotes. The oneChar function escapes or renders an individual character:
-- file: ch05/PrettyJSON.hs
oneChar :: Char -> Doc
oneChar c = case lookup c simpleEscapes of
              Just r -> text r
              Nothing | mustEscape c -> hexEscape c
                      | otherwise    -> char c
    where mustEscape c = c < ' ' || c == '\x7f' || c > '\xff'

simpleEscapes :: [(Char, String)]
simpleEscapes = zipWith ch "\b\n\f\r\t\\\"/" "bnfrt\\\"/"
    where ch a b = (a, ['\\',b])
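To see what the simpleEscapes table contains, here is a simplified sketch that works on plain strings instead of Doc values. The escapeSimple function is hypothetical: it handles only the table lookup and deliberately leaves out the hex-escape case that oneChar covers.

```haskell
import Data.Maybe (fromMaybe)

-- Pairs each special character with its two-character JSON escape,
-- for example ('\n', "\\n").
simpleEscapes :: [(Char, String)]
simpleEscapes = zipWith ch "\b\n\f\r\t\\\"/" "bnfrt\\\"/"
    where ch a b = (a, ['\\',b])

-- Hypothetical String version of oneChar, without hex escapes:
-- characters found in the table are replaced, all others pass through.
escapeSimple :: Char -> String
escapeSimple c = fromMaybe [c] (lookup c simpleEscapes)

main :: IO ()
main = do
  print (lookup '\n' simpleEscapes)
  putStrLn (concatMap escapeSimple "a\"b\n")
```

The association list is built once with zipWith, and lookup returns Nothing for ordinary characters, which is what lets the real oneChar fall through to its other cases.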
End of the content preview. The rest of this section cannot be viewed here.
Arrays and Objects, and the Module Header
Content preview
Compared to strings, pretty printing arrays and objects is a snap. We already know that the two are visually similar: each starts with an opening character, followed by a series of values separated with commas, followed by a closing character. Let’s write a function that captures the common structure of arrays and objects:
-- file: ch05/PrettyJSON.hs
series :: Char -> Char -> (a -> Doc) -> [a] -> Doc
series open close item = enclose open close
                       . fsep . punctuate (char ',') . map item
We’ll start by interpreting this function’s type. It takes an opening and closing character, then a function that knows how to pretty print a value of some unknown type a, followed by a list of values of type a. It then returns a value of type Doc.
Notice that although our type signature mentions four parameters, we listed only three in the definition of the function. We are just following the same rule that let us shorten suffixes3 xs = compose init tails xs to suffixes4 = compose init tails.
We have already written enclose, which wraps a Doc value in opening and closing characters. The fsep function will live in our Prettify module. It combines a list of Doc values into one, possibly wrapping lines if the output will not fit on a single line:
-- file: ch05/PrettyStub.hs
fsep :: [Doc] -> Doc
fsep xs = undefined
By now, you should be able to define your own stubs in Prettify.hs, following the examples we have supplied. We will not explicitly define any more stubs.
The punctuate function will also live in our Prettify module, and we can define it in terms of functions for which we’ve already written stubs:
-- file: ch05/Prettify.hs
punctuate :: Doc -> [Doc] -> [Doc]
punctuate p []     = []
punctuate p [d]    = [d]
punctuate p (d:ds) = (d <> p) : punctuate p ds
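Since (<>) is still a stub at this point, we cannot run punctuate on Doc values yet. As a rough check of its behavior, this sketch substitutes String for Doc and (++) for (<>); the name punctuateStr is hypothetical.

```haskell
-- String stand-in for the Doc version: (++) plays the role of (<>).
punctuateStr :: String -> [String] -> [String]
punctuateStr _ []     = []
punctuateStr _ [d]    = [d]                  -- last item: no separator
punctuateStr p (d:ds) = (d ++ p) : punctuateStr p ds

main :: IO ()
main = do
  print (punctuateStr "," ["1", "2", "3"])
  putStrLn (concat (punctuateStr "," ["1", "2", "3"]))
```

The separator is attached to every element except the last, which is why the single-element case needs its own equation.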
With this definition of series, pretty printing an array is entirely straightforward. We add this equation to the end of the block we’ve already written for our renderJValue function:
-- file: ch05/PrettyJSON.hs
renderJValue (JArray ary) = series '[' ']' renderJValue ary
To pretty print an object, we need to do only a little more work. For each element, we have both a name and a value to deal with:
End of the content preview. The rest of this section cannot be viewed here.
Writing a Module Header
Content preview
Now that we have written the bulk of our PrettyJSON.hs file, we must go back to the top and add a module declaration:
-- file: ch05/PrettyJSON.hs
module PrettyJSON
    (
      renderJValue
    ) where

import Numeric (showHex)
import Data.Char (ord)
import Data.Bits (shiftR, (.&.))

import SimpleJSON (JValue(..))
import Prettify (Doc, (<>), char, double, fsep, hcat, punctuate, text,
                 compact, pretty)
We export just one name from this module: renderJValue, our JSON rendering function. The other definitions in the module exist purely to support renderJValue, so there’s no reason to make them visible to other modules.
Regarding imports, the Numeric and Data.Bits modules are distributed with GHC. We’ve already written the SimpleJSON module and filled our Prettify module with skeletal definitions. Notice that there’s no difference in the way we import standard modules from those we’ve written ourselves.
With each import directive, we explicitly list each of the names we want to bring into our module’s namespace. This is not required. If we omit the list of names, all of the names exported from a module will be available to us. However, it’s generally a good idea to write an explicit import list for the following reasons:
  • An explicit list makes it clear which names we’re importing from where. This will make it easier for a reader to look up documentation if he encounters an unfamiliar function.
  • Occasionally, a library maintainer will remove or rename a function. If a function disappears from a third-party module that we use, any resulting compilation error is likely to happen long after we’ve written the module. The explicit list of imported names can act as a reminder to ourselves of where we had been importing the missing name from, which will help us to pinpoint the problem more quickly.
  • It is possible that someone will add a name to a module that is identical to a name already in our own code. If we don’t use an explicit import list, we’ll end up with the same name in our module twice. If we use that name,
End of the content preview. The rest of this section cannot be viewed here.
Fleshing Out the Pretty-Printing Library
Content preview
In our Prettify module, we represent our Doc type as an algebraic data type:
-- file: ch05/Prettify.hs
data Doc = Empty
         | Char Char
         | Text String
         | Line
         | Concat Doc Doc
         | Union Doc Doc
           deriving (Show,Eq)
Observe that the Doc type is actually a tree. The Concat and Union constructors create an internal node from two other Doc values, while the Empty and other simple constructors build leaves.
In the header of our module, we will export the name of the type, but none of its constructors. This will prevent modules that use the Doc type from creating and pattern matching against Doc values.
Instead, to create a Doc, a user of the Prettify module will call a function that we provide. Here are the simple construction functions. As we add real definitions, we must replace any stubbed versions already in the Prettify.hs source file:
-- file: ch05/Prettify.hs
empty :: Doc
empty = Empty

char :: Char -> Doc
char c = Char c

text :: String -> Doc
text "" = Empty
text s  = Text s

double :: Double -> Doc
double d = text (show d)
The Line constructor represents a line break. The line function creates hard line breaks, which always appear in the pretty printer’s output. Sometimes we’ll want a soft line break, which is only used if a line is too wide to fit in a window or page (we’ll introduce a softline function shortly):
-- file: ch05/Prettify.hs
line :: Doc
line = Line
Almost as simple as the basic constructors is the (<>) function, which concatenates two Doc values:
-- file: ch05/Prettify.hs
(<>) :: Doc -> Doc -> Doc
Empty <> y = y
x <> Empty = x
x <> y = x `Concat` y
We pattern-match against Empty so that concatenating a Doc value with Empty on the left or right will have no effect, which keeps us from bloating the tree with useless values:
ghci> text "foo" <> text "bar"
Concat (Text "foo") (Text "bar")
ghci> text "foo" <> empty
Text "foo"
ghci> empty <> text "bar"
Text "bar"
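The behaviour of (<>) can be tried out in a single self-contained file. This is only a sketch: hcatSketch is a hypothetical helper name introduced here for illustration, and the import line assumes a recent GHC whose Prelude already exports its own (<>) operator, which would otherwise clash with ours.

```haskell
-- Recent versions of the Prelude export (<>) themselves, so we hide it.
-- (Assumption: on older compilers this line only produces a warning.)
import Prelude hiding ((<>))

data Doc = Empty
         | Char Char
         | Text String
         | Line
         | Concat Doc Doc
         | Union Doc Doc
           deriving (Show, Eq)

text :: String -> Doc
text "" = Empty
text s  = Text s

(<>) :: Doc -> Doc -> Doc
Empty <> y = y
x <> Empty = x
x <> y = x `Concat` y

-- Hypothetical helper: concatenate a whole list of documents.
-- Because Empty disappears under (<>), empty pieces never reach the tree.
hcatSketch :: [Doc] -> Doc
hcatSketch = foldr (<>) Empty
```

Loading this into ghci and evaluating hcatSketch [text "a", text "", text "b"] should give Concat (Text "a") (Text "b"), with no Empty node left over.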
If we briefly put on our mathematical hats, we can say that Empty is the identity under concatenation, since nothing happens if we concatenate a
End of content preview. The rest of this section is not viewable here.
Creating a Package
Content preview
The Haskell community has built a standard set of tools, named Cabal, that help with building, installing, and distributing software. Cabal organizes software as a package. A package contains one library, and possibly several executable programs.
To do anything with a package, Cabal needs a description of it. This is contained in a text file whose name ends with the suffix .cabal. This file belongs in the top-level directory of your project. It has a simple format, which we’ll describe next.
A Cabal package must have a name. Usually, the name of the package matches the name of the .cabal file. We’ll call our package mypretty, so our file is mypretty.cabal. Often, the directory that contains a .cabal file will have the same name as the package, e.g., mypretty.
A package description begins with a series of global properties, which apply to every library and executable in the package:
Name:          mypretty
Version:       0.1

-- This is a comment.  It stretches to the end of the line.
Package names must be unique. If you create and install a package that has the same name as a package already present on your system, GHC will get very confused.
The global properties include a substantial amount of information that is intended for human readers, not Cabal itself:
Synopsis:      My pretty printing library, with JSON support
Description:
  A simple pretty-printing library that illustrates how to
  develop a Haskell library.
Author:        Real World Haskell
Maintainer:    nobody@realworldhaskell.org
As the Description field indicates, a field can span multiple lines, provided they’re indented.
Also included in the global properties is license information. Most Haskell packages are licensed under the BSD license, which Cabal calls BSD3. (Obviously, you’re free to choose whatever license you think is appropriate.) The optional License-File field lets us specify the name of a file that contains the exact text of our package’s licensing terms.
The features supported by successive versions of Cabal evolve over time, so it’s wise to indicate what versions of Cabal we expect to be compatible with. The features we are describing are supported by versions 1.2 and higher of Cabal:
End of content preview. The rest of this section is not viewable here.
Practical Pointers and Further Reading
Content preview
GHC already bundles a pretty-printing library, Text.PrettyPrint.HughesPJ. It provides the same basic API as our example but a much richer and more useful set of pretty-printing functions. We recommend using it, rather than writing your own.
John Hughes introduced the design of the HughesPJ pretty printer in “The Design of a Pretty-Printing Library” (http://citeseer.ist.psu.edu/hughes95design.html). The library was subsequently improved by Simon Peyton Jones, hence the name. Hughes’s paper is long, but well worth reading for his discussion of how to design a library in Haskell.
In this chapter, our pretty-printing library is based on a simpler system described by Philip Wadler in “A prettier printer” (http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.19.635). His library was extended by Daan Leijen; this version is available for download from Hackage as wl-pprint. If you use the cabal command-line tool, you can download, build, and install it in one step with cabal install wl-pprint.
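To give a feel for the bundled library, here is a minimal sketch that uses a handful of combinators from Text.PrettyPrint.HughesPJ. The helper names listDoc and recordDoc are hypothetical and exist only for this illustration; the imported functions are part of the library's public API.

```haskell
import Text.PrettyPrint.HughesPJ
    (Doc, brackets, comma, hsep, int, nest, punctuate, render, text, ($$), (<+>))

-- Hypothetical helper: lay out a list of Ints as a bracketed,
-- comma-separated document.
listDoc :: [Int] -> Doc
listDoc = brackets . hsep . punctuate comma . map int

-- Hypothetical helper: a name and "=" on one line, with the list
-- indented by two columns on the line below.
recordDoc :: String -> [Int] -> Doc
recordDoc name xs = text name <+> text "=" $$ nest 2 (listDoc xs)
```

In ghci, render (listDoc [1,2,3]) produces the string "[1, 2, 3]", and putStrLn (render (recordDoc "scores" [1,2,3])) prints the two-line layout.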
End of content preview. The rest of this section is not viewable here.
Chapter 6: Using Typeclasses
Content preview
Typeclasses are among the most powerful features in Haskell. They allow us to define generic interfaces that provide a common feature set over a wide variety of types. Typeclasses are at the heart of some basic language features such as equality testing and numeric operators. Before we talk about what exactly typeclasses are, though, we’d like to explain the need for them.
End of content preview. The rest of this section is not viewable here.
The Need for Typeclasses
Content preview
Let’s imagine that for some unfathomable reason, the designers of the Haskell language neglected to implement the equality test ==. Once you get over your shock at hearing this, you resolve to implement your own equality tests. Your application consists of a simple Color type, and so your first equality test is for this type. Your first attempt might look like this:
-- file: ch06/naiveeq.hs
data Color = Red | Green | Blue

colorEq :: Color -> Color -> Bool
colorEq Red   Red   = True
colorEq Green Green = True
colorEq Blue  Blue  = True
colorEq _     _     = False
You can test this with ghci:
ghci> :load naiveeq.hs
[1 of 1] Compiling Main             ( naiveeq.hs, interpreted )
Ok, modules loaded: Main.
ghci> colorEq Red Red
True
ghci> colorEq Red Green
False
Now, let’s say that you want to add an equality test for Strings. Since a Haskell String is a list of characters, we can write a simple function to perform that test. For simplicity, we cheat a bit and use the == operator here to illustrate:
-- file: ch06/naiveeq.hs
stringEq :: [Char] -> [Char] -> Bool

-- Match if both are empty
stringEq [] [] = True

-- If both start with the same char, check the rest
stringEq (x:xs) (y:ys) = x == y && stringEq xs ys

-- Everything else doesn't match
stringEq _ _ = False
You should now be able to see a problem: we have to use a function with a different name for every different type that we want to be able to compare. That’s inefficient and annoying. It’s much more convenient to be able to just use == to compare anything. It may also be useful to write generic functions such as /= that could be implemented in terms of ==, and valid for almost anything. By having a generic function that can compare anything, we can also make our code generic: if a piece of code needs only to compare things, then it ought to be able to accept any data type that the compiler knows how to compare. What’s more, if new data types are added later, the existing code shouldn’t have to be modified.
Haskell’s typeclasses are designed to address all of these things.
End of content preview. The rest of this section is not viewable here.
What Are Typeclasses?
Content preview
Typeclasses define a set of functions that can have different implementations depending on the type of data they are given. Typeclasses may look like the objects of object-oriented programming, but they are truly quite different.
Let’s use typeclasses to solve our equality dilemma from the previous section. To begin with, we must define the typeclass itself. We want a function that takes two parameters, both the same type, and returns a Bool indicating whether or not they are equal. We don’t care what that type is, but we just want two items of that type. Here’s our first definition of a typeclass:
-- file: ch06/eqclasses.hs
class BasicEq a where
    isEqual :: a -> a -> Bool
This says that we are declaring a typeclass named BasicEq, and we’ll refer to instance types with the letter a. An instance type of this typeclass is any type that implements the functions defined in the typeclass. This typeclass defines one function. That function takes two parameters—both corresponding to instance types—and returns a Bool.
The keyword to define a typeclass in Haskell is class. Unfortunately, this may be confusing for those of you coming from an object-oriented background, as we are not really defining the same thing.
On the first line, the name of the parameter a was chosen arbitrarily—we could have used any name. The key is that, when you list the types of your functions, you must use that name to refer to instance types.
Let’s look at this in ghci. Recall that you can type :type in ghci to have it show you the type of something. Let’s see what it says about isEqual:
*Main> :type isEqual
isEqual :: (BasicEq a) => a -> a -> Bool
You can read that this way: “For all types a, so long as a is an instance of BasicEq, isEqual takes two parameters of type a and returns a Bool.” Let’s take a quick look at defining isEqual for a particular type:
-- file: ch06/eqclasses.hs
instance BasicEq Bool where
    isEqual True  True  = True
    isEqual False False = True
    isEqual _     _     = False
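The payoff of the (BasicEq a) constraint is that code written against the typeclass works for every instance. The sketch below assumes a hypothetical helper named allEqual, and adds a Color instance alongside the Bool one purely to show the same function being reused at two types.

```haskell
class BasicEq a where
    isEqual :: a -> a -> Bool

instance BasicEq Bool where
    isEqual True  True  = True
    isEqual False False = True
    isEqual _     _     = False

data Color = Red | Green | Blue

instance BasicEq Color where
    isEqual Red   Red   = True
    isEqual Green Green = True
    isEqual Blue  Blue  = True
    isEqual _     _     = False

-- Hypothetical helper: are all neighbouring elements equal?
-- It needs nothing from its element type except a BasicEq instance.
allEqual :: BasicEq a => [a] -> Bool
allEqual (x:y:rest) = isEqual x y && allEqual (y:rest)
allEqual _          = True
```

Here allEqual [True, True] and allEqual [Red, Blue] both typecheck, and the compiler picks the matching isEqual for each call.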
You can also use
End of content preview. The rest of this section is not viewable here.
Declaring Typeclass Instances
Content preview
Now that you know how to define typeclasses, it’s time to learn how to define instances of typeclasses. Recall that types are made instances of a particular typeclass by implementing the functions necessary for that typeclass.
Recall our attempt to create a test for equality over a Color type back in “The Need for Typeclasses.” Now let’s see how we could make that same Color type a member of the BasicEq3 class:
-- file: ch06/eqclasses.hs
instance BasicEq3 Color where
    isEqual3 Red Red = True
    isEqual3 Green Green = True
    isEqual3 Blue Blue = True
    isEqual3 _ _ = False
Notice that we provide essentially the same function as we used in “The Need for Typeclasses.” In fact, the implementation is identical. However, in this case, we can use isEqual3 on any type that we declare is an instance of BasicEq3, not just this one color type. We could define equality tests for anything from numbers to graphics using the same basic pattern. In fact, as you will see later in this chapter, this is exactly how you can make Haskell’s == operator work for your own custom types.
Note also that the BasicEq3 class defined both isEqual3 and isNotEqual3, but we implemented only one of them in the Color instance. That’s because of the default implementation contained in BasicEq3. Since we didn’t explicitly define isNotEqual3, the compiler automatically uses the default implementation given in the BasicEq3 declaration.
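The BasicEq3 declaration itself is not part of this excerpt, so the following is a reconstruction under assumptions: a class with two methods, each given a default written in terms of the other. With such a class, an instance only has to supply one of the two.

```haskell
-- Assumed shape of the class: each method defaults to the negation
-- of the other, so an instance must define at least one of them.
class BasicEq3 a where
    isEqual3 :: a -> a -> Bool
    isEqual3 x y = not (isNotEqual3 x y)

    isNotEqual3 :: a -> a -> Bool
    isNotEqual3 x y = not (isEqual3 x y)

data Color = Red | Green | Blue

instance BasicEq3 Color where
    isEqual3 Red Red = True
    isEqual3 Green Green = True
    isEqual3 Blue Blue = True
    isEqual3 _ _ = False
```

With this in place, isNotEqual3 Red Blue evaluates to True even though the Color instance never mentions isNotEqual3. One caveat of this design: an instance that defines neither method still compiles, but calling either one loops forever.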
End of content preview. The rest of this section is not viewable here.
Important Built-in Typeclasses
Content preview
Now that we’ve discussed defining your own typeclasses and making your types instances of typeclasses, it’s time to introduce you to typeclasses that are a standard part of the Haskell Prelude. As we mentioned at the beginning of this chapter, typeclasses are at the core of some important aspects of the language. We’ll cover the most common ones here. For more details, the Haskell library reference is a good resource. It will give you a description of the typeclasses and usually also will tell you which functions you must implement to have a complete definition.
The Show typeclass is used to convert values to Strings. It is perhaps most commonly used to convert numbers to Strings, but it is defined for so many types that it can be used to convert quite a bit more. If you have defined your own types, making them instances of Show will make it easy to display them in ghci or print them out in programs.
The most important function of Show is show. It takes one argument—the data to convert. It returns a String representing that data. ghci reports the type of show like this:
ghci> :type show
show :: (Show a) => a -> String
Let’s look at some examples of converting values to strings:
ghci> show 1
"1"
ghci> show [1, 2, 3]
"[1,2,3]"
ghci> show (1, 2)
"(1,2)"
Remember that ghci displays results as they would be entered into a Haskell program. So the expression show 1 returns a single-character string containing the digit 1. That is, the quotes are not part of the string itself. We can make that clear by using putStrLn:
ghci> putStrLn (show 1)
1
ghci> putStrLn (show [1,2,3])
[1,2,3]
You can also use show on Strings:
ghci> show "Hello!"
"\"Hello!\""
ghci> putStrLn (show "Hello!")
"Hello!"
ghci> show ['H', 'i']
"\"Hi\""
ghci> putStrLn (show "Hi")
"Hi"
ghci> show "Hi, \"Jane\""
"\"Hi, \\\"Jane\\\"\""
ghci> putStrLn (show "Hi, \"Jane\"")
"Hi, \"Jane\""
Running show on Strings can be confusing. Since show generates a result that is suitable for a Haskell literal, it adds quotes and escaping suitable for inclusion in a Haskell program.
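Two small sketches may make this concrete. The first writes a Show instance by hand for a hypothetical Celsius type; the second, a hypothetical roundTrip function, relies on the fact that the quoting show adds to a String is exactly what read expects, so reading the shown text gives back the original.

```haskell
-- Hypothetical type, used only for illustration.
data Celsius = Celsius Double

-- A hand-written instance: we decide how the value is rendered.
instance Show Celsius where
    show (Celsius d) = show d ++ " C"

-- Because show emits a valid Haskell string literal (quotes and
-- escapes included), read can parse it back into the same String.
roundTrip :: String -> String
roundTrip = read . show
```

For example, show (Celsius 21.5) yields "21.5 C", and roundTrip applied to a string containing embedded quotes returns that string unchanged.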
End of content preview. The rest of this section is not viewable here.
Automatic Derivation
Content preview
For many simple data types, the Haskell compiler can automatically derive instances of Read, Show, Bounded, Enum, Eq, and Ord for us. This saves us the effort of having to manually write code to compare or display our own types:
-- file: ch06/colorderived.hs
data Color = Red | Green | Blue
     deriving (Read, Show, Eq, Ord)
The Haskell standard requires compilers to be able to automatically derive instances of these specific typeclasses. This automation is not available for other typeclasses.
Let’s take a look at how these derived instances work for us:
ghci> show Red
"Red"
ghci> (read "Red")::Color
Red
ghci> (read "[Red,Red,Blue]")::[Color]
[Red,Red,Blue]
ghci> (read "[Red, Red, Blue]")::[Color]
[Red,Red,Blue]
ghci> Red == Red
True
ghci> Red == Blue
False
ghci> Data.List.sort [Blue,Green,Blue,Red]
[Red,Green,Blue,Blue]
ghci> Red < Blue
True
Notice that the sort order for Color was based on the order in which the constructors were defined.
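The same constructor order drives the derived Enum and Bounded instances, which the listing above does not use. As a sketch, here is the Color type with those two classes added to the deriving clause; allColors and nextColor are hypothetical names introduced for this example.

```haskell
data Color = Red | Green | Blue
     deriving (Read, Show, Eq, Ord, Enum, Bounded)

-- Every constructor, in declaration order.
allColors :: [Color]
allColors = [minBound .. maxBound]

-- Hypothetical helper: step to the next color, wrapping around at
-- the end. Plain succ would raise an error on the last constructor.
nextColor :: Color -> Color
nextColor c
    | c == maxBound = minBound
    | otherwise     = succ c
```

In ghci, allColors evaluates to [Red,Green,Blue] and nextColor Blue to Red.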
Automatic derivation is not always possible. For instance, if you defined a type data MyType = MyType (Int -> Bool), the compiler will not be able to derive an instance of Show because it doesn’t know how to render a function. We will get a compilation error in such a situation.
When we automatically derive an instance of some typeclass, the types that we refer to in our data declaration must also be instances of that typeclass (manually or automatically):
-- file: ch06/AutomaticDerivation.hs
data CannotShow = CannotShow

-- will not compile, since CannotShow is not an instance of Show
data CannotDeriveShow = CannotDeriveShow CannotShow
                        deriving (Show)

data OK = OK

instance Show OK where
    show _ = "OK"

data ThisWorks = ThisWorks OK
                 deriving (Show)
End of content preview. The rest of this section is not viewable here.
Typeclasses at Work: Making JSON Easier to Use
Content preview
The JValue type that we introduced in Chapter 5 is not especially easy to work with. Here is a truncated and tidied snippet of some real JSON data, produced by a well-known search engine:
{
  "query": "awkward squad haskell",
  "estimatedCount": 3920,
  "moreResults": true,
  "results":
  [{
    "title": "Simon Peyton Jones: papers",
    "snippet": "Tackling the awkward squad: monadic input/output ...",
    "url": "http://research.microsoft.com/~simonpj/papers/marktoberdorf/"
   },
   {
    "title": "Haskell for C Programmers | Lambda the Ultimate",
    "snippet": "... the best job of all the tutorials I've read ...",
    "url": "http://lambda-the-ultimate.org/node/724"
   }]
}
And here’s a further slimmed down fragment of that data, represented in Haskell:
-- file: ch05/SimpleResult.hs
import SimpleJSON

result :: JValue
result = JObject [
  ("query", JString "awkward squad haskell"),
  ("estimatedCount", JNumber 3920),
  ("moreResults", JBool True),
  ("results", JArray [
     JObject [
      ("title", JString "Simon Peyton Jones: papers"),
      ("snippet", JString "Tackling the awkward ..."),
      ("url", JString "http://.../marktoberdorf/")
     ]])
  ]
Because Haskell doesn’t natively support lists that contain values of different types, we can’t directly represent a JSON object that contains values of different types. Instead, we must wrap each value with a JValue constructor, which limits our flexibility: if we want to change the number 3920 to a string, we must change the constructor that we use to wrap it from JNumber to JString.
Haskell’s typeclasses offer a tempting solution to this problem:
-- file: ch06/JSONClass.hs
type JSONError = String

class JSON a where
    toJValue :: a -> JValue
    fromJValue :: JValue -> Either JSONError a

instance JSON JValue where
    toJValue = id
    fromJValue = Right
Now, instead of applying a constructor such as JNumber to a value in order to wrap it, we apply the toJValue function. If we change a value’s type, the compiler will choose a suitable implementation of toJValue to use with it.
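As a sketch of what one more instance might look like, here is the class together with an instance for Bool. The JValue definition is repeated so that the file stands alone; it is assumed to match the one in SimpleJSON, and the error message text is an arbitrary choice.

```haskell
-- Assumed to mirror the JValue type from the SimpleJSON module.
data JValue = JString String
            | JNumber Double
            | JBool Bool
            | JNull
            | JObject [(String, JValue)]
            | JArray [JValue]
              deriving (Eq, Ord, Show)

type JSONError = String

class JSON a where
    toJValue :: a -> JValue
    fromJValue :: JValue -> Either JSONError a

instance JSON JValue where
    toJValue = id
    fromJValue = Right

-- Converting to JSON always succeeds; converting back can fail,
-- which is why fromJValue returns an Either.
instance JSON Bool where
    toJValue = JBool
    fromJValue (JBool b) = Right b
    fromJValue _ = Left "not a JSON boolean"
```

The caller selects the instance through the expected result type: fromJValue (JBool True) :: Either JSONError Bool gives Right True, while the same call on JNull gives a Left carrying the error message.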
We also provide a
End of content preview. The rest of this section is not viewable here.
Living in an Open World
Content preview
Haskell’s typeclasses are intentionally designed to let us create new instances of a typeclass whenever we see fit:
-- file: ch06/JSONClass.hs
doubleToJValue :: (Double -> a) -> JValue -> Either JSONError a
doubleToJValue f (JNumber v) = Right (f v)
doubleToJValue _ _ = Left "not a JSON number"

instance JSON Int where
    toJValue = JNumber . realToFrac
    fromJValue = doubleToJValue round

instance JSON Integer where
    toJValue = JNumber . realToFrac
    fromJValue = doubleToJValue round

instance JSON Double where
    toJValue = JNumber
    fromJValue = doubleToJValue id
We can add new instances anywhere; they are not confined to the module where we define a typeclass. This feature of the typeclass system is referred to as its open world assumption. If we had a way to express a notion of “the following are the only instances of this typeclass that can exist,” we would have a closed world.
We would like to be able to turn a list into what JSON calls an array. We won’t worry about implementation details just yet, so let’s use undefined as the bodies of the instance’s methods:
-- file: ch06/BrokenClass.hs
instance (JSON a) => JSON [a] where
    toJValue = undefined
    fromJValue = undefined
It would also be convenient if we could turn a list of name/value pairs into a JSON object:
-- file: ch06/BrokenClass.hs
instance (JSON a) => JSON [(String, a)] where
    toJValue = undefined
    fromJValue = undefined
If we put these definitions into a source file and load them into ghci, everything seems fine initially:
ghci> :load BrokenClass
[1 of 2] Compiling SimpleJSON       ( ../ch05/SimpleJSON.hs, interpreted )
[2 of 2] Compiling BrokenClass      ( BrokenClass.hs, interpreted )
Ok, modules loaded: BrokenClass, SimpleJSON.
However, once we try to use the list-of-pairs instance, we run into trouble:
ghci> toJValue [("foo","bar")]

<interactive>:1:0:
    Overlapping instances for JSON [([Char], [Char])]
      arising from a use of `toJValue' at <interactive>:1:0-23
    Matching instances:
      instance (JSON a) => JSON [a]
        -- Defined at BrokenClass.hs:(44,0)-(46,25)
      instance (JSON a) => JSON [(String, a)]
        -- Defined at BrokenClass.hs:(50,0)-(52,25)
    In the expression: toJValue [("foo", "bar")]
    In the definition of `it': it = toJValue [("foo", "bar")]
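One way to let the compiler choose between two such instances is sketched below. It assumes a newer GHC than the one used for the session above (7.10 or later), where the OVERLAPPABLE and OVERLAPPING pragmas can be attached to individual instances. The class is cut down to toJValue alone, and the method bodies are filled in for illustration rather than left as undefined.

```haskell
{-# LANGUAGE FlexibleInstances #-}

-- Assumed to mirror the JValue type from the SimpleJSON module.
data JValue = JString String
            | JNumber Double
            | JBool Bool
            | JNull
            | JObject [(String, JValue)]
            | JArray [JValue]
              deriving (Eq, Ord, Show)

-- Simplified class for this sketch: conversion to JSON only.
class JSON a where
    toJValue :: a -> JValue

instance JSON Bool where
    toJValue = JBool

-- The general list instance may be overridden by a more specific one.
instance {-# OVERLAPPABLE #-} (JSON a) => JSON [a] where
    toJValue = JArray . map toJValue

-- The list-of-pairs instance is more specific, so it wins when both match.
instance {-# OVERLAPPING #-} (JSON a) => JSON [(String, a)] where
    toJValue = JObject . map (\(k, v) -> (k, toJValue v))
```

With these annotations, toJValue [True, False] uses the array instance, while toJValue [("foo", True)] resolves to the object instance instead of being rejected.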
End of content preview. The rest of this section is not viewable here.
How to Give a Type a New Identity
Content preview
In addition to the familiar data keyword, Haskell provides us with another way to create a new type, using the newtype keyword:
-- file: ch06/Newtype.hs
data DataInt = D Int
    deriving (Eq, Ord, Show)

newtype NewtypeInt = N Int
    deriving (Eq, Ord, Show)
The purpose of a newtype declaration is to rename an existing type, giving it a distinct identity. As we can see, it is similar in appearance to a type declared using the data keyword.
Although their names are similar, the type and newtype keywords have different purposes. The type keyword gives us another way of referring to a type, like a nickname for a friend. We and the compiler know that [Char] and String refer to the same type.
In contrast, the newtype keyword exists to hide the nature of a type. Consider a UniqueID type:
-- file: ch06/Newtype.hs
newtype UniqueID = UniqueID Int
    deriving (Eq)
The compiler treats UniqueID as a different type from Int. As a user of a UniqueID, we know only that we have a unique identifier; we cannot see that it is implemented as an Int.
When we declare a newtype, we must choose which of the underlying type’s typeclass instances we want to expose. Here, we’ve elected to make NewtypeInt provide Int’s instances for Eq, Ord, and Show. As a result, we can compare and print values of type NewtypeInt:
ghci> N 1 < N 2
True
Since we are not exposing Int’s Num or Integral instances, values of type NewtypeInt are not numbers. For instance, we can’t add them:
ghci> N 313 + N 37

<interactive>:1:0:
    No instance for (Num NewtypeInt)
      arising from a use of `+' at <interactive>:1:0-11
    Possible fix: add an instance declaration for (Num NewtypeInt)
    In the expression: N 313 + N 37
    In the definition of `it': it = N 313 + N 37
As with the data keyword, we can use a newtype’s value constructor to create a new value or to pattern match on an existing value.
If a newtype does not use automatic deriving to expose the underlying type’s implementation of a typeclass, we are free to either write a new instance or leave the typeclass unimplemented.
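Both points can be seen in a short sketch built on the UniqueID type. Eq is derived from Int, Show is written by hand with a rendering of our own choosing, and Num is left out entirely. The function nextID is a hypothetical helper that uses the constructor both to unwrap and to rewrap the value.

```haskell
newtype UniqueID = UniqueID Int
    deriving (Eq)

-- A hand-written instance instead of Int's: the "#" prefix is an
-- arbitrary choice for this illustration.
instance Show UniqueID where
    show (UniqueID n) = "#" ++ show n

-- Hypothetical helper: pattern match to reach the Int inside, then
-- apply the constructor again to build the new identifier.
nextID :: UniqueID -> UniqueID
nextID (UniqueID n) = UniqueID (n + 1)
```

In ghci, nextID (UniqueID 41) is displayed as #42, whereas UniqueID 1 + UniqueID 2 is rejected because no Num instance exists.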
End of content preview. The rest of this section is not viewable here.
JSON Typeclasses Without Overlapping Instances
Content preview
Enabling GHC’s support for overlapping instances is an effective and quick way to make our JSON code happy. In more complex cases, we will occasionally be faced with several equally good instances for some typeclass, in which case, overlapping instances will not help us and we will need to put some newtype declarations into place. To see what’s involved, let’s rework our JSON typeclass instances to use newtypes instead of overlapping instances.
Our first task, then, is to help the compiler to distinguish between [a], the representation we use for JSON arrays, and [(String, a)], which we use for objects. These were the types that gave us problems before we learned about OverlappingInstances. We wrap up the list type so that the compiler will not see it as a list:
-- file: ch06/JSONClass.hs
newtype JAry a = JAry {
      fromJAry :: [a]
    } deriving (Eq, Ord, Show)
When we export this type from our module, we’ll export the complete details of the type. Our module header will look like this:
-- file: ch06/JSONClassExport.hs
module JSONClass
    (
      JAry(..)
    ) where
The (..) following the JAry name means “export all details of this type.”
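With the wrapper in place, a JSON instance for it needs no language extensions, since JAry a is an ordinary type constructor applied to a type variable. The sketch below is one possible formulation, not necessarily the one the chapter goes on to give. It repeats the supporting definitions so that it stands alone, and it assumes a reasonably modern base library in which Either is a Monad, so that mapM can collect the element conversions and stop at the first failure.

```haskell
-- Assumed to mirror the JValue type from the SimpleJSON module.
data JValue = JString String
            | JNumber Double
            | JBool Bool
            | JNull
            | JObject [(String, JValue)]
            | JArray [JValue]
              deriving (Eq, Ord, Show)

type JSONError = String

class JSON a where
    toJValue :: a -> JValue
    fromJValue :: JValue -> Either JSONError a

instance JSON Bool where
    toJValue = JBool
    fromJValue (JBool b) = Right b
    fromJValue _ = Left "not a JSON boolean"

newtype JAry a = JAry {
      fromJAry :: [a]
    } deriving (Eq, Ord, Show)

-- Unwrap, convert each element, and wrap the results in JArray;
-- in the other direction, convert every element or report the
-- first error encountered.
instance (JSON a) => JSON (JAry a) where
    toJValue = JArray . map toJValue . fromJAry
    fromJValue (JArray vs) = fmap JAry (mapM fromJValue vs)
    fromJValue _ = Left "not a JSON array"
```

For instance, toJValue (JAry [True, False]) gives JArray [JBool True,JBool False].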
End of content preview. The rest of this section is not viewable here.
The Dreaded Monomorphism Restriction
Content preview
The Haskell 98 standard has a subtle feature that can sometimes bite us in unexpected circumstances. Here’s a simple function definition that illustrates the issue:
-- file: ch06/Monomorphism.hs
myShow = show
If we try to load this definition into ghci, it issues a peculiar complaint:
ghci> :load Monomorphism
[1 of 1] Compiling Main             ( Monomorphism.hs, interpreted )

Monomorphism.hs:2:9:
    Ambiguous type variable `a' in the constraint:
      `Show a' arising from a use of `show' at Monomorphism.hs:2:9-12
    Possible cause: the monomorphism restriction applied to the following:
      myShow :: a -> String (bound at Monomorphism.hs:2:0)
    Probable fix: give these definition(s) an explicit type signature
                  or use -fno-monomorphism-restriction
Failed, modules loaded: none.
The monomorphism restriction to which the error message refers is a part of the Haskell 98 standard. Monomorphism is simply the opposite of polymorphism: it indicates that an expression has exactly one type. The restriction lies in the fact that Haskell sometimes forces a declaration to be less polymorphic than we would expect.
We mention the monomorphism restriction here because although it isn’t specifically related to typeclasses, they usually provide the circumstances in which it crops up.
It’s possible that you will not run into the monomorphism restriction in real code for a long time. We don’t think you need to try to remember the details of this section. It should suffice to make a mental note of its existence, until eventually GHC complains with something such as the just shown error message. If that occurs, simply remember that you read about the error in this chapter, and come back for guidance.
We won’t attempt to explain the monomorphism restriction. The consensus within the Haskell community is that it doesn’t arise often, it is tricky to explain, and it provides almost no practical benefit. So, it mostly serves to trip people up. For an example of its trickiness, while the definition provided previously falls afoul of it, the following two compile without problems:
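A minimal sketch of two such definitions, following the fixes that the error message itself hints at. The names myShow2 and myShow3 are chosen here for illustration. The first avoids the restriction by naming its argument, so that it is a function binding; the second keeps the point-free form but supplies an explicit type signature.

```haskell
-- Written with an explicit argument, so the restriction does not apply.
myShow2 value = show value

-- Point-free like the failing definition, but with a type signature.
myShow3 :: (Show a) => a -> String
myShow3 = show
```

Both can be used at any type with a Show instance, for example myShow2 (1 :: Int) and myShow3 True.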
End of content preview. The rest of this section is not viewable here.
Conclusion
Content preview
In this chapter, you learned about the need for typeclasses and how to use them. We talked about defining our own typeclasses and then covered some of the important typeclasses that are defined in the Haskell library. Finally, we showed how to have the Haskell compiler automatically derive instances of certain typeclasses for your types.
End of content preview. The rest of this section is not viewable here.
Chapter 7: I/O
Content preview
It should be obvious that most, if not all, programs are devoted to gathering data from outside, processing it, and providing results back to the outside world. That is, input and output are key.
Haskell’s I/O system is powerful and expressive. It is easy to work with and important to understand. Haskell strictly separates pure code from code that could cause things to occur in the world. That is, it provides a complete isolation from side effects in pure code. Besides helping programmers to reason about the correctness of their code, it also permits compilers to automatically introduce optimizations and parallelism.
We’ll begin this chapter with simple, standard-looking I/O in Haskell. Then we’ll discuss some of the more powerful options, as well as provide more detail on how I/O fits into the pure, lazy, functional Haskell world.
End of content preview. The rest of this section is not viewable here.
Classic I/O in Haskell
Content preview
Let’s get started with I/O in Haskell by looking at a program that appears to be surprisingly similar to I/O in other languages such as C or Perl:
-- file: ch07/basicio.hs
main = do
       putStrLn "Greetings!  What is your name?"
       inpStr <- getLine
       putStrLn $ "Welcome to Haskell, " ++ inpStr ++ "!"
You can compile this program to a standalone executable, run it with runghc, or invoke main from within ghci. Here’s a sample session using runghc:
$ runghc basicio.hs
Greetings!  What is your name?
John
Welcome to Haskell, John!
That’s a fairly simple, obvious result. You can see that putStrLn writes out a String, followed by an end-of-line character. getLine reads a line from standard input. The <- syntax may be new to you. Put simply, that binds the result from executing an I/O action to a name. We use the simple list concatenation operator ++ to join the input string with our own text.
Let’s take a look at the types of putStrLn and getLine. You can find that information in the library reference, or just ask ghci:
ghci> :type putStrLn
putStrLn :: String -> IO ()
ghci> :type getLine
getLine :: IO String
Notice that both of these types have IO in their return value. That is your key to knowing that they may have side effects, or they may return different values even when called with the same arguments, or both. The type of putStrLn looks like a function. It takes a parameter of type String and returns a value of type IO (). Just what is an IO () though?
Anything of type IO something is an I/O action. You can store it and nothing will happen. We could say writefoo = putStrLn "foo" and nothing happens right then. But if we later use writefoo in the middle of another I/O action, the writefoo action will be executed when its parent action is executed; I/O actions can be glued together to form bigger I/O actions. The () is an empty tuple (pronounced "unit"), indicating that there is no return value from putStrLn. This is similar to void in Java or C.
Actions can be created, assigned, and passed anywhere. However, they may only be performed (executed) from within another I/O action.
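To make the point about stored actions concrete, here is a minimal sketch. Only writefoo comes from the text above; the names actions and demo are ours, chosen purely for illustration.

```haskell
-- Defining an action does not run it: writefoo is just a value of type IO ().
writefoo :: IO ()
writefoo = putStrLn "foo"

-- Actions can be stored in ordinary data structures such as lists.
actions :: [IO ()]
actions = [writefoo, putStrLn "bar", writefoo]

-- Gluing actions together gives a bigger action. Nothing is printed until
-- demo itself is performed, for example by typing demo at the ghci prompt.
demo :: IO ()
demo = do
    putStrLn "Running the stored actions:"
    sequence_ actions
```

Performing demo prints the header line and then foo, bar, and foo, in that order. Loading the file alone prints nothing.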
End of preview. The rest of this section is not shown here.
Working with Files and Handles
So far, you’ve seen how to interact with the user at the computer’s terminal. Of course, you’ll often need to manipulate specific files. That’s easy to do, too.
Haskell defines quite a few basic functions for I/O, many of which are similar to functions seen in other programming languages. The library reference for System.IO provides a good summary of all the basic I/O functions, should you need one that we aren’t touching upon here.
You will generally begin by using openFile, which will give you a file Handle. That Handle is then used to perform specific operations on the file. Haskell provides functions such as hPutStrLn that work just like putStrLn but take an additional argument, a Handle, that specifies which file to operate upon. When you’re done, you’ll use hClose to close the Handle. These functions are all defined in System.IO, so you’ll need to import that module when working with files. There are “h” functions corresponding to virtually all of the non-“h” functions; for instance, there is print for printing to the screen and hPrint for printing to a file.
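As a small illustration of the “h” naming convention, the following sketch writes two lines to a file through a Handle. The function name writeDemo and whatever path you pass to it are our own assumptions, not part of the chapter’s examples.

```haskell
import System.IO

-- Write two lines to the file at the given path using the "h" functions.
-- hPutStrLn and hPrint mirror putStrLn and print, but take a Handle first.
writeDemo :: FilePath -> IO ()
writeDemo path = do
    h <- openFile path WriteMode
    hPutStrLn h "written with hPutStrLn"
    hPrint h (42 :: Int)
    hClose h
```

After running writeDemo on some path, the file holds the text line followed by a line containing 42.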
Let’s start with an imperative way to read and write files. This should seem similar to a while loop that you may find in other languages. This isn’t the best way to write it in Haskell; later, you’ll see examples of more Haskellish approaches.
-- file: ch07/toupper-imp.hs
import System.IO
import Data.Char(toUpper)

main :: IO ()
main = do
       inh <- openFile "input.txt" ReadMode
       outh <- openFile "output.txt" WriteMode
       mainloop inh outh
       hClose inh
       hClose outh

mainloop :: Handle -> Handle -> IO ()
mainloop inh outh =
    do ineof <- hIsEOF inh
       if ineof
           then return ()
           else do inpStr <- hGetLine inh
                   hPutStrLn outh (map toUpper inpStr)
                   mainloop inh outh
Like every Haskell program, execution of this program begins with main. Two files are opened: input.txt is opened for reading, and output.txt is opened for writing. Then we call
End of preview. The rest of this section is not shown here.
Extended Example: Functional I/O and Temporary Files
Here’s a larger example that puts together some concepts from this chapter, from some earlier chapters, and a few you haven’t seen yet. Take a look at the program and see if you can figure out what it does and how it works:
-- file: ch07/tempfile.hs
import System.IO
import System.Directory(getTemporaryDirectory, removeFile)
-- Newer versions of base no longer export catch from System.IO.Error;
-- there, catchIOError from the same module has the same type.
import System.IO.Error(catch)
import Control.Exception(finally)

-- The main entry point.  Work with a temp file in myAction.
main :: IO ()
main = withTempFile "mytemp.txt" myAction

{- The guts of the program.  Called with the path and handle of a temporary
   file.  When this function exits, that file will be closed and deleted
   because myAction was called from withTempFile. -}
myAction :: FilePath -> Handle -> IO ()
myAction tempname temph =
    do -- Start by displaying a greeting on the terminal
       putStrLn "Welcome to tempfile.hs"
       putStrLn $ "I have a temporary file at " ++ tempname

       -- Let's see what the initial position is
       pos <- hTell temph
       putStrLn $ "My initial position is " ++ show pos

       -- Now, write some data to the temporary file
       let tempdata = show [1..10]
       putStrLn $ "Writing one line containing " ++
                  show (length tempdata) ++ " bytes: " ++
                  tempdata
       hPutStrLn temph tempdata

       -- Get our new position.  This doesn't actually modify pos
       -- in memory, but makes the name "pos" correspond to a different
       -- value for the remainder of the "do" block.
       pos <- hTell temph
       putStrLn $ "After writing, my new position is " ++ show pos

       -- Seek to the beginning of the file and display it
       putStrLn $ "The file content is: "
       hSeek temph AbsoluteSeek 0

       -- hGetContents performs a lazy read of the entire file
       c <- hGetContents temph

       -- Copy the file byte-for-byte to stdout, followed by \n
       putStrLn c

       -- Let's also display it as a Haskell literal
       putStrLn $ "Which could be expressed as this Haskell literal:"
       print c

{- This function takes two parameters: a filename pattern and another
   function.  It will create a temporary file, and pass the name and Handle
   of that file to the given function.

   The temporary file is created with openTempFile.  The directory is the one
   indicated by getTemporaryDirectory, or, if the system has no notion of
   a temporary directory, "." is used.  The given pattern is passed to
   openTempFile.

   After the given function terminates, even if it terminates due to an
   exception, the Handle is closed and the file is deleted. -}
withTempFile :: String -> (FilePath -> Handle -> IO a) -> IO a
withTempFile pattern func =
    do -- The library ref says that getTemporaryDirectory may raise an
       -- exception on systems that have no notion of a temporary directory.
       -- So, we run getTemporaryDirectory under catch.  catch takes
       -- two functions: one to run, and a different one to run if the
       -- first raised an exception.  If getTemporaryDirectory raised an
       -- exception, just use "." (the current working directory).
       tempdir <- catch (getTemporaryDirectory) (\_ -> return ".")
       (tempfile, temph) <- openTempFile tempdir pattern

       -- Call (func tempfile temph) to perform the action on the temporary
       -- file.  finally takes two actions.  The first is the action to run.
       -- The second is an action to run after the first, regardless of
       -- whether the first action raised an exception.  This way, we ensure
       -- the temporary file is always deleted.  The return value from finally
       -- is the first action's return value.
       finally (func tempfile temph)
               (do hClose temph
                   removeFile tempfile)
End of preview. The rest of this section is not shown here.
Lazy I/O
So far in this chapter, you’ve seen examples of fairly traditional I/O. Each line, or block of data, is requested and processed individually.
Haskell has another approach available to you as well. Since Haskell is a lazy language, meaning that any given piece of data is only evaluated when its value must be known, there are some novel ways of approaching I/O.
One novel way to approach I/O is with the hGetContents function. hGetContents has the type Handle -> IO String. The String it returns represents all of the data in the file given by the Handle.
In a strictly evaluated language, using such a function is often a bad idea. It may be fine to read the entire contents of a 2 KB file, but if you try to read the entire contents of a 500 GB file, you are likely to crash due to lack of RAM to store all that data. In these languages, you would traditionally use mechanisms such as loops to process the file’s entire data.
But hGetContents is different. The String it returns is evaluated lazily. At the moment you call hGetContents, nothing is actually read. Data is only read from the Handle as the elements (characters) of the list are processed. As elements of the String are no longer used, Haskell’s garbage collector automatically frees that memory. All of this happens completely transparently to you. And since you have what looks like (and, really, is) a pure String, you can pass it to pure (non-IO) code.
Let’s take a quick look at an example. Back in “Working with Files and Handles,” you saw an imperative program that converted the entire content of a file to uppercase. Its imperative algorithm was similar to what you’d see in many other languages. Here now is the much simpler algorithm that exploits lazy evaluation:
-- file: ch07/toupper-lazy1.hs
import System.IO
import Data.Char(toUpper)

main :: IO ()
main = do
       inh <- openFile "input.txt" ReadMode
       outh <- openFile "output.txt" WriteMode
       inpStr <- hGetContents inh
       let result = processData inpStr
       hPutStr outh result
       hClose inh
       hClose outh

processData :: String -> String
processData = map toUpper
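Because processData is an ordinary pure function, the laziness described above can be observed without touching a file at all. The sketch below feeds it an endless String and demands only a short prefix. The names endlessInput and firstFive are ours, used only for this illustration.

```haskell
import Data.Char (toUpper)

-- The same pure function as in toupper-lazy1.hs.
processData :: String -> String
processData = map toUpper

-- An endless input, standing in for a very large file.
endlessInput :: String
endlessInput = cycle "ab"

-- Only five characters are ever demanded, so only five are computed.
firstFive :: String
firstFive = take 5 (processData endlessInput)
```

Evaluating firstFive terminates and gives "ABABA", even though the input never ends. The same demand-driven behavior is what lets a String from hGetContents be consumed piece by piece.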
End of preview. The rest of this section is not shown here.
The IO Monad
You’ve seen a number of examples of I/O in Haskell by this point. Let’s take a moment to step back and think about how I/O relates to the broader Haskell language.
Since Haskell is a pure language, if you give a certain function a specific argument, the function will return the same result every time you give it that argument. Moreover, the function will not change anything about the program’s overall state.
You may be wondering, then, how I/O fits into this picture. Surely if you want to read a line of input from the keyboard, the function to read input can’t possibly return the same result every time it is run, right? Moreover, I/O is all about changing state. I/O could cause pixels on a terminal to light up, cause paper to start coming out of a printer, or even cause a package to be shipped from a warehouse on a different continent. I/O doesn’t just change the state of a program. You can think of I/O as changing the state of the world.
Most languages do not make a distinction between a pure function and an impure one. Haskell has functions in the mathematical sense: they are purely computations that cannot be altered by anything external. Moreover, the computation can be performed at any time—or even never, if its result is never needed.
Clearly, then, we need some other tool to work with I/O. That tool in Haskell is called actions. Actions resemble functions. They do nothing when they are defined, but perform some task when they are invoked. I/O actions are defined within the IO monad. Monads are a powerful way of chaining functions together purely and are covered later in the book. It’s not necessary to understand monads in order to understand I/O. Just understand that the result type of actions is “tagged” with IO. Let’s take a look at some types:
ghci> :type putStrLn
putStrLn :: String -> IO ()
ghci> :type getLine
getLine :: IO String
The type of putStrLn is just like any other function. The function takes one parameter and returns an IO (). This IO () is the action. You can store and pass actions in pure code if you wish, though this isn’t frequently done. An action doesn’t do anything until it is invoked. Let’s look at an example of this:
End of preview. The rest of this section is not shown here.
Is Haskell Really Imperative?
These do blocks may look a lot like an imperative language. After all, you’re giving commands to run in sequence most of the time.
But Haskell remains a lazy language at its core. While it is sometimes necessary to sequence actions for I/O, this is done using tools that are part of Haskell already. Haskell achieves a nice separation of I/O from the rest of the language through the IO monad as well.
End of preview. The rest of this section is not shown here.
Side Effects with Lazy I/O
Earlier in this chapter, you read about hGetContents. We explained that the String it returns can be used in pure code.
We need to get a bit more specific about what side effects are. When we say Haskell has no side effects, what exactly does that mean?
At a certain level, side effects are always possible. A poorly written loop, even if written in pure code, could cause the system’s RAM to be exhausted and the machine to crash. Or it could cause data to be swapped to disk.
When we speak of no side effects, we mean that pure code in Haskell can’t run commands that trigger side effects. Pure functions can’t modify a global variable, request I/O, or run a command to take down a system.
When you have a String from hGetContents that is passed to a pure function, the function has no idea that this String is backed by a disk file. It will behave just as it always would, but processing that String may cause the environment to issue I/O commands. The pure function isn’t issuing them; they are happening as a result of the processing the pure function is doing, just as with the example of swapping RAM to disk.
In some cases, you may need more control over exactly when your I/O occurs. Perhaps you are reading data interactively from the user, or via a pipe from another program, and need to communicate directly with the user. In those cases, hGetContents will probably not be appropriate.
End of preview. The rest of this section is not shown here.
Buffering
The I/O subsystem is one of the slowest parts of a modern computer. Completing a write to disk can take thousands of times as long as a write to memory. A write over the network can be hundreds or thousands of times slower yet. Even if your operation doesn’t directly communicate with the disk (perhaps because the data is cached), it still involves a system call, which slows things down by itself.
For this reason, modern operating systems and programming languages both provide tools to help programs perform better where I/O is concerned. The operating system typically performs caching—storing frequently used pieces of data in memory for faster access.
Programming languages typically perform buffering. This means that they may request one large chunk of data from the operating system, even if the code underneath is processing data one character at a time. By doing this, they can achieve remarkable performance gains because each request for I/O to the operating system carries a processing cost. Buffering allows us to read the same amount of data with far fewer I/O requests.
Haskell, too, provides buffering in its I/O system. In many cases, it is even on by default. Up until now, we have pretended it isn’t there. Haskell is usually good at picking a sensible default buffering mode, but the default is rarely the fastest. If you have speed-critical code, changing buffering could have a significant impact on your program.
There are three different buffering modes in Haskell. They are defined as the BufferMode type: NoBuffering, LineBuffering, and BlockBuffering.
NoBuffering does just what it sounds like—no buffering. Data read via functions like hGetLine will be read from the OS one character at a time. Data written will be written immediately, and also often will be written one character at a time. For this reason, NoBuffering is usually a very poor performer and not suitable for general-purpose use.
LineBuffering causes the output buffer to be written whenever the newline character is output, or whenever it gets too large. On input, it will usually attempt to read whatever data is available in chunks until it first sees the newline character. When reading from the terminal, it should return data immediately after each press of Enter. It is often a reasonable default.
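As a hedged sketch of how a buffering mode can be inspected and changed, the code below uses hSetBuffering and hGetBuffering from System.IO. Those two functions are not mentioned in the excerpt above, and the helper names useLineBuffering and describe are our own.

```haskell
import System.IO

-- Switch stdout to line buffering and report the mode that is now in effect.
useLineBuffering :: IO BufferMode
useLineBuffering = do
    hSetBuffering stdout LineBuffering
    hGetBuffering stdout

-- Describe a buffering mode in words. BlockBuffering carries an optional
-- buffer size; Nothing leaves the size up to the implementation.
describe :: BufferMode -> String
describe NoBuffering               = "no buffering"
describe LineBuffering             = "line buffering"
describe (BlockBuffering Nothing)  = "block buffering, default size"
describe (BlockBuffering (Just n)) = "block buffering, " ++ show n ++ " bytes"
```

The same pair of functions works on any Handle, so a file opened with openFile can be given its own mode independently of stdout.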
End of preview. The rest of this section is not shown here.
Reading Command-Line Arguments
Many command-line programs are interested in the parameters passed on the command line. System.Environment.getArgs has type IO [String]; performing it yields a list of the arguments. This is the same as argv in C, starting with argv[1]. The program name (argv[0] in C) is available from System.Environment.getProgName.
The System.Console.GetOpt module provides some tools for parsing command-line options. If you have a program with complex options, you may find it useful. An example of its use appears elsewhere in the book.
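A minimal sketch of getArgs and getProgName in use follows. The helper summarize and the action reportArgs are hypothetical names introduced here. Keeping the formatting logic pure makes it easy to try out in ghci without real command-line arguments.

```haskell
import System.Environment (getArgs, getProgName)

-- Pure helper (hypothetical): turn the program name and its arguments
-- into the line we want to print.
summarize :: String -> [String] -> String
summarize prog []   = "usage: " ++ prog ++ " ARG..."
summarize prog args = prog ++ " received " ++ show (length args)
                      ++ " argument(s): " ++ unwords args

-- Fetch the real values and print the summary.
reportArgs :: IO ()
reportArgs = do
    args <- getArgs
    prog <- getProgName
    putStrLn (summarize prog args)
```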
End of preview. The rest of this section is not shown here.
Environment Variables
If you need to read environment variables, you can use one of two functions in System.Environment: getEnv or getEnvironment. getEnv looks for a specific variable and raises an exception if it doesn’t exist. getEnvironment returns the whole environment as a [(String, String)], and then you can use functions such as lookup to find the environment entry you want.
Setting environment variables is not defined in a cross-platform way in Haskell. If you are on a POSIX platform such as Linux, you can use putEnv or setEnv from the System.Posix.Env module. Environment setting is not defined for Windows.
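The getEnvironment-plus-lookup approach can be sketched as follows. The function name envOrDefault is our own; it falls back to a default value rather than raising an exception, which is one reasonable design among several.

```haskell
import System.Environment (getEnvironment)

-- Look up a variable in the full environment, falling back to a default
-- instead of raising an exception when it is missing.
envOrDefault :: String -> String -> IO String
envOrDefault def name = do
    env <- getEnvironment
    return (maybe def id (lookup name env))
```

For instance, envOrDefault "/tmp" "TMPDIR" yields the value of TMPDIR if it is set, and "/tmp" otherwise.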
End of preview. The rest of this section is not shown here.
Chapter 8: Efficient File Processing, Regular Expressions, and Filename Matching
Efficient File Processing
This simple microbenchmark reads a text file full of numbers and prints their sum:
-- file: ch08/SumFile.hs
main = do
    contents <- getContents
    print (sumFile contents)
  where sumFile = sum . map read . words
Although the String type is the default used for reading and writing files, it is not efficient, so a simple program like this will perform badly.
A String is represented as a list of Char values; each element of a list is allocated and has some bookkeeping overhead. These factors affect the memory consumption and performance of a program that must read or write text or binary data. On simple benchmarks like this, even programs written in interpreted languages such as Python can outperform Haskell code that uses String by an order of magnitude.
The bytestring library provides a fast, cheap alternative to the String type. Code written with bytestring can often match or exceed the performance and memory footprint of C, while maintaining Haskell’s expressivity and conciseness.
The library supplies two modules, each of which defines functions that are nearly drop-in replacements for their String counterparts:
Data.ByteString
Defines a strict type named ByteString. This represents a string of binary or text data in a single array.
Data.ByteString.Lazy
Provides a lazy type, also named ByteString. This represents a string of data as a list of chunks, arrays of up to 64 KB in size.
Each ByteString type performs better under particular circumstances. For streaming a large quantity (hundreds of megabytes to terabytes) of data, the lazy ByteString type is usually best. Its chunk size is tuned to be friendly to a modern CPU’s L1 cache, and a garbage collector can quickly discard chunks of streamed data that are no longer being used.
The strict ByteString type performs best for applications that are less concerned with memory footprint or that need to access data randomly.
Let’s develop a small function to illustrate some of the API. We will determine if a file is an ELF object file—this is the format used for executables on almost all modern Unix-like systems.
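The preview stops before the book’s own code, so what follows is only a sketch of how such a check might look with the lazy ByteString API. The names elfMagic, hasElfMagic, and isElfFile are our assumptions. The one external fact relied on is that an ELF file begins with the byte 0x7f followed by the ASCII letters E, L, F.

```haskell
import qualified Data.ByteString.Lazy as L

-- The first four bytes of an ELF file: 0x7f followed by "ELF" in ASCII.
elfMagic :: L.ByteString
elfMagic = L.pack [0x7f, 0x45, 0x4c, 0x46]

-- Pure check on content that has already been obtained.
hasElfMagic :: L.ByteString -> Bool
hasElfMagic content = L.take 4 content == elfMagic

-- Read a file lazily and apply the check; only the start of the file
-- needs to be examined.
isElfFile :: FilePath -> IO Bool
isElfFile path = do
    content <- L.readFile path
    return (hasElfMagic content)
```

The qualified import keeps the ByteString versions of take, pack, and readFile from clashing with their Prelude namesakes.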
End of preview. The rest of this section is not shown here.
Filename Matching
Many systems-oriented programming languages provide library routines that let us match a filename against a pattern, or that will give a list of files that match the pattern. (In other languages, this function is often named fnmatch.) Although Haskell’s standard library generally has good systems programming facilities, it doesn’t provide these kinds of pattern matching functions. We’ll take this as an opportunity to develop our own.
The kinds of patterns we’ll be dealing with are commonly referred to as glob patterns (the term we’ll use), wild card patterns, or shell-style patterns. They have just a few simple rules. You probably already know them, but we’ll quickly recap here:
  • Matching a string against a pattern starts at the beginning of the string, and finishes at the end.
  • Most literal characters match themselves. For example, the text foo in a pattern will match foo, and only foo, in an input string.
  • The * (asterisk) character means "match anything"; it will match any text, including the empty string. For instance, the pattern foo* will match any string that begins with foo, such as foo itself, foobar, or foo.c. The pattern quux*.c will match any string that begins with quux and ends in .c, such as quuxbaz.c.
  • The ? (question mark) character matches any single character. The pattern pic??.jpg will match names like picaa.jpg or pic01.jpg.
  • A [ (open square bracket) character begins a character class, which is ended by a ]. Its meaning is "match any character in this class". A character class can be negated by following the opening [ with a !, so that it means "match any character not in this class".
    As a shorthand, a character followed by a - (dash), followed by another character, denotes a range: “match any character within this set.”
    Character classes have an added subtlety; they can’t be empty. The first character after the opening [ or [! is part of the class, so we can write a class containing the ] character as
End of preview. The rest of this section is not shown here.
Regular Expressions in Haskell
In this section, we assume that you are already familiar with regular expressions by way of some other language, such as Python, Perl, or Java.
For brevity, we will abbreviate "regular expression" as regexp from here on.
Rather than introduce regexps as something new, we will focus on what’s different about regexp handling in Haskell, compared to other languages. Haskell’s regular expression matching libraries are a lot more expressive than those of other languages, so there’s plenty to talk about.
To begin our exploration of the regexp libraries, the only module we’ll need to work with is Text.Regex.Posix. As usual, the most convenient way to explore this module is by interacting with it via ghci:
        

ghci> :module +Text.Regex.Posix
The only function that we’re likely to need for normal use is the regexp matching function, an infix operator named (=~) (borrowed from Perl). The first hurdle to overcome is that Haskell’s regexp libraries make heavy use of polymorphism. As a result, the type signature of the (=~) operator is difficult to understand, so we will not explain it here.
The =~ operator uses typeclasses for both of its arguments and also for its return type. The first argument (on the left of the =~) is the text to match; the second (on the right) is the regular expression to match against. We can pass either a String or a ByteString as argument.
The =~ operator is polymorphic in its return type, so the Haskell compiler needs some way to know what type of result we would like. In real code, it may be able to infer the right type, due to the way we subsequently use the result. But such cues are often lacking when we’re exploring with ghci. If we omit a specific type for the result, we’ll get an error from the interpreter, as it does not have enough information to successfully infer the result type.
When ghci can’t infer the target type, we tell it what we’d like the type to be. If we want a result of type Bool, we’ll get a pass/fail answer:
End of preview. The rest of this section is not shown here.
More About Regular Expressions
As we noted earlier, the =~ operator uses typeclasses for its argument types and its return type. We can use either String or strict ByteString values for both the regular expression and the text to match against:
ghci> :module +Data.ByteString.Char8
ghci> :type pack "foo"
pack "foo" :: ByteString
We can then try using different combinations of String and ByteString:
ghci> pack "foo" =~ "bar" :: Bool
False
ghci> "foo" =~ pack "bar" :: Int
0
ghci> pack "foo" =~ pack "o" :: [(Int, Int)]
[(1,1),(2,1)]
However, we need to be aware that if we want a string value in the result of a match, the text we’re matching against must be the same type of string. Let’s see what this means in practice:
ghci> pack "good food" =~ ".ood" :: [ByteString]
["good","food"]
In the above example, we’ve used pack to turn a String into a ByteString. The type checker accepts this because ByteString appears in the result type. But if the text we match against is a String and we ask for a ByteString result, that won’t work:
ghci> "good food" =~ ".ood" :: [ByteString]

<interactive>:1:0:
    No instance for (Text.Regex.Base.RegexLike.RegexContext
                       Regex [Char] [ByteString])
      arising from a use of `=~' at <interactive>:1:0-20
    Possible fix:
      add an instance declaration for
      (Text.Regex.Base.RegexLike.RegexContext Regex [Char] [ByteString])
    In the expression: "good food" =~ ".ood" :: [ByteString]
    In the definition of `it':
        it = "good food" =~ ".ood" :: [ByteString]
We can easily fix this problem by making the string types of the lefthand side and the result match once again:
ghci> "good food" =~ ".ood" :: [String]
["good","food"]
This restriction does not apply to the type of the regexp we’re matching against. It can be either a String or ByteString, unconstrained by the other types in use.
When you look through Haskell library documentation, you’ll see several regexp-related modules. The modules under Text.Regex.Base define the common API adhered to by all of the other regexp modules. It’s possible to have multiple implementations of the regexp API installed at one time. At the time of this writing,
End of preview. The rest of this section is not shown here.
Translating a glob Pattern into a Regular Expression
Now that we’ve seen the myriad of ways to match text against regular expressions, let’s turn our attention back to glob patterns. We want to write a function that will take a glob pattern and return its representation as a regular expression. Both glob patterns and regexps are text strings, so the type that our function ought to have seems clear:
-- file: ch08/GlobRegex.hs
module GlobRegex
    (
      globToRegex
    , matchesGlob
    ) where

import Text.Regex.Posix ((=~))

globToRegex :: String -> String
The regular expression that we generate must be anchored so that it starts matching from the beginning of a string and finishes at the end:
-- file: ch08/GlobRegex.hs
globToRegex cs = '^' : globToRegex' cs ++ "$"
Recall that String is just a synonym for [Char], a list of Chars. The : operator puts a value (the ^ character in this case) onto the front of a list, where the list is the value returned by the yet-to-be-seen globToRegex' function.
Haskell does not require that a value or function be declared or defined in a source file before it’s used. It’s perfectly normal for a definition to come after the first place it’s used. The Haskell compiler doesn’t care about ordering at this level. This grants us the flexibility to structure our code in the manner that makes most logical sense to us, rather than follow an order that makes the compiler writer’s life easiest.
Haskell module writers often use this flexibility to put "more important" code earlier in a source file, relegating "plumbing" to later. This is exactly how we are presenting the globToRegex function and its helpers here.
With the regular expression rooted, the globToRegex' function will do the bulk of the translation work. We’ll use the convenience of Haskell’s pattern matching to enumerate each of the cases we’ll need to cover:
-- file: ch08/GlobRegex.hs
globToRegex' :: String -> String
globToRegex' "" = ""

globToRegex' ('*':cs) = ".*" ++ globToRegex' cs

globToRegex' ('?':cs) = '.' : globToRegex' cs

globToRegex' ('[':'!':c:cs) = "[^" ++ c : charClass cs
globToRegex' ('[':c:cs)     = '['  :  c : charClass cs
globToRegex' ('[':_)        = error "unterminated character class"

globToRegex' (c:cs) = escape c ++ globToRegex' cs
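The clauses above call two helpers, escape and charClass, whose definitions fall outside this preview. To experiment with the translation, one possible way to fill them in is sketched below. Both helper bodies are our assumptions, not necessarily the book’s definitions, and the module header and regexp import are left out so that the sketch depends only on the Prelude.

```haskell
-- Anchor the translated pattern at both ends.
globToRegex :: String -> String
globToRegex cs = '^' : globToRegex' cs ++ "$"

globToRegex' :: String -> String
globToRegex' ""             = ""
globToRegex' ('*':cs)       = ".*" ++ globToRegex' cs
globToRegex' ('?':cs)       = '.' : globToRegex' cs
globToRegex' ('[':'!':c:cs) = "[^" ++ c : charClass cs
globToRegex' ('[':c:cs)     = '['  :  c : charClass cs
globToRegex' ('[':_)        = error "unterminated character class"
globToRegex' (c:cs)         = escape c ++ globToRegex' cs

-- Assumed helper: backslash-escape characters that are special in regexps.
escape :: Char -> String
escape c
    | c `elem` regexChars = ['\\', c]
    | otherwise           = [c]
  where regexChars = "\\+()^$.{}]|"

-- Assumed helper: copy a character class through to its closing bracket,
-- then resume normal translation.
charClass :: String -> String
charClass (']':cs) = ']' : globToRegex' cs
charClass (c:cs)   = c : charClass cs
charClass []       = error "unterminated character class"
```

With these definitions, globToRegex "f??.c" produces the regexp ^f..\.c$, and globToRegex "[!a-c]x" produces ^[^a-c]x$.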
End of preview. The rest of this section is not shown here.
An Important Aside: Writing Lazy Functions
In an imperative language, the globToRegex' function is one that we’d usually express as a loop. For example, Python’s standard fnmatch module includes a function named translate that does exactly the same job as our globToRegex function. It’s written as a loop.
If you’ve been exposed to functional programming through a language such as Scheme or ML, you’ve probably had drilled into your head the notion that “the way to emulate a loop is via tail recursion.”
Looking at the globToRegex' function, we can see that it is not tail recursive. To see why, examine its final clause again (several of its other clauses are structured similarly):
-- file: ch08/GlobRegex.hs
globToRegex' (c:cs) = escape c ++ globToRegex' cs
It applies itself recursively, and the result of the recursive application is used as a parameter to the (++) function. Since the recursive application isn’t the last thing the function does, globToRegex' is not tail recursive.
Why is our definition of this function not tail recursive? The answer lies with Haskell’s nonstrict evaluation strategy. Before we start talking about that, let’s quickly talk about why, in a traditional language, we’d try to avoid this kind of recursive definition. Here is a simpler definition of the (++) operator. It is recursive, but not tail recursive:
-- file: ch08/append.hs
(++) :: [a] -> [a] -> [a]

(x:xs) ++ ys = x : (xs ++ ys)
[]     ++ ys = ys
In a strict language, if we evaluate "foo" ++ "bar", the entire list is constructed, and then returned. Non-strict evaluation defers much of the work until it is needed.
If we demand an element of the expression "foo" ++ "bar", the first pattern of the function’s definition matches, and we return the expression x : (xs ++ ys). Because the (:) constructor is nonstrict, the evaluation of xs ++ ys can be deferred: we generate more elements of the result at whatever rate they are demanded. When we generate more of the result, we will no longer be using x, so the garbage collector can reclaim it. Since we generate elements of the result on demand, and do not hold onto parts that we are done with, the compiler can evaluate our code in constant space.
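The deferred evaluation described above can be checked directly. In this sketch the definition is repeated under the ordinary name append (our choice, to avoid clashing with the Prelude’s operator), and firstThree is likewise a name introduced only for illustration.

```haskell
-- The same definition as above, under an ordinary name so that it does not
-- clash with the Prelude's (++).
append :: [a] -> [a] -> [a]
append (x:xs) ys = x : append xs ys
append []     ys = ys

-- Demanding only the first three elements never reaches the end of the
-- infinite left-hand list, and never touches the right-hand list at all.
firstThree :: [Int]
firstThree = take 3 (append [1 ..] undefined)
```

Evaluating firstThree gives [1,2,3]. In a strict language the equivalent call would either loop forever on the infinite list or fail on the undefined argument.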
End of preview. The rest of this section is not available here.
Making Use of Our Pattern Matcher
Preview
It’s all very well to have a function that can match glob patterns, but we’d like to be able to put this to practical use. On Unix-like systems, the glob function returns the names of all files and directories that match a given glob pattern. Let’s build a similar function in Haskell. Following the Haskell norm of descriptive naming, we’ll call our function namesMatching:
-- file: ch08/Glob.hs
module Glob (namesMatching) where
We specify that namesMatching is the only name that users of our module will be able to see.
This function will obviously have to manipulate filesystem paths a lot, splicing and joining them as it goes. We’ll need to use a few previously unfamiliar modules along the way.
The System.Directory module provides standard functions for working with directories and their contents:
-- file: ch08/Glob.hs
import System.Directory (doesDirectoryExist, doesFileExist,
                         getCurrentDirectory, getDirectoryContents)
The System.FilePath module abstracts the details of an operating system’s path name conventions. The (</>) function joins two path components:
ghci> :m +System.FilePath
ghci> "foo" </> "bar"
Loading package filepath-1.1.0.0 ... linking ... done.
"foo/bar"
The name of the dropTrailingPathSeparator function is perfectly descriptive:
ghci> dropTrailingPathSeparator "foo/"
"foo"
The splitFileName function splits a path at the last slash:
ghci> splitFileName "foo/bar/Quux.hs"
("foo/bar/","Quux.hs")
ghci> splitFileName "zippity"
("","zippity")
Using System.FilePath together with the System.Directory module, we can write a portable namesMatching function that will run on both Unix-like and Windows systems:
-- file: ch08/Glob.hs
import System.FilePath (dropTrailingPathSeparator, splitFileName, (</>))
In this module, we’ll be emulating a "for" loop; getting our first taste of exception handling in Haskell; and of course using the matchesGlob function we just wrote:
-- file: ch08/Glob.hs
import Control.Exception (handle)
import Control.Monad (forM)
import GlobRegex (matchesGlob)
Since directories and files live in the
End of preview. The rest of this section is not available here.
Handling Errors Through API Design
Preview
It’s not necessarily a disaster if our globToRegex is passed a malformed pattern. Perhaps a user mistyped a pattern, in which case, we’d like to be able to report a meaningful error message.
Calling the error function when this kind of problem occurs can be a drastic response (we explored its consequences earlier). The error function throws an exception. Pure Haskell code cannot deal with exceptions, so control is going to rocket out of our pure code into the nearest caller that lives in IO and has an appropriate exception handler installed. If no such handler is installed, the Haskell runtime will default to terminating our program (or printing a nasty error message, in ghci).
So calling error is a little like pulling the handle of a fighter plane’s ejection seat. We’re bailing out of a catastrophic situation that we can’t deal with gracefully, and there’s likely to be a lot of flaming wreckage strewn about by the time we hit the ground.
We’ve established that error is for disasters, but we’re still using it in globToRegex. In that case, malformed input should be rejected, but not turned into a big deal. What would be a better way to handle this?
Haskell’s type system and libraries to the rescue! We can encode the possibility of failure in the type signature of globToRegex using the predefined Either type:
-- file: ch08/GlobRegexEither.hs
type GlobError = String

globToRegex :: String -> Either GlobError String
A value returned by globToRegex will now be either Left "an error message" or Right "a valid regexp". This return type forces our callers to deal with the possibility of error. (You’ll find that this use of the Either type occurs frequently in Haskell code.)
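To illustrate how such a return type shapes the calling code, here is a minimal sketch. The checkPattern function is a hypothetical stand-in for the real translator: it only checks that every '[' has a closing ']' and otherwise hands the pattern back unchanged. The describe function is likewise an assumption made for the example.

```haskell
type GlobError = String

-- Hypothetical stand-in for the translator: it validates character
-- classes and returns the pattern itself on success.
checkPattern :: String -> Either GlobError String
checkPattern pat = go pat
  where
    go []       = Right pat
    go ('[':cs) = case dropWhile (/= ']') cs of
                    []       -> Left "unterminated character class"
                    (_:rest) -> go rest
    go (_:cs)   = go cs

-- The caller cannot reach the result without deciding what to do
-- in the Left case; here 'either' handles both alternatives.
describe :: String -> String
describe pat = either ("bad pattern: " ++) ("ok: " ++) (checkPattern pat)
```

A caller that prefers pattern matching can use a case expression on Left and Right instead of either; the point is that the failure case is visible in the type rather than hidden in an exception.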
End of preview. The rest of this section is not available here.
Putting Our Code to Work
Preview
The namesMatching function isn’t very exciting by itself, but it’s a useful building block. Combine it with a few more functions, and we can start to do interesting things.
Here’s one such example. Let’s define a renameWith function that, instead of simply renaming a file, applies a function to the file’s name, and renames the file to whatever that function returns:
-- file: ch08/Useful.hs
import System.FilePath (replaceExtension)
import System.Directory (doesFileExist, renameDirectory, renameFile)
import Glob (namesMatching)

renameWith :: (FilePath -> FilePath)
           -> FilePath
           -> IO FilePath

renameWith f path = do
    let path' = f path
    rename path path'
    return path'
Once again, we work around the ungainly file/directory split in System.Directory with a helper function:
-- file: ch08/Useful.hs
rename :: FilePath -> FilePath -> IO ()

rename old new = do
    isFile <- doesFileExist old
    let f = if isFile then renameFile else renameDirectory
    f old new
The System.FilePath module provides many useful functions for manipulating filenames. These functions mesh nicely with our renameWith and namesMatching functions, so that we can quickly use them to create functions with complex behavior. As an example, this terse function changes the filename suffixing convention for C++ source files:
-- file: ch08/Useful.hs
cc2cpp =
  mapM (renameWith (flip replaceExtension ".cpp")) =<< namesMatching "*.cc"
The cc2cpp function uses a few functions we’ll see over and over. The flip function takes another function as argument and swaps the order of its arguments (inspect the type of replaceExtension in ghci to see why). The =<< function feeds the result of the action on its right side to the action on its left.
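The two helpers can be tried out without touching the filesystem. The following sketch uses hypothetical names (toCpp, renamed, demo) and plain lists of paths in place of the results of namesMatching:

```haskell
import System.FilePath (replaceExtension)

-- replaceExtension takes the path first and the new extension second,
-- so flip lets us fix the extension and leave the path open.
toCpp :: FilePath -> FilePath
toCpp = flip replaceExtension ".cpp"

renamed :: [FilePath]
renamed = map toCpp ["main.cc", "src/util.cc"]

-- (=<<) runs the action on its right and passes the result to the
-- function on its left; 'return' stands in for a real directory listing.
demo :: IO [FilePath]
demo = mapM (return . toCpp) =<< return ["a.cc", "b.cc"]
```

Here renamed evaluates to ["main.cpp", "src/util.cpp"], and demo yields ["a.cpp", "b.cpp"] when run.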
End of preview. The rest of this section is not available here.
Chapter 9: I/O Case Study: A Library for Searching the Filesystem
Preview
The problem of "I know I have this file, but I don’t know where it is" has been around for as long as computers have had hierarchical filesystems. The fifth edition of Unix introduced the find command in 1974; it remains indispensable today. The state of the art has come a long way: modern operating systems ship with advanced document indexing and search capabilities.
There’s still a valuable place for find-like capability in the programmer’s toolbox. In this chapter, we’ll develop a library that gives us many of find’s capabilities, without leaving Haskell. We’ll explore several different approaches to writing this library, each with different strengths.
End of preview. The rest of this section is not available here.
The find Command
Preview
If you don’t use a Unix-like operating system, or you’re not a heavy shell user, it’s quite possible you may not have heard of find. Given a list of directories, it searches each one recursively and prints the name of every entry that matches an expression.
Individual expressions can take such forms as “name matches this glob pattern,” “entry is a plain file,” “last modified before this date,” and many more. They can be stitched together into more complex expressions using "and" and "or" operators.
End of preview. The rest of this section is not available here.
Starting Simple: Recursively Listing a Directory
Preview
Before we plunge into designing our library, let’s solve a few smaller issues. Our first problem is to recursively list the contents of a directory and its subdirectories:
-- file: ch09/RecursiveContents.hs
module RecursiveContents (getRecursiveContents) where

import Control.Monad (forM)
import System.Directory (doesDirectoryExist, getDirectoryContents)
import System.FilePath ((</>))

getRecursiveContents :: FilePath -> IO [FilePath]

getRecursiveContents topdir = do
  names <- getDirectoryContents topdir
  let properNames = filter (`notElem` [".", ".."]) names
  paths <- forM properNames $ \name -> do
    let path = topdir </> name
    isDirectory <- doesDirectoryExist path
    if isDirectory
      then getRecursiveContents path
      else return [path]
  return (concat paths)
The filter expression ensures that a listing for a single directory won’t contain the special directory names . or .., which refer to the current and parent directory, respectively. If we forgot to filter these out, we’d recurse endlessly.
We encountered forM in the previous chapter; it is mapM with its arguments flipped:
ghci> :m +Control.Monad
ghci> :type mapM
mapM :: (Monad m) => (a -> m b) -> [a] -> m [b]
ghci> :type forM
forM :: (Monad m) => [a] -> (a -> m b) -> m [b]
The body of the loop checks to see whether the current entry is a directory. If it is, it recursively calls getRecursiveContents to list that directory. Otherwise, it returns a single-element list that is the name of the current entry. (Don’t forget that the return function has a unique meaning in Haskell: it wraps a value with the monad’s type constructor.)
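The relationship between mapM and forM, and the wrapping behavior of return, can be seen in a monad simpler than IO. This sketch uses Maybe and a hypothetical helper named halve; none of these names come from the listing above.

```haskell
import Control.Monad (forM)

-- Hypothetical helper: succeeds only for even numbers.
halve :: Int -> Maybe Int
halve n
  | even n    = return (n `div` 2)  -- in Maybe, return wraps with Just
  | otherwise = Nothing

-- mapM takes the function first ...
viaMapM :: Maybe [Int]
viaMapM = mapM halve [2, 4, 6]

-- ... while forM takes the list first, which reads well when the
-- body is a lambda, as in a "for" loop.
viaForM :: Maybe [Int]
viaForM = forM [2, 4, 6] $ \n -> halve n
```

Both expressions produce Just [1,2,3]; a single odd element in the list would make the whole result Nothing.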
Another thing worth pointing out is the use of the variable isDirectory. In an imperative language such as Python, we’d normally write if os.path.isdir(path). However, the doesDirectoryExist function is an action; its return type is IO Bool, not Bool. Since an if expression requires an expression of type Bool, we have to use <- to get the Bool result of the action out of its IO wrapper so that we can use the plain, unwrapped Bool in the
End of preview. The rest of this section is not available here.
A Naive Finding Function
Preview
We can use our getRecursiveContents function as the basis for a simple-minded file finder:
-- file: ch09/SimpleFinder.hs
import RecursiveContents (getRecursiveContents)

simpleFind :: (FilePath -> Bool) -> FilePath -> IO [FilePath]

simpleFind p path = do
  names <- getRecursiveContents path
  return (filter p names)
This function takes a predicate that we use to filter the names returned by getRecursiveContents. Each name passed to the predicate is a complete path, so how can we perform a common operation such as "find all files ending in the extension .c"?
The System.FilePath module contains numerous invaluable functions that help us to manipulate filenames. In this case, we want takeExtension:
ghci> :m +System.FilePath
ghci> :type takeExtension
takeExtension :: FilePath -> String
ghci> takeExtension "foo/bar.c"
Loading package filepath-1.1.0.0 ... linking ... done.
".c"
ghci> takeExtension "quux"
""
This makes it a simple matter to write a function that takes a path, extracts its extension, and compares it with .c:
ghci> :load SimpleFinder
[1 of 2] Compiling RecursiveContents ( RecursiveContents.hs, interpreted )
[2 of 2] Compiling Main             ( SimpleFinder.hs, interpreted )
Ok, modules loaded: RecursiveContents, Main.
ghci> :type simpleFind (\p -> takeExtension p == ".c")
simpleFind (\p -> takeExtension p == ".c") :: FilePath -> IO [FilePath]
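The same predicate can be given a name and exercised on a plain list of paths, without any I/O. In this sketch, isCSource and the sample paths are assumptions made for illustration:

```haskell
import System.FilePath (takeExtension)

-- A named version of the lambda passed to simpleFind.
isCSource :: FilePath -> Bool
isCSource path = takeExtension path == ".c"

-- takeExtension looks only at the final path component, so a ".c"
-- in a directory name does not count.
matches :: [FilePath]
matches = filter isCSource ["src/main.c", "README", "lib.c/notes.txt", "old.c"]
```

This keeps "src/main.c" and "old.c". Note that the predicate sees nothing but the name, so it cannot tell whether "old.c" is a file or a directory.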
While simpleFind works, it has a few glaring problems. The first is that the predicate is not very expressive. It can only look at the name of a directory entry; it cannot, for example, find out whether it’s a file or a directory. This means that our attempt to use simpleFind will list directories ending in .c as well as files with the same extension.
The second problem is that simpleFind gives us no control over how it traverses the filesystem. To see why this is significant, consider the problem of searching for a source file in a tree managed by the Subversion revision control system. Subversion maintains a private .svn directory in every directory that it manages; each one contains many subdirectories and files that are of no interest to us. While we can easily filter out any path containing
End of preview. The rest of this section is not available here.
Predicates: From Poverty to Riches, While Remaining Pure
Preview
Our predicates can only look at filenames. This excludes a wide variety of interesting behaviors—for instance, what if we’d like to list files greater than a given size?
An easy reaction to this is to reach for IO: instead of our predicate being of type FilePath -> Bool, why don’t we change it to FilePath -> IO Bool? This would let us perform arbitrary I/O as part of our predicate. As appealing as this might seem, it’s also potentially a problem: such a predicate could have arbitrary side effects, since a function with return type IO a can have whatever side effects it pleases.
Let’s enlist the type system in our quest to write more predictable, less buggy code; we’ll keep predicates pure by avoiding the taint of “IO.” This will ensure that they can’t have any nasty side effects. We’ll feed them more information, too, so that they can gain the expressiveness we want without also becoming potentially dangerous.
Haskell’s portable System.Directory module provides a useful, albeit limited, set of file metadata:
ghci> :m +System.Directory
We can use doesFileExist and doesDirectoryExist to determine whether a directory entry is a file or a directory. There are not yet portable ways to query for other file types that have become widely available in recent years, such as named pipes, hard links, and symbolic links:
ghci> :type doesFileExist
doesFileExist :: FilePath -> IO Bool
ghci> doesFileExist "."
Loading package old-locale-1.0.0.0 ... linking ... done.
Loading package old-time-1.0.0.0 ... linking ... done.
Loading package directory-1.0.0.1 ... linking ... done.
False
ghci> :type doesDirectoryExist
doesDirectoryExist :: FilePath -> IO Bool
ghci> doesDirectoryExist "."
True
The getPermissions function lets us find out whether certain operations on a file or directory are allowed:
ghci> :type getPermissions
getPermissions :: FilePath -> IO Permissions
ghci> :info Permissions
data Permissions
  = Permissions {readable :: Bool,
                 writable :: Bool,
                 executable :: Bool,
                 searchable :: Bool}
        -- Defined in System.Directory
instance Eq Permissions -- Defined in System.Directory
instance Ord Permissions -- Defined in System.Directory
instance Read Permissions -- Defined in System.Directory
instance Show Permissions -- Defined in System.Directory
End of preview. The rest of this section is not available here.
Sizing a File Safely
Preview
Although System.Directory doesn’t let us find out how large a file is, we can use the similarly portable System.IO module to do this. It contains a function named hFileSize, which returns the size in bytes of an open file. Here’s a simple function that wraps it:
-- file: ch09/BetterPredicate.hs
simpleFileSize :: FilePath -> IO Integer

simpleFileSize path = do
  h <- openFile path ReadMode
  size <- hFileSize h
  hClose h
  return size
While this function works, it’s not yet suitable for us to use. In betterFind, we call getFileSize unconditionally on any directory entry; it should return Nothing if an entry is not a plain file, or the size wrapped by Just otherwise. This function instead throws an exception if an entry is not a plain file or could not be opened (perhaps due to insufficient permissions), and returns the size unwrapped.
Here’s a safer version of this function:
-- file: ch09/BetterPredicate.hs
saferFileSize :: FilePath -> IO (Maybe Integer)

saferFileSize path = handle (\_ -> return Nothing) $ do
  h <- openFile path ReadMode
  size <- hFileSize h
  hClose h
  return (Just size)
The body of the function is almost identical, save for the handle clause.
Our exception handler ignores the exception it’s passed and returns Nothing. The only change to the body that follows is that it wraps the file size with Just.
The saferFileSize function now has the correct type signature, and it won’t throw any exceptions. But it’s still not completely well behaved. There are directory entries on which openFile will succeed, but hFileSize will throw an exception. This can happen with, for example, named pipes. Such an exception will be caught by handle, but our call to hClose will never occur.
A Haskell implementation will automatically close the file handle when it notices that the handle is no longer being used. That will not occur until the garbage collector runs, and the delay until the next garbage collection pass is not predictable.
File handles are scarce resources, enforced by the underlying operating system. On Linux, for example, a process is by default allowed to have only 1,024 files open at once.
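One common way to make sure the handle is always closed is the bracket function from Control.Exception, which runs a release action whether or not the body throws. The sketch below is an assumption about how the pieces could fit together, not necessarily the approach the text goes on to take; the name bracketedFileSize is hypothetical, and the code targets a recent version of Control.Exception, in which the handler needs an explicit exception type.

```haskell
import Control.Exception (SomeException, bracket, handle)
import System.IO (IOMode (ReadMode), hClose, hFileSize, openFile)

-- Hypothetical name; returns Nothing on any failure.
bracketedFileSize :: FilePath -> IO (Maybe Integer)
bracketedFileSize path = handle ignore $
  -- bracket acquires the handle, guarantees hClose runs afterwards
  -- (even if hFileSize throws), and returns the body's result.
  bracket (openFile path ReadMode) hClose $ \h -> do
    size <- hFileSize h
    return (Just size)
  where
    -- The signature pins the exception type; SomeException matches
    -- every exception, which is a deliberately blunt choice here.
    ignore :: SomeException -> IO (Maybe Integer)
    ignore _ = return Nothing
```

With this arrangement, an entry on which openFile succeeds but hFileSize fails still has its handle closed before the handler returns Nothing.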
End of preview. The rest of this section is not available here.
A Domain-Specific Language for Predicates
Preview
Let’s take a stab at writing a predicate that will check for a C++ source file that is over 128 KB in size:
-- file: ch09/BetterPredicate.hs
myTest path _ (Just size) _ =
    takeExtension path == ".cpp" && size > 131072
myTest _ _ _ _ = False
This isn’t especially pleasing. The predicate takes four arguments, always ignores two of them, and requires two equations to define. Surely we can do better. Let’s create some code that will help us write more concise predicates.
Sometimes, this kind of library is referred to as an embedded domain-specific language: we use our programming language’s native facilities (hence embedded) to write code that lets us solve some narrow problem (hence domain-specific) particularly well.
Our first step is to write a function that returns one of its arguments. This one extracts the path from the arguments passed to a Predicate:
-- file: ch09/BetterPredicate.hs
pathP path _ _ _ = path
If we don’t provide a type signature, a Haskell implementation will infer a very general type for this function. This can later lead to error messages that are difficult to interpret, so let’s give pathP a type:
-- file: ch09/BetterPredicate.hs
type InfoP a =  FilePath        -- path to directory entry
             -> Permissions     -- permissions
             -> Maybe Integer   -- file size (Nothing if not file)
             -> ClockTime       -- last modified
             -> a

pathP :: InfoP FilePath
We’ve created a type synonym that we can use as shorthand for writing other, similarly structured functions. Our type synonym accepts a type parameter so that we can specify different result types:
-- file: ch09/BetterPredicate.hs
sizeP :: InfoP Integer
sizeP _ _ (Just size) _ = size
sizeP _ _ Nothing     _ = -1
(We’re being a little sneaky here and returning a size of –1 for entries that are not files or that we couldn’t open.)
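To give a feel for where accessors like these can lead, here is a sketch of how they might be combined into a predicate. It is an illustration under stated assumptions, not the library’s actual code: InfoP is cut down to two arguments (path and size) so that the example does not depend on Permissions or ClockTime, and the names extensionP, liftP, andP, and bigCpp are hypothetical.

```haskell
import System.FilePath (takeExtension)

-- Simplified, hypothetical variant of InfoP with only two inputs.
type InfoP a = FilePath -> Maybe Integer -> a

sizeP :: InfoP Integer
sizeP _ (Just size) = size
sizeP _ Nothing     = -1

extensionP :: InfoP String
extensionP path _ = takeExtension path

-- Lift a binary operator so that it compares an extracted value
-- against a constant.
liftP :: (a -> b -> c) -> InfoP a -> b -> InfoP c
liftP op f k path size = f path size `op` k

-- Combine two predicates with logical "and".
andP :: InfoP Bool -> InfoP Bool -> InfoP Bool
andP f g path size = f path size && g path size

-- "A C++ source file over 128 KB", written without naming arguments.
bigCpp :: InfoP Bool
bigCpp = liftP (==) extensionP ".cpp" `andP` liftP (>) sizeP 131072
```

Because sizeP reports -1 when there is no size, bigCpp is False for entries that are not plain files, matching the behavior of the second equation of myTest.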
In fact, a quick glance shows that the Predicate type that we defined near the beginning of this chapter is the same type as InfoP Bool. (We could thus legitimately get rid of the
End of preview. The rest of this section is not available here.
Controlling Traversal
Preview
When traversing the filesystem, we’d like to give ourselves more control over which directories we enter, and when. An easy way in which we can allow this is to pass in a function that takes a list of subdirectories of a given directory and returns another list. This list can have elements removed, or it can be ordered differently than the original list, or both. The simplest such control function is id, which will return its input list unmodified.
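A few control functions of this shape are sketched below. To keep the example small, entries are represented as plain names through a hypothetical Entry synonym; the function names are likewise assumptions.

```haskell
import Data.List (isPrefixOf, sort)

-- Stand-in for the entries being ordered; plain names keep the
-- sketch independent of any richer record type.
type Entry = FilePath

-- Visit everything, in the order given.
visitAll :: [Entry] -> [Entry]
visitAll = id

-- Remove entries whose names start with a dot.
skipHidden :: [Entry] -> [Entry]
skipHidden = filter (not . ("." `isPrefixOf`))

-- Remove one specific directory and visit the rest alphabetically.
sortedWithoutSvn :: [Entry] -> [Entry]
sortedWithoutSvn = sort . filter (/= ".svn")
```

Since each of these is an ordinary function from a list to a list, they compose with (.) in the usual way.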
For variety, we’re going to change a few aspects of our representation here. Instead of the elaborate function type InfoP a, we’ll use a normal algebraic data type to represent substantially the same information:
-- file: ch09/ControlledVisit.hs
data Info = Info {
      infoPath :: FilePath
    , infoPerms :: Maybe Permissions
    , infoSize :: Maybe Integer
    , infoModTime :: Maybe ClockTime
    } deriving (Eq, Ord, Show)

getInfo :: FilePath -> IO Info
We’re using record syntax to give ourselves "free" accessor functions, such as infoPath. The type of our traverse function is simple, as we just proposed. To obtain Info about a file or directory, we call the getInfo action:
-- file: ch09/ControlledVisit.hs
traverse :: ([Info] -> [Info]) -> FilePath -> IO [Info]
The definition of traverse is short, but dense:
-- file: ch09/ControlledVisit.hs
traverse order path = do
    names <- getUsefulContents path
    contents <- mapM getInfo (path : map (path </>) names)
    liftM concat $ forM (order contents) $ \info -> do
      if isDirectory info && infoPath info /= path
        then traverse order (infoPath info)
        else return [info]

getUsefulContents :: FilePath -> IO [String]
getUsefulContents path = do
    names <- getDirectoryContents path
    return (filter (`notElem` [".", ".."]) names)

isDirectory :: Info -> Bool
isDirectory = maybe False searchable . infoPerms
While we’re not introducing any new techniques here, this is one of the densest function definitions we’ve yet encountered. Let’s walk through it almost line by line, explaining what is going on.
The first couple of lines hold no mystery, as they’re almost verbatim copies of code we’ve already seen. Things begin to get interesting when we assign to the variable
End of preview. The rest of this section is not available here.
Density, Readability, and the Learning Process
Preview
Code as dense as traverse is not unusual in Haskell. The gain in expressiveness is significant, and it requires a relatively small amount of practice to be able to fluently read and write code in this style.
For comparison, here’s a less dense presentation of the same code (this might be more typical of a less experienced Haskell programmer):
-- file: ch09/ControlledVisit.hs
traverseVerbose order path = do
    names <- getDirectoryContents path
    let usefulNames = filter (`notElem` [".", ".."]) names
    contents <- mapM getEntryName ("" : usefulNames)
    recursiveContents <- mapM recurse (order contents)
    return (concat recursiveContents)
  where getEntryName name = getInfo (path </> name)
        isDirectory info = case infoPerms info of
                             Nothing -> False
                             Just perms -> searchable perms
        recurse info = do
            if isDirectory info && infoPath info /= path
                then traverseVerbose order (infoPath info)
                else return [info]
All we’ve done here is make a few substitutions. Instead of liberally using partial application and function composition, we’ve defined some local functions in a where block. In place of the maybe combinator, we’re using a case expression. And instead of using liftM, we’re manually lifting concat ourselves.
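The last of these substitutions is easy to see in isolation. In the following sketch, nested is a hypothetical action standing in for the result of the mapM call; the two definitions after it behave identically.

```haskell
import Control.Monad (liftM)

-- Hypothetical action producing a list of lists.
nested :: IO [[Int]]
nested = return [[1, 2], [3], []]

-- Lifted: apply the pure function concat to the action's result.
flatLifted :: IO [Int]
flatLifted = liftM concat nested

-- Manual: bind the result to a name, then return the transformed value.
flatManual :: IO [Int]
flatManual = do
  xss <- nested
  return (concat xss)
```

The lifted form saves two lines and a name; the manual form spells out each step, which some readers find easier to follow at first.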
This is not to say that density is a uniformly good property. Each line of the original traverse function is short. We introduce a local variable (usefulNames) and a local function (isDirectory) specifically to keep the lines short and the code clearer. Our names are descriptive. While we use function composition and pipelining, the longest pipeline contains only three elements.
The key to writing maintainable Haskell code is to find a balance between density and readability. Where your code falls on this continuum is likely to be influenced by your level of experience, as detailed here:
  • As a beginning Haskell programmer, Andrew doesn’t know his way around the standard libraries very well. As a result, he unwittingly duplicates a lot of existing code.
End of preview. The rest of this section is not available here.
Another Way of Looking at Traversal
Preview
While the traverse function gives us more control than our original betterFind function, it still has a significant failing: we can avoid recursing into directories, but we can’t filter other names until after we’ve generated the entire list of names in a tree. If we are traversing a directory containing 100,000 files of which we care about only 3, we’ll allocate a 100,000-element list before we have a chance to trim it down to the 3 we really want.
One approach would be to provide a filter function as a new argument to traverse, which we would apply to the list of names as we generate it. This would allow us to allocate a list of only as many elements as we need.
However, this approach also has a weakness. Say we know that we want at most 3 entries from our list, and that those 3 entries happen to be the first 3 of the 100,000 that we traverse. In this case, we’ll needlessly visit 99,997 other entries. This is not by any means a contrived example: for instance, the Maildir mailbox format stores a folder of email messages as a directory of individual files. It’s common for a single directory representing a mailbox to contain tens of thousands of files.
We can address the weaknesses of our two prior traversal functions by taking a different perspective: what if we think of filesystem traversal as a fold over the directory tree?
The familiar folds, foldr and foldl', neatly generalize the idea of traversing a list while accumulating a result. It’s hardly a stretch to extend the idea of folding from lists to directory trees, but we’d like to add an element of control to our fold. We’ll represent this control as an algebraic data type:
-- file: ch09/FoldDir.hs
data Iterate seed = Done     { unwrap :: seed }
                  | Skip     { unwrap :: seed }
                  | Continue { unwrap :: seed }
                    deriving (Show)

type Iterator seed = seed -> Info -> Iterate seed
The Iterator type gives us a convenient alias for the function that we fold with. It takes a seed and an
End of preview. The rest of this section is not available here.
Useful Coding Guidelines
Preview
While many good Haskell programming habits come with experience, we have a few general guidelines to offer so that you can write readable code more quickly.
As we already mentioned, never use tab characters in Haskell source files. Use spaces.
If you find yourself proudly thinking that a particular piece of code is fiendishly clever, stop and consider whether you’ll be able to understand it again after you’ve stepped away from it for a month.
The conventional way of naming types and variables with compound names is to use camel case, i.e., myVariableName. This style is almost universal in Haskell code. Regardless of your opinion of other naming practices, if you follow a nonstandard convention, your Haskell code will be somewhat jarring to the eyes of other readers.
Until you’ve been working with Haskell for a substantial amount of time, spend a few minutes searching for library functions before you write small functions. This applies particularly to ubiquitous types such as lists, Maybe, and Either. If the standard libraries don’t already provide exactly what you need, you might be able to combine a few functions to obtain the result you desire.
Long pipelines of composed functions are hard to read, where long means a series of more than three or four elements. If you have such a pipeline, use a let or where block to break it into smaller parts. Give each one of these pipeline elements a meaningful name, and then glue them back together. If you can’t think of a meaningful name for an element, ask yourself if you can even describe what it does. If the answer is “no,” simplify your code.
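As an illustration of this guideline, here is a sketch with a six-stage pipeline and the same computation broken into named parts. The task (finding the three longest distinct words in a text) and all of the names are assumptions chosen for the example.

```haskell
import Data.Char (toLower)
import Data.List (nub, sortOn)

-- Dense version: six stages in a single pipeline.
longestWordsDense :: String -> [String]
longestWordsDense =
  take 3 . reverse . sortOn length . nub . words . map toLower

-- Same behavior, with each intermediate result given a name.
longestWords :: String -> [String]
longestWords text = take 3 byLengthDesc
  where
    normalized   = map toLower text
    uniqueWords  = nub (words normalized)
    byLengthDesc = reverse (sortOn length uniqueWords)
```

If a name such as byLengthDesc is hard to come up with for some stage, that is often a sign that the stage is doing too much.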
Even though it’s easy to resize a text editor window far beyond 80 columns, this width is still very common. Wider lines are wrapped or truncated in 80-column text editor windows, which severely hurts readability. Treating lines as no more than 80 characters long limits the amount of code you can cram onto a single line. This helps to keep individual lines less complicated, and therefore easier to understand.
End of preview. The rest of this section is not available here.
Chapter 10: Code Case Study: Parsing a Binary Data Format
Preview
In this chapter, we’ll discuss a common task: parsing a binary file. We will use it for two purposes. Our first is indeed to talk a little about parsing, but our main goal is to talk about program organization, refactoring, and “boilerplate removal.” We will demonstrate how you can tidy up repetitious code, and set the stage for our later discussion of monads.
The file formats that we will work with come from the netpbm suite, an ancient and venerable collection of programs and file formats for working with bitmap images. These file formats have the dual advantages of being widely used and being fairly easy, though not completely trivial, to parse. Most importantly for our convenience, netpbm files are not compressed.
End of preview. The rest of this section is not available here.
Grayscale Files
Preview
The name of netpbm’s grayscale file format is PGM (portable gray map). It is actually not one format, but two; the plain (or P2) format is encoded as ASCII, while the more common raw (P5) format is mostly binary.
A file of either format starts with a header, which in turn begins with a "magic" string describing the format. For a plain file, the string is P2, and for raw, it’s P5. The magic string is followed by whitespace, and then by three numbers: the width, height, and maximum gray value of the image. These numbers are represented as ASCII decimal numbers, separated by whitespace.
After the maximum gray value comes the image data. In a raw file, this is a string of binary values. In a plain file, the values are represented as ASCII decimal numbers separated by single-space characters.
A raw file can contain a sequence of images, one after the other, each with its own header. A plain file contains only one image.
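To make the layout concrete, here is a minimal sketch that assembles a tiny raw (P5) image by hand. The names rawPGM and tinyImage are illustrative and not part of the chapter's code, and the header uses newlines and a single space as its whitespace separators, which is only one of several legal choices.

```haskell
import qualified Data.ByteString.Lazy as L
import qualified Data.ByteString.Lazy.Char8 as L8

-- | Assemble a raw PGM file: an ASCII header followed by binary pixel data.
-- (Hypothetical helper; assumes the pixel string holds width * height bytes.)
rawPGM :: Int -> Int -> Int -> L.ByteString -> L.ByteString
rawPGM width height maxGrey pixels = L.append (L8.pack header) pixels
  where
    header = "P5\n" ++ show width ++ " " ++ show height ++ "\n"
                    ++ show maxGrey ++ "\n"

-- | A 2x2 image whose four pixels run from black (0) to white (255).
tinyImage :: L.ByteString
tinyImage = rawPGM 2 2 255 (L.pack [0, 85, 170, 255])
```

The header here occupies 11 bytes of ASCII text, and the remaining 4 bytes are the pixel values themselves, one byte per pixel.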
End of the content preview. The rest of this section is not viewable here.
Parsing a Raw PGM File
Content preview
For our first try at a parsing function, we’ll only worry about raw PGM files. We’ll write our PGM parser as a pure function. It won’t be responsible for obtaining the data to parse, just for the actual parsing. This is a common approach in Haskell programs. By separating the reading of the data from what we subsequently do with it, we gain flexibility in where we take the data from.
We’ll use the ByteString type to store our graymap data, because it’s compact. Since the header of a PGM file is ASCII text but its body is binary, we import both the text- and binary-oriented ByteString modules:
-- file: ch10/PNM.hs

import qualified Data.ByteString.Lazy.Char8 as L8

import qualified Data.ByteString.Lazy as L

import Data.Char (isSpace)
For our purposes, it doesn’t matter whether we use a lazy or strict ByteString, so we’ve somewhat arbitrarily chosen the lazy kind.
We’ll use a straightforward data type to represent PGM images:
-- file: ch10/PNM.hs

data Greymap = Greymap {

      greyWidth :: Int

    , greyHeight :: Int

    , greyMax :: Int

    , greyData :: L.ByteString

    } deriving (Eq)
Normally, a Haskell Show instance should produce a string representation that we can read back by calling read. However, for a bitmap graphics file, this would potentially produce huge text strings, for example, if we were to show a photo. For this reason, we’re not going to let the compiler automatically derive a Show instance for us; we’ll write our own and intentionally simplify it:
-- file: ch10/PNM.hs

instance Show Greymap where

    show (Greymap w h m _) = "Greymap " ++ show w ++ "x" ++ show h ++

                             " " ++ show m
Because our Show instance intentionally avoids printing the bitmap data, there’s no point in writing a Read instance, as we can’t reconstruct a valid Greymap from the result of show.
Here’s an obvious type for our parsing function:
-- file: ch10/PNM.hs

parseP5 :: L.ByteString -> Maybe (Greymap, L.ByteString)
This will take a ByteString, and if the parse succeeds, it will return a single parsed
End of the content preview. The rest of this section is not viewable here.
Getting Rid of Boilerplate Code
Content preview
While our parseP5 function works, the style in which we wrote it is somehow not pleasing. Our code marches steadily to the right of the screen, and it’s clear that a slightly more complicated function would soon run out of visual real estate. We repeat a pattern of constructing and then deconstructing Maybe values, only continuing if a particular value matches Just. All of the similar case expressions act as boilerplate code, busywork that obscures what we’re really trying to do. In short, this function is begging for some abstraction and refactoring.
If we step back a little, we can see two patterns. First is that many of the functions that we apply have similar types. Each takes a ByteString as its last argument and returns Maybe something else. Second, every step in the "ladder" of our parseP5 function deconstructs a Maybe value, and either fails or passes the unwrapped result to a function.
We can quite easily write a function that captures this second pattern:
-- file: ch10/PNM.hs

(>>?) :: Maybe a -> (a -> Maybe b) -> Maybe b

Nothing >>? _ = Nothing

Just v  >>? f = f v
The (>>?) function acts very simply: it takes a value as its left argument, and a function as its right. If the value is not Nothing, it applies the function to whatever is wrapped in the Just constructor. We have defined our function as an operator so that we can use it to chain functions together. Finally, we haven’t provided a fixity declaration for (>>?), so it defaults to infixl 9 (left-associative, strongest operator precedence). In other words, a >>? b >>? c will be evaluated from left to right, as (a >>? b) >>? c.
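As a small illustration of the chaining behavior, the sketch below strings together two made-up steps. The helpers safeDiv and halveEven are hypothetical and exist only for this example; the operator itself is repeated so that the block stands alone.

```haskell
-- The chaining operator from the text, repeated so the example stands alone.
(>>?) :: Maybe a -> (a -> Maybe b) -> Maybe b
Nothing >>? _ = Nothing
Just v  >>? f = f v

-- | Hypothetical step: division that fails on a zero divisor.
safeDiv :: Int -> Int -> Maybe Int
safeDiv _ 0 = Nothing
safeDiv n d = Just (n `div` d)

-- | Hypothetical step: halve a number, failing if it is odd.
halveEven :: Int -> Maybe Int
halveEven n
    | even n    = Just (n `div` 2)
    | otherwise = Nothing

allSucceed, firstFails, lastFails :: Maybe Int
allSucceed = safeDiv 100 5  >>? halveEven >>? halveEven  -- 20, then 10, then 5
firstFails = safeDiv 100 0  >>? halveEven >>? halveEven  -- division fails at once
lastFails  = safeDiv 100 10 >>? halveEven >>? halveEven  -- 10, then 5, which is odd
```

Once any step yields Nothing, every later step is skipped, which is exactly the behavior that the nested case expressions spelled out by hand.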
With this chaining function in hand, we can take a second try at our parsing function:
-- file: ch10/PNM.hs

parseP5_take2 :: L.ByteString -> Maybe (Greymap, L.ByteString)

parseP5_take2 s =

    matchHeader (L8.pack "P5") s      >>?

    \s -> skipSpace ((), s)           >>?

    (getNat . snd)                    >>?

    skipSpace                         >>?

    \(width, s) ->   getNat s         >>?

    skipSpace                         >>?

    \(height, s) ->  getNat s         >>?

    \(maxGrey, s) -> getBytes 1 s     >>?

    (getBytes (width * height) . snd) >>?

    \(bitmap, s) -> Just (Greymap width height maxGrey bitmap, s)



skipSpace :: (a, L.ByteString) -> Maybe (a, L.ByteString)

skipSpace (a, s) = Just (a, L8.dropWhile isSpace s)
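The preview does not show the helpers matchHeader, getNat, and getBytes that parseP5_take2 relies on. The following is a plausible reconstruction whose types fit the way they are used above; treat the bodies as assumptions rather than the chapter's own definitions.

```haskell
import qualified Data.ByteString.Lazy as L
import qualified Data.ByteString.Lazy.Char8 as L8

-- | Succeed with the rest of the input if it starts with the given prefix.
matchHeader :: L.ByteString -> L.ByteString -> Maybe L.ByteString
matchHeader prefix str
    | prefix `L.isPrefixOf` str = Just (L.drop (L.length prefix) str)
    | otherwise                 = Nothing

-- | Read a positive ASCII decimal number from the front of the input.
getNat :: L.ByteString -> Maybe (Int, L.ByteString)
getNat s = case L8.readInt s of
    Just (num, rest) | num > 0 -> Just (num, rest)
    _                          -> Nothing

-- | Split off exactly n bytes, failing if the input is too short.
getBytes :: Int -> L.ByteString -> Maybe (L.ByteString, L.ByteString)
getBytes n str
    | L.length prefix < count = Nothing
    | otherwise               = Just (prefix, rest)
  where
    count          = fromIntegral n
    (prefix, rest) = L.splitAt count str
```

Each helper returns Maybe, with the unconsumed input as the last component of its result, which is what allows (>>?) to thread them together.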
End of the content preview. The rest of this section is not viewable here.
Implicit State
Content preview
We’re not yet out of the woods. Our code explicitly passes pairs around, using one element for an intermediate part of the parsed result and the other for the current residual ByteString. If we want to extend the code, for example, to track the number of bytes we’ve consumed so that we can report the location of a parse failure, we already have eight different spots that we will need to modify, just to pass a three-tuple around.
This approach makes even a small body of code difficult to change. The problem lies with our use of pattern matching to pull values out of each pair: we have embedded the knowledge that we are always working with pairs straight into our code. As pleasant and helpful as pattern matching is, it can lead us in some undesirable directions if we do not use it carefully.
Let’s do something to address the inflexibility of our new code. First, we will change the type of state that our parser uses:
-- file: ch10/Parse.hs

data ParseState = ParseState {

      string :: L.ByteString

    , offset :: Int64           -- imported from Data.Int

    } deriving (Show)
In our switch to an algebraic data type, we added the ability to track both the current residual string and the offset into the original string since we started parsing. The more important change was our use of record syntax: we can now avoid pattern matching on the pieces of state that we pass around and use the accessor functions string and offset instead.
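To see why accessors help, consider this minimal sketch of a state-updating function. The name advance is hypothetical; the point is only that it never pattern matches on the shape of ParseState.

```haskell
import qualified Data.ByteString.Lazy as L
import Data.Int (Int64)

data ParseState = ParseState {
      string :: L.ByteString
    , offset :: Int64
    } deriving (Show)

-- | Hypothetical helper: consume up to n bytes, keeping the offset in step.
-- It touches the state only through accessors and record update syntax, so
-- adding a third field to ParseState would not require changing it.
advance :: Int64 -> ParseState -> ParseState
advance n st = st { string = rest, offset = offset st + L.length taken }
  where (taken, rest) = L.splitAt n (string st)
```

If the input is shorter than n, the offset grows only by the number of bytes actually consumed.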
We have given our parsing state a name. When we name something, it can become easier to reason about. For example, we can now look at parsing as a kind of function: it consumes a parsing state and produces both a new parsing state and some other piece of information. We can directly represent this as a Haskell type:
-- file: ch10/Parse.hs

simpleParse :: ParseState -> (a, ParseState)

simpleParse = undefined
To provide more help to our users, we would like to report an error message if parsing fails. This requires only a minor tweak to the type of our parser:
End of the content preview. The rest of this section is not viewable here.
Introducing Functors
Content preview
We’re by now thoroughly familiar with the map function, which applies a function to every element of a list, returning a list of possibly a different type:
map (+1) [1,2,3]

[2,3,4]

map show [1,2,3]

["1","2","3"]

:type map show

map show :: (Show a) => [a] -> [String]
This map-like activity can be useful in other instances. For example, consider a binary tree:
-- file: ch10/TreeMap.hs

data Tree a = Node (Tree a) (Tree a)

            | Leaf a

              deriving (Show)
If we want to take a tree of strings and turn it into a tree containing the lengths of those strings, we could write a function to do this:
-- file: ch10/TreeMap.hs

treeLengths (Leaf s) = Leaf (length s)

treeLengths (Node l r) = Node (treeLengths l) (treeLengths r)
Now that our eyes are attuned to looking for patterns that we can turn into generally useful functions, we can see a possible case of this here:
-- file: ch10/TreeMap.hs

treeMap :: (a -> b) -> Tree a -> Tree b

treeMap f (Leaf a)   = Leaf (f a)

treeMap f (Node l r) = Node (treeMap f l) (treeMap f r)
As we might hope, treeLengths and treeMap length give the same results:
let tree = Node (Leaf "foo") (Node (Leaf "x") (Leaf "quux"))

treeLengths tree

Node (Leaf 3) (Node (Leaf 1) (Leaf 4))

treeMap length tree

Node (Leaf 3) (Node (Leaf 1) (Leaf 4))

treeMap (odd . length) tree

Node (Leaf True) (Node (Leaf True) (Leaf False))
Haskell provides a well-known typeclass to further generalize treeMap. This typeclass is named Functor, and it defines one function, fmap:
-- file: ch10/TreeMap.hs

class Functor f where

    fmap :: (a -> b) -> f a -> f b
We can think of fmap as a kind of lifting function, as we introduced earlier. It takes a function over ordinary values a -> b, and lifts it to become a function over containers f a -> f b, where f is the container type.
If we substitute Tree for the type variable f, for example, then the type of fmap is identical to the type of treeMap, and in fact we can use treeMap as the implementation of fmap over Trees:
-- file: ch10/TreeMap.hs

instance Functor Tree where

    fmap = treeMap
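The payoff of the typeclass is that one name, fmap, now works across different containers. The sketch below writes the Tree instance directly (rather than via treeMap) so that it is self-contained, and applies the same function to a tree, a Maybe, and a list; the top-level names are illustrative.

```haskell
data Tree a = Node (Tree a) (Tree a)
            | Leaf a
              deriving (Show, Eq)

instance Functor Tree where
    fmap f (Leaf a)   = Leaf (f a)
    fmap f (Node l r) = Node (fmap f l) (fmap f r)

tree :: Tree String
tree = Node (Leaf "foo") (Node (Leaf "x") (Leaf "quux"))

-- The same call, fmap length, applied to three different containers.
treeLens :: Tree Int
treeLens = fmap length tree

maybeLen :: Maybe Int
maybeLen = fmap length (Just "foo")

listLens :: [Int]
listLens = fmap length ["foo", "x", "quux"]
```

In each case the structure of the container is left alone and only the values inside it change.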
End of the content preview. The rest of this section is not viewable here.
Writing a Functor Instance for Parse
Content preview
For the types we have surveyed so far, the behavior we ought to expect of fmap has been obvious. This is a little less clear for Parse, due to its complexity. A reasonable guess is that the function we’re fmapping should be applied to the current result of a parse, and leave the parse state untouched:
-- file: ch10/Parse.hs

instance Functor Parse where

    fmap f parser = parser ==> \result ->

                    identity (f result)
This definition is easy to read, so let’s perform a few quick experiments to see if we’re following our rules for functors.
First, we’ll check that identity is preserved. Let’s try this first on a parse that ought to fail—parsing a byte from an empty string (remember that (<$>) is fmap):
parse parseByte L.empty

Left "byte offset 0: no more input"

parse (id <$> parseByte) L.empty

Left "byte offset 0: no more input"
Good. Now for a parse that should succeed:
let input = L8.pack "foo"

L.head input

102

parse parseByte input

Right 102

parse (id <$> parseByte) input

Right 102
Inspecting these results, we can also see that our Functor instance is obeying our second rule of preserving shape. Failure is preserved as failure, and success as success.
Finally, we’ll ensure that composability is preserved:
parse ((chr . fromIntegral) <$> parseByte) input

Right 'f'

parse (chr <$> fromIntegral <$> parseByte) input

Right 'f'
On the basis of this brief inspection, our Functor instance appears to be well behaved.
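The same checks can be tried on a simpler functor. The sketch below uses Maybe as a stand-in for Parse, which is an assumption made purely to keep the example self-contained. One detail worth knowing: (<$>) is left-associative, so an unparenthesized chain such as chr <$> fromIntegral <$> x groups as (chr <$> fromIntegral) <$> x and relies on the Functor instance for functions. The stepwise version here is therefore parenthesized explicitly, so that it really applies fmap twice.

```haskell
import Data.Char (chr)
import Data.Functor ((<$>))  -- redundant on newer GHCs, harmless to keep
import Data.Word (Word8)

-- A successful "parse result", with Maybe standing in for Parse.
byte :: Maybe Word8
byte = Just 102

fused, stepwise :: Maybe Char
fused    = (chr . fromIntegral) <$> byte       -- one fmap of a composed function
stepwise = chr <$> (fromIntegral <$> byte)     -- two fmaps, one per function

identityHolds, compositionHolds :: Bool
identityHolds    = (id <$> byte) == byte
compositionHolds = fused == stepwise
```

Both Boolean values evaluate to True here. A handful of spot checks like these can only suggest that an instance is well behaved; they do not prove it.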
End of the content preview. The rest of this section is not viewable here.
Using Functors for Parsing
Content preview
All this talk of functors has a purpose: they often let us write tidy, expressive code. Recall the parseByte function that we introduced earlier. In recasting our PGM parser to use our new parser infrastructure, we’ll often want to work with ASCII characters instead of Word8 values.
While we could write a parseChar function that has a similar structure to parseByte, we can now avoid this code duplication by taking advantage of the functor nature of Parse. Our functor takes the result of a parse and applies a function to it, so what we need is a function that turns a Word8 into a Char:
-- file: ch10/Parse.hs

w2c :: Word8 -> Char

w2c = chr . fromIntegral



-- import Control.Applicative

parseChar :: Parse Char

parseChar = w2c <$> parseByte
We can also use functors to write a compact "peek" function. This returns Nothing if we’re at the end of the input string. Otherwise, it returns the next character without consuming it (i.e., it inspects, but doesn’t disturb, the current parsing state):
-- file: ch10/Parse.hs

peekByte :: Parse (Maybe Word8)

peekByte = (fmap fst . L.uncons . string) <$> getState
The same lifting trick that let us define parseChar lets us write a compact definition for peekChar:
-- file: ch10/Parse.hs

peekChar :: Parse (Maybe Char)

peekChar = fmap w2c <$> peekByte
Notice that peekByte and peekChar each make two calls to fmap, one of which is disguised as (<$>). This is necessary because the type Parse (Maybe a) is a functor within a functor. We thus have to lift a function twice to "get it into" the inner functor.
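The double lifting is easier to see with two familiar functors. In this sketch a list of Maybe values stands in for Parse (Maybe a); that substitution is an assumption for illustration, and peeked is a made-up name.

```haskell
import Data.Char (chr)
import Data.Functor ((<$>))  -- redundant on newer GHCs, harmless to keep
import Data.Word (Word8)

w2c :: Word8 -> Char
w2c = chr . fromIntegral

-- Outer functor: the list. Inner functor: Maybe.
peeked :: [Maybe Word8]
peeked = [Just 80, Nothing, Just 53]

-- fmap w2c lifts w2c into Maybe; (<$>) then lifts that into the list.
asChars :: [Maybe Char]
asChars = fmap w2c <$> peeked
```

Here asChars is [Just 'P', Nothing, Just '5']: the Nothing passes through both layers untouched.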
Finally, we’ll write another generic combinator, which is the Parse analogue of the familiar takeWhile. It consumes its input while its predicate returns True:
-- file: ch10/Parse.hs

parseWhile :: (Word8 -> Bool) -> Parse [Word8]

parseWhile p = (fmap p <$> peekByte) ==> \mp ->

               if mp == Just True

               then parseByte ==> \b ->

                    (b:) <$> parseWhile p

               else identity []
Once again, we’re using functors in several places (doubled up, when necessary) to reduce the verbosity of our code. Here’s a rewrite of the same function in a more direct style that does not use functors:
End of the content preview. The rest of this section is not viewable here.
Rewriting Our PGM Parser
Content preview
With our new parsing code, what does the raw PGM parsing function look like now?
-- file: ch10/Parse.hs

parseRawPGM =

    parseWhileWith w2c notWhite ==> \header -> skipSpaces ==>&

    assert (header == "P5") "invalid raw header" ==>&

    parseNat ==> \width -> skipSpaces ==>&

    parseNat ==> \height -> skipSpaces ==>&

    parseNat ==> \maxGrey ->

    parseByte ==>&

    parseBytes (width * height) ==> \bitmap ->

    identity (Greymap width height maxGrey bitmap)

  where notWhite = (`notElem` " \r\n\t")
This definition makes use of a few more helper functions that we present here, following a pattern that should be familiar by now:
-- file: ch10/Parse.hs

parseWhileWith :: (Word8 -> a) -> (a -> Bool) -> Parse [a]

parseWhileWith f p = fmap f <$> parseWhile (p . f)



parseNat :: Parse Int

parseNat = parseWhileWith w2c isDigit ==> \digits ->

           if null digits

           then bail "no more input"

           else let n = read digits

                in if n < 0

                   then bail "integer overflow"

                   else identity n



(==>&) :: Parse a -> Parse b -> Parse b

p ==>& f = p ==> \_ -> f



skipSpaces :: Parse ()

skipSpaces = parseWhileWith w2c isSpace ==>& identity ()



assert :: Bool -> String -> Parse ()

assert True  _   = identity ()

assert False err = bail err
The (==>&) combinator chains parsers just as (==>) does, but the righthand side ignores the result from the left. The assert function lets us check a property and abort parsing with a useful error message if the property is False.
Notice how few of the functions that we have written make any reference to the current parsing state. Most notably, where our old parseP5 function explicitly passed two-tuples down the chain of dataflow, all of the state management in parseRawPGM is hidden from us.
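Because the preview omits the definitions of Parse, identity, bail, and (==>), the following sketch supplies stand-ins so that the combinators can be run. These definitions are assumptions chosen to match the types and error messages shown in this chapter, not necessarily the book's own code, and increasingPair is a made-up parser.

```haskell
import qualified Data.ByteString.Lazy as L
import Data.Int (Int64)
import Data.Word (Word8)

data ParseState = ParseState { string :: L.ByteString, offset :: Int64 }

-- Assumed representation: a parser is a function from a state to either
-- an error message or a result paired with the new state.
newtype Parse a = Parse {
      runParse :: ParseState -> Either String (a, ParseState)
    }

identity :: a -> Parse a
identity a = Parse (\s -> Right (a, s))

bail :: String -> Parse a
bail err = Parse $ \s ->
    Left ("byte offset " ++ show (offset s) ++ ": " ++ err)

(==>) :: Parse a -> (a -> Parse b) -> Parse b
p ==> f = Parse $ \s -> case runParse p s of
    Left err      -> Left err
    Right (a, s') -> runParse (f a) s'

(==>&) :: Parse a -> Parse b -> Parse b
p ==>& f = p ==> \_ -> f

assert :: Bool -> String -> Parse ()
assert True  _   = identity ()
assert False err = bail err

parseByte :: Parse Word8
parseByte = Parse $ \s -> case L.uncons (string s) of
    Nothing        -> runParse (bail "no more input") s
    Just (b, rest) -> Right (b, s { string = rest, offset = offset s + 1 })

parse :: Parse a -> L.ByteString -> Either String a
parse p input = fmap fst (runParse p (ParseState input 0))

-- | Made-up parser: two bytes, the second larger than the first.
-- No state is mentioned anywhere in its body.
increasingPair :: Parse (Word8, Word8)
increasingPair =
    parseByte ==> \a ->
    parseByte ==> \b ->
    assert (a < b) "bytes not increasing" ==>&
    identity (a, b)
```

Only parseByte and bail look at the state; increasingPair, like parseRawPGM, is written entirely in terms of combinators.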
Of course, we can’t completely avoid inspecting and modifying the parsing state. Here’s a case in point, the last of the helper functions needed by parseRawPGM:
-- file: ch10/Parse.hs

parseBytes :: Int -> Parse L.ByteString

parseBytes n =

    getState ==> \st ->

    let n' = fromIntegral n

        (h, t) = L.splitAt n' (string st)

        st' = st { offset = offset st + L.length h, string = t }

    in putState st' ==>&

       assert (L.length h == n') "end of input" ==>&

       identity h
End of the content preview. The rest of this section is not viewable here.
Future Directions
Content preview
Our main theme in this chapter has been abstraction. We found passing explicit state down a chain of functions to be unsatisfactory, so we abstracted this detail away. We noticed some recurring needs as we worked out our parsing code, and abstracted those into common functions. Along the way, we introduced the notion of a functor, which offers a generalized way to map over a parameterized type.
We will revisit parsing in a later chapter, when we discuss Parsec, a widely used and flexible parsing library. We will also return to our theme of abstraction, where we will find that much of the code that we have developed in this chapter can be further simplified by the use of monads.
For efficiently parsing binary data represented as a ByteString, a number of packages are available via the Hackage package database. At the time of this writing, the most popular of them is easy to use and offers high performance.
End of the content preview. The rest of this section is not viewable here.
Chapter 11: Testing and Quality Assurance
Content preview
Building real systems means caring about quality control, robustness, and correctness. With the right quality assurance mechanisms in place, well-written code can feel like a precision machine, with all functions performing their tasks exactly as specified. There is no sloppiness around the edges, and the final result can be code that is self-explanatory—and obviously correct—the kind of code that inspires confidence.
In Haskell, we have several tools at our disposal for building such precise systems. The most obvious tool, and one built into the language itself, is the expressive type system, which allows for complicated invariants to be enforced statically—making it impossible to write code violating chosen constraints. In addition, purity and polymorphism encourage a style of code that is modular, refactorable, and testable. This is the kind of code that just doesn’t go wrong.
Testing plays a key role in keeping code on the straight-and-narrow path. The main testing mechanisms in Haskell are traditional unit testing (via the HUnit library) and its more powerful descendant, type-based "property" testing, with QuickCheck, an open source testing framework for Haskell. Property-based testing encourages a high-level approach to testing in the form of abstract invariants that functions should satisfy universally, with the actual test data generated for the programmer by the testing library. In this way, code can be hammered with thousands of tests that would be infeasible to write by hand, often uncovering subtle corner cases that wouldn’t be found otherwise.
In this chapter, we’ll look at how to use QuickCheck to establish invariants in code, and then re-examine the pretty printer developed in previous chapters, testing it with the framework. We’ll also see how to guide the testing process with GHC’s code coverage tool: HPC.
End of the content preview. The rest of this section is not viewable here.
QuickCheck: Type-Based Testing
Content preview
To get an overview of how property-based testing works, we’ll begin with a simple scenario: you’ve written a specialized sorting function and want to test its behavior.
First, we import the QuickCheck library, and any other modules we need:
-- file: ch11/QC-basics.hs

import Test.QuickCheck

import Data.List
And the function we want to test—a custom sort routine:
-- file: ch11/QC-basics.hs

qsort :: Ord a => [a] -> [a]

qsort []     = []

qsort (x:xs) = qsort lhs ++ [x] ++ qsort rhs

    where lhs = filter  (< x) xs

          rhs = filter (>= x) xs
This is the classic Haskell sort implementation: a study in functional programming elegance, if not efficiency (this isn’t an in-place sort). Now, we’d like to check that this function obeys the basic rules a good sort should follow. One useful invariant to start with, and one that comes up in a lot of purely functional code, is idempotency: applying a function twice has the same result as applying it only once. For our sort routine, a stable sort algorithm, this should certainly be true, or things have gone horribly wrong! This invariant can be encoded as a property simply, as follows:
-- file: ch11/QC-basics.hs

prop_idempotent xs = qsort (qsort xs) == qsort xs
We’ll use the QuickCheck convention of prefixing test properties with prop_ in order to distinguish them from normal code. This idempotency property is written simply as a Haskell function stating an equality that must hold for any input data that is sorted. We can check this makes sense for a few simple cases by hand:
prop_idempotent []       

True

prop_idempotent [1,1,1,1]  

True

prop_idempotent [1..100]

True

prop_idempotent [1,5,2,1,2,0,9]

True
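These hand checks can also be gathered into a small program. The sketch below does not use QuickCheck at all; it simply runs a few properties over fixed sample inputs. The properties prop_ordered and prop_sortModel, and the sample list, are additions made for illustration.

```haskell
import Data.List (sort)

qsort :: Ord a => [a] -> [a]
qsort []     = []
qsort (x:xs) = qsort lhs ++ [x] ++ qsort rhs
    where lhs = filter  (< x) xs
          rhs = filter (>= x) xs

prop_idempotent, prop_ordered, prop_sortModel :: [Int] -> Bool
-- Sorting twice is the same as sorting once.
prop_idempotent xs = qsort (qsort xs) == qsort xs
-- Every element is no larger than its successor.
prop_ordered xs = and (zipWith (<=) ys (drop 1 ys))
    where ys = qsort xs
-- Our sort agrees with the library sort.
prop_sortModel xs = qsort xs == sort xs

-- The inputs tried by hand above, plus a reversed list.
samples :: [[Int]]
samples = [[], [1,1,1,1], [1..100], [1,5,2,1,2,0,9], [3,2,1]]

allHold :: Bool
allHold = all holds samples
    where holds xs = prop_idempotent xs && prop_ordered xs && prop_sortModel xs
```

Evaluating allHold confirms all three properties on every sample, but the samples are still chosen by hand.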
Looks good. However, writing out the input data by hand is tedious and violates the moral code of the efficient functional programmer: let the machine do the work! To automate this, the QuickCheck library comes with a set of data generators for all the basic Haskell data types. QuickCheck uses the Arbitrary typeclass to present a uniform interface to (pseudo)random data generation with the type system used to resolve the question of which generator to use. QuickCheck normally hides the data generation plumbing; however, we can also run the generators by hand to get a sense for the distribution of data that QuickCheck produces. For example, to generate a random list of Boolean values:
End of the content preview. The rest of this section is not viewable here.
Testing Case Study: Specifying a Pretty Printer
Content preview
Testing individual functions for their natural properties is one of the basic building blocks that guides development of large systems in Haskell. We’ll look now at a more complicated scenario: taking the pretty-printing library developed in earlier chapters and building a test suite for it.
Recall that the pretty printer is built around the Doc, an algebraic data type that represents well-formed documents:
-- file: ch11/Prettify2.hs



data Doc = Empty

         | Char Char

         | Text String

         | Line

         | Concat Doc Doc

         | Union Doc Doc

         deriving (Show,Eq)
The library itself is implemented as a set of functions that build and transform values of this document type, before finally rendering the finished document to a string.
QuickCheck encourages an approach to testing where the developer specifies invariants that should hold for any data we can throw at the code. To test the pretty-printing library, then, we’ll need a source of input data. To do this, we take advantage of the small combinator suite for building random data that QuickCheck provides via the Arbitrary class. The class provides a function, arbitrary, to generate data of each type. With it, we can define our data generator for our custom data types:
-- file: ch11/Arbitrary.hs

class Arbitrary a where

  arbitrary   :: Gen a
One thing to notice is that the generators run in a Gen environment, indicated by the type. This is a simple state-passing monad that is used to hide the random number generator state that is threaded through the code. We’ll look thoroughly at monads in later chapters, but for now it suffices to know that, as Gen is defined as a monad, we can use do syntax to write new generators that access the implicit random number source. To actually write generators for our custom type, we use any of a set of functions defined in the library for introducing new random values and gluing them together to build up data structures of the type we’re interested in. The types of the key functions are:
End of the content preview. The rest of this section is not viewable here.
Measuring Test Coverage with HPC
Content preview
HPC (Haskell Program Coverage) is an extension to the compiler to observe what parts of the code were actually executed during a given program run. This is useful in the context of testing, as it lets us observe exactly which functions, branches, and expressions were evaluated. The result is precise knowledge about the percent of code tested that’s easy to obtain. HPC comes with a simple utility to generate useful graphs of program coverage, making it easy to zoom in on weak spots in the test suite.
To obtain test coverage data, all we need to do is add the -fhpc flag to the command line when compiling the tests:
$ ghc -fhpc Run.hs --make
Then run the tests as normal:
$ ./Run

                 simple : .....                            (1000)

                complex : ..                               (400)
During the test run, the trace of the program is written to .tix and .mix files in the current directory. Afterwards, these files are used by the command-line tool, hpc, to display various statistics about what happened. The basic interface is textual. To begin, we can get a summary of the code tested during the run using the report flag to hpc. We’ll exclude the test programs themselves (using the --exclude flag), so as to concentrate only on code in the pretty-printer library. Entering the following into the console:
$ hpc report Run --exclude=Main --exclude=QC

  18% expressions used (30/158)

   0% boolean coverage (0/3)

        0% guards (0/3), 3 unevaluated

        100% 'if' conditions (0/0)

        100% qualifiers (0/0)

   23% alternatives used (8/34)

    0% local declarations used (0/4)

   42% top-level declarations used (9/21)
we see that, on the last line, 42% of top-level definitions were evaluated during the test run. Not too bad for a first attempt. As we test more and more functions from the library, this figure will rise. The textual version is useful for a quick summary, but to really see what’s going on, it is best to look at the marked up output. To generate this, use the markup flag instead:
$ hpc markup Run --exclude=Main --exclude=QC
End of the content preview. The rest of this section is not viewable here.
Chapter 12: Barcode Recognition
Content preview
In this chapter, we’ll make use of the image-parsing library we developed in Chapter 10 to build a barcode recognition application. Given a picture of the back of a book taken with a camera phone, we could use this to extract its ISBN.
End of the content preview. The rest of this section is not viewable here.
A Little Bit About Barcodes
Content preview
The vast majority of packaged and mass-produced consumer goods sold have a barcode somewhere on them. Although there are dozens of barcode systems used across a variety of specialized domains, consumer products typically use either UPC-A or EAN-13. UPC-A was developed in the United States, while EAN-13 is European in origin.
EAN-13 was developed after UPC-A and is a superset of UPC-A. (In fact, UPC-A has been officially declared obsolete since 2005, though it’s still widely used within the United States.) Any software or hardware that can understand EAN-13 barcodes will automatically handle UPC-A barcodes. This neatly reduces our descriptive problem to one standard.
As the name suggests, EAN-13 describes a 13-digit sequence, which is broken into four groups:
Number system
The first two digits. This can either indicate the nationality of the manufacturer or describe one of a few other categories, such as ISBN (book identifier) numbers.
Manufacturer ID
The next five digits. These are assigned by a country’s numbering authority.
Product ID
The next five digits. These are assigned by the manufacturer. (Smaller manufacturers may have a longer manufacturer ID and shorter product ID, but they still add up to 10 digits.)
Check digit
The last digit. This allows a scanner to validate the digit string it scans.
The only way in which an EAN-13 barcode differs from a UPC-A barcode is that the latter uses a single digit to represent its number system. EAN-13 barcodes retain compatibility by setting the first number system digit to zero.
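The four groups can be pulled apart with a few calls to splitAt. This is a minimal sketch that assumes the nominal 2/5/5/1 layout described above; the function names are hypothetical, and the tuple result is just a convenient stand-in for a proper data type.

```haskell
import Data.Char (isDigit)

-- | Split a 13-digit string into (number system, manufacturer ID,
-- product ID, check digit). Fails on anything that is not 13 digits.
splitEAN13 :: String -> Maybe (String, String, String, Char)
splitEAN13 ds
    | length ds == 13 && all isDigit ds = Just (system, maker, item, check)
    | otherwise                         = Nothing
  where
    (system, rest1) = splitAt 2 ds
    (maker, rest2)  = splitAt 5 rest1
    item            = take 5 rest2
    check           = ds !! 12   -- only evaluated once the length is known

-- | A 12-digit UPC-A string becomes EAN-13 by prefixing a zero.
upcaToEAN13 :: String -> String
upcaToEAN13 = ('0' :)
```

For example, splitting "9780596514983" gives the groups "97", "80596", and "51498", with check digit '3'.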
Before we worry about decoding an EAN-13 barcode, we need to understand how they are encoded. The system EAN-13 uses is a little involved. We start by computing the check digit, which is the last digit of a string:
-- file: ch12/Barcode.hs

checkDigit :: (Integral a) => [a] -> a

checkDigit ds = (10 - (sum products `mod` 10)) `mod` 10

    where products = mapEveryOther (*3) (reverse ds)



mapEveryOther :: (a -> a) -> [a] -> [a]

mapEveryOther f = zipWith ($) (cycle [f,id])
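As a worked example, take the first 12 digits of this book's own ISBN, 978-0-596-51498-3. Counting from the right, every other digit is tripled, giving a weighted sum of 137, so the check digit is 10 - 7 = 3. The sketch below uses a checkDigit with a final `mod` 10, which maps a weighted sum that is already a multiple of 10 to a check digit of 0 rather than 10; isbnDigits is an illustrative name.

```haskell
checkDigit :: (Integral a) => [a] -> a
checkDigit ds = (10 - sum products `mod` 10) `mod` 10
    where products = mapEveryOther (*3) (reverse ds)

-- | Apply f to the 1st, 3rd, 5th, ... elements and leave the rest alone.
mapEveryOther :: (a -> a) -> [a] -> [a]
mapEveryOther f = zipWith ($) (cycle [f, id])

-- | First 12 digits of 978-0-596-51498-3.
isbnDigits :: [Int]
isbnDigits = [9,7,8,0,5,9,6,5,1,4,9,8]
```

Evaluating checkDigit isbnDigits yields 3, matching the final digit printed on the book.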
End of the content preview. The rest of this section is not viewable here.
Introducing Arrays
Content preview
Before we continue, here are all of the imports that we will be using in the remainder of this chapter:
-- file: ch12/Barcode.hs

import Data.Array (Array(..), (!), bounds, elems, indices,

                   ixmap, listArray)



import Control.Applicative ((<$>))

import Control.Monad (forM_)

import Data.Char (digitToInt)

import Data.Ix (Ix(..))

import Data.List (foldl', group, sort, sortBy, tails)

import Data.Maybe (catMaybes, listToMaybe)

import Data.Ratio (Ratio)

import Data.Word (Word8)

import System.Environment (getArgs)

import qualified Data.ByteString.Lazy.Char8 as L

import qualified Data.Map as M



import Parse                    -- from chapter 10
The barcode encoding process can largely be table-driven, in which we use small tables of bit patterns to decide how to encode each digit. Haskell’s bread-and-butter—data types, lists, and tuples—are not well-suited to use for tables whose elements may be accessed randomly. A list has to be traversed linearly to reach the kth element. A tuple doesn’t have this problem, but Haskell’s type system makes it difficult to write a function that takes a tuple and an element offset and returns the element at that offset within the tuple. (We’ll explore why in the exercises that follow.)
The usual data type for constant-time random access is of course the array. Haskell provides several array data types. We’ll thus represent our encoding tables as arrays of strings.
The simplest array type is in the Data.Array module, which we’re using here. This presents arrays that can contain values of any Haskell type. Like other common Haskell types, these arrays are immutable. An immutable array is populated with values just once, when it is created. Its contents cannot subsequently be modified. (The standard libraries also provide other array types, some of which are mutable, but we won’t cover those for a while.)
-- file: ch12/Barcode.hs
leftOddList = ["0001101", "0011001", "0010011", "0111101", "0100011",
               "0110001", "0101111", "0111011", "0110111", "0001011"]

rightList = map complement <$> leftOddList
    where complement '0' = '1'
          complement '1' = '0'

leftEvenList = map reverse rightList

parityList = ["111111", "110100", "110010", "110001", "101100",
              "100110", "100011", "101010", "101001", "100101"]

listToArray :: [a] -> Array Int a
listToArray xs = listArray (0,l-1) xs
    where l = length xs

leftOddCodes, leftEvenCodes, rightCodes, parityCodes :: Array Int String

leftOddCodes = listToArray leftOddList
leftEvenCodes = listToArray leftEvenList
rightCodes = listToArray rightList
parityCodes = listToArray parityList
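To see what such a table buys us, here is a minimal, self-contained sketch of a constant-time lookup with (!). The names digitCodes and codeFor are hypothetical and not part of Barcode.hs; the table simply repeats the left-odd patterns listed above.

```haskell
import Data.Array (Array, bounds, listArray, (!))

-- Hypothetical table name; the patterns are the left-odd codes from the text.
digitCodes :: Array Int String
digitCodes = listArray (0, 9)
    ["0001101", "0011001", "0010011", "0111101", "0100011",
     "0110001", "0101111", "0111011", "0110111", "0001011"]

-- Look up the bit pattern for a digit. (!) raises an error for an index
-- outside the array's bounds, so we check the bounds first and return
-- Nothing in that case.
codeFor :: Int -> Maybe String
codeFor d
    | d >= lo && d <= hi = Just (digitCodes ! d)
    | otherwise          = Nothing
  where (lo, hi) = bounds digitCodes
```

Wrapping the lookup in Maybe is a design choice of this sketch only; the chapter's own code indexes the tables directly.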
End of preview. The rest of this section is not available here.
Encoding an EAN-13 Barcode
Even though our goal is to decode a barcode, it’s useful to have an encoder for reference. This will allow us to, for example, ensure that our code is correct by checking that decoding the output of the encoder gives back the original input:
-- file: ch12/Barcode.hs
encodeEAN13 :: String -> String
encodeEAN13 = concat . encodeDigits . map digitToInt

-- | This function computes the check digit; don't pass one in.
encodeDigits :: [Int] -> [String]
encodeDigits s@(first:rest) =
    outerGuard : lefties ++ centerGuard : righties ++ [outerGuard]
  where -- The left group holds six digits; the remaining five digits
        -- plus the check digit form the right group.
        (left, right) = splitAt 6 rest
        lefties = zipWith leftEncode (parityCodes ! first) left
        righties = map rightEncode (right ++ [checkDigit s])

leftEncode :: Char -> Int -> String
leftEncode '1' = (leftOddCodes !)
leftEncode '0' = (leftEvenCodes !)

rightEncode :: Int -> String
rightEncode = (rightCodes !)

outerGuard = "101"
centerGuard = "01010"
The string to encode is 12 digits long, with encodeDigits adding a 13th check digit.
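The checkDigit function that encodeDigits calls is not shown in this excerpt. As a rough sketch, assuming it follows the standard EAN-13 rule (weights alternating 1 and 3 from the left over the 12 data digits), the computation could look like this. The name ean13CheckDigit is hypothetical.

```haskell
-- A sketch of the standard EAN-13 check digit rule. This is not the
-- book's own checkDigit, which the excerpt does not include.
ean13CheckDigit :: [Int] -> Int
ean13CheckDigit ds = (10 - total `mod` 10) `mod` 10
  where -- Weight the digits 1, 3, 1, 3, ... from left to right.
        total = sum (zipWith (*) (cycle [1, 3]) ds)
```

The outer `mod` 10 keeps the result in the range 0 to 9 when the weighted sum is already a multiple of ten. For the digits 978013211467 the weighted sum is 93, so the sketch yields 7, which matches the final digit of the barcode 9780132114677 discussed later in this chapter.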
The barcode is encoded as two groups of six digits, with a guard sequence in the middle and "outside" sequences on either side. But if we have two groups of six digits, what happened to the missing digit?
Each digit in the left group is encoded using either odd or even parity, with the parity chosen based on the bits of the first digit in the string. If a bit of the first digit is zero, the corresponding digit in the left group is encoded with even parity. A one bit causes the digit to be encoded with odd parity. This encoding is an elegant hack, chosen to make EAN-13 barcodes backwards-compatible with the older UPC-A standard.
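The parity selection just described can be sketched on its own. The Parity type and the names parityTable and leftParities below are hypothetical helpers for illustration; the table contents are the parity patterns listed earlier in the chapter.

```haskell
import Data.Array (Array, listArray, (!))

-- Hypothetical helper type for this sketch only.
data Parity = Odd | Even
    deriving (Eq, Show)

-- One six-character pattern per possible first digit.
parityTable :: Array Int String
parityTable = listArray (0, 9)
    ["111111", "110100", "110010", "110001", "101100",
     "100110", "100011", "101010", "101001", "100101"]

-- For a given first digit, list the parity used for each of the six
-- digits in the left group: a '1' selects odd parity, a '0' even.
leftParities :: Int -> [Parity]
leftParities first = map toParity (parityTable ! first)
  where toParity '1' = Odd
        toParity _   = Even
```

With a first digit of 0, every digit in the left group uses odd parity, which is the case that corresponds to a UPC-A barcode.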
End of preview. The rest of this section is not available here.
Constraints on Our Decoder
Before we talk about decoding, let’s set a few practical limitations on what kinds of barcode images we can work with.
Phone cameras and webcams generally output JPEG images, but writing a JPEG decoder would take us several chapters. We’ll simplify our parsing problem by handling only the netpbm file format, using the parsing combinators we developed in an earlier chapter.
We’d like to deal with real images from the kinds of cheap, fixed-focus cameras that come with low-end cell phones. These images tend to be out of focus, noisy, low in contrast, and of poor resolution. Fortunately, it’s not hard to write code that can handle noisy, defocused VGA-resolution (640 × 480) images with terrible contrast ratios. We’ve verified that the code in this chapter captures barcodes from real books, using pictures taken by authentically mediocre cameras.
We will avoid any image-processing heroics, because that’s another chapter-consuming subject. We won’t correct perspective (as in the first figure below). Neither will we sharpen images taken from too near to the subject (second figure), which causes narrow bars to fade out; or from too far (third figure), which causes adjacent bars to blur together.
Figure: Barcode image distorted by perspective, due to photo being taken from an angle
Figure: Barcode image blurred by being taken from inside the focal length of the camera lens, causing bars to run together
Figure: Barcode image contains insufficient detail, due to poor resolution of camera lens and CCD
End of preview. The rest of this section is not available here.
Divide and Conquer
Our task is to take a camera image and extract a valid barcode from it. Given such a nonspecific description, it can be hard to see how to make progress. However, we can break the big problem into a series of subproblems, each of which is self-contained and more tractable:
  • Convert color data into a form we can easily work with.
  • Sample a single scan line from the image and extract a set of guesses as to what the encoded digits in this line could be.
  • From the guesses, create a list of valid decodings.
Many of these subproblems can be further divided, as we’ll see.
You might wonder how closely this approach of subdivision mirrors the actual work we did when writing the code that we present in this chapter. The answer is that we’re far from image-processing gurus, and when we started writing this chapter, we didn’t know exactly what our solution was going to look like.
We made some early educated guesses as to what a reasonable solution might look like and came up with the subtasks just listed. We were then able to start tackling those parts that we knew how to solve, using our spare time to think about the bits that we had no prior experience with. We certainly didn’t have a preexisting algorithm or master plan in mind.
Dividing the problem up like this helped us in two ways. By making progress on familiar ground, we had the psychological advantage of starting to solve the problem, even when we didn’t really know where we were going. And as we started to work on a particular subproblem, we found ourselves able to further subdivide it into tasks of varying familiarity. We continued to focus on easier components, deferring ones we hadn’t thought about in enough detail yet, and jumping from one element of the master list to another. Eventually, we ran out of problems that were both unfamiliar and unsolved, and we had a complete idea of our eventual solution.
End of preview. The rest of this section is not available here.
Turning a Color Image into Something Tractable
Since we want to work with barcodes (which are sequences of black and white stripes) and we want to write a simple decoder, an easy representation to work with will be a monochrome image, in which each pixel is either black or white.
As we mentioned earlier, we’ll work with netpbm images. The netpbm color image format is only slightly more complicated than the grayscale image format that we parsed in an earlier chapter. The identifying string in a header is “P6,” with the rest of the header layout identical to the grayscale format. In the body of an image, each pixel is represented as three bytes, one each for red, green, and blue.
We’ll represent the image data as a two-dimensional array of pixels. We’re using arrays here purely to gain experience with them. For this application, we could just as well use a list of lists. The only advantage of an array is slight—we can efficiently extract a row:
-- file: ch12/Barcode.hs
type Pixel = Word8
type RGB = (Pixel, Pixel, Pixel)

type Pixmap = Array (Int,Int) RGB
We provide a few type synonyms to make our type signatures more readable.
Since Haskell gives us considerable freedom in how we lay out an array, we must choose a representation. We’ll play it safe and follow a popular convention: indices begin at zero. We don’t need to store the dimensions of the image explicitly, since we can extract them using the bounds function.
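As a small illustration of recovering an array's size from bounds, here is a sketch under the assumption of a two-dimensional array with Int indices. The names extents and tiny are hypothetical.

```haskell
import Data.Array (Array, bounds, listArray)

-- Hypothetical helper: compute the number of positions along each
-- dimension from the array's lower and upper bounds.
extents :: Array (Int, Int) e -> (Int, Int)
extents a = (hi1 - lo1 + 1, hi2 - lo2 + 1)
  where ((lo1, lo2), (hi1, hi2)) = bounds a

-- A small zero-based array to try it on: three positions along the
-- first dimension, two along the second.
tiny :: Array (Int, Int) Int
tiny = listArray ((0, 0), (2, 1)) [1 ..]
```

Because the extents are computed from both bounds, the helper does not depend on indices starting at zero.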
The actual parser is mercifully short, thanks to the combinators we developed earlier:
-- file: ch12/Barcode.hs
parseRawPPM :: Parse Pixmap
parseRawPPM =
    parseWhileWith w2c (/= '\n') ==> \header -> skipSpaces ==>&
    assert (header == "P6") "invalid raw header" ==>&
    parseNat ==> \width -> skipSpaces ==>&
    parseNat ==> \height -> skipSpaces ==>&
    parseNat ==> \maxValue ->
    assert (maxValue == 255) "max value out of spec" ==>&
    parseByte ==>&
    parseTimes (width * height) parseRGB ==> \pxs ->
    identity (listArray ((0,0),(width-1,height-1)) pxs)

parseRGB :: Parse RGB
parseRGB = parseByte ==> \r ->
           parseByte ==> \g ->
           parseByte ==> \b ->
           identity (r,g,b)

parseTimes :: Int -> Parse a -> Parse [a]
parseTimes 0 _ = identity []
parseTimes n p = p ==> \x -> (x:) <$> parseTimes (n-1) p
End of preview. The rest of this section is not available here.
What Have We Done to Our Image?
Let’s step back for a moment and consider what we did to our image when we converted it from color to monochrome. The first figure below shows an image captured from a VGA-resolution camera. All we’ve done is crop it down to the barcode.
Figure: Barcode photo, somewhat blurry and dim
The encoded digit string, 9780132114677, is printed below the barcode. The left group encodes the digits 780132, with 9 encoded in their parity. The right group encodes the digits 114677, where the final 7 is the check digit. The second figure shows a clean encoding of this barcode, from one of the many websites that offer barcode image generation for free.
Figure: Automatically generated image of the same barcode
In the third figure, we’ve chosen a row from the captured image and stretched it out vertically to make it easier to see. We’ve superimposed this on top of the perfect image and stretched it out so that the two are aligned.
Figure: Photographic and generated images of barcode juxtaposed to illustrate the variation in bar brightness and resolution
The luminance-converted row from the photo is in the dark gray band. It is low in contrast and poor in quality, with plenty of blurring and noise. The paler band is the same row with the contrast adjusted.
Somewhat below these two bands is another: this shows the effect of thresholding the luminance-converted row. Notice that some bars have gotten thicker, others thinner, and many bars have moved a little to the left or right.
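The luminance conversion and thresholding steps themselves are not shown in this excerpt. The following is only a sketch of the general idea, working on a plain list instead of an array. The weights 0.30, 0.59, and 0.11 are a commonly used approximation that we assume here, and the names Bit, luminance, and thresholdRow are our own choices for illustration.

```haskell
import Data.Word (Word8)

type RGB = (Word8, Word8, Word8)

-- Zero stands for a dark pixel, One for a light pixel.
data Bit = Zero | One
    deriving (Eq, Show)

-- Collapse a color pixel to a single brightness value
-- (assumed weights; see the text above).
luminance :: RGB -> Word8
luminance (r, g, b) = round (0.30 * f r + 0.59 * f g + 0.11 * f b)
  where f :: Word8 -> Double
        f = fromIntegral

-- Classify each pixel as dark or light relative to a pivot placed a
-- fraction n of the way from the darkest to the brightest value.
thresholdRow :: Double -> [Word8] -> [Bit]
thresholdRow n xs = map classify xs
  where classify v | fromIntegral v < pivot = Zero
                   | otherwise              = One
        pivot = lo + (hi - lo) * n
        lo    = fromIntegral (minimum xs) :: Double
        hi    = fromIntegral (maximum xs) :: Double
```

Placing the pivot relative to the darkest and brightest values in the row, rather than at a fixed brightness, is what lets a low-contrast row still split into two classes.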
Clearly, any attempt to find exact matches in an image with problems such as these is not going to succeed very often. We must write code that’s robust in the face of bars that are too thick, too thin, or not exactly where they’re supposed to be. The widths of our bars will depend on how far our book was from the camera, so we can’t make any assumptions about widths, either.
End of preview. The rest of this section is not available here.
Finding Matching Digits
Our first problem is to find the digits that might be encoded at a given position. For the next while, we’ll make a couple of simplifying assumptions. The first is that we’re working with a single row. The second is that we know exactly where in a row the left edge of a barcode begins.
How can we overcome the problem of not even knowing how thick our bars are? The answer is to run length encode (instead of repeating a value some number of times, run length encoding presents it once, with a count of the number of consecutive repeats):
-- file: ch12/Barcode.hs
type Run = Int
type RunLength a = [(Run, a)]

runLength :: Eq a => [a] -> RunLength a
runLength = map rle . group
    where rle xs = (length xs, head xs)
The group function takes sequences of identical elements in a list and groups them into sublists:
ghci> group [1,1,2,3,3,3,3]
[[1,1],[2],[3,3,3,3]]
Our runLength function represents each group as a pair of its length and first element:
ghci> let bits = [0,0,1,1,0,0,1,1,0,0,0,0,0,0,1,1,1,1,0,0,0,0]
ghci> runLength bits
Loading package array-0.1.0.0 ... linking ... done.
Loading package containers-0.1.0.2 ... linking ... done.
Loading package bytestring-0.9.0.1.1 ... linking ... done.
[(2,0),(2,1),(2,0),(2,1),(6,0),(4,1),(4,0)]
Since the data we’re run length encoding are just ones and zeros, the encoded numbers will simply alternate between one and zero. We can throw the encoded values away without losing any useful information, keeping only the length of each run:
-- file: ch12/Barcode.hs
runLengths :: Eq a => [a] -> [Run]
runLengths = map fst . runLength
ghci> runLengths bits
[2,2,2,2,6,4,4]
The bit patterns aren’t random; they’re the left outer guard and first encoded digit of a row from our captured image. If we drop the three guard bars, we’re left with the run lengths [2,6,4,4]. How do we find matches for these in the encoding tables we wrote in “Introducing Arrays”?
One possible approach is to scale the run lengths so that they sum to one. We’ll use the Ratio Int type instead of the usual Double to manage these scaled values, as
End of preview. The rest of this section is not available here.
Life Without Arrays or Hash Tables
In an imperative language, the array is as much a "bread and butter" type as a list or tuple in Haskell. We take it for granted that an array in an imperative language is usually mutable; we can change an element of an array whenever it suits us.
As we mentioned earlier, Haskell arrays are not mutable. This means that to "modify" a single array element, a copy of the entire array is made, with that single element set to its new value. Clearly, this approach is not a winner for performance.
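A short sketch makes the copying behavior concrete. It uses the (//) operator from Data.Array, which builds a new array with the given updates applied; the names original and updated are ours.

```haskell
import Data.Array (Array, elems, listArray, (//))

original :: Array Int Char
original = listArray (0, 4) "hello"

-- (//) returns a new array with the listed (index, value) updates
-- applied. The array named 'original' is left untouched.
updated :: Array Int Char
updated = original // [(0, 'j')]
```

After this, elems original is still "hello", while elems updated is "jello".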
The mutable array is a building block for another ubiquitous imperative data structure, the hash table. In the typical implementation, an array acts as the "spine" of the table, with each element containing a list of elements. To add an element to a hash table, we hash the element to find the array offset and modify the list at that offset to add the element to it.
If arrays aren’t mutable, then to update a hash table, we must create a new one. We copy the array, putting a new list at the offset indicated by the element’s hash. We don’t need to copy the lists at other offsets, but we’ve already dealt performance a fatal blow simply by having to copy the spine.
At a single stroke, then, immutable arrays have eliminated two canonical imperative data structures from our toolbox. Arrays are somewhat less useful in pure Haskell code than in many other languages. Still, many array codes update an array only during a build phase, and subsequently use it in a read-only manner.
This is not the calamitous situation that it might seem, though. Arrays and hash tables are often used as collections indexed by a key, and in Haskell we use trees for this purpose.
Implementing a naive tree type is particularly easy in Haskell. Beyond that, more useful tree types are also unusually easy to implement. Self-balancing structures, such as red-black trees, have struck fear into generations of undergraduate computer science students, because the balancing algorithms are notoriously hard to get right.
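To give a sense of how little code a naive tree needs, here is a minimal sketch of an unbalanced binary search tree. It is an illustration only, not a type from the standard libraries, and the names are our own.

```haskell
-- An unbalanced binary search tree.
data Tree a = Leaf
            | Node (Tree a) a (Tree a)
    deriving (Show)

-- Insert a value, returning a new tree. Only the nodes on the path
-- from the root to the insertion point are rebuilt; every other
-- subtree is shared with the old tree.
insert :: Ord a => a -> Tree a -> Tree a
insert x Leaf = Node Leaf x Leaf
insert x t@(Node l v r) = case compare x v of
    LT -> Node (insert x l) v r
    GT -> Node l v (insert x r)
    EQ -> t

-- Test whether a value is present.
member :: Ord a => a -> Tree a -> Bool
member _ Leaf = False
member x (Node l v r) = case compare x v of
    LT -> member x l
    GT -> member x r
    EQ -> True

-- List the elements in ascending order.
toList :: Tree a -> [a]
toList Leaf         = []
toList (Node l v r) = toList l ++ [v] ++ toList r
```

Because an insert rebuilds only one path, an update does not require copying the whole structure, which is the property that a flat immutable array lacks.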
End of preview. The rest of this section is not available here.
Turning Digit Soup into an Answer
We’ve got yet another problem to solve. We have many candidates for the last 12 digits of the barcode. In addition, we need to use the parities of the first six digits to figure out what the first digit is. Finally, we need to ensure that our answer’s check digit makes sense.
This seems quite challenging! We have a lot of uncertain data; what should we do? It’s reasonable to ask if we could perform a brute-force search. Given the candidates we saw in the preceding ghci session, how many combinations would we have to examine?
ghci> product . map length . candidateDigits $ input
34012224
So much for that idea. Once again, we’ll initially focus on a subproblem that we know how to solve and postpone worrying about the rest.
Let’s abandon the idea of searching for now, and focus on computing a check digit. The check digit for a barcode can assume 1 of 10 possible values. For a given parity digit, which input sequences can cause that digit to be computed?
-- file: ch12/Barcode.hs
type Map a = M.Map Digit [a]
In this map, the key is a check digit, and the value is a sequence that evaluates to this check digit. We have two further map types based on this definition:
-- file: ch12/Barcode.hs
type DigitMap = Map Digit
type ParityMap = Map (Parity Digit)
We’ll generically refer to these as solution maps, because they show us the digit sequence that "solves for" each check digit.
Given a single digit, here’s how we can update an existing solution map:
-- file: ch12/Barcode.hs
updateMap :: Parity Digit       -- ^ new digit
          -> Digit              -- ^ existing key
          -> [Parity Digit]     -- ^ existing digit sequence
          -> ParityMap          -- ^ map to update
          -> ParityMap
updateMap digit key seq = insertMap key (fromParity digit) (digit:seq)

insertMap :: Digit -> Digit -> [a] -> Map a -> Map a
insertMap key digit val m = val `seq` M.insert key' val m
    where key' = (key + digit) `mod` 10
With an existing check digit drawn from the map, the sequence that solves for it, and a new input digit, this function updates the map with the new sequence that leads to the new check digit.
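To make the update step easier to follow in isolation, here is a simplified, self-contained sketch. It assumes plain Int digits, leaves out the Parity wrapper, and ignores any digit weighting, so it is not the chapter's actual code; SolutionMap and addDigit are hypothetical names.

```haskell
import qualified Data.Map as M

type Digit = Int

-- Key: a running check value. Value: a digit sequence that produces it.
type SolutionMap = M.Map Digit [Digit]

-- Extend every sequence in the map with one new digit, filing the
-- longer sequence under the check value it now produces.
addDigit :: Digit -> SolutionMap -> SolutionMap
addDigit d = M.foldrWithKey step M.empty
  where step key sq = M.insert ((key + d) `mod` 10) (d : sq)
```

For example, adding the digit 7 to a map holding the empty sequence under key 0 and the sequence [5] under key 5 gives a map with [7] under key 7 and [7,5] under key 2.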
End of preview. The rest of this section is not available here.
Working with Row Data
We’ve mentioned repeatedly that we are taking a single row from our image. Here’s how:
-- file: ch12/Barcode.hs
withRow :: Int -> Pixmap -> (RunLength Bit -> a) -> a
withRow n greymap f = f . runLength . elems $ posterized
    where posterized = threshold 0.4 . fmap luminance . row n $ greymap
The withRow function takes a row, converts it to monochrome, and then calls another function on the run length encoded row data. To get the row data, it calls row:
-- file: ch12/Barcode.hs
row :: (Ix a, Ix b) => b -> Array (a,b) c -> Array a c
row j a = ixmap (l,u) project a
    where project i = (i,j)
          ((l,_), (u,_)) = bounds a
This function takes a bit of explaining. Whereas fmap transforms the values in an array, ixmap transforms the indices of an array. It’s a very powerful function that lets us "slice" an array however we please.
The first argument to ixmap is the bounds of the new array. These bounds can be of a different dimension than the source array. In row, for example, we’re extracting a one-dimensional array from a two-dimensional array.
The second argument is a projection function. This takes an index from the new array and returns an index into the source array. The value at that projected index then becomes the value in the new array at the original index. For example, if we pass 2 into the projection function and it returns (2,2), the element at index 2 of the new array will be taken from element (2,2) of the source array.
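The following self-contained sketch shows ixmap slicing a small two-dimensional array along either dimension. The names grid, fixSecond, and fixFirst are hypothetical, and the indices are fixed to Int to keep the types simple.

```haskell
import Data.Array (Array, bounds, elems, ixmap, listArray)

-- listArray fills the last index fastest, so the elements land at
-- (0,0)='a', (0,1)='b', (0,2)='c', (1,0)='d', (1,1)='e', (1,2)='f'.
grid :: Array (Int, Int) Char
grid = listArray ((0, 0), (1, 2)) "abcdef"

-- Hold the second index at j and let the first index vary.
fixSecond :: Int -> Array (Int, Int) e -> Array Int e
fixSecond j a = ixmap (l, u) (\i -> (i, j)) a
  where ((l, _), (u, _)) = bounds a

-- Hold the first index at i and let the second index vary.
fixFirst :: Int -> Array (Int, Int) e -> Array Int e
fixFirst i a = ixmap (l, u) (\j -> (i, j)) a
  where ((_, l), (_, u)) = bounds a
```

Here elems (fixSecond 1 grid) is "be" and elems (fixFirst 0 grid) is "abc". The only difference between the two slices is which component of the index the projection function fills in.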
End of preview. The rest of this section is not available here.
Pulling It All Together
Our candidateDigits function gives an empty result unless we call it at the beginning of a barcode sequence. We can easily scan across a row until we get a match as follows:
-- file: ch12/Barcode.hs
findMatch :: [(Run, Bit)] -> Maybe [[Digit]]
findMatch = listToMaybe
          . filter (not . null)
          . map (solve . candidateDigits)
          . tails
Here, we’re taking advantage of lazy evaluation. The call to map over tails will only be evaluated until it results in a nonempty list.
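The same pipeline shape can be tried out without any barcode machinery. In this sketch, firstMatch and probe are hypothetical stand-ins: probe plays the role of the expensive matching step and simply checks whether its input starts with "ab".

```haskell
import Data.List (tails)
import Data.Maybe (listToMaybe)

-- Try 'attempt' on each suffix of the input and return the first
-- nonempty result, if there is one.
firstMatch :: ([a] -> [b]) -> [a] -> Maybe [b]
firstMatch attempt = listToMaybe
                   . filter (not . null)
                   . map attempt
                   . tails

-- A stand-in for the real matcher: succeed only at the start of "ab".
probe :: String -> String
probe s | take 2 s == "ab" = "hit"
        | otherwise        = ""
```

Because evaluation stops at the first nonempty result, firstMatch probe (cycle "xxab") returns Just "hit" even though the input list is infinite.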
Next, we choose a row from an image and try to find a barcode in it:
-- file: ch12/Barcode.hs
findEAN13 :: Pixmap -> Maybe [Digit]
findEAN13 pixmap = withRow center pixmap (fmap head . findMatch)
  where (_, (maxX, _)) = bounds pixmap
        center = (maxX + 1) `div` 2
Finally, here’s a very simple wrapper that prints barcodes from whatever netpbm image files we pass into our program on the command line:
-- file: ch12/Barcode.hs
main :: IO ()
main = do
  args <- getArgs
  forM_ args $ \arg -> do
    e <- parse parseRawPPM <$> L.readFile arg
    case e of
      Left err ->     print $ "error: " ++ err
      Right pixmap -> print $ findEAN13 pixmap
Notice that, of the more than 30 functions we’ve defined in this chapter, main is the only one that lives in IO.
End of preview. The rest of this section is not available here.
A Few Comments on Development Style
You may have noticed that many of the functions we presented in this chapter were short functions at the top level of the source file. This is no accident. As we mentioned earlier, when we started writing this chapter, we didn’t know what form our solution was going to take.
Quite often, then, we had to explore a problem space in order to figure out where we were going. To do this, we spent a lot of time fiddling about in ghci, performing tiny experiments on individual functions. This kind of exploration requires that a function be declared at the top level of a source file; otherwise, ghci won’t be able to see it.
Once we were satisfied that individual functions were behaving themselves, we started to glue them together, again investigating the consequences in ghci. This is where our devotion to writing type signatures paid back, as we immediately discovered when a particular composition of functions couldn’t possibly work.
At the end of this process, we were left with a large number of very small top-level functions, each with a type signature. This isn’t the most compact representation possible; we could have hoisted many of those functions into let or where blocks when we were done with them. However, we find that the added vertical space, small function bodies, and type signatures make the code far more readable, so we generally avoided "golfing" functions after we wrote them.
Working in a language with strong, static typing does not at all interfere with incrementally and fluidly developing a solution to a problem. We find the turnaround between writing a function and getting useful feedback from ghci to be very rapid; it greatly assists us in writing good code quickly.
End of preview. The rest of this section is not available here.
Chapter 13: Data Structures
Association Lists
Often, we have to deal with data that is unordered but is indexed by a key. For instance, a Unix administrator might have a list of numeric UIDs (user IDs) and the textual usernames that they correspond to. The value of this list lies in being able to look up a textual username for a given UID, not in the order of the data. In other words, the UID is a key into a database.
In Haskell, there are several ways to handle data that is structured in this way. The two most common are association lists and the Map type provided by the Data.Map module. Association lists are handy because they are simple. They are standard Haskell lists, so all the familiar list functions work with them. However, for large data sets, Map will have a considerable performance advantage over association lists. We’ll use both in this chapter.
An association list is just a normal list containing (key, value) tuples. The type of a list of mappings from UID to username might be [(Integer, String)]. We could use just about any type for both the key and the value.
We can build association lists just like we do any other list. Haskell comes with one built-in function called Data.List.lookup to look up data in an association list. Its type is Eq a => a -> [(a, b)] -> Maybe b. Can you guess how it works from that type? Let’s take a look in ghci:
ghci> let al = [(1, "one"), (2, "two"), (3, "three"), (4, "four")]
ghci> lookup 1 al
Just "one"
ghci> lookup 5 al
Nothing
The lookup function is really simple. Here’s one way we could write it:
-- file: ch13/lookup.hs
myLookup :: Eq a => a -> [(a, b)] -> Maybe b
myLookup _ [] = Nothing
myLookup key ((thiskey,thisval):rest) =
    if key == thiskey
       then Just thisval
       else myLookup key rest
This function returns Nothing if passed the empty list. Otherwise, it compares the key with the key we’re looking for. If a match is found, the corresponding value is returned; otherwise, it searches the rest of the list.
Let’s take a look at a more complex example of association lists. On Unix/Linux machines, there is a file called
End of preview. The rest of this section is not available here.
Maps
The Data.Map module provides a Map type with behavior that is similar to association lists but has much better performance.
Maps give us the same capabilities as hash tables do in other languages. Internally, a map is implemented as a balanced binary tree. Compared to a hash table, this is a much more efficient representation in a language with immutable data. This is the most visible example of how deeply pure functional programming affects how we write code: we choose data structures and algorithms that we can express cleanly and that perform efficiently, but our choices for specific tasks are often different from their counterparts in imperative languages.
Some functions in the Data.Map module have the same names as those in the Prelude. Therefore, we will import it with import qualified Data.Map as Map and use Map.name to refer to names in that module. Let’s start our tour of Data.Map by taking a look at some ways to build a map:
-- file: ch13/buildmap.hs
import qualified Data.Map as Map

-- Functions to generate a Map that represents an association list
-- as a map

al = [(1, "one"), (2, "two"), (3, "three"), (4, "four")]

{- | Create a map representation of 'al' by converting the association
-  list using Map.fromList -}
mapFromAL =
    Map.fromList al

{- | Create a map representation of 'al' by doing a fold -}
mapFold =
    foldl (\map (k, v) -> Map.insert k v map) Map.empty al

{- | Manually create a map with the elements of 'al' in it -}
mapManual =
    Map.insert 2 "two" . 
    Map.insert 4 "four" .
    Map.insert 1 "one" .
    Map.insert 3 "three" $ Map.empty
Functions such as Map.insert work in the usual Haskell way: they return a copy of the input data, with the requested change applied. This is quite handy with maps. It means that you can use foldl to build up a map as in the mapFold example. Or, you can chain together calls to Map.insert as in the mapManual example. Let’s use ghci to verify that all of these maps are as expected:
ghci> :l buildmap.hs
[1 of 1] Compiling Main             ( buildmap.hs, interpreted )
Ok, modules loaded: Main.
ghci> al
Loading package array-0.1.0.0 ... linking ... done.
Loading package containers-0.1.0.2 ... linking ... done.
[(1,"one"),(2,"two"),(3,"three"),(4,"four")]
ghci> mapFromAL
fromList [(1,"one"),(2,"two"),(3,"three"),(4,"four")]
ghci> mapFold
fromList [(1,"one"),(2,"two"),(3,"three"),(4,"four")]
ghci> mapManual
fromList [(1,"one"),(2,"two"),(3,"three"),(4,"four")]
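Once a map is built, querying it looks much like querying an association list. The sketch below uses Map.lookup, Map.findWithDefault, and Map.size from Data.Map; the names numbers and withFive are ours. It also shows that Map.insert leaves the original map unchanged.

```haskell
import qualified Data.Map as Map

numbers :: Map.Map Int String
numbers = Map.fromList [(1, "one"), (2, "two"), (3, "three"), (4, "four")]

-- Map.insert returns a new map. The map named 'numbers' still has
-- exactly four entries afterward.
withFive :: Map.Map Int String
withFive = Map.insert 5 "five" numbers

-- Look up a key, falling back to a placeholder when it is absent.
nameOf :: Int -> String
nameOf k = Map.findWithDefault "unknown" k numbers
```

Here Map.lookup 5 numbers is Nothing, while Map.lookup 5 withFive is Just "five".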
End of preview. The rest of this section is not available here.
Functions Are Data, Too
Part of Haskell’s power is the ease with which it lets us create and manipulate functions. Let’s take a look at a record that stores a function as one of its fields:
-- file: ch13/funcrecs.hs
{- | Our usual CustomColor type to play with -}
data CustomColor =
  CustomColor {red :: Int,
               green :: Int,
               blue :: Int}
  deriving (Eq, Show, Read)

{- | A new type that stores a name and a function.

The function takes an Int, applies some computation to it, and returns
an Int along with a CustomColor -}
data FuncRec =
    FuncRec {name :: String,
             colorCalc :: Int -> (CustomColor, Int)}

plus5func color x = (color, x + 5)

purple = CustomColor 255 0 255

plus5 = FuncRec {name = "plus5", colorCalc = plus5func purple}
always0 = FuncRec {name = "always0", colorCalc = \_ -> (purple, 0)}
Notice the type of the colorCalc field: it’s a function. It takes an Int and returns a tuple of (CustomColor, Int). We create two FuncRec records: plus5 and always0. Notice that the colorCalc for both of them will always return the color purple. FuncRec itself has no field to store the color in, yet that value somehow becomes part of the function itself. This is called a closure. Let’s play with this a bit:
ghci> :l funcrecs.hs
[1 of 1] Compiling Main             ( funcrecs.hs, interpreted )
Ok, modules loaded: Main.
ghci> :t plus5
plus5 :: FuncRec
ghci> name plus5
"plus5"
ghci> :t colorCalc plus5
colorCalc plus5 :: Int -> (CustomColor, Int)
ghci> (colorCalc plus5) 7
(CustomColor {red = 255, green = 0, blue = 255},12)
ghci> :t colorCalc always0
colorCalc always0 :: Int -> (CustomColor, Int)
ghci> (colorCalc always0) 7
(CustomColor {red = 255, green = 0, blue = 255},0)
That worked well enough, but you might wonder how to do something more advanced, such as making a piece of data available in multiple places. A type construction function can be helpful. Here’s an example:
-- file: ch13/funcrecs2.hs
data FuncRec =
    FuncRec {name :: String,
             calc :: Int -> Int,
             namedCalc :: Int -> (String, Int)}

mkFuncRec :: String -> (Int -> Int) -> FuncRec
mkFuncRec name calcfunc =
    FuncRec {name = name,
             calc = calcfunc,
             namedCalc = \x -> (name, calcfunc x)}

plus5 = mkFuncRec "plus5" (+ 5)
always0 = mkFuncRec "always0" (\_ -> 0)
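To see what the construction function buys us, the sketch below repeats the definitions above so that it stands alone, then adds one hypothetical value, renamed, built with record update syntax. The point it illustrates is that namedCalc keeps the name it captured when mkFuncRec ran, even if the name field is changed later.

```haskell
data FuncRec =
    FuncRec {name :: String,
             calc :: Int -> Int,
             namedCalc :: Int -> (String, Int)}

mkFuncRec :: String -> (Int -> Int) -> FuncRec
mkFuncRec name calcfunc =
    FuncRec {name = name,
             calc = calcfunc,
             namedCalc = \x -> (name, calcfunc x)}

plus5 :: FuncRec
plus5 = mkFuncRec "plus5" (+ 5)

-- Hypothetical example: change only the name field. The closure stored
-- in namedCalc was built from the original name and is not affected.
renamed :: FuncRec
renamed = plus5 {name = "PLUS5A"}
```

Evaluating namedCalc plus5 7 gives ("plus5",12). For renamed, the name field is "PLUS5A", but namedCalc renamed 7 still gives ("plus5",12), because the closure holds its own copy of the name.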
End of preview. The rest of this section is not available here.
Extended Example: /etc/passwd
In order to illustrate the usage of a number of different data structures together, we’ve prepared an extended example. This example parses and stores entries from files in the format of a typical /etc/passwd file:
-- file: ch13/passwdmap.hs

import Data.List

import qualified Data.Map as Map

import System.IO

import Text.Printf(printf)

import System.Environment(getArgs)

import System.Exit

import Control.Monad(when)



{- | The primary piece of data this program will store.

   It represents the fields in a POSIX /etc/passwd file -}

data PasswdEntry = PasswdEntry {

    userName :: String,

    password :: String,

    uid :: Integer,

    gid :: Integer,

    gecos :: String,

    homeDir :: String,

    shell :: String}

    deriving (Eq, Ord)



{- | Define how we render a 'PasswdEntry' as a String. -}

instance Show PasswdEntry where

    show pe = printf "%s:%s:%d:%d:%s:%s:%s" 

                (userName pe) (password pe) (uid pe) (gid pe)

                (gecos pe) (homeDir pe) (shell pe)



{- | Parse a String back into a 'PasswdEntry'. -}

instance Read PasswdEntry where

    readsPrec _ value =

        case split ':' value of

             [f1, f2, f3, f4, f5, f6, f7] ->

                 -- Generate a 'PasswdEntry' the shorthand way:

                 -- using the positional fields.  We use 'read' to convert

                 -- the numeric fields to Integers.

                 [(PasswdEntry f1 f2 (read f3) (read f4) f5 f6 f7, [])]

             x -> error $ "Invalid number of fields in input: " ++ show x

        where 

        {- | Takes a delimiter and a list.  Break up the list based on the

        -  delimiter. -}

        split :: Eq a => a -> [a] -> [[a]]



        -- If the input is empty, the result is a list of empty lists.

        split _ [] = [[]]

        split delim str =

            let -- Find the part of the list before delim and put it in

                -- "before".  The rest of the list, including the leading 

                -- delim, goes in "remainder".

                (before, remainder) = span (/= delim) str

                in

                before : case remainder of

                              [] -> []

                              x -> -- If there is more data to process,

                                   -- call split recursively to process it

                                   split delim (tail x)



-- Convenience aliases; we'll have two maps: one from UID to entries

-- and the other from username to entries

type UIDMap = Map.Map Integer PasswdEntry

type UserMap = Map.Map String PasswdEntry



{- | Converts input data to maps.  Returns UID and User maps. -}

inputToMaps :: String -> (UIDMap, UserMap)

inputToMaps inp =

    (uidmap, usermap)

    where

    -- fromList converts a [(key, value)] list into a Map

    uidmap = Map.fromList . map (\pe -> (uid pe, pe)) $ entries

    usermap = Map.fromList . 

              map (\pe -> (userName pe, pe)) $ entries

    -- Convert the input String to [PasswdEntry]

    entries = map read (lines inp)



main = do

    -- Load the command-line arguments

    args <- getArgs



    -- If we don't have the right number of args,

    -- give an error and abort



    when (length args /= 1) $ do

        putStrLn "Syntax: passwdmap filename"

        exitFailure



    -- Read the file lazily

    content <- readFile (head args)

    let maps = inputToMaps content

    mainMenu maps



mainMenu maps@(uidmap, usermap) = do

    putStr optionText

    hFlush stdout

    sel <- getLine

    -- See what they want to do.  For every option except 4,

    -- return them to the main menu afterwards by calling

    -- mainMenu recursively

    case sel of

         "1" -> lookupUserName >> mainMenu maps

         "2" -> lookupUID >> mainMenu maps

         "3" -> displayFile >> mainMenu maps

         "4" -> return ()

         _ -> putStrLn "Invalid selection" >> mainMenu maps



    where 

    lookupUserName = do

        putStrLn "Username: "

        username <- getLine

        case Map.lookup username usermap of

             Nothing -> putStrLn "Not found."

             Just x -> print x

    lookupUID = do

        putStrLn "UID: "

        uidstring <- getLine

        case Map.lookup (read uidstring) uidmap of

             Nothing -> putStrLn "Not found."

             Just x -> print x

    displayFile = 

        putStr . unlines . map (show . snd) . Map.toList $ uidmap

    optionText = 

          "\npasswdmap options:\n\

           \\n\

           \1   Look up a user name\n\

           \2   Look up a UID\n\

           \3   Display entire file\n\

           \4   Quit\n\n\

           \Your selection: "
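The local split helper does most of the parsing work in the Read instance, so it may help to look at it in isolation. What follows is a minimal standalone sketch of the same idea, not the program's own code: it pattern matches on the remainder instead of calling tail, and the fields value with its sample line is an assumption added purely for illustration.

```haskell
-- Standalone sketch of the delimiter-splitting helper.
split :: Eq a => a -> [a] -> [[a]]
-- An empty input yields a list holding one empty field.
split _ [] = [[]]
split delim str =
    let (before, remainder) = span (/= delim) str
    in before : case remainder of
                  []         -> []
                  -- Drop the delimiter itself and keep splitting.
                  (_ : rest) -> split delim rest

-- Hypothetical sample line in /etc/passwd format.
fields :: [String]
fields = split ':' "root:x:0:0:root:/root:/bin/bash"
```

Here fields evaluates to a list of seven strings, which is the shape that the seven-element pattern in readsPrec expects. A trailing delimiter produces a final empty field, so split ':' "a:" gives ["a",""].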
End of preview. The rest of this section is not viewable here.
Extended Example: Numeric Types
Preview
We’ve told you how powerful and expressive Haskell’s type system is. We’ve shown you a lot of ways to use that power. Here’s a chance to really see that in action.
Earlier in the book, we showed the numeric typeclasses that come with Haskell. Let’s see what we can do by defining new types and utilizing the numeric typeclasses to integrate them with basic mathematics in Haskell.
To begin, let’s think through what we’d like to see out of ghci when we interact with our new types. To start with, it might be nice to render numeric expressions as strings, making sure to indicate proper precedence. Perhaps we could create a function called prettyShow to do that. We’ll show you how to write it in a bit, but first we’ll look at how we might use it:
:l num.hs

[1 of 1] Compiling Main             ( num.hs, interpreted )

Ok, modules loaded: Main.

5 + 1 * 3

8

prettyShow $ 5 + 1 * 3

"5+(1*3)"

prettyShow $ 5 * 1 + 3

"(5*1)+3"
That looks nice, but it wasn’t all that smart. We could easily simplify out the 1 * part of the expression. How about a function to do some very basic simplification?
prettyShow $ simplify $ 5 + 1 * 3

"5+3"

How about converting a numeric expression to Reverse Polish Notation (RPN)? RPN is a postfix notation that never requires parentheses and is commonly found on HP calculators. RPN is a stack-based notation. We push numbers onto the stack, and when we enter operations, they pop the most recent numbers off the stack and place the result on the stack:
rpnShow $ 5 + 1 * 3

"5 1 3 * +"

rpnShow $ simplify $ 5 + 1 * 3

"5 3 +"
Maybe it would be nice to be able to represent simple expressions with symbols for the unknowns:
prettyShow $ 5 + (Symbol "x") * 3

"5+(x*3)"

It’s often important to track units of measure when working with numbers. For instance, when you see the number 5, does it mean 5 meters, 5 feet, or 5 bytes? Of course, if you divide 5 meters by 2 seconds, the system ought to be able to figure out the appropriate units. Moreover, it should stop you from adding 2 seconds to 5 meters:
End of preview. The rest of this section is not viewable here.
Taking Advantage of Functions as Data
Preview
In an imperative language, appending two lists is cheap and easy. Here’s a simple C structure in which we maintain a pointer to the head and tail of a list:
struct list {

    struct node *head, *tail;

};
When we have one list and want to append another list onto its end, we modify the last node of the existing list to point to the second list’s head node, and then update the existing list’s tail pointer to point to the second list’s tail node.
Obviously, this approach is off limits to us in Haskell if we want to stay pure. Since pure data is immutable, we can’t go around modifying lists in place. Haskell’s (++) operator appends two lists by creating a new one:
-- file: ch13/Append.hs

(++) :: [a] -> [a] -> [a]

(x:xs) ++ ys = x : xs ++ ys

_      ++ ys = ys
From inspecting the code, we can see that the cost of creating a new list depends on the length of the initial one.
We often need to append lists over and over in order to construct one big list. For instance, we might be generating the contents of a web page as a String, emitting a chunk at a time as we traverse some data structure. Each time we have a chunk of markup to add to the page, we will naturally want to append it onto the end of our existing String.
If a single append has a cost proportional to the length of the initial list, and each repeated append makes the initial list longer, we end up in an unhappy situation: the cost of all of the repeated appends is proportional to the square of the length of the final list.
To understand this, let’s dig in a little. The (++) operator is right-associative:
:info (++)

(++) :: [a] -> [a] -> [a] 	-- Defined in GHC.Base

infixr 5 ++

This means that a Haskell implementation will evaluate the expression "a" ++ "b" ++ "c" as though we had put parentheses around it as follows: "a" ++ ("b" ++ "c"). This makes good performance sense, because it keeps the left operand as short as possible.
When we repeatedly append onto the end of a list, we defeat this associativity. Let’s say we start with the list "a" and append "b", and save the result as our new list. If we later append "c" onto this new list, our left operand is now ("a" ++ "b"). In this scheme, every time we append, our left operand gets longer.
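To make the quadratic behavior concrete, here is a small sketch that counts work instead of measuring it. It rests on a simplifying assumption: one (++) costs as many steps as its left operand has elements, which is what the definition of (++) shown earlier suggests. The names leftNestedCost and rightNestedCost are ours, introduced only for this illustration.

```haskell
-- Cost model (an assumption for illustration): one (++) rebuilds every
-- cell of its left operand, so its cost is the length of that operand.

-- Cost of ((c1 ++ c2) ++ c3) ++ ..., that is, appending each chunk onto
-- the end of a growing list.
leftNestedCost :: [[a]] -> Int
leftNestedCost = go 0 0
  where
    -- total: steps spent so far; accLen: length of the list built so far
    go total _      []       = total
    go total accLen (c : cs) = go (total + accLen) (accLen + length c) cs

-- Cost of c1 ++ (c2 ++ (c3 ++ ...)), as built by foldr (++) [].
rightNestedCost :: [[a]] -> Int
rightNestedCost = sum . map length
```

For 100 one-character chunks, leftNestedCost (replicate 100 "x") is 4950, while rightNestedCost (replicate 100 "x") is 100. Under this model the left-nested total grows with the square of the final length, and the right-nested total grows linearly.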
End of preview. The rest of this section is not viewable here.
General-Purpose Sequences
Preview
Both Haskell’s built-in list type and the DList type that we defined earlier have poor performance characteristics under some circumstances. The Data.Sequence module defines a Seq container type that gives good performance for a wider variety of operations.
As with other modules, Data.Sequence is intended to be used via qualified import:
-- file: ch13/DataSequence.hs

import qualified Data.Sequence as Seq
We can construct an empty Seq using empty and a single-element container using singleton:
Seq.empty

Loading package array-0.1.0.0 ... linking ... done.

Loading package containers-0.1.0.2 ... linking ... done.

fromList []

Seq.singleton 1

fromList [1]
We can create a Seq from a list using fromList:
        

        let a = Seq.fromList [1,2,3]
The module provides some constructor functions in the form of operators. When we perform a qualified import, we must qualify the name of an operator in our code (which is ugly):
1 Seq.<| Seq.singleton 2

fromList [1,2]

If we import the operators explicitly, we can avoid the need to qualify them:
-- file: ch13/DataSequence.hs

import Data.Sequence ((><), (<|), (|>))
By removing the qualification from the operator, we improve the readability of our code:
Seq.singleton 1 |> 2

fromList [1,2]

A useful way to remember the (<|) and (|>) functions is that the "arrow" points to the element we’re adding to the Seq. The element will be added on the side to which the arrow points: (<|) adds on the left, (|>) on the right.
Both adding on the left and adding on the right are constant-time operations. Appending two Seqs is also cheap, occurring in time proportional to the logarithm of whichever is shorter. To append, we use the (><) operator:
let left = Seq.fromList [1,3,3]

let right = Seq.fromList [7,1]

left >< right

fromList [1,3,3,7,1]
If we want to create a list from a Seq, we must use the Data.Foldable module, which is best imported qualified:
-- file: ch13/DataSequence.hs

import qualified Data.Foldable as Foldable
This module defines a typeclass, Foldable, which Seq implements:
Foldable.toList (Seq.fromList [1,2,3])

[1,2,3]
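The operations from this section can be pulled together in one short file. This is a sketch that assumes the containers package that ships with GHC; the names example and joined are made up for the illustration.

```haskell
import qualified Data.Sequence as Seq
import Data.Sequence ((><), (<|), (|>))
import qualified Data.Foldable as Foldable

-- Add 0 on the left and 4 on the right of an existing sequence.
example :: Seq.Seq Int
example = (0 <| Seq.fromList [1, 2, 3]) |> 4

-- Append a second sequence, then convert the result back to a list.
joined :: [Int]
joined = Foldable.toList (example >< Seq.fromList [5, 6])
```

Evaluating joined gives [0,1,2,3,4,5,6]. The parentheses around the (<|) expression are there for readability, so the reader does not have to recall the fixity of the two operators.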

End of preview. The rest of this section is not viewable here.
Chapter 14: Monads
Preview
Earlier in the book, we talked about the IO monad, but we intentionally kept the discussion narrowly focused on how to communicate with the outside world. We didn’t discuss what a monad is.
We’ve already seen that the IO monad is easy to work with. Notational differences aside, writing code in the IO monad isn’t much different from coding in any other imperative language.
When we had practical problems to solve in earlier chapters, we introduced structures that, as we will soon see, are actually monads. We aim to show you that a monad is often an obvious and useful tool to help solve a problem. We’ll define a few monads in this chapter, to show how easy it is.
End of preview. The rest of this section is not viewable here.
Revisiting Earlier Code Examples
Preview
Let’s take another look at the parseP5 function that we wrote in Chapter 10:
-- file: ch10/PNM.hs

matchHeader :: L.ByteString -> L.ByteString -> Maybe L.ByteString



-- "nat" here is short for "natural number"

getNat :: L.ByteString -> Maybe (Int, L.ByteString)



getBytes :: Int -> L.ByteString

         -> Maybe (L.ByteString, L.ByteString)



parseP5 s =

  case matchHeader (L8.pack "P5") s of

    Nothing -> Nothing

    Just s1 ->

      case getNat s1 of

        Nothing -> Nothing

        Just (width, s2) ->

          case getNat (L8.dropWhile isSpace s2) of

            Nothing -> Nothing

            Just (height, s3) ->

              case getNat (L8.dropWhile isSpace s3) of

                Nothing -> Nothing

                Just (maxGrey, s4)

                  | maxGrey > 255 -> Nothing

                  | otherwise ->

                      case getBytes 1 s4 of

                        Nothing -> Nothing

                        Just (_, s5) ->

                          case getBytes (width * height) s5 of

                            Nothing -> Nothing

                            Just (bitmap, s6) ->

                              Just (Greymap width height maxGrey bitmap, s6)
When we introduced this function, it threatened to march off the right side of the page if it got much more complicated. We brought the staircasing under control using the (>>?) function:
-- file: ch10/PNM.hs

(>>?) :: Maybe a -> (a -> Maybe b) -> Maybe b

Nothing >>? _ = Nothing

Just v  >>? f = f v
We carefully chose the type of (>>?) to let us chain together functions that return a Maybe value. So long as the result type of one function matches the parameter of the next, we can chain functions returning Maybe together indefinitely. The body of (>>?) hides the details of whether the chain of functions we build is short-circuited somewhere, due to one returning Nothing, or whether it is completely evaluated.
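A smaller chain than parseP5 shows the same short-circuiting. The sketch below repeats the definition of (>>?) so that it stands alone; safeDiv and calc are hypothetical helpers invented for this example and do not come from the PNM parser.

```haskell
(>>?) :: Maybe a -> (a -> Maybe b) -> Maybe b
Nothing >>? _ = Nothing
Just v  >>? f = f v

-- Hypothetical helper: integer division that fails on a zero divisor.
safeDiv :: Int -> Int -> Maybe Int
safeDiv _ 0 = Nothing
safeDiv x y = Just (x `div` y)

-- Divide a by b, divide the quotient by c, then add one.
-- A Nothing from either division ends the chain with Nothing.
calc :: Int -> Int -> Int -> Maybe Int
calc a b c =
    safeDiv a b >>? \q ->
    safeDiv q c >>? \r ->
    Just (r + 1)
```

With these definitions, calc 100 5 2 is Just 11, while calc 1 0 2 is Nothing. The second call never reaches the second division, yet the code contains no explicit test for failure.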
Useful as (>>?) was for cleaning up the structure of parseP5, we had to incrementally consume pieces of a string as we parsed it. This forced us to pass the current value of the string down our chain of
End of preview. The rest of this section is not viewable here.
Looking for Shared Patterns
Preview
When we look at the preceding examples in detail, they don’t seem to have much in common. Obviously, they’re both concerned with chaining functions together and hiding details to let us write tidier code. However, let’s take a step back and consider them in less detail.
First, let’s look at the type definitions:
-- file: ch14/Maybe.hs

data Maybe a = Nothing

             | Just a
-- file: ch10/Parse.hs

newtype Parse a = Parse {

      runParse :: ParseState -> Either String (a, ParseState)

    }
The common feature of these two types is that each has a single type parameter on the left of the definition, which appears somewhere on the right. These are thus generic types, which know nothing about their payloads.
Next, we’ll examine the chaining functions that we wrote for the two types:
:type (>>?)

(>>?) :: Maybe a -> (a -> Maybe b) -> Maybe b

:type (==>)

(==>) :: Parse a -> (a -> Parse b) -> Parse b

These functions have strikingly similar types. If we were to turn those type constructors into a type variable, we’d end up with a single more abstract type:
-- file: ch14/Maybe.hs

chain :: m a -> (a -> m b) -> m b
Finally, in each case, we have a function that takes a "plain" value and "injects" it into the target type. For Maybe, this function is simply the value constructor Just, but the injector for Parse is more complicated:
-- file: ch10/Parse.hs

identity :: a -> Parse a

identity a = Parse (\s -> Right (a, s))
Again, it’s not the details or complexity that we’re interested in, it’s the fact that each of these types has an "injector" function, which looks like this:
-- file: ch14/Maybe.hs

inject :: a -> m a
It is exactly these three properties, and a few rules about how we can use them together, that define a monad in Haskell. Let’s revisit the preceding list in condensed form:
  • A type constructor m.
  • A function of type m a -> (a -> m b) -> m b for chaining the output of one function into the input of another.
  • A function of type a -> m a for injecting a normal value into the chain, that is, it wraps a type
End of preview. The rest of this section is not viewable here.
The Monad Typeclass
Preview
We can capture the notions of chaining and injection, and the types that we want them to have, in a Haskell typeclass. The standard Prelude already defines just such a typeclass, named Monad:
-- file: ch14/Maybe.hs

class Monad m where

    -- chain

    (>>=)  :: m a -> (a -> m b) -> m b

    -- inject

    return :: a -> m a
Here, (>>=) is our chaining function. We’ve already been introduced to it earlier in the book. It’s often referred to as bind, as it binds the result of the computation on the left to the parameter of the one on the right.
Our injection function is return. As we noted earlier, the choice of the name return is a little unfortunate. That name is widely used in imperative languages, where it has a fairly well-understood meaning. In Haskell, its behavior is much less constrained. In particular, calling return in the middle of a chain of functions won’t cause the chain to exit early. A useful way to link its behavior to its name is that it returns a pure value (of type a) into a monad (of type m a). But really, “inject” would be a better name.
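The claim that return does not exit early is easy to check with a short sketch in the Maybe monad. The two names below are ours, chosen only for the illustration.

```haskell
-- return in the middle of a chain only injects a value;
-- the steps after it still run.
noEarlyExit :: Maybe Int
noEarlyExit = return 1 >>= \_ -> return 2 >>= \x -> return (x + 40)

-- The same chain written in do notation.
noEarlyExitDo :: Maybe Int
noEarlyExitDo = do
    _ <- return 1
    x <- return 2
    return (x + 40)
```

Both values are Just 42. If return behaved like its imperative namesake, the chain would have stopped at the first return and produced Just 1.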
While (>>=) and return are the core functions of the typeclass, it also defines two other functions. The first is (>>). Like (>>=), it performs chaining, but it ignores the value on the left:
-- file: ch14/Maybe.hs

    (>>) :: m a -> m b -> m b

    a >> f = a >>= \_ -> f
We use this function when we want to perform actions in a certain order, but don’t care what the result of one is. This might seem pointless: why would we not care what a function’s return value is? Recall, though, that we defined a (==>&) combinator earlier to express exactly this. Alternatively, consider a function such as print, which provides a placeholder result that we do not need to inspect:
:type print "foo"

print "foo" :: IO ()

If we use plain (>>=), we have to provide, as its righthand side, a function that ignores its argument:
print "foo" >>= \_ -> print "bar"

"foo"

"bar"

But if we use (>>), we can omit the needless function:
print "baz" >> print "quux"

"baz"

"quux"

As we just showed, the default implementation of
End of preview. The rest of this section is not viewable here.
And Now, a Jargon Moment
Preview
There are a few terms of art around monads that you may not be familiar with. These aren’t formal, but they’re commonly used, so it’s helpful to know about them:
  • Monadic simply means “pertaining to monads.” A monadic type is an instance of the Monad typeclass; a monadic value has a monadic type.
  • When we say that a type “is a monad,” this is really a shorthand way of saying that it’s an instance of the Monad typeclass. Being an instance of Monad gives us the necessary monadic triple of type constructor, injection function, and chaining function.
  • In the same way, a reference to "the Foo monad" implies that we’re talking about the type named Foo and that it’s an instance of Monad.
  • An action is another name for a monadic value. This use of the word probably originated with the introduction of monads for I/O, where a monadic value such as print "foo" can have an observable side effect. A function with a monadic return type might also be referred to as an action, though this is a little less common.
End of preview. The rest of this section is not viewable here.
Using a New Monad: Show Your Work!
Preview
In our introduction to monads, we showed how some preexisting code was already monadic in form. Now that we are beginning to grasp what a monad is and have seen the Monad typeclass, let’s build a monad with foreknowledge of what we’re doing. We’ll start out by defining its interface, and then we’ll put it to use. Once we have those out of the way, we’ll finally build it.
Pure Haskell code is wonderfully clean to write, but, of course, it can’t perform I/O. Sometimes, we’d like to have a record of decisions we made, without writing log information to a file. Let’s develop a small library to help with this.
Recall the globToRegex function that we developed earlier in the book. We will modify it so that it keeps a record of each of the special pattern sequences that it translates. We are revisiting familiar territory for a reason: it lets us compare nonmonadic and monadic versions of the same code.
To start off, we’ll wrap our result type with a type constructor:
-- file: ch14/Logger.hs

globToRegex :: String -> Logger String
We’ll intentionally keep the internals of the Logger module abstract:
-- file: ch14/Logger.hs

module Logger

    (

      Logger

    , Log

    , runLogger

    , record

    ) where
Hiding the details like this has two benefits: it grants us considerable flexibility in how we implement our monad, and more importantly, it gives users a simple interface.
Our Logger type is purely a type constructor. We don’t export the value constructor that a user would need to create a value of this type. All they can use Logger for is writing type signatures.
The Log type is just a synonym for a list of strings, to make a few signatures more readable. We use a list of strings to keep the implementation simple:
-- file: ch14/Logger.hs

type Log = [String]
Instead of giving our users a value constructor, we provide them with a function, runLogger, that evaluates a logged action. This returns both the result of an action and whatever was logged while the result was being computed:
-- file: ch14/Logger.hs

runLogger :: Logger a -> (a, Log)
End of preview. The rest of this section is not viewable here.
Mixing Pure and Monadic Code
Preview
Based on the code we’ve seen so far, monads seem to have a substantial shortcoming: the type constructor that wraps a monadic value makes it tricky to use a normal, pure function on a value trapped inside a monadic wrapper. Here’s a simple illustration of the apparent problem. Let’s say we have a trivial piece of code that runs in the Logger monad and returns a string:
        

        let m = return "foo" :: Logger String
If we want to find out the length of that string, we can’t simply call length. The string is wrapped, so the types don’t match up:
length m



<interactive>:1:7:

    Couldn't match expected type `[a]'

           against inferred type `Logger String'

    In the first argument of `length', namely `m'

    In the expression: length m

    In the definition of `it': it = length m

So far, to work around this, we’ve done something like the following:
:type   m >>= \s -> return (length s)

m >>= \s -> return (length s) :: Logger Int

We use (>>=) to unwrap the string, and then write a small anonymous function that calls length and rewraps the result using return.
This need crops up often in Haskell code. You won’t be surprised to learn that a shorthand already exists: we use the lifting technique that we introduced for functors earlier in the book. Lifting a pure function into a functor usually involves unwrapping the value inside the functor, calling the function on it, and rewrapping the result with the same constructor.
We do exactly the same thing with a monad. Because the Monad typeclass already provides the (>>=) and return functions that know how to unwrap and wrap a value, the liftM function doesn’t need to know any details of a monad’s implementation:
-- file: ch14/Logger.hs

liftM :: (Monad m) => (a -> b) -> m a -> m b

liftM f m = m >>= \i ->

            return (f i)
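To see that the explicit bind and liftM agree, here is a brief sketch. It uses Maybe instead of Logger so that it stands alone, and it imports the library version of liftM from Control.Monad; the names wrapped, viaBind, and viaLiftM are illustrative.

```haskell
import Control.Monad (liftM)

-- A string trapped inside a monadic wrapper.
wrapped :: Maybe String
wrapped = Just "foo"

-- Unwrap with (>>=), apply length, and rewrap with return.
viaBind :: Maybe Int
viaBind = wrapped >>= \s -> return (length s)

-- The same computation, with liftM doing the unwrapping and rewrapping.
viaLiftM :: Maybe Int
viaLiftM = liftM length wrapped
```

Both viaBind and viaLiftM evaluate to Just 3.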
When we declare a type to be an instance of the Functor typeclass, we have to write our own version of fmap specially tailored to that type. By contrast, liftM doesn’t need to know anything of a monad’s internals, because they’re abstracted by
End of preview. The rest of this section is not viewable here.
Putting a Few Misconceptions to Rest
Preview
We’ve now seen enough examples of monads in action to have some feel for what’s going on. Before we continue, there are a few oft-repeated myths about monads that we’re going to address. You’re bound to encounter these assertions “in the wild,” so you might as well be prepared with a few good retorts:
Monads can be hard to understand
We’ve already shown that monads "fall out naturally" from several problems. We’ve found that the best key to understanding them is to explain several concrete examples, and then talk about what they have in common.
Monads are only useful for I/O and imperative coding
While we use monads for I/O in Haskell, they’re valuable for many other purposes as well. We’ve already used them for short-circuiting a chain of computations, hiding complicated state, and logging. Even so, we’ve barely scratched the surface.
Monads are unique to Haskell
Haskell is probably the language that makes the most explicit use of monads, but people write them in other languages, too, ranging from C++ to OCaml. They happen to be particularly tractable in Haskell, due to do notation, the power and inference of the type system, and the language’s syntax.
Monads are for controlling the order of evaluation
End of preview. The rest of this section is not viewable here.
Building the Logger Monad
Preview
The definition of our Logger type is very simple:
-- file: ch14/Logger.hs

newtype Logger a = Logger { execLogger :: (a, Log) }
It’s a pair, where the first element is the result of an action, and the second is a list of messages logged while that action was run.
We’ve wrapped the tuple in a newtype to make it a distinct type. The execLogger function extracts the tuple from its wrapper. The function that we’re exporting to execute a logged action, runLogger, is just a synonym for execLogger:
-- file: ch14/Logger.hs

runLogger = execLogger
Our record helper function creates a singleton list of the message that we pass it:
-- file: ch14/Logger.hs

record s = Logger ((), [s])
The result of this action is (), so that’s the value we put in the result slot.
Let’s begin our instance with return, which is trivial. It logs nothing and stores its input in the result slot of the tuple:
-- file: ch14/Logger.hs

instance Monad Logger where

    return a = Logger (a, [])
Slightly more interesting is (>>=), which is the heart of the monad. It combines an action and a monadic function to give a new result and a new log:
-- file: ch14/Logger.hs

    -- (>>=) :: Logger a -> (a -> Logger b) -> Logger b

    m >>= k = let (a, w) = execLogger m

                  n      = k a

                  (b, x) = execLogger n

              in Logger (b, w ++ x)
Let’s spell out explicitly what is going on. We use execLogger to extract the result a from the action m, and we pass it to the monadic function k. We extract the result b from that in turn, and put it into the result slot of the final action. We concatenate the logs w and x to give the new log.
Our definition of (>>=) ensures that messages logged on the left will appear in the new log before those on the right. However, it says nothing about when the values a and b are evaluated: (>>=) is lazy.
Like most other aspects of a monad’s behavior, strictness is under the control of its implementor. It is not a constant shared by all monads. Indeed, some monads come in multiple flavors, each with different levels of strictness.
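The pieces of this section can be assembled into one file that loads in a current compiler. Since GHC 7.10, Monad has Applicative as a superclass, so the sketch below adds Functor and Applicative instances that the listings above do not contain. Those two instances, along with the double and twice examples, are our additions and should be read as one plausible way to complete the code.

```haskell
type Log = [String]

newtype Logger a = Logger { execLogger :: (a, Log) }

runLogger :: Logger a -> (a, Log)
runLogger = execLogger

record :: String -> Logger ()
record s = Logger ((), [s])

-- Added for modern GHC: apply a pure function to the result, keep the log.
instance Functor Logger where
    fmap f (Logger (a, w)) = Logger (f a, w)

-- Added for modern GHC: pure logs nothing; (<*>) concatenates both logs.
instance Applicative Logger where
    pure a = Logger (a, [])
    Logger (f, w) <*> Logger (a, x) = Logger (f a, w ++ x)

instance Monad Logger where
    m >>= k = let (a, w) = execLogger m
                  (b, x) = execLogger (k a)
              in Logger (b, w ++ x)

-- Hypothetical action: double a number and log that we did so.
double :: Int -> Logger Int
double n = do
    record ("doubling " ++ show n)
    return (n * 2)

-- Chain two logged actions with (>>=).
twice :: Logger Int
twice = double 3 >>= double
```

Evaluating runLogger twice gives (12,["doubling 3","doubling 6"]): the message from the left action appears before the message from the right one, as the definition of (>>=) promises.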
End of preview. The rest of this section is not viewable here.
The Maybe Monad
Preview
The Maybe type is very nearly the simplest instance of Monad. It represents a computation that might not produce a result:
-- file: ch14/Maybe.hs

instance Monad Maybe where

    Just x >>= k  =  k x

    Nothing >>= _ =  Nothing



    Just _ >> k   =  k

    Nothing >> _  =  Nothing



    return x      =  Just x



    fail _        =  Nothing
If, when we chain together a number of computations over Maybe using (>>=) or (>>), any of them returns Nothing, we don’t evaluate any of the remaining computations.
Note, though, that the chain is not completely short-circuited. Each (>>=) or (>>) in the chain will still match a Nothing on its left and produce a Nothing on its right, all the way to the end. It’s easy to forget this point: when a computation in the chain fails, the subsequent production, chaining, and consumption of Nothing values are cheap at runtime, but they’re not free.
A function suitable for executing the Maybe monad is maybe. (Remember that "executing" a monad involves evaluating it and returning a result that’s had the monad’s type wrapper removed.)
-- file: ch14/Maybe.hs

maybe :: b -> (a -> b) -> Maybe a -> b

maybe n _ Nothing  = n

maybe _ f (Just x) = f x
Its first parameter is the value to return if the result is Nothing. The second is a function to apply to a result wrapped in the Just constructor; the result of that application is then returned.
Since the Maybe type is so simple, it’s about as common to simply pattern match on a Maybe value as it is to call maybe. Each one is more readable in different circumstances.
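The two styles can be compared side by side in a small sketch. The function names describe and describe' are hypothetical; the point is only that maybe and an explicit pattern match express the same thing.

```haskell
-- Using maybe: a default for Nothing, and a function for the Just case.
describe :: Maybe Int -> String
describe = maybe "no result" (\n -> "got " ++ show n)

-- The same function written with explicit pattern matching.
describe' :: Maybe Int -> String
describe' Nothing  = "no result"
describe' (Just n) = "got " ++ show n
```

Both describe (Just 3) and describe' (Just 3) give "got 3", and both return "no result" for Nothing.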
Here’s an example of Maybe in use as a monad. Given a customer’s name, we want to find the billing address of her mobile phone carrier:
-- file: ch14/Carrier.hs

import qualified Data.Map as M



type PersonName = String

type PhoneNumber = String

type BillingAddress = String

data MobileCarrier = Honest_Bobs_Phone_Network

                   | Morrisas_Marvelous_Mobiles

                   | Petes_Plutocratic_Phones

                     deriving (Eq, Ord)



findCarrierBillingAddress :: PersonName

                          -> M.Map PersonName PhoneNumber

                          -> M.Map PhoneNumber MobileCarrier

                          -> M.Map MobileCarrier BillingAddress

                          -> Maybe BillingAddress
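One way to fill in a body for this signature is to run the three lookups in the Maybe monad. The following is a sketch under that assumption, not necessarily the definition the text goes on to give. The types are repeated so that the file stands alone, and the sample maps phones, carriers, and addresses contain made-up data.

```haskell
import qualified Data.Map as M

type PersonName = String
type PhoneNumber = String
type BillingAddress = String

data MobileCarrier = Honest_Bobs_Phone_Network
                   | Morrisas_Marvelous_Mobiles
                   | Petes_Plutocratic_Phones
                     deriving (Eq, Ord)

-- One possible body: each lookup returns a Maybe, and the do block
-- yields Nothing as soon as any lookup fails.
findCarrierBillingAddress :: PersonName
                          -> M.Map PersonName PhoneNumber
                          -> M.Map PhoneNumber MobileCarrier
                          -> M.Map MobileCarrier BillingAddress
                          -> Maybe BillingAddress
findCarrierBillingAddress person phoneMap carrierMap addressMap = do
    number  <- M.lookup person phoneMap
    carrier <- M.lookup number carrierMap
    M.lookup carrier addressMap

-- Hypothetical sample data.
phones :: M.Map PersonName PhoneNumber
phones = M.fromList [("alice", "555-0100"), ("bob", "555-0199")]

carriers :: M.Map PhoneNumber MobileCarrier
carriers = M.fromList [("555-0100", Petes_Plutocratic_Phones)]

addresses :: M.Map MobileCarrier BillingAddress
addresses = M.fromList [(Petes_Plutocratic_Phones, "1 Plutocrat Plaza")]
```

With this data, looking up "alice" yields Just "1 Plutocrat Plaza". Looking up "bob" yields Nothing, because his number has no carrier entry, and an unknown name fails at the first lookup.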
End of preview. The rest of this section is not viewable here.
The List Monad
Preview
While the Maybe type can represent either no value or one, there are many situations where we might want to return some number of results that we do not know in advance. Obviously, a list is well suited to this purpose. The type of a list suggests that we might be able to use it as a monad, because its type constructor has one free variable. And sure enough, we can use a list as a monad.
Rather than simply present the Prelude’s instance for the list type, let’s try to figure out what an instance ought to look like. This is easy to do: we’ll look at the types of (>>=) and return, perform some substitutions, and see if we can use a few familiar list functions.
The more obvious of the two functions is return. We know that it takes a type a, and wraps it in a type constructor m to give the type m a. We also know that the type constructor here is []. Substituting this type constructor for the type variable m gives us the type [] a (yes, this really is valid notation!), which we can rewrite in more familiar form as [a].
We now know that return for lists should have the type a -> [a]. There are only a few sensible possibilities for an implementation of this function. It might return the empty list, a singleton list, or an infinite list. The most appealing behavior, based on what we know so far about monads, is the singleton list, since it neither throws away information nor repeats it infinitely:
-- file: ch14/ListMonad.hs

returnSingleton :: a -> [a]

returnSingleton x = [x]
If we perform the same substitution trick on the type of (>>=) as we did with return, we discover that it should have the type [a] -> (a -> [b]) -> [b]. This seems close to the type of map:
:type (>>=)

(>>=) :: (Monad m) => m a -> (a -> m b) -> m b

:type map

map :: (a -> b) -> [a] -> [b]
The ordering of the types in map’s arguments doesn’t match, but that’s easy to fix:
:type (>>=)

(>>=) :: (Monad m) => m a -> (a -> m b) -> m b

:type flip map

flip map :: [a] -> (a -> b) -> [b]
We’ve still got a problem: the second argument of (>>=) has the type a -> [b], whereas the second argument of
End of content preview. The rest of this section cannot be viewed here.
Desugaring of do Blocks
Content preview
Haskell’s do syntax is an example of syntactic sugar: it provides an alternative way of writing monadic code, without using (>>=) and anonymous functions. Desugaring is the translation of syntactic sugar back to the core language.
The rules for desugaring a do block are easy to follow. We can think of a compiler as applying these rules mechanically and repeatedly to a do block until no more do keywords remain.
A do keyword followed by a single action is translated to that action by itself:
-- file: ch14/Do.hs            -- file: ch14/Do.hs 

doNotation1 =                  translated1 =

    do act                         act
A do keyword followed by more than one action is translated to the first action, then (>>), followed by a do keyword and the remaining actions. When we apply this rule repeatedly, the entire do block ends up chained together by applications of (>>):
-- file: ch14/Do.hs            -- file: ch14/Do.hs

doNotation2 =                  translated2 =

    do act1                        act1 >>

       act2                        do act2

       {- ... etc. -}                 {- ... etc. -}

       actN                           actN



                              finalTranslation2 =

                                  act1 >>

                                  act2 >>

                                  {- ... etc. -}

                                  actN
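A concrete instance of this rule, as a sketch: Maybe stands in for an arbitrary monad here, and the names sugared, partly, and desugared are hypothetical.

```haskell
-- A do block with three actions and no bindings.
sugared :: Maybe Int
sugared = do
    Just 1
    Just 2
    Just 3

-- One application of the rule: the first action, then (>>),
-- then a smaller do block holding the remaining actions.
partly :: Maybe Int
partly = Just 1 >> do
    Just 2
    Just 3

-- Applying the rule until no do keyword remains.
desugared :: Maybe Int
desugared = Just 1 >> Just 2 >> Just 3
```

All three definitions evaluate to Just 3, since (>>) discards the result of the action on its left.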
The <- notation has a translation that’s worth paying close attention to. On the left of the <- is a normal Haskell pattern. This can be a single variable or something more complicated, but a guard expression is not allowed:
-- file: ch14/Do.hs            -- file: ch14/Do.hs

doNotation3 =                  translated3 =

    do pattern <- act1             let f pattern = do act2

       act2                                           {- ... etc. -}

       {- ... etc. -}                                 actN

       actN                            f _     = fail "..."

                                   in act1 >>= f
This pattern is translated into a let binding that declares a local function with a unique name (we’re just using
End of content preview. The rest of this section cannot be viewed here.
The State Monad
Content preview
We discovered earlier in this chapter that the Parse type we developed previously was a monad. It has two logically distinct aspects. One is the idea of a parse failing and providing a message with the details (we represented this using the Either type). The other involves carrying around a piece of implicit state, in our case, the partially consumed ByteString.
This need for a way to read and write state is common enough in Haskell programs that the standard libraries provide a monad named State that is dedicated to this purpose. This monad lives in the Control.Monad.State module.
Where our Parse type carried around a ByteString as its piece of state, the State monad can carry any type of state. We’ll refer to the state’s unknown type as s.
What’s an obvious and general thing we might want to do with a state? Given a state value, we inspect it, and then produce a result and a new state value. Let’s say the result can be of any type a. A type signature that captures this idea is s -> (a, s). Take a state s, do something with it, and return a result a and possibly a new state s.
Let’s develop some simple code that’s almost the State monad, and then take a look at the real thing. We’ll start with our type definition, which has exactly the obvious type that we just described:
-- file: ch14/SimpleState.hs

type SimpleState s a = s -> (a, s)
Our monad is a function that transforms one state into another, yielding a result when it does so. Because of this, the State monad is sometimes called the state transformer monad.
Yes, this is a type synonym, not a new type, and so we’re cheating a little. Bear with us for now; this simplifies the description that follows.
Earlier in this chapter, we said that a monad has a type constructor with a single type variable, and yet here we have a type with two parameters. The key is to understand that we can partially apply a type just as we can partially apply a normal function. This is easiest to follow with an example:
-- file: ch14/SimpleState.hs

type StringState a = SimpleState String a
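To make the synonyms concrete, here is a sketch of two state transformers written directly as functions. The actions nextLabel and popChar are hypothetical examples, not part of the book's code.

```haskell
type SimpleState s a = s -> (a, s)

type StringState a = SimpleState String a

-- The state is a counter: produce a label and increment the counter.
nextLabel :: SimpleState Int String
nextLabel n = ("label" ++ show n, n + 1)

-- The state is a String: take the first character, if there is one,
-- and leave the rest of the string as the new state.
popChar :: StringState (Maybe Char)
popChar []     = (Nothing, [])
popChar (c:cs) = (Just c, cs)
```

Because both synonyms expand to ordinary function types, running an action is just function application: nextLabel 0 gives ("label0", 1), and popChar "ab" gives (Just 'a', "b").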
End of content preview. The rest of this section cannot be viewed here.
Monads and Functors
Content preview
Functors and monads are closely related. The terms are borrowed from a branch of mathematics called category theory, but they did not make the transition to Haskell completely unscathed.
In category theory, a monad is built from a functor. You might expect that in Haskell, the Monad typeclass would thus be a subclass of Functor, but it isn’t defined as such in the standard Prelude, an unfortunate oversight.
However, authors of Haskell libraries use a workaround: when programmers define an instance of Monad for a type, they almost always write a Functor instance for it, too. You can expect that you’ll be able to use the Functor typeclass’s fmap function with any monad.
If we compare the type signature of fmap with those of some of the standard monad functions that we’ve already seen, we get a hint as to what fmap on a monad does:
:type fmap

fmap :: (Functor f) => (a -> b) -> f a -> f b

:module +Control.Monad

:type liftM

liftM :: (Monad m) => (a1 -> r) -> m a1 -> m r
Sure enough, fmap lifts a pure function into the monad, just as liftM does.
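A quick way to convince ourselves is to apply both functions to the same values. This is only a sketch; the names viaFmap, viaLiftM, listViaFmap, and listViaLiftM are hypothetical.

```haskell
import Control.Monad (liftM)

-- Lifting (+ 1) into the Maybe monad, both ways.
viaFmap, viaLiftM :: Maybe Int
viaFmap  = fmap  (+ 1) (Just 2)
viaLiftM = liftM (+ 1) (Just 2)

-- Lifting (* 2) into the list monad, both ways.
listViaFmap, listViaLiftM :: [Int]
listViaFmap  = fmap  (* 2) [1, 2, 3]
listViaLiftM = liftM (* 2) [1, 2, 3]
```

Both Maybe values are Just 3, and both lists are [2,4,6].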
Now that we know about the relationship between functors and monads, if we look back at the list monad, we can see something interesting. Specifically, take a look at the definition of (>>=) for lists:
-- file: ch14/ListMonad.hs

instance Monad [] where

    return x = [x]

    xs >>= f = concat (map f xs)
Recall that f has type a -> [b]. When we call map f xs, we get back a value of type [[b]], which we have to "flatten" using concat.
Consider what we could do if Monad were a subclass of Functor. Since fmap for lists is defined to be map, we could replace map with fmap in the definition of (>>=). This is not very interesting by itself, but suppose we go further.
The concat function is of type [[a]] -> [a]. As we mentioned, it flattens the nesting of lists. We could generalize this type signature from lists to monads, giving us the "remove a level of nesting" type m (m a) -> m a. The function that has this type is conventionally named join.
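The relationship between join, fmap, and (>>=) can be sketched as follows. The name bindViaJoin is hypothetical. The sketch assumes GHC 7.10 or later, where Functor is a superclass of Monad, so fmap is available from a Monad constraint alone; on the older compilers this text targets, a Functor m constraint would have to be added.

```haskell
import Control.Monad (join)

-- (>>=) rebuilt from fmap and join: map the function over the
-- monadic value, then remove the extra level of nesting.
bindViaJoin :: Monad m => m a -> (a -> m b) -> m b
bindViaJoin m f = join (fmap f m)

-- For lists, join behaves like concat.
flattened :: [Int]
flattened = join [[1, 2], [3]]

expanded :: [Int]
expanded = bindViaJoin [1, 2, 3] (\x -> [x, x * 10])
```

Here flattened is [1,2,3], and expanded is [1,10,2,20,3,30], the same result that [1,2,3] >>= \x -> [x, x * 10] gives.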
If we had definitions of
End of content preview. The rest of this section cannot be viewed here.
The Monad Laws and Good Coding Style
Content preview
Earlier in the book, we introduced two rules for how functors should always behave:
-- file: ch14/MonadLaws.hs

fmap id        ==   id 

fmap (f . g)   ==   fmap f . fmap g
There are also rules for how monads ought to behave. The three laws described in the following paragraphs are referred to as the monad laws. A Haskell implementation doesn’t enforce these laws; it’s up to the author of a Monad instance to follow them.
The monad laws are simply formal ways of saying “a monad shouldn’t surprise me.” In principle, we could probably get away with skipping over them entirely. It would be a shame if we did, however, because the laws contain gems of wisdom that we might otherwise overlook.
You can read each of the following laws as "the expression on the left of the === is equivalent to that on the right."
The first law states that return is a left identity for (>>=):
-- file: ch14/MonadLaws.hs

return x >>= f            ===   f x
Another way to phrase this is that there’s no reason to use return to wrap up a pure value if all you’re going to do is unwrap it again with (>>=). It’s actually a common style error among programmers new to monads to wrap a value with return, and then unwrap it with (>>=) a few lines later in the same function. Here’s the same law written with do notation:
-- file: ch14/MonadLaws.hs

do y <- return x

   f y                    ===   f x
This law has practical consequences for our coding style: we don’t want to write unnecessary code, and the law lets us assume that the terse code will be identical in its effect to the more verbose version.
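As a sketch of the style point, here are a verbose and a terse version of the same function in the Maybe monad. The names half, verbose, and terse are hypothetical.

```haskell
-- A small monadic function: succeeds only for even numbers.
half :: Int -> Maybe Int
half n | even n    = Just (n `div` 2)
       | otherwise = Nothing

-- Wraps x with return, only to unwrap it again on the next line.
verbose :: Int -> Maybe Int
verbose x = do
    y <- return x
    half y

-- The left identity law says this is equivalent.
terse :: Int -> Maybe Int
terse x = half x
```

For any argument, verbose and terse give the same result; for example, both return Just 2 for 4 and Nothing for 3.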
The second monad law states that return is a right identity for (>>=):
-- file: ch14/MonadLaws.hs

m >>= return              ===   m
This law also has style consequences in real programs, particularly if you’re coming from an imperative language: there’s no need to use return if the last action in a block would otherwise be returning the correct result. Let’s look at this law in do notation:
-- file: ch14/MonadLaws.hs

do y <- m

   return y               ===   m
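The right identity law can be illustrated in the same way. In this sketch, verboseLookup and terseLookup are hypothetical names.

```haskell
-- Binds the result of lookup to v, then immediately returns v.
verboseLookup :: String -> [(String, Int)] -> Maybe Int
verboseLookup k table = do
    v <- lookup k table
    return v

-- The last action already has the right type and value,
-- so the bind and the return can both be dropped.
terseLookup :: String -> [(String, Int)] -> Maybe Int
terseLookup k table = lookup k table
```

Both functions return Just 1 when asked for "a" in [("a", 1)], and Nothing for a missing key.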
End of content preview. The rest of this section cannot be viewed here.
Chapter 15: Programming with Monads
Golfing Practice: Association Lists
Content preview
Web clients and servers often pass information around as a simple textual list of key-value pairs:
name=Attila+%22The+Hun%22&occupation=Khan
The encoding is named application/x-www-form-urlencoded, and it’s easy to understand. Each key-value pair is separated by an & character. Within a pair, a key is a series of characters, followed by an =, followed by a value.
We can obviously represent a key as a String, but the HTTP specification is not clear about whether a key must be followed by a value. We can capture this ambiguity by representing a value as a Maybe String. If we use Nothing for a value, then there is no value present. If we wrap a string in Just, then there is a value. Using Maybe lets us distinguish between "no value" and "empty value."
Haskell programmers use the name association list for the type [(a, b)], where we can think of each element as an association between a key and a value. The name originates in the Lisp community, where it’s usually abbreviated as an alist. We could thus represent the preceding string as the following Haskell value:
-- file: ch15/MovieReview.hs

    [("name",       Just "Attila \"The Hun\""),

     ("occupation", Just "Khan")]
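To show how the nested Maybe distinguishes these cases, here is a sketch that classifies a key in such an alist. The Presence type, the classify function, and the extra "nickname" and "verified" entries are hypothetical additions for illustration.

```haskell
-- Sample data: the two pairs from the text plus two hypothetical ones.
params :: [(String, Maybe String)]
params = [ ("name",       Just "Attila \"The Hun\"")
         , ("occupation", Just "Khan")
         , ("nickname",   Just "")     -- key present, empty value
         , ("verified",   Nothing)     -- key present, no value
         ]

data Presence = Missing | NoValue | EmptyValue | Value String
                deriving (Eq, Show)

-- lookup adds one Maybe layer (is the key there?) on top of the
-- Maybe String stored in the alist (does the key have a value?).
classify :: String -> [(String, Maybe String)] -> Presence
classify key alist = case lookup key alist of
    Nothing        -> Missing
    Just Nothing   -> NoValue
    Just (Just "") -> EmptyValue
    Just (Just s)  -> Value s
```

With this data, classify "occupation" params is Value "Khan", classify "nickname" params is EmptyValue, classify "verified" params is NoValue, and classify "age" params is Missing.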
Later, we’ll parse an application/x-www-form-urlencoded string, and we will represent the result as an alist of type [(String, Maybe String)]. Let’s say we want to use one of these alists to fill out a data structure:
-- file: ch15/MovieReview.hs

data MovieReview = MovieReview {

      revTitle :: String

    , revUser :: String

    , revReview :: String

    }
We’ll begin by belaboring the obvious with a naive function:
-- file: ch15/MovieReview.hs

simpleReview :: [(String, Maybe String)] -> Maybe MovieReview

simpleReview alist =

  case lookup "title" alist of

    Just (Just title@(_:_)) ->

      case lookup "user" alist of

        Just (Just user@(_:_)) ->

          case lookup "review" alist of

            Just (Just review@(_:_)) ->

                Just (MovieReview title user review)

            _ -> Nothing -- no review

        _ -> Nothing -- no user

    _ -> Nothing -- no title
It returns a MovieReview only if the alist contains all of the necessary values, and they’re all nonempty strings. However, the fact that it validates its inputs is its only merit. It suffers badly from the
End of content preview. The rest of this section cannot be viewed here.
Generalized Lifting
Content preview
Although using liftM3 tidies up our code, we can’t use a liftM-family function to solve this sort of problem in general, because the standard libraries define them only up to liftM5. We could write variants up to whatever number we pleased, but that would amount to drudgery.
If we had a constructor or pure function that takes, say, 10 parameters, and decided to stick with the standard libraries, you might think we’d be out of luck.
Of course, our toolbox isn’t empty yet. In Control.Monad, there’s a function named ap with an interesting type signature:
:m +Control.Monad

:type ap

ap :: (Monad m) => m (a -> b) -> m a -> m b
You might wonder who would put a single-argument pure function inside a monad, and why. Recall, however, that all Haskell functions really take only one argument, and you’ll begin to see how this might relate to the MovieReview constructor:
:type MovieReview

MovieReview :: String -> String -> String -> MovieReview

We can just as easily write that type as:
String -> (String -> (String -> MovieReview))
If we use plain old liftM to lift MovieReview into the Maybe monad, we’ll have a value of type:
Maybe (String -> (String -> (String -> MovieReview)))
We can now see that this type is suitable as an argument for ap, in which case, the result type will be:
        Maybe (String -> (String -> MovieReview))
We can pass this, in turn, to ap, and continue to chain until we end up with this definition:
-- file: ch15/MovieReview.hs

apReview alist =

    MovieReview `liftM` lookup1 "title" alist

                   `ap` lookup1 "user" alist

                   `ap` lookup1 "review" alist
We can chain applications of ap such as this as many times as we need to, thereby bypassing the liftM family of functions.
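The definition above relies on a lookup1 helper that is not shown in this preview. The following self-contained sketch assumes a plausible definition for it (a lookup that succeeds only for a nonempty value) and compares the ap chain with liftM3. Both lookup1 as written here and the name liftedReview are assumptions.

```haskell
import Control.Monad (ap, liftM, liftM3)

data MovieReview = MovieReview {
      revTitle  :: String
    , revUser   :: String
    , revReview :: String
    } deriving (Eq, Show)

-- Assumed helper: succeed only if the key has a nonempty value.
lookup1 :: String -> [(String, Maybe String)] -> Maybe String
lookup1 key alist = case lookup key alist of
                      Just (Just s@(_:_)) -> Just s
                      _                   -> Nothing

-- Lifting the three-argument constructor in one step.
liftedReview :: [(String, Maybe String)] -> Maybe MovieReview
liftedReview alist =
    liftM3 MovieReview (lookup1 "title" alist)
                       (lookup1 "user" alist)
                       (lookup1 "review" alist)

-- The same result, one argument at a time, using ap.
apReview :: [(String, Maybe String)] -> Maybe MovieReview
apReview alist =
    MovieReview `liftM` lookup1 "title" alist
                   `ap` lookup1 "user" alist
                   `ap` lookup1 "review" alist
```

Both functions return a Just only when all three keys have nonempty values. The ap version extends to constructors with any number of fields by adding one more `ap` line per field.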
Another helpful way to look at ap is that it’s the monadic equivalent of the familiar ($) operator; think of pronouncing ap as apply. We can see this clearly when we compare the type signatures of the two functions:
:type ($)

($) :: (a -> b) -> a -> b

:type ap

ap :: (Monad m) => m (a -> b) -> m a -> m b
In fact, ap is usually defined as either liftM2 id or liftM2 ($).
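One way to see the connection with ($) is to build ap from liftM2. In this sketch, apViaId and apViaDollar are hypothetical names, and Control.Monad's own ap serves as the reference.

```haskell
import Control.Monad (ap, liftM2)

-- id at type (a -> b) -> a -> b is function application,
-- so lifting it over two monadic arguments gives ap.
apViaId :: Monad m => m (a -> b) -> m a -> m b
apViaId = liftM2 id

-- ($) has exactly that type already.
apViaDollar :: Monad m => m (a -> b) -> m a -> m b
apViaDollar = liftM2 ($)

-- A reference result computed with the library's ap.
reference :: [Int]
reference = [(+ 1), (* 2)] `ap` [10, 20]
```

In the list monad, reference is [11,21,20,40], and both apViaId and apViaDollar produce the same list for the same arguments.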
End of content preview. The rest of this section cannot be viewed here.
Looking for Alternatives
Content preview
Here’s a simple representation of a person’s phone numbers:
-- file: ch15/VCard.hs

data Context = Home | Mobile | Business

               deriving (Eq, Show)



type Phone = String



albulena = [(Home, "+355-652-55512")]



nils = [(Mobile, "+47-922-55-512"), (Business, "+47-922-12-121"),

        (Home, "+47-925-55-121"), (Business, "+47-922-25-551")]



twalumba = [(Business, "+260-02-55-5121")]
Suppose we want to get in touch with someone to make a personal call. We don’t want their business number, and we’d prefer to use their home number (if they have one) instead of their mobile number:
-- file: ch15/VCard.hs

onePersonalPhone :: [(Context, Phone)] -> Maybe Phone

onePersonalPhone ps = case lookup Home ps of

                        Nothing -> lookup Mobile ps

                        Just n -> Just n
Of course, if we use Maybe as the result type, we can’t accommodate the possibility that someone might have more than one number that meets our criteria. For that, we switch to a list:
-- file: ch15/VCard.hs

allBusinessPhones :: [(Context, Phone)] -> [Phone]

allBusinessPhones ps = map snd numbers

    where numbers = case filter (contextIs Business) ps of

                      [] -> filter (contextIs Mobile) ps

                      ns -> ns



contextIs a (b, _) = a == b
Notice that these two functions structure their case expressions similarly—one alternative handles the case where the first lookup returns an empty result, while the other handles the nonempty case:
onePersonalPhone twalumba

Nothing

onePersonalPhone albulena

Just "+355-652-55512"

allBusinessPhones nils

["+47-922-12-121","+47-922-25-551"]
Haskell’s Control.Monad module defines a typeclass, MonadPlus, that lets us abstract the common pattern out of our case expressions:
-- file: ch15/VCard.hs

class Monad m => MonadPlus m where

   mzero :: m a

   mplus :: m a -> m a -> m a
The value mzero represents an empty result, while mplus combines two results into one. Here are the standard definitions of mzero and mplus for Maybe and lists:
-- file: ch15/VCard.hs

instance MonadPlus [] where

   mzero = []

   mplus = (++)



instance MonadPlus Maybe where

   mzero = Nothing



   Nothing `mplus` ys  = ys

   xs      `mplus` _ = xs
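With the Maybe instance in hand, the case expression in onePersonalPhone can be replaced by a single use of mplus. The sketch below imports mplus from Control.Monad rather than redefining the class; the name onePersonalPhone' and the mobileOnly sample are hypothetical.

```haskell
import Control.Monad (mplus)

data Context = Home | Mobile | Business
               deriving (Eq, Show)

type Phone = String

-- Prefer a home number; fall back to a mobile number.
onePersonalPhone' :: [(Context, Phone)] -> Maybe Phone
onePersonalPhone' ps = lookup Home ps `mplus` lookup Mobile ps

nils, twalumba, mobileOnly :: [(Context, Phone)]
nils = [(Mobile, "+47-922-55-512"), (Business, "+47-922-12-121"),
        (Home, "+47-925-55-121"), (Business, "+47-922-25-551")]
twalumba = [(Business, "+260-02-55-5121")]
mobileOnly = [(Mobile, "+47-922-55-512")]  -- hypothetical sample
```

Here onePersonalPhone' nils is Just "+47-925-55-121", onePersonalPhone' mobileOnly is Just "+47-922-55-512", and onePersonalPhone' twalumba is Nothing. Note that for lists, mplus is (++), so the same expression over two filter results would return both sets of matches instead of falling back only when the first is empty.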
End of content preview. The rest of this section cannot be viewed here.
Adventures in Hiding the Plumbing
Content preview
Earlier, we showed how to use the State monad to give ourselves access to random numbers in a way that is easy to use.
A drawback of the code we developed is that it’s leaky: Users know that they’re executing inside the State monad. This means that they can inspect and modify the state of the random number generator just as easily as we, the authors, can.
Human nature dictates that if we leave our internal workings exposed, someone will surely come along and monkey with them. For a sufficiently small program, this may be fine, but in a larger software project, when one consumer of a library modifies its internals in a way that other consumers are not prepared for, the resulting bugs can be among the most difficult to track down. These bugs occur at a level where we’re unlikely to question our basic assumptions about a library until long after we’ve exhausted all other avenues of inquiry.
Even worse, once we leave our implementation exposed for a while, and some well-intentioned person inevitably bypasses our APIs and uses the implementation directly, we have a nasty quandary if we need to fix a bug or make an enhancement. Either we can modify our internals and break code that depends on them; or we’re stuck with our existing internals and must try to find some other way to make the change that we need.
How can we revise our random number monad so that the fact that we’re using the State monad is hidden? We need to somehow prevent our users from being able to call get or put. This is not difficult to do, and it introduces some tricks that we’ll reuse often in day-to-day Haskell programming.
To widen our scope, we’ll move beyond random numbers and implement a monad that supplies unique values of any kind. The name we’ll give to our monad is Supply. We’ll provide the execution function, runSupply, with a list of values (it will be up to us to ensure that each one is unique):
-- file: ch15/Supply.hs

runSupply :: Supply s a -> [s] -> (a, [s])
The monad won’t care what the values are. They might be random numbers, or names for temporary files, or identifiers for HTTP cookies.
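The preview stops before showing an implementation, so the following is only a sketch of one way a Supply monad with this runSupply signature could look. It does not wrap the State monad; it uses a plain function over the list of values so that it depends only on base, and it includes the Functor and Applicative instances that GHC 7.10 and later require. The constructor name S, the next action shown here, and the twoValues example are assumptions.

```haskell
-- In a real module, the export list would name the Supply type,
-- runSupply, and next, but not the S constructor, so that callers
-- cannot reach the underlying list of values.
newtype Supply s a = S ([s] -> (a, [s]))

runSupply :: Supply s a -> [s] -> (a, [s])
runSupply (S f) = f

instance Functor (Supply s) where
    fmap f (S g) = S $ \ss -> let (a, ss') = g ss in (f a, ss')

instance Applicative (Supply s) where
    pure a = S $ \ss -> (a, ss)
    S f <*> S g = S $ \ss ->
        let (h, ss')  = f ss
            (a, ss'') = g ss'
        in (h a, ss'')

instance Monad (Supply s) where
    S g >>= k = S $ \ss ->
        let (a, ss') = g ss
        in runSupply (k a) ss'

-- Hand out the next value, or Nothing if the supply is exhausted.
next :: Supply s (Maybe s)
next = S $ \ss -> case ss of
    []     -> (Nothing, [])
    (x:xs) -> (Just x, xs)

-- Example action: draw two values.
twoValues :: Supply s (Maybe s, Maybe s)
twoValues = do
    a <- next
    b <- next
    return (a, b)
```

For example, runSupply twoValues [1, 2, 3] gives ((Just 1, Just 2), [3]), and runSupply twoValues [1] gives ((Just 1, Nothing), []).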
End of content preview. The rest of this section cannot be viewed here.
Separating Interface from Implementation
Content preview
In the previous section, we saw how to hide the fact that we’re using a State monad to hold the state for our Supply monad.
Another important way to make code more modular involves separating its interface (what the code can do) from its implementation—how it does it.
The standard random number generator in System.Random is known to be quite slow. If we use our randomsIO function to provide it with random numbers, then our next action will not perform well.
One simple and effective way that we could deal with this is to provide Supply with a better source of random numbers. Let’s set this idea aside, though, and consider an alternative approach, one that is useful in many settings. We will separate the actions we can perform with the monad from how it works using a typeclass:
-- file: ch15/SupplyClass.hs

class (Monad m) => MonadSupply s m | m -> s where

    next :: m (Maybe s)
This typeclass defines the interface that any supply monad must implement. It bears careful inspection, since it uses several unfamiliar Haskell language extensions. We will cover each one in the sections that follow.
How should we read the snippet MonadSupply s m in the typeclass? If we add parentheses, an equivalent expression is (MonadSupply s) m, which is a little clearer. In other words, given some type variable m that is a Monad, we can make it an instance of the typeclass MonadSupply s. Unlike a regular typeclass, this one has a parameter.
As this language extension allows a typeclass to have more than one parameter, its name is MultiParamTypeClasses. The parameter s serves the same purpose as the Supply type’s parameter of the same name: it represents the type of the values handed out by the next function.
Notice that we don’t need to mention (>>=) or return in the definition of MonadSupply s, since the typeclass’s context (superclass) requires that a MonadSupply s must already be a Monad.
To revisit a snippet that we ignored earlier, | m -> s is a functional dependency, often called a fundep. We can read the vertical bar as “such that,” and the arrow as “uniquely determines.” Our functional dependency establishes a
End of content preview. The rest of this section cannot be viewed here.
The Reader Monad
Content preview
The State monad lets us plumb a piece of mutable state through our code. Sometimes, we would like to be able to pass some immutable state around, such as a program’s configuration data. We could use the State monad for this purpose, but we might then find ourselves accidentally modifying data that should remain unchanged.
Let’s forget about monads for a moment and think about what a function with our desired characteristics ought to do. It should accept a value of some type e (for environment) that represents the data that we’re passing in, and return a value of some other type a as its result. The overall type we want is e -> a.
To turn this type into a convenient Monad instance, we’ll wrap it in a newtype:
-- file: ch15/SupplyInstance.hs

newtype Reader e a = R { runReader :: e -> a }
Making this into a Monad instance doesn’t take much work:
-- file: ch15/SupplyInstance.hs

instance Monad (Reader e) where

    return a = R $ \_ -> a

    m >>= k = R $ \r -> runReader (k (runReader m r)) r
We can think of our value of type e as an environment in which we’re evaluating some expression. The return action should have the same effect no matter what the environment is, so our version ignores its environment.
Our definition of (>>=) is a little more complicated, but only because we have to make the environment—here the variable r—available both in the current computation and in the computation we’re chaining into.
How does a piece of code executing in this monad find out what’s in its environment? It simply has to ask:
-- file: ch15/SupplyInstance.hs

ask :: Reader e e

ask = R id
Within a given chain of actions, every invocation of ask will return the same value, since the value stored in the environment doesn’t change. Our code is easy to test in ghci:
runReader (ask >>= \x -> return (x * 3)) 2

Loading package old-locale-1.0.0.0 ... linking ... done.

Loading package old-time-1.0.0.0 ... linking ... done.

Loading package random-1.0.0.0 ... linking ... done.

6
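The Monad instance above is written for the compilers this text assumes. On GHC 7.10 and later, Applicative is a superclass of Monad, so the instance is rejected unless Functor and Applicative instances are also present. The following self-contained sketch adds them; the greeting action is a hypothetical example.

```haskell
newtype Reader e a = R { runReader :: e -> a }

instance Functor (Reader e) where
    fmap f m = R $ \r -> f (runReader m r)

instance Applicative (Reader e) where
    -- Ignore the environment, as return does in the text.
    pure a    = R $ \_ -> a
    -- Run both computations in the same environment.
    mf <*> ma = R $ \r -> runReader mf r (runReader ma r)

instance Monad (Reader e) where
    m >>= k = R $ \r -> runReader (k (runReader m r)) r

ask :: Reader e e
ask = R id

-- Hypothetical example: the environment is a user's name.
greeting :: Reader String String
greeting = do
    name <- ask
    return ("hello, " ++ name)
```

With these definitions, runReader (ask >>= \x -> return (x * 3)) 2 still evaluates to 6, and runReader greeting "world" evaluates to "hello, world".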

The Reader monad is included in the standard library, which is usually bundled with
End of content preview. The rest of this section cannot be viewed here.
A Return to Automated Deriving
Content preview
Now that we know about the Reader monad, let’s use it to create an instance of our MonadSupply typeclass. To keep our example simple, we’ll violate the spirit of MonadSupply here: our next action will always return the same value, instead of always returning a different one.
It would be a bad idea to directly make the Reader type an instance of the MonadSupply class, because then any Reader could act as a MonadSupply. This would usually not make any sense.
Instead, we create a newtype based on Reader. The newtype hides the fact that we’re using Reader internally. We must now make our type an instance of both of the typeclasses we care about. With the GeneralizedNewtypeDeriving extension enabled, GHC will do most of the hard work for us:
-- file: ch15/SupplyInstance.hs

newtype MySupply e a = MySupply { runMySupply :: Reader e a }

    deriving (Monad)



instance MonadSupply e (MySupply e) where

    next = MySupply $ do

             v <- ask

             return (Just v)



    -- more concise:

    -- next = MySupply (Just `liftM` ask)
Notice that we must make our type an instance of MonadSupply e, not MonadSupply. If we omit the type variable, the compiler will complain.
To try out our MySupply type, we’ll first create a simple function that should work with any MonadSupply instance:
-- file: ch15/SupplyInstance.hs

xy :: (Num s, MonadSupply s m) => m s

xy = do

  Just x <- next

  Just y <- next

  return (x * y)
If we use this with our Supply monad and randomsIO function, we get a different answer every time, as we expect:
(fst . runSupply xy) `fmap` randomsIO

3155268008533561605104245047686121354

(fst . runSupply xy) `fmap` randomsIO

1764220767702892260034822063450517650
Because our MySupply monad has two layers of wrapping, we can write a custom execution function for it to make it easier to use:
-- file: ch15/SupplyInstance.hs

runMS :: MySupply i a -> i -> a

runMS = runReader . runMySupply
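The pieces of this section can be assembled into one file, sketched below under a few assumptions. It targets GHC 7.10 or later, so Functor and Applicative are derived alongside Monad. The set of LANGUAGE pragmas is our guess at what a current GHC needs. The action xyMaybe is a hypothetical variant of xy that returns a Maybe instead of pattern matching on Just, because a failable pattern in a do block requires a MonadFail constraint on recent compilers.

```haskell
{-# LANGUAGE FlexibleInstances, FunctionalDependencies,
             GeneralizedNewtypeDeriving, MultiParamTypeClasses #-}

import Control.Monad (liftM2)

-- The Reader type from the previous section, with the extra
-- instances that newer compilers require.
newtype Reader e a = R { runReader :: e -> a }

instance Functor (Reader e) where
    fmap f m = R $ \r -> f (runReader m r)

instance Applicative (Reader e) where
    pure a    = R $ \_ -> a
    mf <*> ma = R $ \r -> runReader mf r (runReader ma r)

instance Monad (Reader e) where
    m >>= k = R $ \r -> runReader (k (runReader m r)) r

ask :: Reader e e
ask = R id

class (Monad m) => MonadSupply s m | m -> s where
    next :: m (Maybe s)

newtype MySupply e a = MySupply { runMySupply :: Reader e a }
    deriving (Functor, Applicative, Monad)

-- Every call to next hands out the environment value.
instance MonadSupply e (MySupply e) where
    next = MySupply (fmap Just ask)

-- Multiply two supplied values, if both are available.
xyMaybe :: (Num s, MonadSupply s m) => m (Maybe s)
xyMaybe = do
    mx <- next
    my <- next
    return (liftM2 (*) mx my)

runMS :: MySupply i a -> i -> a
runMS = runReader . runMySupply
```

Since next always returns the environment, runMS xyMaybe 2 evaluates to Just 4 no matter how often it is run.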
When we apply our xy action using this execution function, we get the same answer every time. Our code remains the same, but because we are executing it in a different implementation of
End of content preview. The rest of this section cannot be viewed here.
Hiding the IO Monad
Content preview
The blessing and curse of the IO monad is that it is extremely powerful. If we believe that careful use of types helps us to avoid programming mistakes, then the IO monad should be a great source of unease. Because the IO monad imposes no restrictions on what we can do, it leaves us vulnerable to all kinds of accidents.
How can we tame its power? Let’s say that we would like to guarantee to ourselves that a piece of code can read and write files on the local filesystem, but it will not access the network. We can’t use the plain IO monad, because it won’t restrict us.
Let’s create a module that provides a small set of functionality for reading and writing files:
-- file: ch15/HandleIO.hs

{-# LANGUAGE GeneralizedNewtypeDeriving #-}



module HandleIO

    (

      HandleIO

    , Handle

    , IOMode(..)

    , runHandleIO

    , openFile

    , hClose

    , hPutStrLn

    ) where

    

import System.IO (Handle, IOMode(..))

import qualified System.IO
Our first approach to creating a restricted version of IO is to wrap it with a newtype:
-- file: ch15/HandleIO.hs

newtype HandleIO a = HandleIO { runHandleIO :: IO a }

    deriving (Monad)
We do the by now familiar trick of exporting the type constructor and the runHandleIO execution function from our module, but not the data constructor. This will prevent code running within the HandleIO monad from getting hold of the IO monad that it wraps.
All that remains is for us to wrap each of the actions that we want our monad to allow. This is a simple matter of wrapping each IO with a HandleIO data constructor:
-- file: ch15/HandleIO.hs

openFile :: FilePath -> IOMode -> HandleIO Handle

openFile path mode = HandleIO (System.IO.openFile path mode)



hClose :: Handle -> HandleIO ()

hClose = HandleIO . System.IO.hClose



hPutStrLn :: Handle -> String -> HandleIO ()

hPutStrLn h s = HandleIO (System.IO.hPutStrLn h s)
We can now use our restricted HandleIO monad to perform I/O:
-- file: ch15/HandleIO.hs

safeHello :: FilePath -> HandleIO ()

safeHello path = do

  h <- openFile path WriteMode

  hPutStrLn h "hello world"

  hClose h
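On GHC 7.10 and later, deriving only Monad for HandleIO is rejected, because Applicative (and therefore Functor) must be instances as well. The sketch below shows the adjusted deriving clause with a single wrapped action. It omits the module header and export list for brevity, and the greet action is a hypothetical example that writes to stdout instead of a file.

```haskell
{-# LANGUAGE GeneralizedNewtypeDeriving #-}

import System.IO (Handle, stdout)
import qualified System.IO

-- Derive the whole superclass chain, not just Monad.
newtype HandleIO a = HandleIO { runHandleIO :: IO a }
    deriving (Functor, Applicative, Monad)

-- One wrapped action, as in the text.
hPutStrLn :: Handle -> String -> HandleIO ()
hPutStrLn h s = HandleIO (System.IO.hPutStrLn h s)

-- Code in HandleIO can only use the actions we have wrapped.
greet :: HandleIO ()
greet = do
    hPutStrLn stdout "hello"
    hPutStrLn stdout "world"
```

Running runHandleIO greet in ghci prints the two lines.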
End of content preview. The rest of this section cannot be viewed here.
Chapter 16: Using Parsec
Content preview
Parsing a file, or data of various types, is a common task for programmers. We already learned about Haskell’s support for regular expressions in an earlier chapter. Regular expressions are nice for many tasks, but they rapidly become unwieldy, or cannot be used at all, when dealing with a complex data format. For instance, we cannot use regular expressions to parse source code from most programming languages.
Parsec is a useful parser combinator library, with which we combine small parsing functions to build more sophisticated parsers. Parsec provides some simple parsing functions, as well as functions to tie them all together. It should come as no surprise that this parser library for Haskell is built around the notion of functions.
It’s helpful to know where Parsec fits compared to the tools used for parsing in other languages. Parsing is sometimes divided into two stages: lexical analysis (the domain of tools such as flex) and parsing itself (performed by programs such as bison). Parsec can perform both lexical analysis and parsing.
End of preview. The remaining content of this section is not available here.
First Steps with Parsec: Simple CSV Parsing
Preview
Let’s jump right in and write some code for parsing a CSV file. CSV files are often used as a plain-text representation of spreadsheets or databases. Each line is a record, and each field in the record is separated from the next by a comma. There are ways of dealing with fields that contain commas, but we won’t worry about that now.
This first example is much longer than it really needs to be. We will soon introduce more Parsec features that will shrink the parser down to only four lines!
-- file: ch16/csv1.hs
import Text.ParserCombinators.Parsec

{- A CSV file contains 0 or more lines, each of which is terminated
   by the end-of-line character (eol). -}
csvFile :: GenParser Char st [[String]]
csvFile =
    do result <- many line
       eof
       return result

-- Each line contains 1 or more cells, separated by a comma
line :: GenParser Char st [String]
line =
    do result <- cells
       eol                       -- end of line
       return result

-- Build up a list of cells.  Try to parse the first cell, then figure out
-- what ends the cell.
cells :: GenParser Char st [String]
cells =
    do first <- cellContent
       next <- remainingCells
       return (first : next)

-- The cell either ends with a comma, indicating that 1 or more cells follow,
-- or it doesn't, indicating that we're at the end of the cells for this line
remainingCells :: GenParser Char st [String]
remainingCells =
    (char ',' >> cells)            -- Found comma?  More cells coming
    <|> (return [])                -- No comma?  Return [], no more cells

-- Each cell contains 0 or more characters, which must not be a comma or
-- EOL
cellContent :: GenParser Char st String
cellContent =
    many (noneOf ",\n")

-- The end of line character is \n
eol :: GenParser Char st Char
eol = char '\n'

parseCSV :: String -> Either ParseError [[String]]
parseCSV input = parse csvFile "(unknown)" input
Let’s take a look at the code for this example. We didn’t use many shortcuts here, so remember that this will get shorter and simpler!
We’ve built it from the top down, so our first function is
End of preview. The remaining content of this section is not available here.
The sepBy and endBy Combinators
Preview
We promised you earlier that we could simplify our CSV parser significantly by using a few Parsec helper functions. There are two that will dramatically simplify this code.
The first tool is the sepBy function. This function takes two functions as arguments: the first parses some sort of content, while the second parses a separator. sepBy starts by trying to parse content, and then separators, and alternates back and forth until it can’t parse a separator. It returns a list of all the content that it was able to parse.
The second tool is endBy. It’s similar to sepBy, but expects the very last item to be followed by the separator. That is, it continues parsing until it can’t parse any more content.
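The difference between the two combinators is easiest to see on tiny inputs. The following sketch assumes only the Parsec import used elsewhere in this chapter; the names run, word, commaSeparated, and semicolonTerminated are hypothetical helpers invented for the illustration.

```haskell
import Text.ParserCombinators.Parsec

-- Hypothetical helper: run a parser over an entire string and keep only
-- a successful result, discarding any error message.
run :: CharParser () a -> String -> Maybe a
run p input = either (const Nothing) Just (parse whole "(example)" input)
  where whole = do { x <- p; eof; return x }

word :: CharParser () String
word = many1 letter

-- sepBy: separators appear only between items.
commaSeparated :: CharParser () [String]
commaSeparated = word `sepBy` char ','

-- endBy: every item, including the last, is followed by the terminator.
semicolonTerminated :: CharParser () [String]
semicolonTerminated = word `endBy` char ';'
```

With these definitions, run commaSeparated "a,b,c" and run semicolonTerminated "a;b;c;" both give Just ["a","b","c"], whereas run semicolonTerminated "a;b;c" gives Nothing because the last item lacks its terminator.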
So, we can use endBy to parse lines, since every line must end with the end-of-line character. We can use sepBy to parse cells, since the last cell will not end with a comma. Take a look at how much simpler our parser is now:
-- file: ch16/csv2.hs
import Text.ParserCombinators.Parsec

csvFile = endBy line eol
line = sepBy cell (char ',')
cell = many (noneOf ",\n")
eol = char '\n'

parseCSV :: String -> Either ParseError [[String]]
parseCSV input = parse csvFile "(unknown)" input
This program behaves exactly the same as the first one. We can verify this by using ghci to rerun our earlier examples; we’ll get the same result from every one. Yet the program is much shorter and more readable. It won’t be long before you can translate Parsec code such as this into a file format definition in plain English. As you read over this code, you can see that:
  • A CSV file contains zero or more lines, each of which is terminated by the end-of-line character.
  • A line contains one or more cells, separated by a comma.
  • A cell contains zero or more characters, which must be neither the comma nor the end-of-line character.
  • The end-of-line character is the newline, \n.
End of preview. The remaining content of this section is not available here.
Choices and Errors
Preview
Different operating systems use different characters to mark the end of line. Unix/Linux systems, and Windows in text mode, use simply "\n". DOS and Windows systems use "\r\n", and Macs traditionally use "\r". We could add support for "\n\r" too, just in case anybody uses that.
We could easily adapt our example to be able to handle all these types of line endings in a single file. We would need to make two modifications: adjust eol to recognize the different endings, and adjust the noneOf pattern in cell to ignore \r.
This must be done carefully. Recall that our earlier definition of eol was simply char '\n'. There is a parser called string that we can use to match the multicharacter patterns. Let’s start by thinking of how we would add support for \n\r.
Our first attempt might look like this:
-- file: ch16/csv3.hs
-- This function is not correct!
eol = string "\n" <|> string "\n\r"
This isn’t quite right. Recall that the <|> operator always tries the left alternative first. Looking for the single character \n will match both types of line endings, so for a \n\r ending the parser will behave as though the following line begins with \r. That is not what we want. Try it in ghci:
ghci> :m Text.ParserCombinators.Parsec
ghci> let eol = string "\n" <|> string "\n\r"
Loading package parsec-2.1.0.1 ... linking ... done.
ghci> parse eol "" "\n"
Right "\n"
ghci> parse eol "" "\n\r"
Right "\n"
It may seem like the parser worked for both endings, but actually looking at it this way, we can’t tell. If it left something unparsed, we don’t know, because we’re not trying to consume anything else from the input. So let’s look for the end of file after our end of line:
ghci> parse (eol >> eof) "" "\n\r"
Left (line 2, column 1):
unexpected "\r"
expecting end of input
ghci> parse (eol >> eof) "" "\n"
Right ()
As expected, we got an error from the \n\r ending. So the next temptation may be to try it this way:
-- file: ch16/csv4.hs
-- This function is not correct!
eol = string "\n\r" <|> string "\n"
This also isn’t right. Recall that <|> attempts the option on the right only if the option on the left fails without consuming any input. But by the time we are able to see if there is a
End of preview. The remaining content of this section is not available here.
Extended Example: Full CSV Parser
Preview
Our earlier CSV examples have had an important flaw: they weren’t able to handle cells that contain a comma. CSV-generating programs typically put quotation marks around such data. But then you have another problem: what to do if a cell contains a quotation mark and a comma. In these cases, the embedded quotation marks are doubled up.
Here is a full CSV parser. You can use this from ghci, or if you compile it to a standalone program, it will parse a CSV file on standard input and convert it to a different format on output:
-- file: ch16/csv9.hs
import Text.ParserCombinators.Parsec

csvFile = endBy line eol
line = sepBy cell (char ',')
cell = quotedCell <|> many (noneOf ",\n\r")

quotedCell =
    do char '"'
       content <- many quotedChar
       char '"' <?> "quote at end of cell"
       return content

quotedChar =
        noneOf "\""
    <|> try (string "\"\"" >> return '"')

eol =   try (string "\n\r")
    <|> try (string "\r\n")
    <|> string "\n"
    <|> string "\r"
    <?> "end of line"

parseCSV :: String -> Either ParseError [[String]]
parseCSV input = parse csvFile "(unknown)" input

main =
    do c <- getContents
       case parse csvFile "(stdin)" c of
            Left e -> do putStrLn "Error parsing input:"
                         print e
            Right r -> mapM_ print r
That’s a full-featured CSV parser in just 21 lines of code, plus an additional 10 lines for the parseCSV and main utility functions.
Let’s look at the changes in this program from the previous versions. First, a cell may now be either a bare cell or a quoted cell. We give the quotedCell option first, because we want to follow that path if the first character in a cell is the quote mark.
The quotedCell begins and ends with a quote mark and contains zero or more characters. These characters can’t be copied directly, though, because they may contain embedded, doubled-up quote marks themselves, so we define a custom quotedChar to process them.
When we’re processing characters inside a quoted cell, we first say noneOf "\"". This will match and return any single character as long as it’s not the quote mark. Otherwise, if it is the quote mark, we see if we have two in a row. If so, we return a single quote mark to go on our result string.
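The quoted-cell logic can be exercised on its own. The sketch below repeats quotedCell and quotedChar with explicit type signatures (an addition, so that the fragment compiles as a standalone file) and adds a hypothetical parseCell helper for experimenting.

```haskell
import Text.ParserCombinators.Parsec

quotedCell :: CharParser () String
quotedCell =
    do _ <- char '"'
       content <- many quotedChar
       _ <- char '"' <?> "quote at end of cell"
       return content

-- Any character other than a quote, or a doubled quote standing for one quote.
-- The try lets the parser back out when it sees a single, closing quote.
quotedChar :: CharParser () Char
quotedChar =
        noneOf "\""
    <|> try (string "\"\"" >> return '"')

-- Hypothetical helper for experimenting with single cells.
parseCell :: String -> Either ParseError String
parseCell = parse quotedCell "(example)"
```

For instance, parseCell "\"he said \"\"hi\"\"\"" should yield Right "he said \"hi\"": each doubled quote collapses to a single one, and the final lone quote closes the cell.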
End of preview. The remaining content of this section is not available here.
Parsec and MonadPlus
Preview
Parsec’s GenParser monad is an instance of the MonadPlus typeclass that we introduced earlier. The value mzero represents a parse failure, while mplus combines two alternative parses into one, using (<|>):
-- file: ch16/ParsecPlus.hs
instance MonadPlus (GenParser tok st) where
    mzero = fail "mzero"
    mplus = (<|>)
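Because of this instance, generic MonadPlus functions from Control.Monad work on parsers. The following sketch assumes that the installed Parsec library already supplies the instance (so it is not declared again here); digitOrLetter, anyOf, and run are hypothetical names used only for illustration.

```haskell
import Control.Monad (mplus, msum)
import Text.ParserCombinators.Parsec

-- mplus behaves like (<|>): try the left parser, then the right one.
digitOrLetter :: CharParser () Char
digitOrLetter = digit `mplus` letter

-- msum folds a list of alternatives with mplus; an empty list gives mzero,
-- a parser that always fails.
anyOf :: String -> CharParser () Char
anyOf = msum . map char

-- Hypothetical helper: keep only a successful result.
run :: CharParser () a -> String -> Maybe a
run p = either (const Nothing) Just . parse p "(example)"
```

Here run (anyOf "abc") "b" gives Just 'b', while run (anyOf "") "b" gives Nothing, since msum of an empty list is mzero.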
End of preview. The remaining content of this section is not available here.
Parsing a URL-Encoded Query String
Preview
When we introduced URL-encoded text earlier in the book, we mentioned that we’d write a parser for these strings. We can quickly and easily do this using Parsec.
Each key-value pair is separated by the & character:
-- file: ch16/FormParse.hs
p_query :: CharParser () [(String, Maybe String)]
p_query = p_pair `sepBy` char '&'
Notice that in the type signature, we’re using Maybe to represent a value: the HTTP specification is unclear about whether a key must have an associated value, and we’d like to be able to distinguish between "no value" and "empty value":
-- file: ch16/FormParse.hs
p_pair :: CharParser () (String, Maybe String)
p_pair = do
  name <- many1 p_char
  value <- optionMaybe (char '=' >> many p_char)
  return (name, value)
The many1 function is similar to many: it applies its parser repeatedly, returning a list of results. While many will succeed and return an empty list if its parser never succeeds, many1 will fail if its parser never succeeds and will otherwise return a list of at least one element.
The optionMaybe function modifies the behavior of a parser. If the parser fails without consuming input, optionMaybe doesn’t: it returns Nothing. Otherwise, it wraps the parser’s successful result with Just. This gives us the ability to distinguish between "no value" and "empty value," as we mentioned earlier.
Individual characters can be encoded in one of several ways:
-- file: ch16/FormParse.hs
p_char :: CharParser () Char
p_char = oneOf urlBaseChars
     <|> (char '+' >> return ' ')
     <|> p_hex

urlBaseChars = ['a'..'z']++['A'..'Z']++['0'..'9']++"$-_.!*'(),"

p_hex :: CharParser () Char
p_hex = do
  char '%'
  a <- hexDigit
  b <- hexDigit
  let ((d, _):_) = readHex [a,b]
  return . toEnum $ d
Some characters can be represented literally. Spaces are treated specially, using a + character. Other characters must be encoded as a % character followed by two hexadecimal digits. The Numeric module’s readHex parses a hex string as a number:
ghci> parseTest p_query "foo=bar&a%21=b+c"
Loading package parsec-2.1.0.1 ... linking ... done.
[("foo",Just "bar"),("a!",Just "b c")]
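Assembled into one file, the fragments above might look like the following sketch. The imports and the queryPairs helper are additions made for the illustration (the fragments do not show the file's import section), and the type signature on urlBaseChars is likewise an addition.

```haskell
import Numeric (readHex)
import Text.ParserCombinators.Parsec

p_query :: CharParser () [(String, Maybe String)]
p_query = p_pair `sepBy` char '&'

p_pair :: CharParser () (String, Maybe String)
p_pair = do
  name <- many1 p_char
  value <- optionMaybe (char '=' >> many p_char)
  return (name, value)

p_char :: CharParser () Char
p_char = oneOf urlBaseChars
     <|> (char '+' >> return ' ')
     <|> p_hex

urlBaseChars :: String
urlBaseChars = ['a'..'z']++['A'..'Z']++['0'..'9']++"$-_.!*'(),"

p_hex :: CharParser () Char
p_hex = do
  _ <- char '%'
  a <- hexDigit
  b <- hexDigit
  let ((d, _):_) = readHex [a,b]
  return . toEnum $ d

-- Hypothetical helper: parse a query string, discarding any error message.
queryPairs :: String -> Maybe [(String, Maybe String)]
queryPairs = either (const Nothing) Just . parse p_query "(query)"
```

The distinction between "no value" and "empty value" shows up directly: queryPairs "a&b=&c=d" should give Just [("a",Nothing),("b",Just ""),("c",Just "d")].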

End of preview. The remaining content of this section is not available here.
Supplanting Regular Expressions for Casual Parsing
Preview
In many popular languages, people tend to put regular expressions to work for "casual" parsing. They’re notoriously tricky for this purpose: hard to write, difficult to debug, nearly incomprehensible after a few months of neglect, and they provide no error messages on failure.
If we can write compact Parsec parsers, we’ll gain in readability, expressiveness, and error reporting. Our parsers won’t be as short as regular expressions, but they’ll be close enough to negate much of the temptation of regexps.
End of preview. The remaining content of this section is not available here.
Parsing Without Variables
Preview
A few of our parsers just shown use do notation and bind the result of an intermediate parse to a variable for later use. One such function is p_pair:
-- file: ch16/FormParse.hs
p_pair :: CharParser () (String, Maybe String)
p_pair = do
  name <- many1 p_char
  value <- optionMaybe (char '=' >> many p_char)
  return (name, value)
We can get rid of the need for explicit variables by using the liftM2 combinator from Control.Monad:
-- file: ch16/FormParse.hs
p_pair_app1 =
    liftM2 (,) (many1 p_char) (optionMaybe (char '=' >> many p_char))
This parser has exactly the same type and behavior as p_pair, but it’s one line long. Instead of writing our parser in a "procedural" style, we’ve simply switched to a programming style that emphasizes that we’re applying parsers and combining their results.
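The equivalence of the two styles can be checked on a few inputs. To keep the sketch short, it substitutes a simplified character parser (simpleChar, which accepts only letters and digits) for p_char; that substitution and the names pairDo, pairLifted, and run are assumptions made for this illustration.

```haskell
import Control.Monad (liftM2)
import Text.ParserCombinators.Parsec

-- Simplified stand-in for p_char: letters and digits only.
simpleChar :: CharParser () Char
simpleChar = alphaNum

-- "Procedural" style: bind intermediate results to names.
pairDo :: CharParser () (String, Maybe String)
pairDo = do
  name <- many1 simpleChar
  value <- optionMaybe (char '=' >> many simpleChar)
  return (name, value)

-- Lifted style: apply the pairing function (,) to the two parsers' results.
pairLifted :: CharParser () (String, Maybe String)
pairLifted =
    liftM2 (,) (many1 simpleChar) (optionMaybe (char '=' >> many simpleChar))

-- Hypothetical helper: keep only a successful result.
run :: CharParser () a -> String -> Maybe a
run p = either (const Nothing) Just . parse p "(example)"
```

Both run pairDo "k=v" and run pairLifted "k=v" should give Just ("k",Just "v"), and both give Nothing on "=v", where the key is missing.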
We can take this applicative style of writing a parser much further. In most cases, the extra compactness that we will gain will not come at any cost in readability, beyond the initial effort of coming to grips with the idea.
End of preview. The remaining content of this section is not available here.
Applicative Functors for Parsing
Preview
The standard Haskell libraries include a module named Control.Applicative, which we already encountered earlier in the book. This module defines a typeclass named Applicative, which represents an applicative functor. This is a little bit more structured than a functor, but a little bit less than a monad. It also defines Alternative, which is similar to MonadPlus.
As usual, we think that the best way to introduce applicative functors is to put them to work. In theory, every monad is an applicative functor, but not every applicative functor is a monad. Because applicative functors were added to the standard Haskell libraries long after monads, we often don’t get an Applicative instance for free; frequently, we have to declare the monad we’re using to be Applicative or Alternative.
To do this for Parsec, we’ll write a small module that we can import instead of the normal Parsec module:
-- file: ch16/ApplicativeParsec.hs
module ApplicativeParsec
    (
      module Control.Applicative
    , module Text.ParserCombinators.Parsec
    ) where

import Control.Applicative
import Control.Monad (MonadPlus(..), ap)
-- Hide a few names that are provided by Applicative.
import Text.ParserCombinators.Parsec hiding (many, optional, (<|>))

-- The Applicative instance for every Monad looks like this.
instance Applicative (GenParser s a) where
    pure  = return
    (<*>) = ap

-- The Alternative instance for every MonadPlus looks like this.
instance Alternative (GenParser s a) where
    empty = mzero
    (<|>) = mplus
For convenience, our module’s export section exports all the names we imported from both the Control.Applicative and Text.ParserCombinators.Parsec modules. Because we hid Parsec’s version of (<|>) when importing, the one that will be exported is from Control.Applicative, as we would like.
End of preview. The remaining content of this section is not available here.
Applicative Parsing by Example
Preview
We’ll start by rewriting our existing form parser from the bottom up, beginning with p_hex, which parses a hexadecimal escape sequence. Here’s the code in normal do-notation style:
-- file: ch16/FormApp.hs
p_hex :: CharParser () Char
p_hex = do
  char '%'
  a <- hexDigit
  b <- hexDigit
  let ((d, _):_) = readHex [a,b]
  return . toEnum $ d
And here’s our applicative version:
-- file: ch16/FormApp.hs
a_hex = hexify <$> (char '%' *> hexDigit) <*> hexDigit
    where hexify a b = toEnum . fst . head . readHex $ [a,b]
Although the individual parsers are mostly untouched, the combinators that we’re gluing them together with have changed. The only familiar one is (<$>), which we already know is a synonym for fmap.
From our definition of Applicative, we know that (<*>) is ap.
The remaining unfamiliar combinator is (*>), which applies its first argument, throws away its result, and then applies the second and returns its result. In other words, it’s similar to (>>).
Before we continue, here’s a useful aid for remembering what all the angle brackets are for in the combinators from Control.Applicative: if there’s an angle bracket pointing to a side, the result from that side should be used.
For example, (*>) returns the result on its right; (<*>) returns results from both sides; and (<*)—which we have not seen yet—returns the result on its left.
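The mnemonic can be tried out with a couple of tiny parsers. This sketch assumes a Parsec version that already provides an Applicative instance for its parser type and a GHC whose Prelude exports the applicative operators; with the ApplicativeParsec module shown earlier, one would import that module instead. The names parenthesized, letterThenDigit, and run are hypothetical.

```haskell
import Text.ParserCombinators.Parsec

-- (*>) keeps the result on its right, (<*) keeps the result on its left,
-- so only the digits between the parentheses are returned.
parenthesized :: CharParser () String
parenthesized = char '(' *> many1 digit <* char ')'

-- (<*>) uses the results from both sides; here they are paired up.
letterThenDigit :: CharParser () (Char, Char)
letterThenDigit = (,) <$> letter <*> digit

-- Hypothetical helper: keep only a successful result.
run :: CharParser () a -> String -> Maybe a
run p = either (const Nothing) Just . parse p "(example)"
```

Under those assumptions, run parenthesized "(42)" gives Just "42" and run letterThenDigit "a1" gives Just ('a','1').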
Although the concepts here should mostly be familiar from our earlier coverage of functors and monads, we’ll walk through this function to explain what’s happening. First, to get a grip on our types, we’ll hoist hexify to the top level and give it a signature:
-- file: ch16/FormApp.hs
hexify :: Char -> Char -> Char
hexify a b = toEnum . fst . head . readHex $ [a,b]
Parsec’s hexDigit parser parses a single hexadecimal digit:
ghci> :type hexDigit
hexDigit :: CharParser st Char

Therefore, char '%' *> hexDigit has the same type, since (*>) returns the result on its right. (The CharParser type is nothing more than a synonym for GenParser Char.)
ghci> :type char '%' *> hexDigit
char '%' *> hexDigit :: GenParser Char st Char

End of preview. The remaining content of this section is not available here.
Parsing JSON Data
Preview
To give ourselves a better feel for parsing with applicative functors, and to explore a few more corners of Parsec, we’ll write a JSON parser that follows the definition in RFC 4627.
At the top level, a JSON value must be either an object or an array:
-- file: ch16/JSONParsec.hs
p_text :: CharParser () JValue
p_text = spaces *> text
     <?> "JSON text"
    where text = JObject <$> p_object
             <|> JArray <$> p_array
These are structurally similar, with an opening character, followed by zero or more items separated by commas, followed by a closing character. We capture this similarity by writing a small helper function:
-- file: ch16/JSONParsec.hs
p_series :: Char -> CharParser () a -> Char -> CharParser () [a]
p_series left parser right =
    between (char left <* spaces) (char right) $
            (parser <* spaces) `sepBy` (char ',' <* spaces)
Here, we finally have a use for the (<*) combinator that we introduced earlier. We use it to skip over any whitespace that might follow certain tokens. With this p_series function, parsing an array is simple:
-- file: ch16/JSONParsec.hs
p_array :: CharParser () (JAry JValue)
p_array = JAry <$> p_series '[' p_value ']'
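Since p_series is generic in its element parser, it can be tried without any of the JSON types. In the sketch below, p_int and intList are hypothetical helpers invented for the illustration, and the code assumes a Parsec version with its own Applicative instance (otherwise the ApplicativeParsec module would be imported instead).

```haskell
import Text.ParserCombinators.Parsec

p_series :: Char -> CharParser () a -> Char -> CharParser () [a]
p_series left parser right =
    between (char left <* spaces) (char right) $
            (parser <* spaces) `sepBy` (char ',' <* spaces)

-- Hypothetical element parser: a run of decimal digits read as an Int.
p_int :: CharParser () Int
p_int = read <$> many1 digit

-- Hypothetical helper: parse a bracketed, comma-separated list of integers.
intList :: String -> Maybe [Int]
intList = either (const Nothing) Just . parse (p_series '[' p_int ']') "(example)"
```

Whitespace after the opening bracket, the items, and the commas is skipped, so intList "[ 1, 2 ,3 ]" gives Just [1,2,3]. An empty series is accepted too (intList "[]" gives Just []), while a trailing comma as in "[1,]" is rejected.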
Dealing with a JSON object is hardly more complicated, requiring just a little additional effort to produce a name/value pair for each of the object’s fields:
-- file: ch16/JSONParsec.hs
p_object :: CharParser () (JObj JValue)
p_object = JObj <$> p_series '{' p_field '}'
    where p_field = (,) <$> (p_string <* char ':' <* spaces) <*> p_value
Parsing an individual value is a matter of calling an existing parser, and then wrapping its result with the appropriate JValue constructor:
-- file: ch16/JSONParsec.hs
p_value :: CharParser () JValue
p_value = value <* spaces
  where value = JString <$> p_string
            <|> JNumber <$> p_number
            <|> JObject <$> p_object
            <|> JArray <$> p_array
            <|> JBool <$> p_bool
            <|> JNull <$ string "null"
            <?> "JSON value"

p_bool :: CharParser () Bool
p_bool = True <$ string "true"
     <|> False <$ string "false"
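The (<$) operator used for JNull and in p_bool replaces a parser's result with a constant value. A standalone sketch of p_bool follows; parseBool is a hypothetical helper, and the code again assumes a Parsec version with its own Functor and Applicative instances.

```haskell
import Text.ParserCombinators.Parsec

-- If the literal text matches, ignore the matched string and return the
-- constant on the left of (<$).
p_bool :: CharParser () Bool
p_bool = True <$ string "true"
     <|> False <$ string "false"

-- Hypothetical helper: keep only a successful result.
parseBool :: String -> Maybe Bool
parseBool = either (const Nothing) Just . parse p_bool "(example)"
```

Here parseBool "true" gives Just True, parseBool "false" gives Just False, and any other input, such as "null", gives Nothing.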
End of preview. The remaining content of this section is not available here.
Parsing an HTTP Request
Preview
As another example of applicative parsing, we will develop a basic parser for HTTP requests:
-- file: ch16/HttpRequestParser.hs
module HttpRequestParser
    (
      HttpRequest(..)
    , Method(..)
    , p_request
    , p_query
    ) where

import ApplicativeParsec
import Numeric (readHex)
import Control.Monad (liftM4)
import System.IO (Handle)
An HTTP request consists of a method, an identifier, a series of headers, and an optional body. For simplicity, we’ll focus on just two of the method types specified by the HTTP 1.1 standard. A POST method has a body; a GET has none:
-- file: ch16/HttpRequestParser.hs
data Method = Get | Post
          deriving (Eq, Ord, Show)

data HttpRequest = HttpRequest {
      reqMethod :: Method
    , reqURL :: String
    , reqHeaders :: [(String, String)]
    , reqBody :: Maybe String
    } deriving (Eq, Show)
Because we’re writing in an applicative style, our parser can be both brief and readable. Readable, that is, if you’re becoming used to the applicative parsing notation:
-- file: ch16/HttpRequestParser.hs
p_request :: CharParser () HttpRequest
p_request = q "GET" Get (pure Nothing)
        <|> q "POST" Post (Just <$> many anyChar)
  where q name ctor body = liftM4 HttpRequest req url p_headers body
            where req = ctor <$ string name <* char ' '
        url = optional (char '/') *>
              manyTill notEOL (try $ string " HTTP/1." <* oneOf "01")
              <* crlf
Briefly, the q helper function accepts a method name, the data constructor that represents it, and a parser for a request’s optional body. The url helper does not attempt to validate a URL, because the HTTP specification does not state what characters a URL can contain. The function just consumes input until either the line ends or it reaches an HTTP version identifier.
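To see the method and url pieces at work in isolation, here is a reduced sketch that parses only the request line. The definitions of notEOL and of the line-ending parser are not shown in this preview, so the versions below (notEOL and lineEnd) are hypothetical stand-ins, as are requestLine and parseRequestLine; headers and bodies are left out entirely. The code assumes a Parsec version with its own Applicative instance.

```haskell
import Text.ParserCombinators.Parsec

data Method = Get | Post
    deriving (Eq, Show)

-- Hypothetical stand-ins for helpers whose definitions are not shown.
notEOL :: CharParser () Char
notEOL = noneOf "\r\n"

lineEnd :: CharParser () ()
lineEnd = () <$ (try (string "\r\n") <|> string "\n")

-- Parse only the request line: method, URL, and HTTP version.
requestLine :: CharParser () (Method, String)
requestLine = (,) <$> method <*> url
  where method = (Get <$ string "GET" <|> Post <$ string "POST") <* char ' '
        url = optional (char '/') *>
              manyTill notEOL (try $ string " HTTP/1." <* oneOf "01")
              <* lineEnd

-- Hypothetical helper: keep only a successful result.
parseRequestLine :: String -> Maybe (Method, String)
parseRequestLine = either (const Nothing) Just . parse requestLine "(request)"
```

With these stand-ins, parseRequestLine "GET /index.html HTTP/1.1\r\n" gives Just (Get,"index.html"): the leading slash is dropped by optional, and manyTill stops collecting URL characters as soon as the version identifier matches.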
The try combinator has to hold onto input in case it needs to restore it so that an alternative parser can be used. This practice is referred to as backtracking. Because try must save input, it is expensive to use. Sprinkling a parser with unnecessary uses of
End of preview. The remaining content of this section is not available here.
Chapter 17: Interfacing with C: The FFI
Preview
Programming languages do not exist in perfect isolation. They inhabit an ecosystem of tools and libraries, built up over decades, and often written in a range of programming languages. Good engineering practice suggests we reuse that effort. The Haskell Foreign Function Interface (the FFI) is the means by which Haskell code can use, and be used by, code written in other languages. In this chapter, we’ll look at how the FFI works and how to produce a Haskell binding to a C library, including how to use an FFI preprocessor to automate much of the work. The challenge: take PCRE, the standard Perl-compatible regular expression library, and make it usable from Haskell in an efficient and functional way. Throughout, we’ll seek to abstract out manual effort required by the C implementation, delegating that work to Haskell to make the interface more robust, yielding a clean, high-level binding. We assume only some basic familiarity with regular expressions.
Binding one language to another is a nontrivial task. The binding language needs to understand the calling conventions, type system, data structures, memory allocation mechanisms, and linking strategy of the target language, just to get things working. The task is to carefully align the semantics of both languages so that both can understand the data that passes between them.
For Haskell, this technology stack is specified by the FFI addendum to the Haskell report. The FFI report describes how to correctly bind Haskell and C together and how to extend bindings to other languages. The standard is designed to be portable so that FFI bindings will work reliably across Haskell implementations, operating systems, and C compilers.
All implementations of Haskell support the FFI, and it is a key technology when using Haskell in a new field. Instead of reimplementing the standard libraries in a domain, we just bind to existing ones written in languages other than Haskell.
The FFI adds a new dimension of flexibility to the language: if we need to access raw hardware for some reason (say we’re programming new hardware or implementing an operating system), the FFI lets us get access to that hardware. It also gives us a performance escape hatch: if we can’t get a code hot spot fast enough, there’s always the option of trying again in C. So let’s look at what the FFI actually means for writing code.
End of preview. The remaining content of this section is not available here.
Foreign Language Bindings: The Basics
Preview
The most common operation we’ll want to do, unsurprisingly, is call a C function from Haskell. So let’s do that, by binding to some functions from the standard C math library. We’ll put the binding in a source file, and then compile it into a Haskell binary that makes use of the C code.
To start with, we need to enable the FFI extension, as the FFI addendum support isn’t enabled by default. We do this, as always, via a pragma at the top of our source file:
-- file: ch17/SimpleFFI.hs
{-# LANGUAGE ForeignFunctionInterface #-}
The pragmas indicate which extensions to Haskell 98 a module uses. We bring just the FFI extension in play this time. It is important to track which extensions to the language you need. Fewer extensions generally means more portable, more robust code. Indeed, it is common for Haskell programs written more than a decade ago to compile perfectly well today, thanks to standardization, despite changes to the language’s syntax, type system, and core libraries.
The next step is to import the Foreign modules, which provide useful types (such as pointers, numerical types, and arrays) and utility functions (such as malloc and alloca) for writing bindings to other languages:
-- file: ch17/SimpleFFI.hs
import Foreign
import Foreign.C.Types
For extensive work with foreign libraries, a good knowledge of the Foreign modules is essential. Other useful modules include Foreign.C.String, Foreign.Ptr, and Foreign.Marshal.Array.
Now we can get down to work calling C functions. To do this, we need to know three things: the name of the C function, its type, and its associated header file. Additionally, for code that isn’t provided by the standard C library, we’ll need to know the C library’s name for linking purposes. The actual binding work is done with a foreign import declaration, like so:
-- file: ch17/SimpleFFI.hs
foreign import ccall "math.h sin"
     c_sin :: CDouble -> CDouble
This defines a new Haskell function, c_sin, whose concrete implementation is in C, via the sin function. When c_sin is called, a call to the actual sin will be made (using the standard C calling convention, indicated by ccall). The Haskell runtime passes control to C, which returns its results back to Haskell. The result is then wrapped up as a Haskell value of type CDouble.
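Put together, the pragma, import, and binding form a small file that can be loaded into ghci. The sketch below adds a wrapper, here called fastsin, that converts between Haskell's Double and the C type with realToFrac; the wrapper and its name are assumptions made for this illustration. It also assumes that the C math library is linked by default, as is usual with GHC on common platforms.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
-- Importing the whole module brings the CDouble constructor into scope,
-- which GHC requires for types used in foreign declarations.
import Foreign.C.Types

-- Bind the C function sin from math.h as a pure Haskell function.
foreign import ccall "math.h sin"
     c_sin :: CDouble -> CDouble

-- Illustrative wrapper: convert to the C type, call C, convert back.
fastsin :: Double -> Double
fastsin x = realToFrac (c_sin (realToFrac x))
```

Evaluating fastsin (pi / 2) should then give a value very close to 1.0. Because sin has no side effects, importing it with a pure type is reasonable; a C function with side effects would need a result in IO instead.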
End of preview. The remaining content of this section is not available here.
Regular Expressions for Haskell: A Binding for PCRE
Preview
As we’ve seen in previous sections, Haskell programs have something of a bias towards lists as a foundational data structure. List functions are a core part of the base library, and convenient syntax for constructing and taking apart list structures is wired into the language. Strings are, of course, simply lists of characters (rather than, for example, flat arrays of characters). This flexibility is all well and good, but it results in a tendency for the standard library to favor polymorphic list operations at the expense of string-specific operations.
Indeed, many common tasks can be solved via regular-expression-based string processing, yet support for regular expressions isn’t part of the Haskell Prelude. So let’s look at how we’d take an off-the-shelf regular expression library, PCRE, and provide a natural, convenient Haskell binding to it, giving us useful regular expressions for Haskell.
PCRE itself is a ubiquitous C library implementing Perl-style regular expressions. It is widely available and preinstalled on many systems. You can find it at http://www.pcre.org/. In the following sections, we’ll assume the PCRE library and headers are available on the machine.
The simplest task when setting out to write a new FFI binding from Haskell to C is to bind constants defined in C headers to equivalent Haskell values. For example, PCRE provides a set of flags for modifying how the core pattern matching system works (such as ignoring case or allowing matching on newlines). These flags appear as numeric constants in the PCRE header files:
/* Options */

#define PCRE_CASELESS           0x00000001
#define PCRE_MULTILINE          0x00000002
#define PCRE_DOTALL             0x00000004
#define PCRE_EXTENDED           0x00000008
To export these values to Haskell, we need to insert them into a Haskell source file somehow. One obvious way to do this is by using the C preprocessor to substitute definitions from C into the Haskell source, which we then compile as a normal Haskell source file. Using the preprocessor, we can even declare simple constants, via textual substitutions on the Haskell source file:
End of preview. The remaining content of this section is not available here.
Passing String Data Between Haskell and C
Preview
The next task is to write a binding to the PCRE regular expression compilation function, pcre_compile. Let’s look at its type, straight from the pcre.h header file:
pcre *pcre_compile(const char *pattern,
                   int options,
                   const char **errptr,
                   int *erroffset,
                   const unsigned char *tableptr);
This function compiles a regular expression pattern into some internal format, taking the pattern as an argument, along with some flags and some variables for returning status information.
We need to work out what Haskell types to represent each argument with. Most of these types are covered by equivalents defined for us by the FFI standard and are available in Foreign.C.Types. The first argument, the regular expression itself, is passed to C as a pointer to a null-terminated string, equivalent to the Haskell CString type. We’ve already chosen to represent PCRE compile-time options as an abstract newtype whose runtime representation is a CInt. As the representations are guaranteed to be identical, we can pass the newtype safely. The other arguments are a little more complicated and require some work to construct and take apart.
The third argument, a pointer to a C string, will be used as a reference to any error message generated when compiling the expression. The value of the pointer will be modified by the C function to point to a custom error string. We can represent this with a Ptr CString type. Pointers in Haskell are heap-allocated containers for raw addresses and can be created and operated on with a number of allocation primitives in the FFI library. For example, we can represent a pointer to a C int as Ptr CInt, and a pointer to an unsigned char as a Ptr Word8.
Once we have a Haskell Ptr value handy, we can do various pointer-like things with it. We can compare it for equality with the null pointer, represented with the special nullPtr constant. We can cast a pointer from one type to a pointer to another with castPtr, or we can advance a pointer by an offset in bytes with plusPtr. We can even modify the value pointed to, using poke, and, of course, dereference a pointer, yielding the value it points to, with peek. In the majority of circumstances, a Haskell programmer doesn’t need to operate on pointers directly, but when they are needed, these tools come in handy.
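To make these pointer operations concrete, here is a small sketch that uses only the standard FFI modules shipped with GHC. The function names roundTrip, secondOf, and allocatedIsNull are hypothetical, and the temporary storage comes from alloca and withArray, which free it automatically when the enclosed action finishes:

```haskell
import Foreign.C.Types (CInt)
import Foreign.Marshal.Alloc (alloca)
import Foreign.Marshal.Array (withArray)
import Foreign.Ptr (Ptr, nullPtr, plusPtr)
import Foreign.Storable (peek, poke, sizeOf)

-- Allocate temporary space for one C int, store a value, and read it back.
roundTrip :: CInt -> IO CInt
roundTrip n = alloca $ \p -> do
  poke p n   -- modify the value pointed to
  peek p     -- dereference the pointer

-- Copy two values into a temporary C array, then advance the pointer by
-- the size of one element (in bytes) to read the second value.
secondOf :: CInt -> CInt -> IO CInt
secondOf a b = withArray [a, b] $ \p ->
  peek (p `plusPtr` sizeOf a)

-- A pointer handed out by alloca refers to real storage, so comparing it
-- with nullPtr should give False.
allocatedIsNull :: IO Bool
allocatedIsNull = alloca $ \p -> return (p == (nullPtr :: Ptr CInt))
```

Note that plusPtr counts in bytes, not in elements, which is why secondOf uses sizeOf to compute the offset.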
End of the content preview. The rest of this section cannot be viewed here.
Matching on Strings
Content preview
The second part of a good regular expression library is the matching function. Given a compiled regular expression, this function matches the compiled regex against some input, indicating whether it matched and, if so, which parts of the string matched. In PCRE, this function is pcre_exec, which has type:
int pcre_exec(const pcre *code,
              const pcre_extra *extra,
              const char *subject,
              int length,
              int startoffset,
              int options,
              int *ovector,
              int ovecsize);
The most important arguments are the input pointer structure (which we obtained from pcre_compile) and the subject string. The other arguments let us provide bookkeeping structures and space for return values. We can directly translate this type to the Haskell import declaration:
-- file: ch17/RegexExec.hs
foreign import ccall "pcre.h pcre_exec"
    c_pcre_exec     :: Ptr PCRE
                    -> Ptr PCREExtra
                    -> Ptr Word8
                    -> CInt
                    -> CInt
                    -> PCREExecOption
                    -> Ptr CInt
                    -> CInt
                    -> IO CInt
We use the same method as before to create typed pointers for the PCREExtra structure, and a newtype to represent flags passed at regex execution time. This lets us ensure that users don’t mistakenly pass compile-time flags at regex execution time.
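The following sketch shows why separate newtypes give us this guarantee. PCREExecOption is the name used in the import declaration above; PCREOption (for compile-time flags) and combineExecOptions are names we assume purely for illustration:

```haskell
import Data.Bits ((.|.))
import Foreign.C.Types (CInt)

-- Assumed name for compile-time flags; same representation, distinct type.
newtype PCREOption = PCREOption CInt
  deriving (Eq, Show)

-- Flags accepted at regex execution time.
newtype PCREExecOption = PCREExecOption CInt
  deriving (Eq, Show)

-- Hypothetical helper: bitwise-or a list of execution-time flags together.
-- Passing a [PCREOption] here is rejected by the type checker, even though
-- both newtypes wrap a plain CInt at runtime.
combineExecOptions :: [PCREExecOption] -> PCREExecOption
combineExecOptions opts =
  PCREExecOption (foldr (.|.) 0 [n | PCREExecOption n <- opts])
```

Because a newtype has no runtime cost, this extra safety does not change what is passed to C.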
The main complication involved in calling pcre_exec is the array of integers used to hold the offsets of matching substrings found by the pattern matcher. These offsets are held in an offset vector, whose required size is determined by analyzing the input regular expression to determine the number of captured patterns it contains. PCRE provides a function, pcre_fullinfo, for determining a great deal of information about the regular expression, including the number of patterns. We’ll need to call this, so we can now write down the Haskell type for binding to pcre_fullinfo as:
-- file: ch17/RegexExec.hs
foreign import ccall "pcre.h pcre_fullinfo"
    c_pcre_fullinfo :: Ptr PCRE
                    -> Ptr PCREExtra
                    -> PCREInfo
                    -> Ptr a
                    -> IO CInt
End of the content preview. The rest of this section cannot be viewed here.
Chapter 18: Monad Transformers
Content preview
Monads provide a powerful way to build computations with effects. Each of the standard monads is specialized to do exactly one thing. In real code, we often need to be able to use several effects at once.
Recall, for instance, the Parse type that we developed earlier. When we introduced monads, we mentioned that this type was a State monad in disguise. Our monad is more complex than the standard State monad, because it uses the Either type to allow the possibility of a parsing failure. In our case, if a parse fails early on, we want to stop parsing, not continue in some broken state. Our monad combines the effect of carrying state around with the effect of early exit.
The normal State monad doesn’t let us escape in this way; it carries state only. It uses the default implementation of fail: this calls error, which throws an exception that we can’t catch in pure code. The State monad thus appears to allow for failure, without that capability actually being any use. (Once again, we recommend that you almost always avoid using fail!)
It would be ideal if we could somehow take the standard State monad and add failure handling to it, without resorting to the wholesale construction of custom monads by hand. The standard monads in the mtl library don’t allow us to combine them. Instead, the library provides a set of monad transformers to achieve the same result.
A monad transformer is similar to a regular monad, but it’s not a standalone entity. Instead, it modifies the behavior of an underlying monad. Most of the monads in the library have transformer equivalents. By convention, the transformer version of a monad has the same name, with a T stuck on the end. For example, the transformer equivalent of State is StateT; it adds mutable state to an underlying monad. The WriterT monad transformer makes it possible to write data when stacked on top of another monad.
Before we introduce monad transformers, let’s look at a function written using techniques we are already familiar with. The function that follows recurses into a directory tree and returns a list of the number of entries it finds at each level of the tree:
End of the content preview. The rest of this section cannot be viewed here.
Motivation: Boilerplate Avoidance
Content preview
Monads provide a powerful way to build computations with effects. Each of the standard monads is specialized to do exactly one thing. In real code, we often need to be able to use several effects at once.
Recall, for instance, the Parse type that we developed earlier. When we introduced monads, we mentioned that this type was a State monad in disguise. Our monad is more complex than the standard State monad, because it uses the Either type to allow the possibility of a parsing failure. In our case, if a parse fails early on, we want to stop parsing, not continue in some broken state. Our monad combines the effect of carrying state around with the effect of early exit.
The normal State monad doesn’t let us escape in this way; it carries state only. It uses the default implementation of fail: this calls error, which throws an exception that we can’t catch in pure code. The State monad thus appears to allow for failure, without that capability actually being any use. (Once again, we recommend that you almost always avoid using fail!)
It would be ideal if we could somehow take the standard State monad and add failure handling to it, without resorting to the wholesale construction of custom monads by hand. The standard monads in the mtl library don’t allow us to combine them. Instead, the library provides a set of monad transformers to achieve the same result.
A monad transformer is similar to a regular monad, but it’s not a standalone entity. Instead, it modifies the behavior of an underlying monad. Most of the monads in the library have transformer equivalents. By convention, the transformer version of a monad has the same name, with a T stuck on the end. For example, the transformer equivalent of State is StateT; it adds mutable state to an underlying monad. The WriterT monad transformer makes it possible to write data when stacked on top of another monad.
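As a rough illustration of combining state with early exit, the sketch below stacks StateT on top of Maybe. It is not the Parse type itself; nextChar and twoChars are hypothetical names, and we assume the mtl and transformers packages that ship with GHC. The lift function used here passes an action of the underlying monad (in this case, a Maybe value) up into the combined monad:

```haskell
import Control.Monad.State (StateT, get, put, runStateT)
import Control.Monad.Trans.Class (lift)

-- Consume one character of the input held in the state. If no input is
-- left, fail in the underlying Maybe monad, which abandons the whole
-- computation instead of continuing in a broken state.
nextChar :: StateT String Maybe Char
nextChar = do
  input <- get
  case input of
    []       -> lift Nothing
    (c : cs) -> put cs >> return c

-- Consume two characters; stops early if either step fails.
twoChars :: StateT String Maybe (Char, Char)
twoChars = do
  a <- nextChar
  b <- nextChar
  return (a, b)
```

Running runStateT twoChars "abc" should give back both characters together with the remaining input, while running it on "a" yields Nothing.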
End of the content preview. The rest of this section cannot be viewed here.
A Simple Monad Transformer Example
Content preview
Before we introduce monad transformers, let’s look at a function written using techniques we are already familiar with. The function that follows recurses into a directory tree and returns a list of the number of entries it finds at each level of the tree:
-- file: ch18/CountEntries.hs
module CountEntries (listDirectory, countEntriesTrad) where

import System.Directory (doesDirectoryExist, getDirectoryContents)
import System.FilePath ((</>))
import Control.Monad (forM, liftM)

listDirectory :: FilePath -> IO [String]
listDirectory = liftM (filter notDots) . getDirectoryContents
    where notDots p = p /= "." && p /= ".."

countEntriesTrad :: FilePath -> IO [(FilePath, Int)]
countEntriesTrad path = do
  contents <- listDirectory path
  rest <- forM contents $ \name -> do
            let newName = path </> name
            isDir <- doesDirectoryExist newName
            if isDir
              then countEntriesTrad newName
              else return []
  return $ (path, length contents) : concat rest
We’ll now look at using the Writer monad to achieve the same goal. Since this monad lets us record a value wherever we want, we don’t need to explicitly build up a result.
As our function must execute in the IO monad so that it can traverse directories, we can’t use the Writer monad directly. Instead, we use WriterT to add the recording capability to IO. We will find the going easier if we look at the types involved.
The normal Writer monad has two type parameters, so it’s more properly written Writer w a. The first parameter w is the type of the values to be recorded, and a is the usual type that the Monad typeclass requires. Thus Writer [(FilePath, Int)] a is a writer monad that records a list of directory names and sizes.
The WriterT transformer has a similar structure, but it adds another type parameter m: this is the underlying monad whose behavior we are augmenting. The full signature of WriterT is WriterT w m a.
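With these types in hand, a version of the traversal that records its results through WriterT might look like the following sketch. It assumes the mtl, directory, and filepath packages that ship with GHC, repeats listDirectory so that it stands alone, and uses liftIO to run IO actions from inside the writer layer; countEntries is our name for the function here:

```haskell
import Control.Monad (forM_, when)
import Control.Monad.IO.Class (liftIO)
import Control.Monad.Writer (WriterT, execWriterT, tell)
import System.Directory (doesDirectoryExist, getDirectoryContents)
import System.FilePath ((</>))

listDirectory :: FilePath -> IO [String]
listDirectory = fmap (filter notDots) . getDirectoryContents
  where notDots p = p /= "." && p /= ".."

-- Record (directory, entry count) pairs with tell instead of building the
-- result list by hand.
countEntries :: FilePath -> WriterT [(FilePath, Int)] IO ()
countEntries path = do
  contents <- liftIO (listDirectory path)
  tell [(path, length contents)]
  forM_ contents $ \name -> do
    let newName = path </> name
    isDir <- liftIO (doesDirectoryExist newName)
    when isDir (countEntries newName)
```

Calling execWriterT (countEntries ".") in ghci runs the traversal and returns only the recorded list, discarding the uninteresting () result.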
Because we need to traverse directories, which requires access to the IO monad, we’ll stack our writer on top of the
End of the content preview. The rest of this section cannot be viewed here.
Common Patterns in Monads and Monad Transformers
Content preview
Most of the monads and monad transformers in the mtl library follow a few common patterns around naming and typeclasses.
To illustrate these rules, we will focus on a single straightforward monad: the reader monad. The reader monad’s API is detailed by the MonadReader typeclass. Most mtl monads have similarly named typeclasses. MonadWriter defines the API of the writer monad, and so on:
-- file: ch18/Reader.hs
class (Monad m) => MonadReader r m | m -> r where
    ask   :: m r
    local :: (r -> r) -> m a -> m a
The type variable r represents the immutable state that the reader monad carries around. The Reader r monad is an instance of the MonadReader class, as is the ReaderT r m monad transformer. Again, this pattern is repeated by other mtl monads: there usually exist both a concrete monad and a transformer, each of which is an instance of the typeclass that defines the monad’s API.
Returning to the specifics of the reader monad, we haven’t touched upon the local function before. It temporarily modifies the current environment using the r -> r function, and then executes its action in the modified environment. To make this idea more concrete, here is a simple example:
-- file: ch18/LocalReader.hs
import Control.Monad.Reader

myName step = do
  name <- ask
  return (step ++ ", I am " ++ name)

localExample :: Reader String (String, String, String)
localExample = do
  a <- myName "First"
  b <- local (++"dy") (myName "Second")
  c <- myName "Third"
  return (a, b, c)
If we execute the localExample action in ghci, we can see that the effect of modifying the environment is confined to one place:
ghci> runReader localExample "Fred"
Loading package mtl-1.1.0.1 ... linking ... done.
("First, I am Fred","Second, I am Freddy","Third, I am Fred")

When the underlying monad m is an instance of MonadIO, the mtl library provides a MonadIO instance for ReaderT r m, and it does the same for a number of other typeclasses. Here are a few:
-- file: ch18/Reader.hs
instance (Monad m) => Functor (ReaderT r m) where
    ...

instance (MonadIO m) => MonadIO (ReaderT r m) where
    ...

instance (MonadPlus m) => MonadPlus (ReaderT r m) where
    ...
End of the content preview. The rest of this section cannot be viewed here.
Stacking Multiple Monad Transformers
Content preview
As we have already mentioned, when we stack a monad transformer on a normal monad, the result is another monad. This suggests that we can stack another monad transformer on top of our combined monad to get yet another new monad. In fact, this is a common thing to do. Under what circumstances might we want to create such a stack?
  • If we need to talk to the outside world, we’ll have IO at the base of the stack. Otherwise, we will have some normal monad.
  • If we add a ReaderT layer, we give ourselves access to read-only configuration data.
  • Add a StateT layer, and we gain a global state that we can modify.
  • Should we need the ability to log events, we can add a WriterT layer.
The power of this approach is that we can customize the stack to our exact needs, specifying which kinds of effects we want to support.
As a small example of stacked monad transformers in action, here is a reworking of the countEntries function we developed earlier. We will modify it to recurse no deeper into a directory tree than a given amount and to record the maximum depth it reaches:
-- file: ch18/UglyStack.hs
import System.Directory
import System.FilePath
import Control.Monad.Reader
import Control.Monad.State

data AppConfig = AppConfig {
      cfgMaxDepth :: Int
    } deriving (Show)

data AppState = AppState {
      stDeepestReached :: Int
    } deriving (Show)
We use ReaderT to store configuration data, in the form of the maximum depth of recursion we will perform. We also use StateT to record the maximum depth we reach during an actual traversal:
-- file: ch18/UglyStack.hs
type App = ReaderT AppConfig (StateT AppState IO)
Our transformer stack has IO on the bottom, then StateT, with ReaderT on top. In this particular case, it doesn’t matter whether we have ReaderT or StateT on top, but IO must be on the bottom.
Even a small stack of monad transformers quickly develops an unwieldy type name. We can use a type alias to reduce the lengths of the type signatures that we write.
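To see how such a stack is run, consider the following sketch. It repeats the type definitions so that it stands alone; runApp and recordDepth are names we introduce for illustration and are not taken from the listing above. Unwrapping proceeds from the outside in: runReaderT removes the ReaderT layer, then runStateT removes the StateT layer, leaving a plain IO action:

```haskell
import Control.Monad.Reader (ReaderT, ask, runReaderT)
import Control.Monad.State (StateT, get, put, runStateT)

data AppConfig = AppConfig { cfgMaxDepth :: Int } deriving (Show)

data AppState = AppState { stDeepestReached :: Int } deriving (Show)

type App = ReaderT AppConfig (StateT AppState IO)

-- Hypothetical helper: supply the configuration and the initial state,
-- then peel off the layers from the outermost inwards.
runApp :: App a -> Int -> IO (a, AppState)
runApp action maxDepth =
  runStateT (runReaderT action (AppConfig maxDepth)) (AppState 0)

-- Hypothetical action: clamp a depth to the configured maximum and
-- remember the deepest value seen so far.
recordDepth :: Int -> App Int
recordDepth depth = do
  cfg <- ask
  st <- get
  let clamped = min depth (cfgMaxDepth cfg)
  put st { stDeepestReached = max clamped (stDeepestReached st) }
  return clamped
```

Notice that recordDepth uses ask, get, and put directly, without saying which layer each one belongs to; the type signature App Int is all it needs.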
End of the content preview. The rest of this section cannot be viewed here.
Moving Down the Stack
Content preview
So far, our uses of monad transformers have been simple, and the plumbing of the library has allowed us to avoid the details of how a stack of monads is constructed. Indeed, we already know enough about monad transformers to simplify many common programming tasks.
There are a few useful ways in which we can depart from the comfort of mtl. Most often, a custom monad sits at the bottom of the stack, or a custom monad transformer lies somewhere within the stack. To understand the potential difficulty, let’s look at an example.
Suppose we have a custom monad transformer, CustomT:
-- file: ch18/CustomT.hs
newtype CustomT m a = ...
In the framework that mtl provides, each monad transformer in the stack makes the API of a lower level available by providing instances of a host of typeclasses. We could follow this pattern and write a number of boilerplate instances:
-- file: ch18/CustomT.hs
instance MonadReader r m => MonadReader r (CustomT m) where
    ...

instance MonadIO m => MonadIO (CustomT m) where
    ...
If the underlying monad was an instance of MonadReader, we would write a MonadReader instance for CustomT in which each function in the API passes through to the corresponding function in the underlying instance. This would allow higher-level code to only care that the stack as a whole is an instance of MonadReader, without knowing or caring about which layer provides the real implementation.
Instead of relying on all of these typeclass instances to work for us behind the scenes, we can be explicit. The MonadTrans typeclass defines a useful function named lift:
ghci> :m +Control.Monad.Trans
ghci> :info MonadTrans
class MonadTrans t where lift :: (Monad m) => m a -> t m a
  	-- Defined in Control.Monad.Trans
This function takes a monadic action from one layer down the stack, and turns it—in other words, lifts it—into an action in the current monad transformer. Every monad transformer is an instance of MonadTrans.
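Here is a small sketch of explicit lifting. To keep the automatic typeclass instances out of the picture, it assumes the plain transformers package (the Control.Monad.Trans.* modules) rather than mtl; addConfigured is a hypothetical name:

```haskell
import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.Reader (ReaderT, ask, runReaderT)
import Control.Monad.Trans.State (State, modify, runState)

-- ReaderT sits on top of State. The ask action belongs to the top layer,
-- but modify is an action of the State monad one layer down, so we must
-- lift it before we can use it here.
addConfigured :: ReaderT Int (State Int) ()
addConfigured = do
  n <- ask
  lift (modify (+ n))
```

Evaluating runState (runReaderT addConfigured 5) 10 should leave 15 in the state: the reader layer supplies the 5, and the lifted modify adds it to the initial state of 10.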
We use the name lift based on its similarity of purpose to fmap and liftM. In each case, we hoist something from a lower level of the type system to the level we’re currently working in. The different options are described here:
End of the content preview. The rest of this section cannot be viewed here.
Understanding Monad Transformers by Building One
Content preview
To give ourselves some insight into how monad transformers in general work, we will create one and describe its machinery as we go. Our target is simple and useful: MaybeT. Surprisingly, though, it is missing from the mtl library.
This monad transformer modifies the behavior of an underlying monad m a by wrapping its type parameter with Maybe, in order to get m (Maybe a). As with the Maybe monad, if we call fail in the MaybeT monad transformer, execution terminates early.
In order to turn m (Maybe a) into a Monad instance, we must make it a distinct type, via a newtype declaration:
-- file: ch18/MaybeT.hs
newtype MaybeT m a = MaybeT {
      runMaybeT :: m (Maybe a)
    }
We now need to define the three standard monad functions. The most complex is (>>=), and its innards shed the most light on what we are actually doing. Before we delve into its operation, let us first take a look at its type:
-- file: ch18/MaybeT.hs
bindMT :: (Monad m) => MaybeT m a -> (a -> MaybeT m b) -> MaybeT m b
To understand this type signature, hark back to our earlier discussion of multiparameter typeclasses. The thing that we intend to make a Monad instance is the partial type MaybeT m; this has the usual single type parameter, a, that satisfies the requirements of the Monad typeclass.
The trick to understanding the body of our (>>=) implementation is that everything inside the do block executes in the underlying monad m, whatever that is:
-- file: ch18/MaybeT.hs
x `bindMT` f = MaybeT $ do
                 unwrapped <- runMaybeT x
                 case unwrapped of
                   Nothing -> return Nothing
                   Just y -> runMaybeT (f y)
Our runMaybeT function unwraps the result contained in x. Next, recall that the <- symbol desugars to (>>=): a monad transformer’s (>>=) must use the underlying monad’s (>>=). The final bit of case analysis determines whether we short-circuit or chain our computation. Finally, look back at the top of the body. Here, we must wrap the result with the MaybeT constructor, in order to once again hide the underlying monad.
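To try the bind logic out, we can assemble it into a complete instance, as in the sketch below. One assumption to flag: current versions of GHC require Functor and Applicative instances before a Monad instance is accepted, which the text predates, so the sketch adds them. The names failT, halve, and quarter are hypothetical, and Identity serves as the simplest possible underlying monad:

```haskell
import Data.Functor.Identity (runIdentity)

newtype MaybeT m a = MaybeT { runMaybeT :: m (Maybe a) }

instance Monad m => Functor (MaybeT m) where
  fmap f x = MaybeT $ do
    unwrapped <- runMaybeT x
    return (fmap f unwrapped)

instance Monad m => Applicative (MaybeT m) where
  pure = MaybeT . return . Just
  mf <*> mx = MaybeT $ do
    unwrappedF <- runMaybeT mf
    case unwrappedF of
      Nothing -> return Nothing
      Just f  -> do
        unwrappedX <- runMaybeT mx
        return (fmap f unwrappedX)

-- The same logic as bindMT: run the underlying monad, then either
-- short-circuit on Nothing or continue with the unwrapped value.
instance Monad m => Monad (MaybeT m) where
  x >>= f = MaybeT $ do
    unwrapped <- runMaybeT x
    case unwrapped of
      Nothing -> return Nothing
      Just y  -> runMaybeT (f y)

-- Hypothetical helper: terminate the computation early.
failT :: Monad m => MaybeT m a
failT = MaybeT (return Nothing)

-- Hypothetical example: halving succeeds only for even numbers.
halve :: Monad m => Int -> MaybeT m Int
halve n
  | even n    = return (n `div` 2)
  | otherwise = failT

-- Chain two halvings over the Identity monad.
quarter :: Int -> Maybe Int
quarter n = runIdentity (runMaybeT (halve n >>= halve))
```

Here quarter 12 succeeds, while quarter 6 stops at the second halving because 3 is odd.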
End of the content preview. The rest of this section cannot be viewed here.
Transformer Stacking Order Is Important
Content preview
From our early examples using monad transformers such as ReaderT and StateT, it might be easy to conclude that the order in which we stack monad transformers doesn’t matter.
When we stack StateT on top of State, it should be clearer that order can indeed make a difference. The types StateT Int (State String) and StateT String (State Int) might carry around the same information, but we can’t use them interchangeably. The ordering determines when we need to use lift to get at one or the other piece of state.
Here’s a case that more dramatically demonstrates the importance of ordering. Suppose we have a computation that might fail, and we want to log the circumstances under which it does so:
-- file: ch18/MTComposition.hs
{-# LANGUAGE FlexibleContexts #-}
import Control.Monad.Writer
import MaybeT

problem :: MonadWriter [String] m => m ()
problem = do
  tell ["this is where i fail"]
  fail "oops"
Which of these monad stacks will give us the information we need?
-- file: ch18/MTComposition.hs
type A = WriterT [String] Maybe

type B = MaybeT (Writer [String])

a :: A ()
a = problem

b :: B ()
b = problem
Let’s try the alternatives in ghci:
ghci> runWriterT a
Loading package mtl-1.1.0.1 ... linking ... done.
Nothing
ghci> runWriter $ runMaybeT b
(Nothing,["this is where i fail"])
This difference in results should not come as a surprise—just look at the signatures of the execution functions:
ghci> :t runWriterT
runWriterT :: WriterT w m a -> m (a, w)
ghci> :t runWriter . runMaybeT
runWriter . runMaybeT :: MaybeT (Writer w) a -> (Maybe a, w)
Our WriterT-on-Maybe stack has Maybe as the underlying monad, so runWriterT must give us back a result of type Maybe. In our test case, we get to see only the log of what happened if nothing actually went wrong!
Stacking monad transformers is analogous to composing functions. If we change the order in which we apply functions and then get different results, we won’t be surprised. So it is with monad transformers, too.
End of the content preview. The rest of this section cannot be viewed here.
Putting Monads and Monad Transformers into Perspective
Content preview
It’s useful to step back from details for a few moments and look at the weaknesses and strengths of programming with monads and monad transformers.
Probably the biggest practical irritation of working with monads is that a monad’s type constructor often gets in our way when we’d like to use pure code. Many useful pure functions need monadic counterparts, simply to tack on a placeholder parameter m for some monadic type constructor:
ghci> :t filter
filter :: (a -> Bool) -> [a] -> [a]
ghci> :i filterM
filterM :: (Monad m) => (a -> m Bool) -> [a] -> m [a]
  	-- Defined in Control.Monad
However, the coverage is incomplete: the standard libraries don’t always provide monadic versions of pure functions.
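The difference between the two functions is easiest to see side by side. In this sketch, positives and positivesNoZero are hypothetical names; the second uses Maybe as its monad, so the predicate can abort the whole filtering operation:

```haskell
import Control.Monad (filterM)

-- Pure version: the predicate can only answer True or False.
positives :: [Int] -> [Int]
positives = filter (> 0)

-- Monadic version: the predicate runs in Maybe, so besides answering
-- True or False it can also fail, which makes the overall result Nothing.
positivesNoZero :: [Int] -> Maybe [Int]
positivesNoZero = filterM check
  where
    check 0 = Nothing
    check n = Just (n > 0)
```

The logic is nearly identical, yet the pure filter cannot be reused for the second function; this is exactly the duplication that the placeholder parameter m forces on us.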
The reason for this lies in history. Eugenio Moggi introduced the idea of using monads for programming in 1988, around the time the Haskell 1.0 standard was being developed. Many of the functions in today’s Prelude date back to Haskell 1.0, which was released in 1990. In 1991, Philip Wadler started writing for a wider functional programming audience about the potential of monads, at which point they began to be put to use.
Not until 1996 and the release of Haskell 1.3 did the standard acquire support for monads. By this time, the language designers were already constrained by backwards compatibility: they couldn’t change the signatures of functions in the Prelude, because it would have broken existing code.
Since then, the Haskell community has learned a lot about creating suitable abstractions, so that we can write code that is less affected by the pure/monadic divide. You can find modern distillations of these ideas in the Data.Traversable and Data.Foldable modules. As appealing as those modules are, we do not cover them in this book. This is in part for want of space, but also because if you’re still following us at this point, you won’t have trouble figuring them out for yourself.
In an ideal world, would we make a break from the past and switch over to use Traversable and
End of the content preview. The rest of this section cannot be viewed here.
Chapter 19: Error Handling
Content preview
Error handling is one of the most important—and overlooked—topics for programmers, regardless of the language used. In Haskell, you will find two major types of error handling employed: pure error handling and exceptions.
When we speak of pure error handling, we are referring to algorithms that do not require anything from the IO monad. We can often implement error handling for them simply by using Haskell’s expressive data type system to our advantage. Haskell also has an exception system. Due to the complexities of lazy evaluation, exceptions in Haskell can be thrown anywhere, but caught only within the IO monad. In this chapter, we’ll consider both.
Let’s begin our discussion of error handling with a very simple function. Let’s say that we wish to perform division on a series of numbers. We have a constant numerator but wish to vary the denominator. We might come up with a function like this:
-- file: ch19/divby1.hs
divBy :: Integral a => a -> [a] -> [a]
divBy numerator = map (numerator `div`)
Very simple, right? We can play around with this a bit in ghci:
ghci> divBy 50 [1,2,5,8,10]
[50,25,10,6,5]
ghci> take 5 (divBy 100 [1..])
[100,50,33,25,20]
This behaves as expected: 50 / 1 is 50, 50 / 2 is 25, and so forth. This even worked with the infinite list [1..]. What happens if we sneak a 0 into our list somewhere?
ghci> divBy 50 [1,2,0,8,10]
[50,25,*** Exception: divide by zero

Isn’t that interesting? ghci started displaying the output, and then stopped with an exception when it got to the zero. That’s lazy evaluation at work—it calculated results as needed.
As we will see later in this chapter, in the absence of an explicit exception handler, this exception will crash the program. That’s obviously not desirable, so let’s consider better ways we could indicate an error in this pure function.
One immediately recognizable and easy way to indicate failure is to use Maybe. Instead of just returning a list and throwing an exception on failure, we can return Nothing if the input list contains a zero anywhere, or return
End of the content preview. The rest of this section cannot be viewed here.
Error Handling with Data Types
Content preview
Let’s begin our discussion of error handling with a very simple function. Let’s say that we wish to perform division on a series of numbers. We have a constant numerator but wish to vary the denominator. We might come up with a function like this:
-- file: ch19/divby1.hs
divBy :: Integral a => a -> [a] -> [a]
divBy numerator = map (numerator `div`)
Very simple, right? We can play around with this a bit in ghci:
ghci> divBy 50 [1,2,5,8,10]
[50,25,10,6,5]
ghci> take 5 (divBy 100 [1..])
[100,50,33,25,20]
This behaves as expected: 50 / 1 is 50, 50 / 2 is 25, and so forth. This even worked with the infinite list [1..]. What happens if we sneak a 0 into our list somewhere?
ghci> divBy 50 [1,2,0,8,10]
[50,25,*** Exception: divide by zero

Isn’t that interesting? ghci started displaying the output, and then stopped with an exception when it got to the zero. That’s lazy evaluation at work—it calculated results as needed.
As we will see later in this chapter, in the absence of an explicit exception handler, this exception will crash the program. That’s obviously not desirable, so let’s consider better ways we could indicate an error in this pure function.
One immediately recognizable and easy way to indicate failure is to use Maybe. Instead of just returning a list and throwing an exception on failure, we can return Nothing if the input list contains a zero anywhere, or return Just with the results otherwise. Here’s an implementation of such an algorithm:
-- file: ch19/divby2.hs
divBy :: Integral a => a -> [a] -> Maybe [a]
divBy _ [] = Just []
divBy _ (0:_) = Nothing
divBy numerator (denom:xs) =
    case divBy numerator xs of
      Nothing -> Nothing
      Just results -> Just ((numerator `div` denom) : results)
If you try it out in ghci, you’ll see that it works:
ghci> divBy 50 [1,2,5,8,10]
Just [50,25,10,6,5]
ghci> divBy 50 [1,2,0,8,10]
Nothing
The function that calls divBy can now use a case expression to see if the call was successful, just as divBy does when it calls itself.
You may note that you could use a monadic implementation of the preceding example, like so:
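One way such a monadic version might look is sketched below. This is our own minimal take, assuming mapM in the Maybe monad; the local helper safeDiv and the caller report are hypothetical names:

```haskell
-- Divide the numerator by each denominator, failing as a whole if any
-- denominator is zero. mapM in the Maybe monad stops at the first Nothing.
divBy :: Integral a => a -> [a] -> Maybe [a]
divBy numerator = mapM safeDiv
  where
    safeDiv 0     = Nothing
    safeDiv denom = Just (numerator `div` denom)

-- A caller inspects the result with a case expression.
report :: Maybe [Int] -> String
report result = case result of
  Nothing -> "division by zero"
  Just xs -> "results: " ++ show xs
```

The explicit recursion and the nested case analysis both disappear, because the Maybe monad’s (>>=) already performs the check for Nothing after each step.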
End of the content preview. The rest of this section cannot be viewed here.
Exceptions
Content preview
Version 6.10.1 of GHC was released as this book went to press. It introduces an extensible exception system. In the sections that follow, we document the older exception system. The two are similar, but not completely compatible.
Exception handling is found in many programming languages, including Haskell. It can be useful because, when a problem occurs, exception handling can provide an easy way of handling it, even if it occurs several layers down through a chain of function calls. With exceptions, it’s not necessary to check the return value of every function call for errors, or to take care to produce a return value that reflects the error, as C programmers must do. In Haskell, thanks to monads and the Either and Maybe types, we can often achieve the same effects in pure code without the need to use exceptions and exception handling.
Some problems—especially those involving I/O—call for working with exceptions. In Haskell, exceptions may be thrown from any location in the program. However, due to the unspecified evaluation order, they can only be caught in the IO monad. Haskell exception handling doesn’t involve special syntax as it does in Python or Java. Rather, the mechanisms to catch and handle exceptions are—surprise—functions.
In the Control.Exception module, various functions and types relating to exceptions are defined. There is an Exception type defined there; all exceptions are of type Exception. There are also functions for catching and handling exceptions. Let’s start by looking at try, which has type IO a -> IO (Either Exception a). This wraps an IO action with exception handling. If an exception is thrown, it will return a Left value with the exception; otherwise, it returns a Right value with the original result. Let’s try this out in ghci. We’ll first trigger an unhandled exception, and then try to catch it:
ghci> :m Control.Exception
ghci> let x = 5 `div` 0
ghci> let y = 5 `div` 1
ghci> print x
*** Exception: divide by zero
ghci> print y
5
ghci> try (print x)
Left divide by zero
ghci> try (print y)
5
Right ()
End of the content preview. The rest of this section cannot be viewed here.
Error Handling in Monads
Content preview
Because we must catch exceptions in the IO monad, if we try to use them inside a monad, or in a stack of monad transformers, we’ll get bounced out to the IO monad. This is almost never what we would actually like.
We defined a MaybeT transformer earlier, but it is more useful as an aid to understanding than as a programming tool. Fortunately, a dedicated (and more useful) monad transformer already exists: ErrorT, which is defined in the Control.Monad.Error module.
The ErrorT transformer lets us add exceptions to a monad, but it uses its own special exception machinery, separate from that provided by the Control.Exception module. It gives us some interesting capabilities:
  • If we stick with the ErrorT interfaces, we can both throw and catch exceptions within this monad.
  • Following the naming pattern of other monad transformers, the execution function is named runErrorT. An uncaught ErrorT exception will stop propagating upwards when it reaches runErrorT. We will not be kicked out to the IO monad.
  • We control the type that our exceptions will have.
If we use the throw function from inside ErrorT (or if we use error or undefined), we will still be bounced out to the IO monad.
As with other monads, the interface that ErrorT provides is defined by a typeclass:
-- file: ch19/MonadError.hs
class (Monad m) => MonadError e m | m -> e where
    throwError :: e             -- error to throw
               -> m a

    catchError :: m a           -- action to execute
               -> (e -> m a)    -- error handler
               -> m a
The type variable e represents the error type that we want to use. Whatever our error type is, we must make it an instance of the Error typeclass:
-- file: ch19/MonadError.hs
class Error a where
    -- create an exception with no message
    noMsg  :: a

    -- create an exception with a message
    strMsg :: String -> a
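To see throwError and catchError at work, here is a sketch with one important assumption: recent releases of mtl no longer ship ErrorT, so the example uses its successor ExceptT from Control.Monad.Except, which offers the same MonadError interface but does not require an Error instance for the error type. The names DivError, safeDiv, and divOrZero are hypothetical:

```haskell
import Control.Monad.Except (ExceptT, catchError, runExceptT, throwError)

-- We control the type that our exceptions have.
data DivError = DivByZero
  deriving (Eq, Show)

-- Throw our own error instead of letting div raise an exception.
safeDiv :: Monad m => Int -> Int -> ExceptT DivError m Int
safeDiv _ 0 = throwError DivByZero
safeDiv x y = return (x `div` y)

-- Catch the error inside the same monad and substitute a default result.
divOrZero :: Monad m => Int -> Int -> ExceptT DivError m Int
divOrZero x y = safeDiv x y `catchError` \_ -> return 0
```

Running runExceptT (safeDiv 1 0) in IO returns a Left value rather than raising an IO exception, which is the behavior described for runErrorT above.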
ErrorT’s implementation of fail uses the strMsg function. It throws strMsg as an exception, passing it the string argument that it received. As for noMsg, it is used to provide an
End of the content preview. The rest of this section cannot be viewed here.
Chapter 20: Systems Programming in Haskell
Content preview
So far, we’ve been talking mostly about high-level concepts. Haskell can also be used for lower-level systems programming. It is quite possible to write programs that interface with the operating system at a low level using Haskell.
In this chapter, we are going to attempt something ambitious: a Perl-like “language” that is valid Haskell, implemented in pure Haskell, that makes shell scripting easy. We are going to implement piping, easy command invocation, and some simple tools to handle tasks that might otherwise be performed with grep or sed.
Specialized modules exist for different operating systems. In this chapter, we will use generic OS-independent modules as much as possible. However, we will be focusing on the POSIX environment for much of the chapter. POSIX is a standard for Unix-like operating systems such as Linux, FreeBSD, MacOS X, or Solaris. Windows does not support POSIX by default, but the Cygwin environment provides a POSIX compatibility layer for Windows.
End of preview. The rest of this section is not available here.
Running External Programs
Preview
It is possible to invoke external commands from Haskell. To do that, we suggest using rawSystem from the System.Cmd module. This will invoke a specified program, with the specified arguments, and return the exit code from that program. You can play with it in ghci:
ghci> :module System.Cmd
ghci> rawSystem "ls" ["-l", "/usr"]
Loading package old-locale-1.0.0.0 ... linking ... done.
Loading package old-time-1.0.0.0 ... linking ... done.
Loading package filepath-1.1.0.0 ... linking ... done.
Loading package directory-1.0.0.1 ... linking ... done.
Loading package unix-2.3.0.1 ... linking ... done.
Loading package process-1.0.0.1 ... linking ... done.
total 408
drwxr-xr-x   2 root root  94208 2008-08-22 04:51 bin
drwxr-xr-x   2 root root   4096 2008-04-07 14:44 etc
drwxr-xr-x   2 root root   4096 2008-04-07 14:44 games
drwxr-xr-x 155 root root  16384 2008-08-20 20:54 include
drwxr-xr-x   4 root root   4096 2007-11-01 21:31 java
drwxr-xr-x   6 root root   4096 2008-03-18 11:38 kerberos
drwxr-xr-x  70 root root  36864 2008-08-21 04:52 lib
drwxr-xr-x 212 root root 126976 2008-08-21 04:53 lib64
drwxr-xr-x  23 root root  12288 2008-08-21 04:53 libexec
drwxr-xr-x  15 root root   4096 2008-04-07 14:44 local
drwxr-xr-x   2 root root  20480 2008-08-21 04:53 sbin
drwxr-xr-x 347 root root  12288 2008-08-21 11:01 share
drwxr-xr-x   5 root root   4096 2008-04-07 14:44 src
lrwxrwxrwx   1 root root     10 2008-05-16 15:01 tmp -> ../var/tmp
drwxr-xr-x   2 root root   4096 2007-04-10 11:01 X11R6
ExitSuccess
Here, we run the equivalent of the shell command ls -l /usr. rawSystem does not parse arguments from a string or expand wild cards. Instead, it expects every argument to be contained in a list. If you don’t want to pass any arguments, you can simply pass an empty list like this:
ghci> rawSystem "ls" []
calendartime.ghci  modtime.ghci    rp.ghci        RunProcessSimple.hs
cmd.ghci           posixtime.hs    rps.ghci       timediff.ghci
dir.ghci           rawSystem.ghci  RunProcess.hs  time.ghci
ExitSuccess
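A compiled program can call rawSystem in the same way and then act on the exit code it returns. The sketch below assumes a recent release of the process package, where rawSystem is exported from System.Process (an assumption about your installation; the ghci sessions here import it from System.Cmd). describeExit is a hypothetical helper written for this example.

```haskell
import System.Exit (ExitCode (..))
import System.Process (rawSystem)   -- assumed location in newer process releases

-- Hypothetical helper: turn an exit code into a short description.
describeExit :: ExitCode -> String
describeExit ExitSuccess     = "succeeded"
describeExit (ExitFailure n) = "failed with exit code " ++ show n

main :: IO ()
main = do
  -- Each argument is a separate list element; no shell parsing takes place.
  code <- rawSystem "ls" ["-l", "/usr"]
  putStrLn ("ls " ++ describeExit code)
```

Because the arguments are passed as a list, a file name containing spaces or wildcard characters reaches the program unchanged.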

End of preview. The rest of this section is not available here.
Directory and File Information
Preview
The System.Directory module contains quite a few functions that can be used to obtain information from the filesystem. You can get a list of files in a directory, rename or delete files, copy files, change the current working directory, or create new directories. System.Directory is portable and works on any platform where GHC itself works.
The library reference for System.Directory provides a comprehensive list of the functions available. Let’s use ghci to demonstrate a few of them. Most of these functions are straightforward equivalents to C library calls or shell commands:
ghci> :module System.Directory
ghci> setCurrentDirectory "/etc"
Loading package old-locale-1.0.0.0 ... linking ... done.
Loading package old-time-1.0.0.0 ... linking ... done.
Loading package filepath-1.1.0.0 ... linking ... done.
Loading package directory-1.0.0.1 ... linking ... done.
ghci> getCurrentDirectory
"/etc"
ghci> setCurrentDirectory ".."
ghci> getCurrentDirectory
"/"
Here we saw commands to change the current working directory and obtain the current working directory from the system. These are similar to the cd and pwd commands in the POSIX shell:
ghci> getDirectoryContents "/"
["dev",".vmware","mnt","var","etc","net","..","lib","srv","media","lib64","opt",
".ccache","bin","selinux",".","lost+found","proc",".autorelabel",".autofsck",
"sys","misc","home","tmp","boot",".bash_history","root","sbin","usr"]
getDirectoryContents returns a list containing an entry for every item in a given directory. Note that on POSIX systems, this list normally includes the special values "." and "..". You will usually want to filter these out when processing the content of the directory, perhaps like this:
ghci> getDirectoryContents "/" >>= return . filter (`notElem` [".", ".."])
["dev",".vmware","mnt","var","etc","net","lib","srv","media","lib64","opt",
".ccache","bin","selinux","lost+found","proc",".autorelabel",".autofsck",
"sys","misc","home","tmp","boot",".bash_history","root","sbin","usr"]
Filtering the results of getDirectoryContents is discussed in more detail elsewhere in this book.
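The same filtering can be packaged as a small reusable function. This is a minimal sketch; isSpecialEntry and listEntries are hypothetical names chosen for the example, not functions from System.Directory.

```haskell
import System.Directory (getDirectoryContents)

-- True for the two special entries that POSIX systems report.
isSpecialEntry :: FilePath -> Bool
isSpecialEntry name = name `elem` [".", ".."]

-- List a directory without the "." and ".." entries.
listEntries :: FilePath -> IO [FilePath]
listEntries dir = fmap (filter (not . isSpecialEntry)) (getDirectoryContents dir)

main :: IO ()
main = listEntries "." >>= mapM_ putStrLn
```

Newer versions of the directory package also provide listDirectory, which omits the two special entries for you; whether it is available depends on the version you have installed.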
Is the filter (`notElem` [".", ".."])
End of preview. The rest of this section is not available here.
Program Termination
Preview
Developers often write individual programs to accomplish particular tasks. These individual parts may be combined to accomplish larger tasks. A shell script or another program may execute them. The calling script often needs a way to discover whether the program was able to complete its task successfully. Haskell automatically indicates a nonsuccessful exit whenever a program is aborted by an exception.
However, you may need more fine-grained control over the exit code than that. Perhaps you need to return different codes for different types of errors. The System.Exit module provides a way to exit the program and return a specific exit status code to the caller. You can call exitWith ExitSuccess to return a code indicating a successful termination (0 on POSIX systems). Or, you can call something like exitWith (ExitFailure 5), which will return code 5 to the calling program.
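As an illustration of returning different codes for different kinds of failure, here is a small sketch. The Outcome type, the classify function, and the particular code numbers are hypothetical choices made for the example; only exitWith, ExitSuccess, and ExitFailure come from System.Exit.

```haskell
import System.Environment (getArgs)
import System.Exit (ExitCode (..), exitWith)
import System.IO (hPutStrLn, stderr)

-- Hypothetical outcomes that a command-line tool might report.
data Outcome = Completed | BadUsage | InputMissing
               deriving (Eq, Show)

-- Map each outcome to the status code handed back to the caller.
exitCodeFor :: Outcome -> ExitCode
exitCodeFor Completed    = ExitSuccess
exitCodeFor BadUsage     = ExitFailure 2
exitCodeFor InputMissing = ExitFailure 5

-- Toy classification: no arguments is a usage error, and an empty
-- first argument counts as missing input.
classify :: [String] -> Outcome
classify []       = BadUsage
classify ("" : _) = InputMissing
classify _        = Completed

main :: IO ()
main = do
  args <- getArgs
  let outcome = classify args
  hPutStrLn stderr ("outcome: " ++ show outcome)
  exitWith (exitCodeFor outcome)
```

A calling shell script can then branch on the status, for example by inspecting $? after the program returns.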
End of preview. The rest of this section is not available here.
Dates and Times
Preview
Everything from file timestamps to business transactions involves dates and times. Haskell provides ways of manipulating dates and times, as well as features for obtaining date and time information from the system.
In Haskell, the System.Time module is primarily responsible for date and time handling. It defines two types: ClockTime and CalendarTime.
ClockTime is the Haskell version of the traditional POSIX epoch. A ClockTime represents a time relative to midnight the morning of January 1, 1970, Coordinated Universal Time (UTC). A negative ClockTime represents a number of seconds prior to that date, while a positive number represents a count of seconds after it.
ClockTime is convenient for computations. Since it tracks UTC, it doesn’t have to adjust for local time zones, daylight saving time, or other special cases in time handling. Every day is exactly (60 * 60 * 24) or 86,400 seconds, which makes time interval calculations simple. You can, for instance, check the ClockTime at the start of a long task, again at the end, and simply subtract the start time from the end time to determine how much time elapsed. You can then divide by 3,600 and display the elapsed time as a count of hours if you wish.
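The arithmetic described above can be sketched with plain counts of seconds since the epoch. Note the substitution: instead of System.Time, this example reads the current time with getPOSIXTime from Data.Time.Clock.POSIX in the time package, on the assumption that this is what a current GHC installation provides. EpochSeconds, elapsedHours, and daysAhead are hypothetical names.

```haskell
import Data.Time.Clock.POSIX (getPOSIXTime)

-- Seconds since the epoch, the same idea as the ClockTime described above.
type EpochSeconds = Integer

secondsPerDay :: Integer
secondsPerDay = 60 * 60 * 24   -- 86,400

-- Elapsed time between two instants, expressed in hours.
elapsedHours :: EpochSeconds -> EpochSeconds -> Double
elapsedHours start end = fromIntegral (end - start) / 3600

-- The instant a whole number of days after the given one.
daysAhead :: Integer -> EpochSeconds -> EpochSeconds
daysAhead n t = t + n * secondsPerDay

main :: IO ()
main = do
  now <- fmap round getPOSIXTime :: IO EpochSeconds
  putStrLn ("now:           " ++ show now)
  putStrLn ("14 days ahead: " ++ show (daysAhead 14 now))
  print (elapsedHours now (now + 7200))   -- 2.0
```

No time zone or daylight saving adjustment appears anywhere, which is exactly why this representation suits interval calculations.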
ClockTime is ideal for answering questions such as these:
  • How much time has elapsed?
  • What will be the ClockTime 14 days ahead of this precise instant?
  • When was the file last modified?
  • What is the precise time right now?
These are good uses of ClockTime because they refer to precise, unambiguous moments in time. However, ClockTime is not as easily used for questions such as:
  • Is today Monday?
  • What day of the week will May 1 fall on next year?
  • What is the current time in my local time zone, taking the potential presence of Daylight Saving Time (DST) into account?
CalendarTime stores time the way humans do: with a year, month, day, hour, minute, second, time zone, and DST information. It’s easy to convert this into a conveniently displayable string, or to answer questions about the local time.
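Calendar-style questions need a broken-down representation of time. The following sketch answers two of the questions listed above using the time package rather than CalendarTime itself; that is a substitution, and it assumes time version 1.9 or later, where dayOfWeek is available. isMonday is a hypothetical helper, and 2009 is just an example year.

```haskell
import Data.Time.Calendar (Day, DayOfWeek (..), dayOfWeek, fromGregorian)
import Data.Time.LocalTime (getZonedTime, localDay, zonedTimeToLocalTime)

-- Hypothetical helper: does the given calendar day fall on a Monday?
isMonday :: Day -> Bool
isMonday d = dayOfWeek d == Monday

main :: IO ()
main = do
  -- The zoned time carries the local time zone, including any DST offset.
  zoned <- getZonedTime
  let today = localDay (zonedTimeToLocalTime zoned)
  putStrLn ("Is today Monday? " ++ show (isMonday today))
  -- What day of the week did May 1 fall on in 2009?
  print (dayOfWeek (fromGregorian 2009 5 1))   -- Friday
```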
End of preview. The rest of this section is not available here.
Extended Example: Piping
Preview
We’ve just seen how to invoke external programs. Sometimes we need more control than that. Perhaps we need to obtain the output from those programs, provide input, or even chain together multiple external programs. Piping can help with all of these needs. Piping is often used in shell scripts. When you set up a pipe in the shell, you run multiple programs. The output of the first program is sent to the input of the second. Its output is sent to the third as input, and so on. The last program’s output normally goes to the terminal, or it could go to a file. Here’s an example session with the POSIX shell to illustrate piping:
$ ls /etc | grep 'm.*ap' | tr a-z A-Z
IDMAPD.CONF
MAILCAP
MAILCAP.ORDER
MEDIAPRM
TERMCAP
This command runs three programs, piping data between them. It starts with ls /etc, which outputs a list of all files or directories in /etc. The output of ls is sent as input to grep. We gave grep a regular expression that will cause it to output only the lines that contain an 'm' followed, anywhere later in the line, by "ap". Finally, the result of that is sent to tr. We gave tr options to convert everything to uppercase. The output of tr isn’t sent anywhere in particular, so it is displayed on the screen.
In this situation, the shell handles setting up all the pipelines between programs. By using some of the POSIX tools in Haskell, we can accomplish the same thing.
Before describing how to do this, we should first warn you that the System.Posix modules expose a very low-level interface to Unix systems. The interfaces can be complex and their interactions can be complex as well, regardless of the programming language you use to access them. The full nature of these low-level interfaces has been the topic of entire books themselves, so we will just scratch the surface in this chapter.
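As a higher-level point of comparison, the same kind of pipeline can be sketched with createProcess from System.Process, which sets up the pipes for us. This is not the System.Posix approach that this chapter goes on to describe; it is only meant to show the shape of the problem. Stage and runPipeline are hypothetical names, and the example assumes that ls, grep, and tr are available on the search path.

```haskell
import Control.Exception (evaluate)
import System.IO (hGetContents)
import System.Process

-- A program name together with its argument list.
type Stage = (FilePath, [String])

-- Run the stages left to right, feeding each stage's standard output
-- to the next stage's standard input, and return the final output.
runPipeline :: [Stage] -> IO String
runPipeline = go Inherit []
  where
    go :: StdStream -> [ProcessHandle] -> [Stage] -> IO String
    go _ _ [] = return ""          -- an empty pipeline produces nothing
    go input started ((cmd, args) : rest) = do
      (_, Just out, _, ph) <-
        createProcess (proc cmd args) { std_in = input, std_out = CreatePipe }
      if null rest
        then do
          text <- hGetContents out
          _ <- evaluate (length text)        -- force the lazy read first
          mapM_ waitForProcess (ph : started)
          return text
        else go (UseHandle out) (ph : started) rest

main :: IO ()
main = do
  result <- runPipeline
              [("ls", ["/etc"]), ("grep", ["m.*ap"]), ("tr", ["a-z", "A-Z"])]
  putStr result
```

The output is read completely before waiting for the processes, so that a stage blocked on a full pipe cannot keep the program waiting forever.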
POSIX defines a function that creates a pipe. This function returns two file descriptors (FDs), which are similar in concept to a Haskell Handle. One FD is the reading end of the pipe, and the other is the writing end. Anything that is written to the writing end can be read by the reading end. The data is “shoved through a pipe.” In Haskell, you call
End of preview. The rest of this section is not available here.
Chapter 21: Using Databases
Preview
Everything from web forums to podcatchers or even backup programs frequently uses databases for persistent storage. SQL-based databases are often quite convenient: they are fast, can scale from tiny to massive sizes, can operate over the network, often help handle locking and transactions, and can even provide failover and redundancy for applications. Databases come in many different shapes: the large commercial databases such as Oracle, open source engines such as PostgreSQL or MySQL, and even embeddable engines such as Sqlite.
Because databases are so important, Haskell support for them is important as well. In this chapter, we will introduce you to one of the Haskell frameworks for working with databases. We will also use this framework to begin building a podcast downloader, which we will further develop in a later chapter.
End of preview. The rest of this section is not available here.
Overview of HDBC
Preview
At the bottom of the database stack is the database engine, which is responsible for actually storing data on disk. Well-known database engines include PostgreSQL, MySQL, and Oracle.
Most modern database engines support the Structured Query Language (SQL) as a standard way of getting data into and out of relational databases. This book will not provide a tutorial on SQL or relational database management.
Once you have a database engine that supports SQL, you need a way to communicate with it. Each database has its own protocol. Since SQL is reasonably constant across databases, it is possible to make a generic interface that uses drivers for each individual protocol.
Haskell has several different database frameworks available, some providing high-level layers atop others. For this chapter, we will concentrate on the Haskell DataBase Connectivity system (HDBC). HDBC is a database abstraction library. That is, you can write code that uses HDBC and can access data stored in almost any SQL database with little or no modification. Even if you never need to switch underlying database engines, the HDBC system of drivers makes a large number of choices available to you with a single interface.
Another database abstraction library for Haskell is HSQL, which shares a similar purpose with HDBC. There is also a higher-level framework called HaskellDB, which sits atop either HDBC or HSQL and is designed to help insulate the programmer from the details of working with SQL. However, it does not have as broad appeal because its design limits it to certain (albeit quite common) database access patterns. Finally, Takusen is a framework that uses a “left fold” approach to reading data from the database.
End of preview. The rest of this section is not available here.
Installing HDBC and Drivers
Preview
To connect to a given database with HDBC, you need at least two packages: the generic interface and a driver for your specific database. You can obtain the generic HDBC package, and all of the other drivers, from Hackage. For this chapter, we will use HDBC version 1.1.3.
You’ll also need a database backend and a backend driver. For this chapter, we’ll use Sqlite version 3. Sqlite is an embedded database, so it doesn’t require a separate server and is easy to set up. Many operating systems already ship with Sqlite version 3. If yours doesn’t, you can download it from http://www.sqlite.org/. The HDBC home page has a link to known HDBC backend drivers. The specific driver for Sqlite version 3 can be obtained from Hackage.
If you want to use HDBC with other databases, check out the HDBC Known Drivers page at http://software.complete.org/hdbc/wiki/KnownDrivers. There you will find a link to the ODBC binding, which lets you connect to virtually any database on virtually any platform (Windows, POSIX, and others). You will also find a PostgreSQL binding. MySQL is supported via the ODBC binding, and specific information for MySQL users can be found in the HDBC-ODBC API documentation.
End of preview. The rest of this section is not available here.
Connecting to Databases
Preview
To connect to a database, you will use a connection function from a database backend driver. Each database has its own unique method of connecting. The initial connection is generally the only time you will call anything from a backend driver module directly.
The database connection function will return a database handle. The precise type of this handle may vary from one driver to the next, but it will always be an instance of the IConnection typeclass. All of the functions you will use to operate on databases will work with any type that is an instance of IConnection. When you’re done talking to the database, call the disconnect function to disconnect from it. Here’s an example of making a connection to an Sqlite database:
ghci> :module Database.HDBC Database.HDBC.Sqlite3
ghci> conn <- connectSqlite3 "test1.db"
Loading package array-0.1.0.0 ... linking ... done.
Loading package containers-0.1.0.2 ... linking ... done.
Loading package bytestring-0.9.0.1.1 ... linking ... done.
Loading package old-locale-1.0.0.0 ... linking ... done.
Loading package old-time-1.0.0.0 ... linking ... done.
Loading package mtl-1.1.0.1 ... linking ... done.
Loading package HDBC-1.1.4 ... linking ... done.
Loading package HDBC-sqlite3-1.1.4.0 ... linking ... done.
ghci> :type conn
conn :: Connection
ghci> disconnect conn
End of preview. The rest of this section is not available here.
Transactions
Preview
Most modern SQL databases have a notion of transactions. A transaction is designed to ensure that all components of a modification get applied, or that none of them do. Furthermore, transactions help prevent other processes accessing the same database from seeing partial data from modifications that are in progress.
Many databases require you to either explicitly commit all your changes before they appear on disk, or to run in an autocommit mode. Autocommit mode runs an implicit commit after every statement. This may make the adjustment to transactional databases easier for programmers not accustomed to them, but it is just a hindrance to people who actually want to use multistatement transactions.
HDBC intentionally does not support autocommit mode. When you modify data in your databases, you must explicitly cause it to be committed to disk. There are two ways to do that in HDBC: you can call commit when you’re ready to write the data to disk, or you can use the withTransaction function to wrap around your modification code. withTransaction will cause data to be committed upon successful completion of your function.
Sometimes a problem will occur while you are working on writing data to the database. Perhaps you get an error from the database or discover a problem with the data. In these instances, you can “roll back” your changes. This will cause all changes you made since your last commit or rollback to be forgotten. In HDBC, you can call the rollback function to do this. If you are using withTransaction, any uncaught exception will cause a rollback to be issued.
Note that a roll back operation rolls back only the changes since the last commit, rollback, or withTransaction. A database does not maintain an extensive history like a version-control system. You will see examples of commit later in this chapter.
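To make the withTransaction behavior concrete, here is a minimal sketch. It assumes that the HDBC and HDBC-sqlite3 packages are installed and that test1.db already contains the two-column test table used in this chapter's ghci sessions. addPair is a hypothetical helper, and the row values are arbitrary.

```haskell
import Database.HDBC
import Database.HDBC.Sqlite3 (connectSqlite3)

-- Hypothetical helper: insert two rows as a single unit of work.
-- If either run call raises an exception, withTransaction rolls back
-- and neither row is stored; otherwise it commits both.
addPair :: IConnection conn => conn -> IO ()
addPair conn =
  withTransaction conn $ \c -> do
    _ <- run c "INSERT INTO test VALUES (?, ?)" [toSql (10 :: Int), toSql "ten"]
    _ <- run c "INSERT INTO test VALUES (?, ?)" [toSql (11 :: Int), toSql "eleven"]
    return ()

main :: IO ()
main = do
  conn <- connectSqlite3 "test1.db"   -- assumes the test table exists
  addPair conn
  disconnect conn
```

No explicit commit appears in main because withTransaction issues it once the wrapped function completes successfully.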
One popular database, MySQL, does not support transactions with its default table type. In its default configuration, MySQL will silently ignore calls to
End of preview. The rest of this section is not available here.
Simple Queries
Preview
Some of the simplest queries in SQL involve statements that don’t return any data. These queries can be used to create tables, insert data, delete data, and set database parameters.
The most basic function for sending queries to a database is run. This function takes an IConnection, a String representing the query itself, and a list of parameters. Let’s use it to set up some things in our database:
ghci> :module Database.HDBC Database.HDBC.Sqlite3
ghci> conn <- connectSqlite3 "test1.db"
Loading package array-0.1.0.0 ... linking ... done.
Loading package containers-0.1.0.2 ... linking ... done.
Loading package bytestring-0.9.0.1.1 ... linking ... done.
Loading package old-locale-1.0.0.0 ... linking ... done.
Loading package old-time-1.0.0.0 ... linking ... done.
Loading package mtl-1.1.0.1 ... linking ... done.
Loading package HDBC-1.1.4 ... linking ... done.
Loading package HDBC-sqlite3-1.1.4.0 ... linking ... done.
ghci> run conn "CREATE TABLE test (id INTEGER NOT NULL, desc VARCHAR(80))" []
0
ghci> run conn "INSERT INTO test (id) VALUES (0)" []
1
ghci> commit conn
ghci> disconnect conn
In this example, after connecting to the database, we first created a table called test. Then we inserted one row of data into the table. Finally, we committed the changes and disconnected from the database. Note that if we hadn’t called commit, no final change would have been written to the database at all.
The run function returns the number of rows that each query modified. For the first query, which created a table, no rows were modified. The second query inserted a single row, so run returned 1.
End of preview. The rest of this section is not available here.
SqlValue
Preview
Before proceeding, we need to discuss a data type introduced in HDBC: SqlValue. Since both Haskell and SQL are strongly typed systems, HDBC tries to preserve type information as much as possible. At the same time, Haskell and SQL types don’t exactly mirror each other. Furthermore, different databases have different ways of representing things such as dates or special characters in strings.
SqlValue is a data type that has a number of constructors such as SqlString, SqlBool, SqlNull, SqlInteger, and more. This lets you represent various types of data in argument lists to the database and see various types of data in the results coming back, and still store it all in a list. There are convenience functions, toSql and fromSql, that you will normally use. If you care about the precise representation of data, you can still construct SqlValue data manually.
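A short sketch of toSql and fromSql in use, assuming the HDBC package is installed. The names exampleRow, rowId, and rowNote are hypothetical. Which constructor toSql picks for a given Haskell type is left to the library, so the example only converts values back rather than printing the raw SqlValue.

```haskell
import Database.HDBC (SqlValue (..), fromSql, toSql)

-- A row mixes several kinds of values but still fits in one list.
exampleRow :: [SqlValue]
exampleRow = [toSql (7 :: Int), toSql "seven", SqlNull]

-- fromSql converts back; the type annotation selects the target type.
rowId :: Int
rowId = fromSql (head exampleRow)

-- A NULL column maps naturally onto Maybe.
rowNote :: Maybe String
rowNote = fromSql (exampleRow !! 2)

main :: IO ()
main = do
  print rowId     -- 7
  print rowNote   -- Nothing
```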
End of preview. The rest of this section is not available here.
Query Parameters
Preview
HDBC, like most databases, supports a notion of replaceable parameters in queries. There are three primary benefits of using replaceable parameters: they prevent SQL injection attacks or trouble when the input contains quote characters, they improve performance when executing similar queries repeatedly, and they permit easy and portable insertion of data into queries.
Let’s say you want to add thousands of rows into our new table test. You could issue queries that look like INSERT INTO test VALUES (0, 'zero') and INSERT INTO test VALUES (1, 'one'). This forces the database server to parse each SQL statement individually. If you could replace the two values with a placeholder, the server could parse the SQL query once and just execute it multiple times with the different data.
A second problem involves escaping characters. What if you want to insert the string "I don't like 1"? SQL uses the single quote character to show the end of the field. Most SQL databases would require you to write this as 'I don''t like 1'. But rules for other special characters such as backslashes differ between databases. Rather than trying to code this yourself, HDBC can handle it all for you. Let’s look at an example:
ghci> conn <- connectSqlite3 "test1.db"
ghci> run conn "INSERT INTO test VALUES (?, ?)" [toSql 0, toSql "zero"]
1
ghci> commit conn
ghci> disconnect conn
The question marks in the INSERT query in this example are the placeholders. We then pass the parameters that are going to go there. run takes a list of SqlValue, so we use toSql to convert each item into an SqlValue. HDBC automatically handles conversion of the String "zero" into the appropriate representation for the database in use.
This approach won’t actually achieve any performance benefits when inserting large amounts of data. For that, we need more control over the process of creating the SQL query. We’ll discuss that in the next section.
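The escaping problem described above can be checked with a short program that stores a string containing a single quote and reads it back. This is a sketch under the same assumptions as the ghci sessions: HDBC and HDBC-sqlite3 are installed, and test1.db contains the test table. insertNote is a hypothetical helper.

```haskell
import Database.HDBC
import Database.HDBC.Sqlite3 (connectSqlite3)

-- Hypothetical helper: insert one row, letting HDBC pass the text as a
-- parameter so that no manual quoting is needed.
insertNote :: IConnection conn => conn -> Int -> String -> IO Integer
insertNote conn ident note =
  run conn "INSERT INTO test VALUES (?, ?)" [toSql ident, toSql note]

main :: IO ()
main = do
  conn <- connectSqlite3 "test1.db"   -- assumes the test table exists
  n <- insertNote conn 1 "I don't like 1"
  putStrLn ("rows inserted: " ++ show n)
  -- Read back every row with id 1; the second column may be NULL.
  rows <- quickQuery' conn "SELECT * FROM test WHERE id = ?" [toSql (1 :: Int)]
  mapM_ (\row -> print (fromSql (row !! 1) :: Maybe String)) rows
  commit conn
  disconnect conn
```

The query string never contains the user-supplied text, which is what makes this form resistant to SQL injection.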
Replaceable parameters work only for parts of the queries where the server is expecting a value, such as a WHERE clause in a SELECT statement or a value for an INSERT statement. You cannot say
End of preview. The rest of this section is not available here.
Prepared Statements
Preview
HDBC defines a function prepare that will prepare a SQL query, but it does not yet bind the parameters to the query. prepare returns a Statement representing the compiled query.
Once you have a Statement, you can do a number of things with it. You can call execute on it one or more times. After calling execute on a query that returns data, you can use one of the fetch functions to retrieve that data. Functions such as run and quickQuery' use statements and execute internally; they are simply shortcuts to let you perform common tasks quickly. When you need more control over what’s happening, you can use a Statement instead of a function such as run.
Let’s look at using statements to insert multiple values with a single query. Here’s an example:
ghci> conn <- connectSqlite3 "test1.db"
ghci> stmt <- prepare conn "INSERT INTO test VALUES (?, ?)"
ghci> execute stmt [toSql 1, toSql "one"]
1
ghci> execute stmt [toSql 2, toSql "two"]
1
ghci> execute stmt [toSql 3, toSql "three"]
1
ghci> execute stmt [toSql 4, SqlNull]
1
ghci> commit conn
ghci> disconnect conn
Here, we create a prepared statement and call it stmt. We then execute that statement four times and pass different parameters each time. These parameters are used, in order, to replace the question marks in the original query string. Finally, we commit the changes and disconnect the database.
HDBC also provides a function, executeMany, that can be useful in situations such as this. executeMany simply takes a list of rows of data to call the statement with. Here’s an example:
        

ghci> conn <- connectSqlite3 "test1.db"
ghci> stmt <- prepare conn "INSERT INTO test VALUES (?, ?)"
ghci> executeMany stmt [[toSql 5, toSql "five's nice"], [toSql 6, SqlNull]]
ghci> commit conn
ghci> disconnect conn
On the server, most databases will have an optimization that they can apply to executeMany so that they only have to compile this query string once, rather than twice. This can lead to a dramatic performance gain when inserting large amounts of data at one time. Some databases can also apply this optimization to
End of preview. The rest of this section is not available here.
Reading Results
Preview
So far, we have discussed queries that insert or change data. Let’s now go over getting data back out of the database. The type of the function quickQuery' looks very similar to run, but it returns a list of results instead of a count of changed rows. quickQuery' is normally used with SELECT statements. Let’s see an example:
ghci> conn <- connectSqlite3 "test1.db"
ghci> quickQuery' conn "SELECT * from test where id < 2" []
[[SqlString "0",SqlNull],[SqlString "0",SqlString "zero"],
[SqlString "1",SqlString "one"],[SqlString "0",SqlNull],
[SqlString "0",SqlString "zero"],[SqlString "1",SqlString "one"]]
ghci> disconnect conn
quickQuery' works with replaceable parameters, as we just discussed. In this case, we aren’t using any, so the set of values to replace is the empty list at the end of the quickQuery' call. quickQuery' returns a list of rows, where each row is itself represented as [SqlValue]. The values in the row are listed in the order returned by the database. You can use fromSql to convert them into regular Haskell types as needed.
It’s a bit hard to read that output. Let’s extend this example to format the results nicely. Here’s some code to do that:
-- file: ch21/query.hs
import Database.HDBC.Sqlite3 (connectSqlite3)
import Database.HDBC

{- | Define a function that takes an integer representing the maximum
id value to look up.  Will fetch all matching rows from the test database
and print them to the screen in a friendly format. -}
query :: Int -> IO ()
query maxId = 
    do -- Connect to the database
       conn <- connectSqlite3 "test1.db"

       -- Run the query and store the results in r
       r <- quickQuery' conn
            "SELECT id, desc from test where id <= ? ORDER BY id, desc"
            [toSql maxId]

       -- Convert each row into a String
       let stringRows = map convRow r

       -- Print the rows out
       mapM_ putStrLn stringRows

       -- And disconnect from the database
       disconnect conn

    where convRow :: [SqlValue] -> String
          convRow [sqlId, sqlDesc] = 
              show intid ++ ": " ++ desc
              where intid = (fromSql sqlId)::Integer
                    desc = case fromSql sqlDesc of
                             Just x -> x
                             Nothing -> "NULL"
          convRow x = fail $ "Unexpected result: " ++ show x
End of preview. The rest of this section is not available here.
Database Metadata
Preview
Sometimes it can be useful for a program to learn information about the database itself. For instance, a program may want to see what tables exist so that it can automatically create missing tables or upgrade the database schema. In some cases, a program may need to alter its behavior depending on the database backend in use.
First, there is a getTables function that will obtain a list of defined tables in a database. You can also use the describeTable function, which will provide information about the defined columns in a given table.
You can learn about the database server in use by calling dbServerVer and proxiedClientName, for instance. The dbTransactionSupport function can be used to determine whether or not a given database supports transactions. Let’s look at an example of some of these items:
ghci> conn <- connectSqlite3 "test1.db"
ghci> getTables conn
["test"]
ghci> proxiedClientName conn
"sqlite3"
ghci> dbServerVer conn
"3.5.6"
ghci> dbTransactionSupport conn
True
ghci> disconnect conn
You can also learn about the results of a specific query by obtaining information from its statement. The describeResult function returns [(String, SqlColDesc)], a list of pairs. The first item gives the column name, and the second provides information about the column: the type, the size, and whether it may be NULL. The full specification is given in the HDBC API reference.
Some databases may not be able to provide all this metadata. In these circumstances, an exception will be raised. Sqlite3, for instance, does not support describeResult or describeTable as of this writing.
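One use of table metadata mentioned at the start of this section is creating whatever tables are missing. The decision itself is a pure list computation, sketched below under the assumption that the list of existing names has already been obtained (in a real program it would come from getTables); requiredTables and missingTables are hypothetical names chosen for illustration.

```haskell
import Data.List ((\\))

-- Tables this hypothetical application expects to find.
requiredTables :: [String]
requiredTables = ["podcasts", "episodes"]

-- Given the names reported by the database, list what still has to be created.
missingTables :: [String] -> [String]
missingTables existing = requiredTables \\ existing

main :: IO ()
main = print (missingTables ["test", "podcasts"])
```

Keeping the comparison pure makes it easy to test without opening a connection.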
Error Handling
HDBC will raise exceptions when errors occur. The exceptions have type SqlError. They convey information from the underlying SQL engine, such as the database’s state, the error message, and the database’s numeric error code, if any.
ghci does not know how to display an SqlError on the screen when it occurs. While the exception will cause the program to terminate, it will not display a useful message. Here’s an example:
ghci> conn <- connectSqlite3 "test1.db"
ghci> quickQuery' conn "SELECT * from test2" []
*** Exception: (unknown)
ghci> disconnect conn
Here we tried to SELECT data from a table that didn’t exist. The error message we got wasn’t helpful. There’s a utility function, handleSqlError, that will catch an SqlError and re-raise it as an IOError. In this form, it will be printable onscreen, but it will be more difficult to extract specific pieces of information programmatically. Let’s look at its usage:
ghci> conn <- connectSqlite3 "test1.db"
ghci> handleSqlError $ quickQuery' conn "SELECT * from test2" []
*** Exception: user error (SQL error: SqlError {seState = "", seNativeError = 1, 
seErrorMsg = "prepare 20: SELECT * from test2: no such table: test2"})
ghci> disconnect conn
Here we got more information, including a message saying that there is no such table as test2. This is much more helpful. Many HDBC programmers make it a standard practice to start their programs with main = handleSqlError $ do, which will ensure that every uncaught SqlError will be printed in a helpful manner.
There are also catchSql and handleSql, which are similar to the standard catch and handle functions but intercept HDBC errors only. For more information on error handling, refer to the book’s chapter on that topic.
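The idea of a handler that intercepts only one kind of exception, and of re-raising it in a printable form, can be illustrated with Control.Exception alone. This is a minimal sketch, assuming a hypothetical DbError type standing in for SqlError; handleDb and reraiseAsIOError are illustrative names and not part of HDBC.

```haskell
import Control.Exception

-- Hypothetical stand-in for SqlError, so the sketch needs no database.
data DbError = DbError { dbCode :: Int, dbMsg :: String }
    deriving Show

instance Exception DbError

-- Like handle, but the handler's type restricts it to DbError only;
-- any other exception passes through untouched.
handleDb :: (DbError -> IO a) -> IO a -> IO a
handleDb = handle

-- Turn a DbError into a printable IOError, in the spirit of handleSqlError.
reraiseAsIOError :: IO a -> IO a
reraiseAsIOError = handleDb (\e -> ioError (userError ("DB error: " ++ show e)))

main :: IO ()
main = do
    r <- try (reraiseAsIOError (throwIO (DbError 1 "no such table: test2")))
    case r of
      Left e   -> putStrLn ("Caught: " ++ show (e :: IOException))
      Right () -> putStrLn "No error"
```

The trade-off matches the one described above: once converted to an IOError, the numeric code is only available as part of the message text.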
Chapter 22: Extended Example: Web Client Programming
By this point, you’ve seen how to interact with a database, parse things, and handle errors. Let’s now take this a step further and introduce a web client library to the mix.
We’ll develop a real application in this chapter: a podcast downloader, or podcatcher. The idea of a podcatcher is simple. It is given a list of URLs to process. Downloading each of these URLs results in an XML file in the RSS format. Inside this XML file, we’ll find references to URLs for audio files to download.
Podcatchers usually let the user subscribe to podcasts by adding RSS URLs to their configuration. Then, the user can periodically run an update operation. The podcatcher will download the RSS documents, examine them for audio file references, and download any audio files that haven’t already been downloaded on behalf of this user.
Users often call the RSS document a podcast or the podcast feed, and call each individual audio file an episode.
To make this happen, we need to have several things:
  • An HTTP client library to download files
  • An XML parser
  • A way to specify and persistently store which podcasts we’re interested in
  • A way to persistently store which podcast episodes we’ve already downloaded
The last two items can be accommodated via a database that we’ll set up using HDBC. The first two can be accommodated via other library modules we’ll introduce in this chapter.
The code in this chapter was written specifically for this book, but is based on code written for hpodder, an existing podcatcher written in Haskell. hpodder has many more features than the examples presented here, which make it too long and complex to cover in this book. If you are interested in studying hpodder, its source code is freely available at http://software.complete.org/hpodder.
We’ll write the code for this chapter in pieces. Each piece will be its own Haskell module. You’ll be able to play with each piece by itself in ghci. At the end, we’ll write the final code that ties everything together into a finished application. We’ll start with the basic types that we’ll need to use.
Basic Types
The first thing to do is have some idea of the basic information that will be important to the application. This will generally be information about the podcasts the user is interested in, plus information about episodes that we have seen and processed. It’s easy enough to change this later if needed, but since we’ll be importing it just about everywhere, we’ll define it first:
-- file: ch22/PodTypes.hs

module PodTypes where



data Podcast =

    Podcast {castId :: Integer, -- ^ Numeric ID for this podcast

             castURL :: String  -- ^ Its feed URL

            }

    deriving (Eq, Show, Read)



data Episode = 

    Episode {epId :: Integer,     -- ^ Numeric ID for this episode

             epCast :: Podcast,   -- ^ The podcast it came from

             epURL :: String,     -- ^ The download URL for this episode

             epDone :: Bool       -- ^ Whether or not we are done with this ep

            }

    deriving (Eq, Show, Read)
We’ll be storing this information in a database. Having a unique identifier for both a podcast and an episode makes it easy to find which episodes belong to a particular podcast, load information for a particular podcast or episode, or handle future cases such as changing URLs for podcasts.
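To see how the numeric identifiers help, here is a small self-contained sketch that repeats the two record types (without the Read instance) and selects the pending episodes of one podcast. The helper names episodesOf and pending, and the sample feed and episode URLs, are assumptions made for illustration.

```haskell
data Podcast = Podcast { castId :: Integer, castURL :: String }
    deriving (Eq, Show)

data Episode = Episode { epId :: Integer, epCast :: Podcast
                       , epURL :: String, epDone :: Bool }
    deriving (Eq, Show)

-- Episodes of one podcast, matched on the numeric ID rather than the URL,
-- so the result is unaffected if the feed URL changes later.
episodesOf :: Podcast -> [Episode] -> [Episode]
episodesOf pc = filter ((== castId pc) . castId . epCast)

-- Episodes that still need downloading.
pending :: [Episode] -> [Episode]
pending = filter (not . epDone)

main :: IO ()
main = do
    let radio = Podcast 1 "http://www.example.com/radio/feed.xml"
        other = Podcast 2 "http://www.example.com/other/feed.xml"
        eps = [ Episode 1 radio "http://www.example.com/radio/parsec.mp3" True
              , Episode 2 radio "http://www.example.com/radio/lambdas.mp3" False
              , Episode 3 other "http://www.example.com/other/intro.mp3" False ]
    print (map epId (pending (episodesOf radio eps)))
```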
The Database
Next, we’ll write the code that provides persistent storage in a database. We’ll primarily be interested in moving data between the Haskell structures that we defined in PodTypes.hs and the database on disk. Also, the first time the user runs the program, we’ll need to create the database tables that will store our data.
We’ll use HDBC (covered in the previous chapter) to interact with a Sqlite database. Sqlite is lightweight and self-contained, which makes it perfect for this project. Installing HDBC and Sqlite is described in that chapter as well. Here’s the code:
-- file: ch22/PodDB.hs

module PodDB where



import Database.HDBC

import Database.HDBC.Sqlite3

import PodTypes

import Control.Monad(when)

import Data.List(sort)



-- | Initialize DB and return database Connection

connect :: FilePath -> IO Connection

connect fp =

    do dbh <- connectSqlite3 fp

       prepDB dbh

       return dbh



{- | Prepare the database for our data.



We create two tables and ask the database engine to verify some pieces

of data consistency for us:



* castid and epid both are unique primary keys and must never be duplicated

* castURL also is unique

* In the episodes table, for a given podcast (epcast), there must be only

  one instance of each given URL or episode ID

-}

prepDB :: IConnection conn => conn -> IO ()

prepDB dbh =

    do tables <- getTables dbh

       when (not ("podcasts" `elem` tables)) $

           do run dbh "CREATE TABLE podcasts (\

                       \castid INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,\

                       \castURL TEXT NOT NULL UNIQUE)" []

              return ()

       when (not ("episodes" `elem` tables)) $

           do run dbh "CREATE TABLE episodes (\

                       \epid INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,\

                       \epcastid INTEGER NOT NULL,\

                       \epurl TEXT NOT NULL,\

                       \epdone INTEGER NOT NULL,\

                       \UNIQUE(epcastid, epurl),\

                       \UNIQUE(epcastid, epid))" []

              return ()

       commit dbh



{- | Adds a new podcast to the database.  Ignores the castid on the

incoming podcast, and returns a new object with the castid populated.



An attempt to add a podcast that already exists is an error. -}

addPodcast :: IConnection conn => conn -> Podcast -> IO Podcast

addPodcast dbh podcast = 

    handleSql errorHandler $

      do -- Insert the castURL into the table.  The database

         -- will automatically assign a cast ID.

         run dbh "INSERT INTO podcasts (castURL) VALUES (?)"

             [toSql (castURL podcast)]

         -- Find out the castID for the URL we just added.

         r <- quickQuery' dbh "SELECT castid FROM podcasts WHERE castURL = ?"

              [toSql (castURL podcast)]

         case r of

           [[x]] -> return $ podcast {castId = fromSql x}

           y -> fail $ "addPodcast: unexpected result: " ++ show y

    where errorHandler e = 

              do fail $ "Error adding podcast; does this URL already exist?\n"

                     ++ show e



{- | Adds a new episode to the database. 



Since this is done by automation instead of by user request, we will

simply ignore requests to add duplicate episodes.  This way, when we are

processing a feed, each URL encountered can be fed to this function,

without having to first look it up in the DB.



Also, we generally won't care about the new ID here, so don't bother

fetching it. -}

addEpisode :: IConnection conn => conn -> Episode -> IO ()

addEpisode dbh ep =

    run dbh "INSERT OR IGNORE INTO episodes (epCastId, epURL, epDone) \

                \VALUES (?, ?, ?)"

                [toSql (castId . epCast $ ep), toSql (epURL ep),

                 toSql (epDone ep)]

    >> return ()

       

{- | Modifies an existing podcast.  Looks up the given podcast by

ID and modifies the database record to match the passed Podcast. -}

updatePodcast :: IConnection conn => conn -> Podcast -> IO ()

updatePodcast dbh podcast =

    run dbh "UPDATE podcasts SET castURL = ? WHERE castId = ?" 

            [toSql (castURL podcast), toSql (castId podcast)]

    >> return ()



{- | Modifies an existing episode.  Looks it up by ID and modifies the

database record to match the given episode. -}

updateEpisode :: IConnection conn => conn -> Episode -> IO ()

updateEpisode dbh episode =

    run dbh "UPDATE episodes SET epCastId = ?, epURL = ?, epDone = ? \

             \WHERE epId = ?"

             [toSql (castId . epCast $ episode),

              toSql (epURL episode),

              toSql (epDone episode),

              toSql (epId episode)]

    >> return ()



{- | Remove a podcast.  First removes any episodes that may exist

for this podcast. -}

removePodcast :: IConnection conn => conn -> Podcast -> IO ()

removePodcast dbh podcast =

    do run dbh "DELETE FROM episodes WHERE epcastid = ?" 

         [toSql (castId podcast)]

       run dbh "DELETE FROM podcasts WHERE castid = ?"

         [toSql (castId podcast)]

       return ()



{- | Gets a list of all podcasts. -}

getPodcasts :: IConnection conn => conn -> IO [Podcast]

getPodcasts dbh =

    do res <- quickQuery' dbh 

              "SELECT castid, casturl FROM podcasts ORDER BY castid" []

       return (map convPodcastRow res)



{- | Get a particular podcast.  Nothing if the ID doesn't match, or

Just Podcast if it does. -}

getPodcast :: IConnection conn => conn -> Integer -> IO (Maybe Podcast)

getPodcast dbh wantedId =

    do res <- quickQuery' dbh 

              "SELECT castid, casturl FROM podcasts WHERE castid = ?"

              [toSql wantedId]

       case res of

         [x] -> return (Just (convPodcastRow x))

         [] -> return Nothing

         _ -> fail $ "Really bad error; more than one podcast with ID " ++ show wantedId



{- | Convert the result of a SELECT into a Podcast record -}

convPodcastRow :: [SqlValue] -> Podcast

convPodcastRow [svId, svURL] =

    Podcast {castId = fromSql svId,

             castURL = fromSql svURL}

convPodcastRow x = error $ "Can't convert podcast row " ++ show x



{- | Get all episodes for a particular podcast. -}

getPodcastEpisodes :: IConnection conn => conn -> Podcast -> IO [Episode]

getPodcastEpisodes dbh pc =

    do r <- quickQuery' dbh

            "SELECT epId, epURL, epDone FROM episodes WHERE epCastId = ?"

            [toSql (castId pc)]

       return (map convEpisodeRow r)

    where convEpisodeRow [svId, svURL, svDone] =

              Episode {epId = fromSql svId, epURL = fromSql svURL,

                       epDone = fromSql svDone, epCast = pc}

          convEpisodeRow x = error $ "Can't convert episode row " ++ show x
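The effect of combining the UNIQUE(epcastid, epurl) constraint with INSERT OR IGNORE in addEpisode can be modeled in memory. The sketch below is a simplification under stated assumptions: rows are plain tuples, the table is a list, and the ID assignment only approximates what AUTOINCREMENT does in a real database.

```haskell
import Data.List (find)

-- Simplified in-memory row: (episode ID, podcast ID, URL).
type Row = (Integer, Integer, String)

-- Add an episode unless this podcast already has that URL, mirroring the
-- UNIQUE(epcastid, epurl) constraint combined with INSERT OR IGNORE.
addEpisode :: Integer -> String -> [Row] -> [Row]
addEpisode cast url rows =
    case find (\(_, c, u) -> c == cast && u == url) rows of
      Just _  -> rows
      Nothing -> rows ++ [(nextId, cast, url)]
  where nextId = 1 + maximum (0 : [i | (i, _, _) <- rows])

main :: IO ()
main = do
    let urls = ["a.mp3", "b.mp3", "a.mp3"]
        table = foldl (\t u -> addEpisode 1 u t) [] urls
    mapM_ print table
```

Feeding the same URL twice leaves the table unchanged, which is why a feed can be reprocessed without first looking each episode up.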
The Parser
Now that we have the database component, we need to have code to parse the podcast feeds. These are XML files that contain various information. Here’s an example XML file to show you what they look like:
<?xml version="1.0" encoding="UTF-8"?>

<rss xmlns:itunes="http://www.itunes.com/DTDs/Podcast-1.0.dtd" version="2.0">

  <channel>

    <title>Haskell Radio</title>

    <link>http://www.example.com/radio/</link>

    <description>Description of this podcast</description>

    <item>

      <title>Episode 2: Lambdas</title>

      <link>http://www.example.com/radio/lambdas</link>

      <enclosure url="http://www.example.com/radio/lambdas.mp3"

       type="audio/mpeg" length="10485760"/>

    </item>

    <item>

      <title>Episode 1: Parsec</title>

      <link>http://www.example.com/radio/parsec</link>

      <enclosure url="http://www.example.com/radio/parsec.mp3"

       type="audio/mpeg" length="10485150"/>

    </item>

  </channel>

</rss>
Out of these files, we are mainly interested in two things: the podcast title and the enclosure URLs. We use the HaXml toolkit to parse the XML file. Here’s the source code for this component:
-- file: ch22/PodParser.hs

module PodParser where



import PodTypes

import Text.XML.HaXml

import Text.XML.HaXml.Parse

import Text.XML.HaXml.Html.Generate(showattr)

import Data.Char

import Data.List



data PodItem = PodItem {itemtitle :: String,

                  enclosureurl :: String

                  }

          deriving (Eq, Show, Read)



data Feed = Feed {channeltitle :: String,

                  items :: [PodItem]}

            deriving (Eq, Show, Read)



{- | Given a podcast and a PodItem, produce an Episode -}

item2ep :: Podcast -> PodItem -> Episode

item2ep pc item =

    Episode {epId = 0,

             epCast = pc,

             epURL = enclosureurl item,

             epDone = False}



{- | Parse the data from a given string, with the given name to use

in error messages. -}

parse :: String -> String -> Feed

parse content name = 

    Feed {channeltitle = getTitle doc,

          items = getEnclosures doc}



    where parseResult = xmlParse name (stripUnicodeBOM content)

          doc = getContent parseResult



          getContent :: Document -> Content

          getContent (Document _ _ e _) = CElem e

          

          {- | Some Unicode documents begin with a binary sequence;

             strip it off before processing. -}

          stripUnicodeBOM :: String -> String

          stripUnicodeBOM ('\xef':'\xbb':'\xbf':x) = x

          stripUnicodeBOM x = x



{- | Pull out the channel part of the document.



Note that HaXml defines CFilter as:



> type CFilter = Content -> [Content]

-}

channel :: CFilter

channel = tag "rss" /> tag "channel"



getTitle :: Content -> String

getTitle doc =

    contentToStringDefault "Untitled Podcast" 

        (channel /> tag "title" /> txt $ doc)



getEnclosures :: Content -> [PodItem]

getEnclosures doc =

    concatMap procPodItem $ getPodItems doc

    where procPodItem :: Content -> [PodItem]

          procPodItem item = concatMap (procEnclosure title) enclosure

              where title = contentToStringDefault "Untitled Episode"

                               (keep /> tag "title" /> txt $ item)

                    enclosure = (keep /> tag "enclosure") item



          getPodItems :: CFilter

          getPodItems = channel /> tag "item"



          procEnclosure :: String -> Content -> [PodItem]

          procEnclosure title enclosure =

              map makePodItem (showattr "url" enclosure)

              where makePodItem :: Content -> PodItem

                    makePodItem x = PodItem {itemtitle = title,

                                       enclosureurl = contentToString [x]}



{- | Convert [Content] to a printable String, with a default if the 

passed-in [Content] is [], signifying a lack of a match. -}

contentToStringDefault :: String -> [Content] -> String

contentToStringDefault msg [] = msg

contentToStringDefault _ x = contentToString x



{- | Convert [Content] to a printable string, taking care to unescape it.



An implementation without unescaping would simply be:



> contentToString = concatMap (show . content)



Because HaXml's unescaping works only on Elements, we must make sure that

whatever Content we have is wrapped in an Element, then use txt to

pull the insides back out. -}

contentToString :: [Content] -> String

contentToString = 

    concatMap procContent

    where procContent x = 

              verbatim $ keep /> txt $ CElem (unesc (fakeElem x))



          fakeElem :: Content -> Element

          fakeElem x = Elem "fake" [] [x]



          unesc :: Element -> Element

          unesc = xmlUnEscape stdXmlEscaper
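The stripUnicodeBOM helper in the listing matches the three UTF-8 byte-order-mark bytes as three separate characters, which is what appears when the data is read byte by byte. As a hedged extension, the standalone sketch below also handles the case where the text has already been decoded and the mark shows up as the single character U+FEFF; the name stripBOM is hypothetical.

```haskell
-- Strip a leading byte-order mark, whether the input was read as raw
-- bytes (EF BB BF, one Char per byte) or already decoded (U+FEFF).
stripBOM :: String -> String
stripBOM ('\xef':'\xbb':'\xbf':rest) = rest
stripBOM ('\xfeff':rest)             = rest
stripBOM s                           = s

main :: IO ()
main = do
    putStrLn (stripBOM "\xef\xbb\xbf<rss/>")
    putStrLn (stripBOM "\xfeff<rss/>")
    putStrLn (stripBOM "<rss/>")
```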
Downloading
The next part of our program is a module to download data. We’ll need to download two different types of data: the content of a podcast and the audio for each episode. In the former case, we’ll parse the data and update our database. For the latter, we’ll write the data out to a file on disk.
We’ll be downloading from HTTP servers, so we’ll use a Haskell HTTP library. For downloading podcast feeds, we’ll download the document, parse it, and update the database. For episode audio, we’ll download the file, write it to disk, and mark it downloaded in the database. Here’s the code:
-- file: ch22/PodDownload.hs

module PodDownload where

import PodTypes

import PodDB

import PodParser

import Network.HTTP

import System.IO

import Database.HDBC

import Data.Maybe

import Network.URI



{- | Download a URL.  (Left errorMessage) if an error,

(Right doc) if success. -}

downloadURL :: String -> IO (Either String String)

downloadURL url =

    do resp <- simpleHTTP request

       case resp of

         Left x -> return $ Left ("Error connecting: " ++ show x)

         Right r -> 

             case rspCode r of

               (2,_,_) -> return $ Right (rspBody r)

               (3,_,_) -> -- An HTTP redirect

                 case findHeader HdrLocation r of

                   Nothing -> return $ Left (show r)

                   Just url -> downloadURL url

               _ -> return $ Left (show r)

    where request = Request {rqURI = uri,

                             rqMethod = GET,

                             rqHeaders = [],

                             rqBody = ""}

          uri = fromJust $ parseURI url



{- | Update the podcast in the database. -}

updatePodcastFromFeed :: IConnection conn => conn -> Podcast -> IO ()

updatePodcastFromFeed dbh pc =

    do resp <- downloadURL (castURL pc)

       case resp of

         Left x -> putStrLn x

         Right doc -> updateDB doc



    where updateDB doc = 

              do mapM_ (addEpisode dbh) episodes

                 commit dbh

              where feed = parse doc (castURL pc)

                    episodes = map (item2ep pc) (items feed)



{- | Downloads an episode, returning a String representing

the filename it was placed into, or Nothing on error. -}

getEpisode :: IConnection conn => conn -> Episode -> IO (Maybe String)

getEpisode dbh ep =

    do resp <- downloadURL (epURL ep)

       case resp of

         Left x -> do putStrLn x

                      return Nothing

         Right doc -> 

             do file <- openBinaryFile filename WriteMode

                hPutStr file doc

                hClose file

                updateEpisode dbh (ep {epDone = True})

                commit dbh

                return (Just filename)

          -- This function ought to apply an extension based on the file type

    where filename = "pod." ++ (show . castId . epCast $ ep) ++ "." ++ 

                     (show (epId ep)) ++ ".mp3"
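The downloadURL function above follows every redirect it receives, so a server that redirects in a circle would keep it busy indefinitely. One possible refinement is a hop limit. The sketch below shows the idea without any network access, assuming a hypothetical Reply type and a fetch function passed in as a parameter; it is not a drop-in replacement for the listing.

```haskell
-- Simplified response: a body, a redirect target, or a failure message.
data Reply = Body String | Redirect String | Failure String

-- Follow redirects, giving up once more than the given number of
-- redirects has been seen. The fetch function is passed in, so this
-- sketch needs no network.
fetchWithLimit :: Int -> (String -> Reply) -> String -> Either String String
fetchWithLimit hops fetch url
    | hops < 0  = Left ("Too many redirects at " ++ url)
    | otherwise =
        case fetch url of
          Body b        -> Right b
          Redirect next -> fetchWithLimit (hops - 1) fetch next
          Failure msg   -> Left msg

-- A fake server: /old redirects to /new, and /loop redirects to itself.
fakeServer :: String -> Reply
fakeServer "/old"  = Redirect "/new"
fakeServer "/new"  = Body "feed contents"
fakeServer "/loop" = Redirect "/loop"
fakeServer other   = Failure ("Not found: " ++ other)

main :: IO ()
main = do
    print (fetchWithLimit 5 fakeServer "/old")
    print (fetchWithLimit 5 fakeServer "/loop")
```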
Main Program
Finally, we need a main program to tie it all together. Here’s our main module:
-- file: ch22/PodMain.hs

module Main where



import PodDownload

import PodDB

import PodTypes

import System.Environment

import Database.HDBC

import Network.Socket(withSocketsDo)



main = withSocketsDo $ handleSqlError $

    do args <- getArgs

       dbh <- connect "pod.db"

       case args of

         ["add", url] -> add dbh url

         ["update"] -> update dbh

         ["download"] -> download dbh

         ["fetch"] -> do update dbh

                         download dbh

         _ -> syntaxError

       disconnect dbh



add dbh url = 

    do addPodcast dbh pc

       commit dbh

    where pc = Podcast {castId = 0, castURL = url}



update dbh = 

    do pclist <- getPodcasts dbh

       mapM_ procPodcast pclist

    where procPodcast pc =

              do putStrLn $ "Updating from " ++ (castURL pc)

                 updatePodcastFromFeed dbh pc



download dbh =

    do pclist <- getPodcasts dbh

       mapM_ procPodcast pclist

    where procPodcast pc =

              do putStrLn $ "Considering " ++ (castURL pc)

                 episodelist <- getPodcastEpisodes dbh pc

                 let dleps = filter (not . epDone) episodelist

                 mapM_ procEpisode dleps

          procEpisode ep =

              do putStrLn $ "Downloading " ++ (epURL ep)

                 getEpisode dbh ep



syntaxError = putStrLn 

  "Usage: pod command [args]\n\

  \\n\

  \pod add url      Adds a new podcast with the given URL\n\

  \pod download     Downloads all pending episodes\n\

  \pod fetch        Updates, then downloads\n\

  \pod update       Downloads podcast feeds, looks for new episodes\n"
We have a very simple command-line parser with a function to indicate a command-line syntax error, plus small functions to handle the different command-line arguments.
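The same pattern matching on the argument list can be separated from the actions it triggers, which makes the parsing step easy to test on its own. This is a sketch of that variation, assuming a hypothetical Command type and parseArgs function that do not appear in the listing; the sample feed URL is also made up.

```haskell
data Command = Add String | Update | Download | Fetch
    deriving (Eq, Show)

-- Map raw arguments to a command; Nothing signals a usage error.
parseArgs :: [String] -> Maybe Command
parseArgs ["add", url] = Just (Add url)
parseArgs ["update"]   = Just Update
parseArgs ["download"] = Just Download
parseArgs ["fetch"]    = Just Fetch
parseArgs _            = Nothing

main :: IO ()
main = mapM_ (print . parseArgs)
    [ ["add", "http://www.example.com/radio/feed.xml"], ["fetch"], ["add"] ]
```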
You can compile this program with a command like this:
ghc --make -O2 -o pod -package HTTP -package HaXml -package network \
    -package HDBC -package HDBC-sqlite3 PodMain.hs
Alternatively, you could use a Cabal file to build this project.
Chapter 23: GUI Programming with gtk2hs
Throughout this book, we have been developing simple text-based tools. While these are often ideal interfaces, sometimes a graphical user interface (GUI) is required. There are several GUI toolkits available for Haskell. In this chapter, we will look at one of them, gtk2hs.
Installing gtk2hs
Before we dive in to working with gtk2hs, you’ll need to get it installed. On most Linux, BSD, or other POSIX platforms, you will find ready-made gtk2hs packages. You will generally need to install the GTK+ development environment, Glade, and gtk2hs. The specifics of doing so vary by distribution.
Windows and Mac developers should consult the gtk2hs downloads site at http://www.haskell.org/gtk2hs/download/. Begin by downloading gtk2hs from there. Then you will also need Glade version 3. Mac developers can find this at http://www.macports.org/, while Windows developers should consult http://sourceforge.net/projects/gladewin32.
Overview of the GTK+ Stack
Before examining the code, let’s pause a brief moment and consider the architecture of the system we are going to use. First off, we have GTK+. GTK+ is a cross-platform GUI-building toolkit, implemented in C. It runs on Windows, Mac, Linux, BSDs, and more. It is also the toolkit beneath the GNOME desktop environment.
Next, we have Glade. Glade is a user-interface designer, which lets you graphically lay out your application’s windows and dialogs. Glade saves the interface in XML files, which your application will load at runtime.
The last piece of this puzzle is gtk2hs. This is the Haskell binding for GTK+, Glade, and several related libraries. It is one of many language bindings available for GTK+.
User Interface Design with Glade
In this chapter, we are going to develop a GUI for the podcast downloader we first developed in the previous chapter. Our first task is to design the user interface in Glade. Once we have accomplished that, we will write the Haskell code to integrate it with the application.
Because this is a Haskell book, rather than a GUI design book, we will move fast through some of these early parts. For more information on interface design with Glade, you may wish to refer to one of these resources:
  • The Glade homepage: Contains documentation for Glade; see http://glade.gnome.org/.
  • The GTK+ homepage: Contains information about the different widgets. Refer to the documentation section, and then the stable GTK documentation area; see http://www.gtk.org/.
  • The gtk2hs homepage: Also has a useful documentation section, which contains an API reference to gtk2hs as well as a Glade tutorial; see http://www.haskell.org/gtk2hs/documentation/.
Glade is a user-interface design tool. It lets us use a graphical interface to design our graphical interface. We could build up the window components using a bunch of calls to GTK+ functions, but it is usually easier to do this with Glade.
The fundamental “thing” we work with in GTK+ is the widget. A widget represents any part of the GUI, and may contain other widgets. Some examples of widgets include a window, dialog box, button, and text within the button.
Glade, then, is a widget layout tool. We set up a whole tree of widgets, with top-level windows at the top of the tree. You can think of Glade and widgets in somewhat the same terms as HTML: you can arrange widgets in a table-like layout, set up padding rules, and structure the entire description in a hierarchical way.
Glade saves the widget descriptions into an XML file. Our program loads this XML file at runtime. We load the widgets by asking the Glade runtime library to load a widget with a specific name.
A screenshot in the book shows Glade being used to design our application’s main screen.
Event-Driven Programming
GTK+, like many GUI toolkits, is an event-driven toolkit. That means that instead of, say, displaying a dialog box and waiting for the user to click on a button, we instead tell gtk2hs what function to call if a certain button is clicked, but don’t sit there waiting for a click in the dialog box.
This is different from the model traditionally used for console programs. When you think about it, though, it almost has to be. A GUI program could have multiple windows open, and writing code to sit there waiting for input in the particular combination of open windows could be a complicated proposition.
Event-driven programming complements Haskell nicely. As we’ve discussed over and over in this book, functional languages thrive on passing around functions. So we’ll be passing functions to gtk2hs that get called when certain events occur. These are known as callback functions.
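Callback registration can be imitated in a few lines without any GUI toolkit. The following toy model is only a sketch: Button, newButton, onClick, and click are hypothetical names defined here, not gtk2hs functions, and the "event" is delivered by calling click directly rather than by a main loop.

```haskell
import Data.IORef

-- A toy "button" that remembers the actions to run when it is clicked.
newtype Button = Button (IORef [IO ()])

newButton :: IO Button
newButton = Button <$> newIORef []

-- Register a callback; nothing runs yet.
onClick :: Button -> IO () -> IO ()
onClick (Button ref) action = modifyIORef ref (++ [action])

-- Simulate the toolkit delivering a click event.
click :: Button -> IO ()
click (Button ref) = readIORef ref >>= sequence_

main :: IO ()
main = do
    counter <- newIORef (0 :: Int)
    ok <- newButton
    onClick ok (modifyIORef counter (+ 1))
    onClick ok (putStrLn "OK clicked")
    click ok
    click ok
    readIORef counter >>= print
```

Registering an action and running it are separate steps, which is the essence of the callback style described above.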
At the core of a GTK+ program is the main loop. This is the part of the program that waits for actions from the user or commands from the program and carries them out. The GTK+ main loop is handled entirely by GTK+. To us, it looks like an I/O action that we execute, which doesn’t return until the GUI has been disposed of.
Since the main loop is responsible for doing everything from handling clicks of a mouse to redrawing a window when it has been uncovered, it must always be available. We can’t just run a long-running task—such as downloading a podcast episode—from within the main loop. This would make the GUI unresponsive, and actions such as clicking a Cancel button wouldn’t be processed in a timely manner.
Therefore, we will be using multithreading to handle these long-running tasks. More information on multithreading can be found in the book’s chapter on concurrent and multicore programming. For now, just know that we will use forkIO to create new threads for long-running tasks such as downloading podcast feeds and episodes. For very quick tasks, such as adding a new podcast to the database, we will not bother with a separate thread since it will be executed so fast that the user will never notice.
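As a minimal sketch of handing a slow job to another thread, the example below uses forkIO together with an MVar to collect the result. The slowTask function is a hypothetical stand-in for a download, and the final takeMVar is there only so the demonstration can print the answer; a GUI callback would normally avoid blocking like this.

```haskell
import Control.Concurrent

-- Stand-in for a slow job such as downloading an episode.
slowTask :: Int -> IO Int
slowTask n = do
    threadDelay 100000          -- pretend to work for 0.1 seconds
    return (n * 2)

main :: IO ()
main = do
    result <- newEmptyMVar
    _ <- forkIO (slowTask 21 >>= putMVar result)
    putStrLn "Main thread stays free while the task runs"
    takeMVar result >>= print   -- blocks only here, to collect the answer
```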
Initializing the GUI
Our first steps are going to involve initializing the GUI for our program. For reasons that we’ll explain later in this chapter, we’re going to have a small file called PodLocalMain.hs that loads PodMainGUI and passes it the path to podresources.glade, the XML file saved by Glade that describes our GUI widgets:
-- file: ch23/PodLocalMain.hs

module Main where



import qualified PodMainGUI



main = PodMainGUI.main "podresources.glade"
Now, let’s consider PodMainGUI.hs. This file is the only Haskell source file that we had to modify from the example in the previous chapter to make it work as a GUI. Let’s begin by looking at the start of our new PodMainGUI.hs file (we’ve renamed it from PodMain.hs for clarity):
-- file: ch23/PodMainGUI.hs

module PodMainGUI where



import PodDownload

import PodDB

import PodTypes

import System.Environment

import Database.HDBC

import Network.Socket(withSocketsDo)



-- GUI libraries



import Graphics.UI.Gtk hiding (disconnect)

import Graphics.UI.Gtk.Glade



-- Threading



import Control.Concurrent
This first part of PodMainGUI.hs is similar to our non-GUI version. We import three additional components, however. First, we have Graphics.UI.Gtk, which provides most of the GTK+ functions we will be using. Both this module and Database.HDBC provide a function named disconnect. Since we’ll be using the HDBC version, but not the GTK+ version, we don’t import that function from Graphics.UI.Gtk. Graphics.UI.Gtk.Glade contains functions needed for loading and working with our Glade file.
We also import Control.Concurrent, which has the basics needed for multithreaded programming. We’ll use a few functions from here as just described once we get into the guts of the program. Next, let’s define a type to store information about our GUI:
-- file: ch23/PodMainGUI.hs
-- | Our main GUI type
data GUI = GUI {
      mainWin :: Window,
      mwAddBt :: Button,
      mwUpdateBt :: Button,
      mwDownloadBt :: Button,
      mwFetchBt :: Button,
      mwExitBt :: Button,
      statusWin :: Dialog,
      swOKBt :: Button,
      swCancelBt :: Button,
      swLabel :: Label,
      addWin :: Dialog,
      awOKBt :: Button,
      awCancelBt :: Button,
      awEntry :: Entry}
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
The Add Podcast Window
Inhaltsvorschau
Now that we’ve covered the main window, let’s talk about the other windows that our application presents, starting with the Add Podcast window. When the user clicks the button to add a new podcast, we need to pop up a dialog box to prompt for the URL of the podcast. We have defined this dialog box in Glade, so all we need to do is set it up:
-- file: ch23/PodMainGUI.hs
guiAdd gui dbh = 
    do -- Initialize the add URL window
       entrySetText (awEntry gui) ""
       onClicked (awCancelBt gui) (widgetHide (addWin gui))
       onClicked (awOKBt gui) procOK

       -- Show the add URL window
       windowPresent (addWin gui)
    where procOK =
              do url <- entryGetText (awEntry gui)
                 widgetHide (addWin gui) -- Remove the dialog
                 add dbh url             -- Add to the DB
We start by calling entrySetText to set the contents of the entry box (the place where the user types in the URL) to the empty string. That’s because the same widget gets reused over the lifetime of the program, and we don’t want the last URL the user entered to remain there. Next, we set up actions for the two buttons in the dialog. If the user clicks on the cancel button, we simply remove the dialog box from the screen by calling widgetHide on it. If the user clicks the OK button, we call procOK.
procOK starts by retrieving the supplied URL from the entry widget. Next, it uses widgetHide to get rid of the dialog box. Finally, it calls add to add the URL to the database. This add is exactly the same function as we had in the non-GUI version of the program.
The last thing we do in guiAdd is actually display the pop-up window. That’s done by calling windowPresent, which is the opposite of widgetHide.
Note that the guiAdd function returns almost immediately. It just sets up the widgets and causes the box to be displayed; at no point does it block waiting for input. The following figure shows what the dialog box looks like.
Figure: Screenshot of the add-a-podcast window
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Long-Running Tasks
Inhaltsvorschau
As we think about the buttons available in the main window, three of them correspond to tasks that could take a while to complete: update, download, and fetch. While these operations take place, we’d like to do two things with our GUI: provide the user with the status of the operation and the ability to cancel the operation as it is in progress.
Since all three of these things are very similar operations, it makes sense to provide a generic way to handle this interaction. We have defined a single status window widget in the Glade file that will be used by all three of these. In our Haskell source code, we’ll define a generic statusWindow function that will be used by all three of these operations as well.
statusWindow takes four parameters: the GUI information, the database information, a String giving the title of the window, and a function that will perform the operation. This function will itself be passed a function that it can call to report its progress. Here’s the code:
-- file: ch23/PodMainGUI.hs
statusWindow :: IConnection conn =>
                GUI 
             -> conn 
             -> String 
             -> ((String -> IO ()) -> IO ())
             -> IO ()
statusWindow gui dbh title func =
    do -- Clear the status text
       labelSetText (swLabel gui) ""

       -- Disable the OK button, enable Cancel button
       widgetSetSensitivity (swOKBt gui) False
       widgetSetSensitivity (swCancelBt gui) True

       -- Set the title
       windowSetTitle (statusWin gui) title

       -- Start the operation
       childThread <- forkIO childTasks

       -- Define what happens when clicking on Cancel
       onClicked (swCancelBt gui) (cancelChild childThread)

       -- Show the window
       windowPresent (statusWin gui)
    where childTasks =
              do updateLabel "Starting thread..."
                 func updateLabel
                 -- After the child task finishes, enable OK
                 -- and disable Cancel
                 enableOK

          enableOK = 
              do widgetSetSensitivity (swCancelBt gui) False
                 widgetSetSensitivity (swOKBt gui) True
                 onClicked (swOKBt gui) (widgetHide (statusWin gui))
                 return ()

          updateLabel text =
              labelSetText (swLabel gui) text
          cancelChild childThread =
              do killThread childThread
                 yield
                 updateLabel "Action has been cancelled."
                 enableOK
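The callback-passing shape of statusWindow can be tried out without GTK+. The following is a minimal sketch, not the book's code: runWithLog is a hypothetical stand-in that collects progress messages in an IORef instead of writing them to a label, and, unlike statusWindow, it blocks until the child thread has finished.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Exception (finally)
import Data.IORef (atomicModifyIORef, newIORef, readIORef)

-- | Hypothetical stand-in for statusWindow. Runs the task in a child
-- thread, hands it a progress callback, and returns every reported
-- message, in order, once the child has finished.
runWithLog :: ((String -> IO ()) -> IO ()) -> IO [String]
runWithLog func = do
  logRef <- newIORef []
  done   <- newEmptyMVar
  let -- The callback the task uses to report progress.
      report msg = atomicModifyIORef logRef (\msgs -> (msg : msgs, ()))
      childTasks = do
        report "Starting thread..."
        func report
  -- 'finally' signals completion even if the task throws.
  _ <- forkIO (childTasks `finally` putMVar done ())
  takeMVar done
  reverse `fmap` readIORef logRef
```

Calling runWithLog (\report -> report "step 1" >> report "step 2") should yield the start-up message followed by the two reported steps.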
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Using Cabal
Inhaltsvorschau
We presented a Cabal file to build the command-line version of this project earlier. We need to make a few tweaks for it to work with our GUI version. First, there’s the obvious need to add the gtk2hs packages to the list of build dependencies. There is also the matter of the Glade XML file.
Earlier, we wrote a PodLocalMain.hs file that simply assumed this file is named podresources.glade and stored in the current working directory. For a real, system-wide installation, we can’t make that assumption. Moreover, different systems may place the file in different locations.
Cabal provides a way around this problem. It automatically generates a module that exports functions that can interrogate the environment. We must add a Data-files line to our Cabal description file. This line names all data files that will be part of a system-wide installation. Then, Cabal will export a Paths_pod module (the “pod” part comes from the Name line in the Cabal file) that we can interrogate for the location at runtime. Here’s our new Cabal description file:
-- ch23/pod.cabal
Name: pod
Version: 1.0.0
Build-type: Simple
Build-Depends: HTTP, HaXml, network, HDBC, HDBC-sqlite3, base, 
               gtk, glade
Data-files: podresources.glade

Executable: pod
Main-Is: PodCabalMain.hs
GHC-Options: -O2
And, to go with it, here’s PodCabalMain.hs:
-- file: ch23/PodCabalMain.hs
module Main where

import qualified PodMainGUI
import Paths_pod(getDataFileName)

main = 
    do gladefn <- getDataFileName "podresources.glade"
       PodMainGUI.main gladefn
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Chapter 24: Concurrent and Multicore Programming
Inhaltsvorschau
As we write this book, the landscape of CPU architecture is changing more rapidly than it has in decades.
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Defining Concurrency and Parallelism
Inhaltsvorschau
A concurrent program needs to perform several possibly unrelated tasks at the same time. Consider the example of a game server: it is typically composed of dozens of components, each of which has complicated interactions with the outside world. One component might handle multiuser chat; several more will process players’ inputs and also feed state updates back to them; while yet another performs physics calculations.
The correct operation of a concurrent program does not require multiple cores, though they may improve performance and responsiveness.
In contrast, a parallel program solves a single problem. Consider a financial model that attempts to predict the next minute of fluctuations in the price of a single stock. If we want to apply this model to every stock listed on an exchange—for example, to estimate which ones we should buy and sell—we hope to get an answer more quickly if we run the model on 500 cores than if we use just 1. As this suggests, a parallel program does not usually depend on the presence of multiple cores to work correctly.
Another useful distinction between concurrent and parallel programs lies in their interaction with the outside world. By definition, a concurrent program deals continuously with networking protocols, databases, and the like. A typical parallel program is likely to be more focused: it streams in data, crunches it for a while (with little further I/O), and then streams data back out.
Many traditional languages further blur the already indistinct boundary between concurrent and parallel programming, because they force programmers to use the same primitives to construct both kinds of programs.
In this chapter, we will concern ourselves with concurrent and parallel programs that operate within the boundaries of a single operating system process.
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Concurrent Programming with Threads
Inhaltsvorschau
As a building block for concurrent programs, most programming languages provide a way of creating multiple independent threads of control. Haskell is no exception, though programming with threads in Haskell looks somewhat different than in other languages.
In Haskell, a thread is an IO action that executes independently from other threads. To create a thread, we import the Control.Concurrent module and use the forkIO function:
ghci> :m +Control.Concurrent
ghci> :t forkIO
forkIO :: IO () -> IO ThreadId
ghci> :m +System.Directory
ghci> forkIO (writeFile "xyzzy" "seo craic nua!") >> doesFileExist "xyzzy"
False
The new thread starts to execute almost immediately, and the thread that created it continues to execute concurrently. The thread will stop executing when it reaches the end of its IO action.
The runtime component of GHC does not specify an order in which it executes threads. As a result, in the preceding example, the file xyzzy created by the new thread may or may not have been created by the time the original thread checks for its existence. If we try this example once, and then remove xyzzy and try again, we may get a different result the second time.
Suppose we have a large file to compress and write to disk, but we want to handle a user’s input quickly enough that she will perceive our program as responding immediately. If we use forkIO to write the file out in a separate thread, we can do both simultaneously:
-- file: ch24/Compressor.hs
import Control.Concurrent (forkIO)
import Control.Exception (handle)
import Control.Monad (forever)
import qualified Data.ByteString.Lazy as L
import System.Console.Readline (readline)

-- Provided by the 'zlib' package on http://hackage.haskell.org/
import Codec.Compression.GZip (compress)

main = do
    maybeLine <- readline "Enter a file to compress> "
    case maybeLine of
      Nothing -> return ()      -- user entered EOF
      Just "" -> return ()      -- treat no name as "want to quit"
      Just name -> do
           handle print $ do
             content <- L.readFile name
             forkIO (compressFile name content)
             return ()
           main
  where compressFile path = L.writeFile (path ++ ".gz") . compress
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Simple Communication Between Threads
Inhaltsvorschau
The simplest way to share information between two threads is to let them both use a variable. In our file compression example, the main thread shares both the name of a file and its contents with the other thread. Because Haskell data is immutable by default, this poses no risks: neither thread can modify the other’s view of the file’s name or contents.
We often need to have threads actively communicate with each other. For example, GHC does not provide a way for one thread to find out whether another is still executing, has completed, or has crashed. However, it provides a synchronizing variable type, the MVar, which we can use to create this capability for ourselves.
An MVar acts like a single-element box: it can be either full or empty. We can put something into the box, making it full, or take something out, making it empty:
ghci> :t putMVar
putMVar :: MVar a -> a -> IO ()
ghci> :t takeMVar
takeMVar :: MVar a -> IO a
If we try to put a value into an MVar that is already full, our thread is put to sleep until another thread takes the value out. Similarly, if we try to take a value from an empty MVar, our thread is put to sleep until some other thread puts a value in:
-- file: ch24/MVarExample.hs
import Control.Concurrent

communicate = do
  m <- newEmptyMVar
  forkIO $ do
    v <- takeMVar m
    putStrLn ("received " ++ show v)
  putStrLn "sending"
  putMVar m "wake up!"
The newEmptyMVar function has a descriptive name. To create an MVar that starts out nonempty, we’d use newMVar:
ghci> :t newEmptyMVar
newEmptyMVar :: IO (MVar a)
ghci> :t newMVar
newMVar :: a -> IO (MVar a)
Let’s run our example in ghci:
ghci> :load MVarExample
[1 of 1] Compiling Main             ( MVarExample.hs, interpreted )
Ok, modules loaded: Main.
ghci> communicate
sending
received "wake up!"
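As a further sketch of the full-or-empty behavior, the following hypothetical countTo function (not from the book) starts a counter MVar out full with newMVar and gives each worker thread its own empty MVar to signal completion. Each takeMVar on a completion MVar sleeps until the corresponding worker has put a value in.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar

-- | Fork n threads that each add 1 to a shared counter, wait for all of
-- them to finish, and return the final count.
countTo :: Int -> IO Int
countTo n = do
  counter <- newMVar 0                          -- starts out full
  dones   <- mapM (const newEmptyMVar) [1 .. n] -- one empty MVar per worker
  mapM_ (forkIO . worker counter) dones
  mapM_ takeMVar dones                          -- sleep until each worker has put
  readMVar counter
  where
    worker counter done = do
      -- Take the counter, update it, and put it back.
      modifyMVar_ counter (return . (+ 1))
      putMVar done ()
```

Because only one thread at a time can hold the contents of the counter, no increment is lost, whatever order the runtime schedules the workers in.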
If you’re coming from a background of concurrent programming in a traditional language, you can think of an MVar as being useful for two familiar purposes:
  • Sending a message from one thread to another, for example, a notification.
  • Providing mutual exclusion for a piece of mutable data that is shared among threads. We put the data into the
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
The Main Thread and Waiting for Other Threads
Inhaltsvorschau
GHC’s runtime system treats the program’s original thread of control differently from other threads. When this thread finishes executing, the runtime system considers the program as a whole to have completed. If any other threads are executing at the time, they are terminated.
As a result, when we have long-running threads that must not be killed, we need to make special arrangements to ensure that the main thread doesn’t complete until the others do. Let’s develop a small library that makes this easy to do:
-- file: ch24/NiceFork.hs
import Control.Concurrent
import Control.Exception (Exception, try)
import qualified Data.Map as M

data ThreadStatus = Running
                  | Finished         -- terminated normally
                  | Threw Exception  -- killed by uncaught exception
                    deriving (Eq, Show)

-- | Create a new thread manager.
newManager :: IO ThreadManager

-- | Create a new managed thread.
forkManaged :: ThreadManager -> IO () -> IO ThreadId

-- | Immediately return the status of a managed thread.
getStatus :: ThreadManager -> ThreadId -> IO (Maybe ThreadStatus)

-- | Block until a specific managed thread terminates.
waitFor :: ThreadManager -> ThreadId -> IO (Maybe ThreadStatus)

-- | Block until all managed threads terminate.
waitAll :: ThreadManager -> IO ()
We keep our ThreadManager type abstract using the usual recipe: we wrap it in a newtype and prevent clients from creating values of this type. Among our module’s exports, we list the type constructor and the IO action that constructs a manager, but we do not export the data constructor:
-- file: ch24/NiceFork.hs
module NiceFork
    (
      ThreadManager
    , newManager
    , forkManaged
    , getStatus
    , waitFor
    , waitAll
    ) where
For the implementation of ThreadManager, we maintain a map from thread ID to thread state. We’ll refer to this as the thread map:
-- file: ch24/NiceFork.hs
newtype ThreadManager =
    Mgr (MVar (M.Map ThreadId (MVar ThreadStatus)))
    deriving (Eq)

newManager = Mgr `fmap` newMVar M.empty
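One way the declared functions could be filled in is sketched below. It is an illustration under stated assumptions rather than the book's listing: it targets a current base library, where a caught exception is represented by SomeException instead of the Exception type of GHC 6.8, so ThreadStatus here derives only Show. Each thread's final status is delivered through its own MVar, which stays empty while the thread runs.

```haskell
import Control.Concurrent
import Control.Exception (SomeException, try)
import qualified Data.Map as M

data ThreadStatus = Running
                  | Finished              -- terminated normally
                  | Threw SomeException   -- killed by uncaught exception
                    deriving (Show)

newtype ThreadManager =
    Mgr (MVar (M.Map ThreadId (MVar ThreadStatus)))

newManager :: IO ThreadManager
newManager = Mgr `fmap` newMVar M.empty

-- Fork the body and record an initially empty status MVar for it.
forkManaged :: ThreadManager -> IO () -> IO ThreadId
forkManaged (Mgr mgr) body =
  modifyMVar mgr $ \m -> do
    state <- newEmptyMVar
    tid <- forkIO $ do
      result <- try body
      putMVar state (either Threw (const Finished) result)
    return (M.insert tid state m, tid)

-- Non-blocking: an empty status MVar means the thread is still running.
getStatus :: ThreadManager -> ThreadId -> IO (Maybe ThreadStatus)
getStatus (Mgr mgr) tid =
  modifyMVar mgr $ \m ->
    case M.lookup tid m of
      Nothing    -> return (m, Nothing)
      Just state -> do
        finished <- tryTakeMVar state
        return $ case finished of
          Nothing     -> (m, Just Running)
          Just status -> (M.delete tid m, Just status)

-- Remove the thread from the map first, then block outside the map's lock.
waitFor :: ThreadManager -> ThreadId -> IO (Maybe ThreadStatus)
waitFor (Mgr mgr) tid = do
  maybeState <- modifyMVar mgr $ \m ->
    return $ case M.lookup tid m of
      Nothing    -> (m, Nothing)
      Just state -> (M.delete tid m, Just state)
  case maybeState of
    Nothing    -> return Nothing
    Just state -> Just `fmap` takeMVar state

-- Empty the map, then wait for every thread that was in it.
waitAll :: ThreadManager -> IO ()
waitAll (Mgr mgr) = modifyMVar mgr elems >>= mapM_ takeMVar
  where elems m = return (M.empty, M.elems m)
```

In this sketch a thread is forgotten once its final status has been reported, so asking about the same ThreadId a second time returns Nothing.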
We have two levels of
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Communicating over Channels
Inhaltsvorschau
For one-shot communications between threads, an MVar is perfectly good. Another type, Chan, provides a one-way communication channel. Here is a simple example of its use:
-- file: ch24/Chan.hs
import Control.Concurrent
import Control.Concurrent.Chan

chanExample = do
  ch <- newChan
  forkIO $ do
    writeChan ch "hello world"
    writeChan ch "now i quit"
  readChan ch >>= print
  readChan ch >>= print
If a Chan is empty, readChan blocks until there is a value to read. The writeChan function never blocks; it writes a new value into a Chan immediately.
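The blocking read and non-blocking write can be seen in a slightly larger sketch. The relay function below is a hypothetical helper, not part of the book's code: a forked thread writes every element of a list into a Chan, and the calling thread reads the same number of values back, sleeping in readChan whenever the writer has not caught up yet.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.Chan (newChan, readChan, writeChan)
import Control.Monad (replicateM)

-- | Send a list through a channel from a second thread and collect the
-- values on the calling thread, in the order they were written.
relay :: [a] -> IO [a]
relay xs = do
  ch <- newChan
  -- The writer never blocks: each value is queued immediately.
  _ <- forkIO (mapM_ (writeChan ch) xs)
  -- Each readChan blocks until a value is available.
  replicateM (length xs) (readChan ch)
```

With a single writer, the values arrive in the order in which they were written, so relay returns its argument unchanged.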
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Useful Things to Know About
Inhaltsvorschau
Like most Haskell container types, both MVar and Chan are nonstrict: neither evaluates its contents. We mention this not because it’s a problem but because it’s a common blind spot. People tend to assume that these types are strict, perhaps because they’re used in the IO monad.
As for other container types, the upshot of a mistaken guess about the strictness of an MVar or Chan type is often a space or performance leak. Here’s a plausible scenario to consider.
We fork off a thread to perform some expensive computation on another core:
-- file: ch24/Expensive.hs
import Control.Concurrent

notQuiteRight = do
  mv <- newEmptyMVar
  forkIO $ expensiveComputation mv
  someOtherActivity
  result <- takeMVar mv
  print result
It seems to do something and puts its result back into the MVar:
-- file: ch24/Expensive.hs
expensiveComputation mv = do
  let a = "this is "
      b = "not really "
      c = "all that expensive"
  putMVar mv (a ++ b ++ c)
When we take the result from the MVar in the parent thread and attempt to do something with it, our thread starts computing furiously, because we never forced the computation to actually occur in the other thread!
As usual, the solution is straightforward, once we know there’s a potential for a problem: we add strictness to the forked thread, in order to ensure that the computation occurs there. This strictness is best added in one place, in order to avoid the possibility that we might forget to add it:
-- file: ch24/ModifyMVarStrict.hs
{-# LANGUAGE BangPatterns #-}

import Control.Concurrent (MVar, putMVar, takeMVar)
import Control.Exception (block, catch, throw, unblock)
import Prelude hiding (catch) -- use Control.Exception's version

modifyMVar_strict :: MVar a -> (a -> IO a) -> IO ()
modifyMVar_strict m io = block $ do
  a <- takeMVar m
  !b <- unblock (io a) `catch` \e ->
        putMVar m a >> throw e
  putMVar m b
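The listing above targets the GHC 6.8 era. In later releases of base, block and unblock were deprecated in favor of mask and, as far as we are aware, are no longer exported. The following is a hedged sketch of the same idea for a current compiler; the name modifyMVarStrict is our own. It forces the new value with evaluate inside the protected region, so that an exception raised while forcing it also restores the old contents.

```haskell
import Control.Concurrent.MVar
import Control.Exception (evaluate, mask, onException)

-- | Like modifyMVar_, but the new value is evaluated to weak head normal
-- form before it is stored. (A sketch; not the book's code.)
modifyMVarStrict :: MVar a -> (a -> IO a) -> IO ()
modifyMVarStrict m io = mask $ \restore -> do
  a <- takeMVar m
  -- Run the update with asynchronous exceptions re-enabled; if it (or the
  -- evaluation of its result) throws, put the old value back.
  b <- restore (io a >>= evaluate) `onException` putMVar m a
  putMVar m b
```

As with seq, evaluate only reaches the outermost constructor, so a lazily built list or record stored this way may still contain unevaluated parts.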
In the Hackage package database, you will find a library that provides strict versions of the MVar
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Shared-State Concurrency Is Still Hard
Inhaltsvorschau
Although Haskell has different primitives for sharing data between threads than other languages, it still suffers from the same fundamental problem: writing correct programs is fiendishly difficult. Indeed, several pitfalls of concurrent programming in other languages apply equally to Haskell. Two of the better-known problems are deadlock and starvation.
In a deadlock situation, two or more threads get stuck forever in a clash over access to shared resources. One classic way to make a multithreaded program deadlock is to forget the order in which we must acquire locks. This kind of bug is so common, it has a name: lock order inversion. While Haskell doesn’t provide locks, the MVar type is prone to the order inversion problem. Here’s a simple example:
-- file: ch24/LockHierarchy.hs
import Control.Concurrent

nestedModification outer inner = do
  modifyMVar_ outer $ \x -> do
    yield  -- force this thread to temporarily yield the CPU
    modifyMVar_ inner $ \y -> return (y + 1)
    return (x + 1)
  putStrLn "done"

main = do
  a <- newMVar 1
  b <- newMVar 2
  forkIO $ nestedModification a b
  forkIO $ nestedModification b a
If we run this in ghci, it will usually—but not always—print nothing, indicating that both threads have gotten stuck.
The problem with the nestedModification function is easy to spot. In the first thread, we take the MVar a, then b. In the second, we take b, then a. If the first thread succeeds in taking a and the second takes b, both threads will block; each tries to take an MVar that the other has already emptied, so neither can make progress.
Across languages, the usual way to solve an order inversion problem is to always follow a consistent order when acquiring resources. Since this approach requires manual adherence to a coding convention, it is easy to miss in practice.
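A minimal sketch of the consistent-order convention follows. The function names are hypothetical. Both threads take the two MVars in the same order, so whichever thread takes the first MVar is guaranteed to find the second one available eventually.

```haskell
import Control.Concurrent

-- | Increment both MVars, always taking 'first' before 'second'.
orderedModification :: MVar Int -> MVar Int -> IO ()
orderedModification first second =
  modifyMVar_ first $ \x -> do
    yield  -- give the other thread a chance to run, as in the original
    modifyMVar_ second (return . (+ 1))
    return (x + 1)

-- | Run two threads that follow the same acquisition order and return
-- the final contents of both MVars.
runBoth :: IO (Int, Int)
runBoth = do
  a <- newMVar 1
  b <- newMVar 2
  done <- newEmptyMVar
  _ <- forkIO (orderedModification a b >> putMVar done ())
  _ <- forkIO (orderedModification a b >> putMVar done ())
  takeMVar done
  takeMVar done
  x <- readMVar a
  y <- readMVar b
  return (x, y)
```

Each thread increments both values once, so runBoth returns (3,4) instead of hanging.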
To make matters more complicated, these kinds of inversion problems can be difficult to spot in real code. The taking of MVars is often spread across several functions in different files, making visual inspection more tricky. Worse, these problems are often
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Using Multiple Cores with GHC
Inhaltsvorschau
By default, GHC generates programs that use just one core, even when we write explicitly concurrent code. To use multiple cores, we must explicitly choose to do so. We make this choice at link time, when we are generating an executable program:
  • The nonthreaded runtime library runs all Haskell threads in a single operating system thread. This runtime is highly efficient for creating threads and passing data around in MVars.
  • The threaded runtime library uses multiple operating system threads to run Haskell threads. It has somewhat more overhead for creating threads and using MVars.
If we pass the -threaded option to the compiler, it will link our program against the threaded runtime library. We do not need to use -threaded when we are compiling libraries or source files—only when we are finally generating an executable.
Even when we select the threaded runtime for our program, it will still default to using only one core when we run it. We must explicitly tell the runtime how many cores to use.
We can pass options to GHC’s runtime system on the command line of our program. Before handing control to our code, the runtime scans the program’s arguments for the special command-line option +RTS. It interprets everything that follows (until the special option -RTS) as an option for the runtime system, not our program. It hides all of these options from our code. When we use the System.Environment module’s getArgs function to obtain our command-line arguments, we will not find any runtime options in the list.
The threaded runtime accepts an option -N. This takes one argument, which specifies the number of cores that GHC’s runtime system should use. The option parser is picky: there cannot be any spaces between -N and the number that follows it. The option -N4 is acceptable, but -N 4 is not.
The GHC.Conc module exports a variable, numCapabilities, that tells us how many cores the runtime system has been given with the -N RTS option:
-- file: ch24/NumCapabilities.hs
import GHC.Conc (numCapabilities)
import System.Environment (getArgs)

main = do
  args <- getArgs
  putStrLn $ "command line arguments: " ++ show args
  putStrLn $ "number of cores: " ++ show numCapabilities
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Parallel Programming in Haskell
Inhaltsvorschau
We will now switch our focus to parallel programming. For many computationally expensive problems, we could calculate a result more quickly if we could divide the solution and evaluate it on many cores at once. Computers with multiple cores are already ubiquitous, but few programs can take advantage of the computing power of even a modern laptop.
In large part, this is because parallel programming is traditionally seen as very difficult. In a typical programming language, we would use the same libraries and constructs that we apply to concurrent programs to develop a parallel program. This forces us to contend with the familiar problems of deadlocks, race conditions, starvation, and sheer complexity.
While we could certainly use Haskell’s concurrency features to develop parallel code, there is a much simpler approach available to us. We can take a normal Haskell function, apply a few simple transformations to it, and have it evaluated in parallel.
The familiar seq function evaluates an expression to what we call head normal form (HNF). It stops once it reaches the outermost constructor (the head). This is distinct from normal form (NF), in which an expression is completely evaluated.
You will also hear Haskell programmers refer to weak head normal form (WHNF). For normal data, weak head normal form is the same as head normal form. The difference arises only for functions and is too abstruse to concern us here.
Here is a normal Haskell function that sorts a list using a divide-and-conquer approach:
-- file: ch24/Sorting.hs
sort :: (Ord a) => [a] -> [a]
sort (x:xs) = lesser ++ x:greater
    where lesser  = sort [y | y <- xs, y <  x]
          greater = sort [y | y <- xs, y >= x]
sort _ = []
This function is inspired by the well-known Quicksort algorithm, and it is a classic among Haskell programmers. It is often presented as a one-liner early in a Haskell tutorial to tease the reader with an example of Haskell’s expressiveness. Here, we’ve split the code over a few lines, in order to make it easier to compare the serial and parallel versions.
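For comparison, a parallel variant might look like the following sketch. It assumes the par and pseq combinators as exported by GHC.Conc (the parallel package offers the same names in Control.Parallel), and forceList is a helper name of our own. The transformation is small: spark the evaluation of one sublist, evaluate the other, and only then build the result.

```haskell
import GHC.Conc (par, pseq)

-- | Divide-and-conquer sort that sparks the evaluation of 'greater'
-- while the current thread evaluates 'lesser'.
parSort :: (Ord a) => [a] -> [a]
parSort (x:xs) = forceList greater `par`
                 (forceList lesser `pseq` (lesser ++ x : greater))
    where lesser  = parSort [y | y <- xs, y <  x]
          greater = parSort [y | y <- xs, y >= x]
parSort _ = []

-- | Walk the whole spine of a list. Without this, a spark would stop at
-- the first constructor and leave most of the sorting undone.
forceList :: [a] -> ()
forceList (_:rest) = forceList rest
forceList []       = ()
```

The result is the same as that of the serial sort. Any speedup depends on linking with -threaded, running with more than one core, and the list being long enough to outweigh the cost of creating sparks.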
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Parallel Strategies and MapReduce
Inhaltsvorschau
Within the programming community, one of the most famous software systems to credit functional programming for inspiration is Google’s MapReduce infrastructure for parallel processing of bulk data.
We can easily construct a greatly simplified, but still useful, Haskell equivalent. To focus our attention, we will look at processing web server logfiles, which tend to be both huge and plentiful. As an example, here is a log entry for a page visit recorded by the Apache Web Server. The entry originally filled one line; we split it across several lines to fit:
201.49.94.87 - - [08/Jun/2008:07:04:20 -0500] "GET / HTTP/1.1"
200 2097 "http://en.wikipedia.org/wiki/Mercurial_(software)"
"Mozilla/5.0 (Windows; U; Windows XP 5.1; en-GB; rv:1.8.1.12)
Gecko/20080201 Firefox/2.0.0.12" 0 hgbook.red-bean.com
While we could create a straightforward implementation without much effort, we will resist the temptation to dive in. If we think about solving a class of problems instead of a single one, we may end up with more widely applicable code.
When we develop a parallel program, we always face a few "bad penny" problems, which turn up regardless of the underlying programming language. A few are described here:
  • Our algorithm quickly becomes obscured by the details of partitioning and communication. This makes it difficult to understand code, which in turn makes modifying it risky.
  • Choosing a grain size—the smallest unit of work parceled out to a core—can be difficult. If the grain size is too small, cores spend so much of their time on book-keeping that a parallel program can easily become slower than a serial counterpart. If the grain size is too large, some cores may lie idle due to poor load balancing.
In parallel Haskell code, the clutter that would arise from communication code in a traditional language is replaced with the clutter of par and pseq annotations. As an example, this function operates similarly to map, but evaluates each element to WHNF in parallel as it goes:
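A minimal sketch of such a function, assuming the par combinator as exported by GHC.Conc (Control.Parallel in the parallel package provides the same name), could read:

```haskell
import GHC.Conc (par)

-- | Like map, but sparks the evaluation of each result to WHNF before
-- consing it onto the output list.
parallelMap :: (a -> b) -> [a] -> [b]
parallelMap f (x:xs) = let r = f x
                       in r `par` (r : parallelMap f xs)
parallelMap _ _      = []
```

The annotation changes only when each element may be evaluated, not what the function returns: parallelMap f xs produces the same list as map f xs.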
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Chapter 25: Profiling and Optimization
Inhaltsvorschau
Haskell is a high-level language. A really high-level language. We can spend our days programming entirely in abstractions, in monoids, functors, and hylomorphisms, far removed from any specific hardware model of computation. The language specification goes to great lengths to avoid prescribing any particular evaluation model. These layers of abstraction let us treat Haskell as a notation for computation itself, letting us concentrate on the essence of the problem without getting bogged down in low-level decisions. We get to program in pure thought.
However, this is a book about real-world programming, and in the real world, code runs on stock hardware with limited resources. Our programs will have time and space requirements that we may need to enforce. As such, we need a good knowledge of how our program data is represented, the precise consequences of using lazy or strict evaluation strategies, and techniques for analyzing and controlling space and time behavior.
In this chapter, we’ll look at typical space and time problems a Haskell programmer might encounter and how to methodically analyze, understand, and address them. To do this, we’ll use a range of techniques: time and space profiling, runtime statistics, and reasoning about strict and lazy evaluation. We’ll also look at the impact of compiler optimizations on performance and the use of advanced optimization techniques that become feasible in a purely functional language. So let’s begin with a challenge: squashing unexpected memory usage in some innocuous-looking code.
End of content preview. The rest of this section is not available here.
Profiling Haskell Programs
Content preview
Let’s consider the following list manipulating program, which naively computes the mean of some large list of values. While only a program fragment (and we’ll stress that the particular algorithm we’re implementing is irrelevant here), it is representative of real code that we might find in any Haskell program: typically concise list manipulation code and heavy use of standard library functions. It also illustrates several common performance trouble spots that can catch the unwary:
-- file: ch25/A.hs
import System.Environment
import Text.Printf

main = do
    [d] <- map read `fmap` getArgs
    printf "%f\n" (mean [1..d])

mean :: [Double] -> Double
mean xs = sum xs / fromIntegral (length xs)
This program is very simple. We import functions for accessing the system’s environment (in particular, getArgs), and the Haskell version of printf, for formatted text output. The program then reads a numeric literal from the command line, using that to build a list of floating-point values, whose mean value we compute by dividing the list sum by its length. The result is printed as a string. Let’s compile this source to native code (with optimizations on) and run it with the time command to see how it performs:
$ ghc --make -O2 A.hs
[1 of 1] Compiling Main             ( A.hs, A.o )
Linking A ...
$ time ./A 1e5
50000.5
./A 1e5  0.05s user 0.01s system 102% cpu 0.059 total
$ time ./A 1e6
500000.5
./A 1e6  0.26s user 0.04s system 99% cpu 0.298 total
$ time ./A 1e7
5000000.5
./A 1e7  63.80s user 0.62s system 99% cpu 1:04.53 total
It worked well for small numbers, but the program really started to struggle with a list size of 10 million. From this alone, we know something’s not quite right, but it’s unclear what resources are being used. Let’s investigate.
To get access to that kind of information, GHC lets us pass flags directly to the Haskell runtime, using the special +RTS and -RTS flags to delimit arguments reserved for the runtime system. The application itself won’t see those flags, as they’re immediately consumed by the Haskell runtime system.
End of content preview. The rest of this section is not available here.
Controlling Evaluation
Content preview
We have a number of options if we want to write our loop to traverse the list only once. For example, we can write the loop as a fold over the list or via explicit recursion on the list structure. Sticking to the high-level approaches, we’ll try a fold first:
-- file: ch25/B.hs
mean :: [Double] -> Double
mean xs = s / fromIntegral n
  where
    (n, s)     = foldl k (0, 0) xs
    k (n, s) x = (n+1, s+x)
Now, instead of taking the sum of the list and retaining the list until we can take its length, we left-fold over the list, accumulating the intermediate sum and length values in a pair (and we must left-fold, since a right-fold would take us to the end of the list and work backwards, which is exactly what we’re trying to avoid).
The body of our loop is the k function, which takes the intermediate loop state and the current element and returns a new state with the length increased by one and the sum increased by the current element. When we run this, however, we get a stack overflow:
$ ghc -O2 --make B.hs -fforce-recomp
$ time ./B 1e6
Stack space overflow: current size 8388608 bytes.
Use `+RTS -Ksize' to increase it.
./B 1e6  0.44s user 0.10s system 96% cpu 0.565 total
We traded wasted heap for wasted stack! In fact, if we increase the stack size to the size of the heap in our previous implementation, using the -K runtime flag, the program runs to completion and has similar allocation figures:
$ ghc -O2 --make B.hs -prof -auto-all -caf-all -fforce-recomp
[1 of 1] Compiling Main             ( B.hs, B.o )
Linking B ...
$ time ./B 1e6 +RTS -i0.001 -hc -p -K100M
500000.5
./B 1e6 +RTS -i0.001 -hc -p -K100M  38.70s user 0.27s system 99% cpu 39.241 total
Generating the heap profile, we see that all the allocation is now in mean, as the following figure shows.
Figure: Graph of stack usage. The curve is shaped like a hump, with mean representing 80%, and GHC.Real.CAF the other 20%.
The question is: why are we building up more and more allocated state, when all we are doing is folding over the list? This, it turns out, is a classic space leak due to excessive laziness.
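The leak arises because foldl never inspects the accumulator pair while it walks the list, so the (n+1, s+x) updates pile up as unevaluated thunks until the very end. As a minimal sketch of one possible remedy, assuming foldl' from Data.List and the BangPatterns extension (the names step, len, and total are illustrative, and this is not necessarily the fix the chapter goes on to develop):

```haskell
{-# LANGUAGE BangPatterns #-}
import Data.List (foldl')

mean :: [Double] -> Double
mean xs = s / fromIntegral n
  where
    -- foldl' evaluates the accumulator pair at every step.
    (n, s) = foldl' step (0 :: Int, 0) xs
    -- The bang patterns force both components before the next pair is built,
    -- so no chain of suspended additions can form.
    step (!len, !total) x = (len + 1, total + x)
```

With both components forced at each step, the fold runs in constant space regardless of the list length.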
End of content preview. The rest of this section is not available here.
Understanding Core
Content preview
Besides looking at runtime profiling data, one sure way to determine exactly what your program is doing is to look at the final program source after the compiler is done optimizing it, particularly in the case of Haskell compilers, which can perform very aggressive transformations on the code. GHC uses what is humorously referred to as “a simple functional language” (known as Core) as the compiler intermediate representation. It is essentially a subset of Haskell, augmented with unboxed data types (raw machine types, directly corresponding to primitive data types in languages such as C), suitable for code generation. GHC optimizes Haskell by transformation, repeatedly rewriting the source into more and more efficient forms. The Core representation is the final functional version of your program, before translation to low-level code. In other words, Core has the final say, and if all-out performance is your goal, it is worth understanding.
To view the Core version of our Haskell program, we compile with the -ddump-simpl flag, or use the ghc-core tool, a third-party utility that lets us view Core in a pager. So let’s look at the representation of our final fold using strict data types, in Core form:
$ ghc -O2 -ddump-simpl G.hs
A screenful of text is generated. If we look carefully at it, we’ll see a loop (here, cleaned up slightly for clarity):
lgo :: Integer -> [Double] -> Double# -> (# Integer, Double #)

lgo = \ n xs s ->
    case xs of
      []       -> (# n, D# s #);
      (:) x ys ->
        case plusInteger n 1 of
            n' -> case x of
                D# y -> lgo n' ys (+## s y)
This is the final version of our foldl, and it tells us a lot about the next steps for optimization. The fold itself has been entirely inlined, yielding an explicit recursive loop over the list. The loop state, our strict pair, has disappeared entirely, and the function now takes its length and sum accumulators as direct arguments along with the list.
The sum of the list elements is represented with an unboxed Double# value, a raw machine double kept in a floating-point register. This is ideal, as there will be no memory traffic involved in keeping the sum on the heap. However, the length of the list (since we gave no explicit type annotation) has been inferred to be a heap-allocated Integer, which requires a nonprimitive plusInteger to perform addition. If it is algorithmically sound to use an Int instead, we can replace Integer with it, via a type annotation, and GHC will then be able to use a raw machine Int# for the length. We can hope for an improvement in time and space by ensuring that both loop components are unboxed and kept in registers.
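As a sketch of how such an annotation might look on the list-based fold, assuming a strict pair type whose length field is declared as Int (the use of foldl' and the names step, len, and total are assumptions made for illustration):

```haskell
import Data.List (foldl')

-- Strict fields: both accumulators are evaluated whenever a Pair is built.
-- Declaring the first field as Int is the type annotation in question.
data Pair = Pair !Int !Double

mean :: [Double] -> Double
mean xs = s / fromIntegral n
  where
    Pair n s = foldl' step (Pair 0 0) xs
    step (Pair len total) x = Pair (len + 1) (total + x)
```

Whether GHC really keeps both fields in registers depends on the optimization level, so it is worth confirming in the Core output rather than taking it on trust.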
End of content preview. The rest of this section is not available here.
Advanced Techniques: Fusion
Content preview
The final bottleneck in our program is the lazy list itself. While we can avoid allocating it all at once, there is still memory traffic each time around the loop, as we demand the next cons cell in the list, allocate it to the heap, operate on it, and continue. The list type is also polymorphic, so the elements of the list will be represented as heap-allocated values.
What we’d like to do is eliminate the list entirely, keeping just the next element we need in a register. Perhaps surprisingly, GHC is able to transform the list program into a listless version, using an optimization known as deforestation, which refers to a general class of optimizations that involve eliminating intermediate data structures. Due to the absence of side effects, a Haskell compiler can be extremely aggressive when rearranging code, reordering and transforming wholesale at times. The specific deforestation optimization we will use here is stream fusion.
This optimization transforms recursive list generation and transformation functions into nonrecursive unfolds. When an unfold appears next to a fold, the structure between them is then eliminated entirely, yielding a single, tight loop with no heap allocation. The optimization isn’t enabled by default, and it can radically change the complexity of a piece of code, but it is enabled by a number of data structure libraries, which provide rewrite rules (custom optimizations that the compiler applies to functions that the library exports).
We’ll use the uvector library, which provides a suite of list-like operations that use stream fusion to remove intermediate data structures. Rewriting our program to use streams is straightforward:
-- file: ch25/I.hs
import System.Environment
import Text.Printf
import Data.Array.Vector

main = do
    [d] <- map read `fmap` getArgs
    printf "%f\n" (mean (enumFromToFracU 1 d))

data Pair = Pair !Int !Double

mean :: UArr Double -> Double
mean xs = s / fromIntegral n
  where
    Pair n s       = foldlU k (Pair 0 0) xs
    k (Pair n s) x = Pair (n+1) (s+x)
End of content preview. The rest of this section is not available here.
Chapter 26: Advanced Library Design:
Introducing the Bloom Filter
Content preview
A Bloom filter is a set-like data structure that is highly efficient in its use of space. It supports two operations only: insertion and membership querying. Unlike a normal set data structure, a Bloom filter can give incorrect answers. If we query it to see whether an element that we have inserted is present, it will answer affirmatively. If we query for an element that we have not inserted, it might incorrectly claim that the element is present.
For many applications, a low rate of false positives is tolerable. For instance, the job of a network traffic shaper is to throttle bulk transfers (e.g., BitTorrent) so that interactive sessions (such as ssh sessions or games) see good response times. A traffic shaper might use a Bloom filter to determine whether a packet belonging to a particular session is bulk or interactive. If it misidentifies 1 in 10,000 bulk packets as interactive and fails to throttle it, nobody will notice.
The attraction of a Bloom filter is its space efficiency. If we want to build a spell checker and have a dictionary of 500,000 words, a set data structure might consume 20 megabytes of space. A Bloom filter, in contrast, would consume about half a megabyte, at the cost of missing perhaps 1% of misspelled words.
Behind the scenes, a Bloom filter is remarkably simple. It consists of a bit array and a handful of hash functions. We’ll use k for the number of hash functions. If we want to insert a value into the Bloom filter, we compute k hashes of the value and turn on those bits in the bit array. If we want to see whether a value is present, we compute k hashes and check all of those bits in the array to see if they are turned on.
To see how this works, let’s say we want to insert two strings into a Bloom filter that is 8 bits wide, and we have two hash functions:
  1. Compute the two hashes of the first string, which gives two bit positions.
  2. Set those two bits in the bit array.
  3. Compute the two hashes of the second string, which gives two more bit positions (these may overlap with the first pair).
  4. Set those two bits in the bit array.
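The procedure can be sketched in a few lines. This is a toy model only: the filter is a single Word8, and the two hash functions inside bitIndices are made up for illustration, as are the names TinyBloom, insert, and member.

```haskell
import Data.Bits (setBit, testBit)
import Data.Char (ord)
import Data.Word (Word8)

-- An 8-bit filter: bit i of the byte is slot i of the bit array.
type TinyBloom = Word8

-- Two toy hash functions (illustrative only), each reduced to a bit index 0..7.
bitIndices :: String -> [Int]
bitIndices str = map (`mod` 8) [h1, h2]
  where
    h1 = sum (map ord str)
    h2 = foldl (\acc c -> acc * 31 + ord c) 7 str

-- Insertion turns on every bit that the hashes select.
insert :: String -> TinyBloom -> TinyBloom
insert str filt = foldl setBit filt (bitIndices str)

-- A query checks that all selected bits are on. It may answer True for a
-- string that was never inserted (a false positive), but never False for
-- one that was.
member :: String -> TinyBloom -> Bool
member str filt = all (testBit filt) (bitIndices str)
```

Starting from the empty filter 0, inserting a string and then querying it always succeeds, while a query against the empty filter always fails.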
End of content preview. The rest of this section is not available here.
Use Cases and Package Layout
Content preview
Not all users of Bloom filters have the same needs. In some cases, it suffices to create a Bloom filter in one pass, and only query it afterwards. For other applications, we may need to continue to update the Bloom filter after we create it. To accommodate these needs, we will design our library with mutable and immutable APIs.
We will segregate the mutable and immutable APIs that we publish by placing them in different modules: BloomFilter for the immutable code and BloomFilter.Mutable for the mutable code.
In addition, we will create several "helper" modules that won’t provide parts of the public API but will keep the internal code cleaner.
Finally, we will ask our API’s users to provide a function that can generate a number of hashes of an element. This function will have the type a -> [Word32]. We will use all of the hashes that this function returns, so the list must not be infinite!
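To make the shape of such a function concrete, here is a hypothetical hash family for strings. It derives k values from two simple base hashes; the function name hashFamily, the constants, and the mixing scheme are all assumptions made for this sketch, not part of the library.

```haskell
import Data.Char (ord)
import Data.Word (Word32)

-- A hypothetical hash family: combine two base hashes into k values.
-- Word32 arithmetic wraps on overflow, which is harmless for hashing.
hashFamily :: Int -> String -> [Word32]
hashFamily k str = [h1 + fromIntegral i * h2 | i <- [0 .. k - 1]]
  where
    h1 = foldl step 5381 str
    h2 = foldl step 0x9747b28c str
    step h c = h * 33 + fromIntegral (ord c)
```

Partially applying it, as in hashFamily 3, gives a function of type String -> [Word32] that always returns a finite list, which is what the API asks for.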
End of content preview. The rest of this section is not available here.
Basic Design
Content preview
The data structure that we use for our Haskell Bloom filter is a direct translation of the simple description we gave earlier—a bit array and a function that computes hashes:
-- file: BloomFilter/Internal.hs
module BloomFilter.Internal
    (
      Bloom(..)
    , MutBloom(..)
    ) where

import Data.Array.ST (STUArray)
import Data.Array.Unboxed (UArray)
import Data.Word (Word32)

data Bloom a = B {
      blmHash  :: (a -> [Word32])
    , blmArray :: UArray Word32 Bool
    }
When we create our Cabal package, we will not be exporting this module. It exists purely to let us control the visibility of names. We will import BloomFilter.Internal into both the mutable and immutable modules, but we will re-export from each module only the type that is relevant to that module’s API.
Unlike other Haskell arrays, a UArray contains unboxed values.
For a normal Haskell type, a value can be either fully evaluated, an unevaluated thunk, or the special value ⊥, pronounced (and sometimes written) bottom. The value ⊥ is a placeholder for a computation that does not succeed. Such a computation could take any of several forms. It could be an infinite loop, an application of error, or the special value undefined.
A type that can contain ⊥ is referred to as lifted. All normal Haskell types are lifted. In practice, this means that we can always write an application of error or undefined in place of a normal expression.
This ability to store thunks or ⊥ comes with a performance cost: it adds an extra layer of indirection. To see why we need this indirection, consider the Word32 type. A value of this type is a full 32 bits wide, so on a 32-bit system, there is no way to directly encode the value ⊥ within 32 bits. The runtime system has to maintain, and check, some extra data to track whether the value is ⊥ or not.
An unboxed value does away with this indirection. In doing so, it gains performance but sacrifices the ability to represent a thunk or ⊥. Since it can be denser than a normal Haskell array, an array of unboxed values is an excellent choice for numeric data and bits.
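The difference is easy to observe. In this small sketch, which assumes the standard array package (the names boxed and unboxed are illustrative), the boxed array happily stores undefined as one of its elements:

```haskell
import Data.Array (Array)
import qualified Data.Array as A
import Data.Array.Unboxed (UArray)
import qualified Data.Array.Unboxed as U

-- A boxed array stores pointers, so an element may be an unevaluated
-- thunk or bottom. Only a lookup of index 1 fails here.
boxed :: Array Int Int
boxed = A.listArray (0, 2) [10, undefined, 30]

-- An unboxed array stores the raw values themselves, so every element
-- must be a fully evaluated Int.
unboxed :: UArray Int Int
unboxed = U.listArray (0, 2) [10, 20, 30]
```

Putting undefined into the element list of the unboxed array would make the whole array bottom, since every element has to be evaluated before it can be stored.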
End of content preview. The rest of this section is not available here.
The ST Monad
Content preview
Earlier, we mentioned that modifying an immutable array is prohibitively expensive, as it requires copying the entire array. Using a UArray does not change this, so what can we do to reduce the cost to bearable levels?
In an imperative language, we would simply modify the elements of the array in place—this will be our approach in Haskell, too.
Haskell provides a special monad, named ST, which lets us work safely with mutable state. Compared to the State monad, it has some powerful added capabilities:
  • We can thaw an immutable array to give a mutable array; modify the mutable array in place; and freeze a new immutable array when we are done.
  • We have the ability to use mutable references. This lets us implement data structures that we can modify after construction, as in an imperative language. This ability is vital for some imperative data structures and algorithms, for which similarly efficient, purely functional alternatives have not yet been discovered.
The IO monad also provides these capabilities. The major difference between the two is that the ST monad is intentionally designed so that we can escape from it back into pure Haskell code. We enter the ST monad via the execution function runST (in the same way as most other Haskell monads do—except IO, of course), and we escape by returning from runST.
When we apply a monad’s execution function, we expect it to behave repeatably: given the same body and arguments, we must get the same results every time. This also applies to runST. To achieve this repeatability, the ST monad is more restrictive than the IO monad. We cannot read or write files, create global variables, or fork threads. Indeed, although we can create and work with mutable references and arrays, the type system prevents them from escaping to the caller of runST. A mutable array must be frozen into an immutable array before we can return it, and a mutable reference cannot escape at all.
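A small sketch shows this discipline at work, assuming only runST from Control.Monad.ST and the mutable references from Data.STRef (the function name sumST is illustrative). The reference is created, updated in place, and read inside runST, and only the final Int escapes:

```haskell
import Control.Monad.ST (runST)
import Data.STRef (modifySTRef', newSTRef, readSTRef)

-- Sums a list with a mutable accumulator. From the outside this is an
-- ordinary pure function: the same input always gives the same result.
sumST :: [Int] -> Int
sumST xs = runST $ do
  ref <- newSTRef 0
  mapM_ (\x -> modifySTRef' ref (+ x)) xs
  readSTRef ref
```

Trying to return ref itself instead of its contents is rejected by the type checker, which is how the type system keeps mutable state from leaking out.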
End of content preview. The rest of this section is not available here.
Designing an API for Qualified Import
Content preview
The public interfaces that we provide for working with Bloom filters are worth a little discussion:
-- file: BloomFilter/Mutable.hs
module BloomFilter.Mutable
    (
      MutBloom
    , elem
    , notElem
    , insert
    , length
    , new
    ) where

import Control.Monad (liftM)
import Control.Monad.ST (ST)
import Data.Array.MArray (getBounds, newArray, readArray, writeArray)
import Data.Word (Word32)
import Prelude hiding (elem, length, notElem)

import BloomFilter.Internal (MutBloom(..))
We export several names that clash with names the Prelude exports. This is deliberate: we expect users of our modules to import them with qualified names. This reduces the burden on the memory of our users, as they should already be familiar with the Prelude’s elem, notElem, and length functions.
When we use a module written in this style, we might often import it with a single-letter prefix, for instance, as import qualified BloomFilter.Mutable as M. This would allow us to write M.length, which stays compact and readable.
Alternatively, we could import the module unqualified and import the Prelude while hiding the clashing names with import Prelude hiding (length). This is much less useful, as it gives a reader skimming the code no local cue that she is not actually seeing the Prelude’s length.
Of course, we seem to be violating this precept in our own module’s header: we import the Prelude and hide some of the names it exports. There is a practical reason for this. We define a function named length. If we export this from our module without first hiding the Prelude’s length, the compiler will complain that it cannot tell whether to export our version of length or the Prelude’s.
While we could export the fully qualified name BloomFilter.Mutable.length to eliminate the ambiguity, that seems uglier in this case. This decision has no consequences for someone using our module, just for ourselves as the authors of what ought to be a “black box,” so there is little chance of confusion here.
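The same import style can be tried with a module that ships with GHC. This sketch uses Data.List.NonEmpty as a stand-in for our own module, since it too exports a length that clashes with the Prelude's; the names listLen and neLen are illustrative.

```haskell
import Data.List.NonEmpty (NonEmpty (..))
import qualified Data.List.NonEmpty as NE

-- The unqualified name is still the Prelude's length.
listLen :: Int
listLen = length "abc"

-- The module's own length is visibly qualified at the point of use.
neLen :: Int
neLen = NE.length ('x' :| "yz")
```

A reader skimming this code can tell at a glance which length each line refers to, which is the local cue that the qualified style provides.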
End of content preview. The rest of this section is not available here.
Creating a Mutable Bloom Filter
Content preview
We put the type declaration for our mutable Bloom filter in the BloomFilter.Internal module, along with the immutable Bloom type:
-- file: BloomFilter/Internal.hs
data MutBloom s a = MB {
      mutHash :: (a -> [Word32])
    , mutArray :: STUArray s Word32 Bool
    }
The STUArray type gives us a mutable unboxed array that we can work with in the ST monad. To create an STUArray, we use the newArray function. The new function belongs in the BloomFilter.Mutable module:
-- file: BloomFilter/Mutable.hs
new :: (a -> [Word32]) -> Word32 -> ST s (MutBloom s a)
new hash numBits = MB hash `liftM` newArray (0,numBits-1) False
Most of the methods of STUArray are actually implementations of the MArray typeclass, which is defined in the Data.Array.MArray module.
Our length function is slightly complicated by two factors. We are relying on our bit array’s record of its own bounds, and an MArray instance’s getBounds function has a monadic type. We also have to add one to the answer, as the upper bound of the array is one less than its actual length:
-- file: BloomFilter/Mutable.hs
length :: MutBloom s a -> ST s Word32
length filt = (succ . snd) `liftM` getBounds (mutArray filt)
To add an element to the Bloom filter, we set all of the bits indicated by the hash function. We use the mod function to ensure that all of the hashes stay within the bounds of our array, and isolate our code that computes offsets into the bit array in one function:
-- file: BloomFilter/Mutable.hs
insert :: MutBloom s a -> a -> ST s ()
insert filt elt = indices filt elt >>=
                  mapM_ (\bit -> writeArray (mutArray filt) bit True)

indices :: MutBloom s a -> a -> ST s [Word32]
indices filt elt = do
  modulus <- length filt
  return $ map (`mod` modulus) (mutHash filt elt)
Testing for membership is no more difficult. If every bit indicated by the hash function is set, we consider an element to be present in the Bloom filter:
-- file: BloomFilter/Mutable.hs
elem, notElem :: a -> MutBloom s a -> ST s Bool

elem elt filt = indices filt elt >>=
                allM (readArray (mutArray filt))

notElem elt filt = not `liftM` elem elt filt
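The allM helper used by elem is not among the imports shown for this module, and its definition is not part of this preview. A minimal sketch of such a helper, assuming it is meant as a monadic counterpart of the Prelude's all:

```haskell
-- Monadic counterpart of `all`: runs the test on each element in turn
-- and stops at the first one for which it returns False.
allM :: Monad m => (a -> m Bool) -> [a] -> m Bool
allM p (x:xs) = do
  ok <- p x
  if ok then allM p xs else return False
allM _ [] = return True
```

Stopping early means that a membership query reads no more bits than it needs to in order to give a negative answer.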
End of content preview. The rest of this section is not available here.
The Immutable API
Content preview
Our interface to the immutable Bloom filter has the same structure as the mutable API:
-- file: ch26/BloomFilter.hs
module BloomFilter
    (
      Bloom
    , length
    , elem
    , notElem
    , fromList
    ) where

import BloomFilter.Internal
import BloomFilter.Mutable (insert, new)
import Data.Array.ST (runSTUArray)
import Data.Array.IArray ((!), bounds)
import Data.Word (Word32)
import Prelude hiding (elem, length, notElem)

length :: Bloom a -> Int
length = fromIntegral . len

len :: Bloom a -> Word32
len = succ . snd . bounds . blmArray

elem :: a -> Bloom a -> Bool
elt `elem` filt   = all test (blmHash filt elt)
  where test hash = blmArray filt ! (hash `mod` len filt)

notElem :: a -> Bloom a -> Bool
elt `notElem` filt = not (elt `elem` filt)
We provide an easy-to-use means to create an immutable Bloom filter, via a fromList function. This hides the ST monad from our users so that they see only the immutable type:
-- file: ch26/BloomFilter.hs
fromList :: (a -> [Word32])    -- family of hash functions to use
         -> Word32             -- number of bits in filter
         -> [a]                -- values to populate with
         -> Bloom a
fromList hash numBits values =
    B hash . runSTUArray $
      do mb <- new hash numBits
         mapM_ (insert mb) values
         return (mutArray mb)
The key to this function is runSTUArray. We mentioned earlier that in order to return an immutable array from the ST monad, we must freeze a mutable array. The runSTUArray function combines execution with freezing. Given an action that returns an STUArray, it executes the action using runST; freezes the STUArray that it returns; and returns that as a UArray.
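The same pattern works outside our library. Here is a self-contained sketch, assuming only the array package, that builds an immutable bit array from a list of positions (the names setBits and countSet are illustrative):

```haskell
import Data.Array.ST (newArray, runSTUArray, writeArray)
import Data.Array.Unboxed (UArray, elems)

-- Builds a bit array of the given size (which must be positive) with the
-- listed positions turned on. All mutation happens inside runSTUArray;
-- callers only ever see the frozen UArray.
setBits :: Int -> [Int] -> UArray Int Bool
setBits size positions = runSTUArray $ do
  arr <- newArray (0, size - 1) False
  mapM_ (\i -> writeArray arr (i `mod` size) True) positions
  return arr

-- Counts how many bits are turned on.
countSet :: UArray Int Bool -> Int
countSet = length . filter id . elems
```

For example, setBits 4 [1, 6] turns on bits 1 and 2, because 6 is reduced modulo the array size.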
The Data.Array.MArray module provides a freeze function that we could use instead, but runSTUArray is both more convenient and more efficient. The efficiency lies in the fact that freeze must copy the underlying data from the STUArray to the new UArray, in order to ensure that subsequent modifications of the STUArray cannot affect the contents of the UArray. Thanks to the type system,
End of content preview. The rest of this section is not available here.
Creating a Friendly Interface
Content preview
Although our immutable Bloom filter API is straightforward to use once we have created a Bloom value, the fromList function leaves some important decisions unresolved. We still have to choose a function that can generate many hash values and determine what the capacity of a Bloom filter should be:
-- file: BloomFilter/Easy.hs
easyList :: (Hashable a)
         => Double        -- false positive rate (between 0 and 1)
         -> [a]           -- values to populate the filter with
         -> Either String (B.Bloom a)
Here is a possible "friendlier" way to create a Bloom filter. It leaves responsibility for hashing values in the hands of a typeclass, Hashable. It lets us configure the Bloom filter based on a parameter that is easier to understand—namely the rate of false positives that we are willing to tolerate. And it chooses the size of the filter for us, based on the desired false positive rate and the number of elements in the input list.
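How might a filter size be derived from those two inputs? The sketch below uses the textbook Bloom filter estimates, m = -n ln p / (ln 2)^2 bits and k = (m / n) ln 2 hash functions. This is an assumption made for illustration; it is not necessarily the calculation that easyList performs, and the name suggestSizing is hypothetical.

```haskell
-- Given an expected element count and a target false positive rate,
-- suggest a number of bits and a number of hash functions.
suggestSizing :: Int -> Double -> Either String (Int, Int)
suggestSizing capacity errRate
  | capacity <= 0 = Left "capacity must be positive"
  | errRate <= 0 || errRate >= 1 =
      Left "false positive rate must lie strictly between 0 and 1"
  | otherwise = Right (numBits, numHashes)
  where
    n = fromIntegral capacity :: Double
    -- m = -n ln p / (ln 2)^2
    m = negate (n * log errRate) / (log 2 ^ (2 :: Int))
    numBits = ceiling m
    -- k = (m / n) ln 2, rounded, and never fewer than one hash
    numHashes = max 1 (round (m / n * log 2))
```

Returning Either String mirrors the signature of easyList: unusable parameters produce an error message instead of a filter.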
This function will, of course, not always be usable—for example, it will fail if the length of the input list is too long. However, its simplicity rounds out the other interfaces we provide. It lets us offer our users a range of control over creation, from entirely imperative to completely declarative.
In the export list for our module, we re-export some names from the base BloomFilter module. This allows casual users to import only the BloomFilter.Easy module and have access to all of the types and functions they are likely to need.
If we import both BloomFilter.Easy and BloomFilter, you might wonder what will happen if we try to use a name exported by both. We already know that if we import BloomFilter unqualified and try to use length, GHC will issue an error about ambiguity, because the Prelude also makes the name length available.
The Haskell standard requires an implementation to be able to tell when several names refer to the same “thing.” For instance, the Bloom type is exported by BloomFilter and BloomFilter.Easy. If we import both modules and try to use Bloom, GHC will be able to see that the
End of content preview. The rest of this section is not available here.
Creating a Cabal Package
Content preview
We have created a moderately complicated library, with four public modules and one internal module. To turn this into a package that we can easily redistribute, we create a rwh-bloomfilter.cabal file.
Cabal allows us to describe several libraries in a single package. A .cabal file begins with information that is common to all of the libraries, which is followed by a distinct section for each library:
Name:               rwh-bloomfilter
Version:            0.1
License:            BSD3
License-File:       License.txt
Category:           Data
Stability:          experimental
Build-Type:         Simple
As we are bundling some C code with our library, we tell Cabal about our C source files:
Extra-Source-Files: cbits/lookup3.c cbits/lookup3.h
The Extra-Source-Files directive has no effect on a build: it directs Cabal to bundle some extra files if we run runhaskell Setup sdist to create a source tarball for redistribution.
When reading a property (the text before a ":" character), Cabal ignores case, so it treats extra-source-files and Extra-Source-Files the same.
Prior to 2007, the standard Haskell libraries were organized in a handful of large packages, of which the biggest was named base. This organization tied many unrelated libraries together, so the Haskell community split the base package up into a number of more modular libraries. For instance, the array types migrated from base into a package named array.
A Cabal package needs to specify the other packages that it needs to have present in order to build. This makes it possible for Cabal’s command-line interface to automatically download and build a package’s dependencies, if necessary. We would like our code to work with as many versions of GHC as possible, regardless of whether they have the modern layout of base and numerous other packages. We thus need to be able to specify that we depend on the array package if it is present, and base alone otherwise.
Cabal provides a generic configurations feature, which we can use to selectively enable parts of a .cabal file. A build configuration is controlled by a Boolean-valued
End of content preview. The rest of this section is not available here.
Testing with QuickCheck
Content preview
Before we pay any attention to performance, we want to establish that our Bloom filter behaves correctly. We can easily use QuickCheck to test some basic properties:
-- file: examples/BloomCheck.hs
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
module Main where

import BloomFilter.Hash (Hashable)
import Data.Word (Word8, Word32)
import System.Random (Random(..), RandomGen)
import Test.QuickCheck
import qualified BloomFilter.Easy as B
import qualified Data.ByteString as Strict
import qualified Data.ByteString.Lazy as Lazy
We will not use the normal quickCheck function to test our properties, as the 100 test inputs that it generates do not provide much coverage:
-- file: examples/BloomCheck.hs
handyCheck :: Testable a => Int -> a -> IO ()
handyCheck limit = check defaultConfig {
                     configMaxTest = limit
                   , configEvery   = \_ _ -> ""
                   }
Our first task is to ensure that if we add a value to a Bloom filter, a subsequent test will always report it as present, regardless of the chosen false positive rate or input value.
We will use the easyList function to create a Bloom filter. The Random instance for Double generates numbers in the range zero to one, so QuickCheck can nearly supply us with arbitrary false positive rates.
However, we need to ensure that both zero and one are excluded from the false positives we test with. QuickCheck gives us two ways to do this:
Construction
We specify the range of valid values to generate. QuickCheck provides a forAll combinator for this purpose.
Elimination
When QuickCheck generates an arbitrary value for us, we filter out those that do not fit our criteria, using the (==>) operator. If we reject a value in this way, a test will appear to succeed.
If we can choose either method, it is always preferable to take the constructive approach. To see why, suppose that QuickCheck generates 1,000 arbitrary values for us, and we filter out 800 as unsuitable for some reason. We will appear to run 1,000 tests, but only 200 will actually do anything useful.
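The two approaches can be contrasted on a toy property. This is only a sketch, not the book's Bloom filter test: it assumes the QuickCheck 2 API (quickCheckWithResult and stdArgs, rather than the check and defaultConfig functions used above), and inOpenUnit is a hypothetical stand-in for any property that needs a rate strictly between zero and one.

```haskell
import Test.QuickCheck

-- Hypothetical stand-in for a property that only makes sense for
-- rates strictly between 0 and 1.
inOpenUnit :: Double -> Bool
inOpenUnit rate = rate > 0 && rate < 1

-- Construction: generate only valid rates, so every test does useful work.
prop_constructed :: Property
prop_constructed =
    forAll (choose (0.001, 0.999)) $ \rate -> inOpenUnit rate

-- Elimination: accept any Double, then discard the unsuitable ones.
prop_filtered :: Double -> Property
prop_filtered rate = (rate > 0 && rate < 1) ==> inOpenUnit rate
```

With the constructive property, every generated value exercises inOpenUnit. The filtered property discards most arbitrary Double values; QuickCheck 2 reports those discards and may give up before reaching its target number of tests.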
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Performance Analysis and Tuning
Inhaltsvorschau
We now have a correctness base line: our QuickCheck tests pass. When we start tweaking performance, we can rerun the tests at any time to ensure that we haven’t inadvertently broken anything.
Our first step is to write a small test application that we can use for timing:
-- file: examples/WordTest.hs
module Main where

import Control.Parallel.Strategies (NFData(..))
import Control.Monad (forM_, mapM_)
import qualified BloomFilter.Easy as B
import qualified Data.ByteString.Char8 as BS
import Data.Time.Clock (diffUTCTime, getCurrentTime)
import System.Environment (getArgs)
import System.Exit (exitFailure)

timed :: (NFData a) => String -> IO a -> IO a
timed desc act = do
    start <- getCurrentTime
    ret <- act
    end <- rnf ret `seq` getCurrentTime
    putStrLn $ show (diffUTCTime end start) ++ " to " ++ desc
    return ret

instance NFData BS.ByteString where
    rnf _ = ()

instance NFData (B.Bloom a) where
    rnf filt = B.length filt `seq` ()
We borrow the rnf function that we introduced earlier to develop a simple timing harness. Our timed action ensures that a value is evaluated to normal form, so that we accurately capture the cost of evaluating it.
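To see why the harness forces its result, consider a small variant that returns the elapsed time instead of printing it. This is a sketch under assumptions: it takes NFData from Control.DeepSeq in the deepseq package (the listing above imports it from Control.Parallel.Strategies), and timedForce and lazySquares are hypothetical names.

```haskell
import Control.DeepSeq (NFData, force)
import Control.Exception (evaluate)
import Data.Time.Clock (NominalDiffTime, diffUTCTime, getCurrentTime)

-- Run an action, force its result to normal form, and return the
-- result together with the elapsed wall-clock time.
timedForce :: NFData a => IO a -> IO (a, NominalDiffTime)
timedForce act = do
    start <- getCurrentTime
    ret   <- act >>= evaluate . force
    end   <- getCurrentTime
    return (ret, diffUTCTime end start)

-- A lazy result: without forcing, the action would hand back an
-- unevaluated thunk and the timer would measure almost nothing.
lazySquares :: IO [Int]
lazySquares = return (map (\x -> x * x) [1 .. 100000])
```

Calling timedForce lazySquares charges the cost of building the whole list to the timed region, which is the behavior the timed function above is after.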
The application creates a Bloom filter from the contents of a file, treating each line as an element to add to the filter:
-- file: examples/WordTest.hs
main = do
  args <- getArgs
  let files | null args = ["/usr/share/dict/words"]
            | otherwise = args
  forM_ files $ \file -> do

    words <- timed "read words" $
      BS.lines `fmap` BS.readFile file

    let len = length words
        errRate = 0.01

    putStrLn $ show len ++ " words"
    putStrLn $ "suggested sizings: " ++
               show (B.suggestSizing (fromIntegral len) errRate)

    filt <- timed "construct filter" $
      case B.easyList errRate words of
        Left errmsg -> do
          putStrLn $ "Error: " ++ errmsg
          exitFailure
        Right filt -> return filt

    timed "query every element" $
      mapM_ print $ filter (not . (`B.elem` filt)) words
We use timed to account for the costs of three distinct phases: reading and splitting the data into lines; populating the Bloom filter; and querying every element in it.
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Chapter 27: Sockets and Syslog
Inhaltsvorschau
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Basic Networking
Inhaltsvorschau
In several earlier chapters of this book, we discussed services that operate over a network. Two examples are client/server databases and web services. When the need arises to devise a new protocol or to communicate with a protocol that doesn’t have an existing helper library in Haskell, you’ll need to use the lower-level networking tools in the Haskell library.
In this chapter, we will discuss these lower-level tools. Network communication is a broad topic with entire books devoted to it. We will show you how to use Haskell to apply the low-level network knowledge you already have.
Haskell’s networking functions almost always correspond directly to familiar C function calls. As most other languages also layer on top of C, you should find this interface familiar.
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Communicating with UDP
Inhaltsvorschau
UDP breaks data down into packets. It does not ensure that the data reaches its destination, or that it arrives only once. It does use checksumming to ensure that packets that arrive have not been corrupted. UDP tends to be used in applications that are performance- or latency-sensitive, in which each individual packet of data is less important than the overall performance of the system. It may also be used in applications where TCP’s behavior isn’t the most efficient, such as those that send short, discrete messages. Examples of systems that tend to use UDP include audio and video conferencing, time synchronization, network-based filesystems, and logging systems.
The traditional Unix syslog service allows programs to send log messages over a network to a central server that records them. Some programs are quite performance-sensitive and may generate a large volume of messages. In these programs, it could be more important to have the logging impose a minimal performance overhead than to guarantee every message is logged. Moreover, it may be desirable to continue program operation even if the logging server is unreachable. For this reason, UDP is one of the protocols syslog supports for the transmission of log messages. The protocol is simple; we present a Haskell implementation of a client here:
-- file: ch27/syslogclient.hs
import Data.Bits
import Network.Socket
import Network.BSD
import Data.List
import SyslogTypes

data SyslogHandle =
    SyslogHandle {slSocket :: Socket,
                  slProgram :: String,
                  slAddress :: SockAddr}

openlog :: HostName             -- ^ Remote hostname, or localhost
        -> String               -- ^ Port number or name; 514 is default
        -> String               -- ^ Name to log under
        -> IO SyslogHandle      -- ^ Handle to use for logging
openlog hostname port progname =
    do -- Look up the hostname and port.  Either raises an exception
       -- or returns a nonempty list.  First element in that list
       -- is supposed to be the best option.
       addrinfos <- getAddrInfo Nothing (Just hostname) (Just port)
       let serveraddr = head addrinfos

       -- Establish a socket for communication
       sock <- socket (addrFamily serveraddr) Datagram defaultProtocol

       -- Save off the socket, program name, and server address in a handle
       return $ SyslogHandle sock progname (addrAddress serveraddr)

syslog :: SyslogHandle -> Facility -> Priority -> String -> IO ()
syslog syslogh fac pri msg =
    sendstr sendmsg
    where code = makeCode fac pri
          sendmsg = "<" ++ show code ++ ">" ++ (slProgram syslogh) ++
                    ": " ++ msg

          -- Send until everything is done
          sendstr :: String -> IO ()
          sendstr [] = return ()
          sendstr omsg = do sent <- sendTo (slSocket syslogh) omsg
                                    (slAddress syslogh)
                            sendstr (genericDrop sent omsg)

closelog :: SyslogHandle -> IO ()
closelog syslogh = sClose (slSocket syslogh)

{- | Convert a facility and a priority into a syslog code -}
makeCode :: Facility -> Priority -> Int
makeCode fac pri =
    let faccode = codeOfFac fac
        pricode = fromEnum pri
        in
          (faccode `shiftL` 3) .|. pricode
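The bit arithmetic in makeCode can be checked in isolation. The sketch below works on plain numbers, because the SyslogTypes module is not shown here; encodeCode and decodeCode are hypothetical names, and the numeric values in the usage note (facility 1 for user-level messages, facility 16 for local0, severity 5 for notice, severity 4 for warning) follow the conventional syslog numbering rather than anything defined in this listing.

```haskell
import Data.Bits (shiftL, shiftR, (.&.), (.|.))

-- Pack a numeric facility and severity into a syslog priority code:
-- the facility occupies the high bits, the severity the low three bits.
encodeCode :: Int -> Int -> Int
encodeCode facility severity = (facility `shiftL` 3) .|. severity

-- Recover (facility, severity) from a priority code.
decodeCode :: Int -> (Int, Int)
decodeCode code = (code `shiftR` 3, code .&. 7)
```

For example, encodeCode 1 5 evaluates to 13, so a user-level notice would start with the prefix <13>, and encodeCode 16 4 evaluates to 132. Decoding 13 gives back (1,5).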
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Communicating with TCP
Inhaltsvorschau
TCP is designed to make data transfer over the Internet as reliable as possible. TCP traffic is a stream of data. While this stream gets broken up into individual packets by the operating system, the packet boundaries are neither known nor relevant to applications. TCP guarantees that, if traffic is delivered to the application at all, it arrives intact, unmodified, exactly once, and in order. Obviously, things such as a broken wire can cause traffic to not be delivered, and no protocol can overcome those limitations.
This brings with it some trade-offs compared with UDP. First of all, there are a few packets that must be sent at the start of the TCP conversation to establish the link. For very short conversations, then, UDP would have a performance advantage. Also, TCP tries very hard to get data through. If one end of a conversation tries to send data to the remote but doesn’t receive an acknowledgment back, it will periodically retransmit the data for some time before giving up. This makes TCP robust in the face of dropped packets. However, it also means that TCP is not the best choice for real-time protocols that involve things such as live audio or video.
With TCP, connections are stateful. That means that there is a dedicated logical “channel” between a client and server, rather than just one-off packets as with UDP. This makes things easy for client developers. Server applications almost always will want to be able to handle more than one TCP connection at once. How then to do this?
On the server side, you will first create a socket and bind to a port, just like with UDP. Instead of repeatedly listening for data from any location, your main loop will be around the accept call. Each time a client connects, the server’s operating system allocates a new socket for it. So we have the master socket, used only to listen for incoming connections, and never to transmit data. We also have the potential for multiple child sockets to be used at once, each corresponding to a logical TCP conversation.
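The master and child socket arrangement described above can be sketched as a tiny echo server. This is an illustration under assumptions, not the book's server: it uses the current network package API (Network.Socket and Network.Socket.ByteString), handles each connection in its own thread, and the names listenOnLoopback, acceptLoop, and echo are hypothetical.

```haskell
import Control.Concurrent (forkIO)
import Control.Exception (finally)
import Control.Monad (forever, unless)
import qualified Data.ByteString as BS
import Network.Socket
import Network.Socket.ByteString (recv, sendAll)

-- Create the master socket, bound to the loopback address. Port 0 asks
-- the operating system to pick a free port.
listenOnLoopback :: IO Socket
listenOnLoopback = do
    master <- socket AF_INET Stream defaultProtocol
    bind master (SockAddrInet 0 (tupleToHostAddress (127, 0, 0, 1)))
    listen master 5
    return master

-- The main loop: each accept yields a fresh child socket, which we hand
-- to its own thread. The master socket never carries data itself.
acceptLoop :: Socket -> IO ()
acceptLoop master = forever $ do
    (child, _peer) <- accept master
    forkIO (echo child `finally` close child)

-- Echo bytes back until the client closes its end of the connection.
echo :: Socket -> IO ()
echo sock = do
    chunk <- recv sock 4096
    unless (BS.null chunk) $ sendAll sock chunk >> echo sock
```

A client can find the chosen port with socketPort on the master socket and then connect to it as usual.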
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Chapter 28: Software Transactional Memory
Inhaltsvorschau
In the traditional threaded model of concurrent programming, when we share data among threads, we keep it consistent using locks, and we notify threads of changes using condition variables. Haskell’s MVar mechanism improves somewhat upon these tools, but it still suffers from all of the same problems:
  • Race conditions due to forgotten locks
  • Deadlocks resulting from inconsistent lock ordering
  • Corruption caused by uncaught exceptions
  • Lost wakeups induced by omitted notifications
These problems frequently affect even the smallest concurrent programs, but the difficulties they pose become far worse in larger code bases or under heavy load.
For instance, a program with a few big locks is somewhat tractable to write and debug, but contention for those locks will clobber us under heavy load. If we react with finer-grained locking, it becomes far harder to keep our software working at all. The additional bookkeeping will hurt performance even when loads are light.
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
The Basics
Inhaltsvorschau
Software transactional memory (STM) gives us a few simple, but powerful, tools with which we can address most of these problems. We execute a block of actions as a transaction using the atomically combinator. Once we enter the block, other threads cannot see any modifications we make until we exit, nor can our thread see any changes made by other threads. These two properties mean that our execution is isolated.
Upon exit from a transaction, exactly one of the following things will occur:
  • If no other thread concurrently modifies the same data as us, all of our modifications will simultaneously become visible to other threads.
  • Otherwise, our modifications are discarded without being performed, and our block of actions is automatically restarted.
This all-or-nothing nature of an atomically block is referred to as atomic, hence the name of the combinator. If you have used databases that support transactions, you should find that working with STM feels quite familiar.
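A minimal sketch of these guarantees at work: several threads increment a shared counter, and because each read-modify-write runs inside atomically, no update is lost. The names increment and countConcurrently are hypothetical, and the MVar is used only to wait for the workers to finish.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Concurrent.STM
import Control.Monad (forM_, replicateM_)

-- Read-modify-write as one transaction; no other thread can observe or
-- interleave with the intermediate state.
increment :: TVar Int -> STM ()
increment counter = do
    n <- readTVar counter
    writeTVar counter (n + 1)

-- Fork `threads` workers that each increment the counter `perThread`
-- times, wait for them all, and return the final count.
countConcurrently :: Int -> Int -> IO Int
countConcurrently threads perThread = do
    counter <- newTVarIO 0
    done    <- newEmptyMVar
    forM_ [1 .. threads] $ \_ -> forkIO $ do
        replicateM_ perThread (atomically (increment counter))
        putMVar done ()
    replicateM_ threads (takeMVar done)
    readTVarIO counter
```

However the threads are scheduled, countConcurrently 8 1000 returns 8000: a transaction that conflicts with another is restarted rather than committing a stale value.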
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Some Simple Examples
Inhaltsvorschau
In a multiplayer role playing game, a player’s character will have some state such as health, possessions, and money. To explore the world of STM, let’s start with a few simple functions and types based around working with some character state for a game. We will refine our code as we learn more about the API.
The STM API is provided by the stm package, and its modules are in the Control.Concurrent.STM hierarchy:
-- file: ch28/GameInventory.hs
{-# LANGUAGE GeneralizedNewtypeDeriving #-}

import Control.Concurrent.STM
import Control.Monad

data Item = Scroll
          | Wand
          | Banjo
            deriving (Eq, Ord, Show)

newtype Gold = Gold Int
    deriving (Eq, Ord, Show, Num)

newtype HitPoint = HitPoint Int
    deriving (Eq, Ord, Show, Num)

type Inventory = TVar [Item]
type Health = TVar HitPoint
type Balance = TVar Gold

data Player = Player {
      balance :: Balance,
      health :: Health,
      inventory :: Inventory
    }
The TVar parameterized type is a mutable variable that we can read or write inside an atomically block. For simplicity, we represent a player’s inventory as a list of items. Notice, too, that we use newtype declarations so that we cannot accidentally confuse wealth with health.
To perform a basic transfer of money from one Balance to another, all we have to do is adjust the values in each TVar:
-- file: ch28/GameInventory.hs
basicTransfer qty fromBal toBal = do
  fromQty <- readTVar fromBal
  toQty   <- readTVar toBal
  writeTVar fromBal (fromQty - qty)
  writeTVar toBal   (toQty + qty)
Let’s write a small function to try this out:
-- file: ch28/GameInventory.hs
transferTest = do
  alice <- newTVar (12 :: Gold)
  bob   <- newTVar 4
  basicTransfer 3 alice bob
  liftM2 (,) (readTVar alice) (readTVar bob)
If we run this in ghci, it behaves as we should expect:
ghci> :load GameInventory
[1 of 1] Compiling Main             ( GameInventory.hs, interpreted )
Ok, modules loaded: Main.
ghci> atomically transferTest
Loading package array-0.1.0.0 ... linking ... done.
Loading package stm-2.1.1.1 ... linking ... done.
(Gold 9,Gold 7)
The properties of atomicity and isolation guarantee that if another thread sees a change in bob’s balance, it will also be able to see the modification of alice’s balance.
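This guarantee can be observed directly. In the sketch below, one thread moves money between two balances while the main thread repeatedly reads both in a single transaction; every snapshot should show the same combined total. The names moveOne, snapshot, and totalsStayConstant are hypothetical, and plain Int balances stand in for Gold.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.STM
import Control.Monad (replicateM, replicateM_)

-- Move one unit from the first balance to the second, atomically.
moveOne :: TVar Int -> TVar Int -> STM ()
moveOne from to = do
    modifyTVar' from (subtract 1)
    modifyTVar' to (+ 1)

-- Read both balances in a single transaction, so the pair is consistent.
snapshot :: TVar Int -> TVar Int -> STM (Int, Int)
snapshot a b = (,) <$> readTVar a <*> readTVar b

-- While another thread shuffles money, take `samples` snapshots and
-- report whether every one preserved the combined total.
totalsStayConstant :: Int -> IO Bool
totalsStayConstant samples = do
    alice <- newTVarIO 1000
    bob   <- newTVarIO 0
    _ <- forkIO (replicateM_ 1000 (atomically (moveOne alice bob)))
    seen <- replicateM samples (atomically (snapshot alice bob))
    return (all (\(a, b) -> a + b == 1000) seen)
```

If the two reads in snapshot were run as two separate atomically calls instead, a transfer could commit between them and the totals could disagree.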
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
STM and Safety
Inhaltsvorschau
If we are to provide atomic, isolated transactions, it is critical that we cannot either deliberately or accidentally escape from an atomically block. Haskell’s type system enforces this on our behalf, via the STM monad:
ghci> :type atomically
atomically :: STM a -> IO a
The atomically block takes an action in the STM monad, executes it, and makes its result available to us in the IO monad. The STM monad is where all transactional code executes. For instance, the functions that we have seen for manipulating TVar values operate in the STM monad:
ghci> :type newTVar
newTVar :: a -> STM (TVar a)
ghci> :type readTVar
readTVar :: TVar a -> STM a
ghci> :type writeTVar
writeTVar :: TVar a -> a -> STM ()
This is also true of the transactional functions we defined earlier:
-- file: ch28/GameInventory.hs
basicTransfer :: Gold -> Balance -> Balance -> STM ()
maybeGiveItem :: Item -> Inventory -> Inventory -> STM Bool
The STM monad does not let us perform I/O or manipulate nontransactional mutable state, such as MVar values. This lets us avoid operations that might violate the transactional guarantees.
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Retrying a Transaction
Inhaltsvorschau
The API of our maybeGiveItem function is somewhat awkward. It gives an item only if the character actually possesses it, which is reasonable, but by returning a Bool, it complicates the code of its callers. Here is an item sale function that has to look at the result of maybeGiveItem to decide what to do next:
-- file: ch28/GameInventory.hs
maybeSellItem :: Item -> Gold -> Player -> Player -> STM Bool
maybeSellItem item price buyer seller = do
  given <- maybeGiveItem item (inventory seller) (inventory buyer)
  if given
    then do
      basicTransfer price (balance buyer) (balance seller)
      return True
    else return False
Not only do we have to check whether the item was given, we have to propagate an indication of success back to our caller. The complexity thus cascades outwards.
There is a more elegant way to handle transactions that cannot succeed. The STM API provides a retry action that will immediately terminate an atomically block that cannot proceed. As the name suggests, when this occurs, execution of the block is restarted from scratch, with any previous modifications unperformed. Here is a rewrite of maybeGiveItem to use retry:
-- file: ch28/GameInventory.hs
giveItem :: Item -> Inventory -> Inventory -> STM ()

giveItem item fromInv toInv = do
  fromList <- readTVar fromInv
  case removeInv item fromList of
    Nothing -> retry
    Just newList -> do
      writeTVar fromInv newList
      readTVar toInv >>= writeTVar toInv . (item :)
Our basicTransfer from earlier had a different kind of flaw: it did not check the sender’s balance to see if she had sufficient money to transfer. We can use retry to correct this, while keeping the function’s type the same:
-- file: ch28/GameInventory.hs
transfer :: Gold -> Balance -> Balance -> STM ()

transfer qty fromBal toBal = do
  fromQty <- readTVar fromBal
  when (qty > fromQty) $
    retry
  writeTVar fromBal (fromQty - qty)
  readTVar toBal >>= writeTVar toBal . (qty +)
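The effect of retry is easiest to see with two threads. In this sketch, a withdrawal from an empty account does not fail; the transaction waits and is rerun once another thread has changed the balance it read. The names withdraw and blockedUntilFunded are hypothetical, and plain Int balances stand in for Gold.

```haskell
import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.STM
import Control.Monad (when)

-- Withdraw, retrying (that is, blocking) until the balance suffices.
withdraw :: TVar Int -> Int -> STM ()
withdraw bal qty = do
    current <- readTVar bal
    when (qty > current) retry
    writeTVar bal (current - qty)

-- The withdrawal waits on an empty account until the forked thread
-- deposits enough money; then it completes and we read what is left.
blockedUntilFunded :: IO Int
blockedUntilFunded = do
    bal <- newTVarIO 0
    _ <- forkIO $ do
        threadDelay 10000   -- 10 ms, so the withdrawal usually starts first
        atomically (modifyTVar' bal (+ 10))
    atomically (withdraw bal 7)
    readTVarIO bal
```

Whichever thread runs first, blockedUntilFunded returns 3: the deposit of 10 always commits before the withdrawal of 7 can succeed.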
Now that we are using retry, our item sale function becomes dramatically simpler:
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Choosing Between Alternatives
Inhaltsvorschau
We don’t always want to restart an atomically action if it calls retry or fails due to concurrent modification by another thread. For instance, our new sellItem function will retry indefinitely as long as we are missing either the item or enough money, but we might prefer to just try the sale once.
The orElse combinator lets us perform a "backup" action if the main one fails:
ghci> :type orElse
orElse :: STM a -> STM a -> STM a
If sellItem fails, orElse will invoke the backup action, causing our sale function to return immediately.
Imagine that we’d like to be a little more ambitious and buy the first item from a list that is both in the possession of the seller and affordable to us, but do nothing if we cannot afford anything right now. We could, of course, write code to do this in a direct manner:
-- file: ch28/GameInventory.hs
crummyList :: [(Item, Gold)] -> Player -> Player
             -> STM (Maybe (Item, Gold))
crummyList list buyer seller = go list
    where go []                         = return Nothing
          go (this@(item,price) : rest) = do
              sellItem item price buyer seller
              return (Just this)
           `orElse`
              go rest
This function suffers from the familiar problem of muddling together what we want to do with how we ought to do it. A little inspection suggests that there are two reusable patterns buried in this code.
The first of these is to make a transaction fail immediately instead of retrying:
-- file: ch28/GameInventory.hs
maybeSTM :: STM a -> STM (Maybe a)
maybeSTM m = (Just `liftM` m) `orElse` return Nothing
Second, we want to try an action over successive elements of a list, stopping at the first that succeeds or performing a retry if every one fails. Conveniently for us, STM is an instance of the MonadPlus typeclass:
-- file: ch28/STMPlus.hs
instance MonadPlus STM where
  mzero = retry
  mplus = orElse
The Control.Monad module defines the msum function as follows, which is exactly what we need:
-- file: ch28/STMPlus.hs
msum :: MonadPlus m => [m a] -> m a
msum =  foldr mplus mzero
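The two patterns combine naturally. The following sketch is independent of the game code: shelves of stock stand in for sellers, takeStock retries when a shelf cannot supply the requested quantity, and msum moves on to the next alternative. The names takeStock, attempt, and firstShelf are hypothetical; attempt plays the same role as maybeSTM above.

```haskell
import Control.Concurrent.STM
import Control.Monad (msum, when)

-- Take `qty` from a stock level, retrying if there is not enough.
takeStock :: Int -> TVar Int -> STM Int
takeStock qty stock = do
    n <- readTVar stock
    when (n < qty) retry
    writeTVar stock (n - qty)
    return qty

-- Turn "would retry" into Nothing instead of blocking.
attempt :: STM a -> STM (Maybe a)
attempt m = (Just <$> m) `orElse` return Nothing

-- Try each shelf in order; return the index of the first that could
-- supply the quantity, or Nothing if every alternative would retry.
firstShelf :: Int -> [TVar Int] -> STM (Maybe Int)
firstShelf qty shelves =
    attempt (msum [ takeStock qty s >> return i
                  | (i, s) <- zip [0 ..] shelves ])
```

With shelves holding 1, 5, and 9 units, firstShelf 4 returns Just 1 and leaves the shelves at 1, 1, and 9. Asking for 100 units returns Nothing and changes nothing.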
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
I/O and STM
Inhaltsvorschau
The STM monad forbids us from performing arbitrary I/O actions, because they can break the guarantees of atomicity and isolation that the monad provides. Of course, the need to perform I/O still arises—we just have to treat it very carefully.
Most often, we will need to perform some I/O action as a result of a decision we made inside an atomically block. In these cases, the right thing to do is usually to return a piece of data from atomically, which will tell the caller in the IO monad what to do next. We can even return the action to perform, since actions are first-class values:
-- file: ch28/STMIO.hs
someAction :: IO a

stmTransaction :: STM (IO a)
stmTransaction = return someAction

doSomething :: IO a
doSomething = join (atomically stmTransaction)
We occasionally need to perform an I/O operation from within STM. For instance, reading immutable data from a file that must exist does not violate the STM guarantees of isolation or atomicity. In these cases, we can use unsafeIOToSTM to execute an IO action. This function is exported by the low-level GHC.Conc module, so we must go out of our way to use it:
ghci> :m +GHC.Conc
ghci> :type unsafeIOToSTM
unsafeIOToSTM :: IO a -> STM a
The IO action that we execute must not start another atomically transaction. If a thread tries to nest transactions, the runtime system will throw an exception.
Since the type system can’t help us to ensure that our IO code is doing something sensible, we will be safest if we limit our use of unsafeIOToSTM as much as possible. Here is a typical error that can arise with IO in an atomically block:
-- file: ch28/STMIO.hs
launchTorpedoes :: IO ()

notActuallyAtomic = do
  doStuff
  unsafeIOToSTM launchTorpedoes
  mightRetry
If the mightRetry block causes our transaction to restart, we will call launchTorpedoes more than once. Indeed, we can’t predict how many times it will be called, since the runtime system handles retries for us. The solution is not to perform these kinds of nonidempotent I/O operations inside a transaction.
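The safer pattern of returning an action from the transaction might look like the following sketch. The names spend, report, and runSpend are hypothetical, and putStrLn stands in for a real side effect; the point is that the I/O is only described inside the transaction and runs once, after the transaction has committed.

```haskell
import Control.Concurrent.STM
import Control.Monad (join)

-- Decide inside the transaction, but only describe the I/O to perform.
-- The returned IO action runs once, after the transaction has committed.
spend :: TVar Int -> Int -> STM (IO String)
spend bal cost = do
    current <- readTVar bal
    if current >= cost
        then do
            writeTVar bal (current - cost)
            return (report ("spent " ++ show cost))
        else return (report "insufficient funds")
  where
    -- Stand-in for a real side effect such as logging.
    report msg = putStrLn msg >> return msg

runSpend :: TVar Int -> Int -> IO String
runSpend bal cost = join (atomically (spend bal cost))
```

Even if the runtime restarts spend several times because of contention, the message is printed exactly once per call to runSpend.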
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Communication Between Threads
Inhaltsvorschau
As well as the basic TVar type, the stm package provides two types that are more useful for communicating between threads. A TMVar is the STM equivalent of an MVar: it can either hold a value or be empty. The TChan type is the STM counterpart of Chan, and it implements a typed FIFO channel.
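As a small sketch of both types together, the worker below reads messages from a TChan and hands its final result back through a TMVar. The Msg type and the names summer and sumViaChannel are hypothetical.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.STM
import Control.Monad (forM_)

-- Messages for the worker; Stop plays the role of an end-of-input marker.
data Msg = Add Int | Stop

-- Drain the channel, summing the payloads, then publish the total by
-- filling the (initially empty) TMVar.
summer :: TChan Msg -> TMVar Int -> IO ()
summer chan result = go 0
  where
    go acc = do
        msg <- atomically (readTChan chan)   -- blocks while the channel is empty
        case msg of
            Add n -> go (acc + n)
            Stop  -> atomically (putTMVar result acc)

sumViaChannel :: [Int] -> IO Int
sumViaChannel xs = do
    chan   <- newTChanIO
    result <- newEmptyTMVarIO
    _ <- forkIO (summer chan result)
    forM_ xs (atomically . writeTChan chan . Add)
    atomically (writeTChan chan Stop)
    atomically (takeTMVar result)            -- blocks until the worker is done
```

Because the channel is FIFO, Stop is seen only after every Add, so sumViaChannel [1 .. 100] returns 5050.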
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
A Concurrent Web Link Checker
Inhaltsvorschau
As a practical example of using STM, we will develop a program that checks an HTML file for broken links—that is, URLs that either point to bad web pages or dead servers. This is a good problem to address via concurrency: if we try to talk to a dead server, it will take up to two minutes before our connection attempt times out. If we use multiple threads, we can still get useful work done while one or two are stuck talking to slow or dead servers.
We can’t simply create one thread per URL, because that may overburden either our CPU or our network connection if (as we expect) most of the links are live and responsive. Instead, we use a fixed number of worker threads, which fetch URLs to download from a queue:
-- file: ch28/Check.hs
{-# LANGUAGE FlexibleContexts, GeneralizedNewtypeDeriving,
             PatternGuards #-}

import Control.Concurrent (forkIO)
import Control.Concurrent.STM
import Control.Exception (catch, finally)
import Control.Monad.Error
import Control.Monad.State
import Data.Char (isControl)
import Data.List (nub)
import Network.URI
import Prelude hiding (catch)
import System.Console.GetOpt
import System.Environment (getArgs)
import System.Exit (ExitCode(..), exitWith)
import System.IO (hFlush, hPutStrLn, stderr, stdout)
import Text.Printf (printf)
import qualified Data.ByteString.Lazy.Char8 as B
import qualified Data.Set as S

-- This requires the HTTP package, which is not bundled with GHC
import Network.HTTP

type URL = B.ByteString

data Task = Check URL | Done
Our main function provides the top-level scaffolding for our program:
-- file: ch28/Check.hs
main :: IO ()
main = do
    (files,k) <- parseArgs
    let n = length files

    -- count of broken links
    badCount <- newTVarIO (0 :: Int)

    -- for reporting broken links
    badLinks <- newTChanIO

    -- for sending jobs to workers
    jobs <- newTChanIO

    -- the number of workers currently running
    workers <- newTVarIO k

    -- one thread reports bad links to stdout
    forkIO $ writeBadLinks badLinks

    -- start worker threads
    forkTimes k workers (worker badLinks jobs badCount)

    -- read links from files, and enqueue them as jobs
    stats <- execJob (mapM_ checkURLs files)
                     (JobState S.empty 0 jobs)

    -- enqueue "please finish" messages
    atomically $ replicateM_ k (writeTChan jobs Done)

    waitFor workers

    broken <- atomically $ readTVar badCount

    printf fmt broken
               (linksFound stats)
               (S.size (linksSeen stats))
               n
  where
    fmt   = "Found %d broken links. " ++
            "Checked %d links (%d unique) in %d files.\n"
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Practical Aspects of STM
Inhaltsvorschau
We have so far been quiet about the specific benefits that STM gives us. Most obvious is how well it composes—to add code to a transaction, we just use our usual monadic building blocks, (>>=) and (>>).
The notion of composability is critical to building modular software. If we take two pieces of code that work correctly individually, the composition of the two should also be correct. While normal threaded programming makes composability impossible, STM restores it as a key assumption that we can rely upon.
The STM monad prevents us from accidentally performing nontransactional I/O actions. We don’t need to worry about lock ordering, since our code contains no locks. We can forget about lost wakeups, since we don’t have condition variables. If an exception is thrown, we can either catch it using catchSTM or be bounced out of our transaction, leaving our state untouched. Finally, the retry and orElse functions give us some beautiful ways to structure our code.
Code that uses STM will not deadlock, but it is possible for threads to starve each other to some degree. A long-running transaction can cause another transaction to retry often enough that it will make comparatively little progress. To address a problem such as this, make your transactions as short as you can, while keeping your data consistent.
STM is not a complete panacea. It is useful to compare it with the use of garbage collection for memory management. When we abandon explicit memory management in favor of garbage collection, we give up control in return for safer code. Likewise, with STM, we abandon the low-level details in exchange for code that we can better hope to understand.
Whether with concurrency or memory management, there will be times when we must retain control: some software must make solid guarantees about latency or memory footprint, so we will be forced to spend the extra time and effort managing and debugging explicit code. For many interesting, practical uses of software, garbage collection and STM will do more than well enough.
Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Appendix: Installing GHC and Haskell Libraries
Inhaltsvorschau
The instructions in this appendix are based on our experience installing GHC and other software in late 2008. Installation instructions inevitably become dated quickly; please bear this in mind as you read.

Installing GHC

Installing Haskell Software

Ende der Inhaltsvorschau. Der weiterere Inhalt dieses Abschnitts ist hier nicht einsehbar.
Installing GHC
Inhaltsvorschau
Because GHC runs on a large number of platforms, we focus on a handful of the most popular.
The prebuilt binary packages of GHC should work on Windows Vista and XP (even Windows 2000). We have installed GHC 6.8.3 under Windows XP Service Pack 2; the following paragraphs detail the steps we followed.
On Windows, GHC requires about 400 MB of disk space. The exact amount will vary from release to release.
Our first step is to visit the GHC download page at http://www.haskell.org/ghcdownload.html and follow the link to the current stable release. Scroll down to the section entitled "Binary packages," and then again to the subsection for Windows. Download the installer; in our case, it’s named ghc-6.8.3-i386-windows.exe.
Figure: Screenshot of Firefox, displaying the GHC download page
After the installer has downloaded, double-click it to start the installation process. This involves stepping through a normal Windows installer wizard.
Figure: Screenshot of the GHC installation wizard on Windows
Once the installer has finished, the Start Menu’s "All Programs" submenu should have a GHC folder, inside which you’ll find an icon that you can use to run ghci.
Figure: Screenshot of the Windows XP Start menu, showing the GHC submenu
Clicking the ghci icon brings up a normal Windows console window that is running ghci.
Figure: Screenshot of the ghci interpreter running on Windows
The GHC installer automatically modifies your user account’s PATH variable so that commands such as ghc will be present in the command shell’s search path (i.e., you can type a GHC command name without typing its complete path). This change will take effect the next time you open a command shell.
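One quick way to confirm that the installation works is to load a tiny module into ghci. The following is a minimal sketch rather than part of the original instructions: the name describeToolchain is hypothetical, while compilerName, compilerVersion, os, and arch come from the System.Info module of the base library.

```haskell
import Data.Version (showVersion)
import System.Info (arch, compilerName, compilerVersion, os)

-- Hypothetical helper: a one-line summary of the compiler and platform
-- that is evaluating this code.
describeToolchain :: String
describeToolchain =
  compilerName ++ " " ++ showVersion compilerVersion
    ++ " on " ++ os ++ "/" ++ arch
```

Saved under a file name of your choosing and loaded into ghci, evaluating putStrLn describeToolchain should report the compiler that the new search path resolves to.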
We have installed GHC 6.8.3 under Mac OS X 10.5 (Leopard), on an Intel-based MacBook. Before installing GHC, the Xcode development system must already be installed.
End of preview. The rest of this section is not available here.
Installing Haskell Software
Almost all Haskell libraries are distributed using a standard packaging system named Cabal. You can find hundreds of Haskell open source libraries and programs, all of which use Cabal, at http://hackage.haskell.org/, the home of the Hackage code repository.
A command named cabal automates the job of downloading, building, and installing a Haskell package. It also figures out what dependencies a particular library needs and either makes sure that they are installed already or downloads and builds them first. You can install any Haskell package with a single cabal install mypackage command.
The cabal command is not bundled with GHC, so at least as of GHC version 6.8.3, you will have to download and build it yourself.

Installing cabal

To build the cabal command, download the sources for the following four packages from http://hackage.haskell.org/:
Follow the instructions for building a package by hand, later in this section, to manually build each of these four packages, making sure that you leave cabal-install until last.
After you install the cabal-install package, the $HOME/.cabal/bin directory will contain the cabal command. You can either move it somewhere more convenient or add that directory to your shell’s search path.

Updating cabal’s package list

After installing cabal, and periodically thereafter, you should download a fresh list of packages from Hackage. You can do so as follows:

    cabal update

Installing a library or program

To install some executable or library, just run the following command:

    cabal install -p mypackage

If you download a tarball from Hackage, it will arrive in source form. Unpack the tarball and go into the newly created directory in a command shell. The process to build and install it is simple, consisting of three commands:
  1. Configure for system-wide installation (i.e., available to all users):
End of preview. The rest of this section is not available here.
Appendix: Characters, Strings, and Escaping Rules
This appendix covers the escaping rules used to represent non-ASCII characters in Haskell character and string literals. Haskell’s escaping rules follow the pattern established by the C programming language, but they expand considerably upon them.

Writing Character and String Literals

International Language Support

Escaping Text

End of preview. The rest of this section is not available here.
Writing Character and String Literals
A single character is surrounded by ASCII single quotes, ', and has type Char:
ghci> 'c'
'c'
ghci> :type 'c'
'c' :: Char
A string literal is surrounded by double quotes, ", and has type [Char] (more often written as String):
ghci> "a string literal"
"a string literal"
ghci> :type "a string literal"
"a string literal" :: [Char]
The double-quoted form of a string literal is just syntactic sugar for list notation:
ghci> ['a', ' ', 's', 't', 'r', 'i', 'n', 'g'] == "a string"
True
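Because a string literal is only sugar for a list of characters, ordinary list functions and list patterns apply to it directly. The sketch below illustrates this; the names greeting, greetingAsList, startsWith, and countChar are hypothetical and chosen only for the example.

```haskell
-- A string written with the double-quoted sugar.
greeting :: String
greeting = "a string"

-- The same value spelled out as an explicit list of characters.
greetingAsList :: [Char]
greetingAsList = ['a', ' ', 's', 't', 'r', 'i', 'n', 'g']

-- List patterns work on strings: inspect the first character, if any.
startsWith :: Char -> String -> Bool
startsWith c (x:_) = c == x
startsWith _ []    = False

-- Ordinary list functions work too: count occurrences of a character.
countChar :: Char -> String -> Int
countChar c = length . filter (== c)
```

In ghci, greeting == greetingAsList evaluates to True, and an expression such as reverse greeting needs no string-specific function.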

End of preview. The rest of this section is not available here.
International Language Support
Haskell uses Unicode internally for its Char data type. Since String is just an alias for [Char] (which is a list of Chars), Unicode is also used to represent strings.
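A small sketch can make the Unicode claim concrete. It uses ord and chr from Data.Char to move between a Char and its code point; the names lambda, lambdaCodePoint, largestCodePoint, and mixed are hypothetical. The non-ASCII characters are written with decimal numeric escapes, a standard Haskell feature, so that the file itself stays plain ASCII.

```haskell
import Data.Char (chr, ord)

-- Greek small letter lambda (U+03BB), written as a decimal escape.
lambda :: Char
lambda = '\955'

-- The Unicode code point of the character above.
lambdaCodePoint :: Int
lambdaCodePoint = ord lambda

-- The largest Char corresponds to the last Unicode code point, U+10FFFF.
largestCodePoint :: Int
largestCodePoint = ord (maxBound :: Char)

-- A String is a list of Char, so it can mix scripts freely.
-- chr 233 is the Latin small letter e with acute accent.
mixed :: String
mixed = "caf" ++ [chr 233] ++ " " ++ [lambda]
```

Note that length mixed counts characters, not bytes: each non-ASCII character is a single list element.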
Different Haskell implementations place limitations on the character sets they can accept in source files. GHC allows source files to be written in the UTF-8 encoding of Unicode, so in a source file, you can use UTF-8 literals inside a character or string constant. Do be aware that if you use UTF-8, other Haskell implementations may not be able to parse your source files.
When you run the ghci interpreter interactively, it may not be able to deal with international characters in character or string literals that you enter at the keyboard.
Although Haskell represents characters and strings internally using Unicode, there is no standardized way to do I/O on files that contain Unicode data. Haskell’s standard text I/O functions treat text as a sequence of 8-bit characters, and do not perform any character set conversion.
There are third-party libraries that will convert between the many different encodings used in files and Haskell’s internal Unicode representation.
End of preview. The rest of this section is not available here.
Escaping Text
Some characters must be escaped to be represented inside a character or string literal. For example, a double-quote character inside a string literal must be escaped, or else it will be treated as the end of the string.
Haskell uses essentially the same single-character escapes as the C language and many other popular languages. The escape codes are shown in the following table.
Table: Single-character escape codes
Escape  Unicode  Character
\0      U+0000   Null character
\a      U+0007   Alert
\b      U+0008   Backspace
\f      U+000C   Form feed
\n      U+000A   Newline (linefeed)
\r      U+000D   Carriage return
\t      U+0009   Horizontal tab
\v      U+000B   Vertical tab
\"      U+0022   Double-quote
\&      n/a      Empty string
\'      U+0027   Single quote
\\      U+005C   Backslash
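The escapes in the table can be checked directly. The following sketch, with hypothetical names throughout, maps several escapes to their code points with ord and shows that \& contributes no character at all.

```haskell
import Data.Char (ord)

-- Code points of \0 \a \b \f \n \r \t \v, in that order.
controlCodes :: [Int]
controlCodes = map ord "\0\a\b\f\n\r\t\v"

-- An escaped double quote does not end the string literal.
quoted :: String
quoted = "She said \"hi\""

-- Two characters: a backslash followed by a single quote.
backslashAndQuote :: String
backslashAndQuote = "\\\'"

-- \& stands for the empty string, so this is the same as "ab".
emptyEscape :: String
emptyEscape = "a\&b"
```

Evaluating controlCodes in ghci lists the code points in the same order as the escapes appear in the literal, which can be compared against the Unicode column of the table.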
To write a string literal that spans multiple lines, terminate one line with a backslash and resume the string with another backslash. An arbitrary amount of whitespace (of any kind) can fill the gap between the two backslashes:
"this is a \
	\long string,\
    \ spanning multiple lines"
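To see that the gap between the two backslashes leaves no trace in the resulting value, the multi-line literal can be compared with the same text written on one line. This is a minimal sketch; the names longString and sameOnOneLine are hypothetical.

```haskell
-- A literal split across three source lines using string gaps.
longString :: String
longString = "this is a \
             \long string,\
             \ spanning multiple lines"

-- The same text written without any gaps.
sameOnOneLine :: String
sameOnOneLine = "this is a long string, spanning multiple lines"
```

The newlines and indentation inside each gap are discarded, so longString == sameOnOneLine holds. Any space that should appear in the string must sit outside the gap, as with the space before "spanning".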
Haskell recognizes the escaped use of the standard two- and three-letter abbreviations of ASCII control codes, shown in the following table.
Table: ASCII control code abbreviations
Escape  Unicode  Meaning
\NUL    U+0000   Null character
\SOH    U+0001   Start of heading
\STX    U+0002   Start of text
\ETX    U+0003   End of text
\EOT    U+0004   End of transmission
\ENQ    U+0005   Enquiry
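The rows shown so far can be verified with ord, as in the sketch below (all names are hypothetical). It also illustrates one practical use of the \& escape. It assumes the abbreviation \SO (shift out, U+000E), which belongs to the same family of control codes but lies beyond the rows visible above: because \SOH is read as a single character, \& is needed to write \SO followed by a literal H.

```haskell
import Data.Char (ord)

-- Code points of the six abbreviations listed in the table rows above.
firstSix :: [Int]
firstSix = map ord "\NUL\SOH\STX\ETX\EOT\ENQ"

-- The longest abbreviation wins, so this is one character (U+0001).
startOfHeading :: String
startOfHeading = "\SOH"

-- \& separates the escape from the letter: two characters, U+000E and 'H'.
shiftOutThenH :: String
shiftOutThenH = "\SO\&H"
```

Comparing length startOfHeading with length shiftOutThenH in ghci shows the difference that the \& makes.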
End of preview. The rest of this section is not available here.