Literate Haskell ---------------- This program is written in Literate Haskell, thus the .lhs file extension. It's "inside out": Comments are the default; code lines are indicated with > on the first column. Importing things ---------------- > import System > import Monad > import System.IO > import List Completely functional code -------------------------- This section contains code which is completely functional, i.e. composed of functions which are well-defined in the mathematical sense. > isIn :: (Eq a) => [a] -> [a] -> Bool 'foo :: bar' means, "Hey Haskell, I'm telling you explicitly that foo is of type bar." 'Eq a =>' means that the type a must be comparable: if you have x and y, both of type a, 'x == y' must mean something. (As x and y are data variables, a is a type variable.) Given that constraint, '[a]' means a list of comparable things. Lists in Haskell are composed of items which are all of the same type. So, '[a] -> [a] -> Bool' means that isIn is a function that takes two lists of the same type and returns a boolean value. > [] `isIn` [] = True > _ `isIn` [] = False > [] `isIn` _ = True Two things to note: (1) a `isIn` b is equivalent to isIn a b; that is, the backquotes are syntactic sugar that lets us treat the function like an operator. (2) What follows is pattern matching: for any given call of isIn, the parameters will first be checked to see if they are both empty lists. If so, True will be returned, and none of the other definitions will be consulted. Failing that, if the second list is empty, False will be returned: _ means "This needs to be provided, but I don't care what it is and I won't be using it." If the first list is empty, True will be returned. > needle `isIn` haystack = > if needle `isPrefixOf` haystack > then True > else needle `isIn` (tail haystack) So all the trivial cases are out of the way; the nontrivial one is if neither list is empty. If each item of needle is equal to the corresponding item of haystack starting at the beginning of haystack (that's what isPrefixOf means), needle is in haystack. Otherwise, look in the tail of haystack, which is everything but the first item. If needle is not in haystack, at some point we will recurse to the point where the haystack in question is shorter than the needle; at this point needle `isPrefixof` haystack will certainly be false. > -- Returns the haystacks which contain needles. > detect needle haystacks = filter (needle `isIn`) haystacks The filter function takes a function and a list. It applies the function to each item of the list, and returns a list of the items for which the function returned True. needle `isIn` is a partial application of the isIn function. This is the reason for the strange syntax of isIn's type [a] -> [a] -> Bool. If you call isIn with only one list, then instead of returning a value of type Bool, it will return a value of type [a] -> Bool: that is, it will return a function of one argument which returns a Bool. This is called "currying" after Haskell Curry, a mathematician and pioneer in this sort of stuff. > -- Prepends "[name]:" to each of a list of strings, if name is anything > annotate :: Maybe String -> [String] -> [String] > annotate Nothing theLines = theLines > annotate (Just name) theLines = map (annotation ++) theLines > where annotation = name ++ ":" A Maybe String is a constructed type. Its data is constructed data, and there are two constructors: Nothing and Just x, where x is a String. We employ pattern matching with these constructors in order to define annotate over the entire domain of Maybe String. annotate prepends "[name]:" to each string in theLines, if there is a name. (++ is the list concatenation operator; also the string concatenation operator because a String is a list of Chars.) Monadic things -------------- We're in a purely functional language, where if you call the same function twice with the same parameters you will always get the same result. But how do you do I/O in such a language? Imagine that you are a purely intellectual being. You want to deal with matter, but you don't want to get your hands dirty. But you have machines that deal with matter, and you can plug them together to do more complicated things of your desiring. Then your servant in the world of mere matter can push a button and things will get done. That's how I/O happens in Haskell. Due to the nature of I/O, functions dealing with it cannot be strictly well-defined. But if we had some abstract idea of an operation, functions could return those in a well-defined fashion, and if we could compose operations to come up with more complex operations, we could construct abstractly and nicely a machine to do our work. Then the runtime hooks it up to the I/O, and the results are not well-defined, but the operations are. A monad is a mathematical construct from category theory. It's rather a slippery concept but the best first impression is that monads deal with operations. A monad defines, at minimum, a way to turn a function into an operation ("make a machine") and a way to compose two operations into a larger one. We'll call the first one 'return', and the second one '>>=', and we'll assume that the way you compose the operations is mostly by hooking the output of one to the input of the next, like a conveyor belt, or a Unix command-line pipeline. Here comes the code. > -- Using the bind operator >>= > grep :: String -> Handle -> Maybe String -> IO String > grep pattern handle name = grep takes a pattern, which is a String; a file handle; and maybe a filename; and it returns not a string, but an operation that returns a string. IO is a type constructor; IO String is a constructed type. What follows is a pipeline of operations. > hGetContents handle >>= hGetContents is an operation that takes a file handle and returns the entire contents of the file. Just like Unix pipes, if something farther down the pipeline doesn't use all of the output of hGetContents, the whole file will not be read. This is because Haskell has "lazy evaluation." > return . lines >>= lines is a function that takes one String and returns a list of Strings, containing each line in the input String, as split by newlines. return is a function that takes a function and returns an operation suitable for putting in this pipeline thing we're building. And the dot operator is function composition: (f.g) x == f (g x). > return . (detect pattern) >>= detect, from above, will filter out the lines from the previous step that match the pattern. Since it's a function, not an operation, we have to apply return to it. > return . (annotate name) >>= If there's a filename, prepend it to each matching line. > (return . unlines) Join all the annotated matching lines into a single string using newlines. The output of this last stage of the pipeline is a String. The result of the grep function is a complex operation that returns a String. > -- Using the do notation. More natural because you can name things. > main = do Do-notation is an alternative to this >>= business that lets you name more things and write a bit more naturally. Still a monad thing, and the result is still one of these operation thingies. The Haskell runtime will take the operation that is the result of main and make it actually happen. > (pattern:filenames) <- getArgs getArgs :: IO String. It returns an operation that gets the command-line arguments. We call the first of these pattern, and the rest filenames (: is the list construction operator; [1,2,3] is syntactic sugar for (1:(2:(3:()))). > if (length filenames) == 0 > then do output <- grep pattern stdin Nothing > putStr output The if/then/else thing is functional: it's not part of the do-construct (see isIn above). It chooses which operation will happen, so the things in the then and the else have to be operations. The do-construct returns an operation. Thus the extra do's. Now, if we have no filenames, we grep on standard input, and standard input has no filename. (This is why grep takes a Maybe String for the filename.) putStr :: String -> IO (). It returns an operation which passes nothing on to the next operation (and prints the String to stdout). > else do > handles <- mapM (\x -> openFile x ReadMode) filenames First you should know that the map function takes a function and a list, applies the function to each element of the list, and returns the list of the return values of the function. Next you should know that mapM does the same thing except it knows it's messing with these operation things instead of functions. (BTW you will not find many others talking about "operations." If you want to truly understand monads you have to get deeper; don't worry, there are a dozen great tutorials at all levels on the web. Google 'monad tutorial.') The result which we shall call 'handles' is a list of operations that open each file whose name was given on the command line. To go on with the conveyor belt analogy, we have just split the conveyor belt into n belts, where n = how many filenames we have. > allMatches <- return $ map > (\(h,n) -> grep pattern h (Just n)) > (zip handles filenames) zip [1,2,3] [4,5,6] == [(1,4), (2,5), (3,6)]. Things in parentheses with commas are tuples; each member of a tuple can be a different type. This zip returns a list of tuples, each containing a handle and its corresponding filename. The function turns these (which we'll call h and n) into grep operations searching through each file. allMatches is a list of n conveyor belts now with two stages on each. > matchesInSequence <- sequence allMatches The sequence function takes a bunch of operations, each returning one value, and returns one operation which returns a list of values. > mapM_ putStr matchesInSequence And given that list of values, mapM_ produces a list of operations that print out each value; the _ on the end means that this time, we don't care about the return values (recall that putStr returns nothing anyway). Conclusion ---------- These comments explain the behavior of the code, but not the theory behind it. It's like the difference between telling youngsters how to add on paper with carry digits and teaching the number theory that explains why you can carry. You can do more arithmetic if you know how to add, but you can't go making your own number system until you know a bit more about number theory.