Continuing with the todo.txt haskell project, this post will introduce parsers.
Previous project post: [Unit Testing with Hspec][1].
This post is going to be a little more "theory" and explanation but I know I always like trying things out as I read so lets get the project updated to allow you to follow along.
The following changes will be needed to import Parsec into both the library and the unit test code:
library hs-source-dirs: src exposed-modules: Tasks , Util , Parser build-depends: base >= 4.7 && < 5 , text , parsec default-language: Haskell2010 ... test-suite todo-test type: exitcode-stdio-1.0 hs-source-dirs: test main-is: Spec.hs build-depends: base , todo , text , hspec , parsec ghc-options: -Wall default-language: Haskell2010
Next time we call stack build
, stack test
, or stack ghci
, the Parsec libraries will be downloaded, configured and imported into our project.
Here is the final list of imports when the code is complete. We will go through them all as we introduce them. But to ease the walk through process we might as well get rid of any of the compile errors that come from missing imports.
module Parser where import Control.Applicative (many) import Data.Char (toUpper) import Data.Text (pack) import Text.Parsec.Char (char, oneOf, letter, alphaNum, digit, anyChar, noneOf, endOfLine) import Text.Parsec.Combinator (many1, optionMaybe, choice, option, eof, optional, sepBy) import Text.Parsec.Error as E import Text.Parsec.Prim (parse, try, (<|>), parseTest, unexpected) import Text.Parsec.Text import qualified Tasks as Tasks
If you want links to the documentation you can check out the References.
There is a common data type pattern you'll see in much of the Parsec library You can just view it as a "Parser Response", ParsecT s u m a
. Some of the unnamed fields will be replaced with specifics, but you'll definitely see something close to this throughout the documentation.
The code we will be writing takes a stream of data and tokenizes it into haskell data structures, atoms. We will create a set of stream to data type parsers and combine them until eventually we can create a Task
.
Looking over the [Parsec Documentation][2] you'll see the common structure ParsecT s u m a
which breaks down pretty straight forward.
s
is the stream typeu
is the user state typem
is the underlying monada
is the return type.Given a stream (s
) and executing in the current state (u
), the parser performs a set of operations to extract data that creates type a
. All of this occurs within a monad (m
).
Understanding this isn't really required to create a parser. Just knowing that ParsecT s u m a
is the common data type in all parsers should be good enough as you will start to notice it in some form or another (typically replacing a letter for an actual type.
Often instead of using the transformer (denoted by the 'T') you'll see the following type used:
type Parsec s u = ParsecT s u Identity
Using the Identity
monad ([docs][3]) we get back exactly what type we pass in. And by using currying we can add the a
return type to the end.
The type we will most commonly be using is ([doc][4]):
type Parser = Parsec String ()
Our stream type is String
and we aren't using any User State. So a type of Parser Int
means String
in and Int
out. We can make all sorts of parsers with this type signature:
parseInt :: Parser Int -- String in, Int out parseChar :: Parser Char -- String in, Char out parsePriority :: Parser Priority -- String in, Priority out
Now that we know how the construct a parser by creating smaller atom parsers, and we know how the Parser
data type works, let us create our first atom and walk through the parts.
We will start with a very simple example: Priority. Looking at [Rule 1][5] of the todo.txt documentation,
The priority is an uppercase character from A-Z enclosed in parentheses and
followed by a space.
Here is the code:
-- |Priority: (A) priority :: Parser Tasks.Priority priority = do char '(' p <- letter char ')' return $ toUpper p
We will do the TDD work in a later post. But for now lets de-construct the parser for Priorities.
priority :: Parser Tasks.Priority
The type signature for the parser function is pretty straight forward. We are creating a parser (Parser a
) with a return type of Tasks.Priority
.
priority = do
char '('
This line reads in a specific character ([doc][6]). If it is found the value is returned (and we ignore it by not assigning it a name). If the next character in the stream doesn't match, the parser fails.
p <- letter
Similar to the previous line, this one reads a specific character. [letter][7] parse any "upper case or lower case character" and returns the parsed character. We assign that value to `p`
char ')'
Again, parsing and ignoring a parenthesis.
return $ toUpper p
text/gemini
This content has been proxied by September (3851b).