Haskell is an advanced purely-functional programming language.
The latest standard of Haskell is Haskell 2010. As of May 2016, a group is working on the next version, Haskell 2020.
The official Haskell documentation is also a comprehensive and useful resource. Great place to find books, courses, tutorials, manuals, guides, etc.
is a numeral without a decimal point
for example 0
, 1
, 42
, ...
is implicitly applied to fromInteger
which is part of the Num
type class so it indeed has type Num a => a
- that is it can have any type that is an instance of Num
is a numeral with a decimal point
for example 0.0
, -0.1111
, ...
is implicitly applied to fromRational
which is part of the Fractional
type class so it indeed has type a => a
- that is it can have any type that is an instance of Fractional
If you add the language extension OverloadedStrings
to GHC you can have the same for String
-literals which then are applied to fromString
from the Data.String.IsString
type class
This is often used to replace String
with Text
or ByteString
.
Lists can defined with the [1, 2, 3]
literal syntax. In GHC 7.8 and beyond, this can also be used to define other list-like structures with the OverloadedLists
extension.
By default, the type of []
is:
> :t []
[] :: [t]
With OverloadedLists
, this becomes:
[] :: GHC.Exts.IsList l => l
If t
is Foldable
it means that for any value t a
we know how to access all of the elements of a
from "inside" of t a
in a fixed linear order. This is the meaning of foldMap :: Monoid m => (a -> m) -> (t a -> m)
: we "visit" each element with a summary function and smash all the summaries together. Monoid
s respect order (but are invariant to different groupings).
Lenses (and other optics) allow us to separate describing how we want to access some data from what we want to do with it. It is important to distinguish between the abstract notion of a lens and the concrete implementation. Understanding abstractly makes programming with lens
much easier in the long run. There are many isomorphic representations of lenses so for this discussion we will avoid
any concrete implementation discussion and instead give a high-level overview of the concepts.
An important concept in understanding abstractly is the notion of focusing. Important optics focus on a specific part of a larger data structure without forgetting about the larger context. For example, the lens _1
focuses on the first
element of a tuple but doesn't forget about what was in the second field.
Once we have focus, we can then talk about which operations we are allowed to perform with a lens. Given a Lens s a
which when given a datatype of type s
focuses on a specific a
, we can either
a
by forgetting about the additional context ora
by providing a new valueThese correspond to the well-known get
and set
operations which are usually used to characterise a lens.
We can talk about other optics in a similar fashion.
Optic | Focuses on... |
---|---|
Lens | One part of a product |
Prism | One part of a sum |
Traversal | Zero or more parts of a data structure |
Isomorphism | ... |
Each optic focuses in a different way, as such, depending on which type of optic we have we can perform different operations.
What's more, we can compose any of the two optics we have so-far discussed in order to specify complex data accesses. The four types of optics we have discussed form a lattice, the result of composing two optics together is their upper bound.
For example, if we compose together a lens and a prism, we get a traversal. The reason for this is that by their (vertical) composition, we first focus on one part of a product and then on one part of a sum. The result being an optic which focuses on precisely zero or one parts of our data which is a special case of a traversal. (This is also sometimes called an affine traversal).
The reason for the popularity in Haskell is that there is a very succinct representation of optics. All optics are just functions of a certain form which can be composed together using function composition. This leads to a very light-weight embedding which makes it easy to integrate optics into your programs. Further to this, due to the particulars of the encoding, function composition also automatically computes the upper bound of two optics we compose. This means that we can reuse the same combinators for different optics without explicit casting.
These language extensions are typically available when using the Glasgow Haskell Compiler (GHC) as they are not part of the approved Haskell 2010 language Report. To use these extensions, one must either inform the compiler using a flag or place a LANGUAGE
programa before the module
keyword in a file. The official documentation can be found in section 7 of the GCH users guide.
The format of the LANGUAGE
programa is {-# LANGUAGE ExtensionOne, ExtensionTwo ... #-}
. That is the literal {-#
followed by LANGUAGE
followed by a comma separated list of extensions, and finally the closing #-}
. Multiple LANGUAGE
programas may be placed in one file.
The following diagram taken from the Typeclassopedia article shows the relationship between the various typeclasses in Haskell.
Let's clear up some misconceptions that beginners might make.
You may have encountered functions such as:
max :: (Ord a) => a -> a -> a
max m n
| m >= n = m
| otherwise = n
Beginners will typically view max :: (Ord a) => a -> a -> a
as function that takes two arguments (values) of type a
and returns a value of type a
. However, what is really happening, is that max
is taking one argument of type a
and returning a function of type a -> a
. This function then takes an argument of type a
and returns a final value of type a
.
Indeed, max
can be written as max :: (Ord a) => a -> (a -> a)
Consider the type signature of max
:
Prelude> :t max
max :: Ord a => a -> a -> a
Prelude> :t (max 75)
(max 75) :: (Num a, Ord a) => a -> a
Prelude> :t (max "Fury Road")
(max "Fury Road") :: [Char] -> [Char]
Prelude> :t (max "Fury Road" "Furiosa")
(max "Fury Road" "Furiosa") :: [Char]
max 75
and max "Fury Road"
may not look like functions, but in actuality, they are.
The confusion stems from the fact that in mathematics and many, other, common programming languages, we are allowed to have functions that take multiple arguments. However, in Haskell, functions can only take one argument and they can return either values such as a
, or functions such as a -> a
.
[a]
is equivalent to [] a
.[]
constructs the empty list.[]
in a function definition LHS, e.g. f [] = ...
, is the empty list pattern.x:xs
constructs a list where an element x
is prepended to the list xs
f (x:xs) = ...
is a pattern match for a non-empty list where x
is the head and xs
is the tail.f (a:b:cs) = ...
and f (a:(b:cs)) = ...
are the same. They are a pattern match for a list of at least two elements where the first element is a
, the second element is b
, and the rest of the list is cs
.f ((a:as):bs) = ...
is NOT the same as f (a:(as:bs)) = ...
. The former is a pattern match for a non-empty list of lists, where a
is the head of the head, as
is the tail of the head, and bs
is the tail.f (x:[]) = ...
and f [x] = ...
are the same. They are a pattern match for a list of exactly one element.f (a:b:[]) = ...
and f [a,b] = ...
are the same. They are a pattern match for a list of exactly two elements.f [a:b] = ...
is a pattern match for a list of exactly one element where the element is also a list. a
is the head of the element and b
is the tail of the element.[a,b,c]
is the same as (a:b:c:[])
. Standard list notation is just syntactic sugar for the (:)
and []
constructors.all@(x:y:ys)
in order to refer to the whole list as all
(or any other name you choose) instead of repeating (x:y:ys)
again.Functions mentioned here in examples are defined with varying degrees of abstraction in several packages, for example, data-fix
and recursion-schemes
(more functions here). You can view a more complete list by searching on Hayoo.
Text
is a more efficient alternative to Haskell's standard String
type. String
is defined as a linked list of characters in the standard Prelude, per the Haskell Report:
type String = [Char]
Text
is represented as a packed array of Unicode characters. This is similar to how most other high-level languages represent strings, and gives much better time and space efficiency than the list version.
Text
should be preferred over String
for all production usage. A notable exception is depending on a library which has a String
API, but even in that case there may be a benefit of using Text
internally and converting to a String
just before interfacing with the library.
All of the examples in this topic use the OverloadedStrings
language extension.
GHCI is the interactive REPL that comes bundled with GHC.
A Functor can be thought of as a container for some value, or a computation context. Examples are Maybe a
or [a]
. The Typeclassopedia article has a good write-up of the concepts behind Functors.
To be considered a real Functor, an instance has to respect the 2 following laws:
fmap id == id
fmap (f . g) = (fmap f) . (fmap g)
Good resources for learning about concurrent and parallel programming in Haskell are:
Function composition operator (.)
is defined as
(.) :: (b -> c) -> (a -> b) -> (a -> c)
(.) f g x = f (g x) -- or, equivalently,
(.) f g = \x -> f (g x)
(.) f = \g -> \x -> f (g x)
(.) = \f -> \g -> \x -> f (g x)
(.) = \f -> (\g -> (\x -> f (g x) ) )
The type (b -> c) -> (a -> b) -> (a -> c)
can be written as (b -> c) -> (a -> b) -> a -> c
because the ->
in type signatures "associates" to the right, corresponding to the function application associating to the left,
f g x y z ... == (((f g) x) y) z ...
So the "dataflow" is from the right to the left: x
"goes" into g
, whose result goes into f
, producing the final result:
(.) f g x = r
where r = f (g x)
-- g :: a -> b
-- f :: b -> c
-- x :: a
-- r :: c
(.) f g = q
where q = \x -> f (g x)
-- g :: a -> b
-- f :: b -> c
-- q :: a -> c
....
Syntactically, the following are all the same:
(.) f g x = (f . g) x = (f .) g x = (. g) f x
which is easy to grasp as the "three rules of operator sections", where the "missing argument" just goes into the empty slot near the operator:
(.) f g = (f . g) = (f .) g = (. g) f
-- 1 2 3
The x
, being present on both sides of the equation, can be omitted. This is known as eta-contraction. Thus, the simple way to write down the definition for function composition is just
(f . g) x = f (g x)
This of course refers to the "argument" x
; whenever we write just (f . g)
without the x
it is known as point-free style.
Higher Order Functions are functions that take functions as parameters and/or return functions as their return values.
To parse expressions involving operators and functions, Haskell uses fixity declarations to figure out where parenthesis go. In order, it
Notice that we assume here that the operators in any given group from step 2 must all have the same associativity. In fact, Haskell will reject any program where this condition is not met.
As an example of the above algorithm, we can step though the process of adding parenthesis to 1 + negate 5 * 2 - 3 * 4 ^ 2 ^ 1
.
infixl 6 +
infixl 6 -
infixl 7 *
infixr 8 ^
1 + (negate 5) * 2 - 3 * 4 ^ 2 ^ 1
1 + ((negate 5) * 2) - (3 * (4 ^ 2 ^ 1))
(1 + ((negate 5) * 2)) - (3 * (4 ^ (2 ^ 1)))
More details in section 4.4.2 of the Haskell 98 report.
It [Data.Vector] has an emphasis on very high performance through loop fusion, whilst retaining a rich interface. The main data types are boxed and unboxed arrays, and arrays may be immutable (pure), or mutable. Arrays may hold Storable elements, suitable for passing to and from C, and you can convert between the array types. Arrays are indexed by non-negative Int values.
The Haskell Wiki has these recommendations:
In general:
- End users should use Data.Vector.Unboxed for most cases
- If you need to store more complex structures, use Data.Vector
- If you need to pass to C, use Data.Vector.Storable
For library writers;
- Use the generic interface, to ensure your library is maximally flexible: Data.Vector.Generic
One of the strengths of Haskell is the ability to leverage the type system to model parts of your problem domain in the type system. In doing so, one often encounters very complex types. When writing programs with these types (i.e. with values having these types) it occasionally becomes nearly unmangeable to 'juggle' all of the types. As of GHC 7.8, there is a new syntactic feature called typed holes. Typed holes do not change the semantics of the core language; they are intended purely as an aid for writing programs.
For an in-depth explanation of typed holes, as well as a discussion of the design of typed holes, see the Haskell wiki.
Section of the GHC user guide on typed holes.
The Data.Time
module from time
package provides support for retrieving & manipulating date & time values:
Template Haskell refers to the template meta-programming facilities built into GHC Haskell. The paper describing the original implementation can be found here.
Stages refer to when code is executed. Normally, code is exectued only at runtime, but with Template Haskell, code can be executed at compile time. "Normal" code is stage 0 and compile-time code is stage 1.
The stage restriction refers to the fact that a stage 0 program may not be executed at stage 1 - this would be equivalent to being able to run any regular program (not just meta-program) at compile time.
By convention (and for the sake of implementation simplicity), code in the current module is always stage 0 and code imported from all other modules is stage 1. For this reason, only expressions from other modules may be spliced.
Note that a stage 1 program is a stage 0 expression of type Q Exp
, Q Type
, etc; but the converse is not true - not every value (stage 0 program) of type Q Exp
is a stage 1 program,
Futhermore, since splices can be nested, identifiers can have stages greater than 1. The stage restriction can then be generalized - a stage n program may not be executed in any stage m>n. For example, one can see references to such stages greater than 1 in certain error messages:
>:t [| \x -> $x |]
<interactive>:1:10: error:
* Stage error: `x' is bound at stage 2 but used at stage 1
* In the untyped splice: $x
In the Template Haskell quotation [| \ x -> $x |]
Normally, all the declarations in a single Haskell module can be thought of as all being mutually recursive. In other words, every top-level declaration is in the scope of every other in a single module. When Template Haskell is enabled, the scoping rules change - the module is instead broken into groups of code seperated by TH splices, and each group is mutually recursive, and each group in the scope of all further groups.
Haskell has support for modules:
a module can export all, or a subset of its member types & functions
a module can "re-export" names it imported from other modules
On the consumer end of a module, one can:
import all, or a subset of module members
hide imports of a particular member or set of members
haskell.org has a great chapter on module definition.
Haskell does not support tuples with one component natively.
Units (written ()
) can be understood as tuples with zero components.
There are no predefined functions to extract components of tuples with more than two components. If you feel that you need such functions, consider using a custom data type with record labels instead of the tuple type. Then you can use the record labels as functions to extract the components.
Newcomers to Haskell often shy away from the State
monad and treat it like a taboo—like the claimed benefit of functional programming is the avoidance of state, so don't you lose that when you use State
? A more nuanced view is that:
State
type provides the ability to control the dose very precisely.The reasons being that if you have action :: State s a
, this tells you that:
action
is special because it depends on a state;s
, so action
cannot be influenced by any old value in your program—only an s
or some value reachable from some s
;runState :: State s a -> s -> (a, s)
puts a "barrier" around the stateful action, so that its effectfulness cannot be observed from outside that barrier.So this is a good set of criteria for whether to use State
in particular scenario. You want to see that your code is minimizing the scope of the state, both by choosing a narrow type for s
and by putting runState
as close to "the bottom" as possible, (so that your actions can be influenced by as few thing as possible.
As the hackage page describes:
pipes is a clean and powerful stream processing library that lets you build and connect reusable streaming components
Programs implemented through streaming can often be succinct and composable, with simple, short functions allowing you to "slot in or out" features easily with the backing of the Haskell type system.
await :: Monad m => Consumer' a m a
Pulls a value from upstream, where a
is our input type.
yield :: Monad m => a -> Producer' a m ()
Produce a value, where a
is the output type.
It's highly recommended you read through the embedded Pipes.Tutorial
package which gives an excellent overview of the core concepts of Pipes and how Producer
, Consumer
and Effect
interact.
Most Haskell functions are called with the function name followed by arguments (prefix notation). For functions that accept two arguments like (+), it sometimes makes sense to provide an argument before and after the function (infix).
Simon Marlow's book, Concurrent and Parallel Programming in Haskell, is outstanding and covers a multitude of concepts. It is also very much accessible for even the newest Haskell programmer. It is highly recommended and available in PDF or online for free.
Parallel vs Concurrent
Simon Marlow puts it best:
A parallel program is one that uses a multiplicity of computational hardware (e.g., several processor cores) to perform a computation more quickly. The aim is to arrive at the answer earlier, by delegating different parts of the computation to different processors that execute at the same time.
By contrast, concurrency is a program-structuring technique in which there are multiple threads of control. Conceptually, the threads of control execute “at the same time”; that is, the user sees their effects interleaved. Whether they actually execute at the same time or not is an implementation detail; a concurrent program can execute on a single processor through interleaved execution or on multiple physical processors.
Weak Head Normal Form
It's important to be aware of how lazy-evaluation works. The first section of this chapter will give a strong introduction into WHNF and how this relates to parallel and concurrent programming.
While cabal has support for including a C and C++ libraries in a Haskell package, there are a few bugs. First, if you have data (rather than a function) defined in b.o
that is used in a.o
, and list the C-sources: a.c, b.c
, then cabal will be unable to find the data. This is documented in #12152. A workaround when using cabal is to reorder the C-sources
list to be C-sources: b.c, a.c
. This may not work when using stack, because stack always links the C-sources
alphabetically, regardless of the order in which you list them.
Another issues is that you must surround any C++ code in header (.h) files with #ifdef __cplusplus
guards. This is because GHC doesn't understand C++ code in header files. You can still write C++ code in header files, but you must surround it with guards.
ccall
refers to the calling convention; currently ccall
and stdcall
(Pascal convention) are supported. The unsafe
keyword is optional; this reduces overhead for simple functions but may cause deadlocks if the foreign function blocks indefinitely or has insufficient permission to execute1.
On many Linux distributions, the Haskell Gtk3 library is available as a package in the systems package manager (e.g. libghc-gtk
in Ubuntu's APT). However, for some developers it might be preferable to use a tool like stack
to manage isolated environments, and have Gtk3 installed via cabal
instead of via an global installation by the systems package manager. For this option, gtk2hs-buildtools
is required. Run cabal install gtk2hs-buildtools
before adding gtk
, gtk3
or any other Gtk-based Haskell libraries to your projects build-depends
entry in your cabal file.
A run of the mill Functor
is covariant in a single type parameter. For instance, if f
is a Functor
, then given an f a
, and a function of the form a -> b
, one can obtain an f b
(through the use of fmap
).
A Bifunctor
is covariant in two type parameters. If f
is a Bifunctor
, then given an f a b
, and two functions, one from a -> c
, and another from b -> d
, then one can obtain an f c d
(using bimap
).
first
should be thought of as an fmap
over the first type parameter, second
as an fmap
over the second, and bimap
should be conceived as mapping two functions covariantly over the first and second type parameters, respectively.
class Functor f => Applicative f where
pure :: a -> f a
(<*>) :: f (a -> b) -> f a -> f b
Note the Functor
constraint on f
. The pure
function returns its argument embedded in the Applicative
structure. The infix function <*>
(pronounced "apply") is very similar to fmap
except with the function embedded in the Applicative
structure.
A correct instance of Applicative
should satisfy the applicative laws, though these are not enforced by the compiler:
pure id <*> a = a -- identity
pure (.) <*> a <*> b <*> c = a <*> (b <*> c) -- composition
pure f <*> pure a = pure (f a) -- homomorphism
a <*> pure b = pure ($ b) <*> a -- interchange
Num
sits at the root of the numeric typeclass hierarchy. Its characteristic operations and some common instances are shown below (the ones loaded by default with Prelude plus those of Data.Complex
):
λ> :i Num
class Num a where
(+) :: a -> a -> a
(-) :: a -> a -> a
(*) :: a -> a -> a
negate :: a -> a
abs :: a -> a
signum :: a -> a
fromInteger :: Integer -> a
{-# MINIMAL (+), (*), abs, signum, fromInteger, (negate | (-)) #-}
-- Defined in ‘GHC.Num’
instance RealFloat a => Num (Complex a) -- Defined in ‘Data.Complex’
instance Num Word -- Defined in ‘GHC.Num’
instance Num Integer -- Defined in ‘GHC.Num’
instance Num Int -- Defined in ‘GHC.Num’
instance Num Float -- Defined in ‘GHC.Float’
instance Num Double -- Defined in ‘GHC.Float’
We have already seen the Fractional
class, which requires Num
and introduces the notions of "division" (/)
and reciprocal of a number:
λ> :i Fractional
class Num a => Fractional a where
(/) :: a -> a -> a
recip :: a -> a
fromRational :: Rational -> a
{-# MINIMAL fromRational, (recip | (/)) #-}
-- Defined in ‘GHC.Real’
instance RealFloat a => Fractional (Complex a) -- Defined in ‘Data.Complex’
instance Fractional Float -- Defined in ‘GHC.Float’
instance Fractional Double -- Defined in ‘GHC.Float’
The Real
class models .. the real numbers. It requires Num
and Ord
, therefore it models an ordered numerical field. As a counterexample, Complex numbers are not an ordered field (i.e. they do not possess a natural ordering relationship):
λ> :i Real
class (Num a, Ord a) => Real a where
toRational :: a -> Rational
{-# MINIMAL toRational #-}
-- Defined in ‘GHC.Real’
instance Real Word -- Defined in ‘GHC.Real’
instance Real Integer -- Defined in ‘GHC.Real’
instance Real Int -- Defined in ‘GHC.Real’
instance Real Float -- Defined in ‘GHC.Float’
instance Real Double -- Defined in ‘GHC.Float’
RealFrac
represents numbers that may be rounded
λ> :i RealFrac
class (Real a, Fractional a) => RealFrac a where
properFraction :: Integral b => a -> (b, a)
truncate :: Integral b => a -> b
round :: Integral b => a -> b
ceiling :: Integral b => a -> b
floor :: Integral b => a -> b
{-# MINIMAL properFraction #-}
-- Defined in ‘GHC.Real’
instance RealFrac Float -- Defined in ‘GHC.Float’
instance RealFrac Double -- Defined in ‘GHC.Float’
Floating
(which implies Fractional
) represents constants and operations that may not have a finite decimal expansion.
λ> :i Floating
class Fractional a => Floating a where
pi :: a
exp :: a -> a
log :: a -> a
sqrt :: a -> a
(**) :: a -> a -> a
logBase :: a -> a -> a
sin :: a -> a
cos :: a -> a
tan :: a -> a
asin :: a -> a
acos :: a -> a
atan :: a -> a
sinh :: a -> a
cosh :: a -> a
tanh :: a -> a
asinh :: a -> a
acosh :: a -> a
atanh :: a -> a
GHC.Float.log1p :: a -> a
GHC.Float.expm1 :: a -> a
GHC.Float.log1pexp :: a -> a
GHC.Float.log1mexp :: a -> a
{-# MINIMAL pi, exp, log, sin, cos, asin, acos, atan, sinh, cosh,
asinh, acosh, atanh #-}
-- Defined in ‘GHC.Float’
instance RealFloat a => Floating (Complex a) -- Defined in ‘Data.Complex’
instance Floating Float -- Defined in ‘GHC.Float’
instance Floating Double -- Defined in ‘GHC.Float’
Caution: while expressions such as sqrt . negate :: Floating a => a -> a
are perfectly valid, they might return NaN
("not-a-number"), which may not be an intended behaviour. In such cases, we might want to work over the Complex field (shown later).
See also SafeNewtypeDeriving
.
In general, the rule for converting a C-style function call to Haskell, in any context (assignment, return, or embedded in another call), is to replace the commas in the C-style argument list with whitespace, and move the opening parenthesis from the C-style call to contain the function name and its parameters.
If any expressions are wrapped entirely in parentheses, these (external) pairs of parentheses can be removed for readability, as they do not affect the meaning of the expression.
There are some other circumstances where parentheses can be removed, but this only affects readability and maintainability.
Profunctors are, as described by the docs on Hackage, "a bifunctor where the first argument is contravariant and the second argument is covariant."
So what does this mean? Well, a bifunctor is like a normal functor, except that it has two parameters instead of one, each with its own fmap
-like function to map on it.
Being "covariant" means that the second argument to a profunctor is just like a normal functor: its mapping function (rmap
) has a type signature of Profunctor p => (b -> c) -> p a b -> p a c
. It just maps the function on the second argument.
Being "contravariant" makes the first argument a little weirder. Instead of mapping like a normal functor, its mapping function (lmap
) has a type signature of Profunctor p => (a -> b) -> p b c -> p a c
. This seemingly backward mapping makes most sense for inputs to a function: you would run a -> b
on the input, and then your other function, leaving the new input as a
.
Note: The naming for the normal, one argument functors is a little misleading: the Functor
typeclass implements "covariant" functors, while "contravariant" functors are implemented in the Contravariant
typeclass in Data.Functor.Contravariant
, and previously the (misleadingly named) Cofunctor
typeclass in Data.Cofunctor
.