Teenage Haskell

I’ve been inspired by the efforts of others (Chris Smith, Manuel Chakravarty) to try teaching children haskell as a first experience of programming. Haskell has a reputation of being a "hard" language, but I suspect this stems from the challenges faced by software developers transitioning from an imperative programming paradigm to a functional one. There’s anecdotal evidence that, for first steps into programming, a functional programming language may be easier for many students, and allow a class to focus more quickly on interesting aspects of programming.

With any group of beginners, and especially children, simple tooling is really important. Being able to run examples within minutes of turning on the computer matters. But running even the simplest of traditional toolchains requires at least a rudimentary understanding of:

  • a text editor
  • the file system
  • a command line
  • an interpreter/compiler

And there are platform issues here also – even when the language is platform independent, the other items will vary. It would be very easy to get bogged down in all this well before actually writing a program that does something interesting…

Hence I was excited several weeks ago when Chris announced the reimplementation of his codeworld environment. In a nutshell, it’s a web site where:

  1. you edit haskell code in your browser
  2. it gets compiled to javascript on the remote server using ghcjs
  3. the javascript runs back in the browser

and it comes with a beginner-friendly prelude focussed on creating pictures, animations, and simple games (no monads required!).

This was just in time for the school holidays here in Sydney – my own children were to be my "guinea pig" students. Nick (aged 14) is in year 9 at school, whereas Sam (aged 12) is in year 7. At school they have covered simple algebra, number planes, and other math ripe to be used for something more fun than drill exercises! They have a younger brother, Henry (aged 10), who has been observing with interest.

Our goal is to learn to draw pictures, then move on to animations, and, further down the track (if we get there), write some games. After a couple of two-hour sessions, it has gone remarkably well.

So what have we done? Here’s a short outline of our two sessions so far:

Session 1 (2.5 hours):

We discussed the nature of computers, programming languages, compilers.

We launched the codeworld environment, and played with the demos. We tried changing them, mostly by adjusting various constants, and found they broke in often entertaining ways.

We typed in a trivial 2 line program to draw a circle, and made it work. We observed how problems were reported in the log window.

We talked about what a function is, and looked at a few of the builtin functions:

solidCircle :: Number -> Picture
color :: Color -> Picture -> Picture
(&) :: Picture -> Picture -> Picture

… and looked at how they can be composed using haskell syntax.
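For instance, a composition using just these three functions might look like the following sketch (the function and colour names come from the CodeWorld prelude; exactly how the picture is handed to the runtime is omitted here, and details may differ in the actual environment):

```haskell
-- a small red circle drawn on top of a larger green one
picture = color red (solidCircle 2) & color green (solidCircle 4)
```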

Then we played!

After this, we introduced some extra functions:

solidRectangle :: Number -> Number -> Picture
translate :: Number -> Number -> Picture -> Picture
rotate :: Number -> Picture -> Picture
scale :: Number -> Number -> Picture -> Picture

which let us draw much more interesting stuff. The rest of this session was spent seeing what cool stuff we could draw with these 7 functions.

Nick programmed some abstract art:

Sam coded up a sheep:

That ended the session, though the boys found some unsupervised time on the computer the next day, when Nick built a castle:

and Sam did some virtual surfing:

Session 2 (2 hours):

In the second session, we started by talking about organising code for clarity and reuse.

The transformation functions introduced in the previous session caused some confusion when used in combination. We talked about how each primitive worked, and how they combined – the difference between rotating and then translating versus translating and then rotating was investigated.

The boys were keen to move on to animations. I thought we’d leave these for a few sessions, but their enthusiasm overruled. This required that we look at how to write our own functions for the first time. (In codeworld an animation is a function from time to a picture.) This is quite a big step, as we also needed to get at least a basic idea of scoping.

Nevertheless we battled on, and got some movement on the screen. It was soon discovered that rotations are the most interesting transform to animate, as you don’t lose your picture elements off the screen as time goes to infinity!

Nick and Sam needed more assistance here, but still managed to get some ideas working. I’ve only got single frames of their results. Sam produced his space race:

and Nick made a working clock (which tells the right time if you push the run button at 12 o’clock!):

In the next session we are going to have to look at numerical functions in a bit more detail in order to produce more types of animations. Time for some graph paper perhaps…

Summary

For a beta (alpha?) piece of software, relying on some fairly advanced and new technology, Codeworld works remarkably well. And Chris has plans for it – there’s a long list of proposed enhancements in the github issue tracker, and a mailing list has just been created.

Right now the main issue is documentation. It works well with an already haskell-literate tutor. Others may want to wait for the documentation, course guides, etc to be written.

If you are a haskell enthusiast, give it a try!

Cabal version consistency

Thanks to some great work done over the google summer of code, the chart library has gained much new functionality over the last 6 months. A consequence of this is that it has gained plenty of dependencies on other software. Furthermore, where the library previously had 2 cabal files to build the system, it now has 4. It’s important that the versioning of dependencies is consistent across these cabal files, but checking this manually is tedious. As best I could tell, there is not yet a tool to facilitate this.

Hence, I spent a little time learning about the cabal API, and wrote a short script that:

  1. reads several cabal files specified on the command line
  2. merges these into one overall set of dependencies
  3. displays the dependencies in such a way that inconsistent version constraints are obvious

Here’s some example output:

$ runghc ~/repos/merge-cabal-deps/mergeCabalDeps.hs `find . -name '*.cabal'`
* loaded Chart-gtk-1.1
* loaded Chart-1.1
* loaded Chart-tests-1.1
* loaded Chart-cairo-1.1
* loaded Chart-diagrams-1.1
Chart:
    >=1.1 && <1.2 (Chart-cairo,Chart-diagrams,Chart-gtk,Chart-tests)
Chart-cairo:
    >=1.1 && <1.2 (Chart-gtk,Chart-tests)
Chart-diagrams:
    >=1.1 && <1.2 (Chart-tests)
Chart-gtk:
    >=1.1 && <1.2 (Chart-tests)
SVGFonts:
    >=1.4 && <1.5 (Chart-diagrams)
array:
    -any (Chart,Chart-cairo,Chart-gtk,Chart-tests)
base:
    >=3 && <5 (Chart,Chart-cairo,Chart-diagrams,Chart-gtk,Chart-tests)
blaze-svg:
    >=0.3.3 (Chart-diagrams,Chart-tests)
bytestring:
    >=0.9 && <1.0 (Chart-diagrams,Chart-tests)
cairo:
    >=0.9.11 (Chart-cairo,Chart-gtk,Chart-tests)
colour:
    >=2.2.0 (Chart-diagrams)
    >=2.2.1 && <2.4 (Chart,Chart-cairo,Chart-gtk,Chart-tests)
containers:
    >=0.4 && <0.6 (Chart-diagrams,Chart-tests)
data-default-class:
    <0.1 (Chart,Chart-cairo,Chart-diagrams,Chart-tests)
diagrams-cairo:
    >=0.7 && <0.8 (Chart-tests)
diagrams-core:
    >=0.7 && <0.8 (Chart-diagrams,Chart-tests)
diagrams-lib:
    >=0.7 && <0.8 (Chart-diagrams,Chart-tests)
...
$ 

As should be evident, all of the imported cabal packages are referenced with consistent version constraints except for colour (which is lacking an upper bound in Chart-diagrams).

The script is pretty straightforward:

import Control.Monad
import Data.List(intercalate)
import System.Environment(getArgs)

import qualified Data.Map as Map
import qualified Data.Set as Set

import Distribution.Package
import Distribution.Version
import Distribution.Verbosity
import Distribution.Text(display)
import Distribution.PackageDescription
import Distribution.PackageDescription.Parse
import Distribution.PackageDescription.Configuration

type VersionRangeS = String

type DependencyMap = Map.Map PackageName (Map.Map VersionRangeS (Set.Set PackageName))

getDependencyMap :: PackageDescription -> DependencyMap
getDependencyMap pd = foldr f Map.empty (buildDepends pd)
  where
    f :: Dependency -> DependencyMap  -> DependencyMap
    -- insertWith/unionWith, so that duplicate dependencies within one
    -- package description are merged rather than overwritten
    f (Dependency p vr) = Map.insertWith (Map.unionWith Set.union) p (Map.singleton (display vr) (Set.singleton (pkgName (package pd))))

printMergedDependencies :: [PackageDescription] -> IO ()
printMergedDependencies pds = do
  forM_ (Map.toList dmap) $ \(pn,versions) -> do
    putStrLn (display pn ++ ":")
    forM_ (Map.toList versions) $ \(version,pnset) -> do
       putStrLn ("    " ++ version ++ " (" ++ intercalate "," (map display (Set.toList pnset)) ++ ")")
  where
    dmap :: DependencyMap
    dmap = Map.unionsWith (Map.unionWith Set.union) (map getDependencyMap pds)

scanPackages :: [FilePath] -> IO ()
scanPackages fpaths = do
    pds <- mapM loadPackageDescription fpaths
    printMergedDependencies pds
  where
    loadPackageDescription path = do
      pd <- fmap flattenPackageDescription (readPackageDescription silent path)
      putStrLn ("* loaded " ++ display (package pd))
      return pd

main :: IO ()
main = getArgs >>= scanPackages

I’d be interested in other tools used for managing suites of cabal configurations.

Data analysis with Monoids

This post expresses the key ideas of a talk I gave at FP-SYD this week.

Monoids are a pretty simple concept in haskell. Some years ago I learnt of them through the excellent Typeclassopedia, looked at the examples, and understood them quickly (which is more than can be said for many of the new ideas that one learns in haskell). However that was it. Having learnt the idea, I realised that monoids are everywhere in programming, but I’d not found much use for the Monoid typeclass abstraction itself. Recently, I’ve found they can be a useful tool for data analysis…

Monoids

First a quick recap. A monoid is a type with a binary operation, and an identity element:

class Monoid a where
  mempty :: a
  mappend :: a -> a -> a

It must satisfy a simple set of laws: specifically, the binary operation must be associative, and the identity element must actually be the identity for the given operation:

mappend a (mappend b c) = mappend (mappend a b) c
mappend mempty x = x
mappend x mempty = x

As is hinted by the names of the typeclass functions, lists are an obvious Monoid instance:

instance Monoid [a] where
  mempty  = []
  mappend = (++)
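These laws are easy to spot-check for particular values; a minimal sketch using the list instance (mempty and mappend here are the standard ones from the Prelude/Data.Monoid):

```haskell
import Data.Monoid

main :: IO ()
main = do
  -- associativity: (a <> b) <> c == a <> (b <> c)
  print ((("ab" `mappend` "cd") `mappend` "ef") == ("ab" `mappend` ("cd" `mappend` "ef")))
  -- left and right identity
  print ((mempty `mappend` [1,2,3]) == [1,2,3 :: Int])
  print (([1,2,3 :: Int] `mappend` mempty) == [1,2,3])
```

Of course this only checks a few values – laws like these are usually tested properly with QuickCheck properties.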

However, many types can be Monoids. In fact, often a type can be a monoid in multiple ways. Numbers are monoids under both addition and multiplication, with 0 and 1 as their respective identity elements. In the haskell standard libraries, rather than choose one kind of monoid for numbers, newtype declarations are used to give instances for both:

newtype Sum a = Sum { getSum :: a }
  deriving (Eq, Ord, Read, Show, Bounded)

instance Num a => Monoid (Sum a) where
  mempty = Sum 0
  Sum x `mappend` Sum y = Sum (x + y)

newtype Product a = Product { getProduct :: a }
  deriving (Eq, Ord, Read, Show, Bounded)

instance Num a => Monoid (Product a) where
  mempty = Product 1
  Product x `mappend` Product y = Product (x * y)

We’ve now established and codified the common structure for a few monoids, but it’s not yet clear what this has gained us. The Sum and Product instances are unwieldy – you are unlikely to want to use Sum directly to add two numbers:

Prelude> :m Data.Monoid
Prelude Data.Monoid> 5+4
9
Prelude Data.Monoid> getSum (mappend (Sum 5) (Sum 4))
9

Before we progress, however, let’s define a few more monoid instances, potentially useful for data analysis.

data Min a = Min a | MinEmpty deriving (Show)
           
data Max a = Max a | MaxEmpty deriving (Show)

newtype Count = Count Int deriving (Show)

-- NB: min and max are redefined as helpers below, hence the qualified
-- P.min/P.max (assuming an "import qualified Prelude as P")
instance (Ord a) => Monoid (Min a) where
  mempty = MinEmpty
  mappend MinEmpty m = m
  mappend m MinEmpty = m
  mappend (Min a) (Min b) = (Min (P.min a b))

instance (Ord a) => Monoid (Max a) where
  mempty = MaxEmpty
  mappend MaxEmpty m = m
  mappend m MaxEmpty = m
  mappend (Max a) (Max b) = (Max (P.max a b))

instance Monoid Count where
  mempty = Count 0
  mappend (Count n1) (Count n2) = Count (n1+n2)

Also some helper functions to construct values of all these monoid types:

sum :: (Num a) => a -> Sum a
sum = Sum

product :: (Num a) => a -> Product a
product = Product

min :: (Ord a) => a -> Min a
min = Min

max :: (Ord a) => a -> Max a
max = Max

count :: a -> Count
count _ = Count 1

These functions are trivial, but they put a consistent interface on creating monoid values. They all have a signature (a -> m) where m is some monoid. For lack of a better name, I’ll call functions with such signatures "monoid functions".

Foldable

It’s time to introduce another typeclass, Foldable. This class abstracts the classic foldr and foldl functions away from lists, making them applicable to arbitrary structures. (There’s a robust debate going on right now about the merits of replacing the list specific fold functions in the standard prelude with the more general versions from Foldable.) Foldable is a large typeclass – here’s the key function of interest to us:

class Foldable t where
  ...
  foldMap :: Monoid m => (a -> m) -> t a -> m
  ...

foldMap takes a monoid function and a Foldable structure, and reduces the structure down to a single value of the monoid. Lists are, of course, instances of Foldable, so we can demo our helper functions:

*Examples> let as = [45,23,78,10,11,1]
*Examples> foldMap count as
Count 6
*Examples> foldMap sum as
Sum {getSum = 168}
*Examples> foldMap max as
Max 78

Notice how the results are all still wrapped with the newtype constructors. We’ll deal with this later.

Composition

As it turns out, tuples are already instances of Monoids:

instance (Monoid a,Monoid b) => Monoid (a,b) where
  mempty = (mempty,mempty)
  mappend (a1,b1) (a2,b2) = (mappend a1 a2,mappend b1 b2)

A pair is a monoid if its elements are monoids. There are similar instances for longer tuples. We need some helper monoid functions for tuples also:

a2 :: (a -> b) -> (a -> c) -> a -> (b,c)
a2 b c = (,) <$> b <*> c

a3 :: (a -> b) -> (a -> c) -> (a -> d) -> a -> (b,c,d)
a3 b c d = (,,) <$> b <*> c <*> d

These are implemented above using Applicative operators, though I’ve given them more restrictive types to make their intended use here clearer. Now I can compose monoid functions:

*Examples> let as = [45,23,78,10,11,1]
*Examples> :t (a2 min max)
(a2 min max) :: Ord a => a -> (Min a, Max a)
*Examples> foldMap (a2 min max) as
(Min 1,Max 78)
*Examples> :t (a3 count (a2 min max) (a2 sum product))
(a3 count (a2 min max) (a2 sum product))
  :: (Num a, Ord a) =>
     a -> (Count, (Min a, Max a), (Sum a, Product a))
*Examples> foldMap (a3 count (a2 min max) (a2 sum product)) as
(Count 6,(Min 1,Max 78),(Sum {getSum = 168},Product {getProduct = 8880300}))

It’s worth noting here that the composite computations are done in a single traversal of the input list.
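The single traversal falls out of how foldMap works for lists: it is, in essence, one foldr that maps and combines as it goes. A sketch (the real library definition is equivalent, though not written exactly this way):

```haskell
import Data.Monoid

-- foldMap for lists, morally: visit each element once, mapping it into
-- the monoid and combining as we go
foldMapList :: Monoid m => (a -> m) -> [a] -> m
foldMapList f = foldr (\a m -> f a `mappend` m) mempty

main :: IO ()
main = print (getSum (foldMapList Sum [45,23,78,10,11,1 :: Int]))
```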

More complex calculations

Happy with this, I decided to extend my set of basic computations with the arithmetic mean. There is a problem, however. The arithmetic mean doesn’t "fit" as a monoid – there’s no binary operation such that the mean of a combined data set can be calculated from just the means of two subsets.

What to do? Well, the mean is the sum divided by the count, both of which are monoids:

newtype Mean a = Mean (Sum a,Count) deriving (Show)

instance (Num a) => Monoid (Mean a) where
  mempty = Mean mempty
  mappend (Mean m1) (Mean m2) = Mean (mappend m1 m2)

mean v = Mean (Sum v,Count 1)

So I can calculate the mean if I am prepared to do a calculation after the foldMap:

*Examples> let as = [45,23,78,10,11,1.5]
*Examples> foldMap mean as
Mean (Sum {getSum = 168.5},Count 6)
*Examples> let (Mean (Sum t,Count n)) = foldMap mean as in t / fromIntegral n
28.083333333333332

The Aggregation type class

For calculations like mean, I need something more than a monoid. I need a monoid for accumulating the values, and then, once the accumulation is complete, a postprocessing function to compute the final result. Hence a new typeclass to extend Monoid:

{-# LANGUAGE TypeFamilies #-}

class (Monoid a) => Aggregation a where
  type AggResult a :: *
  aggResult :: a -> AggResult a

This makes use of the type families ghc extension. We need this to express the fact that our postprocessing function aggResult has a different return type to the type of the monoid. In the above definition:

  • aggResult is a function that gives you the value of the final result from the value of the monoid
  • AggResult is a type function that gives you the type of the final result from the type of the monoid

We can write an instance of Aggregation for Mean:

instance (Fractional a) => Aggregation (Mean a) where
  type AggResult (Mean a) = a
  aggResult (Mean (Sum t,Count n)) = t/fromIntegral n

and test it out:

*Examples> let as = [45,23,78,10,11,1.5]
*Examples> aggResult (foldMap mean as)
28.083333333333332
*Examples> 

Nice. Given that aggResult (foldMap ...) will be a common pattern, let’s write a helper:

afoldMap :: (Foldable t, Aggregation a) => (v -> a) -> t v -> AggResult a
afoldMap f vs = aggResult (foldMap f vs)

In order to use the monoids we defined before (sum, product etc) we need to define Aggregation instances for them also. Even though these are trivial, they turn out to be useful, as we can make the aggResult function strip off the newtype constructors that were only there to enable the Monoid typeclass:

instance (Num a) => Aggregation (Sum a)  where
  type AggResult (Sum a) = a
  aggResult (Sum a) = a
    
instance (Num a) => Aggregation (Product a)  where
  type AggResult (Product a) = a
  aggResult (Product a) = a

instance (Ord a) => Aggregation (Min a)  where
  type AggResult (Min a) = a
  aggResult (Min a) = a

instance (Ord a) => Aggregation (Max a)  where
  type AggResult (Max a) = a
  aggResult (Max a) = a

instance Aggregation Count where
  type AggResult Count = Int
  aggResult (Count n) = n

instance (Aggregation a, Aggregation b) => Aggregation (a,b) where
  type AggResult (a,b) = (AggResult a, AggResult b)
  aggResult (a,b) = (aggResult a, aggResult b)

instance (Aggregation a, Aggregation b, Aggregation c) => Aggregation (a,b,c) where
  type AggResult (a,b,c) = (AggResult a, AggResult b, AggResult c)
  aggResult (a,b,c) = (aggResult a, aggResult b, aggResult c)

This is mostly boilerplate, though notice how the tuple instances delve into their components in order to postprocess the results. Now everything fits together cleanly:

*Examples> let as = [45,23,78,10,11,1.5]
*Examples> :t (a3 count (a2 min max) mean)
(a3 count (a2 min max) mean)
  :: Ord a => a -> (Count, (Min a, Max a), Mean a)
*Examples> afoldMap (a3 count (a2 min max) mean) as
(6,(1.5,78.0),28.083333333333332)
*Examples> 

The 4 computations have all been calculated in a single pass over the input list, and the results are free of the type constructors that are no longer required once the aggregation is complete.

Another example of an Aggregation where we need to postprocess the result is counting the number of unique items. For this we will keep a set of the items seen, and then return the size of this set at the end:

newtype CountUnique a = CountUnique (Set.Set a)

instance Ord a => Monoid (CountUnique a) where
  mempty = CountUnique Set.empty
  mappend (CountUnique s1) (CountUnique s2) = CountUnique (Set.union s1 s2)

instance Ord a => Aggregation (CountUnique a) where
  type AggResult (CountUnique a) = Int
  aggResult (CountUnique s1) = Set.size s1

countUnique :: Ord a => a -> CountUnique a
countUnique a = CountUnique (Set.singleton a)

… in use:

*Examples> let as = [5,7,8,7,11,10,11]
*Examples> afoldMap (a2 countUnique count) as
(5,7)

Higher order aggregation functions

All of the calculations seen so far have worked uniformly across all values in the source data structure. We can make use of the mempty monoid value to filter our data set, and/or to aggregate in groups. Here’s a couple of higher order monoid functions for this:

afilter :: Aggregation m => (a -> Bool) -> (a -> m) -> (a -> m)
afilter match mf = \a -> if match a then mf a else mempty

newtype MMap k v = MMap (Map.Map k v)
  deriving Show

instance (Ord k, Monoid v) => Monoid (MMap k v) where
  mempty = MMap (Map.empty)
  mappend (MMap m1) (MMap m2) = MMap (Map.unionWith mappend m1 m2)

instance (Ord k, Aggregation v) => Aggregation (MMap k v) where
  type AggResult (MMap k v) = Map.Map k (AggResult v)
  aggResult (MMap m) = Map.map aggResult m

groupBy :: (Ord k, Aggregation m) => (a -> k) -> (a -> m) -> (a -> MMap k m)
groupBy keyf valuef = \a -> MMap (Map.singleton (keyf a) (valuef a))

afilter restricts the application of a monoid function to a subset of the input data. For example, to calculate the sum of all the values, together with the sum of the values no greater than 20:

*Examples> let as = [5,10,20,45.4,35,1,3.4]
*Examples> afoldMap (a2 sum (afilter (<=20) sum)) as
(119.8,39.4)

groupBy takes a key function and a monoid function. It partitions the data set using the key function, applies the monoid function to each subset, and returns all of the results in a map. Non-numeric data works better as an example here. Let’s take a set of words as input, and for each starting letter, calculate the number of words with that letter, the length of the shortest word, and the length of the longest word:

*Examples> let as = words "monoids are a pretty simple concept in haskell some years ago i learnt of them through the excellent typeclassopedia looked at the examples and understood them straight away which is more than can be said for many of the new ideas that one learns in haskell"
*Examples> :t groupBy head (a3 count (min.length) (max.length))
groupBy head (a3 count (min.length) (max.length))
  :: Ord k => [k] -> MMap k (Count, Min Int, Max Int)
*Examples> afoldMap (groupBy head (a3 count (min.length) (max.length))) as
fromList [('a',(6,1,4)),('b',(1,2,2)),('c',(2,3,7)),('e',(2,8,9)),('f',(1,3,3)),('h',(2,7,7)),('i',(5,1,5)),('l',(3,6,6)),('m',(3,4,7)),('n',(1,3,3)),('o',(3,2,3)),('p',(1,6,6)),('s',(4,4,8)),('t',(9,3,15)),('u',(1,10,10)),('w',(1,5,5)),('y',(1,5,5))]

Many useful data analysis functions can be written through simple function application and composition, using these primitive monoid functions, the product combinators a2 and a3, and these new filtering and grouping combinators.

Disk-based data

As pointed out before, regardless of the complexity of the computation, it’s done with a single traversal of the input data. This means that we don’t need to limit ourselves to lists and other in-memory Foldable data structures. Here’s a function similar to foldMap, but that works over the lines in a file:

foldFile :: Monoid m => FilePath -> (BS.ByteString -> Maybe a) -> (a -> m) -> IO m
foldFile fpath pf mf = do
  h <- openFile fpath ReadMode
  m <- loop h mempty
  hClose h
  return m
  where
    loop h m = do
      eof <- hIsEOF h
      if eof
        then (return m)
        else do
          l <- BS.hGetLine h
          case pf l of
            Nothing -> loop h m
            (Just a) -> let m' = mappend m (mf a)
                        in loop h m'

afoldFile :: Aggregation m => FilePath -> (BS.ByteString -> Maybe a) -> (a -> m) -> IO (AggResult m)
afoldFile fpath pf mf = fmap aggResult (foldFile fpath pf mf)

foldFile takes a file path plus two functions: one to parse each line of the file, the other a monoid function to do the aggregation. Lines that fail to parse are skipped. (I can hear questions in the background: "What about strictness and space leaks?" – I’ll come back to that.) As an example usage of afoldFile, I’ll analyse some stock data. Assume that I have it in a CSV file, and I’ve got a function to parse one CSV line into a sensible data value:

import qualified Data.ByteString.Char8 as BS
import Data.Time.Calendar

data Prices = Prices {
  pName :: String,          -- The stock code
  pDate :: Day,             -- The historic date
  pOpen :: Double,          -- The price at market open
  pHigh :: Double,          -- The highest price on the date
  pLow :: Double,           -- The lowest price on the date
  pClose :: Double,         -- The price at market close
  pVolume :: Double         -- How many shares were traded
  } deriving (Show)

parsePrices :: BS.ByteString -> Maybe Prices
parsePrices = ...

Now I can use my monoid functions to analyse the file based data. How many google prices do I have, and over what date range?

*Examples> let stats =  afilter (("GOOG"==).pName) (a3 count (min.pDate) (max.pDate))
*Examples> :t stats
stats
  :: Prices
     -> (Count,
         Min time-1.4:Data.Time.Calendar.Days.Day,
         Max time-1.4:Data.Time.Calendar.Days.Day)
*Examples> afoldFile "prices.csv" parsePrices stats
(1257,2008-05-29,2013-05-24)
*Examples> 

Perhaps I want to aggregate my data per month, getting the traded price range and the total volume. We need a helper function to work out the month of each date:

startOfMonth :: Day -> Day
startOfMonth t = let (y,m,d) = toGregorian t
                 in fromGregorian y m 1
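A quick sanity check of the helper (self-contained; Data.Time.Calendar comes from the time package, which ships with GHC):

```haskell
import Data.Time.Calendar

-- truncate a date to the first day of its month
startOfMonth :: Day -> Day
startOfMonth t = let (y,m,_) = toGregorian t
                 in fromGregorian y m 1

main :: IO ()
main = print (startOfMonth (fromGregorian 2013 5 24))
```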

And then we can use groupBy to collect data monthly:

:*Examples> let stats =  afilter (("GOOG"==).pName) (groupBy (startOfMonth.pDate) (a3 (min.pLow) (max.pHigh) (sum.pVolume)))
*Examples> :t stats
stats
  :: Prices
     -> MMap
          time-1.4:Data.Time.Calendar.Days.Day
          (Min Double, Max Double, Sum Double)
*Examples> results <- afoldFile "prices.csv" parsePrices stats
*Examples> mapM_ print (Map.toList results)
(2008-05-01,(573.2,589.92,8073107.0))
(2008-06-01,(515.09,588.04,9.3842716e7))
(2008-07-01,(465.6,555.68,1.04137619e8))
...
(2013-03-01,(793.3,844.0,4.2559856e7))
(2013-04-01,(761.26,827.64,5.3574633e7))
(2013-05-01,(816.36,920.6,4.1080028e7))

Conclusion

So, I hope I’ve shown that monoids are useful indeed. They can form the core of a framework for cleanly specifying quite complex data analysis tasks.

An additional typeclass which I called "Aggregation" extends Monoid and provides for a broader range of computations and also cleaner result types (thanks to type families). There was some discussion when I presented this talk as to whether a single method typeclass like Aggregation was a "true" abstraction, given it has no associated laws. This is a valid point, however using it simplifies the syntax and usage of monoidal calculations significantly, and for me, this makes it worth having.

There remains an elephant in the room, however, and this is space leakage. Lazy evaluation means that, as written, most of the calculations shown run in space proportional to the input data set. Appropriate strictness annotations and related modifications will fix this, but it turns out to be slightly irritating. This blog post is already long enough, so I’ll address space leaks in a subsequent post…

Composable Value Editor Code

I’ve made the code for this library (previously described here) available via a repository on github:

https://github.com/timbod7/veditor

It’s still experimental, so I don’t intend to put it on hackage until I have (or someone else has) a dependency on it.

The actual VE GADT in the source has an extra type parameter intended to let the generated UIs depend on context. Where this is not necessary, the ConstE type may be supplied. Hence, in the actual code the type VE ConstE a corresponds to VE a in the previous blog post.

Composable Value Editors

Graphical User Interfaces (GUIs) in haskell are frustrating. It’s not yet clear what is the cleanest model for fitting GUIs into functional programming. Currently there are two main approaches:

  • Various efforts at applying Functional Reactive Programming (FRP) to GUIs. These are somewhat experimental, and tend to be proofs of concept implementing a small range of GUI features (several of these libraries are listed here).

  • The full blown toolkits which provide a comprehensive imperative binding to mainstream toolkits. The two key contenders here are gtk2hs and wxHaskell.

Whilst enticing, the FRP approach doesn’t currently look appropriate for building rich GUI applications. wxHaskell and gtk2hs at least provide the functionality required, but the low level imperative approach based on the IO monad is tedious for a fluent haskell developer. Here’s a code snippet:

b <- buttonNew
image <- imageNewFromStock stockAdd IconSizeSmallToolbar
containerAdd b image
set b [buttonRelief := ReliefNone]
on b buttonActivated $ do
     ... button activated action ...

It’s not hard to write this sort of code, but it is tedious, especially considering the amount that is required to build a whole application.

This post outlines my experiments in reducing the amount of imperative code required for GUIs, while retaining compatibility with the imperative toolkits. Initially I’ve been focussed on "value editors" (VEs), aka "forms". These are GUI components to capture/edit values of ideally arbitrary complexity. I’ve two key goals: composability and abstraction.

Composability: I want to be able to compose my value editors effortlessly. Whilst the existing toolkits let you compose widgets using containers and glue code, it’s verbose indeed.

Abstraction: I’d like to define my VEs independently from the underlying toolkit. But I’m looking for something more than a thin layer over the existing toolkits. I want to define my VEs in terms of the structure of the values involved, and worry about the formatting and layout later, if at all.

If we take this abstraction far enough, it should be possible to reuse our structural VE definitions beyond gtk2hs and wxHaskell. For example, a JSON generator+parser pair can be considered a VE – in the sense that to edit a value, one can generate the json text, edit the text, and then parse it to recover the new value. Of course, it’s likely to be a balancing act between abstraction and functionality – we’ll have to see how this pans out.

An Abstract UI

OK, enough preamble, here’s a GADT I’ve devised to capture VEs:

-- | A GADT describing abstracted, user interface components for manipulating
-- values of type a.
data VE a where
    -- | A String field
    Entry :: VE String

    -- | An enumeration. A list of label strings is supplied;
    -- the VE value is the integer index of the selected label.
    EnumVE :: [String] -> VE Int

    -- | Annotate a VE with a text label
    Label :: String -> VE a -> VE a

    -- | A "product" VE that combines values from two other VEs
    AndVE :: (VE a) -> (VE b) -> VE (a,b)

    -- | A "sum" VE that captures the value from either of two other VEs
    OrVE  :: (VE a) -> (VE b) -> VE (Either a b)

    -- | A VE for manipulating a list of values. The supplied function lets
    -- the VE display the list items to the user (eg for selection).
    ListVE :: (a->String) -> VE a -> VE [a]

    -- | Convert a VE over a type a, to a VE over a type b, given
    -- the necessary mappings. Either String captures the potential
    -- failure in the mapping.
    MapVE :: (a -> Either String b) -> (b -> a) -> VE a -> VE b

    -- | Annotate a VE with a default value
    DefaultVE :: a -> VE a -> VE a

-- A typeclass to build VEs
class HasVE a where
  mkVE :: VE a

(.*.) = AndVE
(.+.) = OrVE
infixr 5 .*.
infixr 5 .+.

And here’s an example usage for a simple data type:

data Gender = Male | Female deriving (Show,Enum)

data Person = Person {
    st_name :: String,
    st_age :: Int,
    st_gender :: Gender
} deriving (Show)

instance HasVE Person
  where
    mkVE = MapVE toStruct fromStruct
        (   Label "Name" nonEmptyString
        .*. Label "Age"   mkVE
        .*. Label "Gender"   mkVE
        )
      where
        toStruct (a,(b,c)) = Right (Person a b c)
        fromStruct (Person a b c) = (a,(b,c))

nonEmptyString :: VE String
nonEmptyString = ...

instance HasVE Int ...
instance HasVE String ...
instance HasVE Gender ...

This captures, in some sense, the abstract semantics of an editor for Person values. We need:

  • a non-empty string for the name
  • an integer for the age
  • a gender enumeration

and know how to pack/unpack these into a person value.
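The elided definitions above could be filled in along these lines (a sketch only – the validation message, the read-based Int parsing, and the exact constructor choices are my assumptions):

```haskell
-- Sketch: plausible implementations of the elided instances,
-- built from the VE constructors defined above.
nonEmptyString :: VE String
nonEmptyString = MapVE checkNonEmpty id Entry
  where
    checkNonEmpty "" = Left "must be non-empty"
    checkNonEmpty s  = Right s

instance HasVE String where
  mkVE = Entry

instance HasVE Int where
  -- read/show an integer via the string entry field
  mkVE = MapVE parseInt show Entry
    where
      parseInt s = case reads s of
        [(n,"")] -> Right n
        _        -> Left "not a valid integer"

instance HasVE Gender where
  -- an enumeration VE, mapping the selected index back to the constructor
  mkVE = MapVE (Right . toEnum) fromEnum (EnumVE ["Male","Female"])
```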

A GTK UI

But what can we do with this? We need to turn this abstract VE into a concrete UI. There’s a library function to do this for an arbitrary VE:

data GTKWidget a = GTKWidget {
    ui_widget :: Widget,
    ui_set :: a -> IO (),
    ui_get :: IO (ErrVal a),
    ui_reset :: IO ()
}

uiGTK :: VE a -> IO (GTKWidget a)

The uiGTK function turns our abstract VE a into a GTK component for editing a value of type a. In addition to building the compound widget, it gives us functions to:

  • put a value into the widget
  • recover a value from the widget
  • restore the widget to a default value
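In use, a generated widget might be embedded in an existing gtk2hs layout something like this (a sketch – someBox, the packing call, and the sample Person value are assumptions for illustration):

```haskell
-- Sketch: embedding a generated editor in an existing gtk2hs container.
gw <- uiGTK (mkVE :: VE Person)
ui_set gw (Person "Alice" 42 Male)             -- populate the editor
boxPackStart someBox (ui_widget gw) PackGrow 0 -- embed in a layout
ev <- ui_get gw                                -- ErrVal a: the value, or an error
```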

A higher level function constructs a modal dialog to get a value of type a from the user.

data ModalDialog a = ModalDialog {
    md_dialog :: Dialog,
    md_gw :: GTKWidget a,
    md_run :: IO (Maybe a)
}

modalDialogNew :: String -> VE a -> IO (ModalDialog a)

Hence running this:

dialog <- modalDialogNew "Example 2" (mkVE :: VE Person)
ma <- md_run dialog

Results in this:

Example 2

The automatically generated dialog is simple, but quite functional:

  • invalid fields have a red background, dynamically updated with each keystroke
  • fields have sensible defaults – often deliberately invalid, to force entry from the user

More complex UIs are of course possible. As should be clear from the VE GADT above, we support sum and product types, lists, etc, and can map these with arbitrary code. Hence we can construct GTK UIs for a very large range of haskell values. A slightly more complex example composes the previous VE:

data Team = Team {
    t_leader :: Person,
    t_followers :: [Person]
} deriving (Show)

instance HasVE Team ...
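Written out, that instance would follow the same pattern as the one for Person (a sketch; using st_name to label list items is my assumption):

```haskell
-- Sketch: a Team VE composed from the Person VE, with the
-- followers edited as a list labelled by each person's name.
instance HasVE Team where
  mkVE = MapVE toTeam fromTeam
      (   Label "Leader" mkVE
      .*. Label "Followers" (ListVE st_name mkVE)
      )
    where
      toTeam (l,fs) = Right (Team l fs)
      fromTeam (Team l fs) = (l,fs)
```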

Resulting in:

Example 3

Recursive types are supported, so it’s possible to build GTK VEs for expression trees, etc.
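For instance, a small expression type can be described recursively using the sum combinator (a sketch – the Expr type and labels are invented for illustration):

```haskell
-- Sketch: a VE for a recursive expression type. The instance
-- refers to itself via mkVE in the "Sum" branch.
data Expr = Num Int
          | Add Expr Expr
          deriving (Show)

instance HasVE Expr where
  mkVE = MapVE toExpr fromExpr
      (   Label "Number" mkVE
      .+. Label "Sum" (mkVE .*. mkVE)
      )
    where
      toExpr (Left n)      = Right (Num n)
      toExpr (Right (a,b)) = Right (Add a b)
      fromExpr (Num n)   = Left n
      fromExpr (Add a b) = Right (a,b)
```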

JSON Serialisation

As I alluded to previously, given VE a, we can automatically generate a JSON generator and parser for values of type a:

data VEJSON a = VEJSON {
        uj_tojson ::  a -> DA.Value,
        uj_fromjson :: DA.Value -> Maybe a
}

uiJSON :: VE a -> VEJSON a
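The generator side of such an interpreter is a straightforward fold over the GADT. Here’s one way it could look (a sketch only – DA is Data.Aeson, and the encoding choices, such as tagging sums with "L"/"R", are my assumptions, not the library’s actual scheme):

```haskell
-- Sketch: a JSON generator defined by pattern matching on each
-- VE constructor. Sums are tagged, products become two-element arrays.
veToJSON :: VE a -> a -> DA.Value
veToJSON Entry             s         = DA.toJSON s
veToJSON (EnumVE _)        i         = DA.toJSON i
veToJSON (Label _ ve)      a         = veToJSON ve a
veToJSON (AndVE va vb)     (a,b)     = DA.toJSON [veToJSON va a, veToJSON vb b]
veToJSON (OrVE va _)       (Left a)  = DA.toJSON [DA.toJSON "L", veToJSON va a]
veToJSON (OrVE _ vb)       (Right b) = DA.toJSON [DA.toJSON "R", veToJSON vb b]
veToJSON (ListVE _ ve)     as        = DA.toJSON (map (veToJSON ve) as)
veToJSON (MapVE _ from ve) b         = veToJSON ve (from b)
veToJSON (DefaultVE _ ve)  a         = veToJSON ve a
```

The parser is the mirror image, with MapVE’s Either String result providing the failure cases.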

Related Work

Well into working on these ideas, I was reminded of two somewhat similar haskell projects: Functional Forms and Tangible Values. Functional Forms aims to ease the creation of wxHaskell dialogs to edit values. The exact purpose of Tangible Values is a little unclear to me, but it appears to be about automatically generating UIs suitable for visualising function behaviour and exploring functional programming.

Future Work

Currently I have a library that implements the VE GADT to automatically build GTK editors and JSON serialisers. There’s many ways to progress this work. Some of my ideas follow…

Whilst the generated GTK editor is a sensible default, there are only very limited ways in which the editor can be customised. I envisage a model where the uiGTK function takes an extra parameter akin to a style sheet, giving extra information to control the UI layout, formatting, etc.

I can envisage many other useful things that could automatically be derived from VE definitions:

  • equivalent functionality for wxHaskell
  • console UIs
  • funky UIs implemented with primitives more interesting than the standard toolkit widgets: eg zoomable UIs, or UIs better suited to tablet-based platforms
  • web GUIs. This could be done by automatically generating javascript and corresponding server-side haskell code.

Finally, it might be worth investigating whether the GHC Generics mechanism could be used to automatically generate VE definitions.

So there’s plenty of directions this work can go, but right now I want to put it to the test and build an application!

Installing ghc 7.0.3 and the haskell platform on RHEL 5.6

The current haskell platform requires ghc 7.0.3, and I need this to run on some RHEL 5.6 machines. Whilst this OS update was released in Jan 2011, it’s based on old software. In particular, it’s built with libc 2.5, which was released back in 2006. That’s too old to use the prebuilt generic binary release from the ghc downloads page, which warns:

NOTE: If you have too old a version of libc, then you will get an error like “floating point exception” from the binaries in these bindists. You will need to either upgrade your libc (we’re not sure what the minimum version required is), or use a binary package built for your distribution instead.

I sure don’t want to upgrade libc, and to the best of my knowledge there’s no binary package built for RHEL. So, I’ll need to build it myself from source. But we need ghc to compile ghc – and to make matters worse, we need a version >= 6.10, whose binaries won’t work with libc 2.5 either.

So, our approach needs to be:

  1. Compile and install 6.10.4 using 6.8.3
  2. Compile a binary distribution of 7.0.3 using 6.10.4
  3. Install the 7.0.3 binary distribution
  4. Compile and install the haskell platform 2011.2.0.1

But wait, as it turns out, the RHEL 5.6 C compiler (gcc 4.1.2) doesn’t seem to be compatible with recent ghc builds either, giving errors like:

rts/dist/build/RtsStartup.dyn_o: relocation R_X86_64_PC32 against `StgRun' can
not be used when making a shared object; recompile with -fPIC

(there are some details on the building and troubleshooting ghc page)

So, you need a more recent gcc too. I could have built this from source as well, but luckily I already had a working gcc 4.4.3 build present.

For reference, I needed to download:

  • ghc-6.10.4-src.tar.bz2
  • ghc-6.8.3-x86_64-unknown-linux.tar.bz2
  • ghc-7.0.3-src.tar.bz2
  • haskell-platform-2011.2.0.1.tar.gz

And here’s the commands used:

# General setup
# Assumes downloaded files are in $BASE/downloads
BASE=/tmp/ghc-dev
GCC443DIR=/opt/gcc4.4.3/bin
mkdir -p $BASE/install
mkdir -p $BASE/build

# Start with a 6.8.3 binary
cd $BASE/build
tar -xjf $BASE/downloads/ghc-6.8.3-x86_64-unknown-linux.tar.bz2
export PATH=/usr/bin:/sbin:/bin
cd $BASE/build/ghc-6.8.3
./configure --prefix $BASE/install/ghc-6.8.3
make install

# Build 6.10.4 from src
cd $BASE/build
tar -xjf $BASE/downloads/ghc-6.10.4-src.tar.bz2 
export PATH=$BASE/install/ghc-6.8.3/bin:/usr/sbin:/usr/bin:/sbin:/bin
cd $BASE/build/ghc-6.10.4
./configure --prefix $BASE/install/ghc-6.10.4
make
make install

# Build 7.0.3 from src, using 6.10.4 and gcc 4.4.3
# (gcc 4.1.2 from RHEL doesn't seem to work)
cd $BASE/build
tar -xjf $BASE/downloads/ghc-7.0.3-src.tar.bz2 
export PATH=$BASE/install/ghc-6.10.4/bin:$GCC443DIR:/usr/sbin:/usr/bin:/sbin:/bin
cd $BASE/build/ghc-7.0.3
./configure
make
make binary-dist
 
# Unpack and install the 7.0.3 bin-dist
cd /tmp
rm -rf /tmp/ghc-7.0.3
tar -xjf $BASE/build/ghc-7.0.3/ghc-7.0.3-x86_64-unknown-linux.tar.bz2
cd /tmp/ghc-7.0.3
./configure --prefix $BASE/install/ghc-7.0.3
make install

# Unpack and install the haskell platform
cd $BASE/build
export PATH=$BASE/install/ghc-7.0.3/bin:$GCC443DIR:/usr/sbin:/usr/bin:/sbin:/bin
tar -xzf $BASE/downloads/haskell-platform-2011.2.0.1.tar.gz
cd $BASE/build/haskell-platform-2011.2.0.1
./configure --prefix $BASE/install/ghc-7.0.3
make
make install

Be prepared to chew up some CPU cycles! Pleasingly, once I sorted out the gcc version issue, all of the above worked without problems.

HBeat Lives

Reorganising my projects home, I copied in the old documentation for my hbeat program. The docs needed some updating, so I decided to check it all still works ok. Fearing bitrot, I was pleased and a little surprised to see that on my recently rebuilt ubuntu machine, all I needed was

sudo apt-get install libsdl1.2-dev
sudo apt-get install libsdl-mixer1.2-dev
cabal-dev install hbeat

Or at least that’s what I first thought. The program fired up ok, but failed to respond to mouse clicks as expected. It turns out that this was a pre-existing bug – if the screen redraws don’t happen fast enough, hbeat gets further and further behind in its event processing, eventually ignoring everything. A small code fix (now published to hackage) causes out-of-date redraw requests to be dropped.
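In outline, the fix amounts to discarding redraw requests that are already stale by the time they are processed, rather than rendering every queued frame (a sketch with entirely hypothetical names – this is not the actual hbeat code):

```haskell
-- Sketch: drop redraws whose scheduled time has already passed,
-- so the event queue can't fall further and further behind.
processEvent :: UTCTime -> Event -> State -> IO State
processEvent now (Redraw scheduled) st
  | scheduled < now = return st            -- stale: drop it
  | otherwise       = renderFrame st >> return st
processEvent _ ev st = handleOther ev st
```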

But why was I seeing this problem now? It seems that since I wrote the software, openGL via SDL has got a lot slower. The compositing window manager (compiz) appears to be the culprit – it’s consuming significant cpu time whilst hbeat is running. Some references to this can be found here.

I guess there’s a downside to all those fancy compositing effects.

It’s a shame hbeat is now a fair bit glitchier than it was before. Maybe sometime I’ll look at this, but for now at least it still works.