21 Jan 2010 @ 11:18 PM

Haskell for anything

Lately I've been playing around with Haskell a lot... particularly trying to get it to do things it's not ideally suited for. For example high performance binary deserialization.

Here's the problem I've been running into:

No Dictionary

As far as I can tell there's nothing like C#'s dictionary class available in Haskell. There are types like Data.Map, and there are some Trie types, but I've run into serious performance and memory issues.

Say you get data from a network stream as a lazy byte string... so far so good we have a very efficient lazy list of byte arrays. The format is something like this on the wire: Key ValueType Value where the Key is a null terminated string, the value type is one of several various types, and the values are things like ints, longs, bools,... and also other ByteStrings.

I wanted to build an interface to this set of data in two ways: First, and the most straightforward would be to return some sort of mapping object which closely resembles what the actual data is. In C# this would be a Dictionary (which is a Hash Table), Data.Map is a binary tree and there is a more efficient Trie class specifically for ByteString -> Value collections.

So I wrote some functions to build that object and all seemed to go ok. Except building these things seemed kind of slow. For example making a collection of 3,000 items took about a second... which I felt was unacceptable.

My initial thought was that perhaps a Trie is sometimes overkill for the kind of retrieval you may want to do. Maybe you are interested in only one of the key value pairs.

So I refactored the code to hand back a List of ByteString, Value pairs. The idea being that if you wanted the Trie you could build it off of these guys. Or you could just work directly with it (and use something like lookup). Sure finding something in a list is far less efficient then using a dictionary type, but you have to pay for that O(n) cost anyway (and much worse in fact to build the initial dictionary).

I thought this would be ideal, but I was quite wrong. The memory usage went nuts, sometimes as much as 1GB or more being used. This on about 50 3000 item collections. There's just no way its that much data, which meant there was a whole lot of orphaned objects, unnecessary memory allocations, maybe some unneeded boxing...

I've yet to solve this problem. I toyed with the ST Monad, and thought about using one of the various Vector types out there... That turned out to be really confusing since there are so many available.

Maybe one of these days I'll finally sit down and figure out what the Haskell code gets turned into. Until then I think I'll revert back to how this used to work.

22 Mar 2009 @ 7:35 PM

Project Euler in Javascript

I rather like Project Euler. Having solved a lot of these with python and haskell I thought I'd try solving them in javascript and post their solutions. Javascript doesn't have a lot of built in functionality so some of the processing could be tough. (Big numbers for example) I'll probably just use some libraries that are laying out there on the internet for those problems.

Still the problems should be doable. Maybe they are of use to someone.

27 Jan 2009 @ 8:58 PM

XBRL

Not too long ago the SEC announced that they are going to start requiring companies to file in XBRL. XBRL is a technology with a lot of potential - albeit mostly unrealized potential. I was lucky enough to work on the Financial Explorer which was an automated viewer of the data. It appears to now be mostly abandoned (I couldn't find an active link on the SEC's site anymore) but I've probably not heard the last of the technology.

XBRL is actually kind of difficult to work with. It has a decent spec, but its very open-ended. We kept running into serious problems trying to interpret the data in meaningful ways (trying to fit the data into nice tables, graphics, etc...) The project had a fairly unique atomic visualization which is worth taking a look at. It was definitely not your typical financial chart. (As a web developer I live in a world of boxes and straight lines not circles)

I wrote an open source C# xbrl processor for it. I never had time to persue it any further, but I thought the biggest win might be the mutual fund risk return filings. They seem to lend themselves much better toward this kind of visual representation.

7 Jan 2009 @ 5:13 AM

Arrows + jQuery

A few months ago I was reading a haskell blog and discovered Arrowlets - a Javascript library based on Haskell's arrows library.

I thought this was interesting for 2 reasons: It demonstrates the progress Javascript has made in becoming a serious programming language (thanks to people like Crockford) and it has the potential to solve a difficult (and common) problem in a simple way.

At least that was the theory. It turned out that arrows are actually kind of hard to understand. After plodding through a few hours of material, I came to the conclusion that it wasn't so much the concept that was difficult as much as the verbiage used to describe it. In that vain I set about writing my own version of the library. The very incomplete first version of this library can be found here: http://code.google.com/p/jquery-arrows/. (It's built on top of jquery for events and ajax)

Here's an example of how ajax would be done:

Occasionally you run into circumstances where you have AJAX calls that depend on other AJAX calls. You don't want to use a synchronous call because that hangs the browser. Rather what you want is for the calls to be made sequentially, though still asynchronously with regard to the browser. Code for that would typically look like this:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
$.ajax({
    url: "A",
    success: function() {
        // a
        $a.ajax({
            url: "B",
            success: function() {
                // b
                $a.ajax({
                    url: "C",
                    success: function() {
                        // c
                    }
                })
            }
        })
    }
})
 

That quickly becomes out of hand. With arrows:

1
2
3
4
5
6
7
8
9
10
11
12
InSequence(
    Ajax({
        url: "A"
    }),
    Ajax({
        url: "B"
    }),
    Ajax({
        url: "C"
    })
);
 

What's important here isn't so much the amount of typing one has to do, it's the visibility of the interaction between the various components. The flow of control is much clearer in the second example.

Here's a real example:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
// Notice the linear representation of an asynchronous program
$.Arrows.InSequence(
    // Hit Twitter
    $.Arrows.Ajax({
        url: "http://search.twitter.com/search.json?q=dogs",
        dataType: "jsonp"
    }),
    // Take the result and return an object that can be fed
    //   into google
    function(result) {
        return {
            url: "http://ajax.googleapis.com/ajax/services/search/web?v=1.0&q="
                + encodeURIComponent(result.results[0].from_user)
        };
    },
    // Hit google
    $.Arrows.Ajax({
        dataType: "jsonp"
    }),
    // Log the result
    function(result) {
        console.log(result);
    }
).run();