Sean Barrett's Judy vs Hashtable Performance Comparison

October 4, 2009 by nicolas, tagged programming

http://uucidl.com/git/?p=hashperf.git;a=summary

$ git clone http://uucidl.com/git/hashperf.git

Requirements: GNU Make; cc; gnuplot.

Introduction

A couple of years ago, I stumbled upon the programming works of Sean Barrett. Both he and Molly Rocket partner Casey Muratori exhibit a refreshingly pragmatic approach to programming and a will to fight the useless complexity that too often plagues our field. Checking out their forums is also highly recommended¹.

You might have read before about Judy, an associative array implementation by Doug Baskins, with peculiar performance claims.

Sean Barrett submitted Judy to his inquisitive eye, and produced an enlightening article, "A Performance Comparison of Judy to Hash Tables".

A very interesting aspect of the comparison is that performance is not its sole focus: the article contrasts Judy’s 200k lines of code with the 200 lines of code of a simple hash table implementation.

¹ As well as checking out Sean Barrett’s excellent pure C libraries: stb_truetype, stb_image, stb_vorbis and stb.

So what about it?

Well, I just took a couple of hours to port Sean Barrett’s original Windows-based test suite to POSIX platforms.

Just do:

$ git clone http://uucidl.com/git/hashperf.git

The repository contains a Makefile (for GNU Make) that builds the program, launches the (lengthy) tests and creates the graphs; that’s about it.

To reproduce Sean Barrett’s results, just type:

$ make tests
$ make plot

Adding new implementations to the tests

You can add new implementations to the test suite by editing aatest.c.

First, declare your data structure’s API at the top of the file and hook it into reset():

// add your new datastructure code here

void *stlhashCreate(int size);
void stlhashFree(void* hash);
uint32 *stlhashFind(void *hash, uint32 key);
uint32 *stlhashInsert(void *hash, uint32 key);
int stlhashDelete(void *hash, uint32 key);
int stlhashCount(void *hash);

(...)

void reset(void)
{
    (...)
    if (stlhash != NULL)
        stlhashFree(stlhash);
    stlhash = stlhashCreate(1);
    (...)
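
For readers who want something to plug in while experimenting, here is a minimal sketch of a data structure exposing the same six-function shape. Everything in it is an assumption made for illustration: the toy* names are hypothetical, uint32 is taken to be a 32-bit unsigned integer, Find is assumed to return NULL for a missing key, Insert to return a pointer to the (possibly new) value slot, and Delete to return 1 when something was removed. It is a plain chained hash table with a fixed bucket count, not the stlhash implementation from the repository and not something tuned for the benchmark.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t uint32;   /* assumption: the suite's uint32 is 32-bit unsigned */

typedef struct Node { uint32 key, value; struct Node *next; } Node;
typedef struct { Node **buckets; uint32 mask; int count; } Toy;

void *toyCreate(int size)
{
    /* Bucket count: power of two, at least 1024, never resized afterwards. */
    uint32 n = 1024;
    if (size < 0) size = 0;
    while (n < (uint32)size && n < (1u << 20)) n <<= 1;

    Toy *t = malloc(sizeof *t);
    assert(t != NULL);
    t->buckets = calloc(n, sizeof *t->buckets);
    assert(t->buckets != NULL);
    t->mask = n - 1;
    t->count = 0;
    return t;
}

/* Returns the link pointing at the key's node, or the chain's final
   NULL link when the key is absent. */
static Node **toySlot(Toy *t, uint32 key)
{
    Node **link = &t->buckets[((key * 2654435761u) >> 12) & t->mask];
    while (*link != NULL && (*link)->key != key)
        link = &(*link)->next;
    return link;
}

uint32 *toyFind(void *hash, uint32 key)
{
    Node *n = *toySlot(hash, key);
    return n ? &n->value : NULL;
}

uint32 *toyInsert(void *hash, uint32 key)
{
    Toy *t = hash;
    Node **link = toySlot(t, key);
    if (*link == NULL) {             /* new key: append a zeroed node */
        Node *n = malloc(sizeof *n);
        assert(n != NULL);
        n->key = key; n->value = 0; n->next = NULL;
        *link = n;
        t->count++;
    }
    return &(*link)->value;
}

int toyDelete(void *hash, uint32 key)
{
    Toy *t = hash;
    Node **link = toySlot(t, key);
    Node *n = *link;
    if (n == NULL) return 0;
    *link = n->next;                 /* unlink, then release */
    free(n);
    t->count--;
    return 1;
}

int toyCount(void *hash) { return ((Toy *)hash)->count; }

void toyFree(void *hash)
{
    Toy *t = hash;
    for (uint32 i = 0; i <= t->mask; i++) {
        while (t->buckets[i] != NULL) {
            Node *n = t->buckets[i];
            t->buckets[i] = n->next;
            free(n);
        }
    }
    free(t->buckets);
    free(t);
}
```

Because the bucket array never grows, chains get long once the element count far exceeds the bucket count, so timings from this sketch say little about a real implementation.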

Describe the new hash table to the test suite with an AArray descriptor:

AArray stlhash_desc = { "stlhash", stlInsert, stlDelete, stlGet, stlCount, stlMem };
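
Note that the descriptor entries (stlInsert, stlGet, ...) are not the handle-based functions declared earlier; presumably they are thin wrappers that bind a global handle. The sketch below illustrates that adapter pattern only. The Desc layout, the demo* and slots* names and the 256-slot backing store are all hypothetical stand-ins, the real AArray definition in aatest.c may differ, and the memory-usage callback is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t uint32;   /* assumption: 32-bit unsigned */

/* Stand-in for a handle-based structure: 256 direct-indexed slots. */
typedef struct { uint32 value[256]; unsigned char used[256]; int count; } Slots;

static uint32 *slotsFind(void *h, uint32 key)
{
    Slots *s = h;
    return (key < 256 && s->used[key]) ? &s->value[key] : NULL;
}

static uint32 *slotsInsert(void *h, uint32 key)
{
    Slots *s = h;
    assert(key < 256);               /* toy limitation of this stand-in */
    if (!s->used[key]) { s->used[key] = 1; s->value[key] = 0; s->count++; }
    return &s->value[key];
}

static int slotsDelete(void *h, uint32 key)
{
    Slots *s = h;
    if (key >= 256 || !s->used[key]) return 0;
    s->used[key] = 0;
    s->count--;
    return 1;
}

static int slotsCount(void *h) { return ((Slots *)h)->count; }

static Slots storage;                /* zero-initialized static storage */
static void *table = &storage;       /* global handle, as reset() would set */

/* Handle-free wrappers: each one binds the global handle. */
static void   demoInsert(uint32 key, uint32 value) { *slotsInsert(table, key) = value; }
static int    demoDelete(uint32 key) { return slotsDelete(table, key); }
static uint32 demoGet(uint32 key)
{
    uint32 *p = slotsFind(table, key);
    return p ? *p : 0;               /* 0 stands for "missing" in this sketch */
}
static int    demoCount(void) { return slotsCount(table); }

/* Hypothetical descriptor: a name plus function pointers. */
typedef struct {
    const char *name;
    void   (*insert)(uint32 key, uint32 value);
    int    (*del)(uint32 key);
    uint32 (*get)(uint32 key);
    int    (*count)(void);
} Desc;

Desc demo_desc = { "demo", demoInsert, demoDelete, demoGet, demoCount };
```

With such a descriptor the benchmark loop only ever calls through the function pointers, which is what lets one driver exercise several implementations.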

Add it to the command line:

int main(int argc, char **argv)
{
    (...)
    if (!stricmp("judy", argv[i])) a = &judy_desc;
    else if (!stricmp("hash", argv[i])) a = &hash_desc;
    else if (!stricmp("bhash", argv[i])) a = &bhash_desc;
    else if (!stricmp("stlhash", argv[i])) a = &stlhash_desc;
    (...)
}
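
A side note for anyone adapting this chain elsewhere: stricmp comes from the Windows C runtime, and the POSIX counterpart is strcasecmp from <strings.h>. As the if/else chain grows, a table-driven lookup is one alternative. The following is only a sketch under assumptions: Desc is a hypothetical stand-in for AArray that carries nothing but a name, and find_desc is an invented helper, not a function from the test suite.

```c
#include <assert.h>
#include <stddef.h>
#include <strings.h>   /* strcasecmp (POSIX) */

/* Hypothetical stand-in for AArray: only the name matters here. */
typedef struct { const char *name; } Desc;

Desc judy_d    = { "judy" };
Desc hash_d    = { "hash" };
Desc bhash_d   = { "bhash" };
Desc stlhash_d = { "stlhash" };

/* Every selectable implementation; supporting a new one means one new entry. */
static Desc *const known[] = { &judy_d, &hash_d, &bhash_d, &stlhash_d };

/* Case-insensitive lookup by name; returns NULL for an unknown name. */
Desc *find_desc(const char *arg)
{
    assert(arg != NULL);
    for (size_t i = 0; i < sizeof known / sizeof known[0]; i++)
        if (strcasecmp(known[i]->name, arg) == 0)
            return known[i];
    return NULL;
}
```

In main, the chain would then reduce to something like a = find_desc(argv[i]) followed by a NULL check.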

You also have to produce the datasets in the Makefile.

First define a new testing function, and add a call to it in "both". If you copy an existing function, don’t forget to change both the implementation name passed to the test program and the name of the output file.

## testing functions

stlhash=./$(TEST) stlhash $(1) $(2) $(3) $(4) $(5) $(6) $(7) $(8) $(9) > $(OUTPUT)/stlhash$(1)$(2)$(3)$(4)$(5)$(6)$(7)$(8)$(9).txt
bhash=./$(TEST) bhash $(1) $(2) $(3) $(4) $(5) $(6) $(7) $(8) $(9) > $(OUTPUT)/bhash$(1)$(2)$(3)$(4)$(5)$(6)$(7)$(8)$(9).txt
(...)
both=\
    $(call judy,$(1),$(2),$(3),$(4),$(5),$(6),$(7),$(8),$(9)) ; \
    $(call hash,$(1),$(2),$(3),$(4),$(5),$(6),$(7),$(8),$(9)) ; \
    $(call stlhash,$(1),$(2),$(3),$(4),$(5),$(6),$(7),$(8),$(9)) ; \
    $(call bhash,$(1),$(2),$(3),$(4),$(5),$(6),$(7),$(8),$(9))

Plotting the results: finally, you also have to add the new data structure manually to the plotting scripts, buildraw.gp and proberaw.gp.