Anti-Pattern: Blobs of test data

April 29, 2015 by nicolas, tagged testing and programming, filed under projects

During the development of automated tests, test data is sometimes represented as blobs stored in central repositories. These blobs are often shared across automated tests and help set them up. The repositories can take the form of code (constructing a complete tree of objects), files or even relational databases.

A shared repository of test data is often introduced because creating and setting up test data is difficult or costly, both at development and at execution time. Some reasons:

  1. the domain objects and their collaborators are hard to construct, fake or wire in a test,
  2. the domain is itself very complex and test writers have to master many aspects of the domain to create the correct test data at runtime,
  3. the creation of these objects takes time to execute.

Check whether those reasons really apply to your software project. Is 2) inherent to your domain? Can 1) and 3) be remedied? Or are they perhaps themselves a result of applying this anti-pattern?

Personal experience

I have worked on a project that had shared test data in the form of a centralized database against which every unit and acceptance test suite would be run.

The database had been created at a certain date, then updated over time by hand (handwritten SQL), by code, or through standard database schema migrations.

When a test failed, it would be either because the behavior of the code had changed (intentionally or not) or because the test data had not been migrated. Finding out what the test data was meant to represent was also difficult. Had someone written a test against object O because it was an object with a precise, intended setup, or because it happened to have some property that the writer of the test liked? Those aspects were almost never documented. In effect, a test would often not document what it had been constructed against.

The test data also kept growing, because modifying existing items meant risking breaking tests that you had no idea how to fix.

Why is a blob of test data a unit-test anti-pattern?

A good unit test is fast, precise, readable and isolated. It builds confidence in the working state of the system under test.

Tests become hard to read, imprecise and poorly isolated

Unit tests written against a blob of test data tend to be hard to read, poorly isolated and imprecise.

When a unit test refers to the entire blob, or even part of it, it potentially depends on the entire tree of test data rather than isolating only the part of the system under test.

When a test cherry-picks one particular item of the test data blob, the precise setup the test relies on is barely described in the test itself. One must read the data to find out what the test is actually doing.

When creating a new test, it is very tempting to just look around and reuse a piece of data that someone else has written. This becomes a liability if that item is later modified, and it couples the two tests implicitly (i.e. their failures become correlated).

It also means the test in question can never really state what its starting state is.

And if each test carefully cherry-picks its own correct data, then in practice each test ends up with its own slice of the blob, which means the blob grows with the number of tests and never shrinks.

Tests become hard to trust

Unit tests written against a blob of test data also tend to be harder to trust.

In the long run, as the application changes, so must the test data. When the test data is not correctly versioned or updated, it becomes difficult to trust. Code-generated data is superior in this respect, because it can at least be made to use the basic operations of the data model, leading to well-formed test data; in practice, however, a blob is always a bit of a mixture of static and generated data.
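As an illustration, code-generated data can be produced from a concise specification through the model's own operations; the following sketch assumes hypothetical operations such as createEmptyScore and insertNote:

// Sketch only: createEmptyScore and insertNote stand in for your data
// model's basic operations. Because the data goes through the model, it
// stays well-formed even as the model evolves.
function makeScoreWithNotes(noteSpecs) {
    var score = createEmptyScore();
    noteSpecs.forEach(function (spec) {
        insertNote(score, { midiPitch: spec.midiPitch, startTime: spec.startTime });
    });
    return score;
}

A test then owns its data: the specification lives next to the assertions, and there is nothing to migrate when a shared blob changes shape.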

Tests are still slow

Finally, on the performance side: although these blobs are often brought in to solve performance issues with test setup, if the test data is mutable then every modification made to the blob must be rolled back to keep each test isolated. This may undermine the expected performance benefits of the shared data.

It goes further: when the test data repository is actually a shared resource such as a database, it becomes a point of contention under heavy parallel testing, making the unit test suite run slowly.

Why is a blob of test data an acceptance-test anti-pattern as well?

While a unit test tests a system, an acceptance test tests a product.

A good acceptance test embodies the specification of the product in user terms.

When written against a blob of test data, an acceptance test becomes poorly specified. It starts depending on implicit properties of the test data.

Suggestions & Example

Write tests which directly construct their own starting state.

Unit-Test Example: specification-based setup

A concrete alternative is to write your unit test in this way:

  • a setup phase that constructs the domain objects out of a concise specification (a compressed version of your test data),
  • a test phase that operates on the resulting domain objects and verifies the test's expectations,
  • an unwind phase where the domain objects are destroyed.

An example in JavaScript:

function test_thatNotesCanBeDeletedWithADoubleClick() {
    withMidiEditorOnNotes(
        // specification for this test's data:
        [
            { midiPitch: 64, startTime: 7.0 },
        ],
        function (midiEditor, midiNotes) {
            doubleClick(midiEditor, timeToX(7.0), midiToY(64));
            verify(midiNotes.isEmpty());
        }
    );
}
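For reference, a withMidiEditorOnNotes helper could look roughly like this; createMidiEditor, insertNote and disposeMidiEditor are hypothetical stand-ins for your own model operations:

// Sketch of the helper used above: setup from the specification, run the
// test body, then unwind, even when the test body throws.
function withMidiEditorOnNotes(noteSpecs, testBody) {
    // setup phase: expand the concise specification into domain objects
    var midiEditor = createMidiEditor();
    noteSpecs.forEach(function (spec) {
        insertNote(midiEditor.notes, spec);
    });
    try {
        // test phase
        testBody(midiEditor, midiEditor.notes);
    } finally {
        // unwind phase: destroy the domain objects
        disposeMidiEditor(midiEditor);
    }
}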

Commentary on suggestion

For unit tests, this means constructing the smallest set of domain objects necessary for the system under test.

For acceptance tests, this means dedicated setup code that moves the product into the desired state via domain object manipulation. It is acceptable here to use dedicated shortcuts (using model operations) to bring the product efficiently into that state, as sketched below.
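A sketch of what this can look like, with hypothetical helpers: bringProductToProjectWithNotes uses model operations as a setup shortcut, while the rest of the test goes through user-level actions.

// Sketch only: the setup shortcut relies on model operations to reach the
// starting state quickly; the behavior under test is exercised the way a
// user would exercise it.
function acceptanceTest_deletingTheLastNoteLeavesAnEmptyArrangement() {
    var product = bringProductToProjectWithNotes([
        { midiPitch: 64, startTime: 7.0 },
    ]);
    userDoubleClicksOnNote(product, { midiPitch: 64, startTime: 7.0 });
    verify(userSeesAnEmptyArrangement(product));
}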

All in all, creating well-formed domain objects should not be an afterthought anyway. Types with good specifications and defaults that produce well-formed values allow domain object values to be created and used directly by tests.

This translates into domain objects that can be created anywhere (in C++: on the stack or on the heap) and that can live standalone without being part of a complex network of other objects, i.e. properties of a modular code base.

A proposal for tracking the health of a code base

September 13, 2014 by nicolas, tagged management and programming, filed under projects

Code as Liability, Features as Asset

For a peer-reviewed software development project (ideally at the level of a module or sub-module), we introduce a dashboard to track its health.

The dashboard is regularly compiled and updated and includes:

A balance listing

  • “mass of code” as liability [EWD.1]
  • “user features” as asset

An indicator:

  • “feature density”: the ratio of “user features” per unit of “mass of code”

It must be applied to peer-reviewed projects, where the review process exists to guarantee that code is and will remain easy to understand by all peers.

Of course, only features that are validated and tested in the software can be included in the dashboard.

Motivation

This metric encourages reducing the “mass of code” and/or producing a fine-grained list of “user features”, as both raise the feature density. It acts as both a trigger and a reward for the removal of cruft.

For a given module with a defined business scope, reducing the mass of code encourages finding simpler, more factored expressions of the user features in code and more compact documentation, as well as factoring out into other modules/products what is not directly linked to the domain.

For the same module, producing fine-grained lists of user features encourages understanding its scope, and can help break down development into smaller deliverable units.

Application

The metric is not intended for comparisons between software projects.

It is meant to be used by the developers themselves (software engineers, designers, documenters) to detect when and where they should direct their efforts. [EWD.2]

Tracking the derivative of the metric (its variation over time) makes it easier to act upon, as is the case for many other metrics.

Mass of code unit

“Mass of code” is deliberately vague. Define it as you see fit. I would for instance include the code, its tests, as well as its documentation. All of these need to be maintained in the name of the delivered features.

If code is peer reviewed by default, then using lines of code is reasonable: the review process already acts as a control on the readability of the code, and thus the lines of code are themselves somewhat normalized.

User features unit

Inside a focused module, user features can be considered equivalent and simply counted.
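Putting the two units together, here is a small worked example of the dashboard's indicator and its variation over time (all numbers are made up):

// Hypothetical snapshots of one module's dashboard, compiled at two dates.
// massOfCode counts code + tests + documentation, in lines.
var snapshots = [
    { date: "2014-06-01", userFeatures: 12, massOfCode: 6000 },
    { date: "2014-09-01", userFeatures: 13, massOfCode: 5200 },
];
// feature density, expressed in features per 1000 lines of "mass of code"
function featureDensity(s) {
    return s.userFeatures / (s.massOfCode / 1000);
}
// 12 / 6.0 = 2.0, then 13 / 5.2 = 2.5: the rising density rewards both the
// delivered feature and the removed cruft.
var trend = featureDensity(snapshots[1]) - featureDensity(snapshots[0]);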

References

[EWD.1]: Inspired by an E.W. Dijkstra quote:

From there it is only a small step to measuring “programmer productivity” in terms of “number of lines of code produced per month”. This is a very costly measuring unit because it encourages the writing of insipid code, but today I am less interested in how foolish a unit it is from even a pure business point of view. My point today is that, if we wish to count lines of code, we should not regard them as “lines produced” but as “lines spent”: the current conventional wisdom is so foolish as to book that count on the wrong side of the ledger. — E.W. Dijkstra [EWD1036]

[EWD.2]: simplicity is difficult

Firstly, simplicity and elegance are unpopular because they require hard work and discipline to achieve and education to be appreciated. — E.W. Dijkstra [EWD1243a]

Acknowledgement

Thanks to Julien Kirch for his feedback.

micros, a playground for demos

December 1, 2013 by nicolas, tagged programming, filed under tools

A little modern playground for writing demos on OSX or Windows, with an emphasis on making it easy for multiple individuals to collaborate.

The build system is self-contained; it enforces a strict coding standard, activates all warnings and automatically formats the code for you.

It standardizes on C++11 and OpenGL 3.2, and targets Darwin/OSX past 10.7 (Lion) as well as NT/Windows 7 and beyond.

It’s all on github, ready to be forked or downloaded.

Fork it, and compose your demo inside src/. Start with main.cpp to get an idea.

The runtime will later be improved with some non-essential convenience code for demo making, as well as user interaction and packaging. You will be able to incorporate these improvements into your own repo as long as you are willing to rebase onto upstream and do not modify the runtime or build system too much.

The runtime’s API is deliberately minimalistic; otherwise you would just use the underlying APIs directly.

Trace Event Profiling with chrome://tracing and SPDR

March 14, 2013 by nicolas, tagged programming, filed under tools

Knowing more about my code has been a part-time obsession for a couple of years now, both at work and outside of it. Can you really improve what you cannot measure or even visualize?

After some months I had settled on monitoring a number of variables in an HTML/JavaScript page hosted inside my programs, using the mongoose HTTP server.

However, this article from August 2012 made me reconsider writing my own visualization console.

Furthermore, I very much value reusing APIs and protocols (when reusing implementations is not necessary or even wanted), and so the trace event profiling API outlined there gave me the push I needed to start my own implementation, whose first version I am releasing today: uu.spdr-v0.1.0.

It is called SPDR. It allows you to label sections of your code and track the evolution of values with minimal overhead, and, importantly, it writes traces in a format compatible with Google’s trace viewer, which almost everyone already has at their desk in the form of Chrome.
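For illustration, the trace viewer consumes event streams roughly of this shape (the values below are made up): "ph" is the event phase ("B"/"E" open and close a labelled section, "C" is a counter sample), "ts" is a timestamp in microseconds, and tracked values travel in "args".

// Made-up excerpt of a trace in the trace-event format:
[
    { "name": "render", "cat": "frame", "ph": "B", "ts": 1000, "pid": 1, "tid": 1 },
    { "name": "render", "cat": "frame", "ph": "E", "ts": 2200, "pid": 1, "tid": 1 },
    { "name": "free audio buffers", "ph": "C", "ts": 2300, "pid": 1, "tid": 1,
      "args": { "count": 3 } }
]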

The SPDR library is lightweight and compiles for Windows, OS X and POSIX platforms. It is designed not to allocate more memory than you want it to use, and it supports concurrent traces.

Google’s trace viewer code is also available standalone, which should make it easy to embed in your app.

Sean Barrett's Judy vs Hashtable Performance Comparison

October 4, 2009 by nicolas, tagged programming

http://uucidl.com/git/?p=hashperf.git;a=summary

$ git clone http://uucidl.com/git/hashperf.git

Requirements: GNU Make; cc; gnuplot.

Introduction

A couple of years ago, I stumbled upon the programming works of Sean Barrett. Both he and Molly Rocket partner Casey Muratori exhibit a refreshingly pragmatic approach to programming and a will to fight the useless complexity that too often plagues our field. Checking out their forums is also highly recommended¹.

You might have read before about Judy, an associative array implementation by Doug Baskins, with peculiar performance claims.

Sean Barrett submitted Judy to his inquisitive eye and produced an enlightening article, "A Performance Comparison of Judy to Hash Tables".

A very interesting aspect of the comparison is that performance is not the sole focus: the article contrasts Judy’s 200k lines of code with the 200 lines of a simple hash table implementation.

¹ As well as checking out Sean Barrett’s excellent pure C libraries: stb_truetype, stb_image, stb_vorbis and stb.

So what about it?

Well, I just took a couple of hours to port Sean Barrett’s original Windows-based test suite to POSIX platforms.

Just do:

$ git clone http://uucidl.com/git/hashperf.git

A Makefile (for GNU Make) builds the program, launches the (lengthy) tests and creates the graphs; that’s about it.

To reproduce Sean Barrett’s results, just type:

$ make tests
$ make plot

Adding new implementations to the tests

You can easily add new implementations to the test suite by opening aatest.c:

Add your datastructure’s API to the top of the file:

// add your new datastructure code here

void *stlhashCreate(int size);
void stlhashFree(void* hash);
uint32 *stlhashFind(void *hash, uint32 key);
uint32 *stlhashInsert(void *hash, uint32 key);
int stlhashDelete(void *hash, uint32 key);
int stlhashCount(void *hash);

(...)

void reset(void)
{
    (...)
    if (stlhash != NULL)
        stlhashFree(stlhash);
    stlhash = stlhashCreate(1);
    (...)
}

Describe the new hashtable to the test suite (as an AArray):

AArray stlhash_desc = { "stlhash", stlhashInsert, stlhashDelete, stlhashFind, stlhashCount, stlhashMem };

Add it to the command line:

int main(int argc, char **argv)
{
    (...)
    if (!stricmp("judy", argv[i])) a = &judy_desc;
    else if (!stricmp("hash", argv[i])) a = &hash_desc;
    else if (!stricmp("bhash", argv[i])) a = &bhash_desc;
    else if (!stricmp("stlhash", argv[i])) a = &stlhash_desc;
    (...)
}

You also have to produce the datasets in the Makefile.

First add a new testing function and add it to "both". Don’t forget to change both the data structure name passed to $(TEST) and the name of the output file.

## testing functions

stlhash=./$(TEST) stlhash $(1) $(2) $(3) $(4) $(5) $(6) $(7) $(8) $(9) > $(OUTPUT)/stlhash$(1)$(2)$(3)$(4)$(5)$(6)$(7)$(8)$(9).txt
bhash=./$(TEST) bhash $(1) $(2) $(3) $(4) $(5) $(6) $(7) $(8) $(9) > $(OUTPUT)/bhash$(1)$(2)$(3)$(4)$(5)$(6)$(7)$(8)$(9).txt
(...)
both=\
    $(call judy,$(1),$(2),$(3),$(4),$(5),$(6),$(7),$(8),$(9)) ; \
    $(call hash,$(1),$(2),$(3),$(4),$(5),$(6),$(7),$(8),$(9)) ; \
    $(call stlhash,$(1),$(2),$(3),$(4),$(5),$(6),$(7),$(8),$(9)) ; \
    $(call bhash,$(1),$(2),$(3),$(4),$(5),$(6),$(7),$(8),$(9))

Plotting the results: you then also have to manually add the new datastructure to the plotting scripts, buildraw.gp and proberaw.gp.