Working with data in protovis – part 1 of 5

When I started using protovis, I had only a very basic knowledge of javascript. In theory that isn’t a problem: protovis is meant to be learned by example, and it has its own logic and structure, which differ from typical javascript code. So I started by reading and modifying examples, which was enough to do basic stuff.
But I soon felt limited by what hid behind a single property: data. I knew that protovis had lots of features to manipulate and process data but they were not obvious from the examples.

I mean,

var vis = new pv.Panel()
.width(150)
.height(150);

vis.add(pv.Bar)
.data([1, 1.2, 1.7, 1.5, .7, .3])
.width(20)
.height(function(d) d * 80);

vis.render();

Here, it’s pretty obvious that the bars represent the values 1, 1.2, 1.7, 1.5, 0.7 and 0.3 respectively. One can infer that the bars are 20 pixels wide and 80 times their value tall.

Protovis code doesn’t usually look like this “hello world” kind of example, though. It looks more like this:

/* Compute yield medians by site and by variety. */
function median(data) pv.median(data, function(d) d.yield);
var site = pv.nest(barley).key(function(d) d.site).rollup(median);
var variety = pv.nest(barley).key(function(d) d.variety).rollup(median);
/* Nest yields data by site then year. */
barley = pv.nest(barley)
    .key(function(d) d.site)
    .sortKeys(function(a, b) site[b] - site[a])
    .key(function(d) d.year)
    .sortValues(function(a, b) variety[b.variety] - variety[a.variety])
    .entries();
[. . .]
/* A panel per site-year. */
var cell = vis.add(pv.Panel)
    .data(barley)
    .height(h)
    .top(function() this.index * h)
    .strokeStyle("#999");

What just happened? pv.nest, key, rollup, sortKeys, entries – what could those do?

To go beyond merely touching up examples, and do your own visualizations from scratch, it is important to get a good grip on how to feed protovis with data. In order to do so, you need a few javascript notions.

Arrays, arrays, how do they work?

In javascript, an array is an ordered list of stuff.

In our initial example, we had one such list:

[1, 1.2, 1.7, 1.5, .7, .3]

Anything can be put in an array: numbers, strings, Booleans (true/false values), objects … including other arrays. All elements of an array don’t have to be of the same type. Arrays can be assigned to a variable.

var a = [1, 1.2, 1.7, 1.5, .7, .3];

Elements of the array can be accessed using the [] notation. In javascript, indices start at 0, so the first element of an array can be obtained like so:

a[0];

This returns 1. Javascript has many functions to create and manipulate arrays, which we will talk about later. For the time being, let’s look at arrays of arrays. If we wrote instead:

var a = [[1, 1.2], [1.7, 1.5], [.7, .3]];

a is now an array of arrays, or “multi-dimensional array”.

a[0] now evaluates to [1, 1.2]. To access the first number of the array, one has to write a[0][0], which will return the first element (1) of the first element ([1, 1.2]) of a.
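As a quick check of the indexing rules above, here is a minimal sketch in plain javascript that can be run in any console. The variable names (flat, nested and so on) are made up for this illustration; only the values come from the text.

```javascript
// The flat array and the array of arrays from the text.
var flat = [1, 1.2, 1.7, 1.5, .7, .3];
var nested = [[1, 1.2], [1.7, 1.5], [.7, .3]];

var first = flat[0];            // first element of the flat array
var firstPair = nested[0];      // first element of nested is itself an array
var firstNumber = nested[0][0]; // first element of that inner array
var lastNumber = nested[nested.length - 1][1]; // second element of the last pair

console.log(first);       // 1
console.log(firstPair);   // [ 1, 1.2 ]
console.log(firstNumber); // 1
console.log(lastNumber);  // 0.3
```

Note that length counts the elements of the outer array only: nested.length is 3, not 6.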

Javascript also has another kind of collection, often called an associative array (formally, it is a javascript object), where values are assigned to keys instead of indices. For instance,

var a = {yield: 27.00000, variety: "Manchuria", year: 1931, site: "University Farm"};

is an associative array. To access a value, one can use the . operator:

a.yield

will return 27.

a["yield"]

also works.

As with other types of values, it is possible to have an array of associative arrays. In fact, this is used quite often in protovis.
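To make the idea of an array of associative arrays concrete, here is a small sketch in plain javascript. The first record is the one from the text; the second record and the names records and yields are invented for this illustration and are not real data.

```javascript
// An array whose elements are associative arrays (objects).
var records = [
  {yield: 27.00000, variety: "Manchuria", year: 1931, site: "University Farm"},
  {yield: 30, variety: "Example", year: 1932, site: "Example Site"} // made-up record
];

// Both notations reach the same value.
console.log(records[0].yield);    // 27
console.log(records[0]["yield"]); // 27

// Loop through the array and collect one key from each element.
var yields = [];
for (var i = 0; i < records.length; i++) {
  yields.push(records[i].yield);
}
console.log(yields); // [ 27, 30 ]
```

Each element is first reached by its index, then the value is reached by its key.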

Protovis and arrays – deconstructing the first example

The reason why I introduced javascript arrays is that the data property requires an array. Protovis then loops through that array, performing operations on each of its elements. To that end, it uses things such as accessor functions and properties of an object called this.

To explain all of this let’s go back to the first example and analyse it line by line.

var vis = new pv.Panel()
  .width(150)
  .height(150);
vis.add(pv.Bar)
  .data([1, 1.2, 1.7, 1.5, .7, .3])
  .width(20)
  .bottom(0)
  .height(function(d) d * 80)
  .left(function() this.index * 25);
vis.render();

The first 3 lines create a panel, which is like the sheet of paper on which protovis will draw the chart. Its width and height properties must be set, as they are 0 by default, which would make the whole visualization invisible.

The next line adds a bar mark (pv.Bar) to the panel we’ve just created. Protovis will draw one bar per element of the data.

The line after specifies the data on which to work: here comes our array. Here, we have written the array literally in the data property, but nothing prevents us from assigning it to a variable first and passing the variable instead.

The next line, and the line with the bottom property, assign constant numbers to these properties. It means that all the bars will have a width of 20 pixels, and they will all be aligned with the bottom of the panel – that’s what

bottom(0)

does.

Now let’s look at the two remaining lines:

.height(function(d) d * 80)
.left(function() this.index * 25);

The first line uses an accessor function. It looks at the current element and performs an operation on it, the result of which will be the height of the bar for that element.

In proper javascript, we would have written:

function(d) {return d*80;}

but protovis uses a shorthand notation that allows us to omit curly braces and the return statement. By the way, d in the function is completely arbitrary, and could be any variable name –

function(a) a*80

also works. It’s just that the name of the variable between parentheses will represent the value of the current element.
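To see what an accessor function amounts to, here is a rough emulation in plain javascript, written with the full function syntax so that it runs anywhere. It is only a sketch of the idea of calling the function once per element. It is not how protovis is implemented, and the names height and heights are chosen for this illustration.

```javascript
var data = [1, 1.2, 1.7, 1.5, .7, .3];

// The accessor from the example, in full javascript syntax.
function height(d) {
  return d * 80;
}

// Call the accessor once per element, the way the text describes:
// "d" takes the value of the current element each time.
var heights = [];
for (var i = 0; i < data.length; i++) {
  heights.push(height(data[i]));
}

console.log(heights.length); // 6, one height per data element
console.log(heights[0]);     // 80, from the first element (1 * 80)
```

Renaming d to a, or to any other name, changes nothing: the name only exists inside the function.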

The second line uses the this object. this represents what protovis is working on at the moment, and it has properties that can be used. The most commonly used is index: this.index returns the position of the current element in its array, so it is going to be: 0 for the first bar, 1 for the next one, etc.

So this line specifies that each bar should start 25 pixels further from the left border of the panel than the previous one: 0 for the first bar, 25 for the second, and so on.

You may wonder, why not write

.left(this.index * 25);

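and omit the function()? Well, function() means that the content of the property gets re-evaluated for each bar. If we had omitted it, this.index * 25 would have been computed only once, when the line is read and before any bar exists. At that point this does not refer to a bar, so this.index is not a bar position (in a plain script it is typically undefined, which makes the product NaN), and that single value would have been used for all the bars.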
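The difference between a function and a value computed once can be emulated in a few lines of plain javascript. This is only a sketch under stated assumptions: evaluate and mark are hypothetical names made up for this illustration, not part of protovis, and the constant 0 * 25 simply stands in for "a value computed before any bar exists".

```javascript
// Emulation only: "evaluate" and "mark" are hypothetical, not protovis API.
function evaluate(property, mark, d) {
  // A function is re-run for every element, with "this" bound to the mark.
  // Anything else is a constant and is used as-is.
  return typeof property === "function" ? property.call(mark, d) : property;
}

var data = [1, 1.2, 1.7, 1.5, .7, .3];
var leftAsFunction = function() { return this.index * 25; };
var leftAsConstant = 0 * 25; // computed once, before any bar exists

var positions = [];
var constants = [];
for (var i = 0; i < data.length; i++) {
  var mark = {index: i}; // stand-in for the bar being drawn
  positions.push(evaluate(leftAsFunction, mark, data[i]));
  constants.push(evaluate(leftAsConstant, mark, data[i]));
}

console.log(positions); // [ 0, 25, 50, 75, 100, 125 ]
console.log(constants); // [ 0, 0, 0, 0, 0, 0 ]
```

With the function, each bar gets its own position. With the constant, every bar receives the same value and they all pile up in the same place.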

By the way, instead of writing the height property as it is, we could have written:

.height(function()[1, 1.2, 1.7, 1.5, .7, .3][this.index] * 80)

Using an accessor function is shorter and clearer.

Next: Multi-dimensional arrays, inheritance and hierarchy

 

Working with data in protovis

For the past year or so I have been dabbling with protovis. I don’t have a heavy CS background but protovis is supposedly easy to pick up for people like me, who are vaguely aware that computers can make calculations but who need to check the manual for the most mundane programming instructions.

What I found was that while it’s reasonably easy to modify the most basic examples to make stuff happen, it is much harder to understand or adapt the more complex ones, let alone to create a fairly complex visualization from scratch.

The stumbling block for me was the use of the method data. Data is used to feed all the other protovis methods with, well, data. In the simplest examples, the data which is passed is very plain and simple and so easy to understand. But, for slightly more advanced uses, the shape of data is increasingly complex and the very powerful methods that protovis uses to process and reshape data are just out of reach for a noob.

So I started documenting my struggle with data, first for my own use, and eventually realized I could share what I learned. This is it.

I split this tutorial into 5 parts.

  1. First, we’ll look at the humble arrays and how protovis works with them.
  2. Then, we’ll talk about multi-dimensional arrays, associative arrays, hierarchy and inheritance.
  3. Third, we’ll take a break from protovis and look at the javascript methods to work with arrays, such as sorting.
  4. [update] Since I first published the array part I wrote a supplement
  5. We’ll then check out the powerful protovis data processing methods, such as those that reshape complex arrays.
  6. Finally, we’ll see how data must be prepared to work with the built-in layouts, such as treemaps, force-directed layouts etc.

And as a bonus, I have also deconstructed several interesting (but not immediately accessible) examples from the gallery:

To make the best use of this material, it would be helpful to know a bit about protovis. The best ways to get started are:

Now that that’s said, on to the 1st part!