Working with data in protovis – part 1 of 5

When I started using protovis, I had only a very basic knowledge of javascript. In theory that isn’t a problem: protovis is meant to be learned by example, and it has its own logic and structure, which differs from typical javascript code. So I started by looking at and modifying examples, which was enough to do basic stuff.
But I soon felt limited by what hid behind a single property: data. I knew that protovis had lots of features to manipulate and process data but they were not obvious from the examples.

I mean,

var vis = new pv.Panel()
.width(150)
.height(150);

vis.add(pv.Bar)
.data([1, 1.2, 1.7, 1.5, .7, .3])
.width(20)
.height(function(d) d * 80);

vis.render();

Here, it’s pretty obvious that the bars represent the values 1, 1.2, 1.7, 1.5, 0.7 and 0.3 respectively. One can infer that the bars are 20 pixels wide and 80 times their value high.

Real protovis code doesn’t usually look like this “hello world” kind of example, though. It looks more like this:

/* Compute yield medians by site and by variety. */
function median(data) pv.median(data, function(d) d.yield);
var site = pv.nest(barley).key(function(d) d.site).rollup(median);
var variety = pv.nest(barley).key(function(d) d.variety).rollup(median);
/* Nest yields data by site then year. */
barley = pv.nest(barley)
    .key(function(d) d.site)
    .sortKeys(function(a, b) site[b] - site[a])
    .key(function(d) d.year)
    .sortValues(function(a, b) variety[b.variety] - variety[a.variety])
    .entries();
[. . .]
/* A panel per site-year. */
var cell = vis.add(pv.Panel)
    .data(barley)
    .height(h)
    .top(function() this.index * h)
    .strokeStyle("#999");

What just happened? pv.nest, key, rollup, sortKeys, entries – what could all of those do?

To go beyond merely touching up examples, and do your own visualizations from scratch, it is important to get a good grip on how to feed protovis with data. In order to do so, you need a few javascript notions.

Arrays, arrays, how do they work?

In javascript, an array is an ordered list of stuff.

In our initial example, we had one such list:

[1, 1.2, 1.7, 1.5, .7, .3]

Anything can be put in an array: numbers, strings, Booleans (true/false values), objects … including other arrays. All elements of an array don’t have to be of the same type. Arrays can be assigned to a variable.

var a = [1, 1.2, 1.7, 1.5, .7, .3];

Elements of the array can be accessed using the [] notation. In javascript, indices start at 0, so the first element of an array can be obtained like this:

a[0];

This returns 1. Javascript has many functions to create and manipulate arrays, which we will talk about later. For the time being, let’s look at arrays of arrays. If we wrote instead:

var a = [[1, 1.2], [1.7, 1.5], [.7, .3]];

a is now an array of arrays, or “multi-dimensional array”.

a[0] is now worth [1, 1.2]. To access the first number of the array, one has to write a[0][0], which will return the first element (1) of the first element ([1, 1.2]) of a.
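
To recap the above, here is a small sketch in plain javascript (no protovis needed). The variable names flat, mixed and nested are only there for illustration.

```javascript
// A flat array: elements are read by position, starting at 0.
var flat = [1, 1.2, 1.7, 1.5, .7, .3];
var first = flat[0];       // the first element
var count = flat.length;   // how many elements the array holds

// An array may mix types, and may contain other arrays.
var mixed = [1, "two", true, [3, 4]];

// An array of arrays: the first index picks an inner array,
// the second index picks an element inside that inner array.
var nested = [[1, 1.2], [1.7, 1.5], [.7, .3]];
var firstPair = nested[0];        // [1, 1.2]
var firstNumber = nested[0][0];   // 1
var lastNumber = nested[2][1];    // 0.3

console.log(first, count, firstPair, firstNumber, lastNumber);
```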

Javascript also has another kind of collection, often called an associative array (in javascript terms, an object), where values are assigned to keys instead of an index. For instance,

var a = {yield: 27.00000, variety: "Manchuria", year: 1931, site: "University Farm"};

is an associative array. To access a value, one can use a . operator:

a.yield

will return 27.

a["yield"]

also works.

Like other variable types, it is possible to have an array of associative arrays. In fact, this is used quite often in protovis.
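
As an illustration, here is what an array of associative arrays could look like in plain javascript. The first record is the one shown above; the second one is made up purely for the example, as are the variable names.

```javascript
// The first record comes from the text above.
// The second record is invented for illustration only.
var records = [
  {yield: 27.00000, variety: "Manchuria", year: 1931, site: "University Farm"},
  {yield: 30, variety: "Example variety", year: 1932, site: "Example site"}
];

// Pick one record by position, then one value by key.
var firstYield = records[0].yield;     // dot notation
var firstSite = records[0]["site"];    // bracket notation, same effect

// Collect one value per record with an ordinary loop.
var years = [];
for (var i = 0; i < records.length; i++) {
  years.push(records[i].year);
}

console.log(firstYield, firstSite, years);
```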

Protovis and arrays – deconstructing the first example

The reason why I introduced javascript arrays is that the data property requires an array. Protovis then loops through that array, performing operations on each of its elements. To that end, it uses things such as accessor functions and properties of an object called this.

To explain all of this let’s go back to the first example and analyse it line by line.

var vis = new pv.Panel()
  .width(150)
  .height(150);
vis.add(pv.Bar)
  .data([1, 1.2, 1.7, 1.5, .7, .3])
  .width(20)
  .bottom(0)
  .height(function(d) d * 80)
  .left(function() this.index * 25);
vis.render();

The first 3 lines create a panel, which is like the sheet of paper on which protovis will draw the chart. Its width and height properties must be filled, as they are 0 by default which would make the whole visualization invisible.

The next line adds a bar mark (pv.Bar) to the panel we’ve just created; protovis will draw one bar per element of the data.

The line after specifies the data on which to work: here comes our array. Here, we have written the array literally in the data property, but nothing prevents us from assigning it to a variable first and passing the variable instead.

The next line, and the line with the bottom property, assign constant numbers to these properties. It means that all the bars will have a width of 20 pixels, and they will all be aligned with the bottom of the panel – that’s what

bottom(0)

does.

Now let’s look at the two remaining lines:

.height(function(d) d * 80)
.left(function() this.index * 25);

The first line uses an accessor function. It looks at the current element and performs an operation on it; the result becomes the height of that element’s bar.

In proper javascript, we would have written:

function(d) {return d*80;}

but protovis uses a shorthand notation that allows us to omit curly braces and the return statement. By the way, d in the function is completely arbitrary, and could be any variable name –

function(a) a*80

also works. It’s just that the name of the variable between parentheses will represent the value of the current element.
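
To see the idea outside of protovis, here is a rough sketch in standard javascript. It uses the built-in map method as a stand-in for the loop that protovis performs over the data; the names barHeight, sameBarHeight and heights are only for illustration.

```javascript
var data = [1, 1.2, 1.7, 1.5, .7, .3];

// Standard javascript form of the accessor: curly braces and an explicit return.
function barHeight(d) {
  return d * 80;
}

// The parameter name is arbitrary; this function does exactly the same thing.
function sameBarHeight(a) {
  return a * 80;
}

// Stand-in for the loop over the data: map calls the function once per
// element and collects the results in a new array.
var heights = data.map(barHeight);
var sameHeights = data.map(sameBarHeight);

console.log(heights);
```

Each entry of heights is the corresponding data value multiplied by 80, and sameHeights holds the same numbers.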

The second line uses the this object. this represents what protovis is working on at the moment, and it has properties that can be used. The most commonly used is index: this.index returns the position of the current element in its array, so it is going to be: 0 for the first bar, 1 for the next one, etc.

So this line specifies that each bar should start 25 pixels further from the left border of the panel than the previous one.

You may wonder, why not write

.left(this.index * 25);

and omit the function()? Well, function() means that the content of the property gets re-evaluated for each bar. If we had omitted it, this.index * 25 would have been computed only once, before protovis starts going through the data (at a point where this does not yet stand for a bar), and that single value would have been used for all the bars.
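
The difference between passing a function and passing a value can be mimicked with a toy example in standard javascript. This is only a simplified stand-in that I wrote for illustration, not how protovis is actually implemented; resolveProperty and the {index: i} object are hypothetical names.

```javascript
// Toy stand-in for resolving a property; NOT protovis's real code.
// If the property is a function, it is called once per element, with "this"
// set to an object carrying the element's index. Otherwise the value
// that was passed in is simply reused for every element.
function resolveProperty(property, data) {
  return data.map(function (d, i) {
    if (typeof property === "function") {
      return property.call({index: i}, d);
    }
    return property;
  });
}

var data = [1, 1.2, 1.7, 1.5, .7, .3];

// A function: re-evaluated for every bar.
var leftFromFunction = resolveProperty(function () { return this.index * 25; }, data);

// A plain number: already computed before any looping, then shared by every bar.
var fixedLeft = 50;
var leftFromValue = resolveProperty(fixedLeft, data);

console.log(leftFromFunction); // [0, 25, 50, 75, 100, 125]
console.log(leftFromValue);    // [50, 50, 50, 50, 50, 50]
```

With the function, every bar gets its own position; with the plain number, all the bars would be drawn on top of each other.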

By the way, instead of writing the height property as it is, we could have written:

.height(function()[1, 1.2, 1.7, 1.5, .7, .3][this.index] * 80)

Using an accessor function is shorter and clearer.

Next: Multi-dimensional arrays, inheritance and hierarchy

11 thoughts on “Working with data in protovis – part 1 of 5”

  1. Hi Jerome,

    just one short remark: you mention that protovis uses a shorthand notation that allows us to omit curly braces and the return statement.

    In fact, this is not a protovis shorthand, but a feature of Javascript 1.8, called expression closures. See this link for more info: https://developer.mozilla.org/en/new_in_javascript_1.8 (or just Google: javascript expression closures).

    Perhaps protovis has some hack available so that this also works with Javascript 1.7, but I am not sure about that…

  2. thanks for the precision, that must explain why I never found an allusion to “protovis shorthand” :)
    I’m mostly using protovis with chrome which has javascript 1.5, so protovis must implement this as you say

  3. Amazing tutorial!!!!

    I would highly recommend to not use the “syntactic sugar” supplied by protovis and advise people to rewrite everything that looks like (function(d) d * 80) as plain JavaScript code (function(d) {return d*80;}). This way, the tag type can equal “text/javascript” instead of “text/javascript+protovis” and various debuggers will highlight the error in the user’s source file and NOT the protovis library … this makes debugging much less frustrating.

  4. thanks – Yes, for these same reasons I try to use it as little as possible, but it’s good to know it’s there, especially when reading someone else’s code :)

  5. Is this the only difference between “text/javascript” and “text/javascript+protovis”? So does that mean that it is possible to use the Protovis library with code that is pure JavaScript?
    Thanks for the great tutorials.
    Pete.

  6. afaik, using text/javascript+protovis instead of text/javascript allows you to omit semi-colons or curly braces + return statements in functions.

    but standard code written with semi-colons, return statements etc. will work fine within the text/javascript+protovis type.
