The top TV earners are not found on tabloids.

This is my contribution to Flowing Data challenge: visualize this / top tv earners.

When I looked at the dataset provided by Nathan I first wondered what was missing. Quite a few stars were missing in the list. I added Simon Baker whose Mentalist gets a huge audience and who is said to be getting $450k per show. Others who probably should be there include Forest Whittaker for the Criminal Minds spinoff, Jim Belushi in the Defenders, Kate Walsh in Private Practice or Elizabeth Michell in V. I don't think any of them is settling for less than $15k per episode.

Another thing were the audiences. There are a few good sources for that (and some less good) so I tried to approximate how much viewers a show would generate for their channel. I came up with hourly viewers because a show that brings 10 million people to one channel for half an hour should be equivalent to one that gets 5 million viewers for an hour, right?
In fact, I wanted to come up with a proxy of how much value a TV show was generating and how much of that went into the pockets of the cast. It's probably possible to come up with a precise answer with better data than could be assembled by an amateur over a lunch break, audience is probably a part of the equation, but the short story doesn't require much calculation: actors only get the crumbs of a very fat pie.
A show like Grey's Anatomy got $329.1m dollars of ad revenue in 2009, which I assume is an US-only figure, the show being syndicated in many countries, unfortunately including France. And that excludes sales of DVD, paid donwloads and other streams of revenue. Out of that, Patrick Dempsey only got $6m. Now $6m is a lot of money, but actors of successful, established shows don't get a very good deal here. In the last seasons of "Friends", the 6 main actors all got $1m per episode, which seemed fair in retrospect. Sarah Jessica Parker's salary even reached $3.2m per Sex and the City episode, but she was co-producer. And this was then.

The chart can be divided in quadrants. On the lower-right corner, Charlie Sheen, who is the only one there with an old-school deal.
On the higher-left corner, the work horses - stars of the crime shows with stellar ratings, who could ask for more.
On the very bottom-left, those who are happy to be there. Etc.

Now some stars of shows that get well over 10m viewers get "only" around $100k per episode. So obviously the revenue stream must go somewhere else! my money is on the writers :)

Also for fun, I computed the ratio of their salary to the length of each episode. I once calculated that I earned about 83 cents a minute, which sounds pretty ridiculous compared to Charlie Sheen's $62500!

La france qui exporte

This week, I was made aware of a new set of maps by French ministry of Foreign Trade, called cartographie de la France qui exporte (map of France exports) (link). Since I’m interested in the topic and that I know that French public services have killer cartographers I was eager to see what was so exciting about the first set of online maps on French exports.

I was a little underwhelmed to be honest. Online here meant static pdf files, although this is a dataset that just begs to be explored and manipulated.
On top of that, those where basic choropleth maps with markers such as this one here:

Now this map has two problems. First, it’s a choropleth with a discrete scale, but the values of adjacent areas can vary a lot. So, if you look at this portion of the map, what can be deduced on the values? not much I’m afraid.

Second, it’s difficult to compare the marks on the map. Which region has the biggest? the smallest? how do two specific regions compare? with this representation, this type of question is even more difficult to answer than with a table.

Also these charts answer one partial question. So this one, here, shows which region exports most food products. But to where? and how about the imports and balance? now if one given view was the most relevant and could illustrate some important finding, it can be highlighted but here the website gives us collections of many of such maps. As a citizen I’m leaving no more informed than I was.

Not being the one to criticize without proposing an alternative, I whipped out an interactive exploratory tool of France trade flows.

(The interactive vis is too wide to conveniently fit in a blog, but clicking on the image will open it in a new tab).

I don’t have access to the same dataset so I can’t show a strict equivalent. My data comes from COMTRADE, the UN database of trade flows, and shows all exports and imports to France in 2009. They are not broken down by region or by type of company, but I got the flows by partner country and product category.
The idea is that one can select something on one treemap to update the other. Also, it’s possible to alternate between a categorical view (where all groups of products and continents look neatly separated) and a view of the balance, which quickly shows which products or which countries get the bulk of French trade.


(technical explanation follows for those interested in the code proper)
Now following last week’s tutorial, of course it had to be done in protovis.
Actually it illustrates some interesting principles of working with arrays, trees, maps etc.

First, I want to do as much data manipulation as possible in protovis as opposed to manually. So, my source data for the treemap is stored as an array of associative arrays, which is probably the preferred form in protovis. This is no different than, for instance, Protovis’s barley example.
Now how do you get something of the shape -

var data=
[{com:"02",cat:0,cou:4,con:3,imp:0,exp:101421},
{com:"03",cat:0,cou:4,con:3,imp:9716,exp:0},
{com:"04",cat:0,cou:4,con:3,imp:0,exp:9272355},
{com:"05",cat:0,cou:4,con:3,imp:531587,exp:0},
{com:"07",cat:0,cou:4,con:3,imp:0,exp:83360},
...
{com:"08",cat:0,cou:4,con:3,imp:0,exp:2779}

to something shaped like a tree like:

var tree=
{0: {
       02: 101421,
       03: 0,
       04: 9272355,
       05: 0,
       07:83360},
...

The solution is to use the rollup method.

First, if you look at my individual records, they are of the shape:

{com:"04",cat:0,cou:4,con:3,imp:0,exp:9272355}

where com is commodity code, cat is product category, cou is country, con is continent, imp is imports and exp is exports.

For any country + commodity combination, there will be only one record.
What I’m interested to get in the tree I will use for the treemap are exports. That is what will determine the size of the leaves of the tree.

So…
first I am going to nest my array:

var byProduct=pv.nest(data) 
	.key(function(d) {return d.cou;})
	.key(function(d) {return d.cat;})
	.key(function(d) {return d.com;})

once I have written this I could follow up with a .entries() statement which would return me a nested array, or with rollup() which could give me the tree I need.
Since, again, there is only one record for a combination of country (cou) and commodity (com), I can use any aggregation I want.

I define this function:

function rollup(data) {return pv.sum(data, function(d) {return d.exp;});} 

It returns the sum of all the export values. Since there is just one record, what it does is that it gives me the one export value I need in a tree form.

So the complete statement is:

function rollup(data) {return pv.sum(data, function(d) {return d.exp;});} 

var byProduct=pv.nest(data) 
	.key(function(d) {return d.cou;})
	.key(function(d) {return d.cat;})
	.key(function(d) {return d.com;})
	.rollup(rollup)

This creates a tree, nested by country, then by product category, then by commodity. The corresponding values are the exports.

now creating my treemap data dynamically saves me a ton of hassle compared to trying to come up with a data file of the right shape and size, not mentioning the calculation errors which creep in each manual manipulation !

Another point of interestingness: how I computed the data to create the bar charts on the side.
For the left treemap (and left bar chart) the user has selected a country. (and for the right ones, it’s a given product, but let’s focus on the left side, the reasoning is the same for the other side anyway).

so first I am going to take the tree we made earlier and just look at the selected country. We can do that with a statement like:

myProductTree=byProduct[selCountry];

(so now we have a tree with just 2 levels, product category and commodity).

Now I can’t run pv.nest and all that on a tree. I need an array! so I have to use flatten to turn that section of the tree into a bona fide array which I will be able to further process.

catsByCountry = pv.flatten(myProductTree).key("cat").key("com").key("exp").array(); 

Here, note that the arguments: “cat”, “com”, “exp” are completely arbitrary. But, since I’m recreating the array almost as it originally was, I might as well use the same names for the keys.

So now, I have like a little subset of my original dataset, only the records of the selected country.
I can now proceed to sum exports by categories using a standard rollup method, just as we’ve seen here.

catsByCountry = pv.nest(catsByCountry).key(function(d) {return d.cat;}).rollup(rollup);

Conveniently, the rollup function that I defined earlier sums the records! and here I do need summing, not any aggregation.

The problem is that the rollup() method creates an associative array, and if I need to use that in a bar chart I need a proper array! so, I use pv.values() which does just that, it creates an array out of the values of an associative array.

catsByCountry = pv.values(catsByCountry);

Now the values can vary a lot in absolute terms depending on the selected country. This is why in the actual bar chart, I use pv.normalize() to have only values from 0 to 1 which are much more convenient to plot.

vis.add(pv.Bar)
	.data(function() pv.normalize(catsByCountry))

one last thing, to save space in the data set (which means: bandwidth + loading time) I’ve used short keys in my data file, and I’ve used codes for countries, commodities and the like.

so I have this:

{com:"04",cat:0,cou:4,con:3,imp:0,exp:9272355}

instead of

{
    com:"CEREALS,CEREAL PREPRTNS.",
    cat:"Food and live animals",
    cou:"Algeria",
    con:"Africa",
    imp:0,exp:9272355}

to get the names of the countries, categories etc. I have in my data file variables that associate, say, a country code to its long name, its continent etc.
so I can have to write things like:

countries["4"].name+" ("+continents[countries["4"].continent]+")"

instead of something simpler, but it’s a good trade-off because writing those names in full in the original dataset inflates the size of the file to megabytes (there are approx 10.000 records).

Working with data in protovis: part 5 of 5

previous: reshaping complex arrays (4/5)

Working with layouts

In this final part, we’re going to look at how we can shape our data to use the protovis built-in layouts such as stacked areas, treemaps or force-directed graphs.
This is not a tutorial on how to use layouts stricto sensu, and I advise anyone interested to first look at the protovis documentation to see what can be done with this and to understand the underlying concepts.

But if there is one thing to know about layouts, it’s that they allow you to create non-trivial visualizations in even less code than regular protovis, provided that you pass them data in a form they can use, and this is precisely where we come in.

Three great categories of layouts

Currently, there are no fewer than 13 types of layouts in Protovis. Fortunately, there are examples for all of them in the gallery.
There are layouts for:

In addition, there are layouts like pv.Layout.Bullet which require data to have a certain specific shape but the example from the gallery is very explicit. (et tu, Horizon layout).

Arrays of data

In order to work with this kind of layout, the simplest thing is to put your data in a 2-dimensional array:

var data=[
   [8,3,7,2,5],
   [9,6,1,7,4],
    ...
   [7,4,3,6,8]
];

For the grid layout, this gives you an array of cells divided in columns (number of elements in each line) and rows (number of lines).
The idea of the grid layout is that your cells are automatically positioned and sized, so afaik the only thing you can do is add a mark such as a pv.Bar which would fill them completely, but which you could still style with fillStyle or strokeStyle. You can’t really access the underlying data with functions but you can use methods that rely on default values, like adding labels.

For instance, you can use it to generate a QR code:

var qr=[
"000000000000000000000000000",
"011111110001010100011111110",
"010000010101001110010000010",
"010111010000010100010111010",
"010111010111011110010111010",
"010111010010000001010111010",
"010000010110110010010000010",
"011111110101010101011111110",
"000000000011100100000000000",
"011111011110101110101010100",
"000010101001010111101000100",
"010101111001001011111010110",
"001011000100010101010100010",
"001100010111011010010101110",
"010101100110001101001010100",
"010011010011111111100110110",
"010111101010100101000010010",
"010100110010111101111101000",
"000000000101010111000111000",
"011111110100011001010111110",
"010000010000110011000110110",
"010111010110001011111111000",
"010111010101101100110101110",
"010111010100000111001001010",
"010000010111010101101110010",
"011111110101001100011111110",
"000000000000000000000000000",
].map(function(i) i.split(""));

var vis = new pv.Panel()
    .width(216)
    .height(216);
vis.add(pv.Layout.Grid)
    .rows(qr)
 	.cell.add(pv.Bar)
 	    .fillStyle(pv.colors("#fff", "#000"))
     ;
vis.render();
(BTW, this is the QR code to this page)

On line 29, I’m using a map function to turn this array of strings, which is easier and shorter to type, into a bona fide 2-dimensional array.

That’s all there is to grids, of all the layouts they are among the easiest to reproduce with regular protovis.

Now, stacks.
The easiest way to use them is to pass them 2-dimensional arrays. Now it doesn’t have to be arrays of numbers, it can be arrays of associative arrays in case you need to do something exotic. But for the following examples let’s just assume you don’t. Here is how you’d do a stacked area, stacked columns and stacked bars respectively:

var data=[
[[1000,1200,1500,1700]]
[[100,500,300,200]]
]
var vis=new pv.Panel().width(200).height(200);
vis.add(pv.Layout.Stack)
    .layers(data)
    .x(function() 50*this.index)
    .y(function(d) d/20)
    .layer.add(pv.Area)

all you need is to feed the layers, x, y properties of your stack, then say what you want to add to your layers.
Now, columns:

vis.add(pv.Layout.Stack)
    .layers(data)
    .x(function() 50*this.index)
    .y(function(d) d/20)
    .layer.add(pv.Bar).width(40)

and finally, bars:

vis.add(pv.Layout.Stack)
    .layers(data)
    .orient("left")
    .x(function() 50*this.index)
    .y(function(d) d/20)
    .layer.add(pv.Bar).height(40)

For bars, there is a little trick here. I specify that the layer orientation is horizontal (“left”) and I change the height instead of the width of the added pv.Bar.
And that all there is. You can create various streamgraphs by playing with the order and offset properties of the stack but this doesn’t change anything to the data structure, so we’re done here.

Representing networks

Protovis provides 3 cool layouts to easily exhibit relationships between nodes: arc diagrams, matrix diagrams and force-directed layouts.
The good news is that the shape of the data required by those three layouts is identical.

They require an array that correponds to the nodes. This can be as simple as a pv.range(), or as sophisticated as an array of associative arrays if you want to style your network graph according to several potential attributes of the node.

And they also require an array for the links. This array has a more rigid form, it must be an array of associative arrays of the shape: {source: #, target: #, value: #} where the values for source and target correspond to the position of a node in the node array, and value indicates the strength of the link.

So let’s do a simple one.

var nodes=pv.range(6); // why more complex, right?
var links=[
{source:0, target:1, value:2},
{source:1, target:2, value:1},
{source:1, target:3, value:1},
{source:2, target:4, value:4},
{source:3, target:5, value:1},
{source:4, target:5, value:1},
{source:1, target:5, value:3}
]
var vis = new pv.Panel()
    .width(200)
    .height(200)
    ;
var arc = vis.add(pv.Layout.Arc)
    .nodes(nodes)
    .links(links)
	.bottom(100)
arc.link.add(pv.Line);
arc.node.add(pv.Dot)
    .size(50)
vis.render();

Here, by varying the strength of the link, the thickness of the arcs changes accordingly. The nodes are left unstyled, had we passed a more complicated dataset to the nodes array, we could have changed their properties (fillStyle, size, strokeStyle, labels etc.) with appropriate accessor functions.

With little modifications we can create a force-directed layout and a matrix diagram.

var force = vis.add(pv.Layout.Force)
    .nodes(nodes)
    .links(links);

force.link.add(pv.Line);

force.node.add(pv.Dot)
	.size(50)
	.anchor("center").add(pv.Label)
		.text(function() this.index);

vis.render();

Here I labelled the nodes so one can tell which is which. This is done by adding a pv.Label to the pv.Dot that’s attached to the node, just like with any other mark.

var Matrix = vis.add(pv.Layout.Matrix)
	.nodes(nodes)
	.directed(true)
	.links(links)
	.top(20).left(20)

Matrix.link.add(pv.Bar)
    .fillStyle(function(d) pv.Scale.linear(0, 2, 4)
      .range('#eee', 'yellow', 'green')(d.linkValue))

Matrix.label.add(pv.Label).text(function() Math.floor(this.index/2))

vis.render();

For the matrix things are slightly more complex than for the previous 2. Here I opted for a directed matrix, as opposed to a bidirectional one: this means that each link is shown once, to its source from its target, and not twice (ie from its target back to its source) which is the default.
I chose to color the bar attached to my links (which are cells of the matrix) according to the strength of my links. Again, if my nodes field was more qualified, I could have used these properties.

Finally, we’ve added labels to the custom property Matrix.label. Only, the labels are numbered from 0 to 11 so to get numbers from 0 to 5 for both rows and columns I used Math.floor(this.index/2) (integer part of half of this number).

Hierarchized data

Like for networks, the shape of the data we can feed to treemaps, icicles and other hierarchical representation doesn’t change. So once you have your data in order, you can easily switch representations.

Essentially, you will be passing a tree of the form:

var myTree={
   rootnode: {
      node: {
      ...  
         node: {
            leaf: value,
            leaf: value,
            ...
            leaf: value
         },
      ...  
}

The protovis examples use the hierarchy of flare source code as an example, which really shows what can be done with a treemap and other tree represenations.

For our purpose we are going for a simpler tree, inspired by the work of Periscopic on congressspeaks.com which Kim Rees showed at Strata.
Kim presentation featured tiny treemaps that showed the voting record for a congressperson, and whether they had voted for or against their party.

So let’s play with the voting record of an hypothetic congressperson:

var hasVoted={
	didnt: 100,
	voted: {
	    yes: {
	        yesWithParty: 241,
	        yesAgainstParty: 23
	    },
	    no: {
	        noWithParty: 73,
	        noAgainstParty: 5
	    }
	}
};

Once you have your tree, you will need to pass it to your layout using pv.dom, like this:

pv.dom(hasVoted).root("hasVoted").nodes()

Based on that let’s do two hierarchical representations.
Let’s start with a tree:

var vis = new pv.Panel()
    .width(500)
    .height(200)
    ;
var tree = vis.add(pv.Layout.Tree)
    .nodes(pv.dom(hasVoted).root("hasVoted").nodes())
    .depth(40)
    .breadth(100)
    .top(30)
    .right(100)
    ;
tree.link.add(pv.Line);
tree.node.add(pv.Dot)
    .size(function(n) n.nodeValue)
	.anchor("center").add(pv.Label).textAlign("center").text(function(n) n.nodeName)
vis.render();

And here is the result:

There are many styling possibilities obviously left unexplored in this simple example (you can control properties of the tree.link, tree.node, tree.labels which we didn’t use here, etc.), but this won’t change much as far as data are concerned.

Now let’s try a treemap with the same dataset.

var vis = new pv.Panel()
    .width(400)
    .height(200)
    ;

var tree = vis.add(pv.Layout.Treemap)
	.width(200).height(200)
    .nodes(pv.dom(hasVoted).root("hasVoted").nodes())
    ;

tree.leaf.add(pv.Panel)
	.fillStyle(function(d) d.nodeName=="didnt"?"darkgrey":d.nodeName.slice(0,3)=="yes"?
	d.nodeName.slice(-9)=="WithParty"?"powderblue":"steelblue":
	d.nodeName.slice(-9)=="WithParty"?"lightsalmon":"salmon")

vis.add(pv.Panel)
	.data([
		   {label:"yes with party", 	color: "powderblue"},
		   {label:"yes against party", 	color: "steelblue"},
		   {label:"no with party", 		color: "lightsalmon"},
		   {label:"no against party", 	color: "salmon"},
		   {label:"didn't vote", 		color: "darkgrey"}
		   ])
	.left(220)
	.top(function() 50+20*this.index)
	.height(15)
	.width(20)
	.fillStyle(function(d) d.color)
	.anchor("right").add(pv.Label).textAlign("left").text(function(d) d.label)

vis.render();

and what took the longest part of the code was making the legend.

Here is the outcome:

Working with data in protovis – part 4 of 5

Previous: array functions in javascript and protovis

Reshaping complex arrays

This really is what protovis data processing is all about.
In today’s tutorial, I am going to refer extensively to protovis’s Becker’s Barley example. One reason for that is that it’s also used in the API documentation of the methods we are going to cover, and also because I am posting a line-by-line explanation of this example that you can refer to.

So far we’ve seen that :

  • Associative arrays are great as data elements, as their various values can be used for various attributes.
    For instance, if the current data element is an associative array of this shape:

    { yield: 27.00000, variety: "Manchuria", year: 1931, site: "University Farm" }

    one could imagine a bar chart where the length of the bar would come from the yield, their fillStyle color from the variety, the label from the site, etc.

  • arrays of associative arrays are very practical to manipulate thanks to accessor functions.
    An array of the shape:

    var barley = [
      { yield: 27.00000, variety: "Manchuria", year: 1931, site: "University Farm" },
      { yield: 48.86667, variety: "Manchuria", year: 1931, site: "Waseca" },
      { yield: 27.43334, variety: "Manchuria", year: 1931, site: "Morris" },
      { yield: 39.93333, variety: "Manchuria", year: 1931, site: "Crookston" },
      { yield: 32.96667, variety: "Manchuria", year: 1931, site: "Grand Rapids" },
      { yield: 28.96667, variety: "Manchuria", year: 1931, site: "Duluth" },
      { yield: 43.06666, variety: "Glabron", year: 1931, site: "University Farm" },
      { yield: 55.20000, variety: "Glabron", year: 1931, site: "Waseca" },
      { yield: 28.76667, variety: "Glabron", year: 1931, site: "Morris" },
      { yield: 38.13333, variety: "Glabron", year: 1931, site: "Crookston" },
      { yield: 29.13333, variety: "Glabron", year: 1931, site: "Grand Rapids" },
      { yield: 29.66667, variety: "Glabron", year: 1931, site: "Duluth" },
      { yield: 35.13333, variety: "Svansota", year: 1931, site: "University Farm" },
    …
      { yield: 29.33333, variety: "Wisconsin No. 38", year: 1932, site: "Duluth" }
    ]

    could be easily sorted according to any of the keys – yield, variety, year, site, etc.

  • it is easy to access the data of an element’s parent, and in some cases it can greatly simplify the code.

Nesting

For this last reason, you may want to turn one flat array of associative arrays into an array of arrays of associative arrays. This process is called nesting.

Simple nesting

If you turn a single array like the one on the left-hand side to an array of arrays like on the right-hand side, you could easily do 3 smaller charts, one next to the other, by creating them inside of panels. You could have some information at this panel level (for instance the variety) and the rest at a lower level.

Fortunately, there are protovis methods that can turn your flat list into a more complex array of arrays. And since protovis methods are meant to be chained, you can even go on and create arrays of arrays of arrays of arrays if needs be.
Even better – combined with the other data processing functions, you don’t only change the structure of your array, but you can also filter and control the order of the elements to show everything that you want and only what you want.

And how complicated can this be?
To do the above, all you have to type is

barley=pv.nest(barley).key(function(d) d.variety).entries();

What this does is that it nests your barley array, according to the variety key. entries() at the end is required to obtain the array of arrays needed.

Here is an example of what can be done with both kinds of data structures in just a few lines of code (which won’t include the data proper. The long, flat array is stored in the variable barley, as above).
Without nesting:

var vis = new pv.Panel()
    .width(600)
    .height(200)
;
vis.add(pv.Bar)
	.data(barley)
	.left(function() this.index*5)
	.width(4)
	.bottom(0)
	.height(function(d) d.yield*2)
	.fillStyle(function(d) d.year==1931?"steelBlue":"Orange")
;
vis.render();

As the pv.Bar goes through the array, there is not much it can do to structure it. We can just size the bars according to the value of yield, and color them according to another key (here the year).

Now using nesting:

barley2=pv.nest(barley).key(function(d) d.variety).entries();
barley2=pv.nest(barley).key(function(d) d.variety).entries();
var vis = new pv.Panel()
    .width(710)
    .height(150)
;
var cell=vis.add(pv.Panel)
	.data(barley2)
	.left(function() this.index*70)
	.width(70)
	.top(0)
	.height(150);
cell.anchor("top").add(pv.Label).textAlign("center").font("9px sans-serif").text(function(d) d.key)
cell.add(pv.Bar)
		.data(function(d) d.values)
		.left(function() 5+this.index%6*10)
		.width(8)
		.bottom(0)
		.height(function(d) d.yield*2)
		.fillStyle(function(d) d.year==1931?"steelBlue":"Orange")
		.add(pv.Label).text(function(d) d.site).textAngle(-Math.PI/2)
			.textAlign("left").left(function() 15+this.index%6*10).textStyle("#222")
	;
vis.render();

Here we used the same simple nesting command as above (the original example uses a more complicated one which allows for more refinement in the display). This structure allows us to create first panels, which we can style by displaying the name of the sites for instance, then, within these panels, the corresponding bars.

Doing this with the data in its original form would have been possible, but would have required writing a much longer program. So the whole idea of nesting is to take some time to plan the data structure once, so that the code is as short and useful as possible.

Going further with nesting

However, it is possible to go beyond that:

  • by nesting further, which can be done by adding other .key() methods:

    barley=pv.nest(barley)
      .key(function(d) d.variety)
      .key(function(d) d.year)
      .entries();

And/or

  • by sorting keys or values using the sortKeys() and sortValues() methods, respectively.

    For instance, we can change the order in which the variety blocks are displayed with sortKeys():

    barley=pv.nest(barley)
      .key(function(d) d.variety)
      .entries();
    barley=pv.nest(barley)
      .key(function(d) d.variety)
      .sortKeys()
      .entries();

By using sortKeys without argument, the natural order is used (alphabetical, since our key is a string). But we could provide a comparison function if we wanted a more sophisticated arrangement.

Nesting and hierarchy

If you run the double nesting command we discussed above,

barley=pv.nest(barley)
  .key(function(d) d.variety)
  .key(function(d) d.year)
  .entries();

you’ll get as a result something of the form:

var barley=[
  {key:"Manchuria", values: [
    {key:"1931", values: [
      {site:"University farm", variety: "Manchuria", year: 1931, yield: 27},
      {site:"Waseca", variety: "Manchuria", year: 1931, yield: 48.86667},
      {site:"Morris", variety: "Manchuria", year: 1931, yield: 27.43334},
      {site:"Crookston", variety: "Manchuria", year: 1931, yield: 39.93333},
      {site:"Grand Rapids", variety: "Manchuria", year: 1931, yield: 32.96667},
      {site:"Duluth", variety: "Manchuria", year: 1931, yield: 28.96667}
    ]},
    {key:"1932", values: [  
      {site:"University farm", variety: "Manchuria", year: 1932, yield: 26.9},
      {site:"Waseca", variety: "Manchuria", year: 1932, yield: 33.46667},
      {site:"Morris", variety: "Manchuria", year: 1932, yield: 34.36666},
      {site:"Crookston", variety: "Manchuria", year: 1932, yield: 32.96667},
      {site:"Grand Rapids", variety: "Manchuria", year: 1932, yield: 22.13333},
      {site:"Duluth", variety: "Manchuria", year: 1932, yield: 22.56667}
    ]}
  ]},
  {key: "Glabron", ...
]

and so on and so forth for all the varieties of barley. Now how can we use this structure in a protovis script? why not use multi-dimensional arrays instead, and if so, how would the code change?

Well. You’d start using this structure by creating a first panel and and passing it the nested structure as data.
Assuming your root panel is called vis, you’d start likewise:

vis.add(pv.Panel)
    .data(barley)

now, since barley has been nested first by variety, it is now an array of 10 elements. You are going to create 10 individual panels. At some point you should worry about dimensioning and positioning them. But here we are only focusing on passing data to subsequent elements.

Next, you are going to create another set of panels (or any mark, really, this doesn’t change anything for the data)

.add(pv.Panel)
    .data(function(d) d.values)

This is how you drill down to the next level of data, by using an accessor function with the key “values”.
Congratulations! you have created 2 panels in each of our 10 individual panels, one per year.

Finally, you are going to create a final mark (let’s say, a pv.Bar)

.add(pv.Bar)
    .data(function(d) d.values)

Again, you use an accessor function of the same form. This will create a bar chart with 6 bars.
The data element corresponding to each bar is of the form:

{site:"University farm", variety: "Manchuria", year: 1932, yield: 26.9}

So, when you style the chart, you can access these properties with accessor functions, and write for instance:

    .height(function(d) d.yield)
    .add(pv.Label).text(function(d) d.variety)

etc.

To sum it up: you can create a hierarchical structure in protovis that corresponds to the shape of your nested array by adding elements and passing data using an accessor function with the key “values”.
At the lowest level of your structure you can access all the properties of the original array using accessor functions.

Now, what if instead we used a multi-dimensional, normal array without keys and values? don’t they have structure and hierarchy, too?
This is not only possible, but also advised when your dataset is getting really big, as you would plague your users with annoying loading times. This changes the structure of the code though.

An equivalent multi-dimensional array would be something like:

var yields =
[ // this is the level of the variety
  [ // this is the level of the year
    [ 27, 48.86667, 27.43334, 39.93333, 32.96667, 28.96667], 
    [ 26.9, 33.46667, 34.36666, 32.96667, 22.13333, 22.56667]
  ],
  [ 
    [ 43.06666, 55.2, 28.76667, 38.13333, 29.13333, 29.66667],
    [ 36.8, 37.73333, 35.13333, 26.16667, 14.43333, 25.86667]
  ],
  [
    [ 35.13333, 47.33333, 25.76667, 40.46667, 29.66667, 25.7],
    [ 27.43334, 38.5, 35.03333, 20.63333, 16.63333, 22.23333]
  ],
  [
    [ 39.9, 50.23333, 26.13333, 41.33333, 23.03333, 26.3],
    [ 26.8, 37.4, 38.83333, 32.06666, 32.23333, 22.46667]
  ],
  [
    [ 36.56666, 63.8333, 43.76667, 46.93333, 29.76667, 33.93333],
    [ 29.06667, 49.2333, 46.63333, 41.83333, 20.63333, 30.6]
  ],
  [
    [ 43.26667, 58.1, 28.7, 45.66667, 32.16667, 33.6],
    [ 26.43334, 42.2, 43.53334, 34.33333, 19.46667, 22.7]
  ],
  [
    [ 36.6, 65.7667, 30.36667, 48.56666, 24.93334, 28.1],
    [ 25.56667, 44.7, 47, 30.53333, 19.9, 22.5]
  ],
  [
    [ 32.76667, 48.56666, 29.86667, 41.6, 34.7, 32],
    [ 28.06667, 36.03333, 43.2, 25.23333, 26.76667, 31.36667]
  ],
  [
    [ 24.66667, 46.76667, 22.6, 44.1, 19.7, 33.06666],
    [ 30, 41.26667, 44.23333, 32.13333, 15.23333, 27.36667]
  ],
  [
    [ 39.3, 58.8, 29.46667, 49.86667, 34.46667, 31.6],
    [ 38, 58.16667, 47.16667, 35.9, 20.66667, 29.33333]
  ]
]

and that’s the whole lot. It is indeed shorter. now this array is only the yields, you may want to create an array of the possible values of varieties, sites and years for good measure.

var varieties=["Manchuria", "Glabron", "Svansota", "Velvet", "Trebi",
     "No. 457", "No. 462", "Peatland", "No. 475", "Wisconsin No. 38"], 
    sites=["University Farm", "Waseca", "Morris", "Crookston", "Grand Rapids", "Duluth"],
    years=[1931,1932];

And by the way, it is very possible to create these arrays out of the original array using the map() method or equivalent.

how can we create an equivalent structure?
we start like the above:

vis.add(pv.Panel)
     .data(yields)

Likewise, our 3-dimensional array is really an array of 10 arrays of 2 arrays of 6 elements. So we are also creating 10 panels. Let’s continue and create panels for the years:

.add(pv.Panel)
    .data(function(d) d)

To drill down one level in an array, you have to use this form. you say that you are giving the children of your object what’s inside the data property of their parent.

So naturally, you follow by

.add(pv.Bar)
    .data(function(d) d)

now how you style your bars will be slightly different than before. What you passed your first panel was an array of yields. So that’s what you get now from your data. If you want something else, you’ll have to get it with this.index for instance.

    .height(function(d) d) // that's the yield
    .add(pv.Label).text(function() varieties[this.index])

All in all it’s trickier to work with arrays. The code is less explicit, and if you change one array even by accident, you’ll have to check that others are still synchronized. But it could make your vis much faster.

Aggregating

Sometimes, what you want out of an array is not a more complex array, but a simpler list of numbers. For instance, what if you could obtain the sum of all the values in the array for such or such property? This is also possible in protovis, and in fact, it looks a lot like what we’ve done. The difference is that instead of using the method entries(), we will use the method rollup().

Let’s suppose we have a flat array that looks like this: these are scores of students on 3 exams.

var scores=[
{student:"Adam", exam:1, score: 77},
{student:"Adam", exam:2, score: 34},
{student:"Adam", exam:3, score: 85},
{student:"Barbara", exam:1, score: 92},
{student:"Barbara", exam:2, score: 68},
{student:"Barbara", exam:3, score: 97},
{student:"Connor", exam:1, score: 84},
{student:"Connor", exam:2, score: 54},
{student:"Connor", exam:3, score: 37},
{student:"Daniela", exam:1, score: 61},
{student:"Daniela", exam:2, score: 58},
{student:"Daniela", exam:3, score: 64}
]

Now, we would like to get, in one simple object, the average for each student.
We know we could reshape the array if we wanted by using pv.Nest and entries():

pv.nest(scores).key(function(d) d.student).entries()

This would be something of the shape:

[{key:"Adam", values:[
    {exam:1, score: 77, student: "Adam"},
    {exam:2, score: 34, student: "Adam"},
    {exam:3, score: 85, student: "Adam"}
    ]
  },
  {key:"Barbara", values:[
    {exam:1, score: 92, student: "Barbara"},
    {exam:2, score: 68, student: "Barbara"},
    {exam:3, score: 97, student: "Barbara"}
   ]
  },
etc.

Useful, for instance, if we’d want to chart the progress of each student separately.

Now if instead of using entries() at the end, we use rollup(), we could get this:

{
Adam: 65.33333333333333
Barbara: 85.66666666666667
Connor: 58.333333333333336
Daniela: 61}

The exact statement is

pv.nest(scores)
  .key(function(d) d.student)
  .rollup(function(data) pv.mean(data, function(d) d.score))

To understand how this works, it helps to visualize what the pv.nest would have returned if we had asked for entries.
What rollup does is that it would go through each of the values that correspond to the keys, and return one aggregate value, depending on the function.

For the first student, “Adam”, the corresponding values array is like this:

[
    {exam:1, score: 77, student: "Adam"},
    {exam:2, score: 34, student: "Adam"},
    {exam:3, score: 85, student: "Adam"}
    ]

so rollup will just look at each element of this array and apply the function.
This is what (data) in “function(data)” corresponds to.
Next, we tell protovis what to do with these elements. Here, we are interested in the average, so we take pv.mean (not pv.average, remember?)
However, we can’t directly compute the average of an array of associative arrays – we must tell protovis exactly what to average. This is why we use an accessor function, function(d) d.score.

Of course, pv.mean used in this example can be replaced by just about any function.

In the name of clarity, especially if there is only one property that can be aggregated, you can declare a function outside of the rollup() method. This is useful if you are going to aggregate your array by different dimensions:

function meanScore(data) pv.mean(data, function(d) d.score);
var avgStudent=pv.nest(scores)
  .key(function(d) d.student)
  .rollup(meanScore);
var avgExam=pv.nest(scores)
  .key(function(d) d.exam)
  .rollup(meanScore);

Flattening

Protovis also provides methods that turn a “nested” array back into a flat array. And methods that turn a normal array into a tree.
The main advantage of having a flat array is that you can nest it in a different way. This is useful, for instance, if you got your data in a nested form that doesn’t work for you. Likewise, a tree is easier to reshape in protovis than an array.

To create a flat array out of a nested one, you have to use pv.flatten and specify all the keys of the array and conclude by array().

barley=pv.flatten(barley).key("variety").key("site").key("year").key("yield").array()

It’s important to note that you need to specify all the keys, not just the keys that correspond to actual nesting. So again, if you start from a flat array, and you do

barley=pv.nest(barley).key("variety").entries()

to reverse this, you’ll have to enter the full formula, using key four times:

barley=pv.flatten(barley).key("variety").key("site").key("year").key("yield").array()

Finally, pv.tree – well, I haven’t seen this method used outside the documentation. It’s not used in any live example, not covered by any question in the forum, and I haven’t found any trace of it in the internet. So I’d rather leave you with the explanation in the documentation which is fairly clear than come up with my own. If you find yourself in a situation like in the documentation, where you have a flat array of associative arrays, which have a property that could be interpreted as a hierarchy, then you could use this method to turn your array in something more useful to protovis.

Putting it all together

Instead of coming up with a specific new example for this section I refer you to my explanation of the Becker’s Barley example.
On the same subject, see a comparison of how to re-create Becker’s Barley with protovis and Tableau

next: working with layouts

Protovis: analysis of the Becker’s barley example sketch

We are taking a look at the Protovis Becker’s Barley example.

<html>
  <head>
    <title>Barley Yields</title>
    <link type="text/css" rel="stylesheet" href="ex.css?3.2"/>
    <script type="text/javascript" src="../protovis-r3.2.js"></script>
    <script type="text/javascript" src="barley.js"></script>
    <style type="text/css">

#fig {
  width: 350px;
  height: 833px;
}

    </style>
  </head>
  <body><div id="center"><div id="fig">
    <script type="text/javascript+protovis">

/* Compute yield medians by site and by variety. */
function median(data) pv.median(data, function(d) d.yield);
var site = pv.nest(barley).key(function(d) d.site).rollup(median);
var variety = pv.nest(barley).key(function(d) d.variety).rollup(median);

/* Nest yields data by site then year. */
barley = pv.nest(barley)
    .key(function(d) d.site)
    .sortKeys(function(a, b) site[b] - site[a])
    .key(function(d) d.year)
    .sortValues(function(a, b) variety[b.variety] - variety[a.variety])
    .entries();

/* Sizing and scales. */
var w = 242,
    h = 132,
    x = pv.Scale.linear(10, 70).range(0, w),
    c = pv.Colors.category10();

/* The root panel. */
var vis = new pv.Panel()
    .width(w)
    .height(h * pv.keys(site).length)
    .top(15)
    .left(90)
    .right(20)
    .bottom(25);

/* A panel per site-year. */
var cell = vis.add(pv.Panel)
    .data(barley)
    .height(h)
    .top(function() this.index * h)
    .strokeStyle("#999");

/* Title bar. */
cell.add(pv.Bar)
    .height(14)
    .fillStyle("bisque")
  .anchor("center").add(pv.Label)
    .text(function(site) site.key);

/* A dot showing the yield. */
var dot = cell.add(pv.Panel)
    .data(function(site) site.values)
    .top(23)
  .add(pv.Dot)
    .data(function(year) year.values)
    .left(function(d) x(d.yield))
    .top(function() this.index * 11)
    .size(12)
    .lineWidth(2)
    .strokeStyle(function(d) c(d.year));

/* A label showing the variety. */
dot.anchor("left").add(pv.Label)
    .visible(function() !this.parent.index)
    .left(-1)
    .text(function(d) d.variety);

/* X-ticks. */
vis.add(pv.Rule)
    .data(x.ticks(7))
    .left(x)
    .bottom(-5)
    .height(5)
    .strokeStyle("#999")
  .anchor("bottom").add(pv.Label);

/* A legend showing the year. */
vis.add(pv.Dot)
    .extend(dot)
    .data([{year:1931}, {year:1932}])
    .left(function(d) 170 + this.index * 40)
    .top(-8)
  .anchor("right").add(pv.Label)
    .text(function(d) d.year);

vis.render();

    </script>
  </div></div></body>
</html>

Data:

var barley = [
  { yield: 27.00000, variety: "Manchuria", year: 1931, site: "University Farm" },
  { yield: 48.86667, variety: "Manchuria", year: 1931, site: "Waseca" },
  { yield: 27.43334, variety: "Manchuria", year: 1931, site: "Morris" },
etc.

The code begins by defining a function that will be used throughout the program.

function median(data) pv.median(data, function(d) d.yield);

this function expects an array of associative arrays which have the key “yield”. What it does is that it returns the median of all the values of the key yield.
If run against barley, it will return the median yield of all observations. If run against a subset of barley, it will return the median of the yield of that subset.
The point of this function is to simplify the writing of otherwise obfuscated statements.

Now let’s see it put to good use.

var site = pv.nest(barley).key(function(d) d.site).rollup(median);

The output of this statement is easier to understand than itself. It returns an associative array, with each site as a key, and their corresponding value is the median yield for all observations for that site.

{
  Crookston: 39.03333,
  Duluth: 28.533335,
  Grand Rapids: 23.983335,
  Morris: 34.699995,
  University Farm: 31.383335,
  Waseca: 47.949995}

pv.nest(barley) means that we are going to create an associative array based on barley with a hierarchy. (tree)
.key(function(d) d.site) means that they first key in the hierarchy of that tree will be the site. So, we are going to run an operation on all the entries of “barley” with the same site.
That operation is called by .rollup(median ) at the end of the line: this crunches all the entries and replace them by the result of the function “median” applied to all those records.

var variety = pv.nest(barley).key(function(d) d.variety).rollup(median);

This does the same as above, but by variety instead of by site.

{
  Glabron: 32.4,
  Manchuria: 30.96667,
  No. 457: 33.966665,
  No. 462: 30.45,
  No. 475: 31.066665,
  Peatland: 32.383335,
  Svansota: 28.550005,
  Trebi: 39.199995,
  Velvet: 32.149995000000004,
  Wisconsin No. 38: 36.95
}
/* Nest yields data by site then year. */
  barley = pv.nest(barley)
    .key(function(d) d.site)
    .sortKeys(function(a, b) site[b] - site[a])
    .key(function(d) d.year)
    .sortValues(function(a, b) variety[b.variety] - variety[a.variety])
    .entries();

This will transform the variable barley, which is now a flat list of records, into a tree form.
pv.nest(barley) indicates we are turning barley into a tree.
.key(function(d) d.site) says that the first order of the hierarchy will be “site”.
So our tree should look like:

[{key:”Crookston”, values: {….},
  {key:”Duluth”, values:{…},
…

Well, the order of the keys may not be alphabetical, thanks to the next statement.
sortKeys sorts the keys using a comparator function: this function(a,b) thing goes through all the pairs of keys and will order them according to that function, so if for a key the value of site is higher than for another one, it will be put first.
As a result, the first key should be Waseca, then Crookston, etc.
The next order of hierarchy will be after year. No sortkeys here, so the values of keys will just be presented in natural order.
So our tree will look like:

[
{ key: “Waseca”, values: [
	{key: 1931, values: [ … ]},
	{key: 1932, values: [ … ]}
  ]
}, 
{key: “Crookston”, values: [
	{key: 1931, values: [ … ]},
	{key: 1932, values: [ … ]}
  ]
}, etc.

The next statement will rank the values.
What is in that values field (where the ellipses are) will now be ranked using another comparator function, on variety this time.
The sortKeys statement worked on keys: “Crookston”, “Duluth”, etc. so one could write directly site[a] and get a value.
But this sortValues statement works on entries (fields like: { yield: 48.86667, variety: “Manchuria”, year: 1931, site: “Waseca” } ), so one can’t write variety[a] but instead, variety[a.variety].

Finally, the last statement, entries(), says that the values thing should be filled by the actual records, as opposed to using rollup in order to crush an aggregate value from the records.

So the final tree will look like this:

[{key: “Waseca”, values: [
 {key: 1931, values: 	[{site: "Waseca", variety: "Trebi", year: 1931, yield: 63.8333},
 {site: "Waseca",variety: "Wisconsin No. 38", year: 1931, yield: 58.8},
 {site: "Waseca", variety: "No. 457", year: 1931, yield: 58.1},
…
]},
{key: 1932, values: 	[…]}
]}, 
{key: “Crookston”, values: […]}, 
…
{key: “Grand Rapids”, values: […]}]

The next block of statements is more straightforward.

/* Sizing and scales. */
var w = 242,
    h = 132,
    x = pv.Scale.linear(10, 70).range(0, w),
    c = pv.Colors.category10();

w and h are constants for the width and heights of the cells,
x is a scale to transform the yields in horizontal coordinates, so 10 will be represented at the left-most sied of the cell and 70 at the right-most side,
c is the standard color palette.

Now, we create the panels.

/* The root panel. */
var vis = new pv.Panel()
    .width(w)
    .height(h * pv.keys(site).length)
    .top(15)
    .left(90)
    .right(20)
    .bottom(25);

/* A panel per site-year. */
var cell = vis.add(pv.Panel)
    .data(barley)
    .height(h)
    .top(function() this.index * h)
    .strokeStyle("#999");

First we create the root panel. The height of the panel is determined by the number of sites.
Since site is an associative array, we first derive an array of the same length with pv.keys(site), which contains the names of the keys of that associative array (in plain English, the names of the sites). What we need is just the number of them, that’s what the length property is for.
(btw, we could have simply written barley.length)
We multiply that number by the height of each cell to get the height of the root panel.

We then create panels inside that rootpanel.
Note the simplicity with which data is pushed into those panels:

    .data(barley)

Now barley is an array of 6 objects. So we are creating 6 panels, and their data element will be of the form:

{key: “Waseca”, values: [{key: “1931”, values:  [entries]}, {key:”1932”, values: [entries]}]}

Since these cells will be positioned from the root panel, their top value is determined using a simple function of this.index, so the 1st one will be right on top, the next one will be at h pixels from the top, the next one at 2*h pixels, etc.

/* Title bar. */
cell.add(pv.Bar)
    .height(14)
    .fillStyle("bisque")
  .anchor("center").add(pv.Label)
    .text(function(site) site.key);

This just adds a title to the cells. Here, we use a bar, but it could have been a panel.
We only specify the height: the top and left value are supposed to be 0 and the width is that of the parent. We choose the svg color “bisque” as the background color and we add a label in the middle.
Here’s how we obtain the text. Yes it is the site name, but this has nothing to do with the choice of “site” as the variable name in the accessor function.
That function just takes data from the data property, which is directly inherited from the parent. This is a complex associative array, but at its first level, the key property corresponds to the site name. So, function(site) site.key returns the site name.

/* A dot showing the yield. */
var dot = cell.add(pv.Panel)
    .data(function(site) site.values)
    .top(23)
  .add(pv.Dot)
    .data(function(year) year.values)
    .left(function(d) x(d.yield))
    .top(function() this.index * 11)
    .size(12)
    .lineWidth(2)
    .strokeStyle(function(d) c(d.year));

Here’s the data representation.
We first add a group of panels called dot to the panel cell.
This is a group, and not a single panel, because of what they get through the data method: the content of the values key of their parent.
Again, the data property of their parent, cell, is of the form:

{key: site name, values: [{key: “1931”, values: […]}, {key:”1932”, values: […]}]}

So if we isolate the content of the values key, we have an array of 2 values:

[{key: “1931”, values:[…]} , 
{key: “1932”, values:[…]} ]

Therefore, this statement creates 2, not 1, panels.
But those panels are superposed: the only positioning instruction is that they are 23 pixels from the top, so they are both taking the full width of the cell panel, and go to the bottom of that panel as well.
The system will first draw the first one, then the second one on top of that.

Then, we add a dot object to those dot panels, which is a series of circles.
How many dots will there be in those series? This, again, depends on the contents of the data method.
What we get this time is:

    .data(function(year) year.values)

What this means is that we are looking at the data element of the parent, and we are taking what’s behind values.

The data element of the parent was of the form:

{key: year name, values: 
  [
   {site: site name, variety: variety name, year: year name, yield: yield value},
   {site: site name, variety: variety name, year: year name, yield: yield value}, 
   {site: site name, variety: variety name, year: year name, yield: yield value},
…
   {site: site name, variety: variety name, year: year name, yield: yield value}
  ]
}

So what we’re getting is the array of 10 entries that were behind values.

.left(function(d) x(d.yield))

This means that each mark will be positioned from the left, according to the value of its yield property, once transformed by our scale.

.top(function() this.index * 11)

And they are simply positioned vertically depending on this.index, so one on top of the other. Remember, this is relative to the dot panel, not to cell.

.size(12)
.lineWidth(2)

We are determining a size and a line width, but no fill style, so we are drawing rings, not discs. Even if they are superposed, the back one should still be visible.

.strokeStyle(function(d) c(d.year));

Finally, we are determining the color of the ring. For that, we extract the year of the current entry using d.year. It is converted to a color using the palette c.
There are other ways to get the year, but this is the simplest to write.

Now if we consider the statement in its entirety again:

/* A dot showing the yield. */
var dot = cell.add(pv.Panel)
    .data(function(site) site.values)
    .top(23)
  .add(pv.Dot)
    .data(function(year) year.values)
    .left(function(d) x(d.yield))
    .top(function() this.index * 11)
    .size(12)
    .lineWidth(2)
    .strokeStyle(function(d) c(d.year));

What is really assigned to dot is not the first item created (panels) but the dot chart proper. This is important to note that for the rest:

/* A label showing the variety. */
dot.anchor("left").add(pv.Label)
    .visible(function() !this.parent.index)
    .left(-1)
    .text(function(d) d.variety);

What we’re trying to do here is to write legends for the dot charts. We don’t want to do this for each year: the labels would be on top of each other, and less legible than if they were just one group of labels. This is why we use this visible statement. this.parent refers to a the parent panel of dot, so this.parent.index is worth 0 for the first panel, and 1 for the second. So here, !this.parent.index will be false for all values other than 0, so there will be only 1 set of labels, even if we add a year for instance.

Then, although we said we were adding the labels just to the left of the dot (this is what dot.anchor(“left”) does), what we really want is to have them on the left of the cell.
That’s why we use

.left(-1)
.text(function(d) d.variety)

runs on the data provided by the parent, in this case the dot chart. So it’s just the variety of the current entry.

/* X-ticks. */
vis.add(pv.Rule)
    .data(x.ticks(7))
    .left(x)
    .bottom(-5)
    .height(5)
    .strokeStyle("#999")
  .anchor("bottom").add(pv.Label);

This just adds a series of 7 ticks to the bottom of the chart. They are rules, but their height has been limited to 5 pixels. They start from below the chart – bottom(-5) – and we add the ticks to that below with anchor(“bottom”). Since all the values are 2-digit numbers, we can leave the default versions for the pv.Label objects, and not go into tickFormat or anything fancy.

/* A legend showing the year. */
vis.add(pv.Dot)
    .extend(dot)
    .data([{year:1931}, {year:1932}])
    .left(function(d) 170 + this.index * 40)
    .top(-8)
  .anchor("right").add(pv.Label)
    .text(function(d) d.year);

Finally, the legend of the chart, in form of a dot chart.
The first .extend(dot) statement just says that we are copying the properties of the other dot chart. So, if we decide to change the size of the rings or their thickness, it will be reflected in the legend.
Here, we just pass the data manually. Note that this is added to the root panel (vis), so top(-8) means that it appears over the main chart.

In this example, the authors have started from an unformated, flat list of 120 entries.
Then, they thought of the shape their final visualization will take and the hierarchy of objects needed to accommodate that:
• two superposed dot series, showing the various varieties for a given year/site combination (dot),
• each inside a panel, one for each year,
• which would be part of a parent panel, for each site (cell)
• which would be stacked vertically inside the root panel (vis).

So in order to do that seamlessly, they had to prepare a data variable with that hierarchy:
First by site, then by year, then by variety.

This is what the complex pv.nest command does at the beginning.
Once this is taken care of, the various objects can be added one to the other naturally.

Working with data in protovis – interlude: protovis nesting vs tableau

Protovis, like Tableau, are based on the grammar of graphics framework. In a nutshell, in both environments, a chart designer can map visual attributes (such as x or y dimension, color, shape, etc.) to dimensions of data.

The flat file which Becker’s Barley is based on can be used in Tableau public nearly as is.
Here’s a size-by-size comparison of the results:

How it’s done in Protovis

In protovis, the flat file is nested several times, so that its various elements can be called in a hierarchy from series of dots per variety, to panel per sites to a grander panel. Legend and ticks are added by hand for a perfect finish. Still, some careful planning is required to prepare the data file and to adjust the various elements (choice of colors, sizes, etc.)

How it’s done in Tableau


The data file, unsurprisingly, has 4 dimensions: site, variety, year and yield. In tableau terms, yield is a measure (a numerical dimension) while the other 3 are categorical dimensions. With a few clicks, it is possible to get a result which is similar to the original vis in Protovis. We assign yield to column (horizontal attribute), site and variety to row in that order. We also untick aggregate measures in Analysis, so we get little circles and not big bars. Here, I’ve manually sorted the sites and the varieties.

Conclusion

It is much, much easier to achieve a similar result with Tableau, however using protovis provides a finer control.

Working with data in protovis – part 3 of 5

Previous : Multi-dimensional arrays, inheritance and hierarchy

Short interlude: what can be done with arrays in javascript?

Now that we have a grasp on how arrays work and how they can be used in protovis, let’s take a step back and look at some very useful methods for working with arrays in standard javascript.

Sorting arrays

Using a method called, well, sort(), javascript arrays can be reordered in a variety of ways.

var a = [1, 1.2, 1.7, 1.5, .7, .3];
a.sort();

Without any argument, this will sort the array in ascending order: [0.3, 0.7, 1, 1.2, 1.5, 1.7].

However, we can reorder the array differently with a comparison function, such as:

a.sort(function(a,b) b-a)

Note that this is protovis notation. In traditional javascript, you’d have written a.sort(function(a,b) {return b-a;});

This would sort a in descending order ([1.7, 1.5, 1.2, 1, 0.7, 0.3]). Here is how it works.

The comparison function takes 2 elements in the array. If the first element (corresponding to the first argument, a) should appear first, the function must return a negative number. If the result is positive, the second element is appearing first. With our example, b-a is negative when the first element is greater than the second. So, this function sorts an array in descending order.

Sort also works with arrays of associative arrays. For instance:

var data = [
  {key:"a", value:1},
  {key:"b", value:1.2},
  {key:"c", value:1.7}, 
  {key:"d", value:1.5},
  {key:"e", value:.7},
  {key:"f", value:.3}
];

data.sort(function(a,b) b.value-a.value);

Here, we have to specify which criterion is used to sort the array. In this example, we sort it in descending order by value. But we could have sorted it in ascending order by key, for instance.

The method reverse() turns an array upside down.

var a = [1, 1.2, 1.7, 1.5, .7, .3];

a.reverse(); // gives [0.3, 0.7, 1.5, 1.7, 1.2, 1]

So, a.sort().reverse() will also sort that array in decreasing order.

Extracting sub-arrays

The method slice() will extract a sub-array out of an array, in other words, it will retrieve certain elements. It works with one or two arguments. If you only give one argument, it will give you the last elements from the index you specified.

For instance,

[1, 1.2, 1.7, 1.5, .7, .3].slice(3) // it's [1.5, 0.7, 0.3]

The index 3 corresponds to the 4th element of the array (0,1,2,3) so slice(3) will return the 4th, 5th and 6th elements.

With 2 arguments, you get a sub-array that starts at the first index but ends just before the 2nd one. For instance,

[1, 1.2, 1.7, 1.5, .7, .3].slice(0,3) // gives [1, 1.2, 1.7]

It is also possible to use negative arguments. Instead of counting from the beginning of the array, it means counting from the end.

[1, 1.2, 1.7, 1.5, .7, .3].slice(-1) // result: [0.3]

Turning a string into an array

This can be done via the split() method.

split() normally works with a separator, for instance “07/04/1975″.split(“/”) is ["07", "04", "1975"], but if an empty string is used, then each character becomes an element of the new array. For instance,

"ABCDEFGHIJ".split("") // result: ["A", "B", "C", "D", "E", "F", "G", "H", "I", "J"] 

This can be very useful to generate data on the go.

And what methods does protovis has for working with arrays and data?

Many.

Some simple, some very elaborate. In addition to visualization functions protovis has impressive data processing capabilities.

The workhorse: pv.range()

pv.range() is a method that creates an array of numbers.

pv.range(10), for instance, is [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], or the 10 first integers starting from 0.
(this is the method I referred to in the example of the previous part).

By using the other arguments, it is possible to generate arrays of numbers that go from one specific index to another, or to change the step. For instance:

pv.range(5,10) // is [5, 6, 7, 8, 9]
pv.range(5,21,5) // is [5, 10, 15, 20]

While this may not look incredible at first sight, pv.range() really shines when associated with other functions or methods such as map().

map()  is an array method that creates a new array by applying a function to each of the element of the array it’s run against. For instance,

var data = pv.range(100).map(Math.random) // creates an array of 100 random numbers.

var data = pv.range(1000).map(function(a) Math.cos(a/1000)) // stores into data 1000 values 
// of cos(x) for all the numbers between 0 and 0.999 by increment of 0.001.

In other words, pv.range() and map() can quickly create very interesting datasets for protovis to visualize.

Simple statistical functions that make life easier

pv.min(), pv.max()

Protovis has functions that extract the maximum or the minimum value of an array.

pv.min([1, 1.2, 1.7, 1.5, .7, .3]) will return the value of the smallest element of the array, in that case 0.3.

Likewise, pv.max returns the value of the greatest element of the array.

Both of these can be used with an accessor function to help retrieve what will be compared.

For instance:

var data = [
  {key:"a", value:1},
  {key:"b",value:1.2},
  {key:"c", value:1.7}, 
  {key:"d", value:1.5},
  {key:"e", value:.7},
  {key:"f", value:.3}
];

pv.max(data, function(a) a.value); // should return 1.7

pv.sum(), pv.mean(), pv.median()

Likewise, protovis has aptly-named methods that return the sum, the mean and the median of the elements of an array. Note that it’s pv.mean() and not pv.average(), though.

pv.sum([1, 1.2, 1.7, 1.5, .7, .3]) // gives  6.4.
pv.mean([1, 1.2, 1.7, 1.5, .7, .3]) // gives 1.0666666666666667
pv.median([1, 1.2, 1.7, 1.5, .7, .3]) // gives 1.1.

Similarly to pv.min() you can use an optional accessor function.

pv.median(data, function(a) a.value);

pv.normalize()

pv.normalize() is a handy method that divides all elements in an array by a factor, so that the sum of all these elements is 1.

pv.normalize([1, 1.2, 1.7, 1.5, .7, .3])
// [0.15625, 0.18749999999999997, 0.265625, 0.234375, 0.10937499999999999, 0.04687499999999999]

pv.sum(pv.normalize([1, 1.2, 1.7, 1.5, .7, .3])) // result: 1.

Combining arrays

pv.blend()

pv.blend() simply turns an array of arrays into a simpler array where all elements follow each other.

pv.blend([["a", "b", "c"], [1,2,3]]) // gives  ["a", "b", "c", 1, 2, 3]

One benefit is that it is possible to run methods on this new array that wouldn’t work on an array of arrays.

For instance:

pv.max([[1,2],[3,4],[5,6]]) // result : NaN
// protovis cannot guess how to compare [1,2], [3,4] or [5,6] without any further instruction.
pv.max(pv.blend([[1,2],[3,4],[5,6]])) // result: 6.
pv.max([[1,2],[3,4],[5,6]], function(a) pv.max(a)) // also returns 6.

pv.cross()

Given two arrays a and b, pv.cross(a,b) returns an array of all possible pairs of elements.

pv.cross(["a", "b", "c"], [1,2,3]) // returns 
//[[“a”,1],[“a”,2],[“a”,3],[“b”,1],[“b”,2],[“b”,3],[“c”,1],[“c”,2],[“c”,3]]

pv.transpose()

pv.transpose() flips an array of arrays.

In our previous example –

pv.cross(["a", "b", "c"], [1,2,3])

The result was a 9×2 array. [[“a”,1],[“a”,2],[“a”,3],[“b”,1],[“b”,2],[“b”,3],[“c”,1],[“c”,2],[“c”,3]]

If we apply pv.transpose, we get a 2×9 array:

pv.transpose(pv.cross(["a", "b", "c"], [1,2,3])); // result:
//[["a","a","a","b","b","b","c","c","c"]
//[1,2,3,1,2,3,1,2,3]]

Putting it all together

This short example will show how one can work from an unprepared dataset.
For the purpose of this example, I’m retrieving GDP per capita data of OECD countries. I have a table of 34 countries on 10 years from 2000 to 2009. The countries are in alphabetical order.
I would like to do a bar chart showing the top 12 countries ranked by their GDP per capita. How complex can this be? With the array methods, not so difficult.

var data=[
["Country",2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009],
["Australia",28045.6566279103, 29241.289462815, 30444.7839453338, 32079.5610505727, 33494.6654789274, 35091.7414600449, 37098.3968669994, 39002.0335375021, 39148.4759391917, 39660.1953738791],
["Austria",28769.5911274323, 28799.1968990511, 30231.0955219759, 31080.6122212092, 32598.0698959222, 33409.3453751649, 36268.9604225683, 37801.7320979073, 39848.9481928628, 38822.6960085478],
["Belgium",27624.4454637489, 28488.7963954409, 30013.7570016445, 30241.724380444, 31151.6315179167, 32140.9239149283, 34159.4723388042, 35597.2980660662, 36878.9710906881, 36308.4784216699],
["Canada",28484.9776553362, 29331.8223245112, 29911.3226371905, 31268.5521058006, 32845.6267954986, 35105.9892865177, 36853.5554314586, 38353.0812084648, 38883.0700787846, 37807.5259929247],
["Chile",9293.72322249492, 9712.95685376592, 9973.25946014534, 10475.6486664294, 11299.9329380664, 12194.1443507847, 13036.1285137662, 13896.6946083854, 14577.5715215544, 14346.4648487517],
["Czech Republic",14992.3582327884, 16173.6387263985, 16872.3146139754, 17991.9331910401, 19303.9208098529, 20365.7571142986, 22349.7033747705, 24578.7151794413, 25845.0252652635, 25530.2001372786],
["Denmark",28822.3343389384, 29437.5232757263, 30756.3323835463, 30427.5149228283, 32301.3394496912, 33195.8834383121, 36025.8628586246, 37730.736916861, 39494.1216134705, 37688.2516868117],
["Estonia",9861.80346448344, 10693.0362387373, 11966.6746413277, 13370.127724177, 14758.4181995435, 16530.7311083966, 19134.4130457855, 21262.1415030861, 21802.2185996096, 19880.2786070771],
["Finland",25651.0005450286, 26518.293212524, 27509.1023963486, 27592.2031438268, 29855.387224753, 30689.9753400459, 33095.030855128, 36149.32963148, 37795.237097856, 35236.9462175119],
["France",25272.4127936247, 26644.8404266553, 27776.4225007831, 27399.5459524877, 28273.6330332193, 29692.129719144, 31551.3458452695, 33300.5367942194, 34233.0770304046, 33697.9698933558],
["Germany",25949.0588445755, 26855.1296905696, 27587.1573355486, 28566.8930562768, 29900.6507502683, 31365.576121107, 33713.1689185469, 35623.4147543745, 37170.693412872, 36339.9009091421],
["Greece",18410.2161113813, 19928.6672542964, 21597.5997816753, 22701.7754699285, 24088.2202877303, 24571.6085727425, 26917.7358531616, 28059.5724529461, 29919.6385697499, 29121.8306842272],
["Hungary",12134.0140204384, 13576.4413471795, 14765.2333755651, 15424.3926636975, 16316.8885518624, 16937.9383138957, 18329.1579647528, 19187.2829138188, 20699.9056965458, 20279.5060192879],
["Iceland",28840.4899286393, 30443.907492292, 31083.7723760522, 30768.0910155965, 33697.6723843766, 35025.1338769129, 35807.880160537, 37178.9762849327, 39029.2396691797, 36789.0728215933],
["Ireland",28695.0931156375, 30524.3712319885, 33052.4517849575, 34525.4148515126, 36511.518217243, 38622.5520655621, 42267.9784043449, 45293.6482834027, 42643.6362934458, 39571.204030073],
["Israel",23496.1573531847, 23455.1048827219, 23534.8524310529, 22270.5507137525, 23650.3373360675, 23390.3809761211, 24960.0508863672, 26583.387317862, 27679.3777353666, 27661.1271269539],
["Italy",25594.3456589799, 27127.4298830414, 26803.97019843, 27137.5365317475, 27416.0961790796, 28144.0379218171, 30224.2469223694, 31897.7434207435, 33270.7464832409, 32407.5033618371],
["Japan",25607.7204586644, 26156.1615068187, 26804.9303541079, 27487.123875677, 29020.8955443678, 30311.5281317716, 31865.2733339475, 33577.1846704038, 33902.3807335349, 32476.7736535378],
["Korea",17197.0800334388, 18150.876787578, 19655.5961579558, 20180.932195274, 21630.1629926488, 22783.2288432845, 24286.1803272315, 26190.6122298674, 26876.6422899926, 27099.9481192024],
["Luxembourg",53645.7229595775, 53932.4594967082, 57559.2104243171, 60723.9861616145, 65021.6940738929, 68372.258619455, 78523.2988291034, 84577.2190776183, 89732.070876649, 84802.9716853677],
["Mexico",10046.1268819126, 10135.7265293228, 10397.8915415223, 10883.8643666506, 11535.0108530091, 12460.5385178945, 13672.6034191998, 14581.5178679935, 15290.896575069, 14336.6153896301],
["Netherlands",29405.5490140191, 30788.2508953647, 31943.4974056501, 31702.5798119046, 33208.8592768987, 35110.6577619843, 38063.6877336725, 40744.3819227526, 42887.4361825772, 40813.047575264],
["New Zealand",21113.9167562199, 22128.8483527737, 23115.0676326844, 23826.8007737809, 24775.7827578032, 25460.2942993256, 27277.7654223562, 28701.3328788203, 29231.151567007, 29097.307061683],
["Norway",36125.7036078917, 37091.7150478501, 37051.9473705873, 38298.8631304819, 42257.5943785044, 47318.7844737929, 53287.9750020125, 55042.2093246253, 60634.9819502298, 55726.7456848212],
["Poland",10567.0016558591, 10950.4739587211, 11562.6245018264, 11985.0042955849, 13014.5996003514, 13785.7656450352, 15067.4402978586, 16762.2045951617, 18062.1834289945, 18928.8205162032],
["Portugal",17748.6930918065, 18464.7472033934, 19088.1764760703, 19392.2862101711, 19796.1891048601, 21294.1660710906, 22869.9272824917, 24122.9886840593, 24962.2959136026, 24980.061318413],
["Slovak Republic",10980.1169029929, 12070.70145551, 12965.6228450353, 13597.9556337584, 14659.0096334623, 16174.4605836956, 18397.5822518716, 20916.5438777989, 23241.0972994031, 22870.8998469728],
["Slovenia",17468.5101911515, 18342.8685005009, 19702.0589885459, 20448.2040741134, 22200.7770986247, 23493.9370882038, 25432.3240768864, 27228.3376725332, 29240.5471362484, 27535.6540786883],
["Spain",21320.0690041202, 22591.3855327033, 24066.5003000439, 24748.2819467008, 25958.1017090553, 27376.7644983649, 30347.9357063518, 32251.8469529105, 33173.3334547015, 32254.0895139806],
["Sweden",27948.4749835611, 28231.3958224236, 29277.7736264795, 30418.0441004337, 32505.646687298, 32701.4327422078, 35680.1787996674, 38339.6698693908, 39321.3014501815, 36996.1394001578],
["Switzerland",31618.0958082618, 32103.4812665805, 33390.9169108411, 33266.3042144918, 34536.8476164193, 35478.0395058126, 39116.3945212067, 42755.5569054368, 45516.5601956426, 44830.3719226918],
["Turkey",9169.71780977686, 8613.21159842998, 8666.90348260238, 8790.01048446305, 10166.0963401732, 11391.3768057053, 12886.6195973915, 13897.3820310325, 14962.4944915831, 14242.7276329837],
["United Kingdom",26071.3066882302, 27578.2857038985, 28887.5850874466, 29848.7319363944, 31790.9933025905, 32724.4057819337, 34970.5193042987, 35719.2060247462, 36817.493156774, 35158.8005888247],
["United States",35050.1738557741, 35866.2624634202, 36754.5543204007, 38127.524970345, 40246.0630591955, 42466.1326203714, 44594.9199470326, 46337.2237397566, 46901.0697730874, 45673.7445647402]
];

var year=10;
var rows = data.slice(1,data.length);

var x = pv.Scale.ordinal(pv.range(12)).splitBanded(0, 240, 4/5);
var y = pv.Scale.linear(0, 80000).range(0, 170);

var vis = new pv.Panel()
    .width(250)
    .height(200)
    ;
	vis.add(pv.Bar)
		.data(rows.sort(function(a,b) b[year]-a[year]).slice(0,12))
		.bottom(10)
		.width(x.range().band)
		.height(function(d) y(d[year]))
		.left(function() 5+x(this.index))
		.anchor("bottom").add(pv.Label).textAngle(Math.PI/2).text(function(d) d[0]).textBaseline("middle").textAlign("right")
		;
vis.render();

here, in the data variable, I’ve put my 2-dimensional table as I got it. The first row contains headers which I won’t need. So, in line 40, I create rows which just removes that 1st line with a slice function. data.slice(1,data.length) effectively keeps all the lines but the first.
In the next few lines I create two scales for placing my bars, a standard vis panel and a bar chart. Now, what kind of data will I pass to the bar chart?
I want to sort the rows by the value of the latest year (which happens to be the 11th column, so 10). This is what the rows.sort(function(a,b) b[year]-a[year]) part of the statement does. I’ve simply assigned 10 to the variable year, so if I want to display other years (with an HTML form for instance) it wouldn’t be difficult to modify.
And, since I only want the top 12 values, I just write .slice(0,12) after that.

In line 55, I just add a label. pv.Label inherits the data of its parent. The data item is the whole row of my 2-dimensional table, so if I write function(d) d[0], I am referring to the left-most item which will be the country name.

So, with the use of simple array functions, I can easily rework an unprepared dataset in protovis, rather than having to tailor my dataset with manual (and error-prone) manipulations in external programs. Here is the result:

Next: reshaping complex arrays

Working with data in protovis – part 2 of 5

Previous post: simple arrays

Multi-dimensional arrays, associative arrays and protovis

Even for a simple chart, chances are that you’ll have more than a single series of numbers to represent. For instance, you could have labels to describe what they mean, several series, and so on and so forth.
So, let’s say we want to add these labels to our original examples, so we know what those bars represent.

var categories = ["a","b","c","d","e","f"];
var vis = new pv.Panel()
  .width(150)
  .height(150);

vis.add(pv.Bar)
  .data([1, 1.2, 1.7, 1.5, .7, .3])
  .width(20)
  .height(function(d) d * 80)
  .bottom(0)
  .left(function() this.index * 25)
  .anchor("bottom").add(pv.Label)
    .text(function() categories[this.index]);

vis.render();

While this did the trick, nothing guarantees that the data proper and the category name will stay coordinated. If one data point is deleted or removed and this is not replicated on the categories, they will no longer match. A more integrated way to proceed would be to group category and data information, like this:

var data = [
  {key:"a", value:1},
  {key:"b", value:1.2},
  {key:"c", value:1.7}, 
  {key:"d", value:1.5},
  {key:"e", value:.7},
  {key:"f", value:.3}
];
var vis = new pv.Panel()
  .width(150)
  .height(150);

vis.add(pv.Bar)
  .data(data)
  .width(20)
  .height(function(d) d.value * 80)
  .bottom(0)
  .left(function() this.index * 25)
  .anchor("bottom").add(pv.Label)
    .text(function(d) d.key);

vis.render();

This time, we group the values and the category names in a single variable, an array of associative arrays.
When drawing the bar chart, protovis will go through this array and retrieve an associative array for each bar.
We have to change the way the height function is written. The data element being accessed is no longer of the form 1 or 1.7, but {key:”a”, value:1} or {key:”c”, value:1.7}. So to get the number part of it, we must write d.value.

Likewise, instead of accessing an array of categories for the text part, we can use the current data element via an accessor function, and write d.key.

Hierarchy and inheritance

So we’ve seen that arrays, or associative arrays, can have several levels and can be nested one into another.
Interestingly, protovis elements, like panels, charts, mark etc. also work in a hierarchy. For instance, when you start working with protovis you create a panel object. Then, you add other objects to that panel, like a bar chart (our example), or another panel. You can add other objects to your new objects, or attach them to your first panel.
This diagram shows the hierarchy between elements in the previous example.

var categories = ["a","b","c","d","e","f"];
var vis = new pv.Panel()
  .width(150)
  .height(150)

var bar = vis.add(pv.Bar)
  .data([1, 1.2, 1.7, 1.5, .7, .3])
  .width(20)
  .height(function(d) d * 80)
  .bottom(0)
  .left(function() this.index * 25)
  .anchor("bottom").add(pv.Label)
  .text(function() categories[this.index]);

vis.render();

The bar object is considered to be the child of vis, who is its parent.

You may know that in protovis, children objects inherit properties of their parents.

For instance, if width wasn’t specified for the bar object, it would have the width of its parent, 150. Each mark would cover the whole screen.

For data, when a new object is added, data is either specified at that level, or obtained from the parent element of this object.

Let’s take our example and tweak it a bit.

var vis = new pv.Panel().width(150).height(150);
var bar = vis.add(pv.Bar)
  .data([1, 1.2, 1.7, 1.5, .7, .3]).width(20) .bottom(0)
  .height(function(d) d * 80).left(function() this.index * 25)
  .anchor("top").add(pv.Label)
vis.render();

Here, I didn’t specify a data or a text value for the labels I added. They just took the value of its parent element – the marks of the pv.Bar object.
Here’s another variation:

var vis = new pv.Panel().width(150).height(150);
vis.add(pv.Panel)
  .data([1, 1.2, 1.7, 1.5, .7, .3]) .left(function() this.index * 25)
  .add(pv.Bar).width(20) .bottom(0)
  .height(function(d) d * 80)
  .anchor("top").add(pv.Label)
vis.render();

Here, I’m adding panels, then a bar in each panel.
From the root panel, I’m adding a group of panel with this data: [1, 1.2, 1.7, 1.5, .7, .3].
Since there are 6 elements here, I’m adding 6 panels.
Here, the left method applies to each of the panel. The first one is to the left, the next one is 25 pixels further, etc.
I’m then adding a bar object to each panel. Is that one group of bar? Technically yes, but each has only one element! Each pv.Bar implicitly gets the data element of its parent, so the first bar gets [1], the next one gets [1.2], etc. The height of each bar is determined by multiplying the value of that element by 80.
Note that since the fillStyle properties are not defined for the bars, they get the ones which are attributed by default, which explains the color changes.

Further refinement: accessing the data of its parent!

var vis = new pv.Panel().width(150).height(150);
vis.add(pv.Panel)
  .data([1, 1.2, 1.7, 1.5, .7, .3]) .left(function() this.index * 25)
  .add(pv.Bar).width(20) .bottom(0)
  .height(function(a,b) b * 80)
  .anchor("top").add(pv.Label)
vis.render();

Well, the output is exactly the same, but how I obtained the data is different. Instead of getting the data using the standard accessor function, I passed two arguments: function(a, b).
What this does is that the first argument corresponds to the current data element of this object, and the second, to that of its parent.

In this example, they happen to be the same, but this is how you can access the data of the parent objects.

Putting it all together

Let’s see how we can use protovis and the properties of hierarchy! This example is less trivial than the ones we’ve seen so far but with what we’ve seen it is quite accessible.
The challenge: re-create square pie charts.
How it’s done:

var data=[36,78,63,24],  // arbitrary numbers
cellSize = 16,
cellPadding = 1,
squarePadding=5,
colors=pv.Colors.category20().range()
;

var vis=new pv.Panel()
    .width(4*(10*cellSize+squarePadding))
    .height(10*cellSize)
    ;

var square = vis.add(pv.Panel)
    .data(data)
    .width(10*cellSize)
    .height(10*cellSize)
    .left(function() this.index*(cellSize*10+squarePadding))
    ;

var row = square.add(pv.Panel)
	.data([0,1,2,3,4,5,6,7,8,9])
	.height(cellSize)
	.bottom(function(d) d*cellSize)
	;

var cell = row.add(pv.Panel)
    .data([0,1,2,3,4,5,6,7,8,9])
    .height(cellSize-2*cellPadding)
    .width(cellSize-2*cellPadding)
    .top(cellPadding)
    .left(function(d) d*cellSize+cellPadding)
    .fillStyle(function(a,b,c) (b*10+a)<c?colors[this.parent.parent.index].color:"lightGrey")
;

square.anchor("center")
  .add(pv.Label)
    .text(function(d) d)
    .textStyle("rgba(0,0,0,.1)")
    .font("100px sans-serif");

vis.render();

First, we initiate the data (4 arbitrary numbers from 1 to 100), and various parameters which will help size the square pie – size of the cells, space between them, space between the square pie charts. We also initiate a color palette.
Then, we are going to create 4 panels or groups of panels, each a child of the previous one.
First comes the vis panel, which groups everything,
Then the square panels, which correspond to each square pie. This is to this panel that our data is assigned.
Then come the row panels, and, finally, the cell panels.
The numbers which we want to represent are assigned to the square panel. So, what data are we passing to the row and the cell panels? The only thing we want is to have 10 rows and 10 cells per row. So, we can use an array with 10 items. We are going to use [0,1,2,3,4,5,6,7,8,9] so the data value of the row and that of the cell will correspond to the coordinate of the row and the cell, respectively. In other words, the 5th row will be assigned the data value of 4, and the 7th cell in that row will get the data value of 6. We could retrieve the same numbers using “this.index” but this can lead to obfuscated formulas.

Note that in the next part of the tutorial, we’ll see that in Protovis, there is a more elegant way to write [0,1,2,3,4,5,6,7,8,9] or similar series of numbers. But, we’ll leave this more explicit form for now.

Back to our row panel. We position it with bottom(function(d) d*cellSize). Here, again, d represents the rank of the row, so the 1st row will get 0, and its bottom value will be 0, the next row will get 1, and its bottom value will be 1*cellSize or 16, etc.

Likewise, in the cell panel, the cells are positioned with left(function(d) d*cellSize+cellPadding). This is the same principle. (here, cellPadding is just used to fine-tune the position of the cell).

This is in the final line that we are really going to use hierarchy.

.fillStyle(function(a,b,c) (b*10+a)<c?colors[this.parent.parent.index].color:"lightGrey")

here, a represents the data value of the cell – in other words, the column number.
b, the data value of the cell’s parent, the row. This is the row number.
and c is the data value of the parent of the parent of the row, the square – this is the number that we are trying to represent.

so, what we are going to determine is whether b*10+a<c. If that’s the case, we color the cell, else, we leave it in pale grey. To get the color we need, we go back to the palette that we defined at the beginning, and take the color corresponding to the square number (0 to 3 from left to right).
The square number can be obtained by this.parent.parent.index.

Finally, we add the numbers in large transparent gray digits on top of the squares.

Here is the result:

Next: Javascript and protovis array functions

Working with data in protovis – part 1 of 5

When I started using protovis I had only a very basic knowledge of javascript, which in theory isn’t a problem as protovis is meant to be learned by example, and as it has its own logic and structure which is different from typical javascript code. So I started by looking and modifying examples which was enough to do basic stuff.
But I soon felt limited by what hid behind a single property: data. I knew that protovis had lots of features to manipulate and process data but they were not obvious from the examples.

I mean,

var vis = new pv.Panel()
.width(150)
.height(150);

vis.add(pv.Bar)
.data([1, 1.2, 1.7, 1.5, .7, .3])
.width(20)
.height(function(d) d * 80)

vis.render();

Here, it’s pretty obvious that the bars represent the values 1, 1.2, 1.7, 1.5, 0.7 and 0.3 respectively. One can infer that the sizes of bars are 25 pixels wide and 80 times their value long.

But protovis doesn’t usually look like this “hello world” kind of example, but rather like this:

/* Compute yield medians by site and by variety. */
function median(data) pv.median(data, function(d) d.yield);
var site = pv.nest(barley).key(function(d) d.site).rollup(median);
var variety = pv.nest(barley).key(function(d) d.variety).rollup(median);
/* Nest yields data by site then year. */
barley = pv.nest(barley)
    .key(function(d) d.site)
    .sortKeys(function(a, b) site[b] - site[a])
    .key(function(d) d.year)
    .sortValues(function(a, b) variety[b.variety] - variety[a.variety])
    .entries();
[. . .]
/* A panel per site-year. */
var cell = vis.add(pv.Panel)
    .data(barley)
    .height(h)
    .top(function() this.index * h)
    .strokeStyle("#999");

What just happened? pv.nest, key, rollup, sortKeys, entries – what could that do?

To go beyond merely touching up examples, and do your own visualizations from scratch, it is important to get a good grip on how to feed protovis with data. In order to do so, you need a few javascript notions.

Arrays, arrays, how do they work?

In javascript, an array is an ordered list of stuff.

In our initial example, we had one such list:

[1, 1.2, 1.7, 1.5, .7, .3]

Anything can be put in an array: numbers, strings, Booleans (true/false values), objects … including other arrays. All elements of an array don’t have to be of the same type. Arrays can be assigned to a variable.

var a = [1, 1.2, 1.7, 1.5, .7, .3];

Elements of the array can be accessed using the [] notation. In javascript, indices start at 0, so the first element of an array can be obtained so:

a[0];

This returns 1. Javascript has many functions to create and manipulate arrays, which we will talk about later. For the time being, let’s look at arrays of arrays. If we wrote instead:

var a = [[1, 1.2], [1.7, 1.5], [.7, .3]];

a is now an array of arrays, or “multi-dimensional array”.

a[0] is now worth [1, 1.2]. To access the first number of the array, one has to write a[0][0], which will return the first element (1) of the first element ([1, 1.2]) of a.

Javascript also has another type of array called associative arrays, where values are assigned to keys instead of an index. For instance,

var a = {yield: 27.00000, variety: "Manchuria", year: 1931, site: "University Farm"};

is an associative array. To access a value, one can use a . operator:

a.yield

will retun 27.

a["yield"]

also works.

Like other variable types, it is possible to have an array of associative arrays. In fact, this is used quite often in protovis.

Protovis and arrays – deconstructing the first example

The reason why I introduced javascript arrays is that the data property requires an array. Protovis then loops through that array, performing operations on each of its elements. To that end, it uses things such as accessor functions and properties of an object called this.

To explain all of this let’s go back to the first example and analyse it line by line.

var vis = new pv.Panel()
  .width(150)
  .height(150);
vis.add(pv.Bar)
  .data([1, 1.2, 1.7, 1.5, .7, .3])
  .width(20)
  .bottom(0)
  .height(function(d) d * 80)
  .left(function() this.index * 25);
vis.render();

The first 3 lines create a panel, which is like the sheet of paper on which protovis will draw the chart. Its width and height properties must be filled, as they are 0 by default which would make the whole visualization invisible.

The next line adds a bar chart to this panel we’ve just created.

The line after specifies the data on which to work: here comes our array. Here, we have written the array literally in the data property, but nothing prevents us to assign it to a variable first and to pass the variable instead.

The next line, and the line with the bottom property, assign constant numbers to these properties. It means that all the bars will have a width of 20 pixels, and they will all be aligned with the bottom of the panel – that’s what

bottom(0)

does.

Now let’s look at the two remaining lines:

.height(function(d) d * 80)
.left(function() this.index * 25);

The first line uses an accessor function. What this does is that it looks at the current element, and perform an operation on it, the result of which will be the height of that element.

In proper javascript, we would have written:

function(d) {return d*80;}

but protovis uses a shorthand notation that allows us to omit curly braces and the return statement. By the way, d in the function is completely arbitrary, and could be any variable name –

function(a) a*80

also works. It’s just that the name of the variable between parentheses will represent the value of the current element.

The second line uses the this object. this represents what protovis is working on at the moment, and it has properties that can be used. The most commonly used is index: this.index returns the position of the current element in its array, so it is going to be: 0 for the first bar, 1 for the next one, etc.

So this line specifies that each new bar should start every 25 pixels from the left border of the panel.

You may wonder, why not write

.left(this.index * 25);

and omit the function()? Well, function() means that the content of the property gets re-evaluated. If we had omitted it, this.index * 25 would have been computed once (for a result of 0) and that value would have been used for all the bars.

By the way, instead of writing the height property as it is, we could have written:

.height(function()[1, 1.2, 1.7, 1.5, .7, .3][this.index] * 80)

Using an accessor function is shorter and clearer.

Next: Multi-dimensional arrays, inheritance and hierarchy

Working with data in protovis

For the past year or so I have been dabbling with protovis. I don’t have a heavy CS background but protovis is supposedly easy to pick up for people like me, who are vaguely aware that computers can make calculations but who need to check the manual for the most mundane programming instructions.

I found was while it’s reasonnably easy to modify the most basic examples to make stuff happen, it is much harder to understand or adapt the more complex ones, let alone to create a fairly complex visualization.

The stumbling block for me was the use of the method data. Data is used to feed all the other protovis methods with, well, data. In the simplest examples, the data which is passed is very plain and simple and so easy to understand. But, for slightly more advanced uses, the shape of data is increasingly complex and the very powerful methods that protovis uses to process and reshape data are just out of reach for a noob.

So I started documenting my struggle with data, first for my own use, and eventually realized I could share what I learned. This is it.

I split this tutorial in 5 parts.

  1. First, we’ll look at the humble arrays and how protovis works with them.
  2. Then, we’ll talk about multi-dimensional arrays, associative arrays, hierarchy and inheritance.
  3. Third, we’ll take a break from protovis and look at the javascript methods to work with arrays, such as sorting.
  4. [update] Since I first published the array part I wrote a supplement
  5. We’ll then check out the powerful protovis data processing methods, such as those who reshape complex arrays.
  6. Finally, we’ll see how data must be prepared to work with the built-in layouts, such as treemaps, force-directed layouts etc.

And as a bonus, I have also deconstructed several interesting (but not immediately accessible) examples from the gallery:

To make the best use of this material, it would be helpful to know a bit about protovis. The best ways to get started are:

Now that’s said, en route for the 1st part!