d3: scales, and color.

In protovis, scales were super-useful in just about everything. That much hasn’t changed in d3, even though d3.scale is a bit different from pv.Scale. (do note that d3.scale is in lowercase for starters).

Scales: the main idea

Simply put: scales transform a number in a certain interval (called the domain) into a number in another interval (called the range).
an example of how scales work
For instance, let’s suppose you know your data is always over 20 and always below 80. You would like to plot it, say, in a bar chart, which can be only 120 pixels tall.
You could, obviously, do the math:

.attr("height", function(d) {return (d-20)*2;})

But what if you suddenly have more or less space? or your data changes? you’d have to go back to the entrails of your code and make the change. This is very error prone. So instead, you can use a scale:

var y=d3.scale.linear().domain(20,80).range(0,120);
.attr("height", y)

this is much simpler, elegant, and easy to maintain. Oh, and the latter notation is equivalent to

.attr("height", function(d) {return y(d);})

… only more legible and shorter.
And, there are tons of possibility with scales.

Fun with scales

In d3, quantitative scales can be of several types:

  • linear scales (including quantize and quantile scales,
  • logarithmic scales,
  • power scales (including square root scales)

While they behave differently, they have a lot in common.

Domain and range

For all scales, with the exception of quantize and quantile scales which are a bit different, domain and range work the same.
First, note that unlike in protovis, domain and range take an array as argument. Compare:

var y=pv.Scale.linear().range(20,60).domain(0,120);
var y=d3.scale.linear().range([20,60]).domain([0,120]);

This is because contrary to protovis, where domain could be a whole dataset, in d3, domain contains the bounds of the interval that is going to be transformed.
Typically, this is two numbers. If this is more, we are talking about a polypoint scale: there are as many segments in the intervals as there are numbers in the domain (minus one). The range must have as many numbers, and so as many segments. When using the scale, if a number is in the n-th segment of the domain, it is transformed into a number in the n-th segment of the range.
illustration of a multipoint scale
With this example, 30 finds itself in the first segment of the domain. So it’s transformed to a value in the first segment of the range. 60, however, is in the 2nd segment, so it’s transformed into a value in the 2nd segment of the range.
Also, bounds of domain and range need not be numbers, as long as they can be converted to numbers. One useful examples are colors. Color names can be used as range, for instance, to create color ramps:

var ramp=d3.scale.linear().domain([0,100]).range(["red","blue"]);

This will transform any value betwen 0 and 100 into the corresponding color between red and blue.


What happends if the scale is asked to process a number outside of the domain? That’s what clamping controls. If it is set, then the bounds of the range are the minimum and maximum value that can be returned by the scale. Else, the same transformation applies to all numbers, whether they fall within the domain or not.
Clamping example
Here, with clamping, the result of the linear transformation is 120, but without it, it’s 160.

var clamp=d3.scale.linear().domain([20,80]).range([0,120]);
clamp(100); // 160
clamp(100); // 120

Scales and nice numbers

More often than not, the bounds of the domain and/or those of the ranges will be calculated. So, chances are they won’t be round numbers, or numbers a human would like. Scales, however, come with a bunch of method to address that. d3 keeps in mind that scales are often used to position marks along an axis.


When applied to a scale, the nice method expends the domain to “nicer” numbers. You wouldn’t want your axis to start at -2.347 and end at 7.431, right?
So, there.

var data=[-2.347, 4, 5.23,-1.234,6.234,7.431]; // or whatever.
var y=d3.scale.linear().range([0,120]);
y.domain([d3.min(data), d3.max(data)]); // domain takes bounds as arguments, not all numbers
y.domain() // [-2.347, 7.431];
y.nice() // [-3, 8]


Given a domain, and a number n (which, contrary to protovis, is mandatory in d3), the ticks method will split your domain in (more or less) n convenient, human-readable values, and return an array of these values. This is especially useful to label axes. Passing these values to the scale allows them to position ticks nicely on an axis.

var y=d3.scale.linear([20,80]).range([0,120]);
var ticks=axis.selectAll("line")
  .data(y.ticks(4)) // 20, 40, 60 and 80
  .attr("y1",y).attr("y2",y) // short and simple. 


If used instead of .range(), this will guarantee that the output of the scales are integers, which is better to position marks on the screen with pixel precision than numbers with decimals.


The invert function turns the scale upside down: for one given number in the range, it returns which number of the domain would have been transformed into that number.
For instance:

var y=d3.scale.linear([20,80]).range([0,120]);
y(50); // 60
y.invert(60); // 50

That’s quite useful, for instance, when a user mouses over a chart, and you would like to know to what value the mouse coordinates correspond.

Power scales and log scales

The linearscale is a function of the form y=ax+b which works for both ends of the domain and range. In the example we’ve used most often until now, this function is really f(x): y=2x-40.
Power and logarithm scales work the same, only we are looking for a function of the form y=axk+b, or y=a.log(x)+b.
For the power scales, you can specify an exponent (k) with the .exponent() method. For instance, if we specify an exponent of 2, here is what the scale would look like:
an example of a power scale
The equation is now f(x): y=x²/50-8. So 20 still becomes 0 and 80 still becomes 120, but other than that the values at the beginning of the domain would be lower than with the linear scale, and those at the end of the scale will be higher.
For convenience, d3 includes a d3.scale.sqrt() (the square root scale) so you never have to type d3.scale.pow.exponent(0.5) in full.
Also note that if you are using a log scale, you cannot have 0 in the domain.

Quantize and quantile

quantize and quantile are specific linear scales.
quantize works with a discrete, rather than continuous, range: in other terms, the output of quantize can only take a certain number of values.
For instance:

var q=d3.scale.quantize().domain([0,10]).range([0,2,8]); 
q(0); // 0
q(3); // 0
q(3.33); // 0
q(3.34); // 2
q(5); // 2
q(6.66); // 2
q(6.67); // 8
q(8); // 8
q(1000); // 8

quantile on the other hand matches values in the domain (which, this time, is the full dataset) with their respective quantile. The number of quantiles is specified by the range.
For instance:

var q=d3.scale.quantile().domain([0,1,5,6,2,4,6,2,4,6,7,8]).range([0,100]);
q.quantiles(); // [4.5], only one quantile - the median
q(0); // 0
q(4); // 0
q(4.499); // 0
q(4.5); // 100 - over the median
q(5); // 100
q(10000); // 100
q.quantiles(); // [2, 4, 5.6, 6];
q(0); // 0 
q(2); // 25 - greater than the first quantile limit
q(3); // 25
q(4); // 50
q(6); // 100
q(10000); // 100

Ordinal scales

All the scales we’ve seen so far have been quantitative, but how about ordinal scales?
The big difference is that ordinal scales have a discrete domain, in other words, they turn a limited number of values into something else, without caring for what’s between those values.
Ordinal scales are very useful for positioning marks along an x axis. Let’s suppose you have 10 bars to position for your bar chart, each corresponding to a category, a month or whatever.
For instance:

var x=d3.scale.ordinal()
  .domain(["Sunday","Monday","Tuesday","Wednesday","Thursday","Friday","Saturday"]) // 7 items
x("Tuesday"); // 34.285714285714285

There are 3 possibilites for range. Two are similar: the .rangePoints() and .rangeBands() methods, which both work with an array of two numbers – i.e. .rangeBands([0,120]). The last one is to specify all values in the range with .range().

rangePoints() and rangeBands()

With .rangePoints(interval), d3 fits n points within the interval, n being the number of categories in the domain. In that case, the value of the first point is the beginning of the interval, that of the last point is the end of the interval.
With .rangeBands(interval), d3 fit n bands within the interval. Here, the value of the last item in the domain is less than the upper bound of the interval.
Those methods replace the protovis methods .split() and .splitBanded().
difference between rangeBands and rangePoints
This chart illustrates the difference between using rangeBands and rangePoints.

var x=d3.scale.ordinal()
x("Saturday"); // 120
x("Saturday"); // 102.85714285714286
x("Saturday")+x.rangeBand(); // 120

the range method

Finally, we can also use the .range method with several values.
We can specify the domain, or not. Then, if we use such a scale on a value which is not part of the domain (or if the domain is left empty), this value is added to the domain. If there are n values in the range, and more in the domain, then the n+1th value of the doamin is matched with the 1st value in the range, etc.

var x=d3.scale.ordinal().range(["hello", "world"]); 
x.domain(); // [] - empty still.
x(0); // "hello"
x(1); // "world"
x(2); // "hello"
x.domain(); // [0,1,2]

Color palettes

Unlike in protovis, which had them under pv.Colors – i.e. pv.Colors.category10(), in d3, built-in color palettes can be accessed through scales. Well, even in protovis they had been ordinal scales all along, only not called this way.
There are 4 built-in color palette in protovis: d3.scale.category10(), d3.scale.category20(), d3.scale.category20b(), and d3.scale.category20c().

A palette like d3.scale.category10() works exactly like an ordinal scale.

var p=d3.scale.category10();
var r=p.range(); // ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728", "#9467bd", 
                      // "#8c564b", "#e377c2", "#7f7f7f", "#bcbd22", "#17becf"]
var s=d3.scale.ordinal().range(r); 
p.domain(); // [] - empty
s.domain(); // [] - empty, see above
p(0); // "#1f77b4"
p(1); // "#ff7f0e"
p(2); // "#2ca02c"
p.domain(); // [0,1,2];
s(0); // "#1f77b4"
s(1); // "#ff7f0e"
s(2); // "#2ca02c"
s.domain(); // [0,1,2];

It’s noteworthy that in d3, color palette return strings, not pv.Color objects like in protovis.

d3.scale.category10(1); // this doesn't work
d3.scale.category10()(1); // this is the way.


Compared to protovis, d3.color is simpler. The main reason is that protovis handled color and transparency together with the pv.Color object, whereas in SVG, those two are distinct attributes: you handle the background color of a filled object with fill, its transparency with opacity, the color of the outline with stroke and the transparency of that color with stroke-opacity.

d3 has two color objects: d3_Rgb and d3_Hsl, which describe colors in the two of the most popular color spaces: red/green/blue, and hue/saturation/light.

With d3.color, you can make operations on such objects, like converting colors between various formats, or make colors lighter or darker.

d3.rgb(color), and d3.hsl(color) create such objects.
In this context, color can be (straight from the manual):

  • rgb decimal – “rgb(255,255,255)”
  • hsl decimal – “hsl(120,50%,20%)”
  • rgb hexadecimal – “#ffeeaa”
  • rgb shorthand hexadecimal – “#fea”
  • named – “red”, “white”, “blue”

Once you have that object, you can make it brighter or darker with the appropriate method.
You can use .toString() to get it back in rgb hexadecimal format (or hsl decimal), and .rgb() or .hsl() to convert it to the object in the other color space.

var c=d3.rgb("violet") // d3_Rgb object
c.toString(); // "#ee82ee"
c.darker().toString(); // "#a65ba6"
c.darker(2).toString(); // "#743f74" - even darker
c.brighter().toString();// "ffb9ff"
c.brighter(0.1).toString(); // "#f686f6" - only slightly brighter
c.hsl(); // d3_Hsl object
c.hsl().toString() // "hsl(300, 76, 72)"

d3: adding stuff. And, oh, understanding selections

From data to graphics

the d3 principle (and also the protovis principle)
d3 and protovis are built around the same principle. Take data, put it into an array, and for each element of data a graphical object can be created, whose properties are derived from the data that was provided.

Only d3 and protovis have a slightly different way of adding those graphical elements and getting data.

In protovis, you start from a panel, a protovis-specific object, to which you add various marks. Each time you add a mark, you can either:

  • not specify data and add just one,
  • or specify data and create as many as there are items in the array you pass as data.


How de did it in protovis

var vis=new pv.Panel().width(200).height(200); 
    .left(function() {return this.index*20;})
    .height(function(d) {return d*10;});

this simple bar chart in protovis
you first create a panel (first line), you may add an element without data (here, another panel, line 2), and add to this panel bars: there would be 5, one for each element in the array in line 4.

And in d3?

In d3, you also have a way to add either one object without passing data, or a series of objects – one per data element.

var vis=d3.select("body").append("svg:svg").attr("width",200).attr("height",200);
var rect=vis.selectAll("rect").data([1,4,3,2,5]).enter().append("svg:rect");
rect.attr("height",function(d) {return d*20;})
  .attr("width", 15)
  .attr("x",function(d,i) {return i*20;})
  .attr("y",function(d) {return 100-20*d;}

In the first line, we are creating an svg document which will be the root of our graphical creation. It behaves just as the top-level panel in protovis.

However we are not creating this out of thin air, but rather we are bolting it onto an existing part of the page, here the tag. Essentially, we are looking through the page for a tag named and once we find it (which should be the case often), that’s where we put the svg document.

Oftentimes, instead of creating our document on , we are going to add it to an existing <div> block, for instance:

<div id="chart"></div>
<script type="text/javascript">
var vis=d3.select("#chart").append("svg:svg");

Anyway. To add one element, regardless of data, what you do is:

The logic is : d3.select(where we would like to put our new object).append(type of new object).

Going back to our code:

var vis=d3.select("body").append("svg:svg").attr("width",200).attr("height",200);
var rect=vis.selectAll("rect").data([1,4,3,2,5]).enter().append("svg:rect");
rect.attr("height",function(d) {return d*20;})
  .attr("width", 15)
  .attr("x",function(d,i) {return i*20;})
  .attr("y",function(d) {return 100-20*d;}

On line 2, we see a different construct:

an existing selection, or a part of the page
.data(an array)
.append(an object type)

This sequence of methods (selectAll, data, enter and append) are the way to add a series of elements. If all you need to know is to create a bar chart, just remember that, but if you plan on taking your d3 skills further than where you stopped with protovis, look at the end of the post for a more thorough explanation of the selection process.

Attributes and accessor functions

At this stage, we’ve added our new rectangles, and now we are going to shape and style them.

rect.attr("height",function(d) {return d*20;})
  .attr("width", 15)
  .attr("x",function(d,i) {return i*20;})
  .attr("y",function(d) {return 100-20*d;}

All the attributes of a graphical element are controlled by the method attr(). You specify the attribute you want to set, and the value you want to give.
In some cases, the value doesn’t depend on the data. All the bars will be 15 pixels wide, and they will all be of the steelblue color.
In some others, the value do depend on the data. We decide that the height of each bar is 20 times the value of the underlying data, in pixels (so 1 becomes 20, 5 becomes 100 etc.). Like in protovis, once data has been attributed to an element, function(variable name) enables to return a dynamic value in function on that element. By convention, we usually write function(d) {…;} (d for data) although it could be anything. Those functions are still called accessor functions.
so for instance:

.attr("height",function(d) {return d*20;})

means that the height will be 20 times the value of the underlying data element (exactly what we said above).
In protovis, we could position the mark relatively to any corner of its parent, so we had a .top method and a .bottom method. But with SVG, objects are positioned relatively to the top-left corner. So when we specify the y position, it is also relative to the top of the document, not necessarily to the axis (and not in this case).
so –

.attr("y", function(d) {return 100-d*20;})

if we use scales (see next post), all of this will have no impact whatsoever anyway.
Finally, there is an attribue here which doesn’t so much depend on the value of the data, but of its rank in the data items: the x position.
for this, we write: function(d,i) {return i*20;}
Here is a fundamental difference with protovis. In protovis, when we passed a second argument to such a function, it meant the data of the parent element (grand parent for the third, etc.). But here in d3, the second parameter is the position of the data element in its array. By convention, we write it i (for index).
And since you have to know: there is no easy way to retrieve the data of the parent element.

Bonus: understanding selections

To add many elements at once we’ve used the sequence: selectAll, data, enter, append.
Why use 4 methods for what appears to be one elementary task? If you don’t care about manipulating nodes individually, for instance for animations, you can just remember the sequence. But if you want to know more, here is what each method does.


the selectAll method
First, we select a point on which to add your new graphical objects. When you are creating your objects and use the selectAll method, it will return an empty selection but based on that given point. You may also use selectAll in another context, to update your objects for instance. But here, an empty selection is expected.


the data method
Then, you attribute data. This works quite similarly to protovis: d3 expects an array. d3 takes the concept further (with the concept of data joins) but you need not concern yourself with that until you look at transitions.
Anyway, at this stage you have an empty selection, based on a given point in the page, but with data.


the enter method
The enter method updates the selection with nodes which have data, but no graphical representation. Using enter() is like creating stubs where the graphical elements will be grafted.


the append method
Finally, by appending we actually create the graphical objects. Each is tied to one data element, so it can be further styled (for instance, through “attr”) to derive its characteristics from the value of that data.


From protovis to d3

You’ve spent some time learning protovis only to find that its development is halted as authors have switched to work on d3. Have your efforts all been in vain? Fear not! This series of posts will help you adapt to d3 with a protovis background.

Before we go anywhere further, let me say that these posts won’t make you awesome at d3 (yet). We won’t be talking about how to do all amazing things you could never do in protovis. Rather, we’ll focus on enabling you to be as comfortable with d3 than you could have been with protovis. And once that’s done, nothing will prevent you from learning the more powerful aspects of d3.

Anyway, if you’re reading this, you are already awesome.

Why should I make the switch to d3?

Frankly, you don’t have to. Protovis is a fine framework and works well. Now you may want to switch to d3 for several reasons.

  • d3 is fast. d3 is better at handling scenes with hundreds or thousands of elements. So if you like scatterplots or network graphs, and who doesn’t, d3 has much stronger performance.
  • d3 does animation. There were workarounds to get animation in protovis but there were that. Workarounds. Animation and transitions are built in d3 and are a snap to implement.
  • More features. Just because development has stopped on protovis doesn’t mean that it has stopped elsewhere… for instance, d3 has more ready-to-use layouts, like voronoi tesselation or chords, and it has more methods and functions to make your life easier, to access and manipulate data for example.
  • Styling. In d3 it is possible to apply style sheets in CSS to graphical elements. This helps keeping the code and the format separate.

Yes but doesn’t everything change?

Short answer: no.

Less short answer: some things do change substantially. Most things stay the same. And then, some things look the same but have changed.

Things that stay the same

  • The general principle.Protovis is about transforming an array of data into the same number of graphical elements, with characteristics derived from that data. d3 does exactly this as well.
  • pv.Nest, which in my personal protovis experience has been the hardest to understand. Only, it’s called d3.nest now.
  • Methods that supplement the existing javascript array manipulation methods, like pv.min, pv.values, pv.entries etc. are also back (as d3.min, d3.values, d3.entries, but you’ve guessed it by now). Some, like pv.mean or pv.median, didn’t make it through but you could easily rewrite them, or continue using the protovis ones.

Things that look different, but which are largely the same

Protovis had a number of native graphical objects, or marks, that could be manipulated at will with methods.

var vis=new pv.Panel()

In protovis, it is inherently different to set the height, the width or just any property of an object. This uses different methods.

var vis=d3.select("body").append("svg:svg");

This produces essentially the same thing. We add a rectangle of a specified height, width, and colors. There are a few differences though. Here, controlling height, width or fill is essentially the same thing and uses the same method, .attr(). Notice also that we first created an svg document, then a shape within that document. And also, that we don’t need to use vis.render(); anymore.

The d3 approach looks longer. But if we define all the style information first we could make it much shorter, shorter than in protovis in fact!
For instance:

var vis=d3.select("body").append("svg:svg").append("svg:rect").attr("class", "myRect");

Much of the apparent differences between d3 and protovis come from using explicitly svg shapes (paths, polygons, ellipses, etc.) as opposed to native objects (pv.Panel, pv.Bar, pv.Dot, etc.), although – it’s essentially the same thing. Yes, you have to learn your SVG but it’s really on a need-to-know basis. In fact, if you’ve worked with protovis, or even if you’ve worked with HTML and CSS, you probably know more SVG than you thought.

SVG is more flexible than protovis objects. The flipside is that constructs which were once simple in protovis become less obvious in SVG. But for those cases, d3 has recreated some native objects, even if not as many as in protovis.

Things that look the same, but which are different

There have been some changes in methods that have kept the same name since protovis – some minor, some more substantial. In any case, the basic ways of using these methods (like scale, color, data…) doesn’t change much. It’s only their more exotic uses who do change.


datavizchallenge.org: my entry

I just posted my www.datavizchallenge.org entry!

So here’s a little explanation about what I’ve done, how and why.

The idea behind the data provider web site, WhatWePayFor, is that billions and millions don’t talk to the average citizen. This doesn’t help them understand where their money goes.

And in my work, I generally try to show as little data as possible to make a point. And I like to use interactivity to involve users.

I got my idea from thinking back to the presentation made by Nick Diakopoulos at last Visweek. We had that workshop on data storytelling and Nick showed how to get user attention on a subject by letting them play and guess first. So here goes.

First, I wanted to let users play with data, in this case try to establish their priorities for federal budget spending. Where should money go?

I’ve manipulated this kind of figures a few times, so I know that USA has extreme positions on some “functions of spending”, like defense, or culture. Some rich countries that spend up to 7% of their budget on culture, and some who spend less than 0.1% of their budget on defense. Hint, the USA is not one of them.

So I’m hoping that the users will see differences between their preferences and the choices that the government makes in their names, and that from the differences will be striking enough for them to understand the orders of magnitude and possibly inspire them to take action.


Building the France Dorling cartogram

This is a follow-up to the previous post where I used a Dorling cartogram to show election results in France. So how did I come up with that?

Step 1: the data

Without data, there wouldn’t be much to plot, right? So the first order of business was to fetch the data, which is theoretically available from an official source. Unfortunately, there is no way to download those results at once, instead they are presented in a myriad of formatted HTML pages – over 2000 in all.

I used wget to download them all. wget allows for recursive download, that is, you can download a page, and all the links from that page, etc. You can also specify the depth of the recursive download, which is a good idea in that case because there are lots of duplicates of that information on the server, so trying to get everything will take a couple of hours vs about 10 minutes.

Once that’s done, I used python and Beautiful Soup to turn the HTML pages into nice JSON objects. Fortunately the pages followed the same strict template, so I could run a relatively simple extraction script, like: look at the second HTML table of the page, get the number of registered voters who didn’t vote from the 2nd cell of the 2nd row, etc.
Once that was done I came up with 2 files: one which listed the aggregate votes per county and per party, and another one which listed all the candidates, their affiliation and their votes. I ended up using only the 1st one, actually at an even more aggregated level, but I may use the second one for further work.

Next up: setting the geography

A dorling cartogram is really a kind of force-directed layout graph where the nodes represent geographical entities, which are initially positioned at a place corresponding to their geography, but which are then resized by a variable, and whose movement is subject to several forces: each node is linked to its original geographical neighbors, which attract each other, but also cannot intersect with other nodes, so it pushes back its neighbors.

So for this to work I had to:

  • find the initial position of my geographical entities (French départements) and
  • list their neighbors.

To get the initial position I fetched the list of all French towns and their lon-lat coordinates which I averaged for each département. That gave me a workable proxy of the centroid of that departement. Note that I didn’t need anything very accurate at this stage, since nodes are going to be pushing each other anyway.
I listed the neighbors manually, since the number of départements is workable. Else, one possible way to do that from a shape file is look at every point of every shape and list all the entities that use it. If two entities use it they are neighbors. Again at this stage it’s not mandatory to be exhaustive. If an entity is surrounded by 8 others and we only use 4 the results will look sensibly the same.
Anyone who wants to create dorling cartograms for France can reuse the file.

Putting it all together

The rest is standard protovis fare. There is an example which uses Dorling cartograms which I used as a basis. I prepared my data files so that I could aggregate the stuff I needed at the level of the département (which is about 20 counties). I never really considered showing anything at the county level because except for looking up specific results it is never too relevant. For instance, in most places there are some agreement between parties which don’t all present candidates in order to favor their allies.

An interesting parameter when building the visualization is choosing the constant by which to multiply the size of the nodes. If it is too high, the cartogram will look like a featureless cluster of circles. If it is too low, it will ressemble the map a lot, and circles won’t touch each other.


More fun with arrays in protovis

In my short tutorial on working with data and protovis I briefly covered some standard javascript and protovis methods to work with arrays. The more I work with Protovis, the more I am convinced that efficient array manipulation is key to achieving just about anything with the library. So, I would like to go into more detail in some javascript methods for building, processing and testing arrays that can really be helpful.

Going through arrays: map and forEach

I said rapidly that the map method was very useful in protovis especially used in combination with pv.range. But that's not very fair to map() to be treated this lightly. Protovis canon examples do not use many traditional loops such as for or while statements. One reason for that is that many constructs in protovis are de facto loops: when we pass an array to protovis as a data file, to create a bar chart for instance (or panels, pie wedges, you name it), it will go through each element of the array to create individul bars (panels, wedges...), to position them, style them, and so on and so forth. This is why it is so important to have your data elements in the best possible shape when you first pass them to protovis. It makes the rest of your code much nicer. Remember our early example:
var categories = ["a","b","c","d","e","f"];
var vis = new pv.Panel()

  .data([1, 1.2, 1.7, 1.5, .7, .3])
  .height(function(d) d * 80)
  .left(function() this.index * 25)
    .text(function() categories[this.index]);

This is not ideal to have values and categories in two separate places, because one could be changed without updating the other. So let's try to use map to create one single variable.
var categories = ["a","b","c","d","e","f"];
var data=[1, 1.2, 1.7, 1.5, .7, .3].map(function(d,i) {return {value:d, key: categories[i]};});
Let's look at our map() method here. First, it's right after an array. It will run against this array, so it will perform an operation on each element of this array, and the result will be another array of the same size with the outcome of each operation in the same order. The next thing here is a function with two arguments: d and i. Again the naming is arbitrary, call them what you want. And they are both optional. But it has to be a function. pv.range(3).map(3) will not return [3,3,3], you need to write pv.range(3).map(function() 3). The first argument refers to the current item of the array map is working on. So 1, then 1.2, etc. If the array is more complex, and is an array of arrays or similar, the current element can be itself an array, an object or anything. It doesn't have to be a number. Here, we want to create an array of associative arrays where the value handle corresponds to the values of the array, and where key corresponds to the category name. So we start our output by "{value: d,". This puts the value of each array element in sequence where we need it to be. The second argument corresponds to the index of the current item in the array, so - 0, 1, 2 etc. This is not unlike using "this.index" in other parts of protovis. This helps us getting the right category name, the one in the same position as the value we are fetching. So we write "key: categories[i]}". The rest of the code can then be changed to :
  .height(function(d) d.value * 80)
  .left(function() this.index * 25)
    .text(function() d.key);
Now how about forEach()? forEach works in a very similar way to map(), the difference is that it doesn't output an array. It's just a function that runs on each element of the array. It can be used to perform an operation a number of times corresponding to the length of that array, for instance.

Testing arrays

There are some times when you need to know whether some or all the elements of your array fulfill a condition. And some other times, you need to be able to extract a subset of your array also on a conditional basis. Now, that would be possible using forEach or map methods as above, but fortunately javascript provides simpler means to achieve that.

Testing a condition on an array at once

There are two methods that do that: every() and some(). every() will return true is the condition is true for, well, every element of the array. some() will return true if the condition is true for at least one element of the array. So, they can also be used to tell if the condition is false for at least one element of the array, or all elements of the array respectively. This is how they work:
[0,1,2].every(function(d) {return d;})
// will return false: 0 is false, 1 is true and 2 is true.
[0,1,2].every(function(d) {return (d<3);})
// will return true. All elements are less than 3.

[0,1,2].some(function(d) {return d;})
// will return true. 1 is true, so at least one element in the array is true.

Creating conditional subsets of an array

It is also possible to get only the elements that fit a condition using the filter() method.
[0,1,2].filter(function(d) {return d;})
// this will return [1,2]. 0 is evaluated as "false".
[1,2,3,4,5].filter(function(d) {return (d>2);})
// this will return [3,4,5].
Of course, the more complex the array, the more interesting these functions get. With the barley array from part 4:
 barley.filter(function(d) {return d.variety=="Manchuria";}
/* this will return: 
  [{"yield":27,"variety":"Manchuria","year":1931,"site":"University Farm"},
   {"yield":32.96667,"variety":"Manchuria","year":1931,"site":"Grand Rapids"},
   {"yield":26.9,"variety":"Manchuria","year":1932,"site":"University Farm"},
   {"yield":22.13333,"variety":"Manchuria","year":1932,"site":"Grand Rapids"},

Visualizing arrays

(without plotting them, of course) When you are manipulating arrays, turning them into maps, performing roll-ups and sorts, you may want to get a glimpse of the array. But, unless it's a single, one-dimensional array, firebug or chrome debugger will represent it as a cryptic [ > Object, > Object, > Object ]. Not being able to follow step by step what's happening to an array makes understanding the data functions much more difficult. Fortunately, you can use the JSON.stringify() method.
{"yield":27,"variety":"Manchuria","year":1931,"site":"University Farm"},
{"yield":32.96667,"variety":"Manchuria","year":1931,"site":"Grand Rapids"},
{"yield":43.06666,"variety":"Glabron","year":1931,"site":"University Farm"},
{"yield":29.13333,"variety":"Glabron","year":1931,"site":"Grand Rapids"},
{"yield":35.13333,"variety":"Svansota","year":1931,"site":"University Farm"},
{"yield":29.66667,"variety":"Svansota","year":1931,"site":"Grand Rapids"},
{"yield":39.9,"variety":"Velvet","year":1931,"site":"University Farm"},
{"yield":23.03333,"variety":"Velvet","year":1931,"site":"Grand Rapids"},
{"yield":36.56666,"variety":"Trebi","year":1931,"site":"University Farm"},
{"yield":29.76667,"variety":"Trebi","year":1931,"site":"Grand Rapids"},
{"yield":43.26667,"variety":"No. 457","year":1931,"site":"University Farm"},
{"yield":58.1,"variety":"No. 457","year":1931,"site":"Waseca"},
{"yield":28.7,"variety":"No. 457","year":1931,"site":"Morris"},
{"yield":45.66667,"variety":"No. 457","year":1931,"site":"Crookston"},
{"yield":32.16667,"variety":"No. 457","year":1931,"site":"Grand Rapids"},
{"yield":33.6,"variety":"No. 457","year":1931,"site":"Duluth"},
{"yield":36.6,"variety":"No. 462","year":1931,"site":"University Farm"},
{"yield":65.7667,"variety":"No. 462","year":1931,"site":"Waseca"},
{"yield":30.36667,"variety":"No. 462","year":1931,"site":"Morris"},
{"yield":48.56666,"variety":"No. 462","year":1931,"site":"Crookston"},
{"yield":24.93334,"variety":"No. 462","year":1931,"site":"Grand Rapids"},
{"yield":28.1,"variety":"No. 462","year":1931,"site":"Duluth"},
{"yield":32.76667,"variety":"Peatland","year":1931,"site":"University Farm"},
{"yield":34.7,"variety":"Peatland","year":1931,"site":"Grand Rapids"},
{"yield":24.66667,"variety":"No. 475","year":1931,"site":"University Farm"},
{"yield":46.76667,"variety":"No. 475","year":1931,"site":"Waseca"},
{"yield":22.6,"variety":"No. 475","year":1931,"site":"Morris"},
{"yield":44.1,"variety":"No. 475","year":1931,"site":"Crookston"},
{"yield":19.7,"variety":"No. 475","year":1931,"site":"Grand Rapids"},
{"yield":33.06666,"variety":"No. 475","year":1931,"site":"Duluth"},
{"yield":39.3,"variety":"Wisconsin No. 38","year":1931,"site":"University Farm"},
{"yield":58.8,"variety":"Wisconsin No. 38","year":1931,"site":"Waseca"},
{"yield":29.46667,"variety":"Wisconsin No. 38","year":1931,"site":"Morris"},
{"yield":49.86667,"variety":"Wisconsin No. 38","year":1931,"site":"Crookston"},
{"yield":34.46667,"variety":"Wisconsin No. 38","year":1931,"site":"Grand Rapids"},
{"yield":31.6,"variety":"Wisconsin No. 38","year":1931,"site":"Duluth"},
{"yield":26.9,"variety":"Manchuria","year":1932,"site":"University Farm"},
{"yield":22.13333,"variety":"Manchuria","year":1932,"site":"Grand Rapids"},
{"yield":36.8,"variety":"Glabron","year":1932,"site":"University Farm"},
{"yield":14.43333,"variety":"Glabron","year":1932,"site":"Grand Rapids"},
{"yield":27.43334,"variety":"Svansota","year":1932,"site":"University Farm"},
{"yield":16.63333,"variety":"Svansota","year":1932,"site":"Grand Rapids"},
{"yield":26.8,"variety":"Velvet","year":1932,"site":"University Farm"},
{"yield":32.23333,"variety":"Velvet","year":1932,"site":"Grand Rapids"},
{"yield":29.06667,"variety":"Trebi","year":1932,"site":"University Farm"},
{"yield":20.63333,"variety":"Trebi","year":1932,"site":"Grand Rapids"},
{"yield":26.43334,"variety":"No. 457","year":1932,"site":"University Farm"},
{"yield":42.2,"variety":"No. 457","year":1932,"site":"Waseca"},
{"yield":43.53334,"variety":"No. 457","year":1932,"site":"Morris"},
{"yield":34.33333,"variety":"No. 457","year":1932,"site":"Crookston"},
{"yield":19.46667,"variety":"No. 457","year":1932,"site":"Grand Rapids"},
{"yield":22.7,"variety":"No. 457","year":1932,"site":"Duluth"},
{"yield":25.56667,"variety":"No. 462","year":1932,"site":"University Farm"},
{"yield":44.7,"variety":"No. 462","year":1932,"site":"Waseca"},
{"yield":47,"variety":"No. 462","year":1932,"site":"Morris"},
{"yield":30.53333,"variety":"No. 462","year":1932,"site":"Crookston"},
{"yield":19.9,"variety":"No. 462","year":1932,"site":"Grand Rapids"},
{"yield":22.5,"variety":"No. 462","year":1932,"site":"Duluth"},
{"yield":28.06667,"variety":"Peatland","year":1932,"site":"University Farm"},
{"yield":26.76667,"variety":"Peatland","year":1932,"site":"Grand Rapids"},
{"yield":30,"variety":"No. 475","year":1932,"site":"University Farm"},
{"yield":41.26667,"variety":"No. 475","year":1932,"site":"Waseca"},
{"yield":44.23333,"variety":"No. 475","year":1932,"site":"Morris"},
{"yield":32.13333,"variety":"No. 475","year":1932,"site":"Crookston"},
{"yield":15.23333,"variety":"No. 475","year":1932,"site":"Grand Rapids"},
{"yield":27.36667,"variety":"No. 475","year":1932,"site":"Duluth"},
{"yield":38,"variety":"Wisconsin No. 38","year":1932,"site":"University Farm"},
{"yield":58.16667,"variety":"Wisconsin No. 38","year":1932,"site":"Waseca"},
{"yield":47.16667,"variety":"Wisconsin No. 38","year":1932,"site":"Morris"},
{"yield":35.9,"variety":"Wisconsin No. 38","year":1932,"site":"Crookston"},
{"yield":20.66667,"variety":"Wisconsin No. 38","year":1932,"site":"Grand Rapids"},
{"yield":29.33333,"variety":"Wisconsin No. 38","year":1932,"site":"Duluth"}
No matter the manipulations you inflict to an array you will always be able to make it reveal its innards by using this.

Protovis: analysis of the Map projections example

What is a map?

before we start looking at the code it may be a good idea to think of the best way to represent a country.
Countries are areas of land surrounded by borders, which are imaginary (or sometimes physical) lines going through a set of points.

Some countries are made of one of such surfaces, but many countries are not one contiguous territory (they may include islands for instance) so they could be made out of several disjointed polygons.

Now let’s put on our protovis hat. Let’s suppose we want to draw a map where each country could be colored differently (choropleth). What kind of data structure should be use to represent that?
First there should be a sort of array of countries. Each country should be an item in that array, so they can be indexed and assigned an individual color and various data points.
Then, at the lowest level, we would be drawing polygons, which are treated as pv.Line in protovis. For each polygon, we would require an array of coordinate pairs. To draw a country, we would need a list (array) of those polygons.

So the data structure we are looking at is:

var world=[  // an array of countries
    [ // an array of polygons
        [ // an array of pairs of coordinates
            [x0, y0], // coordinates of the first point
            [x1, y1], // coordinates of the next one
            [xn, yn],
            [x0, y0]  // coordinates of the first point to close the polygon
       ...              // another polygon, but maybe not.
   [                    // next country

the map projections example

Can be found here: http://vis.stanford.edu/protovis/ex/projection.html

 * A diverging color scale, using previously-computed quantiles of population
 * densities; in the future, we might use a quantile scale here to do this
 * automatically. Map colors based on www.ColorBrewer.org, by Cynthia A. Brewer,
 * Penn State.
var fill = pv.Scale.linear()
    .domain(140, 650, 1900)
    .range("#91bfdb", "#ffffbf", "#fc8d59");

/* Precompute the country's population density and color. */
countries.forEach(function(c) {
  c.color = stats[c.code].area
      ? fill(stats[c.code].pop / stats[c.code].area)
      : "#ccc"; // unknown

var w = 860,
    h = 3 / 5 * w,
    geo = pv.Geo.scale("hammer").range(w, h);

var vis = new pv.Panel()

/* Countries. */
    .data(function(c) c.borders)
    .data(function(b) b)
    .title(function(d, b, c) c.name)
    .fillStyle(function(d, b, c) c.color)
    .strokeStyle(function() this.fillStyle().darker())

/* Latitude ticks. */
    .data(function(b) b)

/* Longitude ticks. */
    .data(function(b) b)


In addition there are two arrays of the following shape:
First, stats which is an associative arrays of associative arrays, and which associate each 2-letter country code with values of population and area:

var stats = {
'AG': {pop:83039, area:44},
'DZ': {pop:32854159, area:238174},
'US': {pop:299846449, area:915896},

Then, countries, which is an array of associative arrays.

var countries = [
{code:'AG', name:"Antigua and Barbuda", 
borders:[ // an array of one or several areas, 
  [ // an array of coordinates, 
    [ // a pair of the form longitude, lattitude

Now this second data structure looks a lot like the one we’ve drafted in the prologue. All the geographic information is tucked in a property called “borders”. The array has other properties for comfort.
Because the data is put in the right shape and order, this script can produce a very good map with a remarkable economy of code.
This example has been put together to showcase the various map projections of protovis (identity, mercator, and so on.). These projections have zero impact on the way data should be assembled for making maps, so we’ll just treat them as “magic”.

 * A diverging color scale, using previously-computed quantiles of population
 * densities; in the future, we might use a quantile scale here to do this
 * automatically. Map colors based on www.ColorBrewer.org, by Cynthia A. Brewer,
 * Penn State.
var fill = pv.Scale.linear()
    .domain(140, 650, 1900)
    .range("#91bfdb", "#ffffbf", "#fc8d59");

This part creates a color scale which will return a color according to the value passed to it. The color returned will be somewhere between the ones specified in the range, depending on where the value is relatively to the values specified in the domain. So a value of 140 will result in a color of #91bfdb (bluish), it will go towards the grey as the value moves up to 650, and towards #fc8d59 (redish) as the value goes up to 1900.

/* Precompute the country's population density and color. */
countries.forEach(function(c) {
  c.color = stats[c.code].area
      ? fill(stats[c.code].pop / stats[c.code].area)
      : "#ccc"; // unknown

As the remark says, this will precompute the country’s color once and for all.
The forEach() method goes to every element of the countries array.
the c.color = statement will add a color key to each element of that array (which, as you may recall, already has values for the code, name and borders keys.
What it does is that is retrieves the country code of that element of countries, c.code, and uses that to find out whether we have an area value for that country code (this is stats[/c].area?).
If this is the case, we are going to compute the color that should be attributed to the country, by passing the population divided by the area to the color scale we just made. Else, we just use light grey.

The next few lines are standard constants that will shape the vis.
Note however

geo = pv.Geo.scale("hammer").range(w, h)

This is a geographic scale, which will be used to convert longitudes and latitudes to X and Y coordinates on the screen.

/* Countries. */
    .data(function(c) c.borders)
    .data(function(b) b)
    .title(function(d, b, c) c.name)
    .fillStyle(function(d, b, c) c.color)
    .strokeStyle(function() this.fillStyle().darker())

This is where it all happens.
First, we create a series of panels, one for each country. So, we pass the countries array as data.
Then, we are going to create another series of panels for every country, that is, with as many panels as there are independent areas in the country. For instance, if there are islands, we are going to need extra panels to represent them. If the country is one contiguous mass of land, there will be just one panel here.
This time, we use function(c) c.borders as data. That is, we go into the borders array.

Finally, we are going to create a filled polygon for each of these independent areas. This is achieved by adding a pv.Line to the previous panels. Likewise, we use (function(b) b) as data, meaning that we go yet another level into the borders array. Now, we are accessing the pairs of longitude + latitude numbers.

geo.x and geo.y convert this pair of numbers to X and Y coordinates on the screen.
For the next two lines, title and fillStyle, we need to go back to the country level.
so, we use a function of the form function(d,b,c). d is the current item (pair of longitude, latitude), b its parent (individual area) and c, its grand-parent (the country).
so, function(d,b,c) c.name retrieves the country name, and function(d,b,c) c.color retrieves the color we had computed for that country to begin with.

For the color of the border, we wish to use a darker version of the fill color. This is what the this.fillStyle().darker() does.

The rest of the vis is longitude and latitude ticks, using the built-in properties of the scale.


The top TV earners are not found on tabloids.

This is my contribution to Flowing Data challenge: visualize this / top tv earners.

When I looked at the dataset provided by Nathan I first wondered what was missing. Quite a few stars were missing in the list. I added Simon Baker whose Mentalist gets a huge audience and who is said to be getting $450k per show. Others who probably should be there include Forest Whittaker for the Criminal Minds spinoff, Jim Belushi in the Defenders, Kate Walsh in Private Practice or Elizabeth Michell in V. I don't think any of them is settling for less than $15k per episode.

Another thing were the audiences. There are a few good sources for that (and some less good) so I tried to approximate how much viewers a show would generate for their channel. I came up with hourly viewers because a show that brings 10 million people to one channel for half an hour should be equivalent to one that gets 5 million viewers for an hour, right?
In fact, I wanted to come up with a proxy of how much value a TV show was generating and how much of that went into the pockets of the cast. It's probably possible to come up with a precise answer with better data than could be assembled by an amateur over a lunch break, audience is probably a part of the equation, but the short story doesn't require much calculation: actors only get the crumbs of a very fat pie.
A show like Grey's Anatomy got $329.1m dollars of ad revenue in 2009, which I assume is an US-only figure, the show being syndicated in many countries, unfortunately including France. And that excludes sales of DVD, paid donwloads and other streams of revenue. Out of that, Patrick Dempsey only got $6m. Now $6m is a lot of money, but actors of successful, established shows don't get a very good deal here. In the last seasons of "Friends", the 6 main actors all got $1m per episode, which seemed fair in retrospect. Sarah Jessica Parker's salary even reached $3.2m per Sex and the City episode, but she was co-producer. And this was then.

The chart can be divided in quadrants. On the lower-right corner, Charlie Sheen, who is the only one there with an old-school deal.
On the higher-left corner, the work horses - stars of the crime shows with stellar ratings, who could ask for more.
On the very bottom-left, those who are happy to be there. Etc.

Now some stars of shows that get well over 10m viewers get "only" around $100k per episode. So obviously the revenue stream must go somewhere else! my money is on the writers 🙂

Also for fun, I computed the ratio of their salary to the length of each episode. I once calculated that I earned about 83 cents a minute, which sounds pretty ridiculous compared to Charlie Sheen's $62500!


Working with data in protovis: part 5 of 5

previous: reshaping complex arrays (4/5)

Working with layouts

In this final part, we’re going to look at how we can shape our data to use the protovis built-in layouts such as stacked areas, treemaps or force-directed graphs.
This is not a tutorial on how to use layouts stricto sensu, and I advise anyone interested to first look at the protovis documentation to see what can be done with this and to understand the underlying concepts.

But if there is one thing to know about layouts, it’s that they allow you to create non-trivial visualizations in even less code than regular protovis, provided that you pass them data in a form they can use, and this is precisely where we come in.

Three great categories of layouts

Currently, there are no fewer than 13 types of layouts in Protovis. Fortunately, there are examples for all of them in the gallery.
There are layouts for:

In addition, there are layouts like pv.Layout.Bullet which require data to have a certain specific shape but the example from the gallery is very explicit. (et tu, Horizon layout).

Arrays of data

In order to work with this kind of layout, the simplest thing is to put your data in a 2-dimensional array:

var data=[

For the grid layout, this gives you an array of cells divided in columns (number of elements in each line) and rows (number of lines).
The idea of the grid layout is that your cells are automatically positioned and sized, so afaik the only thing you can do is add a mark such as a pv.Bar which would fill them completely, but which you could still style with fillStyle or strokeStyle. You can’t really access the underlying data with functions but you can use methods that rely on default values, like adding labels.

For instance, you can use it to generate a QR code:

var qr=[
].map(function(i) i.split(""));

var vis = new pv.Panel()
 	    .fillStyle(pv.colors("#fff", "#000"))
(BTW, this is the QR code to this page)

On line 29, I’m using a map function to turn this array of strings, which is easier and shorter to type, into a bona fide 2-dimensional array.

That’s all there is to grids, of all the layouts they are among the easiest to reproduce with regular protovis.

Now, stacks.
The easiest way to use them is to pass them 2-dimensional arrays. Now it doesn’t have to be arrays of numbers, it can be arrays of associative arrays in case you need to do something exotic. But for the following examples let’s just assume you don’t. Here is how you’d do a stacked area, stacked columns and stacked bars respectively:

var data=[
var vis=new pv.Panel().width(200).height(200);
    .x(function() 50*this.index)
    .y(function(d) d/20)

all you need is to feed the layers, x, y properties of your stack, then say what you want to add to your layers.
Now, columns:

    .x(function() 50*this.index)
    .y(function(d) d/20)

and finally, bars:

    .x(function() 50*this.index)
    .y(function(d) d/20)

For bars, there is a little trick here. I specify that the layer orientation is horizontal (“left”) and I change the height instead of the width of the added pv.Bar.
And that all there is. You can create various streamgraphs by playing with the order and offset properties of the stack but this doesn’t change anything to the data structure, so we’re done here.

Representing networks

Protovis provides 3 cool layouts to easily exhibit relationships between nodes: arc diagrams, matrix diagrams and force-directed layouts.
The good news is that the shape of the data required by those three layouts is identical.

They require an array that correponds to the nodes. This can be as simple as a pv.range(), or as sophisticated as an array of associative arrays if you want to style your network graph according to several potential attributes of the node.

And they also require an array for the links. This array has a more rigid form, it must be an array of associative arrays of the shape: {source: #, target: #, value: #} where the values for source and target correspond to the position of a node in the node array, and value indicates the strength of the link.

So let’s do a simple one.

var nodes=pv.range(6); // why more complex, right?
var links=[
{source:0, target:1, value:2},
{source:1, target:2, value:1},
{source:1, target:3, value:1},
{source:2, target:4, value:4},
{source:3, target:5, value:1},
{source:4, target:5, value:1},
{source:1, target:5, value:3}
var vis = new pv.Panel()
var arc = vis.add(pv.Layout.Arc)

Here, by varying the strength of the link, the thickness of the arcs changes accordingly. The nodes are left unstyled, had we passed a more complicated dataset to the nodes array, we could have changed their properties (fillStyle, size, strokeStyle, labels etc.) with appropriate accessor functions.

With little modifications we can create a force-directed layout and a matrix diagram.

var force = vis.add(pv.Layout.Force)


		.text(function() this.index);


Here I labelled the nodes so one can tell which is which. This is done by adding a pv.Label to the pv.Dot that’s attached to the node, just like with any other mark.

var Matrix = vis.add(pv.Layout.Matrix)

    .fillStyle(function(d) pv.Scale.linear(0, 2, 4)
      .range('#eee', 'yellow', 'green')(d.linkValue))

Matrix.label.add(pv.Label).text(function() Math.floor(this.index/2))


For the matrix things are slightly more complex than for the previous 2. Here I opted for a directed matrix, as opposed to a bidirectional one: this means that each link is shown once, to its source from its target, and not twice (ie from its target back to its source) which is the default.
I chose to color the bar attached to my links (which are cells of the matrix) according to the strength of my links. Again, if my nodes field was more qualified, I could have used these properties.

Finally, we’ve added labels to the custom property Matrix.label. Only, the labels are numbered from 0 to 11 so to get numbers from 0 to 5 for both rows and columns I used Math.floor(this.index/2) (integer part of half of this number).

Hierarchized data

Like for networks, the shape of the data we can feed to treemaps, icicles and other hierarchical representation doesn’t change. So once you have your data in order, you can easily switch representations.

Essentially, you will be passing a tree of the form:

var myTree={
   rootnode: {
      node: {
         node: {
            leaf: value,
            leaf: value,
            leaf: value

The protovis examples use the hierarchy of flare source code as an example, which really shows what can be done with a treemap and other tree represenations.

For our purpose we are going for a simpler tree, inspired by the work of Periscopic on congressspeaks.com which Kim Rees showed at Strata.
Kim presentation featured tiny treemaps that showed the voting record for a congressperson, and whether they had voted for or against their party.

So let’s play with the voting record of an hypothetic congressperson:

var hasVoted={
	didnt: 100,
	voted: {
	    yes: {
	        yesWithParty: 241,
	        yesAgainstParty: 23
	    no: {
	        noWithParty: 73,
	        noAgainstParty: 5

Once you have your tree, you will need to pass it to your layout using pv.dom, like this:


Based on that let’s do two hierarchical representations.
Let’s start with a tree:

var vis = new pv.Panel()
var tree = vis.add(pv.Layout.Tree)
    .size(function(n) n.nodeValue)
	.anchor("center").add(pv.Label).textAlign("center").text(function(n) n.nodeName)

And here is the result:

There are many styling possibilities obviously left unexplored in this simple example (you can control properties of the tree.link, tree.node, tree.labels which we didn’t use here, etc.), but this won’t change much as far as data are concerned.

Now let’s try a treemap with the same dataset.

var vis = new pv.Panel()

var tree = vis.add(pv.Layout.Treemap)

	.fillStyle(function(d) d.nodeName=="didnt"?"darkgrey":d.nodeName.slice(0,3)=="yes"?

		   {label:"yes with party", 	color: "powderblue"},
		   {label:"yes against party", 	color: "steelblue"},
		   {label:"no with party", 		color: "lightsalmon"},
		   {label:"no against party", 	color: "salmon"},
		   {label:"didn't vote", 		color: "darkgrey"}
	.top(function() 50+20*this.index)
	.fillStyle(function(d) d.color)
	.anchor("right").add(pv.Label).textAlign("left").text(function(d) d.label)


and what took the longest part of the code was making the legend.

Here is the outcome:


Working with data in protovis – part 4 of 5

Previous: array functions in javascript and protovis

Reshaping complex arrays

This really is what protovis data processing is all about.
In today’s tutorial, I am going to refer extensively to protovis’s Becker’s Barley example. One reason for that is that it’s also used in the API documentation of the methods we are going to cover, and also because I am posting a line-by-line explanation of this example that you can refer to.

So far we’ve seen that :

  • Associative arrays are great as data elements, as their various values can be used for various attributes.
    For instance, if the current data element is an associative array of this shape:

    { yield: 27.00000, variety: "Manchuria", year: 1931, site: "University Farm" }

    one could imagine a bar chart where the length of the bar would come from the yield, their fillStyle color from the variety, the label from the site, etc.

  • arrays of associative arrays are very practical to manipulate thanks to accessor functions.
    An array of the shape:

    var barley = [
      { yield: 27.00000, variety: "Manchuria", year: 1931, site: "University Farm" },
      { yield: 48.86667, variety: "Manchuria", year: 1931, site: "Waseca" },
      { yield: 27.43334, variety: "Manchuria", year: 1931, site: "Morris" },
      { yield: 39.93333, variety: "Manchuria", year: 1931, site: "Crookston" },
      { yield: 32.96667, variety: "Manchuria", year: 1931, site: "Grand Rapids" },
      { yield: 28.96667, variety: "Manchuria", year: 1931, site: "Duluth" },
      { yield: 43.06666, variety: "Glabron", year: 1931, site: "University Farm" },
      { yield: 55.20000, variety: "Glabron", year: 1931, site: "Waseca" },
      { yield: 28.76667, variety: "Glabron", year: 1931, site: "Morris" },
      { yield: 38.13333, variety: "Glabron", year: 1931, site: "Crookston" },
      { yield: 29.13333, variety: "Glabron", year: 1931, site: "Grand Rapids" },
      { yield: 29.66667, variety: "Glabron", year: 1931, site: "Duluth" },
      { yield: 35.13333, variety: "Svansota", year: 1931, site: "University Farm" },
      { yield: 29.33333, variety: "Wisconsin No. 38", year: 1932, site: "Duluth" }

    could be easily sorted according to any of the keys – yield, variety, year, site, etc.

  • it is easy to access the data of an element’s parent, and in some cases it can greatly simplify the code.


For this last reason, you may want to turn one flat array of associative arrays into an array of arrays of associative arrays. This process is called nesting.

Simple nesting

If you turn a single array like the one on the left-hand side to an array of arrays like on the right-hand side, you could easily do 3 smaller charts, one next to the other, by creating them inside of panels. You could have some information at this panel level (for instance the variety) and the rest at a lower level.

Fortunately, there are protovis methods that can turn your flat list into a more complex array of arrays. And since protovis methods are meant to be chained, you can even go on and create arrays of arrays of arrays of arrays if needs be.
Even better – combined with the other data processing functions, you don’t only change the structure of your array, but you can also filter and control the order of the elements to show everything that you want and only what you want.

And how complicated can this be?
To do the above, all you have to type is

barley=pv.nest(barley).key(function(d) d.variety).entries();

What this does is that it nests your barley array, according to the variety key. entries() at the end is required to obtain the array of arrays needed.

Here is an example of what can be done with both kinds of data structures in just a few lines of code (which won’t include the data proper. The long, flat array is stored in the variable barley, as above).
Without nesting:

var vis = new pv.Panel()
	.left(function() this.index*5)
	.height(function(d) d.yield*2)
	.fillStyle(function(d) d.year==1931?"steelBlue":"Orange")

As the pv.Bar goes through the array, there is not much it can do to structure it. We can just size the bars according to the value of yield, and color them according to another key (here the year).

Now using nesting:

barley2=pv.nest(barley).key(function(d) d.variety).entries();
barley2=pv.nest(barley).key(function(d) d.variety).entries();
var vis = new pv.Panel()
var cell=vis.add(pv.Panel)
	.left(function() this.index*70)
cell.anchor("top").add(pv.Label).textAlign("center").font("9px sans-serif").text(function(d) d.key)
		.data(function(d) d.values)
		.left(function() 5+this.index%6*10)
		.height(function(d) d.yield*2)
		.fillStyle(function(d) d.year==1931?"steelBlue":"Orange")
		.add(pv.Label).text(function(d) d.site).textAngle(-Math.PI/2)
			.textAlign("left").left(function() 15+this.index%6*10).textStyle("#222")

Here we used the same simple nesting command as above (the original example uses a more complicated one which allows for more refinement in the display). This structure allows us to create first panels, which we can style by displaying the name of the sites for instance, then, within these panels, the corresponding bars.

Doing this with the data in its original form would have been possible, but would have required writing a much longer program. So the whole idea of nesting is to take some time to plan the data structure once, so that the code is as short and useful as possible.

Going further with nesting

However, it is possible to go beyond that:

  • by nesting further, which can be done by adding other .key() methods:

      .key(function(d) d.variety)
      .key(function(d) d.year)


  • by sorting keys or values using the sortKeys() and sortValues() methods, respectively.

    For instance, we can change the order in which the variety blocks are displayed with sortKeys():

      .key(function(d) d.variety)
      .key(function(d) d.variety)

By using sortKeys without argument, the natural order is used (alphabetical, since our key is a string). But we could provide a comparison function if we wanted a more sophisticated arrangement.

Nesting and hierarchy

If you run the double nesting command we discussed above,

  .key(function(d) d.variety)
  .key(function(d) d.year)

you’ll get as a result something of the form:

var barley=[
  {key:"Manchuria", values: [
    {key:"1931", values: [
      {site:"University farm", variety: "Manchuria", year: 1931, yield: 27},
      {site:"Waseca", variety: "Manchuria", year: 1931, yield: 48.86667},
      {site:"Morris", variety: "Manchuria", year: 1931, yield: 27.43334},
      {site:"Crookston", variety: "Manchuria", year: 1931, yield: 39.93333},
      {site:"Grand Rapids", variety: "Manchuria", year: 1931, yield: 32.96667},
      {site:"Duluth", variety: "Manchuria", year: 1931, yield: 28.96667}
    {key:"1932", values: [  
      {site:"University farm", variety: "Manchuria", year: 1932, yield: 26.9},
      {site:"Waseca", variety: "Manchuria", year: 1932, yield: 33.46667},
      {site:"Morris", variety: "Manchuria", year: 1932, yield: 34.36666},
      {site:"Crookston", variety: "Manchuria", year: 1932, yield: 32.96667},
      {site:"Grand Rapids", variety: "Manchuria", year: 1932, yield: 22.13333},
      {site:"Duluth", variety: "Manchuria", year: 1932, yield: 22.56667}
  {key: "Glabron", ...

and so on and so forth for all the varieties of barley. Now how can we use this structure in a protovis script? why not use multi-dimensional arrays instead, and if so, how would the code change?

Well. You’d start using this structure by creating a first panel and and passing it the nested structure as data.
Assuming your root panel is called vis, you’d start likewise:


now, since barley has been nested first by variety, it is now an array of 10 elements. You are going to create 10 individual panels. At some point you should worry about dimensioning and positioning them. But here we are only focusing on passing data to subsequent elements.

Next, you are going to create another set of panels (or any mark, really, this doesn’t change anything for the data)

    .data(function(d) d.values)

This is how you drill down to the next level of data, by using an accessor function with the key “values”.
Congratulations! you have created 2 panels in each of our 10 individual panels, one per year.

Finally, you are going to create a final mark (let’s say, a pv.Bar)

    .data(function(d) d.values)

Again, you use an accessor function of the same form. This will create a bar chart with 6 bars.
The data element corresponding to each bar is of the form:

{site:"University farm", variety: "Manchuria", year: 1932, yield: 26.9}

So, when you style the chart, you can access these properties with accessor functions, and write for instance:

    .height(function(d) d.yield)
    .add(pv.Label).text(function(d) d.variety)


To sum it up: you can create a hierarchical structure in protovis that corresponds to the shape of your nested array by adding elements and passing data using an accessor function with the key “values”.
At the lowest level of your structure you can access all the properties of the original array using accessor functions.

Now, what if instead we used a multi-dimensional, normal array without keys and values? don’t they have structure and hierarchy, too?
This is not only possible, but also advised when your dataset is getting really big, as you would plague your users with annoying loading times. This changes the structure of the code though.

An equivalent multi-dimensional array would be something like:

var yields =
[ // this is the level of the variety
  [ // this is the level of the year
    [ 27, 48.86667, 27.43334, 39.93333, 32.96667, 28.96667], 
    [ 26.9, 33.46667, 34.36666, 32.96667, 22.13333, 22.56667]
    [ 43.06666, 55.2, 28.76667, 38.13333, 29.13333, 29.66667],
    [ 36.8, 37.73333, 35.13333, 26.16667, 14.43333, 25.86667]
    [ 35.13333, 47.33333, 25.76667, 40.46667, 29.66667, 25.7],
    [ 27.43334, 38.5, 35.03333, 20.63333, 16.63333, 22.23333]
    [ 39.9, 50.23333, 26.13333, 41.33333, 23.03333, 26.3],
    [ 26.8, 37.4, 38.83333, 32.06666, 32.23333, 22.46667]
    [ 36.56666, 63.8333, 43.76667, 46.93333, 29.76667, 33.93333],
    [ 29.06667, 49.2333, 46.63333, 41.83333, 20.63333, 30.6]
    [ 43.26667, 58.1, 28.7, 45.66667, 32.16667, 33.6],
    [ 26.43334, 42.2, 43.53334, 34.33333, 19.46667, 22.7]
    [ 36.6, 65.7667, 30.36667, 48.56666, 24.93334, 28.1],
    [ 25.56667, 44.7, 47, 30.53333, 19.9, 22.5]
    [ 32.76667, 48.56666, 29.86667, 41.6, 34.7, 32],
    [ 28.06667, 36.03333, 43.2, 25.23333, 26.76667, 31.36667]
    [ 24.66667, 46.76667, 22.6, 44.1, 19.7, 33.06666],
    [ 30, 41.26667, 44.23333, 32.13333, 15.23333, 27.36667]
    [ 39.3, 58.8, 29.46667, 49.86667, 34.46667, 31.6],
    [ 38, 58.16667, 47.16667, 35.9, 20.66667, 29.33333]

and that’s the whole lot. It is indeed shorter. now this array is only the yields, you may want to create an array of the possible values of varieties, sites and years for good measure.

var varieties=["Manchuria", "Glabron", "Svansota", "Velvet", "Trebi",
     "No. 457", "No. 462", "Peatland", "No. 475", "Wisconsin No. 38"], 
    sites=["University Farm", "Waseca", "Morris", "Crookston", "Grand Rapids", "Duluth"],

And by the way, it is very possible to create these arrays out of the original array using the map() method or equivalent.

how can we create an equivalent structure?
we start like the above:


Likewise, our 3-dimensional array is really an array of 10 arrays of 2 arrays of 6 elements. So we are also creating 10 panels. Let’s continue and create panels for the years:

    .data(function(d) d)

To drill down one level in an array, you have to use this form. you say that you are giving the children of your object what’s inside the data property of their parent.

So naturally, you follow by

    .data(function(d) d)

now how you style your bars will be slightly different than before. What you passed your first panel was an array of yields. So that’s what you get now from your data. If you want something else, you’ll have to get it with this.index for instance.

    .height(function(d) d) // that's the yield
    .add(pv.Label).text(function() varieties[this.index])

All in all it’s trickier to work with arrays. The code is less explicit, and if you change one array even by accident, you’ll have to check that others are still synchronized. But it could make your vis much faster.


Sometimes, what you want out of an array is not a more complex array, but a simpler list of numbers. For instance, what if you could obtain the sum of all the values in the array for such or such property? This is also possible in protovis, and in fact, it looks a lot like what we’ve done. The difference is that instead of using the method entries(), we will use the method rollup().

Let’s suppose we have a flat array that looks like this: these are scores of students on 3 exams.

var scores=[
{student:"Adam", exam:1, score: 77},
{student:"Adam", exam:2, score: 34},
{student:"Adam", exam:3, score: 85},
{student:"Barbara", exam:1, score: 92},
{student:"Barbara", exam:2, score: 68},
{student:"Barbara", exam:3, score: 97},
{student:"Connor", exam:1, score: 84},
{student:"Connor", exam:2, score: 54},
{student:"Connor", exam:3, score: 37},
{student:"Daniela", exam:1, score: 61},
{student:"Daniela", exam:2, score: 58},
{student:"Daniela", exam:3, score: 64}

Now, we would like to get, in one simple object, the average for each student.
We know we could reshape the array if we wanted by using pv.Nest and entries():

pv.nest(scores).key(function(d) d.student).entries()

This would be something of the shape:

[{key:"Adam", values:[
    {exam:1, score: 77, student: "Adam"},
    {exam:2, score: 34, student: "Adam"},
    {exam:3, score: 85, student: "Adam"}
  {key:"Barbara", values:[
    {exam:1, score: 92, student: "Barbara"},
    {exam:2, score: 68, student: "Barbara"},
    {exam:3, score: 97, student: "Barbara"}

Useful, for instance, if we’d want to chart the progress of each student separately.

Now if instead of using entries() at the end, we use rollup(), we could get this:

Adam: 65.33333333333333
Barbara: 85.66666666666667
Connor: 58.333333333333336
Daniela: 61}

The exact statement is

  .key(function(d) d.student)
  .rollup(function(data) pv.mean(data, function(d) d.score))

To understand how this works, it helps to visualize what the pv.nest would have returned if we had asked for entries.
What rollup does is that it would go through each of the values that correspond to the keys, and return one aggregate value, depending on the function.

For the first student, “Adam”, the corresponding values array is like this:

    {exam:1, score: 77, student: "Adam"},
    {exam:2, score: 34, student: "Adam"},
    {exam:3, score: 85, student: "Adam"}

so rollup will just look at each element of this array and apply the function.
This is what (data) in “function(data)” corresponds to.
Next, we tell protovis what to do with these elements. Here, we are interested in the average, so we take pv.mean (not pv.average, remember?)
However, we can’t directly compute the average of an array of associative arrays – we must tell protovis exactly what to average. This is why we use an accessor function, function(d) d.score.

Of course, pv.mean used in this example can be replaced by just about any function.

In the name of clarity, especially if there is only one property that can be aggregated, you can declare a function outside of the rollup() method. This is useful if you are going to aggregate your array by different dimensions:

function meanScore(data) pv.mean(data, function(d) d.score);
var avgStudent=pv.nest(scores)
  .key(function(d) d.student)
var avgExam=pv.nest(scores)
  .key(function(d) d.exam)


Protovis also provides methods that turn a “nested” array back into a flat array. And methods that turn a normal array into a tree.
The main advantage of having a flat array is that you can nest it in a different way. This is useful, for instance, if you got your data in a nested form that doesn’t work for you. Likewise, a tree is easier to reshape in protovis than an array.

To create a flat array out of a nested one, you have to use pv.flatten and specify all the keys of the array and conclude by array().


It’s important to note that you need to specify all the keys, not just the keys that correspond to actual nesting. So again, if you start from a flat array, and you do


to reverse this, you’ll have to enter the full formula, using key four times:


Finally, pv.tree – well, I haven’t seen this method used outside the documentation. It’s not used in any live example, not covered by any question in the forum, and I haven’t found any trace of it in the internet. So I’d rather leave you with the explanation in the documentation which is fairly clear than come up with my own. If you find yourself in a situation like in the documentation, where you have a flat array of associative arrays, which have a property that could be interpreted as a hierarchy, then you could use this method to turn your array in something more useful to protovis.

Putting it all together

Instead of coming up with a specific new example for this section I refer you to my explanation of the Becker’s Barley example.
On the same subject, see a comparison of how to re-create Becker’s Barley with protovis and Tableau

next: working with layouts