my entry

I just posted my entry!

So here’s a little explanation about what I’ve done, how and why.

The idea behind the data provider web site, WhatWePayFor, is that billions and millions don’t talk to the average citizen. This doesn’t help them understand where their money goes.

And in my work, I generally try to show as little data as possible to make a point. And I like to use interactivity to involve users.

I got my idea from thinking back to the presentation made by Nick Diakopoulos at last Visweek. We had that workshop on data storytelling and Nick showed how to get user attention on a subject by letting them play and guess first. So here goes.

First, I wanted to let users play with data, in this case try to establish their priorities for federal budget spending. Where should money go?

I’ve manipulated this kind of figures a few times, so I know that USA has extreme positions on some “functions of spending”, like defense, or culture. Some rich countries that spend up to 7% of their budget on culture, and some who spend less than 0.1% of their budget on defense. Hint, the USA is not one of them.

So I’m hoping that the users will see differences between their preferences and the choices that the government makes in their names, and that from the differences will be striking enough for them to understand the orders of magnitude and possibly inspire them to take action.


Building the France Dorling cartogram

This is a follow-up to the previous post where I used a Dorling cartogram to show election results in France. So how did I come up with that?

Step 1: the data

Without data, there wouldn’t be much to plot, right? So the first order of business was to fetch the data, which is theoretically available from an official source. Unfortunately, there is no way to download those results at once, instead they are presented in a myriad of formatted HTML pages – over 2000 in all.

I used wget to download them all. wget allows for recursive download, that is, you can download a page, and all the links from that page, etc. You can also specify the depth of the recursive download, which is a good idea in that case because there are lots of duplicates of that information on the server, so trying to get everything will take a couple of hours vs about 10 minutes.

Once that’s done, I used python and Beautiful Soup to turn the HTML pages into nice JSON objects. Fortunately the pages followed the same strict template, so I could run a relatively simple extraction script, like: look at the second HTML table of the page, get the number of registered voters who didn’t vote from the 2nd cell of the 2nd row, etc.
Once that was done I came up with 2 files: one which listed the aggregate votes per county and per party, and another one which listed all the candidates, their affiliation and their votes. I ended up using only the 1st one, actually at an even more aggregated level, but I may use the second one for further work.

Next up: setting the geography

A dorling cartogram is really a kind of force-directed layout graph where the nodes represent geographical entities, which are initially positioned at a place corresponding to their geography, but which are then resized by a variable, and whose movement is subject to several forces: each node is linked to its original geographical neighbors, which attract each other, but also cannot intersect with other nodes, so it pushes back its neighbors.

So for this to work I had to:

  • find the initial position of my geographical entities (French départements) and
  • list their neighbors.

To get the initial position I fetched the list of all French towns and their lon-lat coordinates which I averaged for each département. That gave me a workable proxy of the centroid of that departement. Note that I didn’t need anything very accurate at this stage, since nodes are going to be pushing each other anyway.
I listed the neighbors manually, since the number of départements is workable. Else, one possible way to do that from a shape file is look at every point of every shape and list all the entities that use it. If two entities use it they are neighbors. Again at this stage it’s not mandatory to be exhaustive. If an entity is surrounded by 8 others and we only use 4 the results will look sensibly the same.
Anyone who wants to create dorling cartograms for France can reuse the file.

Putting it all together

The rest is standard protovis fare. There is an example which uses Dorling cartograms which I used as a basis. I prepared my data files so that I could aggregate the stuff I needed at the level of the département (which is about 20 counties). I never really considered showing anything at the county level because except for looking up specific results it is never too relevant. For instance, in most places there are some agreement between parties which don’t all present candidates in order to favor their allies.

An interesting parameter when building the visualization is choosing the constant by which to multiply the size of the nodes. If it is too high, the cartogram will look like a featureless cluster of circles. If it is too low, it will ressemble the map a lot, and circles won’t touch each other.


La carte des cantonales

[I write this here post in French because it's more relevant for a French audience, a more technical post with an explanation of how it's done follows in English].

J'ai écrit que j'avais été déçu par les infographies des principaux médias au lendemain des cantonales.

Si on prend la carte du Monde par exemple, on y voit la France nettemement découpée en départements, soit bien roses, soit bien bleus, ou à la rigueur gris si l'issue est incertaine. Mais ce n'est pas l'enjeu de ces élections. Ce sont les dernières élections avant la présidentielle, et on est donc attentif aux enseignements nationaux du scrutin au-delà de la composition des conseils généraux. Ce qu'on commente, c'est plus la percée de l'extrême droite ou le poids de l'abstention. C'est la capacité de la droite ou de la gauche à s'imposer.

Dans cette optique, chaque département n'a pas le même poids. Le Nord par exemple, avec presque 900,000 électeurs, compte plus que la Lozère qui en compte juste un peu plus de 20,000. Donc, tracer une carte géographiquement exacte n'est pas très honnête, puisque cela cache ces différences qui peuvent être énormes.

Comme je n'ai pas l'habitude de porter des critiques sans proposer autre chose, j'ai fabriqué des cartogrammes de Dorling pour rendre compte de ce qui s'est vraiment passé dimanche dernier.

Les cartogrammes, ce sont presque des cartes, sauf que le côté géographique n'est pas pris au sens strict. Leur forme dépend plus des données qu'elles représentent. Et dans la version de Dorling, on remplace les morceaux de la carte (ici, les départements) par des cercles, plus ou moins gros. Là, la taille des cercles dépend du nombre d'électeurs inscrits aux cantonales. Du coup, leur position sur la carte n'est pas exactement la même que celle des départements sur la carte de France mais elle est assez proche pour qu'on puisse les retrouver.

Je propose 3 scénarios: une comparaison gauche-droite, l'importance de l'extrême-droite et celle de l'abstention.

Gauche-droite Extrême-droite Abstention

More fun with arrays in protovis

In my short tutorial on working with data and protovis I briefly covered some standard javascript and protovis methods to work with arrays. The more I work with Protovis, the more I am convinced that efficient array manipulation is key to achieving just about anything with the library. So, I would like to go into more detail in some javascript methods for building, processing and testing arrays that can really be helpful.

Going through arrays: map and forEach

I said rapidly that the map method was very useful in protovis especially used in combination with pv.range. But that's not very fair to map() to be treated this lightly. Protovis canon examples do not use many traditional loops such as for or while statements. One reason for that is that many constructs in protovis are de facto loops: when we pass an array to protovis as a data file, to create a bar chart for instance (or panels, pie wedges, you name it), it will go through each element of the array to create individul bars (panels, wedges...), to position them, style them, and so on and so forth. This is why it is so important to have your data elements in the best possible shape when you first pass them to protovis. It makes the rest of your code much nicer. Remember our early example:
var categories = ["a","b","c","d","e","f"];
var vis = new pv.Panel()

  .data([1, 1.2, 1.7, 1.5, .7, .3])
  .height(function(d) d * 80)
  .left(function() this.index * 25)
    .text(function() categories[this.index]);

This is not ideal to have values and categories in two separate places, because one could be changed without updating the other. So let's try to use map to create one single variable.
var categories = ["a","b","c","d","e","f"];
var data=[1, 1.2, 1.7, 1.5, .7, .3].map(function(d,i) {return {value:d, key: categories[i]};});
Let's look at our map() method here. First, it's right after an array. It will run against this array, so it will perform an operation on each element of this array, and the result will be another array of the same size with the outcome of each operation in the same order. The next thing here is a function with two arguments: d and i. Again the naming is arbitrary, call them what you want. And they are both optional. But it has to be a function. pv.range(3).map(3) will not return [3,3,3], you need to write pv.range(3).map(function() 3). The first argument refers to the current item of the array map is working on. So 1, then 1.2, etc. If the array is more complex, and is an array of arrays or similar, the current element can be itself an array, an object or anything. It doesn't have to be a number. Here, we want to create an array of associative arrays where the value handle corresponds to the values of the array, and where key corresponds to the category name. So we start our output by "{value: d,". This puts the value of each array element in sequence where we need it to be. The second argument corresponds to the index of the current item in the array, so - 0, 1, 2 etc. This is not unlike using "this.index" in other parts of protovis. This helps us getting the right category name, the one in the same position as the value we are fetching. So we write "key: categories[i]}". The rest of the code can then be changed to :
  .height(function(d) d.value * 80)
  .left(function() this.index * 25)
    .text(function() d.key);
Now how about forEach()? forEach works in a very similar way to map(), the difference is that it doesn't output an array. It's just a function that runs on each element of the array. It can be used to perform an operation a number of times corresponding to the length of that array, for instance.

Testing arrays

There are some times when you need to know whether some or all the elements of your array fulfill a condition. And some other times, you need to be able to extract a subset of your array also on a conditional basis. Now, that would be possible using forEach or map methods as above, but fortunately javascript provides simpler means to achieve that.

Testing a condition on an array at once

There are two methods that do that: every() and some(). every() will return true is the condition is true for, well, every element of the array. some() will return true if the condition is true for at least one element of the array. So, they can also be used to tell if the condition is false for at least one element of the array, or all elements of the array respectively. This is how they work:
[0,1,2].every(function(d) {return d;})
// will return false: 0 is false, 1 is true and 2 is true.
[0,1,2].every(function(d) {return (d<3);})
// will return true. All elements are less than 3.

[0,1,2].some(function(d) {return d;})
// will return true. 1 is true, so at least one element in the array is true.

Creating conditional subsets of an array

It is also possible to get only the elements that fit a condition using the filter() method.
[0,1,2].filter(function(d) {return d;})
// this will return [1,2]. 0 is evaluated as "false".
[1,2,3,4,5].filter(function(d) {return (d>2);})
// this will return [3,4,5].
Of course, the more complex the array, the more interesting these functions get. With the barley array from part 4:
 barley.filter(function(d) {return d.variety=="Manchuria";}
/* this will return: 
  [{"yield":27,"variety":"Manchuria","year":1931,"site":"University Farm"},
   {"yield":32.96667,"variety":"Manchuria","year":1931,"site":"Grand Rapids"},
   {"yield":26.9,"variety":"Manchuria","year":1932,"site":"University Farm"},
   {"yield":22.13333,"variety":"Manchuria","year":1932,"site":"Grand Rapids"},

Visualizing arrays

(without plotting them, of course) When you are manipulating arrays, turning them into maps, performing roll-ups and sorts, you may want to get a glimpse of the array. But, unless it's a single, one-dimensional array, firebug or chrome debugger will represent it as a cryptic [ > Object, > Object, > Object ]. Not being able to follow step by step what's happening to an array makes understanding the data functions much more difficult. Fortunately, you can use the JSON.stringify() method.
{"yield":27,"variety":"Manchuria","year":1931,"site":"University Farm"},
{"yield":32.96667,"variety":"Manchuria","year":1931,"site":"Grand Rapids"},
{"yield":43.06666,"variety":"Glabron","year":1931,"site":"University Farm"},
{"yield":29.13333,"variety":"Glabron","year":1931,"site":"Grand Rapids"},
{"yield":35.13333,"variety":"Svansota","year":1931,"site":"University Farm"},
{"yield":29.66667,"variety":"Svansota","year":1931,"site":"Grand Rapids"},
{"yield":39.9,"variety":"Velvet","year":1931,"site":"University Farm"},
{"yield":23.03333,"variety":"Velvet","year":1931,"site":"Grand Rapids"},
{"yield":36.56666,"variety":"Trebi","year":1931,"site":"University Farm"},
{"yield":29.76667,"variety":"Trebi","year":1931,"site":"Grand Rapids"},
{"yield":43.26667,"variety":"No. 457","year":1931,"site":"University Farm"},
{"yield":58.1,"variety":"No. 457","year":1931,"site":"Waseca"},
{"yield":28.7,"variety":"No. 457","year":1931,"site":"Morris"},
{"yield":45.66667,"variety":"No. 457","year":1931,"site":"Crookston"},
{"yield":32.16667,"variety":"No. 457","year":1931,"site":"Grand Rapids"},
{"yield":33.6,"variety":"No. 457","year":1931,"site":"Duluth"},
{"yield":36.6,"variety":"No. 462","year":1931,"site":"University Farm"},
{"yield":65.7667,"variety":"No. 462","year":1931,"site":"Waseca"},
{"yield":30.36667,"variety":"No. 462","year":1931,"site":"Morris"},
{"yield":48.56666,"variety":"No. 462","year":1931,"site":"Crookston"},
{"yield":24.93334,"variety":"No. 462","year":1931,"site":"Grand Rapids"},
{"yield":28.1,"variety":"No. 462","year":1931,"site":"Duluth"},
{"yield":32.76667,"variety":"Peatland","year":1931,"site":"University Farm"},
{"yield":34.7,"variety":"Peatland","year":1931,"site":"Grand Rapids"},
{"yield":24.66667,"variety":"No. 475","year":1931,"site":"University Farm"},
{"yield":46.76667,"variety":"No. 475","year":1931,"site":"Waseca"},
{"yield":22.6,"variety":"No. 475","year":1931,"site":"Morris"},
{"yield":44.1,"variety":"No. 475","year":1931,"site":"Crookston"},
{"yield":19.7,"variety":"No. 475","year":1931,"site":"Grand Rapids"},
{"yield":33.06666,"variety":"No. 475","year":1931,"site":"Duluth"},
{"yield":39.3,"variety":"Wisconsin No. 38","year":1931,"site":"University Farm"},
{"yield":58.8,"variety":"Wisconsin No. 38","year":1931,"site":"Waseca"},
{"yield":29.46667,"variety":"Wisconsin No. 38","year":1931,"site":"Morris"},
{"yield":49.86667,"variety":"Wisconsin No. 38","year":1931,"site":"Crookston"},
{"yield":34.46667,"variety":"Wisconsin No. 38","year":1931,"site":"Grand Rapids"},
{"yield":31.6,"variety":"Wisconsin No. 38","year":1931,"site":"Duluth"},
{"yield":26.9,"variety":"Manchuria","year":1932,"site":"University Farm"},
{"yield":22.13333,"variety":"Manchuria","year":1932,"site":"Grand Rapids"},
{"yield":36.8,"variety":"Glabron","year":1932,"site":"University Farm"},
{"yield":14.43333,"variety":"Glabron","year":1932,"site":"Grand Rapids"},
{"yield":27.43334,"variety":"Svansota","year":1932,"site":"University Farm"},
{"yield":16.63333,"variety":"Svansota","year":1932,"site":"Grand Rapids"},
{"yield":26.8,"variety":"Velvet","year":1932,"site":"University Farm"},
{"yield":32.23333,"variety":"Velvet","year":1932,"site":"Grand Rapids"},
{"yield":29.06667,"variety":"Trebi","year":1932,"site":"University Farm"},
{"yield":20.63333,"variety":"Trebi","year":1932,"site":"Grand Rapids"},
{"yield":26.43334,"variety":"No. 457","year":1932,"site":"University Farm"},
{"yield":42.2,"variety":"No. 457","year":1932,"site":"Waseca"},
{"yield":43.53334,"variety":"No. 457","year":1932,"site":"Morris"},
{"yield":34.33333,"variety":"No. 457","year":1932,"site":"Crookston"},
{"yield":19.46667,"variety":"No. 457","year":1932,"site":"Grand Rapids"},
{"yield":22.7,"variety":"No. 457","year":1932,"site":"Duluth"},
{"yield":25.56667,"variety":"No. 462","year":1932,"site":"University Farm"},
{"yield":44.7,"variety":"No. 462","year":1932,"site":"Waseca"},
{"yield":47,"variety":"No. 462","year":1932,"site":"Morris"},
{"yield":30.53333,"variety":"No. 462","year":1932,"site":"Crookston"},
{"yield":19.9,"variety":"No. 462","year":1932,"site":"Grand Rapids"},
{"yield":22.5,"variety":"No. 462","year":1932,"site":"Duluth"},
{"yield":28.06667,"variety":"Peatland","year":1932,"site":"University Farm"},
{"yield":26.76667,"variety":"Peatland","year":1932,"site":"Grand Rapids"},
{"yield":30,"variety":"No. 475","year":1932,"site":"University Farm"},
{"yield":41.26667,"variety":"No. 475","year":1932,"site":"Waseca"},
{"yield":44.23333,"variety":"No. 475","year":1932,"site":"Morris"},
{"yield":32.13333,"variety":"No. 475","year":1932,"site":"Crookston"},
{"yield":15.23333,"variety":"No. 475","year":1932,"site":"Grand Rapids"},
{"yield":27.36667,"variety":"No. 475","year":1932,"site":"Duluth"},
{"yield":38,"variety":"Wisconsin No. 38","year":1932,"site":"University Farm"},
{"yield":58.16667,"variety":"Wisconsin No. 38","year":1932,"site":"Waseca"},
{"yield":47.16667,"variety":"Wisconsin No. 38","year":1932,"site":"Morris"},
{"yield":35.9,"variety":"Wisconsin No. 38","year":1932,"site":"Crookston"},
{"yield":20.66667,"variety":"Wisconsin No. 38","year":1932,"site":"Grand Rapids"},
{"yield":29.33333,"variety":"Wisconsin No. 38","year":1932,"site":"Duluth"}
No matter the manipulations you inflict to an array you will always be able to make it reveal its innards by using this.

Protovis: analysis of the Map projections example

What is a map?

before we start looking at the code it may be a good idea to think of the best way to represent a country.
Countries are areas of land surrounded by borders, which are imaginary (or sometimes physical) lines going through a set of points.

Some countries are made of one of such surfaces, but many countries are not one contiguous territory (they may include islands for instance) so they could be made out of several disjointed polygons.

Now let’s put on our protovis hat. Let’s suppose we want to draw a map where each country could be colored differently (choropleth). What kind of data structure should be use to represent that?
First there should be a sort of array of countries. Each country should be an item in that array, so they can be indexed and assigned an individual color and various data points.
Then, at the lowest level, we would be drawing polygons, which are treated as pv.Line in protovis. For each polygon, we would require an array of coordinate pairs. To draw a country, we would need a list (array) of those polygons.

So the data structure we are looking at is:

var world=[  // an array of countries
    [ // an array of polygons
        [ // an array of pairs of coordinates
            [x0, y0], // coordinates of the first point
            [x1, y1], // coordinates of the next one
            [xn, yn],
            [x0, y0]  // coordinates of the first point to close the polygon
       ...              // another polygon, but maybe not.
   [                    // next country

the map projections example

Can be found here:

 * A diverging color scale, using previously-computed quantiles of population
 * densities; in the future, we might use a quantile scale here to do this
 * automatically. Map colors based on, by Cynthia A. Brewer,
 * Penn State.
var fill = pv.Scale.linear()
    .domain(140, 650, 1900)
    .range("#91bfdb", "#ffffbf", "#fc8d59");

/* Precompute the country's population density and color. */
countries.forEach(function(c) {
  c.color = stats1.area?).
If this is the case, we are going to compute the color that should be attributed to the country, by passing the population divided by the area to the color scale we just made. Else, we just use light grey.

The next few lines are standard constants that will shape the vis.
Note however

geo = pv.Geo.scale("hammer").range(w, h)

This is a geographic scale, which will be used to convert longitudes and latitudes to X and Y coordinates on the screen.

/* Countries. */
    .data(function(c) c.borders)
    .data(function(b) b)
    .title(function(d, b, c)
    .fillStyle(function(d, b, c) c.color)
    .strokeStyle(function() this.fillStyle().darker())

This is where it all happens.
First, we create a series of panels, one for each country. So, we pass the countries array as data.
Then, we are going to create another series of panels for every country, that is, with as many panels as there are independent areas in the country. For instance, if there are islands, we are going to need extra panels to represent them. If the country is one contiguous mass of land, there will be just one panel here.
This time, we use function(c) c.borders as data. That is, we go into the borders array.

Finally, we are going to create a filled polygon for each of these independent areas. This is achieved by adding a pv.Line to the previous panels. Likewise, we use (function(b) b) as data, meaning that we go yet another level into the borders array. Now, we are accessing the pairs of longitude + latitude numbers.

geo.x and geo.y convert this pair of numbers to X and Y coordinates on the screen.
For the next two lines, title and fillStyle, we need to go back to the country level.
so, we use a function of the form function(d,b,c). d is the current item (pair of longitude, latitude), b its parent (individual area) and c, its grand-parent (the country).
so, function(d,b,c) retrieves the country name, and function(d,b,c) c.color retrieves the color we had computed for that country to begin with.

For the color of the border, we wish to use a darker version of the fill color. This is what the this.fillStyle().darker() does.

The rest of the vis is longitude and latitude ticks, using the built-in properties of the scale.


Stuff I do with Tableau

I put together a list of some of the things I’ve done with Tableau public there. I had put the link on twitter last Friday, and I just saw the number of connections through – never thought something posted just before the week-end could get so much attention! so thanks, twitter followers. I’m putting another link here on the blog for convenience.

I’ve been using Tableau public ever since its release (well actually a tad before).  Over time I’ve been trying to use it less as a visually pleasing and convenient way to shove a lot of data in a limited space (like here), and more as a way to promote a certain angle when looking at a dataset (like here), that is, as an invitation to the viewers to reach the same conclusions than us, but by giving them access to the data so they can see for themselves.

One of my first vizzes - just a layer on top of the dataset

This one is less neutral. I would like readers to embrace the opinion in the associated article, so I use Tableau to present the data in a way that supports this opinion.

That long list is still a subset of what I’ve done with Tableau, mostly because like many people in datavis I use Tableau at two stages.

I use it to communicate a finished visualization such as these, although I may use a static image or another interactive tool.

But I also almost systematically use it in the early stages, when I receive a dataset and I need to make sense of it before I can represent it. By manipulating a dataset in Tableau and testing various basic dimension combinations one can quickly see the points of interest in the data and come up with relevant questions to ask the data, to which a visualization is the answer. So while I can’t share these “drafts” they are very very helpful.

Also over time I got better (hopefully) at controlling dashboards so they look exactly the way I want them to and not how I manage to put them together. What helps is setting the size of the dashboard as exact dimensions (ie 600 by 400), not as a range, and, unsurprisingly, to draw the dashboard on paper first. Anyway, all the dashboards which are on that page are freely downloadable if you want to see how they are done