datavizchallenge.org: my entry

I just posted my www.datavizchallenge.org entry!

So here’s a little explanation about what I’ve done, how and why.

The idea behind the data provider web site, WhatWePayFor, is that billions and millions don’t talk to the average citizen. This doesn’t help them understand where their money goes.

And in my work, I generally try to show as little data as possible to make a point. And I like to use interactivity to involve users.

I got my idea from thinking back to the presentation made by Nick Diakopoulos at last Visweek. We had that workshop on data storytelling and Nick showed how to get user attention on a subject by letting them play and guess first. So here goes.

First, I wanted to let users play with data, in this case try to establish their priorities for federal budget spending. Where should money go?

I’ve manipulated this kind of figures a few times, so I know that USA has extreme positions on some “functions of spending”, like defense, or culture. Some rich countries that spend up to 7% of their budget on culture, and some who spend less than 0.1% of their budget on defense. Hint, the USA is not one of them.

So I’m hoping that the users will see differences between their preferences and the choices that the government makes in their names, and that from the differences will be striking enough for them to understand the orders of magnitude and possibly inspire them to take action.

 

Building the France Dorling cartogram

This is a follow-up to the previous post where I used a Dorling cartogram to show election results in France. So how did I come up with that?

Step 1: the data

Without data, there wouldn’t be much to plot, right? So the first order of business was to fetch the data, which is theoretically available from an official source. Unfortunately, there is no way to download those results at once, instead they are presented in a myriad of formatted HTML pages – over 2000 in all.

I used wget to download them all. wget allows for recursive download, that is, you can download a page, and all the links from that page, etc. You can also specify the depth of the recursive download, which is a good idea in that case because there are lots of duplicates of that information on the server, so trying to get everything will take a couple of hours vs about 10 minutes.

Once that’s done, I used python and Beautiful Soup to turn the HTML pages into nice JSON objects. Fortunately the pages followed the same strict template, so I could run a relatively simple extraction script, like: look at the second HTML table of the page, get the number of registered voters who didn’t vote from the 2nd cell of the 2nd row, etc.
Once that was done I came up with 2 files: one which listed the aggregate votes per county and per party, and another one which listed all the candidates, their affiliation and their votes. I ended up using only the 1st one, actually at an even more aggregated level, but I may use the second one for further work.

Next up: setting the geography

A dorling cartogram is really a kind of force-directed layout graph where the nodes represent geographical entities, which are initially positioned at a place corresponding to their geography, but which are then resized by a variable, and whose movement is subject to several forces: each node is linked to its original geographical neighbors, which attract each other, but also cannot intersect with other nodes, so it pushes back its neighbors.

So for this to work I had to:

  • find the initial position of my geographical entities (French départements) and
  • list their neighbors.

To get the initial position I fetched the list of all French towns and their lon-lat coordinates which I averaged for each département. That gave me a workable proxy of the centroid of that departement. Note that I didn’t need anything very accurate at this stage, since nodes are going to be pushing each other anyway.
I listed the neighbors manually, since the number of départements is workable. Else, one possible way to do that from a shape file is look at every point of every shape and list all the entities that use it. If two entities use it they are neighbors. Again at this stage it’s not mandatory to be exhaustive. If an entity is surrounded by 8 others and we only use 4 the results will look sensibly the same.
Anyone who wants to create dorling cartograms for France can reuse the file.

Putting it all together

The rest is standard protovis fare. There is an example which uses Dorling cartograms which I used as a basis. I prepared my data files so that I could aggregate the stuff I needed at the level of the département (which is about 20 counties). I never really considered showing anything at the county level because except for looking up specific results it is never too relevant. For instance, in most places there are some agreement between parties which don’t all present candidates in order to favor their allies.

An interesting parameter when building the visualization is choosing the constant by which to multiply the size of the nodes. If it is too high, the cartogram will look like a featureless cluster of circles. If it is too low, it will ressemble the map a lot, and circles won’t touch each other.