Building the France Dorling cartogram

This is a follow-up to the previous post where I used a Dorling cartogram to show election results in France. So how did I come up with that?

Step 1: the data

Without data, there wouldn’t be much to plot, right? So the first order of business was to fetch the data, which is theoretically available from an official source. Unfortunately, there is no way to download those results at once, instead they are presented in a myriad of formatted HTML pages – over 2000 in all.

I used wget to download them all. wget allows for recursive download, that is, you can download a page, and all the links from that page, etc. You can also specify the depth of the recursive download, which is a good idea in that case because there are lots of duplicates of that information on the server, so trying to get everything will take a couple of hours vs about 10 minutes.

Once that’s done, I used python and Beautiful Soup to turn the HTML pages into nice JSON objects. Fortunately the pages followed the same strict template, so I could run a relatively simple extraction script, like: look at the second HTML table of the page, get the number of registered voters who didn’t vote from the 2nd cell of the 2nd row, etc.
Once that was done I came up with 2 files: one which listed the aggregate votes per county and per party, and another one which listed all the candidates, their affiliation and their votes. I ended up using only the 1st one, actually at an even more aggregated level, but I may use the second one for further work.

Next up: setting the geography

A dorling cartogram is really a kind of force-directed layout graph where the nodes represent geographical entities, which are initially positioned at a place corresponding to their geography, but which are then resized by a variable, and whose movement is subject to several forces: each node is linked to its original geographical neighbors, which attract each other, but also cannot intersect with other nodes, so it pushes back its neighbors.

So for this to work I had to:

  • find the initial position of my geographical entities (French départements) and
  • list their neighbors.

To get the initial position I fetched the list of all French towns and their lon-lat coordinates which I averaged for each département. That gave me a workable proxy of the centroid of that departement. Note that I didn’t need anything very accurate at this stage, since nodes are going to be pushing each other anyway.
I listed the neighbors manually, since the number of départements is workable. Else, one possible way to do that from a shape file is look at every point of every shape and list all the entities that use it. If two entities use it they are neighbors. Again at this stage it’s not mandatory to be exhaustive. If an entity is surrounded by 8 others and we only use 4 the results will look sensibly the same.
Anyone who wants to create dorling cartograms for France can reuse the file.

Putting it all together

The rest is standard protovis fare. There is an example which uses Dorling cartograms which I used as a basis. I prepared my data files so that I could aggregate the stuff I needed at the level of the département (which is about 20 counties). I never really considered showing anything at the county level because except for looking up specific results it is never too relevant. For instance, in most places there are some agreement between parties which don’t all present candidates in order to favor their allies.

An interesting parameter when building the visualization is choosing the constant by which to multiply the size of the nodes. If it is too high, the cartogram will look like a featureless cluster of circles. If it is too low, it will ressemble the map a lot, and circles won’t touch each other.


La carte des cantonales

[I write this here post in French because it's more relevant for a French audience, a more technical post with an explanation of how it's done follows in English].

J'ai écrit que j'avais été déçu par les infographies des principaux médias au lendemain des cantonales.

Si on prend la carte du Monde par exemple, on y voit la France nettemement découpée en départements, soit bien roses, soit bien bleus, ou à la rigueur gris si l'issue est incertaine. Mais ce n'est pas l'enjeu de ces élections. Ce sont les dernières élections avant la présidentielle, et on est donc attentif aux enseignements nationaux du scrutin au-delà de la composition des conseils généraux. Ce qu'on commente, c'est plus la percée de l'extrême droite ou le poids de l'abstention. C'est la capacité de la droite ou de la gauche à s'imposer.

Dans cette optique, chaque département n'a pas le même poids. Le Nord par exemple, avec presque 900,000 électeurs, compte plus que la Lozère qui en compte juste un peu plus de 20,000. Donc, tracer une carte géographiquement exacte n'est pas très honnête, puisque cela cache ces différences qui peuvent être énormes.

Comme je n'ai pas l'habitude de porter des critiques sans proposer autre chose, j'ai fabriqué des cartogrammes de Dorling pour rendre compte de ce qui s'est vraiment passé dimanche dernier.

Les cartogrammes, ce sont presque des cartes, sauf que le côté géographique n'est pas pris au sens strict. Leur forme dépend plus des données qu'elles représentent. Et dans la version de Dorling, on remplace les morceaux de la carte (ici, les départements) par des cercles, plus ou moins gros. Là, la taille des cercles dépend du nombre d'électeurs inscrits aux cantonales. Du coup, leur position sur la carte n'est pas exactement la même que celle des départements sur la carte de France mais elle est assez proche pour qu'on puisse les retrouver.

Je propose 3 scénarios: une comparaison gauche-droite, l'importance de l'extrême-droite et celle de l'abstention.

Gauche-droite Extrême-droite Abstention