Making-of: cutting Paris into voting districts

Hi, in my previous post I showcased one of my recent projects. I really enjoyed building it, so I would like to share how it was done.

First, getting the data. I had already scraped the results of both rounds of the presidential election by city. The districts for the legislative elections are also known, but the two cannot be matched directly, because large cities are almost systematically split into several such districts. Paris, for instance, will be represented by no fewer than 18 députés.

So I needed the results at the finest possible level, that is, by individual polling station. On election night these results are compiled by city and centralized, so you would assume that the raw data for each polling station is available somewhere. Unfortunately, that is not the case. It seems they will be made public eventually, but possibly not before the June 2012 election.

Fortunately, Open Data Paris had the results by polling station. Better still, it had their addresses and a mapping from every inhabited building in Paris to its corresponding polling station.

To map the polling stations, my first intuition was to create a Voronoi tessellation of their projected, geocoded coordinates (I only had their addresses in the raw data file). In short, Voronoi polygons are generated from a set of control points, and each polygon covers the area that is nearer to its control point than to any other. So it’s a good approximation of the area served by a given polling station.
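To make the "nearer to that control point than to any other" rule concrete, here is a minimal sketch in Python (not the d3 code used for the project). It does not build the polygons; it only answers which Voronoi cell a given location falls into. The station ids and coordinates are made up for illustration.

```python
import math

def nearest_station(point, stations):
    """Return the id of the station whose control point is closest to `point`.

    A location belongs to the Voronoi cell of exactly that station.
    """
    return min(stations, key=lambda sid: math.dist(point, stations[sid]))

# Hypothetical projected coordinates in arbitrary planar units, not real data.
stations = {"A": (0.0, 0.0), "B": (10.0, 0.0), "C": (5.0, 8.0)}

cell_near_a = nearest_station((1.0, 1.0), stations)
cell_near_b = nearest_station((9.0, 1.0), stations)
cell_near_c = nearest_station((5.0, 6.0), stations)
```

Running this test over every pixel of a map would paint the same regions as the tessellation, only far more slowly than a real Voronoi algorithm.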

Problem: several polling stations can share the same address, and the Voronoi algorithm requires distinct control points. So I tried jittering them (adding random noise to each one). That produced a tessellation that kind of looked like Paris, but the voting districts looked messy because neighboring districts were frequently inverted.
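For illustration, jittering can be sketched as follows in Python. This is a variant that only nudges points that collide with one already seen, rather than adding noise to every point; the function name, the noise scale and the sample coordinates are assumptions, not values from the project.

```python
import random

def jitter_duplicates(points, scale=1e-4, seed=0):
    """Return a copy of `points` in which repeated coordinates are nudged apart.

    `scale` is the maximum random offset per axis (an assumed value).
    A fixed seed keeps the result reproducible.
    """
    rng = random.Random(seed)
    seen = set()
    result = []
    for x, y in points:
        # Keep nudging until this point no longer collides with an earlier one.
        while (x, y) in seen:
            x += rng.uniform(-scale, scale)
            y += rng.uniform(-scale, scale)
        seen.add((x, y))
        result.append((x, y))
    return result

# Hypothetical case: three polling stations housed at the same address.
shared = [(2.35, 48.85), (2.35, 48.85), (2.35, 48.85), (2.36, 48.86)]
distinct = jitter_duplicates(shared)
```

The points become distinct, but their relative order is random, which is consistent with the inversions between neighboring districts described above.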

So I had to come up with a better approximation of which part of the city corresponded to which voting district. I used the address-to-polling-station correspondence, and for each polling station I took the first and the last street number of every street it covered. Then I geocoded the whole lot. That’s about 16,000 points. It took some time.
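The reduction from "every address" to "first and last number per street" might look like the Python sketch below. The row layout (station id, street, number) and the sample values are assumptions; the actual Open Data Paris file may be organized differently.

```python
def street_extremes(rows):
    """Map (station, street) to the (lowest, highest) street number seen.

    `rows` is an iterable of (station_id, street, number) tuples.
    """
    ranges = {}
    for station, street, number in rows:
        key = (station, street)
        lo, hi = ranges.get(key, (number, number))
        ranges[key] = (min(lo, number), max(hi, number))
    return ranges

def addresses_to_geocode(ranges):
    """List the (station, address) pairs worth sending to a geocoder."""
    out = set()
    for (station, street), (lo, hi) in ranges.items():
        out.add((station, f"{lo} {street}"))
        out.add((station, f"{hi} {street}"))  # same entry as above when lo == hi
    return sorted(out)

# Hypothetical rows, for illustration only.
rows = [
    ("4-01", "rue de Rivoli", 12),
    ("4-01", "rue de Rivoli", 48),
    ("4-01", "rue de Rivoli", 30),
    ("4-01", "rue Saint-Antoine", 5),
    ("4-02", "rue de Rivoli", 50),
]
ranges = street_extremes(rows)
todo = addresses_to_geocode(ranges)
```

Keeping only the two extremes per street is what brings the workload down to a number of points that can realistically be geocoded.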

Here's my polling station as an example.

Then, for each polling station area, I took the minimum and maximum longitude and latitude, which formed a bounding box, and assigned the polling station to the center of that box. Then I ran the tessellation again.
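The bounding-box step is simple enough to sketch in a few lines of Python. The function name and the sample coordinates are illustrative assumptions.

```python
def bbox_center(points):
    """Return the center of the bounding box of (longitude, latitude) points."""
    lons = [lon for lon, lat in points]
    lats = [lat for lon, lat in points]
    return ((min(lons) + max(lons)) / 2, (min(lats) + max(lats)) / 2)

# Hypothetical geocoded street ends for one polling station.
station_points = [(2.30, 48.85), (2.34, 48.87), (2.31, 48.86)]
center = bbox_center(station_points)
```

Note that the center depends only on the extreme values, not on how many points sit in between, so a single misplaced point can move it a long way.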

I found a number of oddities in the geocoding that I had to correct manually, because a single inaccurately coded address could drastically change the shape of the bounding box, and so the position of its polling station. Sometimes the geocoding service wouldn’t find the street and/or would use a street of the same name in another city; sometimes it did find the street but the coordinates were way off… So the dataset required a lot of massaging before it got into shape.
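I made these corrections by hand, but a simple check can help decide where to look first. The Python sketch below is one possible approach, not what was actually done: it flags points that sit far from the median position of their polling station. The 0.02 degree threshold and the sample points are assumptions.

```python
from statistics import median

def flag_outliers(points, max_offset=0.02):
    """Return the points lying more than `max_offset` degrees from the
    median longitude or latitude of the group (threshold is an assumption)."""
    med_lon = median(lon for lon, lat in points)
    med_lat = median(lat for lon, lat in points)
    return [
        (lon, lat)
        for lon, lat in points
        if abs(lon - med_lon) > max_offset or abs(lat - med_lat) > max_offset
    ]

# Hypothetical station: three plausible Paris points and one that was
# geocoded to a street of the same name in another city.
geocoded = [(2.30, 48.85), (2.31, 48.86), (2.32, 48.86), (4.83, 45.76)]
suspects = flag_outliers(geocoded)
```

The median is used rather than the mean because the mean would itself be dragged toward the bad point.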

The last geographic errand for this visualization was to create a perimeter of Paris to use as a clipping mask; otherwise the tessellation would fill a rectangle, with very large and very skewed polygons along the edges. So I collected the coordinates of points around Paris to create one polygon. Only what’s inside this polygon is shown (.style("clip-path") in d3).
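In d3 the browser does the clipping, so no geometry code is needed. To show what "inside the polygon" means, here is a sketch of the classic ray-casting test in Python. It is an illustration of the idea, not how the SVG clip path is implemented, and the square below is a stand-in for the real perimeter.

```python
def point_in_polygon(point, polygon):
    """Ray casting: count how many polygon edges a horizontal ray going
    right from `point` crosses. An odd count means the point is inside."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the ray's y coordinate can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

# Hypothetical perimeter: a simple square instead of the outline of Paris.
perimeter = [(0, 0), (4, 0), (4, 4), (0, 4)]
shown = point_in_polygon((2, 2), perimeter)
hidden = point_in_polygon((5, 2), perimeter)
```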

Once the data had been acquired, building the rest of the datavis was nothing special. I made extensive use of mouseover and click events to trigger transformations, as I always do, although this time I did prepare a lot of rules.

Originally I wanted to cover the whole of France like this, but that will be difficult: one, to get the data, and two, to get it into shape. As of today the location (i.e. street address) of most polling stations is not available online, so even if we got the number of votes for each polling station (there should be about 40,000 of them), the geographic part of the problem would remain unsolved. Still, it’s a worthy endeavour. While the election results are of little interest at a macro-geographic level (by region or by département), they are very useful at a very fine level, because strategies can be built on them.

For instance, it’s worthwhile to send heavyweights to conquer districts that are winnable, but it’s a waste to keep them in their respective fiefdoms if victory there is already certain. Also, when districts have to be redrawn, this kind of information can be invaluable to the political force that gets to draw the new boundaries, or to its opponents.


Carving Paris into electoral districts

My latest project lets you view the results of the presidential election in Paris by polling station and project them onto the districts that will be used for the June 2012 legislative elections.

Above all, it lets you change the composition of these districts, whose current boundaries are fairly arbitrary. Paris will have 18 districts, down from 21 today, and they do not follow the arrondissements.

The way these districts are drawn is decisive for the outcome of the elections. Today, for example, there are two districts where Nicolas Sarkozy won more than 75% of the vote in the second round of the presidential election; I imagine the left does not hold out much hope of winning them back. Likewise, there are no fewer than 9 districts where François Hollande received more than 60% of the vote. As things stand, 12 districts look safe for the left and 6 for the right, 3 of which the left could perhaps still win.

The current map is optimal for neither the left nor the right. By redrawing the districts, the left could win all of them, and the right could win 12 out of 18 (or perhaps more; 12 remains my personal high score). To favor one side, the idea is to spread the most favorable polling stations across as many districts as possible rather than keeping them concentrated in a few. Generalize that to the whole country and you can imagine the result!
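A toy calculation shows why spreading strongholds pays off. The Python sketch below uses invented vote counts for four polling stations and ignores the requirement that districts be contiguous; it is only meant to illustrate the arithmetic.

```python
def districts_won(districts):
    """Count the districts where our side has more votes than the other.

    `districts` is a list of districts, each a list of
    (votes_for_us, votes_for_them) tuples, one per polling station.
    """
    won = 0
    for stations in districts:
        ours = sum(a for a, b in stations)
        theirs = sum(b for a, b in stations)
        if ours > theirs:
            won += 1
    return won

# Hypothetical polling stations of 1000 voters each.
stronghold = (800, 200)  # a station that is very favorable to us
marginal = (450, 550)    # a station that leans slightly against us

# Same four stations, two ways of grouping them into two districts.
concentrated = [[stronghold, stronghold], [marginal, marginal]]
spread_out = [[stronghold, marginal], [stronghold, marginal]]

seats_concentrated = districts_won(concentrated)  # 1600-400 and 900-1100
seats_spread = districts_won(spread_out)          # 1250-750 twice
```

With the strongholds concentrated, one district is won by a huge and useless margin and the other is lost. With one stronghold in each district, both are won.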

So, whatever the current mood, one redistricting or another can completely reshuffle the deck. It is an unsettling thought, because these redistrictings happen regularly and are relatively opaque. Besides, it is quite hard to link presidential election data to legislative districts, because the results are only rarely available by polling station.

I will give the technical details of the implementation in a future post.