To my datavis readers, sorry for that string of posts in French but what better data to visualize than political data, and what better time to visualize political data than election time, and what better audience for such visualizations than the folks who are asked to vote?

Like last time, though, I am writing a follow-up technical post about how I dealt with the issues of this visualization.

So anyone who ever tried to make data visualizations knows that you can hardly start without **data**.

My ingredients for the recipe were:

2012 presidential election results by circonscription, plus those of 2007.

Results of the previous congressional election. There were 2 files, one per round, as opposed to a flat file of *députés *in place (I didn’t find one that didn’t required some significant editing to be of use). Most importantly, I needed their political orientation which required some tweaking.

Matching tables between circonscriptions and cities. From a previous project, presidential election data at the city level. Also, geo coordinates of the cities.

The data which was most painful to extract was the list of candidates. In all fairness, UMP made it easier than PS as they had them all on a page. For PS, they had a google fusion table which had this as a data source. That file required a lot of massaging. Eventually local pages of the PS site would list the candidates missing from the map (or provide alternate names). When it was up, I also used the http://www.elections-legislatives.fr/ site to check for the missing names.

Finally, I figured out the genders of all the candidates by extracting their first name and looking up all the ones I wasn’t sure about (there are quite a few unisex first names in French).

Now **calculations**.

There is a pretty strong statistical link between the score of a party on an election in a certain territory, and the chances of a congress candidate of the same party of winning the district.

Predicting these chances is a well-known problem known as *classification * for which the textbook method is *logistic regression.*

All we need was the 575 districts for which I had results. We then associate the score of a party at the 2nd round of the election to whether the corresponding congress candidate got elected (1 or 0). That gives us 1150 pairs of values which we throw in the mathematical cooking pot.

And what we get is the following formula:

where x is the score in the previous election (between 0 and 1). As you can see when x gets close to 0, the denominator becomes a very large number and the probability quickly drops to virtually nothing, and converserly when x gets close to 1, the denominator becomes very close to 1 so the probability rises up to 1 equally fast.

With this and that in place, it is possible to come up with a reasonable estimation of the chances of any candidate based on the recent results. As an aside, the current Prime Minister has renewed the tradition started by his predecessor to ask ministers to seek office and to force them to step down if they fail to win their district. As a result, 24 out of the 37 ministers are campaigning. Out of those 24, 2 are taking very serious risks according to this model: Marie-Anne Carlotti and Benoît Hamon.

Finally, **geography**.

In an ideal world, there will be an abundance of geoJSON files describing France and its many administrative entities. Usable data must exist somewhere, because the maps on www.elections-legislatives.fr have all been generated (by Raphael.js says the source code). If I’m doing another project on these elections I might reverse engineer the shape of the maps to extract the coordinates.

Without a dataset, the work of drawing the boundaries of 577 districts is just huge. However, accuracy is not required as I’m only putting the districts on a map so people can look up where they live or places they know. In my previous work in order to let users change the composition of the districts, I wanted to be rigorous in the placement of everything but here we can live with imperfection.

So I am using the same principle as I did: voronoi tesselation.

For each district I am picking the largest city, for which I have the coordinates. But most large cities belong to several districts. So I am adding random noise to each point. Then, I am drawing shapes around them.

That would normally fill a rectangle, so in order to make it look like France, I have drawn a clipping mask on top of it (that, I’ve done by hand, picking coordinates of the outline of France).

That about wraps it up!

In case it is still relevant.

You can get a shape file a country’ administrative areas from http://www.gadm.org/country. For France, at least, it very accurately gives 5 (!) admin levels (region, departement, and then I don’t know…) You can then simplify (if necessary) it using mapshaper.com/test/MapShaper.swf (~80% reduction is hardly visible), and turn it to a geoJSON/totpoJSON file with ogr2ogr (cf http://bost.ocks.org/mike/map/).

Hope this helps.

thank you.. great tutorial.