Making-of: cutting Paris in voting districts

Hi, in my previous post I showcased one of my recent projects. I really enjoyed building it, so I would like to share how it was done.

First, getting the data. I had already scraped the results of both rounds of the presidential election by city. The districts for the congress election are also known, but the two cannot simply be matched, because large cities are almost systematically broken down into several such districts. Paris, for instance, will be represented by no fewer than 18 députés.

So I needed the results at the finest possible unit, that is, by individual polling station. During election night these results are compiled by city and centralized, so you would assume that the raw data for each polling station is available somewhere. Unfortunately, that is not the case. Although it seems that they will be made public eventually, that may not happen before the June 2012 election.

Fortunately, Open Data Paris had the results by polling station. Better still, it had their addresses, plus a mapping of every inhabited building in Paris to its corresponding polling station.

To map the polling stations, my first intuition was to create a Voronoi tessellation of their projected, geocoded coordinates (I only had their addresses in the raw data file). In short, Voronoi polygons are generated from a set of control points, and each polygon corresponds to the area nearer to its control point than to any other. So it’s a good approximation of the area which corresponds to a given polling station.
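
To make the "nearer to that control point than to any other" rule concrete, here is a minimal Python sketch. It does not build the polygons themselves; it only answers which control point a given location belongs to, which is the definition of a Voronoi cell. The station ids and coordinates are made up for illustration, and `nearest_station` is a hypothetical helper, not something from the actual project.

```python
import math

def nearest_station(point, stations):
    """Return the id of the control point closest to `point`.

    `stations` maps a (hypothetical) station id to projected (x, y)
    coordinates. A location lies in the Voronoi cell of the station
    returned here.
    """
    return min(stations, key=lambda sid: math.dist(point, stations[sid]))

# Made-up projected coordinates, for illustration only.
stations = {"A": (0.0, 0.0), "B": (10.0, 0.0), "C": (0.0, 10.0)}

print(nearest_station((2.0, 1.0), stations))  # A
print(nearest_station((8.0, 3.0), stations))  # B
```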

Problem: several polling stations could be at the same address, and for the Voronoi algorithm the control points have to be distinct. So I tried jittering them (adding random noise to each one). That produced a tessellation that kind of looked like Paris, but the voting districts looked messy, as there were frequent inversions between neighboring districts.
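
Jittering can be sketched in a few lines of Python. This is only an illustration under assumptions: the noise scale, the fixed seed and the sample coordinates are arbitrary choices of mine, not the values used for the map.

```python
import random

def jitter(points, scale=1e-4, seed=0):
    """Add small uniform noise to every (lon, lat) pair.

    `scale` is the maximum offset in degrees (an arbitrary assumption);
    the fixed seed only makes the sketch reproducible.
    """
    rng = random.Random(seed)
    return [
        (lon + rng.uniform(-scale, scale), lat + rng.uniform(-scale, scale))
        for lon, lat in points
    ]

# Two stations sharing one address, plus a third one elsewhere (made-up values).
points = [(2.35, 48.85), (2.35, 48.85), (2.36, 48.86)]
jittered = jitter(points)

# The duplicated control points are now distinct.
print(len(set(points)), len(set(jittered)))
```

The noise is random, so two stations at the same address can end up on either side of each other, which is one way to get the inversions described above.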

So I had to come up with a better approximation of which part of the city corresponded to which voting district. I used the address-to-polling-station correspondence, and for each polling station I took the first and the last street number of every street that it covered. Then I geocoded the whole lot. That’s about 16000 points. It took some time.

Here's my polling station as an example.
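
The "first and last street number of each street" step could look roughly like this in Python. The row layout (station id, street, number) and the sample rows are assumptions for the sake of the sketch; the real Open Data Paris file is organized differently.

```python
from collections import defaultdict

def street_extremes(rows):
    """Keep only the lowest and highest street number per (station, street).

    `rows` is an iterable of (station_id, street, number) tuples; this
    layout is a hypothetical simplification of the source data.
    Returns (station_id, address) pairs ready to be geocoded.
    """
    spans = defaultdict(list)
    for station, street, number in rows:
        spans[(station, street)].append(number)

    addresses = []
    for (station, street), numbers in sorted(spans.items()):
        # A set avoids emitting the same address twice when a street
        # has a single number in that station's area.
        for number in sorted({min(numbers), max(numbers)}):
            addresses.append((station, f"{number} {street}, Paris"))
    return addresses

# Made-up rows, for illustration only.
rows = [
    ("S1", "rue Python", 3),
    ("S1", "rue Python", 11),
    ("S1", "rue Python", 7),
    ("S2", "rue de Rivoli", 5),
]
for station, address in street_extremes(rows):
    print(station, address)
```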

Then, for each polling station area, I took the minimum and maximum longitude and latitude, which formed a bounding box, and assigned the polling station to the center of that box. Then I ran the tessellation again.
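
The bounding-box step is simple enough to show in full. A minimal Python sketch, with made-up coordinates and a hypothetical `bbox_center` helper:

```python
def bbox_center(points):
    """Return the center of the bounding box of a list of (lon, lat) points."""
    lons = [lon for lon, _ in points]
    lats = [lat for _, lat in points]
    return ((min(lons) + max(lons)) / 2, (min(lats) + max(lats)) / 2)

# Geocoded street ends for one polling station (made-up values).
points = [(2.0, 48.0), (4.0, 50.0), (3.0, 48.5)]
print(bbox_center(points))  # (3.0, 49.0)
```

Note that the center depends only on the extreme points, not on how many addresses fall in between, so a single badly placed point is enough to move it.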

I found a number of oddities in the geocoding that I had to correct manually, because if a single address was not accurately geocoded, chances are it would change the shape of the bounding box drastically, and with it the position of the corresponding polling station. Sometimes the geocoding service wouldn’t find the street and/or would use a street of the same name in another city; sometimes it did find the street but the coordinates were way off… So the dataset required a lot of massaging before it got into shape.
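
I did these corrections by hand, but the "same street name in another city" kind of error can at least be flagged automatically. Here is a Python sketch of one way to do it, assuming a rough bounding box around Paris; the box values are approximate and the sample points are made up.

```python
# Rough bounding box around Paris: lon_min, lat_min, lon_max, lat_max.
# Approximate values, chosen for this sketch only.
PARIS_BOUNDS = (2.22, 48.81, 2.47, 48.91)

def suspicious(points, bounds=PARIS_BOUNDS):
    """Return the (label, lon, lat) entries that fall outside `bounds`."""
    lon_min, lat_min, lon_max, lat_max = bounds
    return [
        (label, lon, lat)
        for label, lon, lat in points
        if not (lon_min <= lon <= lon_max and lat_min <= lat <= lat_max)
    ]

# Made-up geocoding results: the second one landed in the south of France.
geocoded = [
    ("address A", 2.35, 48.86),
    ("address B", 5.37, 43.30),
]
print(suspicious(geocoded))  # [('address B', 5.37, 43.3)]
```

This only catches points that leave the city altogether. An address that is off by a few streets still needs a human eye.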

The last geographic errand I had to do for this visualization was to create a perimeter of Paris to use as a clipping mask; otherwise the tessellation would be drawn on a rectangular shape, with the edge polygons being very large and very skewed. So I collected coordinates of points around Paris to create one polygon. Only what’s inside of this polygon is shown (.style(“clip-path”) in d3).
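
The clipping in the visualization is purely visual and is done by d3 and the browser. As a different but related technique, here is a Python sketch of the classic ray-casting test, which answers whether a given point lies inside a perimeter polygon. The square below stands in for the Paris outline and is obviously made up.

```python
def inside(point, polygon):
    """Ray-casting test: is `point` inside `polygon`?

    `polygon` is a list of (x, y) vertices in order. A horizontal ray is
    cast from the point to the right; an odd number of edge crossings
    means the point is inside.
    """
    x, y = point
    result = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the ray's height can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                result = not result
    return result

# A square standing in for the city perimeter (made-up coordinates).
perimeter = [(0, 0), (4, 0), (4, 4), (0, 4)]
print(inside((2, 2), perimeter))  # True
print(inside((5, 2), perimeter))  # False
```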

Once the data had been acquired, building the rest of the datavis was nothing special. I made extensive use of mouseover and click events to trigger transformations, as I always do, although this time I did prepare a lot of rules.

Originally I wanted to do the whole of France like this, but that will be difficult: one, to get the data, and two, to get it into shape. As of today the location (i.e. street address) of most polling stations is not available online, so even if we got the number of votes for each of them (there should be about 40000) the geographic part of the problem would remain unsolved. Still, it’s a worthy endeavour. While the election results are of little interest at a macro-geographic level (by region or by département), they are very useful at a very fine level, because strategies can be built on them.

For instance, it’s worthwhile to send heavyweights to conquer districts that are winnable, but it’s a waste to keep them in their respective fiefdoms if victory in these districts is already certain. Also, when districts have to be redrawn, this kind of information can be invaluable to the political force which gets to draw the new limits, or to its opponents.

6 thoughts on “Making-of: cutting Paris in voting districts”

  1. Great writeup! – and really great insights on how you tackled this interesting problem.

    Could I ask what geocoding service you were using for this?
    And, did you use a GIS tool to do the manual correction – like QGIS?

  2. Hi Jim, thanks
    Actually I’m not very good at manipulating geo data (but wish I could be better). I do some geocoding now and again though, so I have Python scripts that can in theory look up a bunch of addresses and spit back longitudes and latitudes. Only, by the time I run them, the APIs they use are deprecated or seriously limited. So for this exercise I ended up using websites with online forms to look up a bunch of addresses in one go. I used http://www.batchgeocodeur.mapjmz.com/ which can look up 10,000 addresses but which is quite slow and runs against Google’s geocoder (I presume), and http://www.gpsvisualizer.com/geocoder/ which can process 1,000 addresses and uses Yahoo’s. I actually looked up the missing addresses, or the ones that were apparently incorrect, on Google Maps and manually entered more reliable points. I have been living in Paris for about 15 years so it was time I put that knowledge to good use :) Though, I have discovered that Paris has a “rue Python”. No word on a JavaScript avenue though.

  3. there is a Git street of sorts, too, “rue Gît-le-coeur”, though it has little to do with every coder’s favorite cat
