Gun murders in America


Click for interactive visualization

I have created this map of every homicide in the USA using firearms for the latest year where detailed information was available. Every, that is from all the agencies that report homicides to the FBI, which is not an obligation – this is why the map lacks Florida data.
In the interactive version you can see how murders happen through the year and explore them according to several criteria that were available in the database. While large shooting sprees receive media attention, unfortunately there are thousands of cases each year in just about every community.
Technically this is my first foray with d3.v3 and it uses two of its major new features, topoJSON for easy, lightweight maps, and hex binning to represent many individual events in one hexagon. Thanks to Mike Bostock for tutoring on that.

 

Building the France Dorling cartogram

This is a follow-up to the previous post where I used a Dorling cartogram to show election results in France. So how did I come up with that?

Step 1: the data

Without data, there wouldn’t be much to plot, right? So the first order of business was to fetch the data, which is theoretically available from an official source. Unfortunately, there is no way to download those results at once, instead they are presented in a myriad of formatted HTML pages – over 2000 in all.

I used wget to download them all. wget allows for recursive download, that is, you can download a page, and all the links from that page, etc. You can also specify the depth of the recursive download, which is a good idea in that case because there are lots of duplicates of that information on the server, so trying to get everything will take a couple of hours vs about 10 minutes.

Once that’s done, I used python and Beautiful Soup to turn the HTML pages into nice JSON objects. Fortunately the pages followed the same strict template, so I could run a relatively simple extraction script, like: look at the second HTML table of the page, get the number of registered voters who didn’t vote from the 2nd cell of the 2nd row, etc.
Once that was done I came up with 2 files: one which listed the aggregate votes per county and per party, and another one which listed all the candidates, their affiliation and their votes. I ended up using only the 1st one, actually at an even more aggregated level, but I may use the second one for further work.

Next up: setting the geography

A dorling cartogram is really a kind of force-directed layout graph where the nodes represent geographical entities, which are initially positioned at a place corresponding to their geography, but which are then resized by a variable, and whose movement is subject to several forces: each node is linked to its original geographical neighbors, which attract each other, but also cannot intersect with other nodes, so it pushes back its neighbors.

So for this to work I had to:

  • find the initial position of my geographical entities (French départements) and
  • list their neighbors.

To get the initial position I fetched the list of all French towns and their lon-lat coordinates which I averaged for each département. That gave me a workable proxy of the centroid of that departement. Note that I didn’t need anything very accurate at this stage, since nodes are going to be pushing each other anyway.
I listed the neighbors manually, since the number of départements is workable. Else, one possible way to do that from a shape file is look at every point of every shape and list all the entities that use it. If two entities use it they are neighbors. Again at this stage it’s not mandatory to be exhaustive. If an entity is surrounded by 8 others and we only use 4 the results will look sensibly the same.
Anyone who wants to create dorling cartograms for France can reuse the file.

Putting it all together

The rest is standard protovis fare. There is an example which uses Dorling cartograms which I used as a basis. I prepared my data files so that I could aggregate the stuff I needed at the level of the département (which is about 20 counties). I never really considered showing anything at the county level because except for looking up specific results it is never too relevant. For instance, in most places there are some agreement between parties which don’t all present candidates in order to favor their allies.

An interesting parameter when building the visualization is choosing the constant by which to multiply the size of the nodes. If it is too high, the cartogram will look like a featureless cluster of circles. If it is too low, it will ressemble the map a lot, and circles won’t touch each other.

 

Protovis: analysis of the Map projections example

What is a map?

before we start looking at the code it may be a good idea to think of the best way to represent a country.
Countries are areas of land surrounded by borders, which are imaginary (or sometimes physical) lines going through a set of points.

Some countries are made of one of such surfaces, but many countries are not one contiguous territory (they may include islands for instance) so they could be made out of several disjointed polygons.
.

Now let’s put on our protovis hat. Let’s suppose we want to draw a map where each country could be colored differently (choropleth). What kind of data structure should be use to represent that?
First there should be a sort of array of countries. Each country should be an item in that array, so they can be indexed and assigned an individual color and various data points.
Then, at the lowest level, we would be drawing polygons, which are treated as pv.Line in protovis. For each polygon, we would require an array of coordinate pairs. To draw a country, we would need a list (array) of those polygons.

So the data structure we are looking at is:

var world=[  // an array of countries
    [ // an array of polygons
        [ // an array of pairs of coordinates
            [x0, y0], // coordinates of the first point
            [x1, y1], // coordinates of the next one
                ... 
            [xn, yn],
            [x0, y0]  // coordinates of the first point to close the polygon
        ]
       ...              // another polygon, but maybe not.
   ], 
   [                    // next country
  ...
   ]
...
]

the map projections example

Can be found here: http://vis.stanford.edu/protovis/ex/projection.html

/*
 * A diverging color scale, using previously-computed quantiles of population
 * densities; in the future, we might use a quantile scale here to do this
 * automatically. Map colors based on www.ColorBrewer.org, by Cynthia A. Brewer,
 * Penn State.
 */
var fill = pv.Scale.linear()
    .domain(140, 650, 1900)
    .range("#91bfdb", "#ffffbf", "#fc8d59");

/* Precompute the country's population density and color. */
countries.forEach(function(c) {
  c.color = stats[c.code].area
      ? fill(stats[c.code].pop / stats[c.code].area)
      : "#ccc"; // unknown
});

var w = 860,
    h = 3 / 5 * w,
    geo = pv.Geo.scale("hammer").range(w, h);

var vis = new pv.Panel()
    .width(w)
    .height(h);

/* Countries. */
vis.add(pv.Panel)
    .data(countries)
  .add(pv.Panel)
    .data(function(c) c.borders)
  .add(pv.Line)
    .data(function(b) b)
    .left(geo.x)
    .top(geo.y)
    .title(function(d, b, c) c.name)
    .fillStyle(function(d, b, c) c.color)
    .strokeStyle(function() this.fillStyle().darker())
    .lineWidth(1)
    .antialias(false);

/* Latitude ticks. */
vis.add(pv.Panel)
    .data(geo.ticks.lat())
  .add(pv.Line)
    .data(function(b) b)
    .left(geo.x)
    .top(geo.y)
    .strokeStyle("rgba(128,128,128,.3)")
    .lineWidth(1)
    .interpolate("cardinal")
    .antialias(false);

/* Longitude ticks. */
vis.add(pv.Panel)
    .data(geo.ticks.lng())
  .add(pv.Line)
    .data(function(b) b)
    .left(geo.x)
    .top(geo.y)
    .strokeStyle("rgba(128,128,128,.3)")
    .lineWidth(1)
    .interpolate("cardinal")
    .antialias(false);

vis.render();

In addition there are two arrays of the following shape:
First, stats which is an associative arrays of associative arrays, and which associate each 2-letter country code with values of population and area:

var stats = {
'AG': {pop:83039, area:44},
'DZ': {pop:32854159, area:238174},
...
'US': {pop:299846449, area:915896},
...
};

Then, countries, which is an array of associative arrays.

var countries = [
{code:'AG', name:"Antigua and Barbuda", 
borders:[ // an array of one or several areas, 
  [ // an array of coordinates, 
    [ // a pair of the form longitude, lattitude
       ...
    ]
  ]
]}
...
]

Now this second data structure looks a lot like the one we’ve drafted in the prologue. All the geographic information is tucked in a property called “borders”. The array has other properties for comfort.
Because the data is put in the right shape and order, this script can produce a very good map with a remarkable economy of code.
This example has been put together to showcase the various map projections of protovis (identity, mercator, and so on.). These projections have zero impact on the way data should be assembled for making maps, so we’ll just treat them as “magic”.

/*
 * A diverging color scale, using previously-computed quantiles of population
 * densities; in the future, we might use a quantile scale here to do this
 * automatically. Map colors based on www.ColorBrewer.org, by Cynthia A. Brewer,
 * Penn State.
 */
var fill = pv.Scale.linear()
    .domain(140, 650, 1900)
    .range("#91bfdb", "#ffffbf", "#fc8d59");

This part creates a color scale which will return a color according to the value passed to it. The color returned will be somewhere between the ones specified in the range, depending on where the value is relatively to the values specified in the domain. So a value of 140 will result in a color of #91bfdb (bluish), it will go towards the grey as the value moves up to 650, and towards #fc8d59 (redish) as the value goes up to 1900.

/* Precompute the country's population density and color. */
countries.forEach(function(c) {
  c.color = stats[c.code].area
      ? fill(stats[c.code].pop / stats[c.code].area)
      : "#ccc"; // unknown
});

As the remark says, this will precompute the country’s color once and for all.
The forEach() method goes to every element of the countries array.
the c.color = statement will add a color key to each element of that array (which, as you may recall, already has values for the code, name and borders keys.
What it does is that is retrieves the country code of that element of countries, c.code, and uses that to find out whether we have an area value for that country code (this is stats[/c].area?).
If this is the case, we are going to compute the color that should be attributed to the country, by passing the population divided by the area to the color scale we just made. Else, we just use light grey.

The next few lines are standard constants that will shape the vis.
Note however

geo = pv.Geo.scale("hammer").range(w, h)

This is a geographic scale, which will be used to convert longitudes and latitudes to X and Y coordinates on the screen.

/* Countries. */
vis.add(pv.Panel)
    .data(countries)
  .add(pv.Panel)
    .data(function(c) c.borders)
  .add(pv.Line)
    .data(function(b) b)
    .left(geo.x)
    .top(geo.y)
    .title(function(d, b, c) c.name)
    .fillStyle(function(d, b, c) c.color)
    .strokeStyle(function() this.fillStyle().darker())
    .lineWidth(1)
    .antialias(false);

This is where it all happens.
First, we create a series of panels, one for each country. So, we pass the countries array as data.
Then, we are going to create another series of panels for every country, that is, with as many panels as there are independent areas in the country. For instance, if there are islands, we are going to need extra panels to represent them. If the country is one contiguous mass of land, there will be just one panel here.
This time, we use function(c) c.borders as data. That is, we go into the borders array.

Finally, we are going to create a filled polygon for each of these independent areas. This is achieved by adding a pv.Line to the previous panels. Likewise, we use (function(b) b) as data, meaning that we go yet another level into the borders array. Now, we are accessing the pairs of longitude + latitude numbers.

geo.x and geo.y convert this pair of numbers to X and Y coordinates on the screen.
For the next two lines, title and fillStyle, we need to go back to the country level.
so, we use a function of the form function(d,b,c). d is the current item (pair of longitude, latitude), b its parent (individual area) and c, its grand-parent (the country).
so, function(d,b,c) c.name retrieves the country name, and function(d,b,c) c.color retrieves the color we had computed for that country to begin with.

For the color of the border, we wish to use a darker version of the fill color. This is what the this.fillStyle().darker() does.

The rest of the vis is longitude and latitude ticks, using the built-in properties of the scale.