La france qui exporte

This week, I was made aware of a new set of maps by French ministry of Foreign Trade, called cartographie de la France qui exporte (map of France exports) (link). Since I’m interested in the topic and that I know that French public services have killer cartographers I was eager to see what was so exciting about the first set of online maps on French exports.

I was a little underwhelmed to be honest. Online here meant static pdf files, although this is a dataset that just begs to be explored and manipulated.
On top of that, those where basic choropleth maps with markers such as this one here:

Now this map has two problems. First, it’s a choropleth with a discrete scale, but the values of adjacent areas can vary a lot. So, if you look at this portion of the map, what can be deduced on the values? not much I’m afraid.

Second, it’s difficult to compare the marks on the map. Which region has the biggest? the smallest? how do two specific regions compare? with this representation, this type of question is even more difficult to answer than with a table.

Also these charts answer one partial question. So this one, here, shows which region exports most food products. But to where? and how about the imports and balance? now if one given view was the most relevant and could illustrate some important finding, it can be highlighted but here the website gives us collections of many of such maps. As a citizen I’m leaving no more informed than I was.

Not being the one to criticize without proposing an alternative, I whipped out an interactive exploratory tool of France trade flows.

(The interactive vis is too wide to conveniently fit in a blog, but clicking on the image will open it in a new tab).

I don’t have access to the same dataset so I can’t show a strict equivalent. My data comes from COMTRADE, the UN database of trade flows, and shows all exports and imports to France in 2009. They are not broken down by region or by type of company, but I got the flows by partner country and product category.
The idea is that one can select something on one treemap to update the other. Also, it’s possible to alternate between a categorical view (where all groups of products and continents look neatly separated) and a view of the balance, which quickly shows which products or which countries get the bulk of French trade.


(technical explanation follows for those interested in the code proper)
Now following last week’s tutorial, of course it had to be done in protovis.
Actually it illustrates some interesting principles of working with arrays, trees, maps etc.

First, I want to do as much data manipulation as possible in protovis as opposed to manually. So, my source data for the treemap is stored as an array of associative arrays, which is probably the preferred form in protovis. This is no different than, for instance, Protovis’s barley example.
Now how do you get something of the shape –

var data=
[{com:"02",cat:0,cou:4,con:3,imp:0,exp:101421},
{com:"03",cat:0,cou:4,con:3,imp:9716,exp:0},
{com:"04",cat:0,cou:4,con:3,imp:0,exp:9272355},
{com:"05",cat:0,cou:4,con:3,imp:531587,exp:0},
{com:"07",cat:0,cou:4,con:3,imp:0,exp:83360},
...
{com:"08",cat:0,cou:4,con:3,imp:0,exp:2779}

to something shaped like a tree like:

var tree=
{0: {
       02: 101421,
       03: 0,
       04: 9272355,
       05: 0,
       07:83360},
...

The solution is to use the rollup method.

First, if you look at my individual records, they are of the shape:

{com:"04",cat:0,cou:4,con:3,imp:0,exp:9272355}

where com is commodity code, cat is product category, cou is country, con is continent, imp is imports and exp is exports.

For any country + commodity combination, there will be only one record.
What I’m interested to get in the tree I will use for the treemap are exports. That is what will determine the size of the leaves of the tree.

So…
first I am going to nest my array:

var byProduct=pv.nest(data) 
	.key(function(d) {return d.cou;})
	.key(function(d) {return d.cat;})
	.key(function(d) {return d.com;})

once I have written this I could follow up with a .entries() statement which would return me a nested array, or with rollup() which could give me the tree I need.
Since, again, there is only one record for a combination of country (cou) and commodity (com), I can use any aggregation I want.

I define this function:

function rollup(data) {return pv.sum(data, function(d) {return d.exp;});} 

It returns the sum of all the export values. Since there is just one record, what it does is that it gives me the one export value I need in a tree form.

So the complete statement is:

function rollup(data) {return pv.sum(data, function(d) {return d.exp;});} 

var byProduct=pv.nest(data) 
	.key(function(d) {return d.cou;})
	.key(function(d) {return d.cat;})
	.key(function(d) {return d.com;})
	.rollup(rollup)

This creates a tree, nested by country, then by product category, then by commodity. The corresponding values are the exports.

now creating my treemap data dynamically saves me a ton of hassle compared to trying to come up with a data file of the right shape and size, not mentioning the calculation errors which creep in each manual manipulation !

Another point of interestingness: how I computed the data to create the bar charts on the side.
For the left treemap (and left bar chart) the user has selected a country. (and for the right ones, it’s a given product, but let’s focus on the left side, the reasoning is the same for the other side anyway).

so first I am going to take the tree we made earlier and just look at the selected country. We can do that with a statement like:

myProductTree=byProduct[selCountry];

(so now we have a tree with just 2 levels, product category and commodity).

Now I can’t run pv.nest and all that on a tree. I need an array! so I have to use flatten to turn that section of the tree into a bona fide array which I will be able to further process.

catsByCountry = pv.flatten(myProductTree).key("cat").key("com").key("exp").array(); 

Here, note that the arguments: “cat”, “com”, “exp” are completely arbitrary. But, since I’m recreating the array almost as it originally was, I might as well use the same names for the keys.

So now, I have like a little subset of my original dataset, only the records of the selected country.
I can now proceed to sum exports by categories using a standard rollup method, just as we’ve seen here.

catsByCountry = pv.nest(catsByCountry).key(function(d) {return d.cat;}).rollup(rollup);

Conveniently, the rollup function that I defined earlier sums the records! and here I do need summing, not any aggregation.

The problem is that the rollup() method creates an associative array, and if I need to use that in a bar chart I need a proper array! so, I use pv.values() which does just that, it creates an array out of the values of an associative array.

catsByCountry = pv.values(catsByCountry);

Now the values can vary a lot in absolute terms depending on the selected country. This is why in the actual bar chart, I use pv.normalize() to have only values from 0 to 1 which are much more convenient to plot.

vis.add(pv.Bar)
	.data(function() pv.normalize(catsByCountry))

one last thing, to save space in the data set (which means: bandwidth + loading time) I’ve used short keys in my data file, and I’ve used codes for countries, commodities and the like.

so I have this:

{com:"04",cat:0,cou:4,con:3,imp:0,exp:9272355}

instead of

{
    com:"CEREALS,CEREAL PREPRTNS.",
    cat:"Food and live animals",
    cou:"Algeria",
    con:"Africa",
    imp:0,exp:9272355}

to get the names of the countries, categories etc. I have in my data file variables that associate, say, a country code to its long name, its continent etc.
so I can have to write things like:

countries["4"].name+" ("+continents[countries["4"].continent]+")"

instead of something simpler, but it’s a good trade-off because writing those names in full in the original dataset inflates the size of the file to megabytes (there are approx 10.000 records).