La France qui exporte

This week, I was made aware of a new set of maps by the French Ministry of Foreign Trade, called cartographie de la France qui exporte (map of France’s exports). Since I’m interested in the topic and I know that French public services have killer cartographers, I was eager to see what was so exciting about the first set of online maps on French exports.

I was a little underwhelmed, to be honest. “Online” here meant static PDF files, even though this is a dataset that just begs to be explored and manipulated.
On top of that, these were basic choropleth maps with markers, such as this one:

Now, this map has two problems. First, it’s a choropleth with a discrete scale, but the values of adjacent areas can vary a lot. So, if you look at this portion of the map, what can be deduced about the values? Not much, I’m afraid.

Second, it’s difficult to compare the markers on the map. Which region has the biggest? The smallest? How do two specific regions compare? With this representation, this type of question is even harder to answer than with a table.

Also, each of these charts answers only one partial question. This one, for instance, shows which region exports the most food products. But to where? And what about imports and the trade balance? If one given view were the most relevant and illustrated some important finding, it could be highlighted, but here the website gives us collections of many such maps. As a citizen, I leave no better informed than I was.

Not being one to criticize without proposing an alternative, I whipped up an interactive exploratory tool of France’s trade flows.

(The interactive vis is too wide to conveniently fit in a blog, but clicking on the image will open it in a new tab).

I don’t have access to the same dataset, so I can’t show a strict equivalent. My data comes from COMTRADE, the UN database of trade flows, and shows all of France’s exports and imports in 2009. They are not broken down by region or by type of company, but I have the flows by partner country and product category.
The idea is that selecting something in one treemap updates the other. It’s also possible to alternate between a categorical view (where all groups of products and continents look neatly separated) and a view of the balance, which quickly shows which products or which countries account for the bulk of French trade.


(technical explanation follows for those interested in the code proper)
Following last week’s tutorial, it of course had to be done in Protovis.
It actually illustrates some interesting principles of working with arrays, trees, maps, etc.

First, I want to do as much data manipulation as possible in Protovis rather than by hand. So my source data for the treemap is stored as an array of associative arrays, which is probably the preferred form in Protovis. This is no different from, for instance, Protovis’s barley example.
Now, how do you get from something of this shape:

var data=
[{com:"02",cat:0,cou:4,con:3,imp:0,exp:101421},
{com:"03",cat:0,cou:4,con:3,imp:9716,exp:0},
{com:"04",cat:0,cou:4,con:3,imp:0,exp:9272355},
{com:"05",cat:0,cou:4,con:3,imp:531587,exp:0},
{com:"07",cat:0,cou:4,con:3,imp:0,exp:83360},
...
{com:"08",cat:0,cou:4,con:3,imp:0,exp:2779}
];

to something shaped like a tree, like this:

var tree=
{0: {
       "02": 101421,
       "03": 0,
       "04": 9272355,
       "05": 0,
       "07": 83360},
...

The solution is to use the rollup method.

First, if you look at my individual records, they are of the shape:

{com:"04",cat:0,cou:4,con:3,imp:0,exp:9272355}

where com is commodity code, cat is product category, cou is country, con is continent, imp is imports and exp is exports.

For any country + commodity combination, there is only one record.
What I want in the tree I will use for the treemap is exports: they determine the size of the tree’s leaves.
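Since the rest of the approach leans on that “one record per country + commodity” assumption, it can be worth checking it on the data before relying on it. Here is a minimal plain-JavaScript sketch; `checkUnique` is a hypothetical helper (not part of Protovis or of the original page), and the three records are copied from the sample above.

```javascript
// Hypothetical helper: returns the "country|commodity" keys that occur more than once.
function checkUnique(records) {
  const seen = new Set();
  const duplicates = [];
  for (const d of records) {
    const key = d.cou + "|" + d.com;   // one key per country + commodity pair
    if (seen.has(key)) duplicates.push(key);
    else seen.add(key);
  }
  return duplicates;
}

// A few records in the same shape as the data file.
const sample = [
  {com: "02", cat: 0, cou: 4, con: 3, imp: 0, exp: 101421},
  {com: "03", cat: 0, cou: 4, con: 3, imp: 9716, exp: 0},
  {com: "04", cat: 0, cou: 4, con: 3, imp: 0, exp: 9272355}
];

console.log(checkUnique(sample)); // prints []
```

An empty result means every leaf of the tree will receive exactly one record.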

So, first, I nest my array:

var byProduct=pv.nest(data) 
	.key(function(d) {return d.cou;})
	.key(function(d) {return d.cat;})
	.key(function(d) {return d.com;})

Once I have written this, I could follow up with .entries(), which would return a nested array, or with .rollup(), which gives me the tree I need.
Since, again, there is only one record for a combination of country (cou) and commodity (com), I can use any aggregation I want.

I define this function:

function rollup(data) {return pv.sum(data, function(d) {return d.exp;});} 

It returns the sum of the export values of the records it is given. Since each leaf holds just one record, it simply gives me that one export value, in tree form.
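To make the “any aggregation will do” point concrete, here is a plain-JavaScript sketch. `sumExports` computes the same thing as `pv.sum(data, function(d) {return d.exp;})`; `maxExports` and `firstExport` are hypothetical alternatives. On a one-record leaf all three return the same number; they only diverge when a leaf holds several records, which is exactly the situation in the bar-chart step further down.

```javascript
// Plain-JS equivalent of: pv.sum(data, function(d) {return d.exp;})
function sumExports(records) {
  return records.reduce((total, d) => total + d.exp, 0);
}

// Two other aggregations that would do just as well on a one-record leaf.
function maxExports(records) {
  return Math.max(...records.map(d => d.exp));
}
function firstExport(records) {
  return records[0].exp;
}

// One country + commodity combination = one record.
const leaf = [{com: "04", cat: 0, cou: 4, con: 3, imp: 0, exp: 9272355}];

console.log(sumExports(leaf), maxExports(leaf), firstExport(leaf));
// prints 9272355 9272355 9272355
```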

So the complete statement is:

function rollup(data) {return pv.sum(data, function(d) {return d.exp;});} 

var byProduct=pv.nest(data) 
	.key(function(d) {return d.cou;})
	.key(function(d) {return d.cat;})
	.key(function(d) {return d.com;})
	.rollup(rollup);

This creates a tree, nested by country, then by product category, then by commodity. The corresponding values are the exports.
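If you want to see what pv.nest plus rollup is doing without loading Protovis, here is a rough plain-JavaScript equivalent. It is a sketch of the idea, not Protovis’s actual implementation, and `nestRollup` is a hypothetical name: each key function adds one level of nesting, and the rollup function replaces the array of records at each leaf with a single value.

```javascript
// Hypothetical stand-in for pv.nest(data).key(k1)...key(kn).rollup(f):
// groups records into nested objects, one level per key function,
// then replaces each leaf array of records with f(leafRecords).
function nestRollup(data, keyFns, rollupFn) {
  function build(records, depth) {
    if (depth === keyFns.length) return rollupFn(records);
    const groups = {};
    for (const d of records) {
      const k = keyFns[depth](d);
      (groups[k] = groups[k] || []).push(d);
    }
    for (const k in groups) groups[k] = build(groups[k], depth + 1);
    return groups;
  }
  return build(data, 0);
}

function rollup(records) {
  return records.reduce((total, d) => total + d.exp, 0);  // sum of exports
}

const data = [
  {com: "02", cat: 0, cou: 4, con: 3, imp: 0, exp: 101421},
  {com: "03", cat: 0, cou: 4, con: 3, imp: 9716, exp: 0},
  {com: "04", cat: 0, cou: 4, con: 3, imp: 0, exp: 9272355}
];

// country -> product category -> commodity -> exports
const byProduct = nestRollup(data, [d => d.cou, d => d.cat, d => d.com], rollup);
console.log(JSON.stringify(byProduct));
// prints {"4":{"0":{"02":101421,"03":0,"04":9272355}}}
```

Note that the keys come out as strings ("4", "0"), as with any JavaScript object used as a map, so byProduct[4] and byProduct["4"] reach the same subtree.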

Creating my treemap data dynamically saves me a ton of hassle compared to building a data file of the right shape and size by hand, not to mention the calculation errors that creep in with each manual manipulation!

Another point of interest: how I computed the data for the bar charts on the side.
For the left treemap (and left bar chart), the user has selected a country (for the right ones, it’s a given product, but let’s focus on the left side; the reasoning is the same for the other side anyway).

So first, I take the tree we made earlier and look only at the selected country. We can do that with a statement like:

myProductTree=byProduct[selCountry];

(So now we have a tree with just two levels: product category and commodity.)

Now, I can’t run pv.nest and all that on a tree. I need an array! So I have to use pv.flatten to turn that section of the tree into a bona fide array that I can process further.

catsByCountry = pv.flatten(myProductTree).key("cat").key("com").key("exp").array(); 

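Here, note that the arguments “cat”, “com” and “exp” are arbitrary as far as pv.flatten is concerned: the first two name the levels of the tree and the last one names the leaf value. But since I’m recreating the array almost as it originally was, I might as well use the same names, and that pays off: the key function below reads d.cat and the rollup function defined earlier reads d.exp, so both work unchanged on the flattened records.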
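Going the other way, here is a plain-JavaScript sketch of the kind of array pv.flatten(...).key(...).array() produces here. `flattenTree` is a hypothetical stand-in, not the Protovis code; the input is the country-4 subtree built from the sample records.

```javascript
// Hypothetical stand-in for pv.flatten(tree).key(name1)...key(nameN).array():
// walks the nested object and emits one flat record per leaf. Each level's
// key is stored under the matching name; the last name gets the leaf value.
function flattenTree(tree, names) {
  const out = [];
  function walk(node, depth, record) {
    if (depth === names.length - 1) {
      out.push(Object.assign({}, record, {[names[depth]]: node}));
      return;
    }
    for (const k in node) {
      walk(node[k], depth + 1, Object.assign({}, record, {[names[depth]]: k}));
    }
  }
  walk(tree, 0, {});
  return out;
}

// The subtree for country 4 (byProduct[4]), built from the sample records.
const countryTree = {0: {"02": 101421, "03": 0, "04": 9272355}};

const flatRecords = flattenTree(countryTree, ["cat", "com", "exp"]);
console.log(JSON.stringify(flatRecords));
// prints [{"cat":"0","com":"02","exp":101421},{"cat":"0","com":"03","exp":0},{"cat":"0","com":"04","exp":9272355}]
```

The category comes back as the string "0" rather than the number 0; that is harmless here because it is only used as a grouping key.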

So now I have a small subset of my original dataset: only the records for the selected country.
I can now sum exports by category using a standard rollup, just as we saw above.

catsByCountry = pv.nest(catsByCountry).key(function(d) {return d.cat;}).rollup(rollup);

Conveniently, the rollup function I defined earlier sums the records, and here I really do need a sum, not just any aggregation.
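That nest-and-rollup step amounts to a group-by-and-sum. A minimal plain-JavaScript sketch, with `sumByCategory` as a hypothetical name; the third record is made up (a second category, code 1) purely so that there is more than one group to sum.

```javascript
// Plain-JS equivalent of:
//   pv.nest(catsByCountry).key(function(d) {return d.cat;}).rollup(rollup)
// i.e. group the flat records by category and sum their exports.
function sumByCategory(records) {
  const totals = {};
  for (const d of records) {
    totals[d.cat] = (totals[d.cat] || 0) + d.exp;
  }
  return totals;
}

const countryRecords = [
  {cat: "0", com: "02", exp: 101421},
  {cat: "0", com: "04", exp: 9272355},
  {cat: "1", com: "11", exp: 1000}     // made-up record in a second category
];

console.log(JSON.stringify(sumByCategory(countryRecords)));
// prints {"0":9373776,"1":1000}
```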

The problem is that the rollup() method returns an associative array (an object keyed by category), and to use that in a bar chart I need a proper array! So I use pv.values(), which does just that: it builds an array out of the values of an associative array.

catsByCountry = pv.values(catsByCountry);
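pv.values() hands back the values in the order the object’s keys are enumerated, and that order is what lets the bar index stand in for the category. Here is a quick check in plain JavaScript, using `Object.values` as a stand-in for `pv.values` on the assumption that both follow the engine’s key order. In current engines, integer-like keys such as "0", "1", "2" are enumerated in ascending numeric order, whatever the insertion order. The values below are made up for illustration, and `allCategories` is a hypothetical list of all category codes.

```javascript
// Object.values stands in for pv.values here: both turn the values of an
// associative array into a plain array.
const totalsByCategory = {};
totalsByCategory["2"] = 300;   // inserted out of order on purpose
totalsByCategory["0"] = 100;
totalsByCategory["1"] = 200;

// Integer-like keys are enumerated in ascending numeric order, so the
// result follows the category codes, not the insertion order.
const bars = Object.values(totalsByCategory);
console.log(bars); // prints [ 100, 200, 300 ]

// Hypothetical safeguard: if a category has no exports for the selected
// country, its key is simply absent and every later index shifts by one.
// Mapping over the full list of category codes keeps one slot per category.
const allCategories = ["0", "1", "2", "3"];
const padded = allCategories.map(c => totalsByCategory[c] || 0);
console.log(padded); // prints [ 100, 200, 300, 0 ]
```

Whether the padding is needed depends on whether every country has records in every category, which I have not checked against the actual data file.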

Now, the values can vary a lot in absolute terms depending on the selected country. This is why, in the actual bar chart, I use pv.normalize(), which divides each value by their total so that they sum to 1 (and all lie between 0 and 1), which is much more convenient to plot.

vis.add(pv.Bar)
	// rescale so that the bar values sum to 1
	.data(function() {return pv.normalize(catsByCountry);})
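For reference, here is what the normalization step computes, as a plain-JavaScript sketch. `normalize` is a stand-in for pv.normalize, and the zero-total guard is an addition of this sketch (without it, an all-zero array would produce NaN values).

```javascript
// Plain-JS stand-in for pv.normalize(array): divide each value by the sum
// of the array, so that the results add up to 1.
function normalize(values) {
  const total = values.reduce((s, v) => s + v, 0);
  if (total === 0) return values.map(() => 0);  // guard added in this sketch: avoid 0/0 = NaN
  return values.map(v => v / total);
}

console.log(normalize([100, 200, 300, 400])); // prints [ 0.1, 0.2, 0.3, 0.4 ]
```

Because the values are divided by their sum, each bar shows the category’s share of the selected country’s exports. If you would rather have the largest bar always fill the available width, dividing by the maximum (for instance with pv.max) instead of the sum is the usual alternative.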

One last thing: to save space in the dataset (which means bandwidth and loading time), I’ve used short keys in my data file, and codes for countries, commodities and the like.

So I have this:

{com:"04",cat:0,cou:4,con:3,imp:0,exp:9272355}

instead of

{
    com:"CEREALS,CEREAL PREPRTNS.",
    cat:"Food and live animals",
    cou:"Algeria",
    con:"Africa",
    imp:0,exp:9272355}

To get the names of the countries, categories, etc., my data file has variables that map, say, a country code to its long name, its continent, etc.
So I have to write things like:

countries["4"].name+" ("+continents[countries["4"].continent]+")"

instead of something simpler, but it’s a good trade-off: writing those names in full in the original dataset inflates the file to megabytes (there are approximately 10,000 records).
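Here is a minimal sketch of that lookup approach in plain JavaScript. The table names mirror the `countries` and `continents` variables used above, but their exact structure, plus `categories`, `commodities`, `decode` and `countryLabel`, are assumptions for illustration; the entries are the Algeria example from above.

```javascript
// Hypothetical lookup tables: short codes in the records, long names stored once per code.
const continents = {3: "Africa"};
const countries = {4: {name: "Algeria", continent: 3}};
const categories = {0: "Food and live animals"};
const commodities = {"04": "CEREALS,CEREAL PREPRTNS."};

// Expand a compact record into a readable one (hypothetical helper).
function decode(d) {
  return {
    com: commodities[d.com],
    cat: categories[d.cat],
    cou: countries[d.cou].name,
    con: continents[d.con],
    imp: d.imp,
    exp: d.exp
  };
}

// Same expression as in the post, wrapped in a function.
function countryLabel(code) {
  return countries[code].name + " (" + continents[countries[code].continent] + ")";
}

const record = {com: "04", cat: 0, cou: 4, con: 3, imp: 0, exp: 9272355};
console.log(countryLabel("4"));      // prints Algeria (Africa)
console.log(decode(record).cat);     // prints Food and live animals

// Size of the compact record vs. its expanded form, in characters of JSON.
console.log(JSON.stringify(record).length, JSON.stringify(decode(record)).length);
```

The last line compares a compact record with its expanded form; multiplied by roughly 10,000 records, that difference is what the short codes save.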

 

8 thoughts on “La France qui exporte”

  1. Hi Jerome,

    again, an excellent tutorial!! I really enjoyed your analysis of the maps, and also your explanation of how you’ve implemented the treemap in Protovis is very clear. I also looked at your source code, and it’s really good to see that most of it is actual Protovis code, and not really much custom code. I also like the idea of 2 connected treemaps very much. I can’t remember seeing this anywhere before! So again, a great tutorial, I really enjoyed it.

    I have a few suggestions though, so I hope you don’t mind:

    - I saw in your source code that there were some out-commented alert statements. I don’t know which tools you use for creating this, but one thing that may ease development is using Firefox, and especially the Firebug plugin. This allows you to write statements like console.log(‘bla bla bla’), which get written to the Firebug console window. This will relieve you from alerts, where you always have to click a button to dismiss them. And, if you have a Mac, I highly recommend TextMate, because you can open a preview window that gets updated as you type. This is very, very useful when creating a Protovis visualization: immediate feedback works great!!

    - in your critique of the map, you say that it is hard to tell ‘which is the biggest, which the smallest?’. I think these questions are easier to answer in your version thanks to the bar charts for each treemap, but they could be answered even more easily if you sorted the bar charts by size, so that the biggest is always on top.

    - I am not totally convinced by the way you use labels and bar charts. First, the readability of the labels is now dependent on the color of the barchart, and darker bar colors make the labels a little more difficult to read. Also, since both the labels and the barchart occupy the same space, the length of the text of a label somehow seems to be mixing with the length of the bars. So visually it is a little confusing to me.

    - I wonder if the interaction / visual feedback could be improved if you could somehow see which of the areas in a treemap you have selected. Right now you can click on it, but afterwards you don’t see which one that was.

    - the numbers in the labels are often very long (around 10 digits), and I find such large numbers a little hard to comprehend. I think it would be easier if the labels showed the numbers in billions or something. I may be wrong, but the level of detail shown right now may not be relevant.

    - finally, as someone from the Netherlands I tried to find the Netherlands of course. Although I don’t have a solution just yet, it is pretty hard to find a specific country. But this also depends on the purpose of your visualization of course.

    I hope you find my suggestions useful, and I am looking forward to seeing your next post, tutorial or creation!! Great work, Jerome!!!

  2. Hi JanWillem, and thanks for your message.
    Yes, out-commented alert statements are shabby… I removed them! So no one can see what you’re talking about now :)
    I mostly use the Chrome console. I think console.log works there too! And I’m a PC.

    Now, on the substance: OK, visually the solution is far from perfect. Actually, I wanted to show two things.
    When you try to do something complex in Protovis, you’d better get your data in the shape:

    [{key1:… , key2:…, , keyn:…},
    {key1:… , key2:…, , keyn:…},
    {key1:… , key2:…, , keyn:…},

    ]

    because from this shape you can do anything. From it I can build 2 different treemaps, bar charts with an interesting aggregation, etc.
    When I was starting with Protovis, I would have prepared different datasets in Excel, which is error-prone, time-consuming, and wastes a lot of space.

    Now, bar charts: order them or not?
    I’m a devout follower of Stephen Few, so I usually like them ordered. But the basic idea was to use them as a legend showing which color is what. If I ordered them, their order would change from country to country, so it would be harder to follow the evolution of each bar. Also, the aggregation formula that produces a single array would have to be slightly trickier to keep the color information. Here, I can get away with a simple array and use this.index to get the right color, because the order of the bars is always the same as the order of the product categories. If I had them reordered, I would have to use a two-dimensional array to keep the category information, which is not too difficult but was making the code harder to follow.
    I agree, though, that the way the text gets in the way of the bars is not ideal, and neither is the choice of colours etc. :(

    Yes, there should have been some visual feedback on the treemap you click in. I’ve added something. I also changed how numbers are displayed, although the solution based on “title” is not ideal…

    There could be a way to look up countries using regular expressions etc., as in the Protovis treemap example. Maybe for the next project :)

  3. You’re right about the data structure, that’s also my experience so far. In hindsight my previous Protovis projects were not always very effective. I am going to submit my entry for the Visualizing.org contest, and I am using a similar data structure. The result is much cleaner code. I also picked up some of the things you mentioned in your tutorial series, like the way you access parent data with a function with 2 arguments :)

    I see your point with the bar charts, and you’re right that this way it’s easier to see the change per continent.

    The changes you’ve made to the selected item and to the labels work really well!

    Looking forward to seeing your next project! :)

  4. Hi Agnieszka, now that the code is written, I could do the same for Russia or any other country in less time than it takes to write this reply, so if you want this for your blog, just let me know.
