Cantonal elections, the conclusion: runoffs and vote transfers

[A note to my English readers – this post is once again about French politics, but I’ll soon resume regular programming]
On Sunday the 27th, the second round of the cantonal elections took place – I’ve already written a bit about them here and here. And a second round means head-to-head runoffs:


As the chart shows, the “classic” second-round match-up was the UMP-PS duel (336 cases). But what everyone was really watching were the races where the FN was present: 403 of them. So what happened? Did the votes transfer properly?

What I read after the elections was that while left-wing voters had rallied to block the Front National in runoffs between the FN and the right, right-wing voters had not really done the same, often content to let the left win without mobilizing any further. So let’s see what the numbers say.

Of the 403 cantons where the Front National stayed in the race, there were 266 runoffs against the left and 127 against the right. We’ll skip the 10 remaining configurations (three-way races, runoffs against another party, etc.), because what interests us here is how the supporters of a major bloc vote when their side doesn’t make the second round.

In the right-vs-FN runoffs, nearly 240,000 voters who had voted for the left in the first round were left without a candidate. That is far more than the additional votes collected by the right-wing candidates (about 150,000)! Likewise, about 280,000 right-wing voters could not support their candidate in the second round, yet the left-wing candidates only picked up an extra 190,000 votes.

In both cases, turnout increased: about 210,000 more voters. Yet the number of spoiled ballots also rose (more than 150,000 extra) and, above all, the FN made strong gains (almost 300,000 additional votes).

What can we conclude?

The idea of a republican front, of a massive mobilization of voters of all stripes that would systematically prevent the victory of an FN candidate, takes a serious blow. In those 403 cantons, turnout was only 55.28%, not very different from the national average. We’ve seen better mobilization!

On top of that, there is no systematic transfer of votes to the FN’s opponent (and that goes for both configurations, left/FN or right/FN). In fact, the FN can even gain votes between the two rounds, and many votes at that! In short, there will be no repeat of the May 5, 2002 effect, and from now on anything is possible for FN candidates.

 

Working with data in protovis – part 2 of 5

Previous post: simple arrays

Multi-dimensional arrays, associative arrays and protovis

Even for a simple chart, chances are that you’ll have more than a single series of numbers to represent. For instance, you could have labels to describe what they mean, several series, and so on and so forth.
So, let’s say we want to add these labels to our original examples, so we know what those bars represent.

var categories = ["a","b","c","d","e","f"];
var vis = new pv.Panel()
  .width(150)
  .height(150);

vis.add(pv.Bar)
  .data([1, 1.2, 1.7, 1.5, .7, .3])
  .width(20)
  .height(function(d) d * 80)
  .bottom(0)
  .left(function() this.index * 25)
  .anchor("bottom").add(pv.Label)
    .text(function() categories[this.index]);

vis.render();

While this did the trick, nothing guarantees that the data proper and the category names will stay coordinated. If a data point is added or removed without the same change being made to the categories array, the two will no longer match. A more integrated way to proceed would be to group category and data information, like this:

var data = [
  {key:"a", value:1},
  {key:"b", value:1.2},
  {key:"c", value:1.7}, 
  {key:"d", value:1.5},
  {key:"e", value:.7},
  {key:"f", value:.3}
];
var vis = new pv.Panel()
  .width(150)
  .height(150);

vis.add(pv.Bar)
  .data(data)
  .width(20)
  .height(function(d) d.value * 80)
  .bottom(0)
  .left(function() this.index * 25)
  .anchor("bottom").add(pv.Label)
    .text(function(d) d.key);

vis.render();

This time, we group the values and the category names in a single variable, an array of associative arrays.
When drawing the bar chart, protovis will go through this array and retrieve an associative array for each bar.
We have to change the way the height function is written. The data element being accessed is no longer of the form 1 or 1.7, but {key:"a", value:1} or {key:"c", value:1.7}. So to get the numeric part of it, we must write d.value.

Likewise, instead of accessing an array of categories for the text part, we can use the current data element via an accessor function, and write d.key.
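To see exactly what protovis is given, it can help to build the key/value array programmatically from the two parallel arrays of the first example. This is a plain-JavaScript sketch (no protovis involved); the zip helper is my own name, not a protovis function.

```javascript
// Build the array of associative arrays from two parallel arrays.
var categories = ["a", "b", "c", "d", "e", "f"];
var values = [1, 1.2, 1.7, 1.5, .7, .3];

function zip(keys, vals) {
  var out = [];
  for (var i = 0; i < keys.length; i++) {
    // one {key, value} object per bar, as in the data variable above
    out.push({key: keys[i], value: vals[i]});
  }
  return out;
}

var data = zip(categories, values);
// data[2] is {key: "c", value: 1.7}
```

This also makes the coordination problem concrete: once the two arrays are merged into a single data variable, removing an element removes its label at the same time.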

Hierarchy and inheritance

So we’ve seen that arrays, or associative arrays, can have several levels and can be nested one into another.
Interestingly, protovis elements, like panels, charts, marks, etc., also work in a hierarchy. For instance, when you start working with protovis you create a panel object. Then, you add other objects to that panel, like a bar chart (our example), or another panel. You can add other objects to your new objects, or attach them to your first panel.
This diagram shows the hierarchy between elements in the previous example.

var categories = ["a","b","c","d","e","f"];
var vis = new pv.Panel()
  .width(150)
  .height(150)

var bar = vis.add(pv.Bar)
  .data([1, 1.2, 1.7, 1.5, .7, .3])
  .width(20)
  .height(function(d) d * 80)
  .bottom(0)
  .left(function() this.index * 25)
  .anchor("bottom").add(pv.Label)
  .text(function() categories[this.index]);

vis.render();

The bar object is considered to be the child of vis, which is its parent.

You may know that in protovis, children objects inherit properties of their parents.

For instance, if width wasn’t specified for the bar object, it would inherit the width of its parent, 150, and each bar would cover the whole panel.

For data, when a new object is added, data is either specified at that level, or obtained from the parent element of this object.

Let’s take our example and tweak it a bit.

var vis = new pv.Panel().width(150).height(150);
var bar = vis.add(pv.Bar)
  .data([1, 1.2, 1.7, 1.5, .7, .3]).width(20) .bottom(0)
  .height(function(d) d * 80).left(function() this.index * 25)
  .anchor("top").add(pv.Label)
vis.render();

Here, I didn’t specify a data or a text value for the labels I added. They simply take the values of their parent element – the marks of the pv.Bar object.
Here’s another variation:

var vis = new pv.Panel().width(150).height(150);
vis.add(pv.Panel)
  .data([1, 1.2, 1.7, 1.5, .7, .3]) .left(function() this.index * 25)
  .add(pv.Bar).width(20) .bottom(0)
  .height(function(d) d * 80)
  .anchor("top").add(pv.Label)
vis.render();

Here, I’m adding panels, then a bar in each panel.
From the root panel, I’m adding a group of panels with this data: [1, 1.2, 1.7, 1.5, .7, .3].
Since there are 6 elements here, I’m adding 6 panels.
Here, the left method applies to each of the panels: the first one is to the left, the next one is 25 pixels further, etc.
I’m then adding a bar object to each panel. Is that one group of bars? Technically yes, but each group has only one element! Each pv.Bar implicitly gets the data element of its parent, so the first bar gets [1], the next one gets [1.2], etc. The height of each bar is determined by multiplying the value of that element by 80.
Note that since the fillStyle property is not defined for the bars, they get the default palette colors, which explains the color changes.

Further refinement: accessing the data of the parent!

var vis = new pv.Panel().width(150).height(150);
vis.add(pv.Panel)
  .data([1, 1.2, 1.7, 1.5, .7, .3]) .left(function() this.index * 25)
  .add(pv.Bar).width(20) .bottom(0)
  .height(function(a,b) b * 80)
  .anchor("top").add(pv.Label)
vis.render();

Well, the output is exactly the same, but the way I obtained the data is different. Instead of getting the data through the standard accessor function, I passed two arguments: function(a, b).
The first argument corresponds to the current data element of the object, and the second to that of its parent.

In this example, they happen to be the same, but this is how you can access the data of the parent objects.
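To make the mechanism concrete, here is a plain-JavaScript sketch of how a library could hand an accessor both the current datum and the parent’s datum. This illustrates the idea, not protovis’s actual implementation; renderPanels and its arguments are hypothetical names of mine.

```javascript
// Iterate over parent data, then over each parent's child data,
// calling the accessor with (childDatum, parentDatum).
function renderPanels(parentData, childDataFor, accessor) {
  var results = [];
  parentData.forEach(function(parentDatum) {
    childDataFor(parentDatum).forEach(function(childDatum) {
      // first argument: the child's datum; second: its parent's datum
      results.push(accessor(childDatum, parentDatum));
    });
  });
  return results;
}

// Each panel's datum becomes a one-element data array for its child bar,
// so function(a, b) receives the same value twice:
var heights = renderPanels(
  [1, 1.2, 1.7],
  function(d) { return [d]; },          // child inherits [d] from the panel
  function(a, b) { return b * 80; }     // use the parent's datum, as above
);
// heights: [80, 96, 136]
```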

Putting it all together

Let’s see how we can use protovis and the properties of hierarchy! This example is less trivial than the ones we’ve seen so far but with what we’ve seen it is quite accessible.
The challenge: re-create square pie charts.
How it’s done:

var data=[36,78,63,24],  // arbitrary numbers
cellSize = 16,
cellPadding = 1,
squarePadding=5,
colors=pv.Colors.category20().range()
;

var vis=new pv.Panel()
    .width(4*(10*cellSize+squarePadding))
    .height(10*cellSize)
    ;

var square = vis.add(pv.Panel)
    .data(data)
    .width(10*cellSize)
    .height(10*cellSize)
    .left(function() this.index*(cellSize*10+squarePadding))
    ;

var row = square.add(pv.Panel)
    .data([0,1,2,3,4,5,6,7,8,9])
    .height(cellSize)
    .bottom(function(d) d*cellSize)
    ;

var cell = row.add(pv.Panel)
    .data([0,1,2,3,4,5,6,7,8,9])
    .height(cellSize-2*cellPadding)
    .width(cellSize-2*cellPadding)
    .top(cellPadding)
    .left(function(d) d*cellSize+cellPadding)
    .fillStyle(function(a,b,c) (b*10+a)<c?colors[this.parent.parent.index].color:"lightGrey")
;

square.anchor("center")
  .add(pv.Label)
    .text(function(d) d)
    .textStyle("rgba(0,0,0,.1)")
    .font("100px sans-serif");

vis.render();

First, we initialize the data (4 arbitrary numbers from 1 to 100) and various parameters that will help size the square pies – the size of the cells, the space between them, and the space between the square pie charts. We also initialize a color palette.
Then, we are going to create 4 panels or groups of panels, each a child of the previous one.
First comes the vis panel, which groups everything,
Then the square panels, which correspond to each square pie. It is to this panel that our data is assigned.
Then come the row panels, and, finally, the cell panels.
The numbers which we want to represent are assigned to the square panel. So, what data are we passing to the row and the cell panels? The only thing we want is to have 10 rows and 10 cells per row. So, we can use an array with 10 items. We are going to use [0,1,2,3,4,5,6,7,8,9] so the data value of the row and that of the cell will correspond to the coordinate of the row and the cell, respectively. In other words, the 5th row will be assigned the data value of 4, and the 7th cell in that row will get the data value of 6. We could retrieve the same numbers using “this.index” but this can lead to obfuscated formulas.

Note that in the next part of the tutorial, we’ll see that in Protovis, there is a more elegant way to write [0,1,2,3,4,5,6,7,8,9] or similar series of numbers. But, we’ll leave this more explicit form for now.

Back to our row panel. We position it with bottom(function(d) d*cellSize). Here, again, d represents the rank of the row, so the 1st row will get 0, and its bottom value will be 0, the next row will get 1, and its bottom value will be 1*cellSize or 16, etc.

Likewise, in the cell panel, the cells are positioned with left(function(d) d*cellSize+cellPadding). This is the same principle. (here, cellPadding is just used to fine-tune the position of the cell).

It is in the final line that we really get to use hierarchy.

.fillStyle(function(a,b,c) (b*10+a)<c?colors[this.parent.parent.index].color:"lightGrey")

Here, a represents the data value of the cell – in other words, the column number.
b is the data value of the cell’s parent, the row – this is the row number.
And c is the data value of the parent of the parent of the cell, the square – this is the number that we are trying to represent.

so, what we are going to determine is whether b*10+a<c. If that’s the case, we color the cell, else, we leave it in pale grey. To get the color we need, we go back to the palette that we defined at the beginning, and take the color corresponding to the square number (0 to 3 from left to right).
The square number can be obtained by this.parent.parent.index.
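We can check the fill logic outside of protovis: for a square representing the value c, the cell at column a and row b is colored when b*10 + a < c, so exactly c of the 100 cells end up colored. A quick plain-JavaScript sketch (filledCells is my own name):

```javascript
// Count how many cells the condition b*10 + a < c colors, for one square.
function filledCells(c) {
  var count = 0;
  for (var b = 0; b < 10; b++) {        // row number (data value of the row)
    for (var a = 0; a < 10; a++) {      // column number (data value of the cell)
      if (b * 10 + a < c) count++;
    }
  }
  return count;
}

// For the post's data [36, 78, 63, 24], each square colors exactly
// as many cells as the value it represents:
[36, 78, 63, 24].map(filledCells);     // [36, 78, 63, 24]
```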

Finally, we add the numbers in large transparent gray digits on top of the squares.

Here is the result:

Next: Javascript and protovis array functions

 

Working with data in protovis

For the past year or so I have been dabbling with protovis. I don’t have a heavy CS background but protovis is supposedly easy to pick up for people like me, who are vaguely aware that computers can make calculations but who need to check the manual for the most mundane programming instructions.

What I found was that while it’s reasonably easy to modify the most basic examples to make stuff happen, it is much harder to understand or adapt the more complex ones, let alone to create a fairly complex visualization.

The stumbling block for me was the data method, which is used to feed all the other protovis methods with, well, data. In the simplest examples, the data which is passed is plain and simple, and thus easy to understand. But for slightly more advanced uses, the shape of the data gets increasingly complex, and the very powerful methods that protovis provides to process and reshape data are just out of reach for a noob.

So I started documenting my struggle with data, first for my own use, and eventually realized I could share what I learned. This is it.

I split this tutorial in 5 parts.

  1. First, we’ll look at the humble arrays and how protovis works with them.
  2. Then, we’ll talk about multi-dimensional arrays, associative arrays, hierarchy and inheritance.
  3. Third, we’ll take a break from protovis and look at the javascript methods to work with arrays, such as sorting.
  4. [update] Since I first published the array part I wrote a supplement
  5. We’ll then check out the powerful protovis data processing methods, such as those that reshape complex arrays.
  6. Finally, we’ll see how data must be prepared to work with the built-in layouts, such as treemaps, force-directed layouts etc.

And as a bonus, I have also deconstructed several interesting (but not immediately accessible) examples from the gallery:

To make the best use of this material, it would be helpful to know a bit about protovis. The best ways to get started are:

Now that that’s said, en route for the 1st part!

 

VisWeek 2010: the one-minute edition.

Visweek 2010 is just over.

With lectures and presentations going on in up to 4 rooms simultaneously for 6 straight, very full days, it’s impossible to see everything, let alone describe it. And even that ignores all the exchanges and social interactions, which are precisely the point of visweek.

So instead, I’m showing what I liked best, one image per day.

 

Playing with Tableau contest datasets

I’ve played a bit with the other 3 datasets of the Tableau Public contest. When I get to see what others have done, having manipulated the datasets myself will make it easier to learn from their work. The one I’ve spent most time with is the US budget spending one. Here’s the sheet I came up with:

(if the viz doesn’t show in the blog, here’s the direct link)

a few explanations:

Unit: % of GDP

The dataset covers almost 40 years, and includes a notion of inflation. But even with that it’s too difficult to compare spending over time. Instead of trying to convert everything to 2009 constant dollars, it’s easier (and it makes more sense) to compare everything as percentage of GDP.

Filter: by function

The original dataset lists over 30 departments. I don’t think they are immediately comparable as is, some being much bigger than others. Besides, it’s just too complicated to ask people to choose between 30 items to make comparisons. So, instead, I grouped several departments by function, as defined by the COFOG (classifications of functions of government, a UN classification). To be honest I wasn’t extra careful when I assigned some departments to a function, for instance Veteran Affairs could have been assigned to Defense or to Social Protection (I chose the latter).  But the assignments are fair. The added bonus is that using functions enables us to make international comparisons:

Comparing with OECD values

Not too long ago I made a chart comparing OECD countries’ budget expenditures. So what I didn’t like about this dataset is that it didn’t give a way to determine whether US spending in a given area was high or low. From the dataset proper, one can tell, for instance, that social protection expenses were never as high as in 2009. But are they really “high”? Or: defense expenditure was at an all-time low in 1999. But was it really low?

Comparing with other values helps answer those questions. To continue with these 2 examples: social protection expenditure in 2009 was 7.2% – a much higher share than in 1965 (3.9%), but still very low compared to OECD countries, the average being 15.2%. Conversely, defense in 1999 represented only 3.1% of GDP – it was as high as 9% during Vietnam, and it’s almost 5% today. Meanwhile, the OECD average is 1.4%.

Again, that comparison is not very scientific, because the numbers used for those OECD averages include other levels of government (states, cities…) which are not included here. But still, they help put the dataset in context.

 

Tableau contest, additional views

For the tableau contest, I had prepared many other views which I didn’t include in the final dashboard. Here’s a couple:

This here shows the percentage points of adult obesity rates that are not explained by the median income of the county. In other words, marks that appear very green are counties where people are less obese than in counties with similar incomes, and conversely, red marks show counties where obesity is more widespread than income alone would predict.
This suggests that there could be some cultural/regional explanations to the health situation. People in the Mountain states, especially Colorado, show as very green, while people in the old South are very red. The Midwest and Northwest are average; New England and Florida tend to be better than average. Yesterday, there was a show on French TV calling Houston the fat capital of the USA, explaining that by cultural reasons. But the fact is, obesity rates in Harris county are lower than average, and on this map the corresponding mark is green, showing that factors other than income play a positive, not negative, role. I love it when facts and numbers get in the way of a nice story.

And this one was put together mostly for aesthetic purposes. It categorizes counties by their median income (in increments of $1,000, X-axis) and their obesity rates (in chunks of 0.5%, Y-axis), and plots the total population of the counties that meet both criteria. Then again, it doesn’t show the actual number of people who are in a specific income and obesity bracket; it just adds up the population of whole counties.

 

My tableau contest entry

So here it is. I chose to compete on the Activity Rates and Healthy Living data set, because after downloading it I really enjoyed exploring it.

If the viz doesn’t show well in the blog, here’s a link to its page

My main reason for entering the contest is to be able to see what others have done. There are obviously many, many ways to tackle this and I am very much looking forward to seeing everyone’s work! My interactions with the Tableau community, especially through the forum, have always been very rewarding, and what better way to learn than from example!

So, for the fellow contestants who will see my work, here is my train of thought for the dashboard.

The dataset

I’m aware of USDA’s food environment atlas. It’s an application where people can see various food-related indicators on a map. The dataset we were handed is actually the background data for it. So, there is already a place where people can consult food indicators.

Now, this being Tableau and all, I wanted to create an analytical dashboard where people could understand if and how the input variables affected the output variables.

The dataset consists mostly of input variables: various indicators that influence how healthy a local population is. That status (output) is expressed through a few variables, such as adult and child obesity rates and adult diabetes rates. Those variables are highly correlated with each other, so in my work I chose to focus on adult obesity rates which is the simplest one.

Now, inputs. The rest of the variables fall in several categories:

  • income (median household income, poverty rates);
  • diet (consumption of various food items per capita);
  • shopping habits (for various types of stores or restaurants, the dataset would give their number and the money spent in each county, both in absolute numbers and per capita);
  • lifestyle information (data on households without cars and far from stores, on the physical activity level of the population, and the facilities offered by the state);
  • pricing variables (price ratios between some “healthy” food items and some less healthy, equivalent food items, for instance fruits vs. snacks; tax information on unhealthy food);
  • policy variables (measuring participation to various programmes such as SNAP or WIC);
  • socio-demographic variables (ethnic groups in population, “metro” status of county, whether the county was growing and shrinking, and voting preferences).

Yes, that’s a lot of variables (about 90, plus the county and state dimensions).

Oddly enough, there wasn’t a population measure in the dataset, and many indicators were available in absolute value only, so I constructed a proxy by dividing two variables on the same subject (something like “number of convenience stores” and “number of convenience stores / capita”).

That enabled me to build indicators per capita for all subjects, so I could see if they were correlated with my obesity rates.
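The proxy described above is just an absolute indicator divided by its per-capita counterpart, and a Pearson correlation coefficient is one way to screen the resulting per-capita indicators against the obesity rate. A plain-JavaScript sketch; the function names and the example figures are mine, not taken from the dataset.

```javascript
// Population proxy: an absolute count divided by the same count per capita.
function populationProxy(count, countPerCapita) {
  return count / countPerCapita;
}

// Pearson correlation coefficient between two series of equal length.
function pearson(xs, ys) {
  var n = xs.length;
  var mx = xs.reduce(function(s, v) { return s + v; }, 0) / n;
  var my = ys.reduce(function(s, v) { return s + v; }, 0) / n;
  var num = 0, dx2 = 0, dy2 = 0;
  for (var i = 0; i < n; i++) {
    var dx = xs[i] - mx, dy = ys[i] - my;
    num += dx * dy;
    dx2 += dx * dx;
    dy2 += dy * dy;
  }
  return num / Math.sqrt(dx2 * dy2);
}

// e.g. 120 convenience stores and 0.0006 stores per capita
// imply a population of about 200,000:
populationProxy(120, 0.0006);   // → about 200000
```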

Findings – using Tableau desktop to make sense of the dataset

The indicators which were most correlated with obesity were the income ones, which came as no surprise. All income indicators were also very correlated to each other. In the USA, poverty means having an income below a certain threshold which is defined at the federal level. But in other contexts, poverty is most often defined in relation to the median income (typically, a household is in poverty if its income is below half of the median income), so it can be used to measure inequality of a community, and dispersion of incomes.

As a result, many indicators appear to be correlated with obesity because they are not independent of income. This is the case for instance for most of the policy indicators: if a programme has many recipients in a county, it is because poverty is widespread, so residents are more likely to be affected by obesity. This makes it difficult to measure the impact of the programmes with this dataset. This is also the case, unfortunately, for racial indicators, as most of the counties with a very high black population have a low income.

Diet indicators also appear to be uncorrelated with obesity. This is counter-intuitive – isn’t eating vegetables or fresh farm produce the most certain way to prevent obesity? But one has to remember that this dataset is aggregated at the county level. Just because a county has a high level of, say, fruit consumption per capita doesn’t mean that every household is eating that much. Realistically, consumption will be very dispersed: the households where people cook, which are less likely to be affected by obesity, will buy all the fruits, and those where people don’t cook will simply buy none. Also, just because one buys more vegetables than average doesn’t mean they don’t also buy other, less recommended foodstuffs.

The only diet indicator that appears to be somewhat correlated with obesity is the consumption of soft drinks.

When it comes to lifestyle habits, surprisingly, the proportion of households without car and living far from a store – people likely to walk more, so to be healthier – is positively correlated with obesity. This is because counties where this indicator is high are also poorer than average – again, income explains most of this. However, physical activity in general plays a positive role. States where people are most active, such as Colorado, enjoy the lowest obesity figures. In fact, all the counties with less than 15% of obesity are in Colorado.

Finally, pricing didn’t seem to have much impact on either obesity or consumption. Why is that? Economists would call this “low price elasticity”, meaning that price changes do not encourage people to switch products and habits. But there is another explanation. Again, people who can’t cook are not going to buy green vegetables just because they are cheaper. Also, consider the tax amounts that are applied: no more than 7% in the most aggressive states. Compare that figure to the 400%+ levy that is applied to cigarettes in many countries of the world! Clearly, 4-7% is not strong enough to change habits. However, this money can be used to sponsor programmes that can help people adopt safer behaviors.

What to show? making the visualization

First, I wanted to show all of those findings. If 2 variables that you expect to be correlated (say, consumption of vegetables and obesity) are in fact not correlated, a point is made! But visually, nothing is less interesting than a scatterplot that doesn’t exhibit correlation. It’s just a stupid cloud of dots.

So instead I chose to focus on the correlations I could establish, namely: obesity and income, and obesity and activity. Those are the 2 lower scatterplots of my dashboard. I chose the poverty rate measure because I’d rather have a trend line going up than going down.

I duplicated that finding with a bar chart made with median income bins. For each bin (which represents all the counties whose median income falls in that range), I plot the average obesity rate, and, miracle! this comes up as a markedly decreasing bar chart. Now, this figure doesn’t establish correlation, let alone causality, but it certainly suggests it more efficiently than a scatterplot. Also, it can double as a navigation aid: clicking on a bar highlights or selects the relevant counties.
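The income-bin bar chart can be reproduced outside Tableau: bucket counties by median income, then average the obesity rate inside each bucket. A plain-JavaScript sketch with made-up figures; the record shape {income, obesity} is hypothetical, for illustration only.

```javascript
// Average obesity rate per median-income bin (bin width in dollars).
function obesityByIncomeBin(counties, binWidth) {
  var sums = {}, counts = {};
  counties.forEach(function(c) {
    // lower edge of the income bin this county falls into
    var bin = Math.floor(c.income / binWidth) * binWidth;
    sums[bin] = (sums[bin] || 0) + c.obesity;
    counts[bin] = (counts[bin] || 0) + 1;
  });
  var out = {};
  for (var bin in sums) out[bin] = sums[bin] / counts[bin];
  return out;
}

// Made-up counties: poorer counties with higher obesity rates.
var counties = [
  {income: 31000, obesity: 0.33},
  {income: 34000, obesity: 0.31},
  {income: 52000, obesity: 0.24},
  {income: 58000, obesity: 0.22}
];
obesityByIncomeBin(counties, 10000);
// bins 30000 and 50000, with averages of about 0.32 and 0.23
```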

Finally, I decided to do a map. Well, actually, it was the first thing I had done, but I had second thoughts about it, and eventually I put it in. Why? First, to allow people to look up their county. My county is Travis County (Austin, TX), and I can find it easily on a map; less so if I have to look for county names listed in order of any of their indicators. I added a quick filter on county name, for those who’d rather type than look up.

I also wanted to see whether there was a link between geography and obesity. So try the following.

  • Where are the counties with obesity rates less than 15% ? Colorado only.
  • If we raise the threshold a little, we get San Francisco and New York. But until 20%, these counties remain very localized.
  • Likewise, virtually all counties above 35% are in the South – Alabama, Louisiana, Mississippi.

Population also matters. The counties with a population above 1m people tend to have lower rates – their citizens also usually have higher incomes.

I decided to zoom the map on the lower 48 by default. It is possible to zoom out to see Alaska and Hawaii, but I don’t think that the advantage of seeing them all the time outweighs the inconvenience of a smaller viewport when they are not needed.

Regarding the marks: originally, I didn’t assign any variable to their size, but then thought that the larger counties (i.e. LA, Harris (Houston), Cook (Chicago)…) were underrepresented. So I assigned my population proxy to size. But then the density of the marks competed with the intensity of the color, which was attributed to the obesity rate. So I removed that and chose a size at which marks wouldn’t overlap each other too much. Regarding color, I wasn’t happy with the default scale. If I had left it as is, it would consider that 12.5%, the minimum value of the dataset, is an extremely low number. But in absolute terms, it’s not: most developed countries have obesity rates lower than that value at the national level; Japan or Korea are below 4%. So I made the scale start at 0. But I didn’t like the output: the counties with the highest values didn’t stand out. Eventually, I chose a diverging scale, which helped counties with high and low values to be more visible.

I edited a tooltip card for the view. In another version of the dashboard, I had a sheet with numbers written out that would change depending on which county was last brushed. I like the idea that this information can stay on. But I got confused in the configuration of the actions, and couldn’t completely prevent the filter applied to this sheet from being disabled sometimes, which caused the numbers for all counties to overlap, with an annoying downtime as that happened. So I made a tooltip instead. Anyway, it’s easier to format text like this. But the problem is that it can hide a good portion of the dashboard. So I exercised restraint and chose only the 15 or so variables I found most relevant.

Voilà! That’s it. I hope you like my dashboard, and I look forward to seeing the work of others! If you are a contestant, please leave a link to your entry in the comments. Good luck to all!!

 

Mortality data with Tableau Public

Last month I saw this infographic chart put together by GE and GOOD magazine:

While the look and feel is pleasing I was bothered by a few choices of design.

First, homicides and accidental deaths are not taken into account. I suspect that for some demographic categories, they represent a significant proportion of the deaths.

Second, the table doesn’t give an indication of the differences in mortality between the different age groups. For instance, there are over 15,000 deaths per 100,000 people over 85 years old, but only about 130 per 100,000 for young people aged 15-24. So the last item in the right-most column corresponds to many more deaths than the top item in the left-most column, although they have the same visual weight.

Coincidentally, I got to try Tableau Public Beta and thought it would be a good exercise to give it a spin.
The data source is the same. I got my data through the CDC’s WONDER service.
Here goes:

By playing with the filters you can see the ranking of the causes of death. For instance, we can see that accidents and homicide are precisely the leading causes of death of young people aged 20 to 24. Now what if you want to see the demographic categories that one given cause of death affects most? Here’s a second visualization:

You can see that certain causes of death only affect one gender or the other (such as certain forms of cancer).
I’ve made that last one to illustrate the evolution of mortality with age. No one would be surprised to learn that older people have a higher probability of dying, but by how much?

 

The Stiglitz-Sen-Fitoussi commission

Photo credit: Le Point. That’s Stiglitz on the right with the files, Sen is the one with the mischievous smile and blue shirt, and Fitoussi is the smoker.

Yesterday, I attended the presentation of the report of the Stiglitz-Sen-Fitoussi commission on the measurement of economic performance and social progress.

In a nutshell, since there is a consensus around the limitations of GDP as the main indicator of performance, the commission aims to find other ways to measure the efficiency of an economic policy.

The idea is to build indicators that would be closer to the experience of the citizen, rather than an abstract, expert top-view of a system. The report argues for a system of indicators of well-being, measures of leisure and culture, better environmental indicators and also better ways to understand inequalities, rather than a constant focus on averages and aggregates.

OECD, my employer, was quite involved in the report, as several members of the commission were OECD staff (or recently had been), and because there are several programmes at OECD with similar goals.

So the ideas in the report are not very new. But the real breakthrough is the degree of political support the commission is getting.

You’d probably think that governments would take heed of the musings of 5 Nobel laureates and 17 top-notch academics on their own merits. But the reason why GDP is such a popular indicator is not that it’s perfect, but that it’s relatively easy to compute. With a stress on “relatively”: it’s an incredibly complex endeavor on which hundreds of people work full time in every country. But at least it obeys very explicit rules, and as such it is comparable across nations. Any new indicators would be difficult (read: expensive) to set up, and to be effective they’d require a similar framework, which means a good number of countries following the same methodology to compile data.

That has been the stumbling block. But Sarkozy, who sponsored the report, stated very clearly that he’d put it on the agenda of every international meeting he attends, and demand that all international organizations put it into practice. Because of the crisis, and the global inability of statistical offices to prevent it, the timing may well be right to make that claim.