Better Life Index – a post mortem

Today, the OECD launched Your Better Life Index, a project I had been involved in for months.

The short

I’m happy with the launch, the traffic has been really good, and there was some great coverage.

The three main lessons are:

  • Just because data is presented in an interactive way doesn’t make it interesting. This project works because its design was well suited to it. That design may not have worked in other contexts, and other tools may not have worked in this one.
  • The critical skill here was design. There are many able developers, but the added value of this project was the ability of the team to invent a form which is unique, excellent and well-suited for the project.
  • Getting external specialists to develop the visualization was the critical decision. Not only did we save time and money, but the quality of the outcome is incomparable. What we ended up with was far beyond the reach of what we could have done ourselves.

The less short

Before going any further, I remind my kind readers that while the OECD pays my bills, the views here are my own.

Early history of the project

The OECD had been working on measuring progress for years, researching alternative ways to measure the economy. In 2008-2009 it got involved in the Stiglitz-Sen-Fitoussi commission, which came up with concrete recommendations on new indicators to develop. It was then that our communication managers started lobbying for a marketable OECD indicator, like the UN’s Human Development Index or the Corruption Perceptions Index from Transparency International.

The idea was to come up with some kind of Progress Index, which we could communicate once a year or something. Problem – this was exactly against the recommendations of the commission, which warned against an absolute, top-down ranking of countries.

Eventually, we came up with an idea. A ranking, yes, but not one definitive list established by experts. Rather, it would be a user’s index: each user would get their own index, tailored to their preferences.
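
To make the mechanics concrete, here is a minimal sketch of such a user-weighted index – not the actual implementation, just the idea, with made-up topic names and scores:

    def user_index(topic_scores, user_weights):
        # topic_scores: {topic: score on a 0-10 scale}; user_weights: {topic: weight >= 0}
        total = sum(user_weights.values())
        if total == 0:
            raise ValueError("at least one weight must be positive")
        return sum(topic_scores[t] * w for t, w in user_weights.items()) / total

    # Illustrative topics, scores and weights only – not real data.
    scores = {"housing": 6.2, "income": 4.8, "health": 7.5}
    weights = {"housing": 1, "income": 3, "health": 5}   # the user's preferences
    print(round(user_index(scores, weights), 2))         # 6.46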

Still, the idea of publishing such an index met some resistance; some countries did not like the idea of being ranked. But at some point in 2010 those reservations were overcome and the idea was generally accepted.

Our initial mission

It was then that my bosses asked a colleague and me to start working on what such a tool could look like, and we started from the data we had at hand. I’ll skip the details, but we first came up with something in Excel that was a big aggregation of many indicators. It was far from perfect (and further still from the final result), but it got the internal conversation going.

Meanwhile, our statistician colleagues were working on new models to represent inequality, and started collecting data for a book on a similar project, which will come out later this year (“How’s Life?”). It made sense to join forces: we would use their data and their models, and develop an interactive tool while they wrote their book, each project supporting the other.

From prototypes to outsourcing

It wasn’t clear then how the tool would be designed. Part of our job was to look at many similar attempts online. We also cobbled together some interactive prototypes: I made one in Processing, my colleague one in Flash. Those models were quite close to what we had seen, really. Mine was very textbook stuff: one single screen, linked bar charts. Quite basic, too.

I was convinced that in order to be marketable, our tool needed to be visually innovative. Different, yes, but not against the basic rules of infovis! No glossy 3D pies or that kind of stuff. Unique, but in a good way. There was also some pressure to use the infovis tools the OECD already had. We have one, for instance, which we had introduced for subnational statistics, where it was really good, and which we have since used in other contexts with mixed success. My opinion was that using that tool as is on this project would bury it.

That’s my first lesson here. Let me take a few steps back.

In 2005, the world of public statistics was shaken by the introduction of Gapminder. The way that tool presented statistics, and the huge success of the original TED talk – which attracted tens of millions of viewers – prompted all statistical offices to consider using data visualization, or rather, in our words, to produce “dynamic charts”, as if the mere fact that Gapminder was interactive was the essence of its success. The bulk of such initiatives was neither interesting nor successful. While interactivity opens new possibilities, it is a means and certainly not an end in itself. Parenthesis closed.

At this stage, the logical conclusion was that we needed a new tool developed from scratch, specifically suited to the project. Nothing less would give it the resonance that we intended. My colleague lobbied our bosses, who took it to their bosses, and all the way to the Secretary-General of the OECD. This went surprisingly well, and soon enough we were granted a generous envelope and tasked with finding the right talent to build the tool.

Selecting talent

So our job shifted from creating a tool and campaigning for the project to writing specifications that could be understood by external developers. We had to “unwrite” our internal notes describing our prototypes and rewrite them in a more abstract way, trying to describe functionality rather than how we thought we could implement it (e.g. “the user can select a country” rather than “and when they click on a country, the middle pane changes to bla bla bla”).

Being a governmental organization we also had to go through a formal call for tenders process, where we’d have a minimum number of bidders and an explicit decision process that could justify our choices.

This process was both very difficult and very interesting. Difficult because we had many very qualified applicants, and not only could we choose just one, but that choice had to be justified and vetted by our bosses, which would take time. And it was rewarding because each applicant took a different approach to the project and to the selection process. What heavily influenced the decision (nod to the 2nd lesson I outlined) was whether the developers showed the potential to create something visually unique. We found that many people could answer the functional part of what we had asked, but the outcome probably wouldn’t have matched the unspoken part of our specifications. We needed people who could take the project beyond technical considerations and imbue it with the creative spirit that would make it appealing to the widest audience.

Working with a developer

When we officially started to work with the selected developer – a joint effort by Moritz Stefaner and RauReif – some time had passed since we had introduced the project to them. When Moritz started presenting some visual research (which, by the way, has very little to do with the final site), I was really surprised by how different it was from what we had been working on. And that’s my 3rd lesson here.

We had become unable to start again from a blank sheet of paper and re-imagine the project from scratch. We were so conditioned by the other projects we had seen and by our past prototypes that we lacked that mental agility. That’s a predicament that just can’t affect an external team. Besides, even if we had had our developers’ degree of mastery in Flash or visual design (and we don’t), we still had our normal jobs to do, meetings to attend and all kinds of office contingencies, and we just couldn’t have been that productive. Even with equivalent talent in-house, it would still have been more effective to outsource.

What I found most interesting in our developers’ approach is that it de-emphasized the precision of the data. The scores of each country were not shown, nor the components of that score. That level of detail was initially hidden, which produced a nice, simple initial view. But added complexity could be revealed by selecting information, following links, etc. At any time, the information load would remain manageable.

Two things happened in a second phase. On one hand, Moritz had that brilliant idea of a flower. I instantly loved it, and so did the colleagues who had worked with me since the start of the project. But it was a very hard sell to our management, who would have liked something more traditional. Yet that flower form was exactly what we were after: visually unique, a nice match with the theme of the project, aesthetically pleasing, an interesting construction, many possibilities of variation… Looking back, it’s still not clear how we managed to impose an idea that almost every manager hated. The most surprising thing is that one month later, everybody treated it as self-evident.

On the other hand, the written part of the web site, which was initially an afterthought, really gained momentum and importance, both in terms of content and design. Eventually the web site would become half of the project. What’s interesting is that the project can cater to all depths of attention: it takes 10 seconds to create an index, 1 minute to play with various hypotheses and share the results on social networks, but one could spend 10 more minutes reading the page of a country or of a topic, and several hours checking the reference texts linked from these pages…

Closing thoughts

Fast forward to the launch. I just saw a note from Moritz saying that we got 60k unique visitors and 150k views. That’s about 12 hours after the site was launched (and, to be honest, it was down for a couple of those 12 hours, but things are fine now)!! Those numbers are very promising.

When we started on this project we had an ambition for the OECD. But beyond that, I hoped to convince our organization and others of the value of developing high-quality dataviz projects to support their messages. So I am really looking forward to seeing the similar projects that this one might inspire.

My Tableau contest entry

So here it is. I chose to compete on the Activity Rates and Healthy Living data set, because after downloading it I really enjoyed exploring it.

If the viz doesn’t show well in the blog, here’s a link to its page

My main reason for entering the contest is to be able to see what others have done. There are obviously many, many ways to tackle this and I am very much looking forward to seeing everyone’s work! My interactions with the Tableau community, especially through the forum, have always been very rewarding, and what better way to learn than by example!

So for the fellow contestants who will see my work, here is my train of thought for the dashboard.

The dataset

I’m aware of USDA’s Food Environment Atlas. It’s an application where people can see various food-related indicators on a map. The dataset we were handed is actually the background data of this. So there is already a place where people can consult food indicators.

Now, this being Tableau and all, I wanted to create an analytical dashboard where people could understand if and how the input variables affected the output variables.

The dataset consists mostly of input variables: various indicators that influence how healthy a local population is. That status (the output) is expressed through a few variables, such as adult and child obesity rates and adult diabetes rates. Those variables are highly correlated with each other, so in my work I chose to focus on the adult obesity rate, which is the simplest one.

Now, inputs. The rest of the variables fall in several categories:

  • income (median household income, poverty rates);
  • diet (consumption of various food items per capita);
  • shopping habits (for various types of stores or restaurants, the dataset would give their number and the money spent in each county, both in absolute numbers and per capita);
  • lifestyle information (data on households without cars and far from stores, on the physical activity level of the population, and the facilities offered by the state);
  • pricing variables (price ratios between some “healthy” food items and some less healthy, equivalent food items, for instance fruits vs. snacks; tax information on unhealthy food);
  • policy variables (measuring participation in various programmes such as SNAP or WIC);
  • socio-demographic variables (ethnic groups in the population, “metro” status of the county, whether the county was growing or shrinking, and voting preferences).

Yes, that’s a lot of variables (about 90, plus the county and state dimensions).

Oddly enough, there wasn’t a population measure in the dataset, and many indicators were available in absolute value only, so I constructed a proxy by dividing two variables on the same subject (something like “number of convenience stores” divided by “number of convenience stores per capita”).

That enabled me to build indicators per capita for all subjects, so I could see if they were correlated with my obesity rates.
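
The analysis itself was done in Tableau, but the arithmetic is simple enough to sketch in a few lines of pandas (the file and column names below are hypothetical):

    import pandas as pd

    df = pd.read_csv("food_atlas_counties.csv")   # hypothetical file and column names

    # Population proxy: an absolute count divided by its per-capita counterpart.
    df["population_proxy"] = df["convenience_stores"] / df["convenience_stores_per_capita"]

    # With that proxy, any absolute variable can be turned into a per-capita one...
    df["grocery_sales_per_capita"] = df["grocery_sales"] / df["population_proxy"]

    # ...and checked against the obesity rate.
    print(df[["grocery_sales_per_capita", "adult_obesity_rate"]].corr())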

Findings – using Tableau Desktop to make sense of the dataset

The indicators most correlated with obesity were the income ones, which came as no surprise. All income indicators were also highly correlated with each other. In the USA, poverty means having an income below a certain threshold defined at the federal level. But in other contexts, poverty is most often defined in relation to the median income (typically, a household is in poverty if its income is below half of the median income), so it can be used to measure the inequality of a community and the dispersion of incomes.

As a result, many indicators appear to be correlated with obesity because they are not independent of income. This is the case for instance for most of the policy indicators: if a programme has many recipients in a county, it is because poverty is widespread, so residents are more likely to be affected by obesity. This makes it difficult to measure the impact of the programmes with this dataset. This is also the case, unfortunately, for racial indicators, as most of the counties with a very high black population have a low income.

Diet indicators appear to be uncorrelated with obesity. This is counter-intuitive – isn’t eating vegetables or fresh farm produce the most certain way to prevent obesity? But one has to remember that this dataset is aggregated at the county level. Just because a county has a high level of, say, fruit consumption per capita doesn’t mean that every household is eating that much. Realistically, consumption will be very dispersed: the households where people cook, which are less likely to be affected by obesity, will buy all the fruit, and those where people don’t cook will simply buy none. Also, just because one buys more vegetables than average doesn’t mean they don’t also buy other, less recommended foodstuffs.

The only diet indicator that appears to be somewhat correlated with obesity is the consumption of soft drinks.

When it comes to lifestyle habits, surprisingly, the proportion of households without a car and living far from a store – people likely to walk more, so to be healthier – is positively correlated with obesity. This is because counties where this indicator is high are also poorer than average – again, income explains most of this. However, physical activity in general plays a positive role. States where people are most active, such as Colorado, enjoy the lowest obesity figures. In fact, all the counties with less than 15% obesity are in Colorado.

Finally, pricing didn’t seem to have much impact on either obesity or consumption. Why is that? Economists would call this “low price elasticity”, meaning that price changes do not encourage people to switch products and habits. But there is another explanation. Again, people who can’t cook are not going to buy green vegetables just because they are cheaper. Also, consider the tax amounts that are applied: no more than 7% in the most aggressive states. Compare that figure to the 400%+ levy applied to cigarettes in many countries of the world! Clearly, 4-7% is not strong enough to change habits. However, this money can be used to sponsor programmes that help people adopt safer behaviors.

What to show? Making the visualization

First, I wanted to show all of those findings. If 2 variables that you expect to be correlated (say, consumption of vegetables and obesity) are in fact not correlated, a point is made! But visually, nothing is less interesting than a scatterplot that doesn’t exhibit correlation. It’s just a stupid cloud of dots.

So instead I chose to focus on the correlations I could establish, namely obesity and income, and obesity and activity. Those are the two lower scatterplots of my dashboard. I chose the poverty rate measure because I’d rather have a trend line going up than going down.

I duplicated that finding with a bar chart made with median income bins. For each bin (which represents all the counties whose median income falls in that range), I plotted the average obesity rate, and, miracle! It comes up as a markedly decreasing bar chart. Now, this figure doesn’t establish correlation, let alone causality, but it certainly suggests it more efficiently than a scatterplot. Also, it can double as a navigation aid: clicking on a bar highlights or selects the relevant counties.
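
For the curious, the same binning is easy to reproduce outside Tableau; continuing the pandas sketch above, with the same hypothetical column names:

    import matplotlib.pyplot as plt
    import pandas as pd

    # Bin counties by median household income and average the obesity rate per bin
    # (column names are hypothetical, as in the earlier sketch).
    income_bins = pd.cut(df["median_household_income"], bins=10)
    obesity_by_income = df.groupby(income_bins)["adult_obesity_rate"].mean()

    obesity_by_income.plot(kind="bar")
    plt.ylabel("Average adult obesity rate (%)")
    plt.show()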

Finally, I decided to do a map. Well, actually, it was the first thing I had done, but I had second thoughts about it, and eventually I put it in. Why? First, to allow people to look up their county. Technically, my county is Travis County (Austin, TX) and I can find it easily on a map – less so if I have to look for county names listed in order of any of their indicators. I added a quick filter on county name, for those who’d rather type than look up.

I also wanted to see whether there was a link between geography and obesity. So try the following.

  • Where are the counties with obesity rates less than 15%? Colorado only.
  • If we raise the threshold a little, we get San Francisco and New York. But up to 20%, these counties remain very localized.
  • Likewise, virtually all counties above 35% are in the South – Alabama, Louisiana, Mississippi.

Population also matters. Counties with more than 1m people tend to have lower rates – their residents also usually have higher incomes.

I decided to zoom the map on the lower 48 by default. It is possible to zoom out to see Alaska and Hawaii, but I don’t think the benefit of seeing them at all times outweighs the inconvenience of a smaller view when they are not needed.

Regarding the marks: originally, I didn’t assign any variable to their size, but then I thought that the larger counties (e.g. Los Angeles, Harris (Houston), Cook (Chicago)…) were underrepresented. So I assigned my population proxy to size. But then the density of the marks competed with the intensity of the color, which was mapped to the obesity rate. So I removed that and chose a size at which marks wouldn’t overlap each other too much. Regarding color, I wasn’t happy with the default scale. Left as is, it would treat 12.5%, the minimum value of the dataset, as an extremely low number. But in absolute terms, it’s not: most developed countries have obesity rates lower than that value at the national level, and Japan or Korea are below 4%. So I made the scale start at 0. But I didn’t like the output either: the counties with the highest values didn’t stand out. Eventually, I chose a diverging scale, which made counties with high and low values more visible.
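
To illustrate the trade-off outside Tableau, here is a small matplotlib sketch of the two options – a sequential scale anchored at 0 versus a diverging scale centred on a middle value. It continues the earlier pandas sketch, and the longitude/latitude columns are hypothetical:

    import matplotlib.pyplot as plt
    import matplotlib.colors as mcolors

    rates = df["adult_obesity_rate"]          # column names are hypothetical

    # Sequential scale starting at 0 rather than at the dataset minimum.
    norm_seq = mcolors.Normalize(vmin=0, vmax=rates.max())
    # Diverging scale centred on the average rate, so both ends stand out.
    norm_div = mcolors.TwoSlopeNorm(vmin=0, vcenter=rates.mean(), vmax=rates.max())

    fig, axes = plt.subplots(1, 2, figsize=(10, 4))
    axes[0].scatter(df["longitude"], df["latitude"], c=rates, cmap="Reds", norm=norm_seq, s=8)
    axes[0].set_title("Sequential, anchored at 0")
    axes[1].scatter(df["longitude"], df["latitude"], c=rates, cmap="RdYlBu_r", norm=norm_div, s=8)
    axes[1].set_title("Diverging, centred on the mean")
    plt.show()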

I edited a tooltip card for the view. In another version of the dashboard, I had a sheet with numbers written out that would change depending on which county was last brushed. I like the idea that this information can stay on. But I got confused in the configuration of the actions, and couldn’t completely prevent the filter applied to this sheet from being disabled sometimes, which caused the numbers for all counties to overlap, with an annoying downtime as that happens. So I made a tooltip instead. Anyway, it’s easier to format text like this. The problem is that it can hide a good portion of the dashboard, so I exercised restraint and only kept the 15 or so variables I found most relevant.

Voilà! That’s it. I hope you like my dashboard, and I look forward to seeing the work of others! If you are a contestant, please leave a link to your entry in the comments. Good luck to all!!

Using Tableau Public: first thoughts

I am currently beta testing Tableau Public. Essentially, Tableau Public lets you bring the power of Tableau analysis online. With Tableau Public, your audience doesn’t need to download a workbook file to open in an offline software client – they can see and interact with your work directly on a web page.

There are quite a few examples of the things you can do with Tableau Public. These are the examples you are given when you start the product:

  • Tracking Economic Indicators by Freakalytics
  • A Tale of 100 Entrepreneurs by Christian Chabot
  • Bird strikes by airport by Crankyflier
  • Interactive Running Back Selector by CBS Sports

And there are always more on Tableau’s own blog. I’ve done quite a few which I’ll share progressively on this blog and on my OECD blog, http://www.oecd.blog/statistics/factblog.

So that’s the context. What’s the verdict?

1. There is no comparable data visualization platform out there.

There are many ways to communicate data visually. Count them: 1320, 2875… and many more.

However, these tools have a narrower focus than Tableau, or require some programming ability from the user. For instance, Many Eyes offers a number of visualization types which can be set up in seconds, but which cannot be customized. Conversely, Protovis is very flexible but requires some knowledge of JavaScript. And even for a skilled developer, coding an interactive data visualization from scratch takes time.

By contrast, Tableau is a fully-featured solution which doesn’t require programming. It has many representation types which can be deeply customized: every visual characteristic of a chart (colour, size, position, etc.) can depend on your data. Several charts can also be combined as one dashboard. On top of that, data visualization done in Tableau comes with many built-in controls, with an interface to highlight and filter data, or to get more details on demand. For dashboards, it is also possible to link charts, so that actions done on one chart (highlighting records, for instance) affect other charts.

2. The solution is not limitless.

Tableau enables you to do things which are not possible with other packages. But it doesn’t let you do just anything. That’s for your own good – it won’t allow you to do things that don’t make sense.

There are many safety nets in Tableau, which you may or may not run into. For instance, you can’t make a line chart for data which don’t have a temporal dimension – so much for parallel coordinates. However, the system is not fool-proof. Manipulating aggregates, for instance, can lead to errors that you wouldn’t have to worry about in plain old Excel, where the various steps through which data are computed to create a graph are more transparent (and more manual). Compared to Excel, you have to worry less about formatting – the default options for colours, fonts and positions are sterling – and be more vigilant about calculations.

3. Strength is in numbers.

Over the years, many of us grew frustrated with Excel’s visual capabilities. Others firmly believed that anything could be done with the venerable spreadsheet and have shown the world that nothing is impossible.

The same applies to Tableau. The vibrant Tableau community provides excellent advice. “Historic” Tableau users are not only proficient with the tool, but also have a better knowledge of data visualization practices than the average Excel user. Like any fully-featured product, Tableau has a learning curve, which means that there are experts (the proper in-house term is Jedis) who find hacks to make Tableau even more versatile. So of course, it is possible to do parallel coordinates with Tableau.

The forum, like the abundant training material available as videos, manuals, lists of tips, or online sessions with an instructor, doesn’t only help users solve their problems; it is also a fantastic source of inspiration.

With the introduction of Tableau Public, the forum will become even more helpful, as there will be more questions, more problems and more examples.

 

Plotter: a tool to create bitmap charts for the web

In the past couple of months, I have been busy maintaining a blog for OECD: Factblog.

The idea is to illustrate topics we work on with a chart that we’ll change regularly. In order to do that, I need to be able to create charts of publishable quality.

Excel screenshots: not a good option

There are quite a few tools to create charts on the net. Despite this, the de facto standard is still a screenshot of Excel, a solution which is even used by the most reputable blogs.


This is taken from http://theappleblog.com/2009/12/18/iphone-and-ipod-touch-see-international-surge/

But alas, Excel is not fit for web publishing. First, you have to rely on Excel’s choice of colours and fonts, which won’t necessarily agree with those of your website. Second, you can’t control key characteristics of your output, such as its dimensions. And if your chart has to be resized, it will get pixelated. Clearly, there is a better way to do this.

That's a detail of the chart on the link I showed above. The letters and the data bars are not as crisp as they could have been.

How about interactive charts?

Then again, the most sensible way to present a chart on the web is by making it interactive. And there is no shortage of tools for that. But there are just as many issues.
Some come from the content management system or blogging environment. Many CMSs don’t allow you to use JavaScript and/or Java and/or Flash, so you’ll have to use a technology which is tolerated by your system.

Most JavaScript charting solutions rely on the <CANVAS> element. Canvas is supported by most major browsers, with the exception of the Internet Explorer family. IE users still represent roughly 40% of the internet, and much more in the case of my OECD blog, so I can’t afford a non-IE-friendly solution. There is at least one library which works well with IE, RaphaelJS (it draws with SVG, falling back to VML for IE, rather than canvas).
Using Java causes two problems. First, the hiccup caused by the plug-in loading is enough to discourage some users. Second, it may not render well in feed readers:

This is how one of my posts reads in google reader.

And it’s futile to believe that all readers will read blogs from their home pages. So if feed readers can’t show a chart properly, that’s a show-stopper.

A tool to create good bitmap charts

So, in a variety of situations the good old bitmap image is still the most appropriate thing to post. That’s why I created my own tools with Processing.

  • plotter for Windows
  • plotter for Mac OS X
  • plotter for Linux

Here’s how it works.

When you unzip the files, you’ll find a file called “mychart.txt”, which is a set of parameters. Edit the file to your liking according to “instructions.txt”, then launch the tool (the plotter application). It will generate an image called “mychart.png”.

The zip files contain the source code, which is also found here, on my OpenProcessing account.

With my tools, I wanted to address two things. First, I wanted to be able to create a chart and have precise control over all of its components, especially the size. In Excel, by contrast, it’s difficult to control the size of the plotting area or the placement of the title – all of these things are done automatically and are difficult to correct (when it’s possible at all). Second, I wanted to be able to create functional thumbnails.

If you have to create smaller versions of a chart from a bigger image, the easiest solution is to resize the chart using an image editing software. But that’s what you’d get:

That's the original chart.

And that's the resized version. Legible? nah.

But what if it were just as easy to re-render the chart at a smaller size as to resize it with an external program? My tool can do that, too.
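
If you’d rather see the principle than my Processing code, here is a rough Python/matplotlib equivalent – not the plotter tool itself – of re-rendering at the target pixel size instead of downscaling a big bitmap, so text and bars stay crisp:

    import matplotlib.pyplot as plt

    def render_chart(values, labels, width_px, height_px, filename, dpi=100):
        # Size the figure in pixels so the saved file has exactly the requested dimensions.
        fig, ax = plt.subplots(figsize=(width_px / dpi, height_px / dpi), dpi=dpi)
        ax.bar(labels, values)
        ax.set_title("Example chart")
        fig.tight_layout()
        fig.savefig(filename, dpi=dpi)
        plt.close(fig)

    # Illustrative data; this is not the plotter tool, just the re-rendering idea.
    data, labels = [3.2, 4.1, 2.7], ["A", "B", "C"]
    render_chart(data, labels, 600, 400, "mychart.png")        # full-size version
    render_chart(data, labels, 150, 100, "mychart_thumb.png")  # re-rendered thumbnail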

Left: resized, right: re-rendered.

Here’s a gallery of various charts done with the tool. The tool supports line charts, bar charts (both stacked and clustered), dot charts and area charts. No pie charts included. It’s best suited for simple charts with few series and relatively few data points.

  • Impact of energy subsidies on CO2 emissions
  • Temperature and emission forecasts
  • Greenhouse gas emission projections

I hope you find it useful, tell me if you do and let me know if you find bugs.

Review of Tableau 5.0

These last two weeks, I finally found time to give Tableau 5.0 a proper try. Tableau enjoys a stellar reputation in the data visualization community. About a year ago, I saw a live demo of Tableau by CEO and salesman extraordinaire Christian Chabot. Like most of the audience, I was very impressed, not so much by the capabilities of the software as by the ease and speed with which insightful analysis seemed to appear out of bland data. But what does it feel like from the user’s perspective?

Chartz: ur doing it wrong

Everyone who has written about charts would pretty much agree that the very first step in making one is to decide what to show. The form of the display is a consequence of this choice.

Most software gets this wrong. It asks you how you want your display to look, then asks you for your data. Take this screenshot from Excel:


When you want to insert a chart, you must first choose what kind of chart (bar, line, column, pie, area, scatter, other charts) and one of its sub-types. You are not asked what data this applies to, or what that data really is. You are not asked what you are trying to show with your chart – this is something you have to manage outside of the software. You just choose a chart.

I’m picking on Excel because, with 200m users, everyone will know what I’m talking about, but virtually all software packages ask the user to choose a rather rigid chart type as a prerequisite to seeing anything, despite overwhelming theoretical evidence that this approach is flawed. In Excel, as in many other packages, there is a world of difference between a bar chart and a column chart. They are not of the same nature.

A reversed perspective

Fortunately, Tableau does it the other way round. When you first connect to your data in Tableau, it distinguishes two types of variables you can play with: dimensions and measures. And measures can be continuous or discrete.

(This is from an example file.)

Then, all you have to do is to drag your dimensions and your measures to the center space to see stuff happening. Let’s drag “close” to the rows…

We already see something, which is not terribly useful but still. Now if we drag Date into the columns…


Instant line chart! The software figured out that this is the type of representation that makes the most sense in this context. You’re trying to plot continuous variables over time, so it’s pretty much a textbook answer. Let’s suppose we want another display: we can click on the aptly named “Show Me!” button, and:


These are all the possible representations we have. Some are greyed out, because they don’t make sense in this context. For instance, you need to have dimensions with geographic attributes to plot things on a map (bottom left). But if you mouse over one of those greyed out icons, you’ll be told why you can’t use them. So we could choose anything: a table, a bar chart, etc.

A simple thing to do would be to switch rows and columns. What if we wanted to see date vertically and the close horizontally? Just drag and drop, and:


Crafting displays

Gone are the frontiers between artificial “chart types”. We’re no longer forcing data into preset representations; rather, we assign variables (or their automatic aggregations, more on that shortly) to possible attributes of the graph. Rows and columns are two of them, and shouldn’t be taken too literally – in most displays, they would be better described as abscissa and ordinate – but all the areas in light grey (called “shelves”) can receive variables: pages, filters, path, text, color, size, level of detail, etc.


Here’s an example with a more complex dataset. Here, we’re looking at sales figures, plotting profit against sales. The size of the marks corresponds to the volume of the order, and the colour to its category. Results are presented year by year, and it is possible to loop through the years. So this display replicates the specs of the popular Trendalyzer / motion chart tool, only it is simpler to set up.

Note that as I drag variables to shelves, Tableau often uses an aggregation that it thinks makes more sense. For instance, as I dragged Order Date to the page shelf, Tableau picked the year part of the date. I could ask the program to use every value of the date; the display would be almost empty, but there would be a screen for each day. Likewise, when I dragged Order Quantity to the Size shelf, Tableau chose to use the sum of Order Quantity instead. Not that it makes much of a difference here, as each bubble represents only one order. But the idea is that Tableau will automatically aggregate data in a way that makes sense to display, and that this can always be overridden.

But if I keep the data for all the years in the display, I can quickly see the transactions where profit was negative.

And I can investigate this set of values further.

So that’s the whole idea. Because you can assign any variable to any attribute of the visualization, in the Tableau example gallery you can see some very unusual examples of displays.

Using my own data

When I saw the demos, I was a little skeptical of the data being used. I mean, things were going so smoothly, evidence seemed to be jumping at the analyst, begging to be noticed. Tableau’s not bad at connecting with data of all forms and shapes, so I gave it a whirl with my own data.

Like many other official data providers, the OECD’s format of choice for exporting data is SDMX, a flavor of XML. Unfortunately, Tableau can’t read that, so the next easiest thing for me was Excel.

I’m not going to go into too much detail, but coming up with a worksheet that Tableau liked, with more than a few tidbits of data, required some tweaking and some guessing. The best recipe seems to be: a column for each variable, dimensions and dates included, and don’t include missing data (which we usually represent by “..” or another similar symbol).

Some variables weren’t automatically recognized for what they were: some were detected as dimensions when they were measures, and date data wasn’t processed that well (I found that using 01/01/2009 instead of 2009 or 1/2009 worked much better). But again, that was nothing that a little bit of tweaking couldn’t overcome.
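
This kind of clean-up can also be scripted before Tableau ever sees the file. Here is a rough pandas sketch of the steps described above (the file and column names are hypothetical):

    import pandas as pd

    raw = pd.read_excel("oecd_extract.xls")      # hypothetical file and column names

    # ".." marks missing data in our exports; drop those rows instead of keeping them as text.
    raw = raw.replace("..", pd.NA)
    raw["value"] = pd.to_numeric(raw["value"], errors="coerce")
    raw = raw.dropna(subset=["value"])

    # Full dates ("01/01/2009") were recognized more reliably than bare years ("2009").
    raw["date"] = pd.to_datetime(raw["year"].astype(str) + "-01-01")

    raw.to_csv("tableau_ready.csv", index=False)  # one column per variable, ready for Tableau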

On a few occasions, I scratched my head quite hard trying to understand why I could get year-on-year growth rates for some variables but not for others, or how to make custom calculated fields. Note that there are plenty of online training videos on the website. I found myself climbing the learning curve very fast (and have heard similar statements from recent users who quickly felt empowered), but I am aware that practice is needed to become a Tableau Jedi. What I found reassuring is that without prior knowledge of the product, but with exposure to data design best practices, almost everything in Tableau seems logical and simple.

But anyway – I was in. Here’s my first Tableau dashboard:

A dashboard is a combination of several displays (sheets) in one space. And believe me, it can become really sophisticated, but here let’s keep it simple. The top half is a map of the world with bubbles sized by the 2007 population of OECD countries. The bottom half is the same information as a bar chart, with a twist: the colour corresponds to the population change over the last 10 years. So the USA (green) has been gaining population while Hungary has seen its numbers decrease.

I’ve created an action called “highlighting on country” to link both displays. The best feature of these actions is that they are completely optional: if you don’t want linked displays, that is entirely up to you, and each part of the dashboard can behave independently. You can also add controls to filter or animate the data, which I left out for the sake of simplicity. However, you can still select data points directly to highlight them in both displays, like this:

Here I’ve highlighted the top 5 countries. The other ones are muted in both displays. My colour choice is unfortunate because Japan and Germany, which are selected, don’t look too different from the other countries. Now I can select the values for the countries of Europe:


And you’ll see them highlighted in the bottom pane.

Display and style

Representing data in Tableau feels like flipping the pages of a Stephen Few book, which is more than coincidental as he is an advisor to Tableau. From my discussion with the Tableau consultant who called me, I gather that Tableau takes pride in its sober look and feel, which fervently follows the recommendations of Tufte and Few. I remember a few posts from Stephen’s blog where he lashed out at business intelligence vendors for their vacuous pursuit of glossiness over clarity and usefulness. Speaking of Few, I complemented my Tableau trial by re-reading his previous book, Information Dashboard Design, and I could really see where his philosophy and that of Tableau clicked.

So there isn’t anything glossy about Tableau. Yet the interface is state-of-the-art (no more, no less). Anyone who has used a PC in the past 10 years can use it without much guessing. The colours of the various screen elements are carefully chosen and command placement makes sense. Most commands are accessible in contextual menus, so you really feel that you are directly manipulating data the whole time.

When attempting to create sophisticated dashboards, I found it difficult to make many elements fit on one page, as the white space surrounding all elements becomes incompressible. I tried to replicate displays that I had made or seen around, and I was often successful (see the motion chart reproduction above), but sometimes I couldn’t reach in Tableau the level of customization I get with visualizations coded from scratch. Then again, even Tableau’s simplest representations have many features and would be difficult to re-code.

Sharing data

According to Dan Jewett, VP of product development at Tableau,

“Today it is easier to put videos on the Web than to put data online.”

But my job is precisely to communicate data, so I’m quite looking forward to this state of affairs changing. Tableau’s answer is twofold.

The first half is Tableau Server. Tableau Server is software that organizes Tableau workbooks for a community so they can be accessed online, from a browser. My feeling is that Tableau Server is designed to distribute dashboards within an organization, less so to anyone on the internet.

That’s where the second part of the answer, Tableau Public, comes into play. Tableau Public is still in closed beta, but the principle is that users would have a free desktop application which can do everything that Tableau Desktop does, except saving files locally. Instead, workbooks would have to be published on Tableau’s servers for the world to see.

There are already quite a few dashboards made by Tableau Public first users around. See for instance How Long Does It Take To Build A Technology Empire? on one of the WSJ blogs.

Today, there is no shortage of tools that let users embed data online without technical manipulations. But as of today, there is no product that comes close to this embedded dashboard. Stephen McDaniel from Freakalytics notes that due to Tableau’s technical choices (JavaScript instead of Flash), dashboards from Tableau Public can be seen on a variety of devices, including the iPhone.

I’ve made a few dashboards that I’d be happy to share with the world through Tableau Public.

This wraps up my Tableau review. I can see why the product has such an enthusiastic fan base. People such as Jorge Camoes, Stephen Few, Robert Kosara, Garr Reynolds, Nathan Yau, and even the Federal CIO Vivek Kundra have all professed their love for the product. The Tableau Customer Conference, which I’ve only been able to follow online so far, seems to get more interesting each year. Beyond testimonies, the gallery of examples (again at http://www.tableausoftware.com/learning/examples, but do explore from there to see videos and white papers), still in the making, shows the incredible potential of the software.

New data services 1: Google’s public data

Google’s public data service was launched somewhat unexpectedly at the end of April 2009.

The principle is as follows. When someone enters a search query that could be interpreted as a time series, Google displays a line graph of this time series before other results. Click on it, and you can do some more things with the chart.


The name public data can seem ambiguous.

Public, in one sense, refers to official, government-produced statistics. But, for content, public is also the opposite of copyrighted. And here, a little bit of digging reveals that it’s clearly the latter sense. If you want this service to point to your data, it must be copyright-free.

I’ve seen Hans Rosling (of Gapminder fame, now Google’s data guru) deliver a few speeches to national statisticians in which he expressed all the difficulties he had accessing their data and battling with formatting or copyright issues. So I can understand where this is coming from. However. Imagine the outcry if google.com decided to stop indexing websites which were not in the public domain!

Remember my find > access > process > present > share diagram?

I’d expect Google to solve the find problem. After all, they’re search people. But they don’t! You’ll only find a time series if you enter its exact name in Google. There is no such thing (yet – I imagine it would be easy to fix) as a list of their datasets.

They don’t tackle the access problem either. Once you see the visualizations, you’re no closer to actually getting the data. You can see it, point by point, by mousing over the chart. I was also disappointed by the imprecision of the dataset citations. I’d have imagined they’d provide a direct link to their sources, but they only state which agency produced the dataset. And finding a dataset from an agency is not a trivial matter.

They don’t deal with process, but who will hold that against them? What they do offer is a very nice, very crisp representation of the data (presenting data). I was impressed by how legible the interface remains with many data series on the screen, while respecting Google’s look and feel and colour code.

Finally, it is also possible to share charts. Or rather, you can get a link to an image generated by Google’s chart API, which is more than decent. A link to this static image, a link to the chart on Google’s public data service, and that’s all you should need (except, obviously, a link to the data proper!)

Another issue comes from the selection of the data sets proper.

One of the datasets is unemployment rates, which are available monthly and by US county. Now I can understand the rationale for matching a Google query of “unemployment rates” to that specific dataset. But there are really many unemployment rates, depending on what you divide by what. (Are we counting unemployed people? Unemployed jobseekers? Which definition of unemployment are we using – the ILO’s, or the BLS’s? And against what is the rate calculated – total population? Population of working age? Total labour force?) And how could that work if you expanded the system to another country? To obtain the same level of granularity (a very narrow geographic location, a period of a month) would require some serious cooking of the data, so you can’t have granularity, comparability and accuracy all at once.
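
To make the denominator point concrete, here is a toy calculation with made-up numbers – the same count of unemployed people gives very different “rates” depending on what you divide it by:

    # Made-up numbers, for illustration only.
    unemployed = 5_000
    labour_force = 60_000             # employed + unemployed
    working_age_population = 80_000
    total_population = 120_000

    print(unemployed / labour_force)            # ~8.3%, the usual headline rate
    print(unemployed / working_age_population)  # ~6.3%
    print(unemployed / total_population)        # ~4.2%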

I don’t think the system is sustainable. I don’t like that it gives people the impression that economic statistics can be measured in real time at any level, just like web usage statistics, for instance. They can’t just be observed; they’re calculated by people.

Google public data is still in its infancy. A usable list of the datasets, for instance, would address much of my negative commentary on the system. But for the time being, I’m not happy with the orientation they’ve chosen.

Processing 1.0 released.

Tuesday, Ben Fry announced on his blog that Processing 1.0 was released!


I had a vague idea of what Processing was about for a while, and thought it was not for me. I bought Ben Fry’s book by accident, because of its title and because I was generally interested in different opinions on how to make charts. When I first read it, I thought it was a fraud! That book was not about “visualizing data” as the title implies, but about practical uses of Processing to display data. Luckily for me, I read on, and I am very happy to have learned Processing, or rather, to be continuously learning it.

What I have found most interesting in this book is the focus that Fry put on the collection of data. Techniques and methods to represent data assume that you already have the data you want, neatly prepared. But in the real world, that just doesn’t happen. If you want to show something, you’ll have to find a way to obtain the raw data and process it first, before you can use it.

Anyway. Today, I want to congratulate Ben Fry, Casey Reas and all the others who’ve contributed to Processing.

Download Processing:

Other books on Processing I can recommend are: