Making data meaningful – Style guide on the presentation of statistics

Making Data Meaningful part 2
Introducing Making Data Meaningful Part 2 – Style guide on the presentation of statistics – which, as its name cleverly suggests, is a compilation of advice on presenting information graphically.

It’s a follow-up to Making Data Meaningful part 1, which focused on writing about data, as opposed to visualizing it.

The book is a collaboration between representatives of national statistical offices and intergovernmental organizations – all public statisticians, if you will. I hope it will help others to communicate their data better. Personally, I wrote the part about charts and contributed to some other chapters. But if I could sum up my advice in one sentence, it would be: go buy Stephen Few’s books. Start with Show me the numbers.

The list of people who contributed to the book includes:

More on Tableau Public

Yesterday’s post on Tableau Public generated a surge of traffic so I thought I should add more examples and practical information for people interested in the software.

 

Here’s a quick one on health, based on OECD Health at a Glance:

Just select two indicators, and you see how one influences the other. Or rather, how the two are correlated, because correlation doesn’t imply causation!
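For readers who want to put a number on "correlated", here is a minimal sketch of the Pearson correlation coefficient in Python. It is not what Tableau computes behind the view, and the two indicator series are made-up placeholder values, not OECD figures.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of the same length (at least 2 points)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-deviations and sums of squared deviations around the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / sqrt(var_x * var_y)


# Hypothetical values for two indicators across five countries (illustration only)
indicator_a = [2.1, 3.4, 3.9, 5.0, 6.2]
indicator_b = [71.0, 74.5, 75.1, 78.3, 79.9]

print(f"r = {pearson(indicator_a, indicator_b):.2f}")
```

A coefficient close to 1 or -1 only says the two indicators move together; it says nothing about which one, if any, drives the other.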

Here are links to more examples done with Tableau Public.

Another Paris-based intergovernmental organisation is using Tableau – UNESCO.

These 2 have been done by PAHO to describe the situation in Haiti (the 2nd is really powered by Tableau Server, but it’s close enough)

 

There are further examples on the Tableau blog.

Now more about Tableau Public and the Beta.

Tableau Public doesn’t exactly allow you to do everything that Tableau does from the web. To prepare the views which are going to be published on the web, you need to use software that runs on your computer. It lets you do whatever you can do with the regular Tableau Desktop, with a couple of limitations: you have to stick to basic source file types (Access, Excel, and text files, no exotic database) and you are limited to 100,000 records of data. One other difference with the regular Tableau Desktop is that you can’t save your work locally: you have to save it on the web, in your private space on Tableau servers. However, Tableau Public has the same analytical and visual features as Tableau Desktop.

When your work is published, users don’t have access to all the tools you had when creating the view: they can’t move dimensions around, create exotic filters or calculations. They really see the chart as you intended it to be seen. There are a certain number of interactions built-in, however: users can select, highlight, sort and filter. If you are publishing a dashboard, the different tables and charts of the dashboard can be linked, meaning that an action (such as highlighting one dimension) in one place will be replicated elsewhere, or not. The underlying data can also be downloaded. So there is a great deal of interactivity, but not enough to twist your display beyond recognition. That being said, other Tableau Public users can download your workbook and manipulate it with the client software.

About the Beta: currently, Tableau Public is in closed beta. It will be in open beta in February, as far as I know. To get a spot in the closed beta, you need to write to the people at Tableau.

 

Using Tableau Public: first thoughts

I am currently beta testing Tableau Public. Essentially, Tableau Public lets you bring the power of Tableau analysis online. With Tableau Public, your audience doesn’t need to download a workbook file and open it in an offline software client – they can see and interact with your work directly on a web page.

There are quite a few examples of the things you can do with Tableau public. These are the examples you are given when you start the product:

Tracking Economic Indicators by Freakalytics
A Tale of 100 Entrepreneurs by Christian Chabot
Bird strikes by airport by Crankyflier
Interactive Running Back Selector by CBS sports

And there are always more on Tableau’s own blog. I’ve done quite a few which I’ll share progressively on this blog and on my OECD blog, http://www.oecd.blog/statistics/factblog.

So that’s the context. What’s the verdict?

1. There is no comparable data visualization platform out there.

There are many ways to communicate data visually. Count them: 1320, 2875… and many more.

However, these tools have a narrower focus than Tableau, or require some programming ability from the user. For instance, Many Eyes offers a certain number of types of data visualization which can be set up in seconds, but which cannot be customized. Conversely, Protovis is very flexible but requires some knowledge of Javascript. And even for a skilled developer, coding an interactive data visualization from scratch takes time.

By contrast, Tableau is a fully-featured solution which doesn’t require programming. It has many representation types which can be deeply customized: every visual characteristic of a chart (colour, size, position, etc.) can depend on your data. Several charts can also be combined as one dashboard. On top of that, data visualization done in Tableau comes with many built-in controls, with an interface to highlight and filter data, or to get more details on demand. For dashboards, it is also possible to link charts, so that actions done on one chart (highlighting records, for instance) affect other charts.

2. The solution is not limitless.

Tableau enables you to do things which are not possible using other packages. But it doesn’t allow you to do just anything. That’s for your own good – it won’t allow you to do things that don’t make sense.

There are many safety nets in Tableau, which you may or may not run into. For instance, you can’t make a line chart for data which don’t have a temporal dimension – so much for parallel coordinates. However, the system is not fool-proof. Manipulating aggregates, for instance, can lead to errors that you wouldn’t have to worry about in plain old Excel, where the various steps through which data are computed to create a graph are more transparent (and more manual). Compared to Excel, you have to worry less about formatting – the default options for colours, fonts and positions are sterling – and be more vigilant about calculations.

3. Strength is in numbers.

Over the years, many of us grew frustrated with Excel’s visual capabilities. Others firmly believed that anything could be done with the venerable spreadsheet and have shown the world that nothing is impossible.

The same applies to Tableau. The vibrant Tableau community provides excellent advice. “Historic” Tableau users are not only proficient with the tool, but also have a better knowledge of data visualization practices than the average Excel user. As with any fully-featured product, there is a learning curve to Tableau, which means that there are experts (the proper in-house term is Jedis) who find hacks to make Tableau even more versatile. So of course, it is possible to do parallel coordinates with Tableau.

The forum, like the abundant training material (available as videos, manuals, lists of tips, or online sessions with an instructor), doesn’t only help users solve their problems; it is also a fantastic source of inspiration.

With the introduction of Tableau Public, the forum will become even more helpful, as there will be more questions, more problems and more examples.

 

Health statistics

In the last days of 2009, this chart was published by the National Geographic blog:
the cost of care

The chart has since been debated and criticized by, among others, Jon Peltier, Andrew Gelman, and Evan Falchuk, who all made valid points. For instance, to show correlation and outliers, a scatterplot does a much better job. That being said, it’s difficult to see the country names with a scatterplot. On the substance, the number of doctor visits is not the most relevant variable to bring into this picture, mostly because this number directly depends on the compensation mode of these doctors, not on their efficiency. The notion of “universal coverage” is also quite arbitrary. France, for instance, which has had what could be called universal coverage since 1945, got an even more “universal” one in 2000. And still, some people can’t receive the healthcare they need.

The chart is based on OECD data, from a recently released book: OECD Health at a Glance. For the release of the book, I had worked on 2 presentations, which remained unpublished. Since they were not formally published by OECD, the standard disclaimer applies – they do not commit the organization and do not necessarily represent its point of view or that of its members.

Anyway, for anyone interested in health statistics in general and in USA healthcare specifically, here they are in their slideshare glory:

Mortality data with Tableau Public

Last month I saw this infographic chart put together by GE and GOOD magazine:

While the look and feel is pleasing I was bothered by a few choices of design.

First, homicides and accidental deaths are not taken into account. I suspect that for some demographic categories, they represent a significant proportion of the deaths.

Second, the table doesn’t give an indication of the differences in mortality between the different age groups. For instance, there are over 15,000 deaths per 100,000 people over 85 years old, but only about 130 per 100,000 for young people aged 15-24. So the last item in the right-most column corresponds to many more deaths than the top item in the left-most column, although they have the same visual weight.
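To make the gap concrete, here is a quick back-of-the-envelope calculation in Python using only the two rounded rates quoted above (the variable names are mine).

```python
# Approximate death rates per 100,000 people, as quoted in the text
deaths_per_100k = {
    "85 and over": 15_000,
    "15-24": 130,
}

# How many times higher mortality is for the oldest group than for the youngest
ratio = deaths_per_100k["85 and over"] / deaths_per_100k["15-24"]

print(f"Mortality is roughly {ratio:.0f} times higher for people over 85")
```

Two cells drawn at the same size therefore stand for quantities that differ by about two orders of magnitude.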

Coincidentally, I got to try Tableau Public Beta and thought it would be a good exercise to give it a spin.
The data source is the same. I got my data through the WONDER service of the CDC.
Here goes:

By playing with the filters you can see the ranking of the causes of death. For instance, we can see that accidents and homicide are precisely the leading causes of death of young people aged 20 to 24. Now what if you want to see the demographic categories that one given cause of death affects most? Here’s a second visualization:

You can see that certain causes of death, for instance, only affect one gender or the other (such as certain forms of cancer).
I’ve made that last one to illustrate the evolution of mortality with age. No one would be surprised to learn that older people have a higher probability of dying, but by what proportions?

Plotter: a tool to create bitmap charts for the web

In the past couple of months, I have been busy maintaining a blog for OECD: Factblog.

The idea is to illustrate topics on which we work by a chart which we’ll change regularly. So in order to do that, I’d have to be able to create charts of publishable quality.

Excel screenshots: not a good option

There are quite a few tools to create charts on the net. Despite this, the de facto standard is still a screenshot of Excel, a solution which is even used by the most reputable blogs.

excelinblog

This is taken from http://theappleblog.com/2009/12/18/iphone-and-ipod-touch-see-international-surge/

But alas, Excel is not fit for web publishing. First, you have to rely on Excel’s choice of colours and fonts, which won’t necessarily match those of your website. Second, you can’t control key characteristics of your output, such as its dimensions. And if your chart has to be resized, it will get pixelated. Clearly, there is a better way to do this.

That's a detail of the chart on the link I showed above. The letters and the data bars are not as crisp as they could have been.

How about interactive charts?

Then again, the most sensible way to present a chart on the web is by making it interactive. And there is no shortage of tools for that. But there are just as many issues.
Some come from the content management system or blogging environment. Many CMS don’t allow you to use javascript and/or java and/or flash. So you’ll have to use a technology which is tolerated by your system.

Most javascript charting solutions rely on the <CANVAS> element.  Canvas is supported by most major browsers, with the exception of the Internet Explorer family. IE users still represent roughly 40% of the internet, but much more in the case of my OECD blog, so I can’t afford to use a non-IE friendly solution. There is at least one library which works well with IE, RaphaelJS.
Using Java causes two problems. First, the hiccup caused by the plug-in loading is enough to discourage some users. Second, it may not be handled well by feed readers:

This is how one of my posts reads in google reader.

And it’s futile to believe that readers will read blogs from their home pages. So if not all feed readers can display it well, it’s a show-stopper.

A tool to create good bitmap charts

So, in a variety of situations the good old bitmap image is still the most appropriate thing to post. That’s why I created my own tools with Processing.

plotter windows

plotter mac OS X

plotter linux

Here’s how it works.

When you unzip the files, you have a file called “mychart.txt” which is a set of parameters. Edit the file to your liking, following the instructions in “instructions.txt”, then launch the tool (plotter application). It will generate an image called “mychart.png”.

The zip files contain the source code, which is also found here on my openprocessing account.

With my tools, I wanted to address two things. First, I wanted to be able to create a chart and to have precise control of all of its components, especially the size. In Excel, by contrast, it’s difficult to control the size of the plotting area, or the placement of the title – all of these things are done automatically and are difficult to correct (when it’s possible at all). Second, I wanted to be able to create functional thumbnails.

If you have to create smaller versions of a chart from a bigger image, the easiest solution is to resize the chart using image editing software. But this is what you’d get:

That's the original chart.

And that's the resized version. Legible? nah.

But what if it were just as easy to re-render the chart in a smaller size, than to resize it with an external program? My tool can do that, too.

Left: resized, right: re-rendered.
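The difference between the two approaches can be sketched in a few lines of Python. This is not the Processing source of the tool, only an illustration of the idea: the function names, the 11-pixel label size and the padding are assumptions of mine. Scaling a finished bitmap shrinks the text along with everything else, while re-rendering recomputes the geometry for the target size and keeps the labels at a legible size.

```python
def resized_font_px(font_px, original_width, target_width):
    """Label size after scaling a finished bitmap: the text shrinks with the image."""
    return font_px * target_width / original_width


def rerendered_layout(values, width, height, font_px=11, padding=4):
    """Lay out a simple bar chart directly at the target size.

    The label font keeps its size; only the plotting area adapts.
    Returns the font size, the plot height and one (x, y, w, h) rectangle per bar.
    """
    label_band = font_px + 2 * padding      # space reserved for axis labels
    plot_height = height - label_band
    if not values or plot_height <= 0:
        raise ValueError("target size too small for a legible chart")
    slot = width / len(values)              # horizontal space per bar
    bar_width = slot * 0.8                  # leave a gap between bars
    peak = max(values)
    bars = []
    for i, value in enumerate(values):
        bar_height = plot_height * value / peak if peak > 0 else 0
        x = i * slot + (slot - bar_width) / 2
        y = plot_height - bar_height        # y axis points down in bitmaps
        bars.append((round(x), round(y), round(bar_width), round(bar_height)))
    return {"font_px": font_px, "plot_height": plot_height, "bars": bars}


# A 600-pixel-wide chart shrunk to a 150-pixel thumbnail
print(resized_font_px(11, 600, 150))                    # labels scaled with the image
print(rerendered_layout([1, 2, 4], 150, 100)["font_px"])  # labels kept at full size
```

With a resize, an 11-pixel label ends up under 3 pixels tall, which is why the thumbnail on the left is unreadable.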

Here’s a gallery of various charts done with the tool. The tool supports: line charts, bar charts (both stacked and clustered), dots charts and area charts. No pie charts included. It’s best suited for simple charts with few series and relatively few data points.

Impact of energy subsidies on CO2 emissions

Temperature and emission forecasts

Greenhouse gas emission projections

I hope you find it useful, tell me if you do and let me know if you find bugs.

Changing the world with visualization: slides from the panel.

I’ve put my slides from the panel on slideshare:

Changing the world with data visualization

This Wednesday, I had the privilege of talking at Visweek at a panel with Robert Kosara, Sarah Cohen and Martin Wattenberg. That was a truly great experience (at least from that side of the microphone). We all had a different approach to the subject. Sarah showed some of the stories she ran on the Washington Post where showing data visually helped expose scandals and move things forward. Martin made insightful comparisons with writing – information verbalization. As for myself I elaborated on the OECD mantra that if people had better knowledge, they could make better decisions and that data visualization can help by providing the people that knowledge, without requiring them to actually know the data.

But as with panels, the most interesting part is always the discussion. And I was quite surprised to see where it was headed.

I have reservations about the belief that data visualization can save the world. For instance, I have been slightly disappointed by the outcome of the Sunlight Foundation Apps for America contest. I thought the idea was fantastic and the finalist applications were very well designed, but not necessarily useful. But since I had read many positive reactions on blogs to this, or to anything related to data.gov, I thought I would be the skeptical one.

But during the panel, during the discussions and in the subsequent days, I really found myself in the opposite role. I think data visualization can achieve much more than what we ask it to do!

Let’s put it this way. Currently there are approximately 1.7 billion internet users. That’s the order of magnitude of the number of people that data visualization could help. Now before the panel, we had a talk about the number of visits that a successful data representation gets, and we agreed that 100,000 viewers for one visualization is a lot. In other words, we still have more than 99.99% of the population to reach!
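The arithmetic behind that percentage, as a small Python check using the two round figures above:

```python
internet_users = 1_700_000_000   # approximate number of internet users quoted above
viewers = 100_000                # audience of a very successful visualization

share_reached = viewers / internet_users
share_unreached = 1 - share_reached

print(f"reached:   {share_reached:.4%}")
print(f"unreached: {share_unreached:.4%}")
```

Even a hit visualization reaches only about 0.006% of internet users.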

True, we can use data visualization to inform better. But we can do more: use it to support decisions! Couldn’t the subprime crisis have been avoided, for instance, if households had been helped to make the right ones?

Raising the level of adoption of data visualization – not increasing it, but multiplying it – should really be a challenge of the field. However, academics seem to be more concerned with designing novel solutions which could turn into published papers. Then again, if public interest for data visualization was higher, funding would be more easily available to researchers.

As an aside, Excel has also been discussed. Is it the problem? Partly. If a data representation is not a canonical chart type in Excel, people are not aware it exists, and mainstream media or others with a long reach will not use it for fear that potential users may be confused. Even scatterplots, to Martin’s lament, generate that aura of fear, although they are in Excel and are pretty straightforward to use and understand.

Another comment which I really took to heart was the regret that while data visualization is taught to computer scientists, data analytics isn’t taught in business schools. Wouldn’t that be part of the solution?

Testing Microsoft Office 2010

If you are using computers for work, chances are that you are spending a good portion of your day with Microsoft products such as the Office suite. Some hate it, some love it, but to hundreds of millions it’s part of our daily lives and its design choices affect how we think and work in a much more profound way than we are aware of. So, the release of a new version of Office is always a significant event.

I’ve just installed Office 2010 and here are my first impressions.

The UI is rationalized.

excel 2010

The interface will be familiar to Office 2007 users – they are still using the ribbon. Only a few buttons have been added to the applications I’ve tested, and the others have fortunately not moved since the previous version. However, the ribbon’s colours have been muted to a conservative white to grey gradient, which is much easier on the eyes. The added benefit is to make highlighted sections of the ribbon stand out much more efficiently.

excel highlight

Highlighting a section works much better against a sober gray than against a vivid blue.

The one button that changed was the top-left Office button. Frankly, what it was for was obvious to no-one in Office 2007. Due to its appearance, it wasn’t really clear that it was clickable, and the commands it gave access to were a mixed bunch – file control, program options, printing, document properties… which, before, were not in the same top-category.

This new area is called "Office backstage" and is a welcome change from the awkward "file" menu or office button of previous versions.

In Office 2010, the Office button is still there, but this time, it looks like a button and is much more inviting. This time, it presents the user with the various commands on a separate screen. That way, commands are well-categorized, and there is ample space for UI designers to explain those commands which are not clear. This had not been possible when all those commands were forced to fit in one tiny menu.

Another thing that jumped out at me when I started manipulating the programs was the improvement in the copy/paste interface. It’s fair to say that pasting has always been a very time-consuming task. It had never been easy, for instance, to paste values only or to keep source formatting, without having to open menus and choose options which require time and effort. Besides, some pasting option descriptions are cryptic and a bit daunting, so novice users aren’t encouraged to use them for fear of what might happen.

I've been using Excel for about 15 years so I know my way around. But improvement in the paste interface directly translates into productivity gains.

Now the various pasting options are promoted within the contextual menu – they are big icons, and it is possible to preview how pasted material would look before pasting. The best part is that these commands are now accessible via native keyboard shortcuts, so we no longer need a string of 4 mouse clicks, or to key in alt+E, V, S, enter or alt+H, V, S, V, enter in sequence. After a normal paste (ctrl+V) you can hold control and choose a one-key option, such as V for values, T for transposing, etc. Much better.

Changes in the Excel chart engine

There are 3 ways in Excel to represent numbers graphically: charts proper, pivot charts and sparklines.

Charts and pivot charts haven’t seen much improvement since the previous version of Excel. The formatting options move along in the direction initiated by Excel 2007: in addition to the controversial 3-D format set of options, users now have advanced “shadow” and “glow and soft edges” submenus to spice up their charts. The interface for designing gradient fills has been upgraded: the underlying functionality remains unchanged but it is now easier to control. On the plus side, the pattern fill option returns, which is great news for people who print their graphs in B&W.

Even more complex formatting options mean a greater chance to use them poorly

Sparklines are the real innovation of Excel 2010. Sparklines are a minimalist genre of chart designed to fit in the regular flow of the text – they don’t require more space to be legible and efficient. While sparklines do not allow a user to look up the value of a specific data point, they are very efficient for communicating a trend. As such, they are increasingly used in dashboards and reports. There have been 3rd-party solutions to implement them in Excel, but this native implementation is robust and well done. This will put sparklines on the radar of the great number of people who didn’t use them because they were not aware of their existence.

Sparklines give immediate insight on the trends in this data table. A dot marks when the maximum value was reached. That makes it easier to compare peaks at a glance.

Changes in other applications

Word has advanced options for OpenType fonts. For instance, if your font has several character sets, you can now access them from Word. This is especially good for distressed fonts or the excessively ornate ones. In addition to kerning, it is now possible to control ligatures (i.e. to allow users to specify how ff, fl or fi would appear on screen, as one unique glyph or as two separate letters). Another new feature of Word is an advanced spell checker that is able to warn you of possible word choice errors, when using homonyms for instance.

On my setup, these 3 options didn’t really work, but it’s a beta and I understand the intent.

The advanced spell checker didn't catch those words which were quite obviously used out of context.

In French, it picked sides in a famous spelling controversy. Many people believe that Perrault originally wrote that Cinderella wore fur slippers (soulier de vair). Microsoft sides with Disney on that one and goes with glass slippers (souliers de verre).

Powerpoint features 3 high-level changes. The first is the possibility to structure a long presentation using sections, which somehow helps. However, as far as I could see, sections are only a grouping feature. There are few operations that can be performed on the section as a whole (as opposed to on all the presentation, or on each slide separately). For some tasks, you might think it is the case (as selecting the section implicitly selects its slides) but you’ll see that the operation only affected the current slide. Hmm. It can be useful to manage a presentation after it’s done, but IMO this will reduce the amount of time people spend designing their presentation away from powerpoint, which is ultimately a bad thing.

Powerpoint sections make it easier to manage very long documents.

Powerpoint 2010 also features 3D transitions not unlike those of Keynote ‘08. It is also possible to include movie clips in presentations. Wasn’t this already the case? Previously, you’d have to embed video files in your presentations. Now it is possible to embed online videos as well. I’m not quite sure about these two options really, the first one for ideological reasons, the 2nd because I wouldn’t recommend that any speaker rely overly on an internet connection and a video hosting service during a live presentation.

The insert screenshot shows a gallery from all my open windows to choose from. The screen clipping tool allows one to insert only a section of the window. Neat!

There’s another thing available everywhere in Office but which is possibly most useful in powerpoint: insert screenshot. Clicking on this button gives you a list of thumbnails of all your open windows to choose from. This really reduces the hassle of using a screen capture tool, or worse, of manually doing a screen capture, pasting it in an image editing program, cropping the image, saving it to an acceptable format and copy/pasting it again where you need it. It is possible to only copy part of these screens, too. It’s quite well done.

Overall impressions

I’m impressed with the thinking that went into the interface. The ribbon was already a great demonstration of out-of-the-box thinking and looked great on paper. I wasn’t thrilled to use it as the commands I had been using for some 15 years were not always easily found, but it seems that first-time users of Office 2007 outnumber those who’ve used previous versions. The execution of the ribbon in Office 2010 is improved, and the team allowed themselves to go beyond some arbitrary constraints they had imposed on themselves, as with the pasting options or the office button. Well done.

I’m happy that sparklines have been added to Excel. In the next few years, we’ll find even better usage for them. However, I’m disappointed that the charting options remain essentially unchanged. Take the pie chart for instance. Everyone is aware of its limitations. There are many alternatives which would be easy to implement in Excel. Also, I’m disappointed that the charting mechanism remains the same: present the user with a long list of chart types, without supporting their reasoning in the choice of one over the other. There should be a chart wizard that would ask users what they want to show with the data and suggest the best choice (and not many possible choices) of chart.

I am not sure about the improved spell checker. Improved means increased dependency on the tool, which is the reason why typos haven’t been eradicated despite the technology.

I am very skeptical about all the advances of the Office product into design. Office users are not designers. Or rather, to be a designer requires a specific form of critical reasoning, not a new tool. More sophisticated graphical options allow novice users to achieve complex results without going through that phase of reasoning, which ultimately won’t help them.

The Stiglitz-Sen-Fitoussi commission

Photo credit: Le point. That’s Stiglitz on the right with files, Sen is the one with the mischievous smile and blue shirt, and Fitoussi is the smoker.

Yesterday, I attended the presentation of the report of the Stiglitz-Sen-Fitoussi commission on the measurement of economic performance and social progress.

In a nutshell, since there is a consensus around the limitations of GDP as the main indicator of performance, the commission aims to find other ways to measure the efficiency of an economic policy.

The idea is to build indicators that would be closer to the experience of the citizen, rather than an abstract, expert top-view of a system. The report argues for a system of indicators of well-being, measures of leisure and culture, better environmental indicators and also better ways to understand inequalities, rather than a constant focus on averages and aggregates.

OECD, my employer, was quite involved in the report, as several members of the commission were OECD (or recent ex-OECD) and because there are several programmes at OECD with similar goals.

So the ideas in the report are not very new. But the real breakthrough is the degree of political support the commission is getting.

You’d probably think that governments would take heed of the musings of 5 Nobel laureates and 17 top-notch academics on their own merits. But the reason why GDP is such a popular indicator is not that it’s perfect, but that it’s relatively easy to compute. With a stress on “relatively”: it’s an incredibly complex endeavor on which hundreds of people work full time in every country. But at least, it obeys very explicit rules and as such it is comparable across nations. Any new indicators would be difficult (read expensive) to set up, and to be effective, they’d require a similar framework, which means a good number of countries following the same methodology to compile data.

That has been the stumbling block. But Sarkozy, who sponsored the report, stated very clearly that he’d put it on the agenda of every international meeting he’ll attend, and that he’ll demand that all international organizations put it into practice. Because of the crisis, and the global inability of statistical offices to prevent it, the timing may well be right to make that claim.