Don’t take my word for it

Inspiration

In June 2010, I attended a Wolfram|Alpha event called the London Computational Knowledge Summit, where speakers mostly focused on how computers can transform the way we teach and transmit knowledge. Several of the presentations made a lasting impression, especially the talk by Jon McLoone:

Jon’s point was that academic papers today look an awful lot like those of the 17th century. Granted, they’re not in Latin, they can be displayed online and there is color, but as far as math is concerned it’s still long pages of difficult language and long formulas. The computer, however, can do so much more than transmit information. In the clip above (around 6’20”) Jon shows how a paper on edge detection can be much more effective if, instead of using a static example to demonstrate the technique, it can use a live one, such as input from the camera. In that talk, and throughout the day, there were more examples of how interactive displays could be useful for teaching.

Teaching, telling stories and getting a message across rely on similar mechanisms. Fast forward to VisWeek 2010 and the first “Telling Stories with Data” workshop. Some of the presentations there (I’m thinking of Nick Diakopoulos and Matthias Shapiro mostly) hinted that there could be a process through which readers/users/audiences could be taken so they can make the most of an intended message. Interestingly, this process is not about transmitting as much data as effortlessly as possible, but rather about engaging the audience and getting them to challenge their assumptions.

Those two events really made me pause and think. Ever since I started working in visualization, all my efforts had been focused on being as clear as possible, on efficient visuals. However, for some tasks, clarity just isn’t optimal. That wasn’t much of an issue in most of my OECD work, where such an approach makes a lot of sense, but I started seeing that there was a world of possibility when it comes to changing people’s perception of a subject, or even persuading them.

Application

French pension reform

Right around VisWeek 2010, France was plagued by strikes against the proposed pension reform. At the peak of the protests, up to 3 million people demonstrated (that’s as many as one adult in 14). I was quite irritated by the protests. In theory, left and right had very comparable views on this problem and only disagreed on insignificant details. They both knew reform was unavoidable, and, again, had similar plans. But when the current government implemented its plan, the opposition capitalized on the discontent and attacked it vigorously. Their rhetoric was entirely verbal – no numbers were harmed in the making of their discourse! Consequently, protesters and a large part of the population started to develop notions about the state of pensions which were completely disconnected from reality.

I believe that if numbers had been used early enough, they would have provided a counterpoint to such fallacies, and while they may not have prevented demonstrations, they would have greatly helped to dampen their effect. With that in mind, and with official data, I tried to build a model to show what would happen if one changed this or that parameter of pension policy. Pension mechanics are quite simple: what you give on one side, you take from another; the evolution of the population is quite well known, so making such a model is pretty straightforward. But putting that in a visual application really showed how the current situation was unsustainable. In this application I challenge the user to find a solution – any solution – to the pension problem, using the same levers as the policy makers. It turns out that there is just one viable possibility. Yet letting people find that out by themselves, and challenge the idea as hard as they could, was very different from paternalistically telling them that this was just the way it is.
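To give a sense of just how simple those mechanics are, here is a minimal sketch of a pay-as-you-go balance in Python. All names and numbers are hypothetical, for illustration only; they are neither the figures from my application nor official French data.

```python
# Pay-as-you-go pensions in one function: contributions come in from
# workers, benefits go out to retirees. Every figure below is made up.

def pension_balance(workers_m, retirees_m, avg_wage,
                    contribution_rate, replacement_rate):
    """Yearly balance in euros: positive is a surplus, negative a deficit."""
    contributions = workers_m * 1e6 * avg_wage * contribution_rate  # money in
    benefits = retirees_m * 1e6 * avg_wage * replacement_rate       # money out
    return contributions - benefits

# The levers are the ones policy makers can pull: raise contributions,
# lower pensions, or move people from the retiree column to the worker
# column (i.e. raise the retirement age).
print(pension_balance(workers_m=28, retirees_m=15, avg_wage=30_000,
                      contribution_rate=0.25, replacement_rate=0.60))
```

Plug in population projections and let users move the levers themselves, and the unsustainability, or any viable fix, becomes something they discover rather than something they are told.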

Over the course of the year I got involved on several occasions in situations like this, where data visualization could be used to influence people’s opinion, and each time I tried to use that approach: instead of sending a top-down message (with or without data), confront the assumptions of the audience and get them to interact with a model. After this experience, their perception will have changed. This technique doesn’t try to bypass the viewers’ critical thinking, but instead to leverage their intelligence.

In politics

I am very concerned with the use of data visualization in politics, for many reasons. One of them is that I’m a public servant. In my experience, most decisions are not taken by politicians, but by experts or technicians who are committed to the public good. Yet, when poorly explained, these decisions can be misunderstood and attacked. Visualization, I believe, can help defend such decisions (those which are justifiable, at least) and explain them better to a greater number of people.

Although a lot of data is available out there (or perhaps for that very reason), only few people have a good grasp of the economic situation of their country. This just can’t be helped. It’s not possible to increase the percentage of people who can guesstimate the unemployment rate, and it’s not really important: very few people need to know such a number. What is important is being able to use that information in context, when it is useful. For instance, at election time, a voter should be able to know whether the incumbent has created or destroyed jobs. This is something that data visualization can handle brilliantly.

Finally, my issue with political communication is that it is written by activists, for activists. It works well to motivate people with a certain sensitivity, but it is not very effective at getting others to change sides. This is a bias which is difficult to detect for those in charge of political communications because, well, they’re activists too… and here this flavor of model-based data visualization, with its appearance of objectivity and neutrality, can complement the more verbal aspects of rhetoric quite well.

In the talk I used Al Gore’s An Inconvenient Truth as a counterexample. This movie is a fine example of storytelling, operating at an emotional rather than at a rational level. I trust that people who feel concerned about climate change will be reinforced in their beliefs after seeing the movie. However, those who don’t were left unconvinced. In fact, the movie also gave a strong boost to climate skeptics: there was a real barrage of blog posts and websites attempting to debunk the assertions of that “truth”, most often with data. There is a missed opportunity here: if the really well-made stories of the movie had been complemented with a climate model that people could experiment with, it would have been perceived as less monolithic, less Manichean, less dogmatic.

The conclusions

In my practice, using an interactive model can help a lot in getting a message across (and no, I don’t have a rigorous evaluation for “a lot”; that’s the advantage of not being an academic).

Such models engage the users, they come across as more objective and truthful than static representations, and they can be very useful to address preconceptions. Chances are they’re more fun, too.

Then again, just because a model is interactive and built on transparent data and equations doesn’t mean it’s objective. It is usually possible to steer the model or the interface so that one interpretation is more likely than another, and that’s precisely the point if you are using data visualization to influence.

It can be very cheap and easy to turn a static representation into an interactive display. Every chart with more than two dimensions can be turned into a visualization where the user controls one dimension and sees the data for the others evolve.
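As a sketch of that idea, assuming nothing more than matplotlib and made-up data with one value per (year, category): the slider controls the year dimension, and the bars show the remaining category dimension for the selected year.

```python
# A static bar chart made interactive: the user controls the "year"
# dimension with a slider and watches the other dimension evolve.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.widgets import Slider

rng = np.random.default_rng(0)
years = list(range(2000, 2011))
categories = ["A", "B", "C", "D"]
data = {y: rng.uniform(10, 100, len(categories)) for y in years}  # fake data

fig, ax = plt.subplots()
fig.subplots_adjust(bottom=0.2)            # leave room for the slider
bars = ax.bar(categories, data[years[0]])
ax.set_ylim(0, 100)

slider_ax = fig.add_axes([0.2, 0.05, 0.6, 0.03])
slider = Slider(slider_ax, "Year", years[0], years[-1],
                valinit=years[0], valstep=1)

def update(val):
    # redraw the free dimension for the chosen value of the controlled one
    for bar, height in zip(bars, data[int(val)]):
        bar.set_height(height)
    fig.canvas.draw_idle()

slider.on_changed(update)
plt.show()
```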

And if you build a model like this, you must be very open and transparent about the data and the equations, and sometimes find ways to get people to overcome their doubts.

Besides, having a working interactive model is no guarantee of success. You really have to watch out for users interpreting your visualization in ways you never intended.

The presentation


All the examples I used in the presentation, both good and bad, both mine and others’, can be found at http://www.jeromecukier.net/data-stories/


Promising difficulties

At the recent VisWeek conference, Jessica Hullman and her coauthors presented “Benefitting InfoVis with Visual Difficulties” (pdf), a paper which suggests that the charts that are read almost effortlessly are not necessarily the ones that readers understand or remember best. In response, Stephen Few wrote a rather harsh critique of this paper (pdf). As I read it, I felt the original paper was not always fairly represented, but more importantly, that the views developed by both parties are not at all irreconcilable. Let me explain.

What is cognitive efficiency, or “say it with bar charts”

For quite some time, we were told that to better communicate with data, we had to make visuals as clear as possible.

The more complicated way of saying this is to talk of “cognitive efficiency”. By reducing the number of tasks needed to understand a chart, and by simplifying them (which is sometimes called reducing the “cognitive cost” or “cognitive load”), we improve all the virtues of the chart.

Various charts based on the same data points, shown in order of decreasing cognitive cost: from left to right, they make the task of comparing individual values increasingly easier.

For instance: bar charts are easier to process than pie charts, because it’s easier for the human eye to compare lengths than angles. So, with equivalent data, bar charts have a lower cognitive cost than pie charts. Likewise, bar charts which are ordered by value (smallest bars to largest bars) are easier to read than unordered ones. Ordered bar charts have an even lower cognitive cost than unordered ones.
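For what it’s worth, here is what the ordering principle looks like in practice; a small matplotlib sketch with made-up values, where sorting the data is the only difference between the two panels.

```python
# Same data twice: the ordered version pre-computes the ranking for the
# reader, lowering the cognitive cost of comparisons. Values are made up.
import matplotlib.pyplot as plt

values = {"France": 62, "Germany": 48, "Italy": 71, "Spain": 55, "UK": 44}

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 3))
ax1.bar(list(values), list(values.values()))
ax1.set_title("Unordered: the reader must scan")

ordered = dict(sorted(values.items(), key=lambda kv: kv[1]))
ax2.bar(list(ordered), list(ordered.values()))
ax2.set_title("Ordered: the ranking is visible at once")
plt.show()
```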

Conversely, adding non-data elements adds extra tasks for the reader and increases cognitive cost. These non-data elements have been reviled by Edward Tufte as “chartjunk”. His data-ink theory says that out of all the ink used for a chart, as much as possible should be devoted to data elements. Again, this goes in the direction of cognitive efficiency.

Engagement rather than immediacy?

Again, for quite some time those rules were held to be universal. Yet several people have tried to challenge them, the latest being Jessica Hullman in her paper “Benefitting InfoVis with Visual Difficulties“. This paper was so thought-provoking that it received an honorable mention at the recent IEEE Information Visualization Conference 2011 (as a note to the non-academic reader, this is quite a competitive achievement).

The paper argues that when new information visualization techniques are evaluated, those evaluations typically consider response time or accuracy, and not how well users are able to interpret and remember the visuals. When only the former criteria are taken into account, cognitive efficiency is the superior framework. But this is not the case for data storytelling (which is, arguably, a small subset of all data visualizations).

When visualizations attempt to transmit a message, how well users can receive this message, as well as their capacity to remember it for a long time, are of utmost importance, much more than the ease with which the visualization is read.

In that case, Jessica Hullman proposes a trade-off between cognitive efficiency and “obstructions”. The idea is that such obstructions, or visual difficulties, can trigger active learning processes. In other words, if a user trying to read a chart doesn’t understand it effortlessly, but is somehow willing to get to the bottom of it, she will apply all her active brainpower to it. This surge of effort will lead her not only to interpret it better but also to remember it better. To sum up, these obstructions can have positive effects; when this effect works, they are called desirable difficulties.

Desirable difficulties are tricky, because if the “obstruction” is too large, if a small additional effort is not enough to understand the chart, then it will not work. So this is definitely not about maximizing the difficulty of understanding visualizations.

In the recommendations part of the paper, the authors say:

Instead of minimizing the steps required to process visualization, induce constructive, self-directed, cognitive activity on the part of the user.

This doesn’t mean that anything goes. The paper does not argue for adding as many difficulties as possible, or for using every gratuitous effect in the book. Instead, it goes on to give actionable design suggestions to enhance reader stimulation and active information processing.

In my own practice, for instance with the Better Life Index, I can confirm the analysis of the Hullman paper: the novelty of the form and the aesthetic appeal of the representation drive users to overcome the difficulty posed by the unusual shape of the flower glyph. Would bar charts have conveyed the data more efficiently and more accurately? Definitely! Would the user engagement have been comparable? Definitely not.

A critique by Stephen Few

Stephen Few, whose work I have praised on multiple occasions in this blog, has published a critique of this paper (pdf). Reading his article, then the paper again, I had the feeling that they weren’t talking about the same things. In certain contexts, difficulties are not desirable at all and must be eradicated. Yet in other contexts, cognitive efficiency does not provide the optimal solution.

For instance, Stephen writes:

Long-term recall is rarely the purpose of information visualization.

Fair enough! So let’s agree that when long-term recall is not the purpose, we should not trouble ourselves with adding obstructions to the display. For instance: business intelligence systems, dashboards (for monitoring), visual analytics (and more on this shortly). Spreadsheets, mostly. All usages of data that support decisions, and most usages in the corporate world. The Hullman paper only applies to the other cases anyway.

He also writes (emphasis mine):

Skilled data analysts learn to view data from many perspectives to prevent knee-jerk conclusions based on inappropriate heuristics.

Agreed! And by all means, let them analyze, let them view data from as many perspectives as they see fit, and don’t get in the way of their job.

For context, check out www.palantirtech.com/government/analysis-blog/mortgage-fraud

This is taken from a Palantir Government demo, in which analysts are tracking mortgage fraud. Each yellow dot on the top display is a transaction where a house has been sold for over 200% of its purchase value, and dots which are connected relate to the same house. We can immediately see two suspicious clusters where a property has been resold four times in these conditions. And if at the end of their work day the analysts don’t remember the address of the fraudulent transactions, it’s no big deal, as long as they have identified a wrongful practice.

Conversely, at the risk of repetition, the paper’s authors write of a trade-off between efficiency and obstructions – cognitive efficiency being generally positive. They say that obstructions become desirable difficulties only if they are constructive, that is, if they are able to trigger active information processing. They are not championing 3D pie charts or atrocious dashboards like the one at the end of Stephen’s article. Jessica notes that novelty enhances active information processing. I don’t know how to characterize the speed dials of dashboards, for instance, but novel would not be the word I’d use, and again they wouldn’t be favored by the authors of the paper. So I think it’s a bit unfair to associate the paper with the terrible, terrible visuals presented in Stephen’s article; the ones in the original paper are a little more defensible.

To see this chart in context see http://www.oecd.org/dataoecd/41/50/47984536.pdf

Consider this other chart (and let’s assume for the sake of discussion that its cognitive cost is low, although it could be much lower, for instance by showing fewer time series). This was published in an OECD publication almost two years before the 2008 crisis. I would say this chart is easy to read (we see mortgage delinquency rates dropping in most countries) but difficult to interpret and to recall. Like other charts in the document, this one is an oracle of financial apocalypse, as the proportion of delinquent mortgages in the US, the only series without a downward trend, will have the consequences that we know. So if a different way of showing the same data could have made that more obvious, at the cost of legibility, I think it would have been worth a shot.

Are we on common ground yet?

If not, let’s now assume that there exist visualizations where long-term recall is, indeed, the main purpose. Examples would include uses in journalism, politics, advocacy, marketing… Jessica has been involved in the Telling Stories with Data series of workshops at VisWeek. This suggests an interesting distinction between:

  • visualizations which are tools with which a user accesses or manipulates data.
  • visualizations where an author, with a specific intent, tries to frame data in a certain way to an audience. In that case, the author wants to make sure the audience receives the message as intended, and remembers it.
See where I’m going?
In the first case, we want cognitive efficiency all the way.
In the second, we are mostly concerned with getting our message across and making it stick.
So there is no contradiction in having one set of rules for one category of visuals and a different set for the other, especially since the criteria of success are so different. To illustrate this, I note that both the article and the paper refer to Tableau, a “cognitive efficiency” company. Yet it turns out that Tableau is also very interested in doing as well as possible on the storytelling front, and the questions asked at the paper’s presentation by Tableau representatives show their interest in this research.

Where to from there?

We have proven methods to reduce the cognitive cost of a visual, and we can thank Stephen Few for making them more accessible. It’s much more difficult, though, to optimize the characteristics of a successful “data narrative”, that is, interpretation and memorability. It’s an infographics jungle out there: those of us who haven’t seen their share of indefensible visuals just haven’t searched enough, and absolutely anything goes.
We still do not have an equivalent framework for visualizations that tell stories. InfoVis has started to study them (such as in the remarkable Narrative Visualization: Telling Stories with Data) and to characterize them, but we don’t have a systematic, reproducible way to make sure a data narrative will work well, the way we do for the perfect dashboard. We do know that the best examples, at large, do not comply with the rules of cognitive efficiency, though. Fortunately, practitioners have not waited for convincing research and are leading the way, even though many get lost in the process. This is why we need more research on that front! I for one am looking forward to new developments in this area of InfoVis.


An open letter to Tableau

Normally, at this time of the year, I’d be writing a recap of VisWeek. And I will – the writeups I have been doing for visualisingdata.com were just highlights of individual talks. But much of my VisWeek 2011 experience happened in between the talks: people you meet, ideas you glimpse that collide and generate new ideas, plans you make…

One person I am always happy to meet at VisWeek, and who is visible from afar, is Tableau’s Jock Mackinlay. Since I couldn’t attend TCC11, I spent some time talking with Jock and Lori Williams about version 7 and plans for the future. Whenever I started a sentence with “hmm, you know…”, Lori or Jock would write it down and ask me to send any remarks in writing. So I thought I should do that, but rather than keeping this private, I’m going to stick to what’s written on a button I picked up at their stand: do it in public.

For a piece of software this complex, Tableau Desktop/Public has received improvements at an astonishing rate. So fast that one of the recommendations I had is already implemented or ready for the next release. So I am convinced that the Tableau product people care and are on the lookout for useful suggestions on how to move forward. Here are a few.

On design

More fonts please!

In Tableau Public, you’re stuck with the Windows XP default fonts. That’s not an awful lot. Is it enough?

For the first 15 years or so of the web, webmasters had to be content with so-called web-safe fonts. There was a time when the web was probably 90% Verdana, and people accepted this like death and taxes. But this is now 2011, and any personal website can be a marvel of customized typography – at least, the parts around a Tableau viz.

More specifically, there are two issues with the current set of fonts.

First, there is no condensed font (think Helvetica Condensed, etc.). Condensed fonts are extremely useful for displaying numbers in callouts or on axes.

Second, the default font is Arial. This is not a neutral choice. There are lots of adjectives that can be associated with Arial, and frankly I don’t think those qualifiers should apply to Tableau. Arial is the default font of pre-2007 Excel; it’s the font of those bullet-ridden, unstyled, unsavory PowerPoint slides that have bored us to death through over a decade of unnecessary meetings. Many say it’s a rip-off of Helvetica.

Tableau, you're better than Arial. Say it.

Being able to choose one’s font is essential to branding. The typography of my vizzes should match that of my website and my brand. The same goes for my colors (more on that in a second). Conversely, if any output on my website is not aligned with my brand, it denotes a lack of control.

So what should Tableau do?

Offer more fonts! There are literally hundreds of proven fonts which would be well suited to dashboards. Fonts which are not unheard of, yet not trite; legible, yet distinct.

Ideally, Tableau should commission its own font. There is no agreement on which font is best for displaying numbers; settle the question by creating that font! Anyone using it, in Desktop, Public or elsewhere, would subtly scream “Ra Ra Tableau”.

Then, offer several themes to choose from. In my view, Tableau encourages users to focus on what question to ask of the data. Once this is done, the resulting “visual query” should look OK, or require very little rework to be publishable as is. Today, moving away from the Arial-bold, Arial, light-gray-background-on-header scheme does require a lot of rework, and it shouldn’t. Other Washington-based software firms have been offering themes for ages: just look at all the existing Office palettes, which are all quite good. I have trouble using the default colors because they really say: this guy hasn’t given half a thought to color. At least I am aware that many very acceptable choices exist. I think that’s a reasonable stance.

Node-link graphs

Node-link graphs, or, as they really should be called, graphs, are the most useful way to represent relationships across data elements. If your dataset has items and relationships between them (movements from one place to another, transactions from one entity to another, people and how they relate to one another… there are really many examples), a graph can show quickly and elegantly what there is to know about the data.

Mobile patent suits, by Mike Bostock

I insist on the “quick insight” part because I believe that, fundamentally, this is what Tableau is about. Many people, including me, systematically use Tableau on any new dataset they try to understand. It’s very quick and efficient to explore data with Tableau and find out what’s interesting in a dataset. You’ll find consensus in the visualization community that Tableau is the best “drafting tool” ever.

Back to graphs. One problem with graphs is that since they are not in Excel’s canonical list of charts, they are not in the mindset of most corporate users.

Graphs, I believe, would be relatively easy to include in the Tableau toolkit. There are several algorithms that can compute where the nodes should be drawn. This is not unlike drawing a map: suddenly the x and y come from lat-lon coordinates which are deduced from the underlying data. Likewise, an algorithm could assign x and y coordinates to any row. Then all that remains is to draw a dot or shape with all the standard attributes: color, size, etc. Links between nodes could be handled by the line object.
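To make the map analogy concrete, here is a toy force-directed layout, one of those algorithms, in plain Python on a hypothetical five-node dataset. Real implementations (in d3.js or networkx, say) are far more refined, but the principle is just this: all nodes repel each other, connected nodes attract, repeat until stable.

```python
# A toy force-directed layout: assigns x/y to each node from the data
# alone, much like geocoding assigns lat-lon. Constants are arbitrary.
import random

nodes = ["a", "b", "c", "d", "e"]
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("d", "e")]
pos = {n: [random.random(), random.random()] for n in nodes}

for _ in range(200):                          # relaxation iterations
    force = {n: [0.0, 0.0] for n in nodes}
    for n1 in nodes:                          # every pair repels
        for n2 in nodes:
            if n1 == n2:
                continue
            dx = pos[n1][0] - pos[n2][0]
            dy = pos[n1][1] - pos[n2][1]
            d2 = dx * dx + dy * dy + 1e-9     # avoid division by zero
            force[n1][0] += 0.01 * dx / d2
            force[n1][1] += 0.01 * dy / d2
    for n1, n2 in edges:                      # connected nodes attract
        dx = pos[n2][0] - pos[n1][0]
        dy = pos[n2][1] - pos[n1][1]
        force[n1][0] += 0.05 * dx
        force[n1][1] += 0.05 * dy
        force[n2][0] -= 0.05 * dx
        force[n2][1] -= 0.05 * dy
    for n in nodes:                           # move along the net force
        pos[n][0] += force[n][0]
        pos[n][1] += force[n][1]

print({n: (round(x, 2), round(y, 2)) for n, (x, y) in pos.items()})
```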

When I told Jock about graphs, he asked me what I would do with them. Oh, I had some ideas ready: as an HR manager, I could visualize all the relationships in my company. There are many explicit relationships (working in the same team) but also implicit ones (having worked together in the past, having graduated from the same school, similar interests, nearby offices, etc.) in HR files. Or, in logistics, I could represent my itineraries non-geographically. Suddenly I could make cartograms (which are usually tricky to do) instead of maps.

In retrospect, this is not the right way to approach the problem. I’ve only worked in a handful of positions and industries. There are thousands of different jobs, in as many domains, whose practitioners can look at data through different lenses. What I do know, as a data visualization person, is that graphs are useful, and that if people who traditionally didn’t have access to them could pull them up instantly on their own data, fantastic things might occur.

Treemaps and hierarchical charts


A circle map done with Tableau. This was slightly cumbersome and doesn’t work so well. It is also the biggest you can make, as it uses the size property of circles, here at its maximum. The coordinates of the circles are computed outside of Tableau. My attempts at making a treemap are, well, less polished. The dataset is the flare class hierarchy, as in here.

I could say pretty much the same thing about treemaps: they are very useful, especially since tabulated datasets, the bread and butter of Tableau, often contain hierarchies. Yet they are not part of the Excel family, so if you’ve never worked with them, you don’t feel you need them.

So I asked Jock: why doesn’t Tableau have treemaps? It makes sense, in my opinion. Treemaps are one of the few “advanced” visualization tools which have gained quasi-acceptance. I really see that as progress. Isn’t there a natural match with Tableau? Jock’s answer was that the problem with treemaps is that they do not fit well with Tableau’s algebra. Turning a dataset into a treemap would require a specific process that doesn’t generalize well like the rest of Tableau.

See? There’s your problem

So instead (or rather, as a first step) I suggest Tableau add an attribute to marks: a secondary size attribute. See it this way: if only one size attribute is filled, it governs all aspects of the size of a mark (height, width, area, diagonal, you name it; all would be proportional to it). But if there are two sizes, one will affect height and the other, width. So yes, we can do rectangles. And ellipses too. Now that in itself is not half-bad. Fattened stars or whatnot, I don’t know what to do with them, but rectangles and ellipses, now…

Now, once you can handle rectangles, and once you can handle layouts (which you can, because: maps), you can have treemaps, and/or any hierarchy or packing algorithm. I think we can stop at treemaps, but the folks at Tableau know there are wide, (mostly) unexplored territories beyond that point.
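To illustrate, here is a minimal sketch of slice-and-dice, the simplest treemap layout. It does nothing but turn a hierarchy into rectangles (x, y, width, height), exactly what a mark with two size attributes could draw; the data is a made-up fragment in the spirit of the flare hierarchy above.

```python
# Slice-and-dice treemap: split the region proportionally to values,
# alternating direction at each level of the hierarchy.
def slice_and_dice(items, x, y, w, h, vertical=True):
    """items: list of (label, value) or (label, children) pairs.
    Yields (label, x, y, w, h) leaf rectangles filling the region."""
    total = sum(size(v) for _, v in items)
    offset = 0.0
    for label, v in items:
        share = size(v) / total
        if vertical:                     # slice left to right
            rect = (x + offset * w, y, share * w, h)
        else:                            # slice top to bottom
            rect = (x, y + offset * h, w, share * h)
        offset += share
        if isinstance(v, list):          # recurse, flipping direction
            yield from slice_and_dice(v, *rect, vertical=not vertical)
        else:
            yield (label, *rect)

def size(v):
    """Total value of a node: its own value, or the sum of its children."""
    return sum(size(c) for _, c in v) if isinstance(v, list) else v

tree = [("analytics", [("cluster", 4), ("graph", 6)]), ("vis", [("axis", 2)])]
for rect in slice_and_dice(tree, 0, 0, 100, 100):
    print(rect)
```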

Tableau and storytelling

Stateful URLs

OK, this is the innovation that was already in the pipeline when I mentioned it. It would be very useful to have a system of stateful URLs for Tableau Public. What is that? The possibility of passing a set of parameters, in addition to the URL of a view, so that it loads in a certain state. The same dashboard could then be shared in different states (for instance, with different items selected, or different filters) without having to be saved several times under several different names.

Bonus: dashboards in a given state, if they can be uniquely identified through those parameters, can then be shared on Facebook and Twitter as is.

All of this is taken care of.
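For the record, here is a sketch of what stateful URLs amount to in any web application: serialize the view state into query parameters, and parse them back on load. The parameter names and the base URL are made up for illustration; this is not Tableau’s actual scheme.

```python
# State in, state out: a dashboard state becomes a shareable URL and back.
from urllib.parse import urlencode, urlparse, parse_qs

BASE = "http://example.com/views/pensions/dashboard"  # hypothetical view URL

def share_url(state: dict) -> str:
    """Encode the current filters/selections into a shareable URL."""
    return BASE + "?" + urlencode(state)

def restore_state(url: str) -> dict:
    """Recover the state so the view opens exactly as it was shared."""
    return {k: v[0] for k, v in parse_qs(urlparse(url).query).items()}

url = share_url({"country": "France", "year": "2010", "measure": "pensions"})
print(url)                  # ...dashboard?country=France&year=2010&...
print(restore_state(url))   # {'country': 'France', 'year': '2010', ...}
```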

Interactive slides framework

Now the next step: an interactive slides framework.

The interactive slideshow, once quasi-exclusive to the New York Times, is now gaining in popularity.

The idea? There is a structure like a common slideshow, where the user can go to the next or previous slide, guided by a narrative. But at any step, instead of following the flow, they can stop and interact with the view in front of them.

That said, despite being a simple idea, it requires quite an amount of work to implement properly. It makes sense for Tableau to have its own framework, so that authors can arrange a sequence of views nicely in the desktop tool, for instance dashboards in specific states, with an extra layer of commentary if need be, and then deploy the finished product, which can be embedded in one location, as opposed to embedding several vizzes in one page. The advantage is that the reader would go from one view to the next as intended by the author, with a supporting narrative or explanations, and wouldn’t be required to explore or interact with each one to get an idea of what is going on.
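Data-wise, such a framework boils down to very little: an ordered list of (caption, state) pairs, where each state could be one of the stateful URLs above. A hedged sketch with hypothetical names; actually applying a state to a dashboard is left out.

```python
# An interactive slideshow skeleton: "next" and "previous" just load
# another pre-saved dashboard state, each with its line of narrative.
class SlideShow:
    def __init__(self, slides):
        self.slides = slides              # list of (caption, state) pairs
        self.index = 0

    def current(self):
        return self.slides[self.index]

    def next(self):
        self.index = min(self.index + 1, len(self.slides) - 1)
        return self.current()

    def previous(self):
        self.index = max(self.index - 1, 0)
        return self.current()

story = SlideShow([
    ("Pensions were balanced in 2000", {"year": "2000"}),
    ("A deficit appears by 2010",      {"year": "2010"}),
    ("Now try to fix it yourself",     {"year": "2050", "levers": "on"}),
])
print(story.current())
print(story.next())       # the reader can stop here and interact
```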

What do you think?

What would you like in Tableau?