d3: adding stuff. And, oh, understanding selections

From data to graphics

the d3 principle (and also the protovis principle)
d3 and protovis are built around the same principle. Take data, put it into an array, and for each element of data a graphical object can be created, whose properties are derived from the data that was provided.

However, d3 and protovis differ slightly in how they add those graphical elements and bind data to them.

In protovis, you start from a panel, a protovis-specific object, to which you add various marks. Each time you add a mark, you can either:

  • not specify data and add just one,
  • or specify data and create as many as there are items in the array you pass as data.


How we did it in protovis

var vis=new pv.Panel().width(200).height(200); 
vis.add(pv.Panel).top(10).left(10)
  .add(pv.Bar)
    .data([1,4,3,2,5])
    .left(function() {return this.index*20;})
    .width(15)
    .bottom(0)
    .height(function(d) {return d*10;});
vis.render();

This simple bar chart in protovis.
You first create a panel (line 1); you may add an element without data (here, another panel, line 2); then you add bars to this panel: there will be five, one for each element of the array on line 4.

And in d3?

In d3, you also have a way to add either one object without passing data, or a series of objects – one per data element.

var vis=d3.select("body").append("svg:svg").attr("width",200).attr("height",200);
var rect=vis.selectAll("rect").data([1,4,3,2,5]).enter().append("svg:rect");
rect.attr("height",function(d) {return d*20;})
  .attr("width", 15)
  .attr("x",function(d,i) {return i*20;})
  .attr("y",function(d) {return 100-20*d;})
  .attr("fill","steelblue");

In the first line, we are creating an SVG document which will be the root of our graphical creation. It behaves just like the top-level panel in protovis.

However, we are not creating this out of thin air; rather, we are bolting it onto an existing part of the page, here the <body> tag. Essentially, we look through the page for a tag named body, and once we find it (which should usually be the case), that's where we put the SVG document.

Oftentimes, instead of creating our document on <body>, we are going to add it to an existing <div> block, for instance:

<div id="chart"></div>
<script type="text/javascript">
var vis=d3.select("#chart").append("svg:svg");
...
</script>

Anyway. To add one element, regardless of data, the logic is:

d3.select(where we would like to put our new object).append(type of new object)

Going back to our code:

var vis=d3.select("body").append("svg:svg").attr("width",200).attr("height",200);
var rect=vis.selectAll("rect").data([1,4,3,2,5]).enter().append("svg:rect");
rect.attr("height",function(d) {return d*20;})
  .attr("width", 15)
  .attr("x",function(d,i) {return i*20;})
  .attr("y",function(d) {return 100-20*d;})
  .attr("fill","steelblue");

On line 2, we see a different construct:

an existing selection, or a part of the page
.selectAll(something)
.data(an array)
.enter()
.append(an object type)

This sequence of methods (selectAll, data, enter and append) is the way to add a series of elements. If all you need is to create a bar chart, just remember the sequence; but if you plan on taking your d3 skills further than where you stopped with protovis, see the end of the post for a more thorough explanation of the selection process.

Attributes and accessor functions

At this stage, we’ve added our new rectangles, and now we are going to shape and style them.

rect.attr("height",function(d) {return d*20;})
  .attr("width", 15)
  .attr("x",function(d,i) {return i*20;})
  .attr("y",function(d) {return 100-20*d;})
  .attr("fill","steelblue");

All the attributes of a graphical element are controlled by the method attr(). You specify the attribute you want to set, and the value you want to give.
In some cases, the value doesn't depend on the data: all the bars will be 15 pixels wide, and they will all be steelblue.
In other cases, the value does depend on the data. We decide that the height of each bar, in pixels, is 20 times the value of the underlying data (so 1 becomes 20, 5 becomes 100, etc.). As in protovis, once data has been bound to an element, passing a function returns a dynamic value computed from that element's data. By convention, we usually write function(d) {…;} (d for data), although the argument could be named anything. These functions are still called accessor functions.
so for instance:

.attr("height",function(d) {return d*20;})

means that the height will be 20 times the value of the underlying data element (exactly what we said above).
In protovis, we could position a mark relative to any corner of its parent, so we had both a .top and a .bottom method. But in SVG, objects are positioned relative to the top-left corner. So when we specify the y position, it is relative to the top of the document, not to the bottom of the chart.
so –

.attr("y", function(d) {return 100-d*20;})

If we use scales (see the next post), none of this will matter anyway.
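To check the geometry without a browser, we can evaluate both accessors over the data array in plain JavaScript (just a sanity-check sketch; d3 itself applies them element by element when setting attributes):

```javascript
// The same data and accessor functions used in the attr() calls above.
var data = [1, 4, 3, 2, 5];
var height = function(d) { return d * 20; };
var y = function(d) { return 100 - d * 20; };

// For every bar, y + height equals 100: all bars share the same baseline,
// 100 pixels from the top of the SVG document.
var baselines = data.map(function(d) { return y(d) + height(d); });
console.log(baselines); // → [100, 100, 100, 100, 100]
```

This makes the coordinate flip explicit: since SVG measures y downward from the top, growing a bar means moving its top edge up by the same amount.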
Finally, there is an attribute here which doesn't depend so much on the value of the data as on its rank among the data items: the x position.
For this, we write: function(d,i) {return i*20;}
Here is a fundamental difference with protovis. In protovis, when we passed a second argument to such a function, it meant the data of the parent element (grandparent for the third argument, etc.). In d3, the second argument is the position of the data element in its array. By convention, we write it i (for index).
And since you have to know: there is no easy way to retrieve the data of the parent element.
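The (d, i) signature behaves just like the callback of JavaScript's own Array.prototype.map, which makes the x accessor easy to reason about outside of d3 (a plain-JavaScript illustration, not d3 code):

```javascript
var data = [1, 4, 3, 2, 5];

// d3 calls the accessor with (datum, index), just as Array.map
// calls its callback with (element, index).
var x = function(d, i) { return i * 20; };

var positions = data.map(x);
console.log(positions); // → [0, 20, 40, 60, 80]
```

Note that the datum d is ignored here: the bars are laid out purely by rank, 20 pixels apart.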

Bonus: understanding selections

To add many elements at once we’ve used the sequence: selectAll, data, enter, append.
Why use 4 methods for what appears to be one elementary task? If you don’t care about manipulating nodes individually, for instance for animations, you can just remember the sequence. But if you want to know more, here is what each method does.

selectAll

the selectAll method
First, you select a point at which to add your new graphical objects. When you are creating objects, the selectAll method returns an empty selection anchored at that point. You may also use selectAll in other contexts, for instance to update existing objects; here, though, an empty selection is expected.

data

the data method
Then, you bind data. This works much like in protovis: d3 expects an array. d3 takes this further (with data joins), but you need not concern yourself with that until you look at transitions.
Anyway, at this stage you have an empty selection, anchored at a given point in the page, but with data.
Anyway, at this stage you have an empty selection, based on a given point in the page, but with data.

enter

the enter method
The enter method updates the selection with nodes which have data, but no graphical representation. Using enter() is like creating stubs where the graphical elements will be grafted.

append

the append method
Finally, by appending we actually create the graphical objects. Each is tied to one data element, so it can be further styled (for instance, through “attr”) to derive its characteristics from the value of that data.
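The four steps can be modeled in plain JavaScript. This is only a conceptual sketch of d3's default by-index join, not its actual implementation: selectAll gathers the existing nodes, data pairs them with the array, enter yields the data items that have no node yet, and append creates a node for each of those.

```javascript
// A toy model of the data join, assuming d3's default by-index pairing.
function dataJoin(existingNodes, data) {
  return {
    // update: data items that already have a graphical node
    update: data.slice(0, existingNodes.length),
    // enter: data items with no graphical representation yet (the stubs)
    enter: data.slice(existingNodes.length)
  };
}

// On an empty selection (no rect yet), every datum lands in enter,
// so append will create five rectangles.
var join = dataJoin([], [1, 4, 3, 2, 5]);
console.log(join.enter.length); // → 5

// Had two rects already existed, only the last three data items
// would need new nodes.
console.log(dataJoin(["r1", "r2"], [1, 4, 3, 2, 5]).enter); // → [3, 2, 5]
```

This also explains why selectAll("rect") is called before any rect exists: it establishes the (empty) set of existing nodes against which the data array is joined.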

 

Plotter: a tool to create bitmap charts for the web

In the past couple of months, I have been busy maintaining a blog for OECD: Factblog.

The idea is to illustrate topics on which we work by a chart which we’ll change regularly. So in order to do that, I’d have to be able to create charts of publishable quality.

Excel screenshots: not a good option

There are quite a few tools for creating charts on the net. Despite this, the de facto standard is still a screenshot of Excel, a solution used even by the most reputable blogs.

excelinblog

This is taken from http://theappleblog.com/2009/12/18/iphone-and-ipod-touch-see-international-surge/

But alas, Excel is not fit for web publishing. First, you have to rely on Excel's choice of colours and fonts, which won't necessarily agree with those of your website. Second, you can't control key characteristics of your output, such as its dimensions. And if your chart has to be resized, it will get pixelated. Clearly, there is a better way to do this.

That's a detail of the chart on the link I showed above. The letters and the data bars are not as crisp as they could have been.

How about interactive charts?

Then again, the most sensible way to present a chart on the web is by making it interactive. And there is no shortage of tools for that. But there are just as many issues.
Some come from the content management system or blogging environment. Many CMS don’t allow you to use javascript and/or java and/or flash. So you’ll have to use a technology which is tolerated by your system.

Most javascript charting solutions rely on the <CANVAS> element.  Canvas is supported by most major browsers, with the exception of the Internet Explorer family. IE users still represent roughly 40% of the internet, but much more in the case of my OECD blog, so I can’t afford to use a non-IE friendly solution. There is at least one library which works well with IE, RaphaelJS.
Using Java causes two problems. First, the hiccup caused by the plug-in loading is enough to discourage some users. Second, it may not come through well in feed readers:

This is how one of my posts reads in Google Reader.

And it's futile to believe that readers will read blogs from their home pages. So if not all readers can see the chart properly, it's a show-stopper.

A tool to create good bitmap charts

So, in a variety of situations the good old bitmap image is still the most appropriate thing to post. That’s why I created my own tools with Processing.

plotter windows

plotter mac OS X

plotter linux

Here’s how it works.

When you unzip the files, you get a file called "mychart.txt", which contains a set of parameters. Edit this file to your liking, following the instructions in "instructions.txt", then launch the tool (the plotter application). It will generate an image called "mychart.png".

The zip files contain the source code, which is also found here on my openprocessing account.

With my tools, I wanted to address two things. First, I wanted to be able to create a chart and have precise control over all of its components, especially the size. In Excel, by contrast, it's difficult to control the size of the plotting area or the placement of the title – all of these things are done automatically and are difficult to correct (when correcting them is possible at all). Second, I wanted to be able to create functional thumbnails.

If you have to create smaller versions of a chart from a bigger image, the easiest solution is to resize the chart using an image editing software. But that’s what you’d get:

That's the original chart.

And that's the resized version. Legible? Nah.

But what if it were just as easy to re-render the chart at a smaller size as to resize it with an external program? My tool can do that, too.

Left: resized, right: re-rendered.

Here's a gallery of various charts done with the tool. The tool supports line charts, bar charts (both stacked and clustered), dot charts and area charts. No pie charts included. It's best suited for simple charts with few series and relatively few data points.

Impact of energy subsidies on CO2 emissions

Temperature and emission forecasts

Greenhouse gas emission projections

I hope you find it useful, tell me if you do and let me know if you find bugs.

 

Review of Tableau 5.0

These last two weeks, I finally found time to give Tableau 5.0 a try. Tableau enjoys a stellar reputation in the data visualization community. About a year ago, I saw a live demo of Tableau by CEO and salesman extraordinaire Christian Chabot. Like most of the audience, I was very impressed, not so much by the capabilities of the software as by the ease and speed with which insightful analysis seemed to emerge from bland data. But how does it feel from the user's perspective?

Chartz: ur doing it wrong

Pretty much everyone who has written about charts agrees that the very first step in making one is deciding what to show. The form of the display is a consequence of that choice.

Most software gets this wrong: it asks you how you want your display to look, then asks for your data. Take this screenshot from Excel:

excel

When you want to insert a chart, you must first choose a kind of chart (bar, line, column, pie, area, scatter, other charts) and one of its sub-types. You are not asked what data this applies to, or what that data really is. You are not asked what you are trying to show through your chart – that is something you have to manage outside of the software. You just choose a chart.

I'm picking on Excel because, with 200m users, everyone will know what I'm talking about, but virtually all software packages ask the user to choose a rather rigid chart type as a prerequisite to seeing anything, despite overwhelming theoretical evidence that this approach is flawed. In Excel, as in many other packages, there is a world of difference between a bar chart and a column chart. They are not of the same nature.

A reversed perspective

Fortunately, Tableau does it the other way round. When you first connect with your data in Tableau, it distinguishes two types of variables you can play with: dimensions and measures. And measures can be continuous or discrete.

tableau-dimensions

(This is from an example file.)

Then, all you have to do is to drag your dimensions and your measures to the center space to see stuff happening. Let’s drag “close” to the rows…

tableau-dragging-1

We already see something – not terribly useful yet, but still. Now if we drag Date into the columns…

tableau-dragging-2

Instant line chart! The software found out that this is the type of representation that makes the most sense in this context. You're trying to plot a continuous variable over time, so it's pretty much a textbook answer. Let's suppose we want another display: we can click on the aptly named "Show Me!" button, and:

tableau-show-me

These are all the possible representations we have. Some are greyed out, because they don’t make sense in this context. For instance, you need to have dimensions with geographic attributes to plot things on a map (bottom left). But if you mouse over one of those greyed out icons, you’ll be told why you can’t use them. So we could choose anything: a table, a bar chart, etc.

A simple thing to do would be to switch rows and columns. What if we wanted to see date vertically and the close horizontally? Just drag and drop, and:

tableau-flip

Crafting displays

Gone are the frontiers between artificial "chart types". We are no longer forcing data into preset representations; rather, we assign variables (or their automatic aggregations, more on that shortly) to possible attributes of the graph. Rows and columns are two such attributes – they shouldn't be taken too literally, as in most displays they would be better described as abscissa and ordinate – but all the areas in light grey (called "shelves") can welcome variables: pages, filters, path, text, color, size, level of detail, etc.

more-dimensions

Here's an example with a more complex dataset. Here, we're looking at sales figures, plotting profit against sales. The size of each mark corresponds to the volume of the order, and the colour to its category. Results are presented year by year, and it is possible to loop through the years. So this display replicates the specs of the popular Trendalyzer / motion chart tool, only simpler to set up.

Note that as I drag variables to shelves, Tableau often uses the aggregation it thinks makes the most sense. For instance, as I dragged Order Date to the Pages shelf, Tableau picked the year part of the date. I could ask the program to use every value of the date; the display would be almost empty, but there would be a screen for each day. Likewise, when I dragged Order Quantity to the Size shelf, Tableau chose to use the sum of Order Quantity instead. Not that it makes much of a difference here, as each bubble represents only one order. But the idea is that Tableau will automatically aggregate data in a way that makes sense to display, and that this can always be overridden.

But if I keep the data for all the years in the display, I can quickly see the transactions where profit was negative.

sets1

And I can further investigate this set of values.

So that’s the whole idea. Because you can assign any variable to any attribute of the visualization, in the Tableau example gallery you can see some very unusual examples of displays.

Using my own data

When I saw the demos, I was a little skeptical of the data being used. I mean, things were going so smoothly, evidence seemed to be jumping at the analyst, begging to be noticed. Tableau’s not bad at connecting with data of all forms and shapes, so I gave it a whirl with my own data.

Like a lot of other official data providers, OECD’s format of choice for exporting data is SDMX, a flavor of XML. Unfortunately, Tableau can’t read that. So the next easiest thing for me was Excel.

I'm not going to get too much into details, but coming up with a worksheet that Tableau liked, with more than a few tidbits of data, required some tweaking and some guessing. The best approach seems to be: one column per variable, dimensions and dates included, and don't include missing data (which we usually represent by ".." or a similar symbol).

Some variables weren't automatically recognized for what they were: some were detected as dimensions when they were measures, and date data wasn't processed that well (I found that using 01/01/2009 instead of 2009 or 1/2009 worked much better). But again, that was nothing a little tweaking couldn't overcome.

On a few occasions, I scratched my head quite hard trying to understand why I could get Y-o-Y growth rates for some variables but not for others, or how to make custom calculated fields. Note that there are plenty of online training videos on the website. I found myself climbing the learning curve very fast (and have heard similar statements from recent users who quickly felt empowered), but I am aware that practice is needed to become a Tableau Jedi. What I found comforting is that without prior knowledge of the product, but with exposure to data design best practices, almost everything in Tableau seems logical and simple.

But anyway – I was in. Here’s my first Tableau dashboard:

my-dashboard

A dashboard is a combination of several displays (sheets) in one space. And believe me, it can become really sophisticated, but here let's keep it simple. The top half is a map of the world with bubbles sized after the 2007 population of OECD countries. The bottom half is the same information as a bar chart, with a twist: the colour corresponds to the population change over the last 10 years. So the USA (green) has been gaining population while Hungary has seen its numbers decrease.

I've created an action called "highlighting on country" to link both displays. The best feature of these actions is that they are completely optional: if you don't want linked displays, that is entirely up to you, and each part of the dashboard can behave independently. You can also bring in controls to filter or animate the data, which I left out for the sake of simplicity. You can still select data points directly to highlight them in both displays, like this:

my-dashboard-highlight-bottom

Here I've highlighted the top 5 countries. The other ones are muted in both displays. My colour choice here is unfortunate, because Japan and Germany, which are selected, don't look too different from the other countries. Now I can select the values for the countries of Europe:

my-dashboard-highlight-europe

And you’ll see them highlighted in the bottom pane.

Display and style

Representing data in Tableau feels like flipping through the pages of a Stephen Few book, which is more than coincidental, as he is an advisor to Tableau. From my discussion with the Tableau consultant who called me, I gather that Tableau takes pride in its sober look and feel, which fervently follows the recommendations of Tufte and Few. I remember a few posts from Stephen's blog where he lashed out at business intelligence vendors for their vacuous pursuit of glossiness over clarity and usefulness. Speaking of Few, I complemented my Tableau trial by re-reading his previous book, Information Dashboard Design, and I could really see where his philosophy and that of Tableau clicked.

So there isn't anything glossy about Tableau. Yet the interface is state-of-the-art (no more, no less). Anyone who has used a PC in the past 10 years can use it without much guessing. The colours of the various screen elements are carefully chosen, and command placement makes sense. Most commands are accessible from contextual menus, so you really feel that you are directly manipulating data the whole time.

When attempting to create sophisticated dashboards, I found it difficult to make many elements fit on one page, as the white space surrounding all elements becomes incompressible. I tried to replicate displays that I had made or seen around; I was often successful (see the motion chart reproduction above), but sometimes I couldn't achieve in Tableau the level of customization possible with visualizations coded from scratch. Then again, even Tableau's simplest representations have many features and would be difficult to re-code.

Sharing data

According to Dan Jewett, VP of product development at Tableau,

“Today it is easier to put videos on the Web than to put data online.”

But my job is precisely to communicate data, so I'm quite looking forward to this state of affairs changing. Tableau's answer is twofold.

The first half is Tableau Server, a piece of software that organizes Tableau workbooks for a community so they can be accessed online, from a browser. My feeling is that Tableau Server is designed to distribute dashboards within an organization, less so to share them with anyone on the internet.

That's where the second part of the answer, Tableau Public, comes into play. Tableau Public is still in closed beta, but the principle is that users would get a free desktop application which can do everything Tableau Desktop does, except save files locally. Instead, workbooks would have to be published on Tableau's servers for the world to see.

There are already quite a few dashboards made by Tableau Public's first users around. See for instance How Long Does It Take To Build A Technology Empire? on one of the WSJ blogs.

Today, there is no shortage of tools that let users embed data online without technical manipulations. But there is no product that comes close to this embedded dashboard. Stephen McDaniel from Freakalytics notes that thanks to Tableau's technical choices (JavaScript instead of Flash), dashboards from Tableau Public can be viewed on a variety of devices, including the iPhone.

I’ve made a few dashboards that I’d be happy to share with the world through Tableau Public.

This wraps up my Tableau review. I can see why the product has such an enthusiastic fan base. People such as Jorge Camoes, Stephen Few, Robert Kosara, Garr Reynolds, Nathan Yau, and even Federal CIO Vivek Kundra have all professed their love for the product. The Tableau Customer Conference, which I've only been able to follow online so far, seems to get more interesting each year. Beyond testimonials, the gallery of examples (at http://www.tableausoftware.com/learning/examples – do explore from there to see videos and white papers), still in the making, shows the incredible potential of the software.

 

New data services 2: Wolfram|alpha

In March this year, überscientist Stephen Wolfram, of Mathematica fame, revealed to the world that he was working on something new, something big, something different. I first heard of this through semantic web prophet Nova Spivack, who is not known to get excited by less-than-revolutionary projects. That, plus the fact that the project was announced so shortly before its release, built anticipation to huge levels.

wolframalpha

Wolfram|Alpha describes itself as a "computational knowledge engine" or, simply put, an "answer engine". Like Google and other search engines, it tries to provide information based on a query. But while search engines simply try to retrieve the keywords of the query in their indexed pages, the answer engine tries to understand the query as a question and to form an educated answer. In this sense, it is similar to the Freebase project, which aims to put all the knowledge of the world in a database where links can be established across items.

It attempts to detect the nature of each word of the query. Is that a city? A mathematical formula? Foodstuff? An economic variable? Once it understands the terms of the query, it gives the user all the data it can to answer.

Here for instance:

wolframalpha-2

Using the same find–access–process–present–share framework as before: Wolfram|Alpha has "find" covered. More about that below.

It lets you access the data. If data has been used to produce a chart, then there is a query that will retrieve those bare numbers in table format.

Processing is perhaps Wolfram|Alpha's forte. It will internally reformulate and rework your query to produce every meaningful output within its capabilities.

The presentation is excellent. It is very legible, consistent across the site, efficient and unpretentious. When charts are provided, which is often, they are small but relevant and informative; only the necessary data are plotted. This is unusual enough to be worth mentioning.

Wolfram|alpha doesn’t allow people to share its outputs per se, but since a given query will produce consistent results, users can simply exchange queries or communicate links to a successful query result.

Now back to finding data.

When a user submits a query, the engine does not query external sources of data in real time. Rather, it uses its internal, Freebase-like database, which in turn is updated from external sources when possible.

For each query, sources are listed. Unfortunately, the data sources provided cover only general categories. For instance, for all country-related information, the listed sources are the same: some are accurate and dependable (national or international statistical offices), others less reliable or verifiable (such as the CIA World Factbook, or what is cited as "Wolfram|Alpha curated data, 2009"). To me, that is the big flaw of this otherwise impressive system.

Granted, coverage is not perfect; that can only improve. The syntax is not always intuitive – getting some results to appear in a particular way can be very elusive. But this, too, will gradually get better over time. Whether the data presented can be verified, however, is a binary matter: either it is possible or it is not. I'm really looking forward to this changing.

 

Junk charts

The enemy.

I’d like to write a few words about blogs I read regularly and to start this series, I am happy to talk about Junk Charts. 

Junk Charts is Kaiser Fung's one-man crusade against ineffective charts; he has been running the blog since 2005. Kaiser is a committed follower of Tufte, to whom he owes his blog's name: Tufte describes chartjunk as whatever clutters a graph and obscures the information contained in its data.

To this end, Kaiser collects ineffective, misleading or just plain wrong graphs and redraws them according to sound principles. He fights a current trend that places too much emphasis on the aesthetics of a published graph, at the expense of its meaning.

He is now assisted by readers who gladly submit candidates for redesign. I am grateful that no OECD chart has ever had the honors of Junk Charts, although a couple of charts which were direct copies of ours have been featured, such as this one.

In 3 years, Junk Charts has amassed quite a diverse community of readers, and posts are always followed by a healthy discussion – just what blogging is about. I have found great tips and ideas both in the articles and the comments.

So thank you Kaiser for Junk Charts and I am always looking forward to the next post!