Designing data visualizations

Noah Iliinsky and O’Reilly were kind enough to send me a review copy of Noah’s book, and a review copy calls for a review, so here goes.

We need more introductory books to data visualization.

I’ve had several discussions with data visualization colleagues who feel that there are too many books already. I strongly believe otherwise.

As of this writing, there are 59 books tagged data visualization on Amazon, versus well over a thousand for Java (for example). Of those 59, I would say about a dozen qualify as introductory. Here are 3 reasons why introductory books are important.

  • You only need to know a little to start making effective visualizations. A small book won’t teach you all there is to know about visualization, but you don’t need that to get off to a good start. A lot of this has to do with asking yourself the right questions. But this is a very unnatural thing to do, especially when you feel you can do stuff. Fortunately, even a short book can help you to pause and think.
  • An effective visualization is not harder to make than a poor one. Well, actually it is: really good visualizations are built after many iterations on one promising concept. But the point is, a lot of effort and resources can go into abysmal visualizations. If you are in a position to buy visualization, having even basic knowledge of how data visualization works can prevent you from wasting your money.
  • There are many approaches to visualization. The right introductory book will be the one that resonates with you. Some people who are interested in this love to code, some are afraid of programming. Some are accomplished visual artists, some don’t know how to draw. Some have specific needs (business dashboards, presentations, interactive web applications, etc.).

Where does Designing Data Visualizations fit?

Designing Data Visualizations is a very short book – the advantage is that you can read this in a couple of hours. It’s perfect for a train or plane trip for instance. The format of the book (23 x 17.5 cm, flexible paperback) makes it easy to carry and read anywhere. And it’s an easy read – you won’t need to put down the book every few pages to make sure you understood.

The flipside of this is that you won’t learn any actionable skills from the book. The book never tries to teach you to make things: this is explicitly outside of its scope. What it does is make you think about how to do stuff. It makes you consider the choices you make.

So you’re making a visualization. Does your choice of representation make sense? How about your colors? Placement? If you’re not confident that you know the answer to these kinds of questions, you must read the book right now; otherwise, you won’t be able to improve your work. And again, that is what successful designers do: iterate and improve, again and again and again.

As a non-native speaker of English, one reason why I enjoy reading introductory books is their excellent formulation of things. You know, there are those things you have a vague idea of, and the writer puts the exact words on them. So I’ll go ahead and quote my favorite paragraph:

Consult [your goal] when you are about to be seduced by the siren song of circular layouts, the allure of extra data, the false prophet of “because I can”. These are distractions on your journey. As Bruce Lee would say, “It is like a finger pointing a way to the moon. Don’t concentrate on the finger or you will miss all that heavenly glory”.

Who is this book for?

I think the people who would benefit the most from the book fall into two categories:

  1. Those who know absolutely nothing about visualization but have some interest in the subject, and especially the subset of those who don’t really have time to find out all about it (think: your client, your n+2 boss). They will appreciate that there is real takeaway value in such a short book.
  2. Those who can create visualizations because, for instance, they are coders, designers, Excel users, etc., and who see data visualization as a byproduct of their activity, so they never really asked themselves those questions. And among those, I’m thinking mostly of coders. Noah and I met at last year’s Strata conference, which is also attended by the cream of the crop of data scientists. I was surprised to see that some of them, despite being able to harness huge quantities of data, were severely limited in their visualization options because they never had an opportunity to learn. These people, who are already at ease with the tools, will see their activity supercharged thanks to the book.

For a data practitioner who already has an interest in theory, I won’t lie to you: reading the book will feel like patting yourself on the back, and there is little you will learn. But consider, for instance, giving copies to your customers, and think of all the fruitless discussions this will save you in the course of a project.

Hollywood + data III: our info+beauty awards entry. Bonus: making of.

So Jen and I released our Info+beauty awards entry.

How did we end up with this?

It’s really cool working with movies, because it’s something we can all relate to.

A part of my movie ticket stubs stash.

At first I wanted to do something with keywords we could grab from the movies, but Jen came up with another idea I found more worth pursuing: working around the story types (which was the most interesting aspect of the curated contest dataset) to see if there was some kind of grand truth we could uncover there. She also requested stars and glitter, because we were not going to work on this glamorous dataset with a tedious dashboard done in Excel.

That truth didn’t take long to find: the most frequently used story types (like comedy or movies with monsters) do not perform well at the box office, while other story types (stories of teens growing up, or when the main character turns into something else), which are used less often, are much more profitable. So why doesn’t Hollywood make more Junos and Black Swans and fewer College Road Trips or Dylan Dogs?

That’s the idea. Now the making.

Fair warning – the rest of this post is fairly technical. 

Making stars

If I was to contribute significantly to the project, it had to be done in d3/svg.

Fortunately, it’s easy to generate star shapes in d3. Once you have the coordinates of where the points of one unitary star should be, you can easily make stars of any size with a function and a parameter.

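var c1=Math.cos(.2*Math.PI),c2=Math.cos(.4*Math.PI),
    s1=Math.sin(.2*Math.PI),s2=Math.sin(.4*Math.PI),
    r=1, // outer radius of the unitary star

    // ok the constant after r1 is the thickness of the branches.
    // 1 is a "straight" star, less is narrower, more is thicker.
    // (the constant and the point list below are reconstructed; 1 is an assumed value.)
    r1=1*r*c2/c1,

    // this is a list of the pair of coordinates of the points that make a star.
    star=[[0,-r],[r1*s1,-r1*c1],[r*s2,-r*c2],[r1*s2,r1*c2],[r*s1,r*c1],
          [0,r1],[-r*s1,r*c1],[-r1*s2,r1*c2],[-r*s2,-r*c2],[-r1*s1,-r1*c1]],
lineStar=function(k) {
	var line=d3.svg.line()
		.x(function(d) {return d[0]*k;})
		.y(function(d) {return d[1]*k;});
	return line(star)+"Z"; // this will stitch everything together.
};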

Now, running lineStar(10) will return the path description of a star with a radius of 10.
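For readers without d3 at hand, here is a minimal, dependency-free sketch of the same idea in plain JavaScript. The names starPoints and starPath are hypothetical, the default thickness of 1 is an assumption (a plain five-pointed star), and the number formatting will not match d3’s output exactly.

```javascript
// Hypothetical helper: the 10 points of a unit star (outer radius 1),
// alternating outer and inner points, clockwise from the top tip.
function starPoints(thickness) {
  // Inner radius of a "straight" star, scaled by the thickness constant.
  var inner = thickness * Math.cos(0.4 * Math.PI) / Math.cos(0.2 * Math.PI);
  var points = [];
  for (var i = 0; i < 10; i++) {
    var radius = i % 2 === 0 ? 1 : inner;
    var angle = i * 0.2 * Math.PI; // 36 degrees per step
    // SVG's y axis points down, hence the minus sign on y.
    points.push([radius * Math.sin(angle), -radius * Math.cos(angle)]);
  }
  return points;
}

// Hypothetical helper: scale the unit star by k and emit an SVG path string.
function starPath(k, thickness) {
  var t = thickness === undefined ? 1 : thickness; // assumed default
  var scaled = starPoints(t).map(function (p) {
    return (p[0] * k).toFixed(2) + "," + (p[1] * k).toFixed(2);
  });
  return "M" + scaled.join("L") + "Z"; // "Z" closes the outline
}
```

Calling starPath(10) gives a closed path whose first point is the top tip of the star, 10 units above the origin.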


Placing, moving (and spinning) the stars

The next idea was placing the stars.

And for this we need two things: being able to position them somewhere, and being able to move them easily from point A to point B, ideally with some cool effect in between.

So, it would be possible to change the x and y attributes of the path, but each would have to be dealt with separately with a different function call. I found it a better approach to rely on the transform attribute and translate. Each time I want to position a star somewhere, I need it to be set at an x and y coordinate, which will always correspond to either the data of the star, or that of a group above it. For instance, a star corresponding to a movie will need to be at the position corresponding to the data of that movie, or that of the story type above it if it’s still collapsed, or that of the high-level grouping of story types if that’s collapsed.

Now all of the data structures for that are arrays of objects which all have x and y keys. In other words, for any star-shaped object, I can always expect the underlying datum d to have d.x and d.y values. So, I wrote a function translate(d) which works on those 2 properties. And as a result, when I need to position any object all I have to write is:

.attr("transform",translate)

and the object will be positioned according to its underlying data. (This is equivalent to writing .attr("transform",function(d) {return translate(d);}) )

If I need them to be elsewhere, i.e. at the position of their parent, I can pass the data of that parent as an argument, for instance:

.attr("transform",function(d) {return translate(structs[d.struct]);})

For a cheap bit of extra action, I’ve added a spinning effect in the translate function. Since translate(d) returns a value for the transform attribute, nobody said it just had to be instructions for translation! So I’ve added a rotate after the translate. The arguments for the rotate function depend on the x and y properties of the argument as well, so when stars move across the screen, the rotate angle changes slightly with each increment of either coordinate, giving the impression of spinning.
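A minimal sketch of what such a function could look like. This is a hypothetical reconstruction, not the entry’s actual code: in particular the angle formula is an assumption, chosen only because it varies with both coordinates.

```javascript
// Hypothetical reconstruction of translate(d).
// d is any datum carrying numeric x and y properties.
function translate(d) {
  // Assumed formula: any angle that changes with x and y produces the spin
  // as the element is interpolated from one position to the next.
  var angle = (d.x + d.y) % 360;
  return "translate(" + d.x + "," + d.y + ") rotate(" + angle + ")";
}

// translate({x: 100, y: 50}) returns "translate(100,50) rotate(150)"
```

Because the rotate comes after the translate in the transform list, the star spins around its own center rather than around the origin of the canvas.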

Explosions, starlets and other effects

Most of the cool things happening in the visualization rely on one very simple principle about d3 transitions: chaining them.
In the code you’ll often find this pattern:

.selectAll("someobject").data(...).enter().append(...) // creates the items
... // sets the initial attributes
.transition() // starts the transition
... // change the attributes
.each("end", function() { // stuff to be done on each item after the transition is over

and within that function, you’ll find either:
another transition which starts exactly when the previous one ends, so for instance opacity can decrease (causing a fading effect):…

or a command to remove the object:

When another transition is called, there can be another one after, then another one, then another one, then eventually the object can be removed (or not).
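A minimal sketch of the fade-then-remove case, assuming the d3 v3 API used elsewhere in this post, where transition.each("end", ...) runs a callback on each element once its transition is over. The function name fadeThenRemove and its parameters are hypothetical, not taken from the entry’s code.

```javascript
// Hypothetical helper: fade every element of a d3 (v3) selection, then discard it.
// `selection` is assumed to be a d3 selection; `ms` is the fade duration.
function fadeThenRemove(selection, ms) {
  return selection
    .transition()               // start a transition on the selection
    .duration(ms)
    .style("opacity", 0)        // fade to fully transparent
    .each("end", function () {
      // In d3 v3, `this` is the DOM element whose transition just ended.
      if (this.parentNode) { this.parentNode.removeChild(this); }
    });
}
```

To chain further, the callback would start another transition on the same element instead of removing it, and that transition would carry its own "end" callback.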

Now you may think of transitions as ways to get one object to change smoothly from state A to state B, like a rectangle moving across the screen. But if you start to think that the objects can be discarded after the transitions, you’ll realize that there is an unbelievable number of things that can be done with them.
For instance, upon clicking on some stars, I am creating another star shape at that same location. Initially it has the same size as the star, but I increase that radius to a large number (1000px) while decreasing its opacity to 0. So it seems that the new star is both exploding and fading. When it has become transparent, I remove it.

gStructs.append("svg:path") // here I'm creating a "path" shape
.style("stroke","none") // with no outline
.style("fill",colorXp)  // with the fill color of the explosion
.style("opacity",0.2)  // and a low opacity to start with (translucent)
.attr("d",lineStar(d.size[sizeAxis])) // I give it the shape of a star and the size of the
                                      // star that's being clicked
.attr("transform",translate(d)) // and I position it on that star

.transition() // action!

.duration(500)	// a 500ms transition. Long enough to see the effect.
.attr("d",lineStar(1000)) // the star expands to a radius of 1000.
.style("opacity",0) // while fading to transparency.

.each("end",function() {d3.select(this).remove();}) // and when it's done - it's removed.

Changing axes

In this visualization I let the user change what’s plotted along the axes. It’s not very difficult to do, but it’s a hassle to do it late in the project, as was our case, because it requires a lot of housekeeping. This is really about the data structures that support our items. Instead of having just one value for x, y and size, they have an object with several keys, one per axis. Then we maintain one variable per axis type, so everywhere we would write d.x, we instead write d.x[xAxis].
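A minimal sketch of that data structure. The record, its keys and its values are made up for illustration; only the d.x[xAxis] access pattern and the sizeAxis variable name come from the post.

```javascript
// Hypothetical datum: one value per selectable axis instead of a single number.
var movie = {
  title: "Example movie",               // made-up record for illustration
  x: { budget: 120, year: 340 },        // one position per x-axis choice
  y: { gross: 80, profitability: 260 }, // one position per y-axis choice
  size: { budget: 12, gross: 20 }       // one radius per size choice
};

// One variable per axis type holds the current choice.
var xAxis = "budget", yAxis = "gross", sizeAxis = "gross";

// Everywhere the code would read d.x, it reads d.x[xAxis] instead.
function position(d) {
  return { x: d.x[xAxis], y: d.y[yAxis], size: d.size[sizeAxis] };
}
```

Changing an axis then comes down to reassigning one variable (for instance xAxis = "year") and transitioning every item to its new position.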

So when there is an axis change, of course, we do a transition so that the stars and everything move smoothly to their new position. But what if the objects were already moving? When an unplanned transition interferes with an ongoing one, the results are often ugly, especially if the current transition had chained transitions waiting to be triggered. In other words, this will leave a mess.

The way I’ve dealt with this is by keeping tabs on the number of transitions going on at any given time. The axis change could only occur if no other transition was taking place; otherwise the request was simply denied. There are other ways to do that, like a queue of actions, but this seemed a simple and adequate way to deal with it.
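A minimal sketch of such a guard, under the assumption that every transition reports when it starts and when it ends. All the names here (running, transitionStarted, transitionEnded, requestAxisChange) are hypothetical.

```javascript
// Hypothetical bookkeeping: count the transitions currently in flight.
var running = 0;

function transitionStarted() { running += 1; }
function transitionEnded() { running = Math.max(0, running - 1); }

// Returns true if the axis change went ahead, false if it was denied.
function requestAxisChange(applyChange) {
  if (running > 0) { return false; } // something is still moving: deny
  applyChange();
  return true;
}
```

In the visualization, transitionStarted() would be called when a transition is created and transitionEnded() from its "end" callback, so that a chain of transitions only releases the guard once its last link is done.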

Bootstrap and Google fonts

This was the first non-trivial project where I used Bootstrap, and I’m just never going back. Bootstrap simply removes all the hassle of arranging the elements of a visualization on a screen and is very easy to use. Plus, it comes with sensible options for buttons, forms, and the like. Since the contest it has evolved faster than a Pokémon: for instance, it is now possible to specify custom colors in a form and Bootstrap will generate the appropriate CSS files. Google fonts are another great help, as they make it very easy to choose among a relatively large number of fonts without relying on all users having those fonts on their computers.

Wrapping it up

There are a lot of other hacks in the code which you are welcome to explore. I admit I don’t remember them all because I took too much time to write this blog post after creating the entry (bad). However, if there is one point you would like me to explain, please ask in the comments.
I’m not entirely sure what happened when I submitted the entry, though. First it wasn’t listed with the others, then I got a message saying it hadn’t been reviewed, so it didn’t win anything, yet some time after the prizes had been handed out it appeared in the “shortlisted” visualizations for the contest (which I found by accident). So whether or not it was good, I’ll let you guys judge; at any rate it was fun to make.