Selections in d3 – the long story

This past week, Scott Murray and I presented a tutorial at Strata on d3 (of all things!).
First things first, you probably want to get Scott’s book on the subject when it’s out. I should be translating it into French eventually.
You’re also welcome to the slides and examples of the tutorial, which can be found online. They include my d3 cheat sheet.

We had done a d3 workshop a few months back at Visweek with Jeff Heer. This time around, we changed our approach: we covered less ground and went at a slower pace, but targeted what are, in our opinion, the most troublesome aspects of learning d3: selecting, creating and removing elements.

I learned d3 by deciphering script examples, and in the earliest ones, one construct was ubiquitous: the sequence select / selectAll / data / enter / append.
It does the job, so like everyone else I’ve copied it and reused it very often. It happens to be the most proper way of adding new elements in most cases, but the point is that while learning d3, I (and many people before and after me) copy/pasted it without understanding it deeply. However, copy-pasting something you don’t thoroughly understand is the best way to get errors you don’t understand any better, and it prevents you from accessing the rest of the library’s potential. Conversely, once this is cleared up, you can start “thinking in d3” and easily do many things you might have thought impossible before.

We did the tutorial hands-on, live coding most of the time. To follow along, I invite you to create or open an empty page with d3 loaded (such as this one – the link opens a new tab) and then open the “console” or “web developer tools”, which allow you to type JavaScript statements directly, without having to write and load scripts. Here are the shortcuts to the console:

  • Chrome: Ctrl+Shift+J (Windows), ⌥ ⌘+J (Mac)
  • Firefox: Ctrl+Shift+K (Windows), ⌥ ⌘+K (Mac)
  • Safari: Ctrl+Alt+C (Windows), ⌥ ⌘+C (Mac)
  • IE9+: F12

To make the best of this tutorial, please type the examples. Some tutorials show you impressive stuff and then show you step by step how to do it. This is not one of them. I’ve stuck to very, very basic and mundane things. We’ll only be manipulating HTML elements such as paragraphs, which I assume you have seen before (plot twist: you are reading one at this very moment).
Some of the code snippets don’t work. That’s the idea! I think you can’t progress by merely copying code that works. It’s important that you try out code that looks reasonable but that doesn’t produce the expected result or that causes an error, but then understand why.

Adding simple stuff

Creating elements

Our empty page is, well, empty, so we are going to add stuff.
To create elements, we need the append method in d3, which takes as an argument the type of element to be created, while the html method at the end allows us to specify some text.

So let’s go ahead and type:

d3.append("h1").html("My beautiful text")

and see what happens.

What do we get, and why is that?
In d3, an element cannot appear out of thin air: it must be added to a container. If we don’t specify a container element, we just can’t create anything.
In HTML, most elements can be containers, that is, it’s usually possible to add elements to almost anything. Then again, our template is fairly empty, so we can select the body tag and take it from there:

d3.select("body").append("h1").html("My beautiful text")

We’re in business! As long as there is a sensible place to put them, you can create as much stuff as you like. Since we’re on a roll, why don’t we throw in a few paragraphs (p elements in HTML):

d3.select("body").append("p").html("Look at me, I'm a paragraph.")
d3.select("body").append("p").html("And I'm another paragraph!")
d3.select("body").append("p").html("Woohoo! number 3 baby")

And lo and behold, all our paragraphs appear in sequence. Simply beautiful.
But wait! Paragraphs are containers, too. Why don’t we try to add a span element to one paragraph? For those of you with no HTML knowledge, span elements are like paragraphs, except that by default there is no line break at the end.

So let’s try this:

d3.select("p").append("span").html("and I'm a span!")

Before typing it, take a minute to think where you expect it to go.
Then go ahead and type it.

You may have guessed that our new bit of text would go on a line of its own at the end of the document, or at the end of the last paragraph. Instead, it goes at the end of the first paragraph.
Why is that? Well, our select method stops at the first instance of whatever it tries to find. In our case, since we asked it to find paragraphs (p), it stopped at the first p element it found, and added the span at the end of it (append).

Beyond creating new things

Adding new elements to a page programmatically is kind of useful, but if d3 stopped at that, you probably wouldn’t be so interested in this tutorial to begin with. You can also modify and manipulate elements. We’ve done that to some extent with the html method. But we can also modify the style of the elements, their attributes and their properties. For the time being, don’t bother too much about the difference between these three things. Style refers to the appearance of elements, attributes to their structure, and properties to what can be changed in real time, like values in a form. But again, let’s not worry about that for now and just follow along. Look at this code snippet:

d3.select("p").style("color","red")

This will select the first paragraph and change its style, so that the text color is changed to red.
But wait! Our first paragraph, isn’t that the one with a span at the end of it? What will happen to that bit of text? Well, type the statement to find out.
The whole paragraph, including its children (that is, everything added to it, in our case the span), is turned red.

d3.select("span").style("color","blue")

That singles out our span and writes it in blue. Can this be overturned?

d3.select("p").style("color","red")

That won’t change a thing. Our first paragraph is, in fact, already red. But its child, the span, has a style which overrides that of its parent. To have it behave like the rest, we can remove its style like so:

d3.select("span").style("color",null)

Now it will behave like its parent, the paragraph.
But let’s try something else:

d3.select("span").style("color","blue")

We write our span in blue,

d3.select("span").style("color","green")

and now in green.

d3.select("p").style("color","red")

What will happen?
Well, the paragraph is red, but the span isn’t. It’s still following its specific instruction to be written in green.

This illustrates that children behave like their parents unless they are given specific instructions.
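To picture this rule outside the browser, here is a toy model in plain JavaScript. It is a sketch under our own assumptions, not how browsers or d3 actually compute styles. `makeNode` and `computedColor` are hypothetical helpers, and a node without an explicit color simply takes the color of its nearest ancestor that has one.

```javascript
// Toy model of style inheritance. Nodes are plain objects (hypothetical),
// not DOM elements, and "black" is an assumed default color.
function makeNode(tag, parent = null) {
  return { tag, parent, style: {} };
}

// A node uses its own explicit color if it has one; otherwise it walks up
// to the nearest ancestor that has one.
function computedColor(node) {
  for (let n = node; n !== null; n = n.parent) {
    if (n.style.color !== undefined) return n.style.color;
  }
  return "black";
}

const paragraph = makeNode("p");
const span = makeNode("span", paragraph);

paragraph.style.color = "red";
const inherited = computedColor(span);  // no explicit color on the span yet

span.style.color = "green";             // like d3.select("span").style("color","green")
paragraph.style.color = "blue";
const overridden = computedColor(span); // the span keeps its own instruction

delete span.style.color;                // like d3.select("span").style("color",null)
const restored = computedColor(span);   // back to following the parent

console.log(inherited, overridden, restored); // red green blue
```

Real CSS has more rules than this: not every property is inherited. Background color, for instance, is not. The toy model ignores that.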

For HTML elements, we can play with styles, not so much with attributes or properties. One thing worth noting though is that an element can be given a class or an id.

Classes and ids can be used to style elements using a cascading style sheet (CSS). Knowing how CSS works is entirely optional when learning d3, since d3 by itself can take care of all styling needs. That said, knowing basic CSS is not the most useless of endeavors, and some sensible CSS statements can save a lot of tedious manipulation in d3.
The other use of classes and ids is that they can be used to select elements.

Let’s reload our page so we start from scratch.

d3.select("body").append("p").html("First paragraph");
d3.select("body").append("p").html("Second paragraph").attr("class","p2");
d3.select("body").append("p").html("Third paragraph").attr("id","p3");

Without classes and ids, it’s still possible to select and manipulate the 2nd or 3rd instance of an element, but it’s a chore. You have to use pseudo-classes, like d3.select("p:nth-of-type(2)") to select the 2nd paragraph, for instance.
Personally, I’d rather avoid this and use simpler statements. With classes and ids set, we can write instead:

d3.select(".p2").html("I'm classy");
d3.select("#p3").html("I've got ideas");

To select things of a given class, you must use a period before the name of the class. To select things of a certain id, you must use the hash sign.
Here, we are looking for the first element of the p2 class. This happens to be our 2nd paragraph. When you know you will have to manipulate elements which are not easily accessible, you may as well give them classes which will make this easier down the road.
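d3 passes selector strings to the browser’s own selector engine, so the period and the hash are plain CSS conventions. As an illustration only, here is a toy matcher for the three simple forms used so far. `matches`, `selectFirst` and the element objects are hypothetical, and real CSS matching covers far more than this.

```javascript
// Toy matcher for the three simple selector forms used so far.
// Elements are hypothetical plain objects; real matching is done by the browser.
function matches(el, selector) {
  if (selector.startsWith(".")) return el.classes.includes(selector.slice(1)); // class
  if (selector.startsWith("#")) return el.id === selector.slice(1);            // id
  return el.tag === selector;                                                  // element type
}

const elements = [
  { tag: "p", classes: [], id: null, text: "First paragraph" },
  { tag: "p", classes: ["p2"], id: null, text: "Second paragraph" },
  { tag: "p", classes: [], id: "p3", text: "Third paragraph" },
];

// select-like behaviour: the first element in document order that matches.
const selectFirst = (selector) => elements.find((el) => matches(el, selector));

console.log(selectFirst("p").text);   // First paragraph
console.log(selectFirst(".p2").text); // Second paragraph
console.log(selectFirst("#p3").text); // Third paragraph
```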

In theory, there should only be one element with a given id in a page, so I recommend not using ids dynamically unless you can be 100% sure that there will not be duplicates. And, in case you were wondering, one element can have several (even many) classes.

Two birds, one stone

Introducing selectAll

So far, we’ve changed properties of one element at a time. The exception was when we changed the colors of both a paragraph and a span, but even then, we were still technically only changing the characteristics of one paragraph, which its child, the span, just happened to inherit.

For a complex document, that can be super tedious, especially since we’ve seen that it’s not easy to retrieve an element which is not the first of its kind.

So let’s go ahead and type:

d3.selectAll("p").style("font-weight","bold")

(for a little variety. I mean, changing text color is so 1994.)
What was that? Everything turned to bold!

Indeed: while the select method returns the first element that matches the clause, selectAll matches them all.
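Here is one way to picture the difference, as a toy model in plain JavaScript. The `selectOne` and `selectEvery` helpers and the element objects are hypothetical, and this is not how d3 is implemented. Both helpers return a list, but the first keeps at most one match, so the same styling call reaches one element in one case and all of them in the other.

```javascript
// Toy model: select keeps only the first match, selectAll keeps every match.
// Plain objects stand in for DOM elements; this is not d3's implementation.
const page = [
  { tag: "h1", style: {} },
  { tag: "p", style: {} },
  { tag: "p", style: {} },
  { tag: "p", style: {} },
];

const selectOne = (tag) => page.filter((el) => el.tag === tag).slice(0, 1);
const selectEvery = (tag) => page.filter((el) => el.tag === tag);

// The same kind of styling call, applied to each kind of selection.
selectOne("p").forEach((el) => { el.style.color = "red"; });
selectEvery("p").forEach((el) => { el.style.fontWeight = "bold"; });

const redCount = page.filter((el) => el.style.color === "red").length;
const boldCount = page.filter((el) => el.style.fontWeight === "bold").length;
console.log(redCount, boldCount); // 1 3
```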
Let’s do more.
We’re going to add a span to our first paragraph.

d3.select("p").append("span")
.html("I'm a rebel child.")

We’re adding a gratuitous styling command.
Now, let’s change the background color of all the paragraphs.


As could be expected, the span doesn’t change its background color, and so it appears differently from its parent (which could be a desired effect – this gives us flexibility).
But what if we wanted to change the background color of everything? Can we do better?

d3.selectAll("*").style("background-color","whitesmoke")

(quite fitting in these times of papal conclave)

Well, everything gets a background color of “white smoke” (which is a fine background color, by the way), including the body element: that is, everything on the page!
selectAll("*") matches everything. With it, you can grab all the children, their children, etc. (“descendants”, I know…) of a selection, or, if used directly like so: d3.selectAll("*"), everything on the page.
So we’ve seen we can select moaar. But can we be finer? Can we select the paragraphs and the spans only, without touching the rest?

we sure can!

d3.selectAll("p, span").style("background-color","lawngreen")

The outcome of that one statement probably won’t make it to our web design portfolio, but it does the trick: you can select as much as you like, or as little as you like.

Nested selections

To illustrate the next situation, let’s add a span to our document.

d3.select("body").append("span").html("select me if you can")

Well, just like there is a way to directly select the 2nd paragraph using pseudo-classes, there’s also a (complicated) way to directly select that last span, namely d3.select("body > span"), which only matches spans that are direct children of the body.
There’s also a simpler way, which is what we’re interested in.
Let’s suppose we want to turn it to bold.
We can just do

d3.selectAll("span").style("font-weight","bold");

then change the first one back:

d3.select("p").select("span").style("font-weight",null);

Admittedly, the complicated way is more compact. But conceptually, the “simple” way is easier to follow: we can make a selection, within that selection make a new selection, and so on and so forth. That way, we can get away with just using super simple selectors, as opposed to mastering the intricacies of CSS3 syntax. Do it for the people who will read your source code 🙂
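Here is the same idea as a toy model. The page is a hypothetical tree of plain objects, and `selectFirst` searches a node’s descendants in document order. A nested selection amounts to searching inside the result of a previous search. This is a sketch of the principle, not d3’s code.

```javascript
// Toy tree standing in for the page built so far (hypothetical structure).
const body = {
  tag: "body",
  children: [
    { tag: "p", children: [{ tag: "span", text: "I'm a rebel child.", children: [] }] },
    { tag: "p", children: [] },
    { tag: "p", children: [] },
    { tag: "span", text: "select me if you can", children: [] },
  ],
};

// First descendant with the given tag, searched in document order
// (a node comes before its own children, which come before its next sibling).
function selectFirst(node, tag) {
  for (const child of node.children) {
    if (child.tag === tag) return child;
    const found = selectFirst(child, tag);
    if (found) return found;
  }
  return null;
}

// Nested selection: find a p, then search only inside that p.
const spanInP = selectFirst(selectFirst(body, "p"), "span");

// A plain "span" search from the body also hits the nested span first.
const firstSpan = selectFirst(body, "span");

// The "body > span" idea: look only at the body's direct children.
const bodySpan = body.children.find((child) => child.tag === "span");

console.log(spanInP.text);  // I'm a rebel child.
console.log(bodySpan.text); // select me if you can
```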

At this point:

  • You know how to dynamically create content. Pretty cool!
  • More! you can dynamically change every property of every element of the page. woot!
  • Bonus! you’re equipped with tactics to easily reach any element you want to change.

You should also have a good grasp of d3.select, d3.selectAll and the difference between the two.
What more could you possibly want? Well, since this is about data visualization, how about a way to tie our elements to data? This is what d3 is really about.

Putting the data in data visualization

Introducing data: passing values to many elements at once

So far, we’ve entered “hard-coded” values for all of our variables. That’s fine, but we can’t really set our elements one by one. I mean, we could, but it’s no way to “industrialize” the way elements are created.
Fortunately, d3 provides a way. Its most interesting characteristic is the ability to “bind” elements to data.

If you’ve followed the instructions step by step, you should have 3 paragraphs in the page. Plus a span afterwards, but whatever.
Let’s introduce the data method. This will match an array of values to a selection of elements in the page. Let’s go:

var fs=["10px","20px","30px"];
d3.selectAll("p").data(fs).style("font-size",function(d) {return d;})

wow wow wow what just happened?
First, we create an array of values which we intelligently call fs (for font size).
Then, right after the selectAll("p"), which gathers a selection of elements (3 p elements to be exact), we specify a dataset using the data method.
It just so happens that our dataset has exactly the same number of items as our selection of elements!

Finally, we use style, like we used to, with a twist: instead of providing one fixed value, which would affect our 3 p elements in the same way, we specify a function.
This function is called once for each element, and returns the result of an operation on the corresponding data point: the result of the function on the first item for the first p element, on the 2nd item for our 2nd paragraph, and lastly on the last item for our last paragraph.
We write the function with an argument: d. What is d? It’s nothing but a convention. We can call it anything. d is standard fare in d3 code because that’s the writing style of Mike Bostock, the author of the framework and of many of its examples.
This function is nothing special: it returns the data item itself, so we are passing “10px” for the font-size of our first paragraph, and so on and so forth (20px, 30px).
As an aside, we can use the String function, which converts any value into a string, instead of writing function(d) {return d;}. So:

d3.selectAll("p").data(fs).style("font-size",String)

would also work and is shorter to write.
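To make the per-element call concrete, here is a toy model of style with a function after data. `applyStyle` and the element objects are hypothetical. In d3, the function also receives the current DOM element as `this`, which this sketch leaves out.

```javascript
// Toy model of style(name, valueOrFunction) after data(): the function is
// called once per element with that element's data item d and index i.
// applyStyle and the element objects are hypothetical, not d3 internals.
function applyStyle(elements, data, name, valueOrFn) {
  elements.forEach((el, i) => {
    const d = data[i]; // index-based pairing: i-th item goes with i-th element
    el.style[name] = typeof valueOrFn === "function" ? valueOrFn(d, i) : valueOrFn;
  });
}

const paragraphs = [{ style: {} }, { style: {} }, { style: {} }];
const fs = ["10px", "20px", "30px"];

applyStyle(paragraphs, fs, "font-size", function (d) { return d; });
const sizes = paragraphs.map((p) => p.style["font-size"]);

// String(d) returns the same strings here, so it works as a shorter accessor.
applyStyle(paragraphs, fs, "font-size", String);
const sizesWithString = paragraphs.map((p) => p.style["font-size"]);

console.log(sizes, sizesWithString); // both hold "10px", "20px", "30px"
```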

Let’s recap what just happened here, because this is important.
We want to apply a dynamic transformation to a bunch of existing elements, as opposed to finding a way to select each individual element, and passing it a hard-coded value.
What’s more, we want to apply a transformation of the same nature, but of a different magnitude, on each of these items.

How to proceed?
well, first we create an array of values. That’s our fs boy over there.

var fs=["10px","20px","30px"];

Then, we will first select all of the elements we want to modify, then we’ll tie our dataset to that selection. This is what selectAll, then data does.

var selection=d3.selectAll("p").data(fs);

By the way, I’ve stored the result of the selectAll then data in a variable. In the original example, I just “chained” the methods, that is, I followed each method by a period and another one. The two syntaxes are equivalent. Chaining works, because each of these methods returns a value which is itself a selection on which further operations can be done. This syntax works well through most of d3 with some exceptions which will be duly noted.

Then, we are going to change the style of the selection, using a function on our data.

selection.style("font-size",function(d) {return d;})

That function will run on each value of our dataset, and return one result per value, which will be passed to all elements in sequence.
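Why the chained and stored-in-a-variable syntaxes are equivalent can be sketched with a tiny class whose methods return an object that supports the next call. `ToySelection` is hypothetical and simplified: it returns itself, whereas d3 methods return selections, sometimes new ones. The chaining principle is the same.

```javascript
// Minimal sketch of method chaining. ToySelection is hypothetical: its
// methods mutate and return `this`, whereas real d3 methods return
// selections (sometimes new ones), but the chaining principle is the same.
class ToySelection {
  constructor(items) {
    this.items = items;
    this.bound = [];
    this.styles = {};
  }
  data(values) {
    this.bound = values;
    return this; // returning a selection is what allows the next ".method()"
  }
  style(name, fn) {
    this.styles[name] = this.items.map((_, i) => fn(this.bound[i], i));
    return this;
  }
}

const fs = ["10px", "20px", "30px"];

// Chained form, as in the original one-liner.
const chained = new ToySelection(["p1", "p2", "p3"]).data(fs).style("font-size", (d) => d);

// Stored-in-a-variable form, as in the recap.
const selection = new ToySelection(["p1", "p2", "p3"]).data(fs);
selection.style("font-size", (d) => d);

console.log(chained.styles["font-size"], selection.styles["font-size"]);
```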

At this stage you may have two questions:

  • Can we use more sophisticated functions, because this one is kind of meh?
  • What happens if there is not the same number of items in the dataset and of elements?

The second question is actually more complicated than the first, but we’ll answer it in painstaking detail.
So let’s take care of the question on functions first.
Yes, obviously, we can use the function not just to return the data item, but to do any kind of calculation that a language such as JavaScript is capable of, which is nearly everything.
To illustrate that, here are some variations of our initial code which will return the same result, but with a different form.

var fs=[10,20,30]; // no more px
d3.selectAll("p").data(fs).style("font-size",function(d) {return d+"px";})

Here, instead of returning just the data item, we append "px" at its end. Sadly, style("font-size",10) doesn’t work, but style("font-size",10+"px"), which is the same as style("font-size","10px"), is valid.

Here is yet another way.

d3.selectAll("p").style("font-size",function(d,i) {return 10*(i+1)+"px";})

function(d,i)? What is this devilry?
Here, i (or anything we want to call it, as long as it’s the 2nd argument of this function) represents the position of the element in the selection, so the first gets a 0, the second a 1, etc. (in our example there are 3 elements, so the last one gets a 2).
This may be a bit abstract, but even if we hadn’t passed data, this would still work: i represents the position of the element, not the data item. So, if no data had been passed, d would be undefined within this function call, but i would still be equal to 0, 1, 2, …
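Here is a plain-JavaScript sketch of that last point. It assumes index-based pairing, as in the examples so far, and all names are hypothetical. With an empty dataset, d is undefined for every element, while i still counts positions and produces the same sizes as the data-driven version.

```javascript
// Toy model of an accessor that uses the index i. With nothing bound,
// every d is undefined, yet i still runs 0, 1, 2 in selection order.
const paragraphs = [{}, {}, {}];
const bound = []; // no data passed

const calls = paragraphs.map((el, i) => {
  const d = bound[i]; // undefined for every element
  return { d, size: 10 * (i + 1) + "px" };
});

// The data-driven variant from earlier: numbers plus a "px" suffix.
const fromData = [10, 20, 30].map((d) => d + "px");

console.log(calls.map((c) => c.size), fromData);
```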

The answer to the second question is the last great mystery of d3. Once you get this, you’re golden.

Creating or removing the right number of elements depending on data

Before we get further, let’s quickly introduce append’s reckless cousin, remove(). Writing remove at the end of a selection deletes all the corresponding elements from the document object model.

d3.selectAll("p").remove()

would remove our 3 paragraphs. Let’s do it and get rid of our paragraphs.
Actually, let’s do

d3.select("body").selectAll("*").remove()

and remove everything below the body.

Earlier, we alluded to what could happen if we didn’t have the same number of elements as items in our dataset.

That means that we should be able to do the following:

  • If there are fewer elements than items in a dataset, create the missing elements
  • If there are fewer elements than items in a dataset, disregard the extra data items
  • If there are more elements than items in a dataset, remove the extra elements
  • If there are more elements than items in a dataset, don’t change the extra elements
  • As data are updated, keep some elements, remove some, add some

Why would we want to do all of this?
The first case is the most common. When we start a data visualization script, chances are that there are no elements yet but there is data, so you’ll want to add elements based on the data.
Then, if you have interaction or animation, your dataset may be updated, and depending on what you intend to do you may just want to update the existing elements, create new ones, remove old ones, etc. That’s when you may want to do 2, 3 or 4.
The last (5th case) is more complicated, but don’t worry, we’ve got you covered.

Right now, we should have 0 p elements on our page (and if for some reason this is not the case, feel free to reload it).

let’s create a variable like so:

var text=["first paragraph","second paragraph","third paragraph"];

somewhat uninspired, I know, but let’s keep typing to a minimum, if you want to go all lyrical please go ahead.

We are smack in case 1: we’d like to create 3 paragraphs, we have 3 items in our dataset, but 0 elements yet.
Here’s what we’ll type:

d3.select("body").selectAll("p").data(text).enter().append("p").html(String)

A-ha! we meet again, select selectAll data enter append.
After all we’ve done, select selectAll should make some sense, even though, at this stage, this selection returns 0 p elements. There are none yet.
Then we pass data as we’ve done before. Note that there are 3 items in our dataset.

Then, we use the enter() statement. What it does is that it prepares one new element for every unmatched data item. We’ll expand a bit later on the true meaning of unmatched, but for the time being, let’s focus on the difference. We have 0 elements, but 3 data items. 3 – 0 = 3, so the enter() selection will prepare 3 new elements.
What does prepare mean? The elements are not created yet at this stage, but they will be with the next command. Right after enter(), think of what’s created as placeholders for future elements (Scott’s vocabulary), or buds that will eventually blossom into full-fledged elements (mine).
After enter(), we specify an append(“p”) command. Previously, when we had used the append method, we created one element at a time. But in this case, we are going to create as many as there are placeholders returned by enter(). So, in our case, 3.
You may legitimately wonder why we needed a select statement to begin with – after all, enter() works on the difference between selectAll and data. But when we are going to append elements, we will need to create them somewhere, to build them upon a container. This is what the first select does. Omit it, and you’ll have an error, because the system will be asked to create something without knowing where.
The final method, html, will populate our paragraphs with text. The String function, which we have already seen, simply returns the content of each item in our dataset.
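To make “unmatched” concrete, here is a toy index-based join. `indexJoin` is a hypothetical helper, not d3’s code. It pairs the i-th data item with the i-th element, which is how matching works here since no key is given. Everything left over goes into enter (data without an element) or exit (elements without data).

```javascript
// Toy index-based data join (hypothetical helper, not d3's code): the i-th
// data item goes with the i-th element, and everything is sorted into
// update (matched), enter (data without an element) and exit (element without data).
function indexJoin(elements, data) {
  const update = [];
  const enter = [];
  data.forEach((d, i) => {
    if (i < elements.length) update.push({ element: elements[i], d });
    else enter.push({ d }); // a placeholder: data waiting for an element
  });
  const exit = elements.slice(data.length);
  return { update, enter, exit };
}

const text = ["first paragraph", "second paragraph", "third paragraph"];

// Empty page: 0 elements and 3 items, so enter holds 3 placeholders.
const join = indexJoin([], text);

// append("p").html(String): each placeholder becomes a real element.
const created = join.enter.map((placeholder) => ({ tag: "p", html: String(placeholder.d) }));

console.log(join.enter.length, join.update.length, join.exit.length); // 3 0 0
console.log(created.map((p) => p.html));
```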

We’re using select > selectAll > data > enter > append, but hopefully you will see why (and if you don’t, hang on to the end of the article, and feel free to ask questions).

But let’s recap once more. Actually, let’s see the many ways to get this wrong (or, surprisingly, right).

d3.selectAll("p").data(text).enter().append("p").html(String)

We’ve alluded to that: without a container to put them in, p elements can’t be created. This will result in a DOM error.

d3.select("body").selectAll("p").data(text).append("p").html(String)

No enter statement. After the selectAll, the selection has 0 items. This doesn’t change after the data method. As such, append creates 0 new elements, and nothing changes in the document (but no error, though).

d3.select("body").data(text).selectAll("p").enter().append("p").html(String)

In many cases in d3, it’s ok to switch the order of chained methods, but that’s not true here. selectAll must come before data. We bind data to elements. The other way round would have made sense, but that’s the way it is: first selectAll, then data. Here, we get an error, because enter() can’t be called directly on the result of selectAll.

d3.select("body").selectAll("wootwoot").data(text).enter().append("p").html(String)

This actually works. Why?
There are actually 0 elements of type “wootwoot” in our document, which may or may not surprise you. There are still 3 items in the dataset, so enter() returns space for 3 new elements. The next append subsequently creates 3 p elements, which are populated by the html method.
It usually makes more sense to use the same selector in the selectAll and the append methods, but that’s not always the case. Sometimes, you will be selecting elements of a specific class, but in an append method, you have to specify the name of an element, not just any selector. So you’d write

d3.select("body").selectAll(".myClass")

and then, after data and enter, append("p") rather than append(".myClass").

Now that we’ve seen a few variations on the subject, here is a really cool use of enter. Check this out:

d3.select("body").selectAll("h1").data([{}]).enter().insert("h1",":first-child").html("My title")

OK, there are 3 things here worth mentioning. Two are just for show, though it doesn’t hurt to know them, but the third one is really neat and useful.
In data, we’ve passed: [{}]. This is an array of one object which is empty. There are two interesting things with that construct, one is that there’s only one element, the other one is that it’s an object. When you pass objects, the functions you run on them (like in the attr or style methods) can be used to add properties to them or change them. If that doesn’t make sense yet, just accept for now that it gives you more flexibility than using, say, [0].
We’ve used insert("h1",":first-child") instead of append. This means that we’re adding things before the first child of our container, not at the end (i.e. after the last child). In other words, our h1 (a title) will go at the top of the body element, which is fitting.

But what’s really interesting is what would happen if you were to run that statement again: nothing. Try it. See?
Why is that? Well, on your first go, at a point where there are no h1 elements yet, it works the standard way: you do a selectAll that returns nothing, you bind a dataset with more items, then enter prepares space for the unmatched data items (1 in our case), and then insert creates that element. You may notice that the html part doesn’t use the data.
When you run it again, the selectAll finds one h1 element, and there’s still one item in the dataset, so enter won’t find any unmatched data item, and the subsequent insert is ignored.

So, you can run this kind of thing in a loop safely, it will only do what it’s supposed to do on the first go, it will be ignored afterwards. Don’t be afraid to use this construct for all the unique parts of your visualization, so you won’t have to worry about creating them multiple times.
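Why a rerun is harmless can be modeled in a few lines. `page` stands in for the body’s children and `ensureTitle` for the select/selectAll/data/enter/insert chain. Both are hypothetical, and only the counting logic mirrors d3.

```javascript
// Toy model of the "create once" idiom. page stands in for the body's
// children and ensureTitle for the select/selectAll/data/enter/insert
// chain; both are hypothetical.
const page = [{ tag: "p" }, { tag: "p" }];

function ensureTitle(title) {
  const existing = page.filter((el) => el.tag === "h1"); // selectAll("h1")
  const data = [{}];                                     // data([{}])
  const unmatched = data.slice(existing.length);         // enter(): items without an element
  unmatched.forEach(() => page.unshift({ tag: "h1", html: title })); // insert at the top
}

ensureTitle("My title");
ensureTitle("My title"); // 1 element, 1 item: nothing unmatched, nothing created

console.log(page.map((el) => el.tag)); // [ 'h1', 'p', 'p' ]
```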

Other cases of mismatch between data items and elements

All right, so now we have 3 p elements and 3 items in our dataset.
What happens if we do this:

var text2=["hello world"];

d3.selectAll("p").data(text2).html(String)

There is now one item in the dataset, versus 3 p elements. Try to make a guess before you type this in. At the tutorial, the audience made a few reasonable guesses, namely: the last 2 paragraphs will be removed and only “hello world” will remain; or: all paragraphs will be changed to “hello world”.
Either could happen if d3 were trying to be smart and guess your intent. Fortunately, d3 is no Excel here, and behaves consistently even if that means extra work for you. When you do that (and please try this now), what happens is that the first paragraph of text is changed and the other two are untouched.

This is the case where we change the matched elements and ignore the others.

By the way, by now you should be able to guess what would have happened if there had been an enter() right after the data. Do I hear… nothing? Almost! There would be no unmatched data item, so enter() would not return anything. Besides, enter() would require an append afterwards to make anything. This is why you’ll get an error: html can’t work directly after enter(). You would need an append.

Now what if we want to remove the extra 2 elements? This is where the exit() method comes into play.
exit() is pretty much to enter() what remove() is to append(). Kind of.

Let’s see how this works by example.

Let’s recreate our 3 p paragraphs just in case:

d3.select("body").selectAll("p").data(text).enter().append("p").html(String)

Now we pass the new dataset:

d3.selectAll("p").data(text2).html(String)

Remember that only the first paragraph has changed; the other two are untouched.
Now, while all the items in the dataset are matched with elements, there are elements which are not matched with an item in the dataset: the last two. This is where exit() comes into play. exit() will select those two paragraphs, so they can be manipulated. Typically, what happens then is a remove(), but you could think of other options.


That will flag them instead of removing them.
But typically, you do:

d3.selectAll("p").data(text2).exit().remove()

Note that even though you have already matched a one-item dataset to that selection, you need to call data again before you can use exit(). selectAll("p").exit() won’t work: you’ll have to re-specify the data match.
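Here is the whole “hello world” sequence as a toy index join in plain JavaScript, using a hypothetical `indexJoin` helper. It is a sketch, not d3’s implementation. The update part gets the new text, enter is empty, and exit holds the two leftover paragraphs, which are then removed.

```javascript
// Toy index join for the "hello world" case (hypothetical helper, not d3).
function indexJoin(elements, data) {
  return {
    update: elements.slice(0, data.length).map((el, i) => ({ el, d: data[i] })),
    enter: data.slice(elements.length),
    exit: elements.slice(data.length),
  };
}

let paragraphs = [
  { html: "first paragraph" },
  { html: "second paragraph" },
  { html: "third paragraph" },
];
const text2 = ["hello world"];

const join = indexJoin(paragraphs, text2);

// .html(String) on the update part only touches the matched element.
join.update.forEach(({ el, d }) => { el.html = String(d); });

// exit() holds the two unmatched elements; remove() takes them off the page.
paragraphs = paragraphs.filter((el) => !join.exit.includes(el));

console.log(join.enter.length, join.exit.length); // 0 2
console.log(paragraphs.map((p) => p.html));       // [ 'hello world' ]
```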

So that takes care of the case where you want to remove extraneous elements.
This leaves us with only one simple case: where you have more items in your dataset than you have elements and you don’t want to create elements for the extra data items.
That’s the simplest syntax, really.

Here, for instance, we have only one paragraph left, but there are 3 items in the text variable.
So let’s do:

d3.selectAll("p").data(text).html(String)

(no enter, no exit, no append).
The paragraph text will now come from the new dataset (from its first item to be precise), no extra paragraphs will be created, none will be deleted.

Data joins

The last case (pass a new dataset, create new elements as needed, make some elements stay and make some elements go) requires more complexity, and I won’t cover it in detail here. Instead, I will explain the principle and refer you to this tutorial on object constancy by Mike Bostock.
In the general case, when you try to match your dataset to your elements, you count them and deal with the difference. So if you have 5 data items and 3 elements, you can make 2 extra elements appear by using enter. With the concept of data joins, you can assign each data item precisely to one given element, so the first data item doesn’t have to go to the first element, etc. The first time, it will, and each element will receive a key, a unique identifier from the dataset. If the dataset is subsequently updated, the element will only be matched if there is an item in the dataset with the same key. Otherwise, it will end up in the exit() selection.
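A toy keyed join illustrates the principle. `keyedJoin` and the `name` property used as a key are hypothetical. In d3, the key function is passed as the second argument of data, and the library also handles details such as duplicate keys and ordering.

```javascript
// Toy keyed data join: items are matched to elements by key, not by position.
function keyedJoin(elements, data, key) {
  const byKey = new Map(elements.map((el) => [el.key, el]));
  const update = [];
  const enter = [];
  data.forEach((d) => {
    const k = key(d);
    if (byKey.has(k)) {
      update.push({ el: byKey.get(k), d });
      byKey.delete(k); // each element matches at most once
    } else {
      enter.push(d);   // no element with this key yet
    }
  });
  const exit = [...byKey.values()]; // elements whose key is gone from the data
  return { update, enter, exit };
}

const key = (d) => d.name;
const elements = [{ key: "a" }, { key: "b" }, { key: "c" }];
const newData = [{ name: "b" }, { name: "c" }, { name: "d" }];

const join = keyedJoin(elements, newData, key);
console.log(
  join.update.map((u) => u.el.key), // [ 'b', 'c' ]
  join.enter.map(key),              // [ 'd' ]
  join.exit.map((el) => el.key)     // [ 'a' ]
);
```

By contrast, matching the same two arrays by position would report nothing to enter and nothing to exit, since both have three entries. The key is what lets "a" leave and "d" arrive.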

And that’s the general gist of it.
At Strata, we went further: we discussed interaction and transitions, but those are downright trivial once you have understood selections (and by that, I mean really understood them, with all the implications and nuances).


d3 tutorial at visWeek 2012

Jeff Heer, Scott Murray and I gave a d3 tutorial at visWeek 2012. You probably gathered that from the title of the post.

Here is a link to all the slides and code examples that we have presented:

d3 tutorial

For the purpose of the tutorial, I compiled a d3 cheat sheet. Four pages long, it groups some of the most common d3 functions. When I was learning d3, my number one problem was figuring out which properties should be set using .attr, and which required .style. And also: which svg element supports which property? All of this is addressed in the cheat sheet. It’s part of the link above, but if you want it directly without downloading a 13 MB file, here it is:

d3 cheat sheet


Getting to “Hello world” with d3

Back when I started learning programming, it was always fairly simple to achieve the canonical first accomplishment, that is, getting the system to announce that you are ready to do more by displaying “hello world” on the screen.

In most systems back then, there was a command prompt somewhere that would do that when you typed, say:

PRINT "hello world"


Things have changed a lot since the early 80s. In some fields, like fashion, I would argue it’s a good thing, but we’re definitely not heading in the direction of less complexity.

Now, if you’re interested in web-oriented visualization and want to do it with d3.js, it’s still fairly simple, but it is built upon a number of technologies that you’re supposed to know a little. Front-end developers live and breathe the web and have been exposed to all things JavaScript, HTML, CSS, you name it, in enormous doses. Many developers have probably, at some point, tried to interface with the web and know enough about it to get started. So for this crowd, the amount of things you need to know to crack d3 code seems negligible, because they know all of it and are very familiar with it, just as people knew the first names of the Friends characters by the end of the tenth season.

But what about those who didn’t? And the people who don’t see themselves as developers? Do they have to reimmerse themselves in 10-odd years of web development history to get started? It turns out that this sum of knowledge, while not insurmountable, is certainly not trivial.

So without further ado, let’s get started.

We’re cooking an omelette

And when we do, we need a few things: a pan, a recipe, eggs and stuff, a stove and then plates, knives and forks, etc.

The pan: a text editor

The first thing is really the pan. If you don’t have one when cooking eggs, you borrow one or go buy one. In our analogy the pan is the text editor. This is the tool with which you are going to make the files that will constitute your visualization.

There was a time when it was ok to use Notepad (TextEdit if you are of the Apple persuasion). It’s still possible, but you are not making your life easier. What I recommend instead is that you get hold of a copy of Sublime Text 2. (There are Windows versions. And Mac versions. And Linux versions. For Windows users, there is a portable version, so you don’t need administrator access to install it.) There is a free, unlimited evaluation version, but unless you can’t spend $69, I strongly recommend that you buy it. Sublime Text 2 has a nearly infinite number of niceties built in. And unlike some other powerful text editors, whose best features are only understandable by tech masters, what’s really nice about Sublime Text 2 is that it saves you time even if you are an absolute beginner. For instance, it detects what language you are working with, automatically colors and formats words as you type them depending on the category they fall in, suggests the word you are trying to type when possible, and automatically formats and indents your code, all in a very unobtrusive and pleasant way. This really helps you troubleshoot problems like strings not closed properly or stray closing brackets, which typically consume a lot of time.

Let’s type a fairly common d3 statement to see how Sublime Text 2 can help. First, it recognizes the var keyword as such and writes it in italics and cyan. Second, when I type an opening parenthesis, it adds a closing one, and as long as my cursor touches either, it underlines them both.

Let’s carry on. The function keyword is highlighted in italics cyan too – useful. The opening/closing thing works for curly braces too.

The return statement is highlighted in red. With the cursor on the closing parenthesis, we start to see how the underlining is a useful safety net.

New line. Joy! The indentation is aligned with the line above.

We now have four consecutive opening or closing curly braces and parentheses. Typically, this is where errors sneak in, and where Sublime Text 2 really shines.

And we now have 5 consecutive closing curly braces and parentheses. This is fairly common in d3 code. Is the order correct? Thank you Sublime Text 2!

We finish writing the statement.

When moving the cursor to the left side, where the line numbers are, we notice down-pointing arrows. We know our code is correct, and we don’t want to see it again, so…

we just click on the top one to collapse this section. If we need to edit it again we can expand it.

Finally, we add a comment above. Notice the syntax highlighting, comments are colored with an unobtrusive dark grey.

The recipe: a basic file structure

In d3, you can’t really type a “print” command from a prompt. You need to write some files, which are loaded by a browser (that’s your “plate” in the metaphor, but let’s not get ahead of ourselves).

You are going to need up to 5 types of files.

First, an html file. This will be the file that your browser will read, either locally, or uploaded on a website. We’ll get to cover this in detail in a minute.

Second, believe it or not, you are going to need the d3 library, which is also a file. You may link to the version on the site, and so not worry about having the actual file handy. That has advantages (besides the one we just mentioned, you can be pretty sure to always have the latest version on hand) and two drawbacks. First, you always need a live internet connection, so no working in the park outside of free wifi range, for example. Second, it will probably be slower than having the file locally or on your own web space. And if having your own web server seems kind of scary, I’ll show you in a short while that it’s not.

The next three kinds of files are optional, but hey.

The third file is a javascript .js file which would be where you put your code. Some people would rather put all their code in the html file, which is an option, especially for short programs. Personally, I prefer having a separate file. So to make d3 work, you need some script, but it doesn’t have to be in a separate file.

The fourth file is a style sheet, or css file. This can be used to define some formatting options, for instance to make all your circles blue by default, or some circles that meet some pre-defined criterion. Like the javascript file, any style information can be contained within the html file, but unlike the script, it is completely optional. I also like to keep it separate from the html.

Finally, you may want a data file, you know, with data (csv, txt, json, xml…). If you have lots of data to visualize, it’s easier to keep it in separate files than in variables within the script. But it doesn’t have to be that way. And you could also use d3 without data.

The ingredients: contents of the files

The HTML file

So let’s see how this articulates by looking at a typical d3 html file. I am using templates which I try to change as little as possible from project to project.

<!DOCTYPE html>
<html>
   <head>
      <meta http-equiv="Content-Type" content="text/html;charset=utf-8">
      <title>My project</title>
      <script type="text/javascript" src="../d3.v2.min.js"></script>
      <link href="style.css" rel="stylesheet">
   </head>
   <body>
      <div id="chart"></div>
      <script type="text/javascript" src="script.js"></script>
   </body>
</html>

Well. That is certainly longer than the BASIC one-liner (and we haven’t even printed “hello world” yet).

Let’s take this piece by piece.

The first line is a doctype declaration. It tells your browser that what follows should be interpreted as standard, HTML5-compliant HTML (standards mode). If you omit the doctype declaration, your browser will read the HTML in “quirks mode”, i.e. by replicating the non-standard behavior of Netscape 4 or IE5. You can still try to run d3 under quirks mode, but don’t be surprised if your HTML doesn’t behave as expected.

The doctype declaration doesn’t have to be more complicated than <!DOCTYPE html>.

The second line opens the HTML document proper. Technically, it’s OK to omit the <html>, <head> and <body> tags in HTML5; the document will still be considered valid by tools like the W3C validator. But it seems that some browsers, in some complex cases, don’t like that so much, and I personally find it more convenient to see those tags when reading code.

The next line opens the header section of the document. Again, it’s not absolutely necessary, but I consider it helpful to explicitly differentiate the header from the rest of the document.

The next line, which goes

<meta http-equiv="Content-Type" content="text/html;charset=utf-8">

is not absolutely required either. It specifies the encoding of the page, that is, what kind of characters will be seen in the page. Since I use non-ascii characters often, being French and all, I make sure to use it all the time. After all, this is a template, not something I type from beginning to end each time.

Next, we specify a title. This is what will appear in the title area of your browser, or, more likely, as the name of your tab.

In the next line, we load the d3 library. This is my preferred syntax. This is how my files are set up:

I have a directory where all my d3 projects are, and in this directory I also keep (and maintain reasonably up to date) a copy of the d3 library, a file called d3.v2.min.js (min stands for minified, which means that it’s not meant to be read by people, but it’s faster to load). All my projects proper are in folders within that directory, so my HTML files are one level down from where the d3.v2.min.js file is kept. This is why the src attribute reads “../d3.v2.min.js”: the ../ part means “look one level up”. If the d3.v2.min.js file were in the same directory as my HTML, I would write src="d3.v2.min.js"; if I kept it within a specific directory like “d3”, I could write src="../d3/d3.v2.min.js"; and finally, there is always the option of getting it from the website, src="".

I don’t have to load the d3 library at that point. I could have done it at the end of the page; the only requirement is that it comes before the script that uses it. But honestly, the file is so small that it doesn’t make much of a difference (9ms on my machine).

Next, I link to a style sheet. With this syntax, I am assuming that my style is specified in a file called style.css, in the same directory as this HTML page. And if there is no such file, it’s not a problem: it doesn’t prevent the page from loading.

Instead of using this syntax, I could have written:

<style>
  /* ... my style definitions ... */
</style>

in the html file. And frankly, it is sometimes more convenient. But again, for the general case, it’s just as well to leave it like this.

Note that style information should always be in the header part of the file.

And that concludes the header, as marked by the closing tag </head>. Even if we use the <head> tag to mark the beginning of the header section, we may omit the closing tag </head> and still get away with a valid (and slightly shorter) document, but I keep it for clarity’s sake.

The next part starts with <body>, and is where the content proper, which will get displayed on the screen, is described. <body> and </body>, just like <html> and <head>, are not mandatory, but do help, somewhat, to make the document easier to read.

So what do we find in the body section? Here, I’ve kept it very simple but also close to the conventions I use.

There is one <div> element, which is the basic building block of HTML, and with an id attribute – a document-wide, unique identifier – called “chart”.

Then, there is the <script> element, which is calling the javascript code we are going to use to create our visualization. It’s at the very bottom of the page, actually just before the closing tags (which, again, could be omitted, but let’s not).

As with the style element, it is possible to keep the script inside the HTML document. Instead of using a src attribute (which, incidentally, with this syntax assumes that the script is in the same directory as the HTML document), we can write:

<script type="text/javascript">
  // all our javascript instructions
</script>

And that’s it for the html document! A final word about the contents of the <body> element. In most of my projects, there is an interface such as buttons or controls which is also done in HTML. In that case, the contents of the <body> element get more complex. I would add a button to tweet the page, copyright notices, and other stuff. But I almost always have a <div> element with an identifier named “chart”.

OK, so now that you’re finished writing your HTML file, save it under any name with the “.html” extension (or .htm, but why no love for the l? why?)

The javascript file

In this section I will walk you through a very, very basic file, which includes things I do for every project.

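var w=960,h=500,
    svg=d3.select("#chart").append("svg").attr("width",w).attr("height",h);

var text=svg
  .append("text")
  .attr("y",50) // example value, so the text sits inside the svg
  .text("hello world");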

I like to define variables that describe the width and height of the visualization I am creating. By putting these in variables at the beginning of the file, I can easily modify them if I need to. 960 and 500 work well for visualizations that should appear on their own page, by the way: no scrolling should be necessary.

The next statement uses the d3.select() construct. Here, it indicates that we are going to build something on top of the element that meets the criterion described between the parentheses. That criterion uses the syntax of CSS selectors but, long story short, #chart refers to whatever has an “id” attribute of “chart”. This is our lone <div> element in the HTML file. Then, we add an svg element, which is what will hold the visualization proper in svg form, and give it a width of w and a height of h.

I always use that syntax, an “svg” variable that holds the top-level svg container, which resides in a <div> element which has an id of “chart”.

The final part of the file finally writes “hello world” proper. Note that I specify a y attribute (vertical position). Otherwise, the text’s baseline would sit at the very top of the svg, so the text would be drawn above the visible area and be effectively invisible.

Now, the HTML file we just created expects this file to be called “script.js”, so let’s save it under this name.

In this simplest example, we need neither a CSS file nor a data file. But, for the sake of discussion, let’s create a CSS file nonetheless.

text {font-size: 36px;}

Let’s save this as style.css (the name that, again, our HTML file expects). What this does is change the size of the font from whatever the default was to a more massive 36 pixels.

The stove: a web server

As far as writing “hello world” goes, we’re done. Load the HTML file you created in a browser and you should see the encouraging inscription. Congratulations!

Many visualizations can be seen in a browser directly, just by opening a local file. However, this won’t be the case for some, for instance those that load external data: some browsers (notably Chrome) won’t let a page opened from a local file fetch other files. In that case, you need a web server. If you have web hosting, you may upload the files to your (remote) server, via FTP for instance, and see your visualization by typing the address of your site in the browser URL bar. That said, it is a good idea to have a local web server, that is, one that runs on your computer. That way you can view your files as if they were served by a web server, with the added bonus that you can edit them and see the modifications directly, without having to upload them each time you change them.

On Macs, you’re pretty much all set. All you have to do is enable web-sharing in your system preferences. Then, http://localhost/~YOURNAME will point to /Users/YOURNAME/Sites where YOURNAME is your user name. Just put your files there and go at it.

For Windows, there are a bunch of solutions. The “Professional” versions of Windows include the IIS web server, so, there. But beyond that, there is a lot of web server software available. I personally use EasyPHP, which comes with a web server (Apache), a MySQL database, a PHP preprocessor and other niceties. And, as an aside, it doesn’t require administrator rights, for you corporate users.

EasyPHP installation is a breeze. When it’s on, by default, http://localhost/ points to the www/ directory in the install directory of EasyPHP, so you may want to install it in a place that suits you. Alternatively, you can create aliases in the admin panel of EasyPHP (http://localhost/home/index.php), in other words give a web address to any folder on your hard drive. This is what I do: I put all my projects there and keep a shortcut to that address in my browser. Whenever I want to see a project, I use that shortcut and can see the visualization as if it were on the web.

This is how you create aliases in EasyPHP.

The plate: a browser

We’ve talked browsers before, and chances are you have one (or several) on your computer.

Now, I wish that by “browser” we could simply mean “the latest version of Chrome”, but it turns out that browsers handle d3 code slightly differently, so you should really test your work in at least Chrome and Firefox. As of this writing, Chrome + Firefox (version 5 and up) represent just under 50% of the browser market share. If you add all browsers that are d3-capable (Safari, earlier versions of Firefox, Opera, IE9), you reach about 75% of the market. Sadly, IE8 and IE7, which account for slightly over 20% of the market, are not d3-compatible, though they can use the free Google Chrome Frame plug-in and do pretty much all that Chrome does.

Knives and forks: the console

At the beginning of my dad’s engineering career, code came on punch cards. People then, allegedly, thought it through. You didn’t want to be the kid who hadn’t followed their algorithm carefully enough to foresee an avoidable bug, and so wasted a perfectly good card and oh-so-precious computing time.

But now? No code is perfect by the time it hits the browser. You may want to launch incomplete code to get a feel for where you’re going. You may not be too sure whether there should be a plus or a minus in that equation, and just try either, because it would be quicker to correct an unexpected outcome than to troubleshoot the formula on paper. You may want to iterate, to bring newer, more complex ideas to your visualization with each change to the code. Or just try out different aesthetic options.

Not too long ago, debugging JavaScript was really a pain. You’d have to fire those annoying alert boxes to find out the values of your variables, and dismiss them manually. Fortunately, that time is gone, and now is the age of the Console.

There are console functionalities for Chrome, Firefox and Safari, and while the interface slightly varies, the idea is the same. The console allows you to do three main things:

– first, to see if your code executed without errors or warnings. Some of those messages are generated by JavaScript, and you can add some yourself if certain unfortunate conditions are met. You get the position of the error in your code, which helps you understand what went wrong and fix it.

I have planted an error at the end of the code, and the console has picked it up and explains what’s wrong and where. Notice the red cross in the lower-right corner, which counts errors. If there were warnings, they would be indicated by a yellow triangle.

– second, to inspect elements, that is, to find out all the information about the elements displayed on screen, even if (especially if) they have been generated at runtime. So you can see whether the elements you really wanted to create have indeed been added, and whether the right attributes have been passed.

What a relief! All those path elements that were supposed to be created in the code have been added as expected.

– third, to interact with the code after it’s run (or even during run-time, because you can pause the code with breakpoints using the console, but we won’t go into that). The most common use for that, IMO, is to check the value of variables, which can change during the code execution. But the console can also be used to enter one-liner statements, which can be quite complicated. This lets you test and preview code hypotheses before you write them down in your script file, or troubleshoot a problem that would be hard to see outside of its context.

Here, I am using the console to check the value of one variable, and to enter a statement that turns all the shapes orange.

Voilà! the last thing you need when you cook food is people to share it with, same goes for visualizations!


Animations and transitions

That post originally appeared on, I’m reproducing it here for clarity and ease of retrieval

In interactive visualisation, there is the word reactive. Well, maybe not literally, but close enough.
The fact is that reactivity, or the propensity of a visualisation to respond to user actions, can really help engage the user in a visualisation, and help them understand its results. Both of which are usually good things. And how can this reactivity be achieved? Through animations.

So I’ll go ahead and state that animation, if done right, can make any interactive data visualization better.
How is that?

  • When coupled with interaction, it’s a very useful way to give feedback to the user. What has changed since their last command? If what’s on screen animates from one state to another, it’s obvious, it stands out and it makes sense. Or, when showing any form of real-time data, animation is pretty much required.
  • Animation can bring focus on the important things as a chart loads. Our vision is very sensitive to movement, so using these introduction transitions sensibly helps a lot to ease the effort required to get the right information off a chart.
    Compare these two charts:

    Which is better at getting the viewer’s attention on the last bar?
    [side note on examples: they all use the same model. Click on the button to start an animation. If there is nothing on the chart, clicking the button will make something appear.]
  • Animation works well with metaphors, like growing, expanding, moving, dwindling, etc., so it can really enhance the expressiveness of a visualization that tries to convey any of these ideas (and many others).

That said, animation can definitely ruin your visualization, too. Here are three general problems.

  • Animation is very prominent. That can be good to call attention to a specific, unambiguous part of your chart. But what happens when there is too much animation? Without other cues, it gets difficult for viewers to determine where to focus their attention.
  • Animation across many states (like a video of animated data) makes those states difficult to compare to one another, as opposed to showing still images of various states side by side (see for more on this).
  • If the animation is not continuous, if the chart is somehow wiped out during it, this causes change blindness, which pretty much negates any benefit you may have hoped to reap from animation.
    Look at this example.

    When animated, the line goes through a blank state, which makes it close to impossible to track changes between the original and final states. The only way to detect change is to focus on one given point and memorize its original position, but this is very ineffective.

Now how to do it?

So we’ve seen how animation is helpful in data visualization. Now, let’s do it!
For this purpose, let’s use d3. d3 has many, many possibilities when it comes to data animation which are relatively painless to implement.

The principle

If you know how to draw in d3, you almost know how to animate. (and if you don’t know yet, Alignedleft has a splendid collection of tutorials to get you started, and the d3 site lists more including some by yours truly.)
Animations are called transitions in d3 for a reason. A technical definition of animation can be that over a certain lapse of time, one or more characteristics of an object would transition from one value to another.

And what do we mean by characteristics? Well, just about anything that can be expressed numerically.
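To make that definition concrete, here is a minimal sketch in plain JavaScript (no d3 involved) of what a transition computes between its start and end. It assumes linear timing. `numberInterpolator` and `valueAt` are hypothetical helpers written for illustration. d3 has its own interpolators, such as d3.interpolate, that do this job for you.

```javascript
// Hypothetical helper (not d3 API): returns a function that maps
// progress t in [0, 1] to a value between a (t = 0) and b (t = 1).
function numberInterpolator(a, b) {
  return function (t) {
    return a + (b - a) * t;
  };
}

// Hypothetical helper: attribute value `elapsed` ms into a transition
// lasting `duration` ms, assuming linear timing (no easing, no delay).
function valueAt(interpolate, elapsed, duration) {
  var t = Math.min(Math.max(elapsed / duration, 0), 1); // clamp to [0, 1]
  return interpolate(t);
}

// A square moving from x = 60 to x = 200 over one second.
var x = numberInterpolator(60, 200);
console.log(valueAt(x, 0, 1000));    // 60
console.log(valueAt(x, 500, 1000));  // 130
console.log(valueAt(x, 1000, 1000)); // 200
```

Any numeric characteristic follows the same pattern; only the start and end values change. For example, an opacity going from 1 to 0 gives a fade-out.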

A few examples of transitions

Unsurprisingly, when you update the position of an item smoothly over time, it moves. In svg, the position of rectangles, such as our blue one here, is determined by the x and y attributes, which correspond to the top-left corner of the shape. For circles, you use cx and cy, the coordinates of the center. For paths, such as our red triangle, you actually specify the position of every point in the “d” attribute.

Likewise, when you change size, your object grows (or shrinks!). You can use width and height for shapes like rectangles, or r for circles.

Color is really a numerical attribute too, and it’s indeed possible (and very useful) to transition from one color to another. In svg, color is a style attribute that is defined by fill or stroke.
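Here is a sketch in plain JavaScript of why a color can be animated like a number: split it into red, green and blue channels and interpolate each one separately. It assumes colors written as "#rrggbb" hex strings. `parseHex`, `toHex` and `interpolateColor` are hypothetical helpers. d3 ships its own color interpolators (d3.interpolateRgb, for instance).

```javascript
// Hypothetical helpers (not d3 API), assuming "#rrggbb" hex colors.
function parseHex(hex) {
  // "#ff8000" -> [255, 128, 0]
  return [1, 3, 5].map(function (i) {
    return parseInt(hex.slice(i, i + 2), 16);
  });
}

function toHex(rgb) {
  return "#" + rgb.map(function (c) {
    var s = Math.round(c).toString(16);
    return s.length < 2 ? "0" + s : s; // pad single digits, e.g. "0" -> "00"
  }).join("");
}

// Interpolate each channel separately, exactly as if it were a number.
function interpolateColor(from, to) {
  var a = parseHex(from), b = parseHex(to);
  return function (t) {
    return toHex(a.map(function (c, i) { return c + (b[i] - c) * t; }));
  };
}

var blueToOrange = interpolateColor("#0000ff", "#ff8000");
console.log(blueToOrange(0));   // #0000ff
console.log(blueToOrange(0.5)); // #804080
console.log(blueToOrange(1));   // #ff8000
```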

Not unlike color, it’s very useful to be able to vary opacity. When opacity is set to 0, the corresponding object is completely transparent. So transitioning on opacity is very useful to make objects fade in or out.

How this is done

Now that we’ve seen what transitions can do, let’s see how to code this in d3.
Let’s go back to our first example. In fact, let’s make it even simpler.

To create a square like this in d3, we would write something like:

var mySquare=svg.append("rect")
  .attr("x",60).attr("y",60).attr("width",60).attr("height",60); // example values

4 attributes. Simple enough.
So if we want to make it move to the right, we are going to update the x attribute. That’s how we do it:

mySquare.transition().attr("x",200);

It’s that simple: use the transition method, then specify all you want to see changed just as if you were creating a new item. And using that one principle, we can easily reproduce any of the above examples.

mySquare.transition()
  .attr("width",120); // will make it bigger

mySquare.transition()
  .style("fill","white"); // if no fill was set at all, svg's default fill
                          // is black, so the transition starts from black


Now, in our simple examples, this is not exactly what happens. The transitions occur after an event, namely, when the user clicks on the button. And indeed, transitions are most useful when linked to events and interaction. But this doesn’t add a whole new layer of complexity.
We can just write:

button.on("click", function() { mySquare.transition().attr("x",200); });

And now, our animation only starts when the button is clicked. Obviously, since the transition is within a function, we could even determine where the square should go programmatically, but let’s keep it simple for the examples.

Animation 102

So far, we’ve seen how we can do simple animations in d3 and even throw in a little interaction. We’ve seen that it’s really as simple as creating elements in the first place. But here is some good news: transitions in d3 are extremely versatile and can be customized with a lot of finesse without getting overly complex to write. It’s more a matter of knowing what to do.

After using the transition() method, it’s possible to specify a value for duration and delay. Duration is the number of milliseconds the transition will last, while delay is the number of milliseconds the system will wait before launching it.
The syntax is:

mySquare.transition()
  .duration(1000) // this is 1s
  .delay(100)     // this is 0.1s
  .attr("x",200);

The default is a 250ms duration, and no delay.
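A small sketch of how duration and delay combine, assuming a perfectly accurate clock (real timers are not that precise). The transition's progress stays at 0 until the delay is over, then grows linearly to 1 over the duration. `progress` is a hypothetical helper, not part of d3.

```javascript
// Hypothetical helper (not d3 API): progress of a transition, between 0
// and 1, `elapsed` ms after it was scheduled, assuming an exact clock.
function progress(elapsed, delay, duration) {
  if (elapsed <= delay) return 0;          // still waiting out the delay
  var t = (elapsed - delay) / duration;
  return t >= 1 ? 1 : t;                   // finished once duration has passed
}

// .duration(1000) and .delay(100):
console.log(progress(50, 100, 1000));   // 0   (delay not over yet)
console.log(progress(600, 100, 1000));  // 0.5
console.log(progress(1100, 100, 1000)); // 1   (finished)

// d3's defaults (250 ms, no delay): half-way after 125 ms.
console.log(progress(125, 0, 250));     // 0.5
```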
I find 250ms to be a bit harsh. In most cases, transitions should be noticeable, so I oftentimes find myself increasing the duration to 500 or 1000. But unless there is a very good reason for that, durations should not be too long. If you use them to support your data, you don’t want the transition to take center stage by having them take several seconds.
Consider the following two examples (which you’ll have to start with the button)

Isn’t the second one simply atrocious? You may find it hard to believe that it only wasted 25 seconds of your time.

Easing is the technical name of the function that turns elapsed time into attribute changes. In the previous examples, you may have noticed that the values change slowly at first, then faster, then slowly again at the end. Well, it turns out that you can use different functions to get different results. In my practice, I’ve only seen a use for the 3 displayed here, although there are many others. And yes, you can write your own, although we are not going to cover this here.
The syntax is similar to the above:

mySquare.transition().ease("linear").attr("x",200); // "linear" is one possible easing name

(and by the way, the order in which you change attributes or specify animation parameters has no effect, so feel free to use .ease first then .attr).
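To show what an easing function actually is, here is a sketch of two of them, written from their usual textbook formulas. The first is linear. The second is a cubic in-out curve that starts slowly, speeds up, then slows down at the end, which is the behaviour described above. As far as I know, d3's default easing in v2 and v3 is "cubic-in-out". `easeLinear`, `easeCubicInOut` and `easedValue` are hypothetical names, not d3 functions. In d3 you would pick an easing by name with .ease().

```javascript
// Two easing functions written from their usual textbook formulas.
// Each maps raw time progress t in [0, 1] to eased progress in [0, 1].
function easeLinear(t) {
  return t;
}

// Cubic in-out: slow start, fast middle, slow end.
function easeCubicInOut(t) {
  return t < 0.5 ? 4 * t * t * t : 1 - Math.pow(-2 * t + 2, 3) / 2;
}

// Easing sits between the clock and the interpolation:
// value = from + (to - from) * ease(timeProgress).
function easedValue(from, to, timeProgress, ease) {
  return from + (to - from) * ease(timeProgress);
}

// A quarter of the way through the duration:
console.log(easedValue(0, 100, 0.25, easeLinear));     // 25
console.log(easedValue(0, 100, 0.25, easeCubicInOut)); // 6.25 (still slow)
console.log(easedValue(0, 100, 0.5, easeCubicInOut));  // 50
```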

For path objects, through transitions you can update the position of each point. This allows you to effectively turn one shape into another.
This can be especially interesting for line charts (or any chart drawn as a path).

That way, if the values you are plotting change, you can spot those changes very efficiently. If, instead, you just erased your chart and redrew your data, it would be very difficult to spot where the data had changed.
For both of these examples, the “d” attribute of the path is updated (so they are not intrinsically different from the simplest example).
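Here is a plain JavaScript sketch of the principle behind these path transitions. If the old and new shapes have the same number of points, each point can be interpolated on its own, and the intermediate "d" strings describe the shape morphing. It assumes matching point counts and linear timing. `toPathData` and `interpolatePoints` are hypothetical helpers. d3 itself interpolates attribute strings such as "d" by pairing up the numbers they contain (d3.interpolateString), which is why matching the number of points tends to matter there as well.

```javascript
// Hypothetical helpers (not d3 API): morph a polyline from one set of
// points to another by interpolating each point's coordinates.
// Both arrays are assumed to have the same number of points.
function toPathData(points) {
  return "M" + points.map(function (p) { return p[0] + "," + p[1]; }).join("L");
}

function interpolatePoints(from, to) {
  return function (t) {
    return from.map(function (p, i) {
      return [p[0] + (to[i][0] - p[0]) * t, p[1] + (to[i][1] - p[1]) * t];
    });
  };
}

// Old and new values of a three-point line chart (y grows downward in svg):
// only the middle point changes, so only that part of the line moves.
var before = [[0, 100], [50, 80], [100, 60]];
var after  = [[0, 100], [50, 20], [100, 60]];
var morph = interpolatePoints(before, after);

console.log(toPathData(morph(0)));   // M0,100L50,80L100,60
console.log(toPathData(morph(0.5))); // M0,100L50,50L100,60
console.log(toPathData(morph(1)));   // M0,100L50,20L100,60
```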

Sometimes (and actually: often), you want to fire a transition right after another transition.
But in case you were wondering, the following doesn’t work:

mySquare.transition().attr("x",320);
mySquare.transition().attr("y",180);

You may think that this will move the square right, then down. But no: it will start to move the square right, then fire the second transition which will move it down. Since they have the same duration and no delay, what will happen is that only the second will have a visible effect.
If the second transition had a delay shorter than the first transition’s duration, the first one would be in effect until the delay expired. Then, the second transition would take over. However, chances are you don’t want to do that, because how much of the first transition will have been accomplished depends on the user’s machine, browser, etc., and is therefore unpredictable.
So how about giving the second transition a delay which corresponds exactly to the duration of the first one? This will usually work; however, delays and durations are not extremely accurate. Firing the transition proper takes a certain time (roughly 15ms on my machine, and it may vary), so it is difficult to chain two transitions very precisely this way.
In more complex programs than our simplistic examples, several events sometimes try to trigger transitions on the same object. When this happens, the first transition is fired and runs its course unless another transition starts. That second transition interrupts, then replaces, the first one. What this means is that the attributes that were in the process of being changed by the first transition will remain as they were when the second transition started, somewhere between their start and target values.
If you want to make sure that all your transitions update their attribute up to the value they are supposed to reach, you may want to re-specify the attributes of the first transition in subsequent ones, like so:

mySquare.transition()
  .attr("y",180)
  .attr("x",320); // even if the first transition doesn't complete,
                  // this one will and will update x to 320.
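The interruption rule can be modelled in a few lines of plain JavaScript. This is a simplified, hypothetical model, not d3's actual scheduler. It assumes linear timing and 1000 ms durations, and it assumes the second transition starts from whatever values the first one had reached when it was cut off. `lerp`, `firstProgress` and `stateAt` are made-up helpers for illustration.

```javascript
// Simplified, hypothetical model (not d3's actual scheduler) of an
// interrupted transition, with linear timing and 1000 ms durations.
function lerp(a, b, t) {
  return a + (b - a) * t;
}

function firstProgress(ms) {
  return Math.min(ms / 1000, 1); // transition 1 finishes after 1000 ms
}

// Square state at `time` ms. Transition 1 moves x from 60 to 320, starting
// at 0 ms. Transition 2 starts at `interruptAt` ms, replaces transition 1,
// and animates only the attributes listed in `secondTargets`.
function stateAt(time, interruptAt, secondTargets) {
  if (time < interruptAt) {
    return { x: lerp(60, 320, firstProgress(time)), y: 60 };
  }
  // Transition 1 is cancelled: x stays wherever it was at `interruptAt`.
  var start = { x: lerp(60, 320, firstProgress(interruptAt)), y: 60 };
  var t = Math.min((time - interruptAt) / 1000, 1);
  var state = { x: start.x, y: start.y };
  Object.keys(secondTargets).forEach(function (attr) {
    state[attr] = lerp(start[attr], secondTargets[attr], t);
  });
  return state;
}

// Second transition only sets y: x is stuck part of the way to 320.
console.log(stateAt(2000, 400, { y: 180 }));
// Second transition re-specifies x as well: x reaches 320 after all.
console.log(stateAt(2000, 400, { y: 180, x: 320 }));
```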

There is a more certain way to chain two transitions. With the following syntax, another event will start exactly at the end of a transition. That other event can be another transition (which is the case in the above example).

mySquare.transition()
  .attr("x",320)
  .each("end", function() { /* fired when the transition ends */ });

Here, what’s in the callback function on the last line, introduced by .each(“end”, will be fired exactly as the transition ends.
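Here is a simplified, hypothetical model (not d3's scheduler) of what chaining on "end" buys you. Each step starts exactly when the previous one ends, from the values that step reached, so nothing is left half-done. It assumes linear timing and uses made-up names (`chainedStateAt`, `steps`). The numbers reproduce a square moving right, then down.

```javascript
// Simplified, hypothetical model of chaining on "end": each step starts
// exactly when the previous one ends, from the values it reached.
// Linear timing, for readability.
function chainedStateAt(time, initial, steps) {
  var state = Object.assign({}, initial); // don't mutate the caller's object
  var stepStart = 0;
  for (var i = 0; i < steps.length; i++) {
    var step = steps[i];
    var t = Math.min(Math.max((time - stepStart) / step.duration, 0), 1);
    var from = state[step.attr];
    state[step.attr] = from + (step.to - from) * t;
    if (t < 1) break;            // this step is still running
    stepStart += step.duration;  // the next step starts exactly now
  }
  return state;
}

// Move the square right, then down.
var steps = [
  { attr: "x", to: 320, duration: 1000 },
  { attr: "y", to: 180, duration: 1000 }
];
var start = { x: 60, y: 60 };
console.log(chainedStateAt(500, start, steps));  // { x: 190, y: 60 }
console.log(chainedStateAt(1500, start, steps)); // { x: 320, y: 120 }
console.log(chainedStateAt(2500, start, steps)); // { x: 320, y: 180 }
```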

What can be done then? Here are 3 common scenarios.

(btw, if you’re wondering what’s the difference between this and the previous example, there is none – it’s just to save you some scrolling).
One possibility is to launch another transition on the same item. Here, the square moves right, then down.
Here’s how it’s done:

mySquare.transition()
  .attr("x",320)
  .each("end",function() { // as seen above
    d3.select(this)        // this is the object
      .transition()        // a new transition!
        .attr("y",180);    // we could have had another
                           // .each("end" construct here.
  });

Another possibility is to delete the object after the transition has run its course. This is super useful, especially when you are creating a lot of temporary objects. An interesting combo is to decrease opacity all the way to 0, making the object invisible, then use remove() if you don’t need it anymore.

mySquare.transition().attr("x",320)
  .each("end",function() {       // so far, as above
    d3.select(this).remove();    // we delete the object instead
  });

Finally, we can create a new object. That can be a nice way to add a special effect. Here’s an example:

Here, at the end of the transition, a circle is created, a transition is started on that circle, which decreases opacity to 0, then the circle is removed.

And here is a last example with several effects combined.

Going further

Believe it or not, we barely scratched the surface of what can be achieved with animations in d3.
There are two other uses of transition that we haven’t seen because they are slightly more complex, so I’ll just mention them here.
Up to now, we have always seen transitions based on the properties of one specific object. We make the x property of that one square vary from what it was to 200.
Sometimes, though, you want many parts of your visualization to be updated according to the changes in one variable.
That is possible too, using the transition’s .tween method together with d3’s interpolators (such as d3.interpolate). All of this is explained in the d3 documentation.
Another possibility is the d3.timer function, which calls a function repeatedly (until that function returns true) and can also be used to create animation.

The point I was hoping to make is that it’s possible to do a lot with relatively simple code and technique if you know what you are trying to do. In particular, chaining transitions, especially when adding and removing objects where appropriate, goes a long way toward creating powerful effects.


Hollywood + data III: our info+beauty awards entry. Bonus: making of.

So Jen and I released our Info+beauty awards entry.

How did we end up with this?

It’s really cool working around movies, because it’s something we can relate to.

A part of my movie ticket stubs stash.

At first, I wanted to do something with keywords we could grab on the movies, but Jen came up with another idea I found more worth pursuing: working around the story types (which was the most interesting aspect of the curated contest dataset) and seeing if there was some kind of grand truth we could unravel there. She also requested stars and glitter, because we were not going to work on this glamorous dataset with a tedious dashboard done in Excel.

That truth didn’t take so much time to find: the most frequently used story types (like comedy or movies with monsters) do not perform well at the box office, while different story types (stories of teens growing up, or when the main character turns into something else), which are used less often, are much more profitable. So why doesn’t Hollywood make more Junos and Black Swans and fewer College Road Trips or Dylan Dogs?

That’s the idea. Now the making.

Fair warning – the rest of this post is fairly technical. 

Making stars

If I had to contribute significantly to the project it had to be done in d3/svg.

Fortunately, it’s easy to generate star shapes in d3. Once you have the coordinates of where the points of one unitary star should be, you can easily make stars of any size with a function and a parameter.

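var c1=Math.cos(.2*Math.PI),c2=Math.cos(.4*Math.PI),
    s1=Math.sin(.2*Math.PI),s2=Math.sin(.4*Math.PI),
    r=1, // outer radius of the unitary star
    // thickness of the branches: 1 is a "straight" star, less is narrower, more is thicker.
    thickness=1,
    r1=r*c2/c1*thickness, // inner radius
    // this is a list of the pair of coordinates of the points that make a star.
    star=[[0,-r],[r1*s1,-r1*c1],[r*s2,-r*c2],[r1*s2,r1*c2],[r*s1,r*c1],
          [0,r1],[-r*s1,r*c1],[-r1*s2,r1*c2],[-r*s2,-r*c2],[-r1*s1,-r1*c1]];

var lineStar=function(k) {
	var line=d3.svg.line()
		.x(function(d) {return d[0]*k;})
		.y(function(d) {return d[1]*k;});
	return line(star)+"Z"; // this will stitch everything together.
};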

Now, running lineStar(10) will return the path description of a star with a radius of 10.
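If you want to check the geometry without d3, here is a plain-JavaScript sketch that builds the same ten-point star (outer points every 72°, inner points halfway between them) and serializes it the way a linear d3.svg.line() does: "M" to the first point, "L" to each of the others, then "Z" to close. The names unitStar and starPath are illustrative, and the default thickness of 1 (the "straight" star) is an assumption, not necessarily the value used in the entry.

```javascript
const c1 = Math.cos(0.2 * Math.PI), c2 = Math.cos(0.4 * Math.PI);
const s1 = Math.sin(0.2 * Math.PI), s2 = Math.sin(0.4 * Math.PI);

// Points of a star with outer radius 1, clockwise from the top point.
function unitStar(thickness = 1) {
  const r = 1, r1 = (r * c2 / c1) * thickness; // inner radius
  return [[0, -r], [r1 * s1, -r1 * c1], [r * s2, -r * c2], [r1 * s2, r1 * c2],
          [r * s1, r * c1], [0, r1], [-r * s1, r * c1], [-r1 * s2, r1 * c2],
          [-r * s2, -r * c2], [-r1 * s1, -r1 * c1]];
}

// Scales the unit star by k and returns an SVG path string.
function starPath(k, thickness = 1) {
  const pts = unitStar(thickness).map(([x, y]) => [x * k, y * k]);
  return "M" + pts.map(([x, y]) => x + "," + y).join("L") + "Z";
}

console.log(starPath(10));
```

With thickness 1, each branch edge lines up exactly with the edge of the opposite branch, which is what makes the star look "straight".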


Placing, moving (and spinning) the stars

The next idea was placing the stars.

And for this we need two things: being able to position them somewhere, and being able to move them easily from point A to point B, ideally with some cool effect in between.

One could change the x and y coordinates separately, but each would have to be dealt with in its own function call. I found it a better approach to rely on the transform attribute and translate. Each time I want to position a star somewhere, I need it to be set at an x and y coordinate, which will always correspond either to the data of the star or to that of a group above it. For instance, a star corresponding to a movie will need to be at the position given by the data of that movie, or by that of its story type if the story type is still collapsed, or by that of the high-level grouping of story types if that is collapsed.

All of the data structures involved are arrays of objects which all have x and y keys. In other words, for any star-shaped object, I can always expect the underlying datum d to have d.x and d.y values. So I wrote a function translate(d) which works on those 2 properties. As a result, when I need to position any object, all I have to write is:

.attr("transform",translate)

and the object will be positioned according to its underlying data. (This is equivalent to writing .attr("transform",function(d) {return translate(d);}).)

If I need to put them elsewhere, e.g. at the position of their parent, I can pass the data of that parent as an argument, for instance:

.attr("transform",function(d) {return translate(structs[d.struct]);})

For a cheap bit of extra action, I’ve added a spinning effect in the translate function. Since translate(d) returns a value for the transform attribute, nobody said it had to contain only translation instructions! So I’ve added a rotate after the translate. The angle of the rotation depends on the x and y properties of the argument as well, so when stars move across the screen, the angle changes slightly with each increment of either coordinate, giving the impression of spinning.
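The post doesn't give the exact rotation formula, so the angle below (x plus y, in degrees) is a hypothetical stand-in; any function that changes smoothly with x and y gives the spinning effect. Because SVG applies the right-most transform first, each star is rotated around its own center (the origin of its path) and then moved into place.

```javascript
// Returns a value for the SVG "transform" attribute: move to (d.x, d.y),
// then spin by an angle that depends on the position (hypothetical formula).
function translate(d) {
  const angle = (d.x + d.y) % 360; // degrees; changes as the star moves
  return "translate(" + d.x + "," + d.y + ")rotate(" + angle + ")";
}

// Any datum with x and y keys works, whether it's a movie or its parent group.
console.log(translate({ x: 10, y: 20 })); // translate(10,20)rotate(30)
```

In d3 this plugs straight into .attr("transform",translate), and during a transition the interpolated translate and rotate values change together.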

Explosions, starlets and other effects

Most of the cool things happening in the visualization rely on one very simple principle about d3 transitions: chaining them.
In the code you’ll often find this pattern:

.selectAll("someobject").data(...).enter().append(...) // creates the items
... // sets the initial attributes
.transition()
... // changes the attributes
.each("end", function() { // stuff to be done on each item after the transition is over
  ...
});

Within that function, you’ll find either:
another transition, which starts exactly when the previous one ends, so that for instance opacity can decrease (causing a fading effect):…

or a command to remove the object:

d3.select(this).remove();

That other transition can be followed by another one, then another one, and so on; eventually the object can be removed (or not).

Now you may think of transitions as ways to get one object to change smoothly from state A to state B, like a rectangle moving across the screen. But if you start to think that the objects can be discarded after the transitions, you’ll realize that there is an unbelievable number of things that can be done with them.
For instance, when you click on some stars, I create another star shape at the same location. Initially it has the same size as the star, but I increase its radius to a large number (1000px) while decreasing its opacity to 0, so the new star seems to be both exploding and fading. Once it has become transparent, I remove it.

gStructs.append("svg:path") // here I'm creating a "path" shape
.style("stroke","none") // with no outline
.style("fill",colorXp)  // with the fill color of the explosion
.style("opacity",0.2)  // and a low opacity to start with (translucent)
.attr("d",lineStar(d.size[sizeAxis])) // I give it the shape of a star and the size of the
                                      // star that's being clicked
.attr("transform",translate(d)) // and I position it on that star

.transition() // action!

.duration(500)	// a 500ms transition. Long enough to see the effect.
.attr("d",lineStar(1000)) // the star expands to a radius of 1000.
.style("opacity",0) // while fading to transparency.

.each("end",function() {d3.select(this).remove();}) // and when it's done - it's removed.
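Stripped of the DOM, the explosion is just two properties interpolated over 500ms, after which the shape is discarded. The plain-JavaScript sketch below computes the radius and opacity at any time after the click. It assumes linear timing for readability, whereas d3 transitions apply an easing curve by default, and the 20px starting radius in the example is made up.

```javascript
// State of the explosion star `ms` milliseconds after the click.
function explosionAt(ms, startRadius, duration = 500) {
  const t = Math.min(Math.max(ms / duration, 0), 1); // progress in [0, 1]
  return {
    radius: startRadius + (1000 - startRadius) * t, // grows to 1000
    opacity: 0.2 * (1 - t),                          // fades from 0.2 to 0
    removed: t === 1,                                // the "end" callback drops it
  };
}

console.log(explosionAt(0, 20));   // starts at the clicked star's size
console.log(explosionAt(250, 20)); // halfway: radius 510, opacity 0.1
console.log(explosionAt(500, 20)); // done: radius 1000, opacity 0, removed
```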

Changing axes

In this visualization I let the user change what’s plotted along the axes. It’s not very difficult to do, but it’s a hassle to do late in the project, as was our case, because it requires a lot of housekeeping. This is really about the data structures that support our items. Instead of having just one value for x, y and size, they have an object with several keys, one per axis option. Then we maintain one variable per axis, so everywhere we would have written d.x, we write d.x[xAxis] instead.

So when there is an axis change, of course, we do a transition so that the stars and everything move smoothly to their new position. But what if the objects were already moving? When an unplanned transition interferes with an ongoing one, the results are often ugly, especially if the current transition had chained transitions waiting to be triggered. In other words, this will leave a mess.

The way I’ve dealt with this is by keeping track of the number of transitions going on at any given time. An axis change could only occur if no other transitions were taking place; otherwise it was simply denied. There are other ways to handle this, like a queue of actions, but this seemed the simplest adequate way.
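Here is a minimal sketch of both ideas together, with hypothetical names throughout: each datum keeps one value per axis option (so positions read d.x[xAxis]), and a counter tracks running transitions so that an axis change is refused while anything is still moving. In d3 you would increment the counter when a transition starts and decrement it in its "end" callback.

```javascript
// Hypothetical datum: one value per axis option instead of a single x or y.
const movie = { x: { budget: 30, gross: 120 }, y: { profit: 4, rating: 7.5 } };

let xAxis = "budget"; // which key is currently plotted horizontally
let running = 0;      // number of transitions currently in flight

function xOf(d) {
  return d.x[xAxis]; // everywhere we would have written d.x
}

function transitionStarted() { running++; }
function transitionEnded() { running--; }

// Only switch axes when nothing is moving; otherwise deny the request.
function changeXAxis(newAxis) {
  if (running > 0) return false;
  xAxis = newAxis;
  return true;
}

transitionStarted();
console.log(changeXAxis("gross"), xOf(movie)); // false 30
transitionEnded();
console.log(changeXAxis("gross"), xOf(movie)); // true 120
```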

Bootstrap and google fonts

This was the first non-trivial project where I used Bootstrap, and I’m never going back. Bootstrap removes all the hassle of arranging the elements of a visualization on a screen and is very easy to use. Plus, it comes with sensible defaults for buttons, forms, and the like. Since the contest it has evolved faster than a pokémon: for instance, it is now possible to specify custom colors in a form and Bootstrap will generate the appropriate CSS files. Google Fonts is another great help: it makes it very easy to choose among a fairly large number of fonts without relying on all users having these fonts on their computer.

Wrapping it up

There are a lot of other hacks in the code which you are welcome to explore; I admit I don’t remember them all because I took too long to write this blog post after creating the entry (bad). However, if there is one point you would like me to explain, please ask in the comments.
I’m not entirely sure what happened when I submitted the entry, though. First it wasn’t listed with the others, then I got a message saying it hadn’t been reviewed, so it didn’t win anything; yet some time after the prizes had been awarded, it appeared among the “shortlisted” visualizations for the contest (which I found by accident). So whether or not it was any good, I’ll let you judge; at any rate, it was fun to make.


Treemaps in Tableau? can be done.

Tableau can do many things natively, but there are a couple of basic primitives that are not built in because they behave somewhat differently from its overall logic. Treemaps are one of them. Then again, treemaps are arguably one of the best ways to express complex hierarchical information, i.e. to show the proportions in a large dataset.

Fortunately, thanks to Tableau's flexibility, there are ways to do it. In this tutorial I'm going to cover 2 cases. First, we'll create a somewhat complex treemap from data that won't change at runtime. Then, we'll create mini-treemaps that can change dynamically.

A complex treemap

Before we go into the details, the main ideas are deceptively simple:

  • we use the polygon mark,
  • we generate the treemap layout outside of Tableau.
What we want (and what we'll get) is a dataset that can be directly imported in Tableau and -boom- makes a treemap in a few clicks.

To make this dataset we can use d3. The treemap I am making is directly inspired by the d3 treemap example. d3 already computes all of the node positions, so all we need to do is modify the program slightly so that it outputs them in a form Tableau can use directly.

Here is the modified file, which you can download and run on your computer. To work, it needs to be in the same folder as a data file called data.js, which will hold your hierarchical data and has the same structure as the one linked here.

You can just copy/paste the table that's displayed below the treemap and put it in Tableau or save it in a file for good measure. Here is the output of the data file linked above.

Let's take a look at a few rows:

Id  Path                     Top-level category  Name                  Value  Corner  x    y
0   flare>analytics>cluster  flare               AgglomerativeCluster  3938   0       89   167
0   flare>analytics>cluster  flare               AgglomerativeCluster  3938   1       167  167
0   flare>analytics>cluster  flare               AgglomerativeCluster  3938   2       167  192
0   flare>analytics>cluster  flare               AgglomerativeCluster  3938   3       89   192
1   flare>analytics>cluster  flare               CommunityStructure    3812   0       102  138
1   flare>analytics>cluster  flare               CommunityStructure    3812   1       167  138
1   flare>analytics>cluster  flare               CommunityStructure    3812   2       167  167
1   flare>analytics>cluster  flare               CommunityStructure    3812   3       102  167
2   flare>analytics>cluster  flare               HierarchicalCluster   6714   0       89   192
2   flare>analytics>cluster  flare               HierarchicalCluster   6714   1       167  192
2   flare>analytics>cluster  flare               HierarchicalCluster   6714   2       167  236
2   flare>analytics>cluster  flare               HierarchicalCluster   6714   3       89   236
I'm creating 4 lines per "leaf" node, so in this example, which has 220 nodes, that amounts to 880 lines. Why 4? Because to draw a rectangle in Tableau you need to define its 4 corners. This is why there is a column "Corner", which takes the values 0, 1, 2 and 3. We will use it to tell Tableau to read our corners in bottom left, bottom right, top right, top left order, which produces a nice convex rectangle and not a self-intersecting hourglass shape.
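The reshaping step can be sketched in plain JavaScript, assuming each leaf comes with x, y, dx and dy (the position and size fields the d3 treemap layout of that era sets on its nodes); the cornerRows name and the output field names are illustrative. With the AgglomerativeCluster rectangle (x 89, y 167, 78 wide, 25 tall) it reproduces the first four rows of the table above.

```javascript
// One output row per corner; corners go around the rectangle in order
// 0, 1, 2, 3 so that Tableau's polygon mark draws a convex shape.
function cornerRows(node, id) {
  const { x, y, dx, dy } = node;
  const corners = [
    [x, y],           // corner 0
    [x + dx, y],      // corner 1
    [x + dx, y + dy], // corner 2
    [x, y + dy],      // corner 3
  ];
  return corners.map(([cx, cy], corner) => ({
    Id: id,
    Name: node.name,
    Value: node.value,
    Corner: corner,
    x: cx,
    y: cy,
  }));
}

const leaf = { name: "AgglomerativeCluster", value: 3938, x: 89, y: 167, dx: 78, dy: 25 };
const rows = cornerRows(leaf, 0);
console.log(rows.map((r) => [r.Corner, r.x, r.y].join(" ")).join("\n"));
// 0 89 167
// 1 167 167
// 2 167 192
// 3 89 192
```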

Now off to Tableau with this data. 

Now it's just a matter of setting things up as in this screenshot. Unsurprisingly, the columns and rows are going to be determined by x and y. You want a polygon mark, and you absolutely must use your Corner measure in the path. For color, you have a choice: you can use the top-level category column (as I have) or the full path, which will divide your treemap into finer parts. For the level of detail, you must use the Id and not the name, in case several of your nodes have the same name. It's quite important at this point to uncheck Aggregate Measures in the Analysis menu. You do NOT want aggregate measures (though it's quite pretty). To be able to use the name, you must first make a measure out of it. And finally, you'll want to update your infotip slightly.

All of this you can see if you download the Tableau file.

And voilà! Treemaps for your Tableau workbooks.

Caveat: the polygon mark doesn't support labels, so you can't write on top of the small rectangles what they are. But that's not the point of a treemap anyway: it's meant to give an immediate first impression of the relative size of large groups in your data and then let you explore them, and for that the infotip works just fine.

Simpler but dynamic treemaps

This is fine and dandy if your data doesn't change, but it won't scale if you need to make many treemaps based on selections. What to do? You could use pie charts, but let's not.

To that end I've tried to emulate the Congress speaks visualization by Periscopic. I really like it. When you've selected representatives at the end of the process you are taken to a screen which shows the following mini-treemap:

There are just 5 rectangles. But they will change for any representative that we choose. Can this be done with Tableau? Obviously.

Now the Tableau part of this is slightly trickier than above. The idea is that we are going to use formulas to generate the coordinates of all 20 corners of the rectangles; in other words, we are going to let Tableau calculate the layout. We can do this because the way the rectangles are arranged is quite predictable: there is one on the left, then 4 stacked on top of one another on the right. Again, we could compute all of these coordinates outside of Tableau, but that would be a hassle, and for a large number of cases it becomes easier and more reliable to do this inside Tableau.


For this I have used completely random data. I have generated 20 names, and for each I have generated 6 values in a plausible range: number of possible votes, number of votes the representative actually cast, number of times they voted yes, number of times they voted yes with their party, and the same two for no (or nay, technically).

In the end I need 20 records per representative (5 rectangles of 4 corners each), so I can either replicate each line 20 times or use linked tables. The idea is to get something like this into Tableau, for all of the representatives:

Id  representative  corner  rectangle          possible votes  total votes  voted yes  yes with party  voted no  no with party
16  Nelson Thiede   0       no against party   888             784          320        274             464       373
16  Nelson Thiede   1       no against party   888             784          320        274             464       373
16  Nelson Thiede   2       no against party   888             784          320        274             464       373
16  Nelson Thiede   3       no against party   888             784          320        274             464       373
16  Nelson Thiede   0       no vote            888             784          320        274             464       373
16  Nelson Thiede   1       no vote            888             784          320        274             464       373
16  Nelson Thiede   2       no vote            888             784          320        274             464       373
16  Nelson Thiede   3       no vote            888             784          320        274             464       373
16  Nelson Thiede   0       no with party      888             784          320        274             464       373
16  Nelson Thiede   1       no with party      888             784          320        274             464       373
16  Nelson Thiede   2       no with party      888             784          320        274             464       373
16  Nelson Thiede   3       no with party      888             784          320        274             464       373
16  Nelson Thiede   0       yes against party  888             784          320        274             464       373
16  Nelson Thiede   1       yes against party  888             784          320        274             464       373
16  Nelson Thiede   2       yes against party  888             784          320        274             464       373
16  Nelson Thiede   3       yes against party  888             784          320        274             464       373
16  Nelson Thiede   0       yes with party     888             784          320        274             464       373
16  Nelson Thiede   1       yes with party     888             784          320        274             464       373
16  Nelson Thiede   2       yes with party     888             784          320        274             464       373
16  Nelson Thiede   3       yes with party     888             784          320        274             464       373
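If you'd rather replicate the lines than set up linked tables, a short script can do the expansion. This sketch assumes one input record per representative holding the six numeric fields (the camelCase property names are made up) and produces the 5 × 4 = 20 rows shown above, in the same order.

```javascript
const RECTANGLES = ["no against party", "no vote", "no with party",
                    "yes against party", "yes with party"];

// Expands one representative into 20 rows: 5 rectangles × 4 corners.
function expand(rep) {
  const rows = [];
  for (const rectangle of RECTANGLES) {
    for (let corner = 0; corner < 4; corner++) {
      rows.push({ ...rep, rectangle, corner }); // copy the record, tag the row
    }
  }
  return rows;
}

const nelson = { id: 16, representative: "Nelson Thiede", possibleVotes: 888,
  totalVotes: 784, votedYes: 320, yesWithParty: 274, votedNo: 464, noWithParty: 373 };
const nelsonRows = expand(nelson);
console.log(nelsonRows.length); // 20
```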

In Tableau

In Tableau we are going to use the same idea as above: polygon mark, disable aggregate measures, and use x and y for columns and rows.

Only, x and y are going to be much more complex. Sorry about that. Well, not that complex but definitely longer.

Here's x:

case [rectangle]
when "no vote" then
     case [corner]
       when 0 then 0
       when 1 then (([possible votes]-[total votes])/[possible votes])
       when 2 then (([possible votes]-[total votes])/[possible votes])
       when 3 then 0
     end
else
     case [corner]
       when 0 then (([possible votes]-[total votes])/[possible votes])
       when 1 then 1
       when 2 then 1
       when 3 then (([possible votes]-[total votes])/[possible votes])
     end
end

Depending on the rectangle we are trying to draw we can find ourselves in one of two cases (hence the use of case).

If we draw "no vote", then we are on the left of our vis. The left corners are on the leftmost side of the vis (hence value: 0) and the right corners correspond to the proportion of possible votes which were not cast by this representative, which we can compute as ([possible votes]-[total votes])/[possible votes].

In the other case, we are drawing one of the 4 stacked rectangles, so the right corners are on the rightmost side of the vis (hence value: 1) and the left corners correspond to the value we just computed.

And now, y:

case [rectangle]
when "no vote" then
  case [corner]
  when 0 then 0
  when 1 then 0
  when 2 then 1
  when 3 then 1
  end
when "yes against party" then
  case [corner]
  when 0 then 0
  when 1 then 0
  when 2 then (([voted yes]-[yes with party])/[total votes])
  when 3 then (([voted yes]-[yes with party])/[total votes])
  end
when "yes with party" then
  case [corner]
  when 0 then (([voted yes]-[yes with party])/[total votes])
  when 1 then (([voted yes]-[yes with party])/[total votes])
  when 2 then ([voted yes]/[total votes])
  when 3 then ([voted yes]/[total votes])
  end
when "no with party" then
  case [corner]
  when 0 then ([voted yes]/[total votes])
  when 1 then ([voted yes]/[total votes])
  when 2 then (([voted yes]+[no with party])/[total votes])
  when 3 then (([voted yes]+[no with party])/[total votes])
  end
when "no against party" then
  case [corner]
  when 0 then (([voted yes]+[no with party])/[total votes])
  when 1 then (([voted yes]+[no with party])/[total votes])
  when 2 then 1
  when 3 then 1
  end
end
y is longer, but it's the same general idea. For the "no vote" rectangle, the corners are either at the top or at the bottom of the vis. For the others, we can predict where each rectangle will start and end, as a proportion of the [total votes] field. The values we want correspond to these proportions, plus those of all the rectangles below, so that we get a stacked effect (as opposed to having all the rectangles superimposed at the bottom of the vis). This is why I am entering the rectangles in stacking order: each time, the bottom corners get the value of the top corners of the previous rectangle.
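The same layout can be checked outside Tableau with a few lines of plain JavaScript. This sketch uses hypothetical camelCase field names and the sample values from the table above, and assumes voted yes + voted no = total votes, as in that data. x splits off the "no vote" column; y stacks yes against party, yes with party, no with party and no against party, each sized as a share of total votes and starting where the previous one ends, so the last one tops out at 1 (up to rounding).

```javascript
// Corner order 0..3: (left, bottom), (right, bottom), (right, top), (left, top).
function rect(x0, x1, y0, y1) {
  return [[x0, y0], [x1, y0], [x1, y1], [x0, y1]];
}

function layout(r) {
  const split = (r.possibleVotes - r.totalVotes) / r.possibleVotes; // width of "no vote"
  // Sizes of the stacked rectangles, bottom to top.
  const stack = [
    ["yes against party", r.votedYes - r.yesWithParty],
    ["yes with party", r.yesWithParty],
    ["no with party", r.noWithParty],
    ["no against party", r.votedNo - r.noWithParty],
  ];
  const corners = { "no vote": rect(0, split, 0, 1) };
  let bottom = 0;
  for (const [name, count] of stack) {
    const top = bottom + count / r.totalVotes; // starts where the previous one ends
    corners[name] = rect(split, 1, bottom, top);
    bottom = top;
  }
  return corners;
}

const c = layout({ possibleVotes: 888, totalVotes: 784, votedYes: 320,
  yesWithParty: 274, votedNo: 464, noWithParty: 373 });
console.log(c["yes with party"][2]); // top-right corner of "yes with party"
```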

Here is the final result:


Using d3 with a mySql database

Creating visualizations from static files is fine and dandy but sometimes you need to be able to access dynamic data. And some other times, you may want to somehow record interactions from your users. One way to do that is by interacting with a mySql database.

Without further ado here is the demo:

How does it work?

There are several parts to that.

First, one HTML file which holds everything together. By the way, for the styling I used Twitter’s Bootstrap, which makes it easy for all elements to find their place, and look at those purty buttons.

Second, one JavaScript file which contains the visualization proper. If you have some familiarity with d3, there is really nothing scary in this script. I’ll come back to the parts where the script interacts with the database in detail.

Here’s what the rest does at a high level.

  1. We give some behaviors to the buttons
  2. Then we create a grid of small squares. All of these squares are positioned and given class names, so that the square with classes “r32” and “c17” is the 18th square from the left and the 33rd from the top (the class names start at 0).
  3. We catch the clicks on each square with a “clickme” function. In d3 logic, what is passed to that function is the underlying data of the element, in this case a 2-element array with the x and y coordinates of the square being clicked. In turn, the clickme function updates the data of that square and those of the 4 surrounding squares (above, below, left and right), either increasing or decreasing the elevation of the terrain they represent.
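The update logic of step 3 can be sketched without the DOM. The grid size, the step of 1, the [row, column] order of the datum and all names here are assumptions for illustration; the point is the bounds check, which quietly ignores neighbors that fall off the edge of the map.

```javascript
const ROWS = 4, COLS = 5; // hypothetical grid size
const heights = Array.from({ length: ROWS }, () => new Array(COLS).fill(0));

// Raise (delta > 0) or lower (delta < 0) one cell, if it is on the grid.
function bump(r, c, delta) {
  if (r >= 0 && r < ROWS && c >= 0 && c < COLS) heights[r][c] += delta;
}

// A click changes the clicked cell and its 4 neighbors.
function clickme([r, c], delta = 1) {
  bump(r, c, delta);
  bump(r - 1, c, delta); // above
  bump(r + 1, c, delta); // below
  bump(r, c - 1, delta); // left
  bump(r, c + 1, delta); // right
}

clickme([0, 0]);     // corner cell: only 2 of its neighbors exist
clickme([2, 2], -1); // dig a small pit in the middle
console.log(heights.map((row) => row.join(" ")).join("\n"));
```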

Where it gets interesting is in how the data is initialized and how it is updated.

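d3.text("mapread.php", function(txt) {
	// ... the loading message box is removed here (that line isn't reproduced in this post) ...
	txt.split("\n").forEach(function(line,i) {
		data[i]=[]; // one row of heights per line of text
		line.split(",").forEach(function(d,j) {
			data[i][j]=+d; // store each height as a number
			d3.selectAll(".r"+i+".c"+j).style("fill",function() {return cScale(data[i][j]);});
		});});}); // close the three callbacks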

What’s really interesting here is the first line. We are asking d3 to fetch a text file sitting at mapread.php and then do something with it. The second part of the line, function(txt), is a callback: once the file has loaded, d3 calls it with the contents of the text as its argument.

The second line just removes the loading message box. Then the script splits the text into lines and, each line being a string of comma-separated values, splits that too and fills a variable, data, with the result of all this splitting. Finally, it formats the squares by coloring them according to the retrieved values.
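The parsing half of that callback can be tested on its own. Here is a plain-JavaScript sketch (parseGrid is a made-up name) that turns the text served by mapread.php into the data array; dropping a trailing empty line and converting each value with the unary + are assumptions of this sketch rather than details from the post.

```javascript
// "0,1,2\n3,4,5\n" becomes [[0, 1, 2], [3, 4, 5]]
function parseGrid(txt) {
  return txt
    .split("\n")
    .filter((line) => line.length > 0)              // ignore a trailing newline
    .map((line) => line.split(",").map((d) => +d)); // strings to numbers
}

const data = parseGrid("0,1,2\n3,4,5\n");
console.log(data[1][2]); // 5
```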

At this stage you may think: but shouldn’t you load the data before drawing the scene? Well, what happens here is that loading the data takes much more time than drawing the scene, so it makes more sense to draw it first as an empty shell, load the data and then update the scene.

And as you may have guessed, this mapread.php is no ordinary text file, but a file dynamically generated from a mySql database. I won’t cover setting up a mySql database. Tutorials on the subject abound, there are ISPs that offer free mySql hosting, and you can also install a local server on your computer, for instance EasyPHP for Windows users. And if your ISP limits the number of mySql databases you can have, you don’t need to create a new one: creating a new table within an existing one will be fine. All you really need is your mySql credentials.

Next, you want to create a PHP file that goes like this:

<?php
$username="username"; //replace with your mySql username
$password="password";  //replace with your mySql password
$dsn="database";  //replace with your mySql database name
$host="host";  //replace with the name of the machine your mySql runs on

You can call this mysqlConfig.php or whatever; it is a convenience file so you don’t have to type in your credentials each time you need to connect to your mySql database.

Next, here is the script that reads the database and outputs a text file:

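<?php
// load in mysql server configuration (connection string, user/pw, etc)
include 'mysqlConfig.php';
// connect to the server and select the database
$link = mysqli_connect($host, $username, $password, $dsn) or die("Unable to connect to the database");

// reads the map db, row by row
$query="SELECT `height` FROM `v_map` ORDER BY `row`, `col`";

$result = mysqli_query($link, $query) or die('Errant query: '.$query);

// outputs the db as lines of text.
header('Content-type: text/plain; charset=us-ascii');

$i=0;     // number of values on the current line
$line=""; // current line of comma-separated heights
if(mysqli_num_rows($result)) {
 while($value = mysqli_fetch_assoc($result)) {
  $line=$line.$value['height'];
  $i++;
  if ($i==52) { // 52 values per row: output the line and start a new one
   echo $line."\n";
   $line="";
   $i=0;
  } else {$line=$line.",";}
 }
}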

And by the way, I am by no means a PHP expert. I hadn’t written a line of PHP in almost 10 years, so there may well be more effective ways to do this, but the above works. The more interesting part is that we write an SQL query, which we store in $query, and then we execute this query. Then we loop over the results and echo the output.

Back to our javascript file, we also interact with another php file when we update the data.

function update(r,c,v) {
	if(r>=0 && r<y && c>=0 && c<x) { // only cells inside the grid
		d3.selectAll(".r"+r+".c"+c).style("fill",function() {return cScale(data[r][c]);});
		d3.text("mapupdate.php?height="+data[r][c]+"&col="+c+"&row="+r,function() {console.log("cell on row "+r+" and col "+c+" updated to "+data[r][c]);});
	}
}

Here the last line is the interesting one. Again, it attempts to fetch a text file from a URL. In fact, there is no text there, but just accessing this URL triggers an interaction with the database. (I guess it would be good practice to actually get some text in return, but hey.)

The program tries to read a URL of the form mapupdate.php?height=20&col=10&row=32. By calling this URL, we are actually passing these parameters to a PHP file, which will read them and use them to build a query to the mySql database.
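Rather than concatenating the query string by hand, the parameters can be assembled with URLSearchParams, a standard Web API available in modern browsers and in Node, which also percent-encodes anything unusual. A small sketch, with a hypothetical helper name:

```javascript
// Builds e.g. "mapupdate.php?height=20&col=10&row=32".
function updateUrl(row, col, height) {
  const params = new URLSearchParams({
    height: String(height),
    col: String(col),
    row: String(row),
  });
  return "mapupdate.php?" + params.toString();
}

console.log(updateUrl(32, 10, 20)); // mapupdate.php?height=20&col=10&row=32
```

The resulting string can be passed to d3.text exactly like the hand-built one.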

Here goes:


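<?php
// load in mysql server configuration (connection string, user/pw, etc)
include 'mysqlConfig.php';
// connect to the server and select the database
$link = mysqli_connect($host, $username, $password, $dsn) or die("Unable to connect to the database");

// updates the map db; the casts keep anything but numbers out of the query
$query="UPDATE `v_map` SET `height`=".(float)$_GET["height"]." WHERE `col`= ".(int)$_GET["col"]." and `row`= ".(int)$_GET["row"];

mysqli_query($link, $query) or die('Errant query: '.$query);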

Here, the line that starts with $query does just that. The dot “.” is PHP’s concatenation operator, and $_GET is an associative array holding the parameters passed to the script in the URL.

For the sake of completeness, I had two other php scripts, one to initiate the table to begin with, and one to reset it if something went wrong. Those are just plain SQL queries so no need to reproduce them here.

And voilà! Now all of you can interact with this terrain builder and create islands, forests, mountains, etc. The graphics are kind of crude, because when I was looking for an example I decided to recreate one of my earliest attempts at creative coding. In 1990, upon the release of Powermonger, I was so fascinated by the algorithmically generated maps the game used as copy protection that I tried to code my own terrain generator, at a time when a 320x240x16 resolution was considered generous. Only here, it’s your clicks that replace the algorithm!

I hope you enjoy the tutorial and working with persistent data in d3!


d3: scales, and color.

In protovis, scales were super-useful in just about everything. That much hasn’t changed in d3, even though d3.scale is a bit different from pv.Scale. (do note that d3.scale is in lowercase for starters).

Scales: the main idea

Simply put: scales transform a number in a certain interval (called the domain) into a number in another interval (called the range).
an example of how scales work
For instance, let’s suppose you know your data is always over 20 and always below 80. You would like to plot it, say, in a bar chart, which can be only 120 pixels tall.
You could, obviously, do the math:

.attr("height", function(d) {return (d-20)*2;})

But what if you suddenly have more or less space? or your data changes? you’d have to go back to the entrails of your code and make the change. This is very error prone. So instead, you can use a scale:

var y=d3.scale.linear().domain([20,80]).range([0,120]);
.attr("height", y)

this is much simpler, elegant, and easy to maintain. Oh, and the latter notation is equivalent to

.attr("height", function(d) {return y(d);})

… only more legible and shorter.
And there are tons of possibilities with scales.
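Under the hood, a two-point linear scale is just a y=ax+b function. This plain-JavaScript sketch (not d3's code; linearScale is a made-up name) builds one from a domain and a range, and also shows the inverse and the optional clamping that d3 scales offer.

```javascript
// A minimal two-point linear scale: maps [d0, d1] onto [r0, r1].
function linearScale([d0, d1], [r0, r1], clamp = false) {
  function scale(x) {
    let t = (x - d0) / (d1 - d0); // position within the domain, 0 to 1
    if (clamp) t = Math.max(0, Math.min(1, t));
    return r0 + t * (r1 - r0);
  }
  // Inverse: from a range value back to the domain.
  scale.invert = (v) => d0 + ((v - r0) / (r1 - r0)) * (d1 - d0);
  return scale;
}

const y = linearScale([20, 80], [0, 120]);
console.log(y(20), y(50), y(80)); // 0 60 120
console.log(y.invert(60));        // 50
console.log(y(100), linearScale([20, 80], [0, 120], true)(100)); // 160 120
```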

Fun with scales

In d3, quantitative scales can be of several types:

  • linear scales (including quantize and quantile scales),
  • logarithmic scales,
  • power scales (including square root scales)

While they behave differently, they have a lot in common.

Domain and range

For all scales, with the exception of quantize and quantile scales which are a bit different, domain and range work the same.
First, note that unlike in protovis, domain and range take an array as argument. Compare:

var y=pv.Scale.linear().range(20,60).domain(0,120);
var y=d3.scale.linear().range([20,60]).domain([0,120]);

This is because, contrary to protovis, where the domain could be a whole dataset, in d3 the domain contains the bounds of the interval that is going to be transformed.
Typically, this is two numbers. If there are more, we are talking about a polylinear (multipoint) scale: the domain is split into as many segments as there are numbers in the domain, minus one. The range must have as many numbers, and therefore as many segments. When using the scale, if a number is in the n-th segment of the domain, it is transformed into a number in the n-th segment of the range.
illustration of a multipoint scale
With this example, 30 finds itself in the first segment of the domain. So it’s transformed to a value in the first segment of the range. 60, however, is in the 2nd segment, so it’s transformed into a value in the 2nd segment of the range.
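A polylinear scale is a chain of linear pieces. In this plain-JavaScript sketch, the domain [20, 50, 80] and range [0, 100, 120] are made-up values (the figure's own numbers aren't given in the text); as in the example, 30 falls in the first segment and 60 in the second, so each is mapped within the matching segment of the range.

```javascript
// Piecewise-linear scale: domain and range must have the same length.
function polylinear(domain, range) {
  return function (x) {
    // Find the segment containing x; values outside the domain
    // are extrapolated from the first or last segment.
    let i = 0;
    while (i < domain.length - 2 && x > domain[i + 1]) i++;
    const t = (x - domain[i]) / (domain[i + 1] - domain[i]);
    return range[i] + t * (range[i + 1] - range[i]);
  };
}

const p = polylinear([20, 50, 80], [0, 100, 120]);
console.log(p(30)); // first segment: a third of the way from 0 to 100
console.log(p(60)); // second segment: a third of the way from 100 to 120
```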
Also, the bounds of the domain and range need not be numbers, as long as they can be converted to numbers. Colors are one useful example: color names can be used in the range, for instance, to create color ramps:

var ramp=d3.scale.linear().domain([0,100]).range(["red","blue"]);

This will transform any value between 0 and 100 into the corresponding color between red and blue.
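What the ramp does is interpolate each color channel separately. This plain-JavaScript sketch assumes simple RGB interpolation, with the two colors written out as [r, g, b] triples; d3 parses color names itself and may format its output differently (for instance as a hex string).

```javascript
const RED = [255, 0, 0], BLUE = [0, 0, 255];

// Maps v in [0, 100] to a color between red and blue, channel by channel.
function ramp(v) {
  const t = v / 100;
  const [r, g, b] = RED.map((c, i) => Math.round(c + (BLUE[i] - c) * t));
  return "rgb(" + r + "," + g + "," + b + ")";
}

console.log(ramp(0), ramp(50), ramp(100)); // rgb(255,0,0) rgb(128,0,128) rgb(0,0,255)
```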


What happens if the scale is asked to process a number outside of the domain? That’s what clamping controls. If it is set (with .clamp(true)), then the bounds of the range are the minimum and maximum values that can be returned by the scale. Otherwise, the same transformation applies to all numbers, whether they fall within the domain or not.
Clamping example
Here, with clamping, the result of the linear transformation is 120, but without it, it’s 160.

var clamp=d3.scale.linear().domain([20,80]).range([0,120]);
clamp(100); // 160
clamp.clamp(true);
clamp(100); // 120

Scales and nice numbers

More often than not, the bounds of the domain and/or those of the range will be calculated. So, chances are they won’t be round numbers, or numbers a human would like. Scales, however, come with a bunch of methods to address that. d3 keeps in mind that scales are often used to position marks along an axis.


When applied to a scale, the nice method extends the domain to “nicer” numbers. You wouldn’t want your axis to start at -2.347 and end at 7.431, right?
So, there.

var data=[-2.347, 4, 5.23,-1.234,6.234,7.431]; // or whatever.
var y=d3.scale.linear().range([0,120]);
y.domain([d3.min(data), d3.max(data)]); // domain takes the bounds as arguments, not all the numbers
y.domain(); // [-2.347, 7.431]
y.nice(); // extends the domain (and returns the scale)
y.domain(); // [-3, 8]
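One simple way to "nice" a domain is to pick a round step from the size of the span and push each end outward to a multiple of it. This is a simplified sketch, not d3's exact rule (which differs between versions), but it gives the same [-3, 8] for the data above.

```javascript
// Extends [lo, hi] outward to multiples of a power of ten.
function nice([lo, hi]) {
  const span = hi - lo;
  const step = Math.pow(10, Math.round(Math.log10(span)) - 1);
  return [Math.floor(lo / step) * step, Math.ceil(hi / step) * step];
}

console.log(nice([-2.347, 7.431]).join(", ")); // -3, 8
```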


Given a domain and a number n (which, contrary to protovis, is mandatory in d3), the ticks method will split your domain into (more or less) n convenient, human-readable values, and return an array of these values. This is especially useful to label axes. Passing these values to the scale allows you to position ticks nicely on an axis.

var y=d3.scale.linear().domain([20,80]).range([0,120]);
var ticks=axis.selectAll("line")
  .data(y.ticks(4)) // 20, 40, 60 and 80
  .enter().append("line")
  .attr("y1",y).attr("y2",y); // short and simple.
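To see where values like 20, 40, 60 and 80 come from, here is a plain-JavaScript sketch modeled on the way d3 picks a tick step: start from the power of ten just below span/n, then multiply it by 2, 5 or 10 so that the number of ticks lands close to n. Treat it as an approximation of d3's behavior rather than a copy of its source.

```javascript
// Returns roughly n round values covering [start, stop].
function ticks([start, stop], n) {
  const span = stop - start;
  let step = Math.pow(10, Math.floor(Math.log10(span / n)));
  const err = (n / span) * step; // n divided by how many steps would fit
  if (err <= 0.15) step *= 10;
  else if (err <= 0.35) step *= 5;
  else if (err <= 0.75) step *= 2;
  const first = Math.ceil(start / step) * step;
  const last = Math.floor(stop / step) * step;
  const result = [];
  for (let i = 0; first + i * step <= last + step / 2; i++) result.push(first + i * step);
  return result;
}

console.log(ticks([20, 80], 4).join(", ")); // 20, 40, 60, 80
```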


The .rangeRound() method: if used instead of .range(), it guarantees that the output of the scale is an integer, which is better for positioning marks on the screen with pixel precision than numbers with decimals.


The invert function turns the scale upside down: for one given number in the range, it returns which number of the domain would have been transformed into that number.
For instance:

var y=d3.scale.linear().domain([20,80]).range([0,120]);
y(50); // 60
y.invert(60); // 50

That’s quite useful, for instance, when a user mouses over a chart, and you would like to know to what value the mouse coordinates correspond.

Power scales and log scales

The linear scale is a function of the form y=ax+b that maps both ends of the domain onto both ends of the range. In the example we’ve used most often until now, this function is f(x)=2x-40.
Power and logarithmic scales work the same way, only we are looking for a function of the form y=a·x^k+b, or y=a·log(x)+b.
For power scales, you can specify an exponent (k) with the .exponent() method. For instance, if we specify an exponent of 2, here is what the scale looks like:
an example of a power scale
The equation is now f(x)=x²/50-8. So 20 still becomes 0 and 80 still becomes 120, but in between, every value comes out lower than with the linear scale: the curve rises slowly at the beginning of the domain and steeply at the end.
For convenience, d3 includes d3.scale.sqrt() (the square root scale) so you never have to type d3.scale.pow().exponent(0.5) in full.
Also note that if you are using a log scale, you cannot have 0 in the domain.
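A power scale amounts to a linear scale applied after raising values to the exponent. This sketch (powScale is a made-up name, and it assumes a positive domain) reproduces the exponent-2 example: 20 and 80 still map to 0 and 120, while 50 maps to 42 instead of the linear scale's 60, matching f(x)=x²/50-8.

```javascript
// Maps [d0, d1] onto [r0, r1] through x^k (k = 2 by default).
function powScale([d0, d1], [r0, r1], k = 2) {
  const f = (x) => Math.pow(x, k);
  return (x) => r0 + ((f(x) - f(d0)) / (f(d1) - f(d0))) * (r1 - r0);
}

const sq = powScale([20, 80], [0, 120]);
console.log(sq(20), sq(50), sq(80)); // 0 42 120
```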

Quantize and quantile

quantize and quantile are specific linear scales.
quantize works with a discrete, rather than continuous, range: in other terms, the output of quantize can only take a certain number of values.
For instance:

var q=d3.scale.quantize().domain([0,10]).range([0,2,8]); 
q(0); // 0
q(3); // 0
q(3.33); // 0
q(3.34); // 2
q(5); // 2
q(6.66); // 2
q(6.67); // 8
q(8); // 8
q(1000); // 8
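The outputs above follow from a simple rule: cut the domain into as many equal segments as there are values in the range, and clamp anything outside the domain to the first or last segment. A plain JavaScript sketch of that rule (quantize here is a hypothetical helper, not d3's code):

```javascript
// Cut [x0, x1] into range.length equal segments; each segment gets one output.
function quantize([x0, x1], range) {
  const n = range.length;
  return (x) => {
    const i = Math.floor((n * (x - x0)) / (x1 - x0)); // segment index
    return range[Math.max(0, Math.min(n - 1, i))];    // clamp outside the domain
  };
}

const q = quantize([0, 10], [0, 2, 8]);
console.log([0, 3.33, 3.34, 6.66, 6.67, 1000].map((x) => q(x))); // [ 0, 0, 2, 2, 8, 8 ]
```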

quantile on the other hand matches values in the domain (which, this time, is the full dataset) with their respective quantile. The number of quantiles is specified by the range.
For instance:

var q=d3.scale.quantile().domain([0,1,5,6,2,4,6,2,4,6,7,8]).range([0,100]);
q.quantiles(); // [4.5], only one quantile - the median
q(0); // 0
q(4); // 0
q(4.499); // 0
q(4.5); // 100 - over the median
q(5); // 100
q(10000); // 100
q.range([0,25,50,75,100]); // now 5 groups, so 4 quantile limits
q.quantiles(); // [2, 4, 5.6, 6]
q(0); // 0
q(2); // 25 - at or above the first quantile limit
q(3); // 25
q(4); // 50
q(6); // 100
q(10000); // 100
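Where do those limits come from? For k = 1 to n-1 (n being the number of values in the range), limit k is the k/n quantile of the sorted dataset, interpolated linearly between neighboring values; a value is then mapped to the range item whose index is the number of limits less than or equal to it. The plain JavaScript sketch below reproduces the numbers above; quantile and quantileScale are hypothetical names, and the interpolation rule is an assumption that happens to match these outputs.

```javascript
// p-quantile of an already sorted array, interpolating between neighbors.
function quantile(sorted, p) {
  const h = (sorted.length - 1) * p;
  const lo = Math.floor(h);
  const hi = Math.min(lo + 1, sorted.length - 1);
  return sorted[lo] + (h - lo) * (sorted[hi] - sorted[lo]);
}

function quantileScale(domain, range) {
  const sorted = [...domain].sort((a, b) => a - b);
  const n = range.length;
  // n - 1 limits split the sorted data into n groups of (roughly) equal size
  const limits = [];
  for (let k = 1; k < n; k++) limits.push(quantile(sorted, k / n));
  const scale = (x) => {
    let i = 0;
    while (i < limits.length && limits[i] <= x) i++; // count limits <= x
    return range[i];
  };
  scale.quantiles = () => limits.slice();
  return scale;
}

const data = [0, 1, 5, 6, 2, 4, 6, 2, 4, 6, 7, 8];
const q2 = quantileScale(data, [0, 100]);
console.log(q2.quantiles(), q2(4.499), q2(4.5)); // [ 4.5 ] 0 100

const q5 = quantileScale(data, [0, 25, 50, 75, 100]);
console.log(q5.quantiles());      // approximately [ 2, 4, 5.6, 6 ]
console.log(q5(2), q5(4), q5(6)); // 25 50 100
```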

Ordinal scales

All the scales we’ve seen so far have been quantitative, but how about ordinal scales?
The big difference is that ordinal scales have a discrete domain, in other words, they turn a limited number of values into something else, without caring for what’s between those values.
Ordinal scales are very useful for positioning marks along an x axis. Let’s suppose you have 10 bars to position for your bar chart, each corresponding to a category, a month or whatever.
For instance:

var x=d3.scale.ordinal()
  .domain(["Sunday","Monday","Tuesday","Wednesday","Thursday","Friday","Saturday"]) // 7 items
  .rangeBands([0,120]);
x("Tuesday"); // 34.285714285714285

There are 3 possibilities for the range. Two are similar: the .rangePoints() and .rangeBands() methods, which both work with an array of two numbers, e.g. .rangeBands([0,120]). The last one is to specify all values in the range with .range().

rangePoints() and rangeBands()

With .rangePoints(interval), d3 fits n points within the interval, n being the number of categories in the domain. In that case, the value of the first point is the beginning of the interval, that of the last point is the end of the interval.
With .rangeBands(interval), d3 fits n bands within the interval. Here, each item is mapped to the start of its band, so the value of the last item in the domain is less than the upper bound of the interval; the width of each band is given by .rangeBand().
Those methods replace the protovis methods .split() and .splitBanded().
[Figure: the difference between using rangeBands and rangePoints]

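var x=d3.scale.ordinal()
  .domain(["Sunday","Monday","Tuesday","Wednesday","Thursday","Friday","Saturday"])
  .rangePoints([0,120]);
x("Saturday"); // 120
x.rangeBands([0,120]);
x("Saturday"); // 102.85714285714286
x("Saturday")+x.rangeBand(); // 120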
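The two methods boil down to two different step sizes. With points, n categories are spread over n - 1 gaps, so the last one lands exactly on the upper bound. With bands, the interval is cut into n equal bands, each category sits at the start of its band, and the band width equals the step. A plain JavaScript sketch (pointScale and bandScale are hypothetical helpers, and padding is ignored):

```javascript
const days = ["Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"];

// Points: n categories over n - 1 gaps, first at start, last at stop.
// Assumes at least two categories and that the value is in the domain.
function pointScale(domain, [start, stop]) {
  const step = (stop - start) / (domain.length - 1);
  return (value) => start + domain.indexOf(value) * step;
}

// Bands: the interval is cut into n equal bands; each value maps to the
// start of its band, and the band width is exposed separately.
function bandScale(domain, [start, stop]) {
  const step = (stop - start) / domain.length;
  const scale = (value) => start + domain.indexOf(value) * step;
  scale.bandWidth = () => step;
  return scale;
}

const pts = pointScale(days, [0, 120]);
const bands = bandScale(days, [0, 120]);
console.log(pts("Saturday"));                       // 120
console.log(bands("Saturday"));                     // about 102.857
console.log(bands("Saturday") + bands.bandWidth()); // about 120
```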

the range method

Finally, we can also use the .range method with several values.
Specifying the domain is optional. If we use such a scale on a value which is not part of the domain (or if the domain is left empty), this value is added to the domain. If there are n values in the range and more in the domain, then the (n+1)th value of the domain is matched with the 1st value in the range, and so on.

var x=d3.scale.ordinal().range(["hello", "world"]); 
x.domain(); // [] - empty still.
x(0); // "hello"
x(1); // "world"
x(2); // "hello"
x.domain(); // [0,1,2]
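That behavior can be summarized as: remember each new input in the order it is first seen, and return the range value at that position, modulo the length of the range. A plain JavaScript sketch, where cyclingOrdinal is a hypothetical name rather than a d3 function:

```javascript
function cyclingOrdinal(range) {
  const domain = [];
  const positions = new Map(); // input value -> position in the domain
  function scale(value) {
    if (!positions.has(value)) {
      positions.set(value, domain.length); // unseen value: add it to the domain
      domain.push(value);
    }
    return range[positions.get(value) % range.length];
  }
  scale.domain = () => domain.slice();
  return scale;
}

const x = cyclingOrdinal(["hello", "world"]);
console.log(x.domain());       // []
console.log(x(0), x(1), x(2)); // hello world hello
console.log(x.domain());       // [ 0, 1, 2 ]
```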

Color palettes

Unlike protovis, which had them under pv.Colors (e.g. pv.Colors.category10()), d3 gives access to its built-in color palettes through scales. Well, even in protovis they had been ordinal scales all along, only not called that.
There are 4 built-in color palettes in d3: d3.scale.category10(), d3.scale.category20(), d3.scale.category20b(), and d3.scale.category20c().

A palette like d3.scale.category10() works exactly like an ordinal scale.

var p=d3.scale.category10();
var r=p.range(); // ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728", "#9467bd", 
                      // "#8c564b", "#e377c2", "#7f7f7f", "#bcbd22", "#17becf"]
var s=d3.scale.ordinal().range(r); 
p.domain(); // [] - empty
s.domain(); // [] - empty, see above
p(0); // "#1f77b4"
p(1); // "#ff7f0e"
p(2); // "#2ca02c"
p.domain(); // [0,1,2];
s(0); // "#1f77b4"
s(1); // "#ff7f0e"
s(2); // "#2ca02c"
s.domain(); // [0,1,2];

It’s noteworthy that in d3, color palettes return strings, not pv.Color objects like in protovis.

d3.scale.category10(1); // this doesn't work
d3.scale.category10()(1); // this is the way.


Compared to protovis, d3.color is simpler. The main reason is that protovis handled color and transparency together with the pv.Color object, whereas in SVG, those two are distinct attributes: you handle the fill color of a shape with fill, its transparency with fill-opacity (or opacity for the whole element), the color of the outline with stroke and the transparency of that color with stroke-opacity.

d3 has two color objects: d3_Rgb and d3_Hsl, which describe colors in two of the most popular color spaces: red/green/blue, and hue/saturation/lightness.

With d3.color, you can perform operations on such objects, like converting colors between formats, or making colors lighter or darker.

d3.rgb(color), and d3.hsl(color) create such objects.
In this context, color can be (straight from the manual):

  • rgb decimal – “rgb(255,255,255)”
  • hsl decimal – “hsl(120,50%,20%)”
  • rgb hexadecimal – “#ffeeaa”
  • rgb shorthand hexadecimal – “#fea”
  • named – “red”, “white”, “blue”

Once you have that object, you can make it brighter or darker with the appropriate method.
You can use .toString() to get it back in rgb hexadecimal format (or hsl decimal), and .rgb() or .hsl() to convert it to the object in the other color space.

var c=d3.rgb("violet"); // d3_Rgb object
c.toString(); // "#ee82ee"
c.darker().toString(); // "#a65ba6"
c.darker(2).toString(); // "#743f74" - even darker
c.brighter().toString(); // "#ffb9ff"
c.brighter(0.1).toString(); // "#f686f6" - only slightly brighter
c.hsl(); // d3_Hsl object
c.hsl().toString(); // "hsl(300, 76, 72)"
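The hexadecimal values above are consistent with a simple rule, assumed here rather than taken from d3's source: each step of darker multiplies the red, green and blue channels by 0.7, each step of brighter divides them by 0.7, and the results are truncated to integers and capped at 255. This plain JavaScript sketch of that rule (all helper names are hypothetical) reproduces the four values:

```javascript
// Parse "#rrggbb" into [r, g, b] integers.
function parseHex(hex) {
  return [1, 3, 5].map((i) => parseInt(hex.slice(i, i + 2), 16));
}

// Format [r, g, b] back into "#rrggbb".
function toHex(rgb) {
  return "#" + rgb.map((v) => v.toString(16).padStart(2, "0")).join("");
}

// Scale each channel by `factor`, truncate to an integer, cap at 255.
function scaleChannels(hex, factor) {
  return toHex(parseHex(hex).map((v) => Math.min(255, Math.trunc(v * factor))));
}

const darker = (hex, k = 1) => scaleChannels(hex, Math.pow(0.7, k));
const brighter = (hex, k = 1) => scaleChannels(hex, 1 / Math.pow(0.7, k));

console.log(darker("#ee82ee"));        // #a65ba6
console.log(darker("#ee82ee", 2));     // #743f74
console.log(brighter("#ee82ee"));      // #ffb9ff
console.log(brighter("#ee82ee", 0.1)); // #f686f6
```

In this sketch, brightening pure black leaves it black; a real color library may treat that case specially.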

d3: adding stuff. And, oh, understanding selections

From data to graphics

[Figure: the d3 principle (and also the protovis principle)]
d3 and protovis are built around the same principle. Take data, put it into an array, and for each element of data a graphical object can be created, whose properties are derived from the data that was provided.

However, d3 and protovis have slightly different ways of adding those graphical elements and of getting data into them.

In protovis, you start from a panel, a protovis-specific object, to which you add various marks. Each time you add a mark, you can either:

  • not specify data and add just one,
  • or specify data and create as many as there are items in the array you pass as data.


How we did it in protovis

var vis=new pv.Panel().width(200).height(200);
vis.add(pv.Panel)
  .add(pv.Bar)
    .data([1,4,3,2,5]).bottom(0).width(15)
    .left(function() {return this.index*20;})
    .height(function(d) {return d*10;});

[Figure: this simple bar chart in protovis]
You first create a panel (line 1), you may add an element without data (here, another panel, line 2), and you add bars to this panel: there are 5, one for each element in the array on line 4.

And in d3?

In d3, you also have a way to add either one object without passing data, or a series of objects – one per data element.

var vis=d3.select("body").append("svg:svg");
var rect=vis.selectAll("rect").data([1,4,3,2,5]).enter().append("svg:rect");
rect.attr("height",function(d) {return d*20;})
  .attr("width", 15)
  .attr("fill", "steelblue")
  .attr("x",function(d,i) {return i*20;})
  .attr("y",function(d) {return 100-20*d;});

In the first line, we are creating an svg document which will be the root of our graphical creation. It behaves just as the top-level panel in protovis.

However, we are not creating this out of thin air; rather, we are bolting it onto an existing part of the page, here the <body> tag. Essentially, we are looking through the page for a tag named body, and once we find it (which should almost always be the case), that’s where we put the svg document.

Oftentimes, instead of creating our document on <body>, we are going to add it to an existing <div> block, for instance:

<div id="chart"></div>
<script type="text/javascript">
  var vis=d3.select("#chart").append("svg:svg");
</script>

Anyway. To add one element, regardless of data, what you do is:

d3.select("body").append("svg:svg");

The logic is: d3.select(where we would like to put our new object).append(type of new object).

Going back to our code:

var vis=d3.select("body").append("svg:svg");
var rect=vis.selectAll("rect").data([1,4,3,2,5]).enter().append("svg:rect");
rect.attr("height",function(d) {return d*20;})
  .attr("width", 15)
  .attr("fill", "steelblue")
  .attr("x",function(d,i) {return i*20;})
  .attr("y",function(d) {return 100-20*d;});

On line 2, we see a different construct:

an existing selection, or a part of the page
.selectAll(a type of object)
.data(an array)
.enter()
.append(an object type)

This sequence of methods (selectAll, data, enter and append) is the way to add a series of elements. If all you need is to create a bar chart, just remember it, but if you plan on taking your d3 skills further than where you stopped with protovis, look at the end of the post for a more thorough explanation of the selection process.

Attributes and accessor functions

At this stage, we’ve added our new rectangles, and now we are going to shape and style them.

rect.attr("height",function(d) {return d*20;})
  .attr("width", 15)
  .attr("fill", "steelblue")
  .attr("x",function(d,i) {return i*20;})
  .attr("y",function(d) {return 100-20*d;});

All the attributes of a graphical element are controlled by the method attr(). You specify the attribute you want to set, and the value you want to give.
In some cases, the value doesn’t depend on the data. All the bars will be 15 pixels wide, and they will all be of the steelblue color.
In others, the value does depend on the data. We decide that the height of each bar is 20 times the value of the underlying data, in pixels (so 1 becomes 20, 5 becomes 100 etc.). Like in protovis, once data has been bound to an element, passing function(variable name) instead of a constant lets you return a value computed from that element’s data. By convention, we usually write function(d) {…;} (d for data), although the name could be anything. Those functions are still called accessor functions.
So, for instance:

.attr("height",function(d) {return d*20;})

means that the height will be 20 times the value of the underlying data element (exactly what we said above).
In protovis, we could position a mark relative to any side of its parent, so we had a .top method and a .bottom method. But with SVG, objects are positioned relative to the top-left corner. So when we specify the y position, it is relative to the top of the document, not necessarily to the axis (and not in this case).
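So, for all the bars to rest on a common baseline at y=100, the top of each bar is placed at 100 minus its height: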

.attr("y", function(d) {return 100-d*20;})

If we use scales (see next post), none of this will be a concern anyway.
Finally, there is an attribute here which depends not on the value of the data but on its rank among the data items: the x position.
For this, we write: function(d,i) {return i*20;}
Here is a fundamental difference with protovis. In protovis, when we passed a second argument to such a function, it meant the data of the parent element (grand parent for the third, etc.). But here in d3, the second parameter is the position of the data element in its array. By convention, we write it i (for index).
And since you will want to know: there is no easy way to retrieve the data of the parent element.
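The rule behind accessor functions can be sketched in a few lines: if the value given for an attribute is a function, it is called once per element with that element's data d and index i; otherwise the same constant is used for every element. In this plain JavaScript toy (setAttr is a hypothetical name, and the objects merely stand in for SVG rectangles), it computes the same bar geometry as the code above:

```javascript
// Each "element" just carries its bound datum and an attribute map.
const bars = [1, 4, 3, 2, 5].map((d) => ({ data: d, attrs: {} }));

// Constant values are copied as-is; functions are called with (d, i).
function setAttr(elements, name, value) {
  elements.forEach((el, i) => {
    el.attrs[name] = typeof value === "function" ? value(el.data, i) : value;
  });
  return elements; // return the elements, as d3's .attr() returns the selection
}

setAttr(bars, "height", (d) => d * 20);
setAttr(bars, "width", 15);
setAttr(bars, "x", (d, i) => i * 20);
setAttr(bars, "y", (d) => 100 - d * 20); // top of each bar, so all bars end at y = 100

console.log(bars.map((b) => b.attrs.y)); // [ 80, 20, 40, 60, 0 ]
```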

Bonus: understanding selections

To add many elements at once we’ve used the sequence: selectAll, data, enter, append.
Why use 4 methods for what appears to be one elementary task? If you don’t care about manipulating nodes individually (for animations, for instance), you can just remember the sequence. But if you want to know more, here is what each method does.


the selectAll method
First, we select a point on which to add our new graphical objects. When you are creating your objects and use the selectAll method, it returns an empty selection, but one based on that given point. You may also use selectAll in another context, to update your objects for instance. But here, an empty selection is expected.


the data method
Then, you bind data. This works quite similarly to protovis: d3 expects an array. d3 takes the idea further (with data joins), but you need not concern yourself with that until you look at transitions.
Anyway, at this stage you have an empty selection, based on a given point in the page, but with data.


the enter method
The enter method updates the selection with nodes which have data, but no graphical representation. Using enter() is like creating stubs where the graphical elements will be grafted.


the append method
Finally, by appending we actually create the graphical objects. Each is tied to one data element, so it can be further styled (for instance, through “attr”) to derive its characteristics from the value of that data.
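To make the four steps concrete, here is a toy model in plain JavaScript, with plain objects standing in for page elements. It is not how d3 is implemented: selectAll, bindData, enter and append below are hypothetical stand-ins that mirror the roles described above, and data is matched to existing elements by index.

```javascript
// A toy "page": an svg node that has no rect children yet.
const svg = { tag: "svg", children: [] };

// selectAll: the children of a given tag (none, the first time around).
function selectAll(parent, tag) {
  return { parent, nodes: parent.children.filter((c) => c.tag === tag) };
}

// data: pair each datum with the existing node at the same index, if any.
function bindData(selection, data) {
  const slots = data.map((d, i) => {
    const node = selection.nodes[i] || null;
    if (node) node.data = d; // existing nodes receive the new datum
    return { data: d, node };
  });
  return { parent: selection.parent, slots };
}

// enter: the slots that have data but no node yet, i.e. the stubs.
function enter(bound) {
  return { parent: bound.parent, slots: bound.slots.filter((s) => s.node === null) };
}

// append: create one node per stub, attach it to the parent, keep its datum.
function append(entering, tag) {
  return entering.slots.map((s) => {
    const node = { tag, data: s.data, children: [] };
    entering.parent.children.push(node);
    return node;
  });
}

const data = [1, 4, 3, 2, 5];
const rects = append(enter(bindData(selectAll(svg, "rect"), data)), "rect");
console.log(rects.length, svg.children.length); // 5 5

// Running the same sequence again creates nothing: every datum has a node.
const again = append(enter(bindData(selectAll(svg, "rect"), data)), "rect");
console.log(again.length); // 0
```

The second run shows why the empty selection matters: enter only yields stubs for data that has no element yet.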


From protovis to d3

You’ve spent some time learning protovis only to find that its development is halted as authors have switched to work on d3. Have your efforts all been in vain? Fear not! This series of posts will help you adapt to d3 with a protovis background.

Before we go any further, let me say that these posts won’t make you awesome at d3 (yet). We won’t be talking about how to do all the amazing things you could never do in protovis. Rather, we’ll focus on making you as comfortable with d3 as you were with protovis. And once that’s done, nothing will prevent you from learning the more powerful aspects of d3.

Anyway, if you’re reading this, you are already awesome.

Why should I make the switch to d3?

Frankly, you don’t have to. Protovis is a fine framework and works well. That said, you may want to switch to d3 for several reasons.

  • d3 is fast. d3 is better at handling scenes with hundreds or thousands of elements. So if you like scatterplots or network graphs, and who doesn’t, d3 has much stronger performance.
  • d3 does animation. There were workarounds to get animation in protovis, but that’s all they were: workarounds. Animation and transitions are built into d3 and are a snap to implement.
  • More features. Just because development has stopped on protovis doesn’t mean that it has stopped elsewhere… for instance, d3 has more ready-to-use layouts, like Voronoi tessellation or chords, and it has more methods and functions to make your life easier, to access and manipulate data for example.
  • Styling. In d3 it is possible to apply CSS style sheets to graphical elements. This helps keep the code and the formatting separate.

Yes but doesn’t everything change?

Short answer: no.

Less short answer: some things do change substantially. Most things stay the same. And then, some things look the same but have changed.

Things that stay the same

  • The general principle. Protovis is about transforming an array of data into the same number of graphical elements, with characteristics derived from that data. d3 does exactly this as well.
  • pv.Nest, which in my personal protovis experience has been the hardest to understand. Only, it’s called d3.nest now.
  • Methods that supplement the existing javascript array manipulation methods, like pv.min, pv.values, pv.entries etc. are also back (as d3.min, d3.values, d3.entries, but you’ve guessed it by now). Some, like pv.mean or pv.median, didn’t make it through but you could easily rewrite them, or continue using the protovis ones.
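As an illustration of how little is needed, here are minimal stand-ins for the missing pv.mean and pv.median in plain JavaScript. mean and median are hypothetical helper names; both take an optional accessor, in the style of d3.min(array, accessor), and this sketch does not skip undefined or NaN values.

```javascript
// Arithmetic mean, with an optional accessor applied to each element.
function mean(array, f = (d) => d) {
  return array.reduce((sum, d, i) => sum + f(d, i), 0) / array.length;
}

// Median: the middle sorted value, or the average of the two middle ones.
function median(array, f = (d) => d) {
  const values = array.map((d, i) => f(d, i)).sort((a, b) => a - b);
  const mid = Math.floor(values.length / 2);
  return values.length % 2 ? values[mid] : (values[mid - 1] + values[mid]) / 2;
}

console.log(mean([1, 4, 3, 2, 5]));                  // 3
console.log(median([1, 4, 3, 2, 5]));                // 3
console.log(median([4, 1, 3, 2]));                   // 2.5
console.log(mean([{ v: 2 }, { v: 6 }], (d) => d.v)); // 4
// An empty array yields NaN in both cases.
```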

Things that look different, but which are largely the same

Protovis had a number of native graphical objects, or marks, that could be manipulated at will with methods.

var vis=new pv.Panel()

In protovis, setting the height, the width or any other property of an object is done with a different, dedicated method for each (.height(), .width(), .fillStyle(), etc.).


This produces essentially the same thing. We add a rectangle of a specified height, width, and colors. There are a few differences though. Here, controlling height, width or fill is essentially the same thing and uses the same method, .attr(). Notice also that we first created an svg document, then a shape within that document. And also that we no longer need to call vis.render().

The d3 approach looks longer. But if we define all the style information first we could make it much shorter, shorter than in protovis in fact!
For instance:

var rect=d3.select("body").append("svg:svg").append("svg:rect").attr("class", "myRect");

Many of the apparent differences between d3 and protovis come from explicitly using SVG shapes (paths, polygons, ellipses, etc.) as opposed to native objects (pv.Panel, pv.Bar, pv.Dot, etc.), although it’s essentially the same thing. Yes, you have to learn your SVG, but it’s really on a need-to-know basis. In fact, if you’ve worked with protovis, or even if you’ve worked with HTML and CSS, you probably know more SVG than you think.

SVG is more flexible than protovis objects. The flipside is that constructs which were once simple in protovis become less obvious in SVG. But for those cases, d3 has recreated some native objects, even if not as many as in protovis.

Things that look the same, but which are different

There have been some changes in methods that have kept the same name since protovis, some minor, some more substantial. In any case, the basic ways of using these methods (like scale, color, data…) don’t change much. It’s only their more exotic uses that change.