Making the game of thrones visualization

So I made this interactive visualization about the 5 Game of Thrones books. How?

The project

The visualization is based on the events which happen to the main characters of the books. With over 2000 characters and close to 5000 pages over 343 chapters, it’s not possible to show everything, so I took about 300 characters and restricted to a small selection of events, such as characters killing each other. Also, I regrouped characters in a 2-level hierarchy so that it would be easier to find them and see what happens at a higher level.


Data is the first word in data visualization, and in order to visualize one must collect data.
this has not been a small task. When I read the books, which was a while back (before a Dance with Dragons was published), I had already half a mind to make a visualization, so I jotted down some notes but I had no clear idea of how it would look like. But I started writing down when in the books characters did die. Eventually I realized that if I wanted to prioritize characters I had to find a way to discriminate between those who appeared infrequently and could be left out, and those who were recurring. So, I had to find a way to determine when did the various characters appeared and what happened to them.

To achieve that I had the five books in printed version, which is definitely not the best way to approach this. So I tried to find something to scrape. So I approached this on two fronts. On one hand, I got a raw text version of the books. But they were very hard to scrape. For instance, there are at least 11 different characters named Pate (just Pate), and 23 called Jon something. Besides, many have aliases, titles and other names so a query to find all instances of “Jon” won’t capture all mentions of, say, Jon Snow, but will also return appearances of the Jon Conningtons, Jon Arryns and the like. To make matter worse, my text file was scanned from the book and was of less than optimal quality, with many typos on names.

The other source were the two fan-maintained ressources on the series, Tower of the Hand and a Wiki of Ice and Fire, which both contain summaries of the chapters and information on the characters. Some chapters were described in meticulous detail with all the characters that appear specified, and a description of all that happens then. But others are more loosely narrated. That said, both sites propose an exhaustive list of characters of the books which were extremely useful.

So I first scraped a wiki of Ice and Fire to know which characters were mentioned in each chapter, then read the summaries to get a feel from the events happening, which I maintainted by hand.

With that first level of material, I decided to keep characters mentioned at least 5 times, or the named character who had been killed by another named character (as opposed to “a guard” being killed by “a soldier”). That left me with about 250 characters (out of slightly over 2000). Later, when the visualization became usable, playing with it I found some inconsistencies – how come this character is not dead yet in that book? That was because some characters were missing from my roster. So by checking in the original books, I increased the roster to about 300 (296 precisely). Also for most (and not all) characters, using the text file, I was able to get all the mentions of a given character in all the books.

Data analysis

I wanted to do something around the relationships among characters and I soon noticed that there are many cliques, that is groups of characters where every one of them trust every other one. This is the case of most families of organizations. When there is one character that defects, this is clearly signalled. You never get a situation where A trusts B but not C, B trusts C and not A and C trusts A and not B, or anything complex really.

But still, that’s many, many groups.

While in the books, families and groups are presented as independent entities, they almost always align on a larger, more powerful one. So it was interesting to regroup the smaller groups in larger alliances, especially if the focus was to represent kills

In the books most characters belong to or serve noble houses, and those who don’t belong to well-identified groups. There are very few characters who just mind their own business. There is a plethora of such Houses which can make things confusing (and again: 2000 characters). After several attempts I concluded it’s neither possible nor a good idea to represent this diversity visually. Instead, I tried to “group the groups” and to create higher-level aggregates.

Eventually (and I did that fairly late in the process, after few tries on the visualization) I created 5 groups. One for the Starks and the Lannisters, which are the families which receive the most attention during the book, as 70%  of the chapters are written from the point of view of a member of either family.Also, contrary to the Targaryen house whose point of view accounts for about 10% of the book, Starks and Lannisters have many allies and followers. So, as a consistent group they are larger and more interesting.

The other 3 groups are as follows: antagonists, that is aggressive characters (including monsters) who may attack any other; neutral characters, who tend to stay out of conflict, and opportunists, who look for more power.

Each of the 5 groups exhibits different patterns when it comes to killing: Starks (“the good guys”) don’t kill their own or neutral characters, but may have to fight characters from the other groups; conversely, some characters in the Lannister clan or among opportunists may carry out assassinations where anyone can be targeted. Neutral characters don’t fight except against antagonists, and the latter may fight characters from any group.

Drawing the visualization

I started thinking of that project a long time ago, and I’ve made experiments taking many forms. One of such form was a previous visualization on the places in Games of Thrones. That one visualization was the low-hanging fruit of the dataset I was building and refining. I knew I wanted to show events happening to the characters. Originally I thought of something linear, like a gantt chart, possibly grouping the characters by families which would be collapsable. But even in the broad sense that’s a lot of families, it wouldn’t make the visualization very legible.

What I had in mind then was to find a way to represent the status of the characters over time, who got killed, who got crippled, that sort of thing.

Eventually, I thought it was more interesting to represent the relationships of characters among themselves, so I started to take notice of all the interactions between characters, such as: who kills whom, who captures whom, who marries whom, etc. There were many which didn’t make it in the final visualization which is already complex enough as is.

I thought of the chord form early because it’s possible to use it to represent a lot of nodes and a lot of relationships among them even and even if it’s difficult to see one individual node expect the most important ones, and even more difficult to see one individual relationship, it’s possible to get a vague idea of mass. So I thought of representing characters as circles around a main circle coloring them by family or something. But doesn’t work, there are just too many different families. By so doing I was just plotting complexity.

Then, I realized that one very important aspect of the story, that is, one way in which a visualization could actually help understand what’s going on in the books, is that of trust. Within a group, all characters trust each other. Actually, this is much simpler than in real life: Westeros families are very close-knit; there are no murders among siblings or even though such things were commonplace in History! In network parlance, a group of entities which are all connected among each other is called a clique. And Game of Thrones is really a game of cliques. In all key moments of the book, one character of the clique will change sides. So all other characters of that clique continue to trust him, without realizing that he is setting them up, and a string of murders usually ensues.

So I decided to show action only at the clique level (families, organizations…). The problem I had was that once a character dies the representation of the clique won’t change much, whereas if I represented characters individually I could reflect that state of affairs.

So I thought of drawing one circle per clique around the main circle, and to represent characters individually within those circles using the packed circle method.

The method I chose was good (but not completely accurate) at preserving the relative importance of one clique compared to all the others, but just barely ok at preserving the relative importance of one given character.

I would take all the mentions of all the characters, tally that by clique, then take the square roots of that for each clique. Then, for each clique I compute the ratio of that square root to the sum of all the other square roots.

I multiply that by 2π and that’s an angle, that’s the “slice” of the main circle that will be occupied by the circle corresponding to that clique. Picture:

(btw click on the circle on the left to regenerate the data points)

So while those proportions don’t exactly match they are very very close. That doesn’t hold at the character level, because the sum of the areas of the character circles can occupy anywhere between 50% and 100% of the areas of the larger circle. But that’s not important. Accuracy is not important, as long as it is sufficient to say: this character appears often and this one doesn’t.

Two other technical points about the making of the viz.
All positions for all possible time periods had to be computed ahead of time.
In d3, it is natural to add, add, add stuff over time without worrying so much. More data? we’ll just add more datapoints.
Here I couldn’t really do that because I allowed the user to go back and forth in time. So a user could set the visualization in autoplay and go from time 0 to time 50, for instance, then pause and jump to time 200 and then back to time 25.
So it wasn’t possible to read the datafile in sequence and to draw some additional data points at each step. In the above exemple, all that happens between time 50 and time 200 has to be shown at once, and then all that happened between time 25 and time 200 has to be hidden at once.

so it’s just a matter of separating the code that calculates all the positions from the one that draws the viz, two operations which more often than not are intertwined.

Last, in the visualization I get to write group names in a circle around the main circle. How is this done?

Well, in svg, you can’t write “on a circle”. You can write on a path, which can be anything, a circular arc for instance. In this case it’s a bit more complicated because I wanted to make sure that the writing would not be upside down. So I actually used two arcs.


Selections in d3 – the long story

This past week, Scott Murray and I presented a tutorial at Strata on d3 (of all things!)
First things first, you probably want to get Scott’s book on the subject when it’s out. I should be translating it into French eventually.
You’re also welcome to the slides and examples of the tutorial which can be found on That include my d3 cheat sheet.

We had done a d3 workshop a few months back at Visweek with Jeff Heer. This time around, we changed our approach: we covered less ground, went at a slower pace, but targeted what is in our opinion the most troublesome aspects of learning d3: selecting, creating and removing elements.

I have learned d3 from deciphering script examples and in the earliest ones one ubiquitous construct was this sequence : select / selectAll / data / enter / append.
It does the work, so like everyone else I’ve copied it and reused very often. It happens to be the most proper way of adding new elements in most cases, but the point is, while learning d3, I (and many people before and after me) have copy/pasted it without understanding it deeply. Though, copy pasting something you don’t understand thoroughly is the best way to get errors you don’t understand any better, and it would prevent you from accessing the rest of the potential of the library. Conversely, once this is cleared, you can be “thinking in d3″ and easily do many things you might have thought impossible before.

We did the tutorial hands-on, live coding most of the time. To follow through, I invite you to create or open an empty page with d3 loaded (such as this one – the link opens a new tab) and then open the “console” or “web developer tools” which allow you to type javascript statements directly, without having to write and load scripts. Here are the shortcuts to the console:

  • Chrome: Ctrl-J (windows), ⌥ ⌘+j (Mac)
  • Firefox: Ctrl+Shift+k (windows), ⌥ ⌘+k (Mac)
  • Safari: Ctrl+Alt+c (windows), ⌥ ⌘+c (Mac)
  • IE9+: F12

To make the best of this tutorial, please type the examples. Some tutorials show you impressive stuff and show you step by step how to do it. That’s not one of them. I’ve sticked to very, very basic and mundane things. We’ll be only manipulating HTML elements such as paragraphs, which I assume you have seen earlier (plot twist: you are reading one at this very moment)
Some of the code snippets don’t work. That’s the idea! I think you can’t progress by merely copying code that works. It’s important that you try out code that looks reasonable but that doesn’t produce the expected result or that causes an error, but then understand why.

Adding simple stuff

Creating elements

Our empty page is, well, empty, so we are going to add stuff.
to create elements, we need the append method in d3, which takes as an argument the type of element that needs to be created, while the html method at the end allow us to specify a text.

so let’s go ahead and type:

d3.append("h1").html("My beautiful text")

and see what happens.

what do we get? and why is that?
In d3, every element which is created cannot appear out of thin air, and must be added to a container. If we don’t specify a container element, we just can’t create anything.
In HTML, most elements can be containers, that is, it’s usually possible to add elements to almost everything. Then again, our template is fairly empty, so we can select the tag and take it from there."body").append("h1").html("My beautiful text")

we’re in business! as long as there is a sensible place to put them, you can create as much stuff as you like. Since we’re on a roll, why won’t we throw in a few paragraphs (p element in HTML):"body").append("p").html("Look at me, I'm a paragraph.")"body").append("p").html("And I'm another paragraph!")"body").append("p").html("Woohoo! number 3 baby")

and lo and behold, all our paragraphs appear in sequence. Simply beautiful.
But wait! paragraphs are containers, too. Why don’t we try to add a span element to one paragraph? For those of you with no HTML knowledge, span elements are like paragraphs, except there is no line break by default at the end.

So let’s try this:"p").append("span").html("and I'm a span!")

Before typing it, take a minute to think where you expect it to go.
Then go ahead and type it.

you may have guessed that our new bit of text could go on a line of its own at the end of the document, or at the end of the last paragraph. But instead, it goes at the end of the first paragraph.
Why is that? well, our select method stops the first instance of whatever it tries to find. In our case, since we asked it to find paragraphs – p, it stopped at the first p element it found, and added the span at the end of it (append).

Beyond creating new things

adding new elements to a page programmatically is kind of useful, but if d3 stopped at that you probably wouldn’t be so interested in this tutorial to begin with. You can also modify and manipulate elements. We’ve done that to some extent with the html method. But we can also modify the style of the elements, their attributes and their properties. For the time being, don’t bother too much about the difference between these three things. Style refers to the appearance of elements, attributes, to their structure, and properties, to what can be changed in realtime, like values in a form. But again, let’s not worry about that for now and let’s just follow along. Look at this code snippet:"p").style("color","red")

this will select the first paragraph and change its style, so that the text color is changed to red.
But wait! our first paragraph, isn’t that the one with a span at the end of it? What will happen to that bit of text? Well, type the statement to find out.
All the paragraph, including its children (that is, everything added to it, in our case the span) is turned to red."span").style("color","blue")

That singles out our span and writes it in blue. Can this be overturned?"p").style("color","red")

That won’t change a thing. Our first paragraph is, in fact, already red. But its child, the span, has a style which overrides that of its parent. To have it behave like the rest, we can remove its style like so:"span").style("color",null)


it will behave like its parent, the paragraph.
But let’s try something else:"span").style("color","blue")

we write our span in blue,"span").style("color","green")

and now back in green, like its parent."p").style("color","red")

What will happen?
well, the paragraph turns red, but the span doesn’t. It’s still following its specific instruction to be written in green.

That goes to illustrate that children behave like their parents, unless they are given specific instructions.

For HTML elements, we can play with styles, not so much with attributes or properties. One thing worth noting though is that an element can be given a class or an id.

Classes and ids can be used to style elements using a cascading style sheet (CSS). Knowing how CSS works is entirely facultative in learning d3, since d3 by itself can take care of all styling needs. Though, knowing basic CSS is not the most useless of endeavors, and some sensible CSS statements can save a lot of tedious manipulation in d3.
The other use of classes and ids is that they can be used to select elements.

Let’s reload our page so we start from scratch."body").append("p").html("First paragraph");"body").append("p").html("Second paragraph").attr("class","p2");"body").append("p").html("Third paragraph").attr("id","p3");

without the use of classes and ids, it’s still possible to select and manipulate the 2nd or 3rd instance of an element, but it’s a chore. You have to use pseudo-classes like“p:nth-of-type(2)”) to select the 2nd instance of a paragraph, for instance.
Personally, I’d rather avoid this and prefer using simpler statements. With classes and IDs set, we can write instead:".p2").html("I'm classy");"#p3").html("I've got ideas");

To select things of a given class, you must use a period before the name of the class. To select things of a certain id, you must use the hash sign.
Here, we are looking for the first element of the p2 class. This happens to be our 2nd paragraph. When you know you will have to manipulate elements which are not easily accessible, you may as well give them classes which will make this easier down the road.

In theory, there should only be one element of a given ID in one page, so I recommend not using them dynamically unless you can be 100% sure that there will not be duplicates. And, in case you were wandering, one element can have several (even many) classes.

Two birds, one stone

Introducing selectAll

So far, we’ve changed properties of one element at a time. The exception was when we changed the colors of both a paragraph and a span, but even then, we were still technically only changing the characteristics of one paragraph, which its child, the span, just happened to inherit.

For a complex document, that can be super tedious, especially since we’ve seen that it’s not easy to retrieve an element which is not the first of its kind.

so let’s go ahead and type:


(for a little variety. I mean, changing text color is so 1994.)
What was that? Everything turned to bold!

Indeed: while the select method returns the first element that matches the clause, selectAll matches them all.
Let’s do more.
We’re going to add a span to our first paragraph."p").append("span")
.html("I'm a rebel child.")

we’re adding a gratuitous styling command.
Now, let’s change the background color of all the paragraphs.


As could be expected, the span doesn’t change its background color, and so it appears differently from its parent (which could be a desired effect – this gives us flexibility).
but what if we wanted to change the background color of everything? can we do better?


(quite fitting in these times of papal conclave)

Well – everything gets a background color of “white smoke” (which is a fine background color btw.). Including the “body” element – that is, everything on the page!
selectAll(“*”) matches everything. With it, you can grab all the children, their children etc. (“descendants”. I know…) of a selection, or, if used directly like so: d3.selectAll(“*”), everything on the page.
So we’ve seen we can select moaar. But can we be finer? Can we select the paragraphs and the spans only, without touching the rest?

we sure can!

d3.selectAll("p, span").style("background-color","lawngreen")

The outcome of that one statement probably won’t make it to our web design portfolio, but it does the trick: you can select as much as you like, or as little as you like.

Nested selections

To illustrate the next situation, let’s add a span to our document."body").append("span").html("select me if you can")

Well, just like there is a way to select directly the 2nd paragraph using pseudo classes, there’s also a (complicated) way to select directly that last span (namely: selectAll(“span:not(p)”) )
there’s also a simpler way which is what we’re interested in.
let’s suppose we want to turn it to bold:
we can just do


then change the first one:"p").select("span").style("font-weight",null);

Admittedly, the complicated way is more compact. But conceptually, the “simple” way is easier to follow: we can do a selection, and within that selection perform a newer selection, and so on and so forth. That way, we can get away with just using super simple selectors, as opposed to master the intricacies of CSS3 syntax. Do it for the people who will read your source code :)

At this point:

  • You know how to dynamically create content. Pretty cool!
  • More! you can dynamically change every property of every element of the page. woot!
  • Bonus! you’re equipped with tactics to easily reach any element you want to change.

You should also have a good grasp of, d3.selectAll and the difference between the two.
what more could you possibly want? Well, since this is about data visualization, how about a way to tie our elements to data? This is what d3 is really about.

Putting the data in data visualization

Introducing data: passing values to many elements at once

So far, we’ve entered “hard coded” values for all of our variables. That’s fine, but we can’t really set our elements one by one. I mean, we could, but it’s no way to “industrialize” the way elements are created.
Fortunately, d3 provides. Its more interesting characteristic is the ability to “bind” elements with data.

If you’ve followed the instructions step by step, you should have 3 paragraphs in the page. Plus a span afterwards, but whatever.
Let’s introduce the data method. This will match an array of values to a selection of elements in the page. Let’s go:

var fs=["10px","20px","30px"];
d3.selectAll("p").data(fs).style("font-size",function(d) {return d;})

wow wow wow what just happened?
First, we create an array of values which we intelligently call fs (for font size).
Then, right after the selectAll(“p”) which gathers a selection of elements (3 “p” elements to be exact), we specify a dataset using the data method.
It just happens that our dataset has just the same number of items as our selection of elements!

finally, we use style, like we used to, with a twist: instead of providing one fixed value, which would affect our 3 p elements in the same way, we specify a function.
This function will parse the dataset, and for each element, it will return the result of an operation in the corresponding data point: the result of the function on the first item for the first p element, the result on the 2nd item for our 2nd paragraph, and lastly the result on the last item for our last paragraph.
We write the function with an argument: d. What is d? it’s nothing but a convention. We can call it anything. d is standard fare in d3 code because that’s the writing style of Mike Bostock, the author of the framework and of many of its examples.
This function is nothing special, it returns the element itself, so we are passing “10px” for the font-size of our first paragraph, and so on and so forth (20px, 30px).
As an aside, we can use the String function, which converts any element into a string, instead of writing function(d) {return d;}. So:


would also work and is shorter to write.

Let’s recap what just happened here, because this is important.
We want to apply a dynamic transformation to a bunch of existing elements, as opposed to finding a way to select each individual element, and passing it a hard-coded value.
What’s more, we want to apply a transformation of the same nature, but of a different magnitude, on each of these items.

How to proceed?
well, first we create an array of values. That’s our fs boy over there.

var fs=["10px","20px","30px"];

Then, we will first select all of the elements we want to modify, then we’ll tie our dataset to that selection. This is what selectAll, then data does.

var selection=d3.selectAll("p").data(fs);

By the way, I’ve stored the result of the selectAll then data in a variable. In the original example, I just “chained” the methods, that is, I followed each method by a period and another one. The two syntaxes are equivalent. Chaining works, because each of these methods returns a value which is itself a selection on which further operations can be done. This syntax works well through most of d3 with some exceptions which will be duly noted.

Then, we are going to change the style of the selection, using a function on our data."font-size",function(d) {return d;})


That function will run on each value of our dataset, and return one result per value, which will be passed to all elements in sequence.

At this stage you may have two questions:

  • Can we use more sophisticated functions, because this one is kind of meh?
  • What happens if there is not the same number of items in the dataset and of elements?

The second question is actually more complicated than the first, but we’ll answer it in painstaking detail.
So let’s take care of the question on functions first.
Yes, obviously, we can use the function not just to return the element, but to do any kind of calculation that a language such as javascript is capable of, which is nearly everything.
To illustrate that, here are some variations of our initial code which will return the same result, but with a different form.

var fs=[10,20,30]; // no more px
d3.selectAll("p").data(fs).style("font-size",function(d) {return d+"px";})

Here, instead of returning just the element, we append “px” at its end. Sadly, style(“font-size”,10) doesn’t work, but style(“font-size”,10+”px”) – which is the same as style(“font-size”,”10px”) is valid.

Here is yet another way.

d3.selectAll("p").style("font-size",function(d,i) {return 10*(i+1)+"px";})

function(d,i) ? what is this devilry?
Here, i (or anything we want to call it, as long as it’s the 2nd argument of this function) represents the order of the element in the selection, so the first gets a 0, the second a 1, etc. (well, in our example it goes to 3 elements, so the last one gets a 2).
This may be a bit abstract to say here, but even if we haven’t passed data, this would still work – i represent the order of the element, not the data item. so, if no data had been passed, within this function call, d would be undefined, but i would still be equal to 0,1,2, …

The answer to the second question is the last great mystery of d3. Once you get this, you’re golden.

Creating or removing the right number of elements depending on data

Before we get further, let’s quickly introduce append’s reckless cousin, remove(). Writing remove at the end of a selection deletes all the corresponding elements from the document object model.


would remove our 3 paragraphs. Let’s do it and get rid of our paragraphs.
Actually, let’s do"body").selectAll("*").remove()

and remove everything below the body.

Now, earlier, we were alluding to what could happen if we didn’t have the same number of elements as of items in our dataset.

That means that we should be able to do the following:

  • If there are fewer elements than items in a dataset, create the missing elements
  • If there are fewer elements than items in a dataset, disregard the extra data items
  • If there are more elements than items in a dataset, remove the extra elements
  • If there are more elements than items in a dataset, don’t change the extra elements/li>
  • As data are updated, keep some elements, remove some, add some

Why would we want to do all of this?
The first case is the most common. When we start a data visualization script, chances are that there are no elements yet but there is data, so you’ll want to add elements based on the data.
Then, if you have interaction or animation, your dataset may be updated, and depending on what you intend to do you may just want to update the existing elements, create new ones, remove old ones, etc. That’s when you may want to do 2, 3 or 4.
The last (5th case) is more complicated, but don’t worry, we’ve got you covered.

Right now, we should have 0 p elements on our page (and if for some reason this is not the case, feel free to reload it).

let’s create a variable like so:

var text=["first paragraph","second paragraph","third paragraph"];

somewhat uninspired, I know, but let’s keep typing to a minimum, if you want to go all lyrical please go ahead.

We are smack in case 1: we’d like to create 3 paragraphs, we have 3 items in our dataset, but 0 elements yet.
Here’s what we’ll type:"body").selectAll("p").data(text).enter().append("p").html(String)

A-ha! we meet again, select selectAll data enter append.
After all we’ve done, select selectAll should make some sense, even though, at this stage, this selection returns 0 p elements. There are none yet.
Then we pass data as we’ve done before. Note that there are 3 items in our dataset.

Then, we use the enter() statement. What it does is that it prepares one new element for every unmatched data item. We’ll expand a bit later on the true meaning of unmatched, but for the time being, let’s focus on the difference. We have 0 elements, but 3 data items. 3 – 0 = 3, so the enter() selection will prepare 3 new elements.
What does prepare means? the elements are not created yet at this stage, but they will with the next command. Right after enter(), think of what’s created as placeholders for future element (Scott’s vocabulary), or buds that will eventually blossom into full-fledge elements (mine).
After enter(), we specify an append(“p”) command. Previously, when we had used the append method, we created one element at a time. But in this case, we are going to create as many as there are placeholders returned by enter(). So, in our case, 3.
You may legitimately wonder why we needed a select statement to begin with – after all, enter() works on the difference between selectAll and data. But when we are going to append elements, we will need to create them somewhere, to build them upon a container. This is what the first select does. Omit it, and you’ll have an error, because the system will be asked to create something without knowing where.
The final method, html, will populate our paragraphs with text. The String function, which we have already seen, simply returns the content of each item in our dataset.

We’re using select > selectAll > data > enter > append, but hopefully you will see why (and if you don’t, hang on to the end of the article, and feel free to ask questions).

But let’s recap once more. Actually, let’s see the many ways to get this wrong (or, surprisingly, right)


We’ve alluded to that: without a container to put them in, p elements can’t be created. This will result in a DOM error."body").selectAll("p").data(text).append("p").html(String)

No enter statement. After the selectAll, the selection has 0 items. This doesn’t change after the data method. As such, append creates 0 new elements, and nothing changes in the document. (but no error though)"body").data(text).selectAll("p").enter().append("p").html(String)

In many cases in d3, it’s ok to switch the order of chained methods, but that’s not true here. selectAll must come before data. We bind data to elements. The other way round would have made sense, but that’s the way it is. First selectAll, then data. Here, we get an error, because enter() can’t be fired directly from selectAll."body").selectAll("wootwoot")

This actually works. Why?
There are actually 0 elements of type “wootwoot” in our document, which may or may not surprise you. There are still 3 items in the dataset, so enter() returns space for 3 new elements. the next append subsequently creates 3 p elements, which are populated by the html method.
It usually makes more sense to use the same selector in the selectAll and the append methods, but that’s not always the case. Sometimes, you will be selecting elements of a specific class, but in an append method, you have to specify the name of an element, not any selector. So you’d go"body").selectAll(".myClass")

Now that we’ve seen a few variations on the subject, here is a really cool use of enter. Check this out:"body").selectAll("h1").data([{}]).enter().insert("h1").html("My title")

ok there are 3 things here worth mentioning. 2 are just for show, though it doesn’t hurt to know them, but the 3rd one is really neat and useful.
In data, we’ve passed: [{}]. This is an array of one object which is empty. There are two interesting things with that construct, one is that there’s only one element, the other one is that it’s an object. When you pass objects, the functions you run on them (like in the attr or style methods) can be used to add properties to them or change them. If that doesn’t make sense yet, just accept for now that it gives you more flexibility than using, say, [0].
We’ve used insert instead of append. What this means is that we’re adding things before the first child of our container, not at the end (ie after the last child). In other words, our h1 (a title) will go at the top of the body element – fitting.

But what’s really interesting is what would happen if you were to run that statement again – nothing. try it. See?
Why is that? Well, on your first go, at a point where there are no h1 elements yet, it works the standard way – you do a selectAll that returns nothing, you bind a dataset with more elements, then enter prepares space for the unmatched elements – 1 in our case – and then append creates that element. You may notice that the html part doesn’t use the data.
When you run it again, the selectAll finds one h1 element, there’s still one item in the dataset, so enter won’t find any unmatched element, so the subsequent append is ignored.

So, you can run this kind of thing in a loop safely, it will only do what it’s supposed to do on the first go, it will be ignored afterwards. Don’t be afraid to use this construct for all the unique parts of your visualization, so you won’t have to worry about creating them multiple times.

Other cases of mismatch between data items and elements

All right, so now we have 3 p elements and 3 items in our dataset.
What happens if we do this:

text2=["hello world"]


There is now one item in the data set, versus 3 p elements. Try to make a guess before you type this in. At the tutorial, the audience made a few reasonable guesses, namely: the last 2 paragraphs will be removed, only “hello world” will remain. Or: all paragraphs will be changed to “hello world”.
Either could happen if d3 was trying to be smart and guess your intent. Fortunately, d3 is no excel here and behaves consistently even if that means extra work for you. When you do that (and please try this now) what happens is that the first paragraph of text is changed and the other two are untouched.

We are in the case, change the matched elements, ignore the others.

By the way, by now you should be able to guess what would have happened if there had been an enter() right after the data. Do I hear… nothing? almost! There would be no unmatched data element, so enter() would not return anything. Besides, enter() would require an append afterwards to make anything. This is why you’ll get an error: html can’t work directly after enter(). you would need an append.

Now what if we want to remove the extra 2 elements? This is where the exit() method comes into play.
exit() is pretty much to enter() what remove() is to append(). Kind of.

let’s see how this work by example.

let’s recreate our 3 p paragraphs just in case:


Now we pass the new dataset:


– remember that only the first paragraph has changed, the other two are untouched.
Now, while all the items in the dataset are matched with elements, there are elements which are not matched with an item in the dataset: the last two. This is where exit() comes into play. exit() will select those two paragraphs, so they can be manipulated. Typically, what happens then is a remove(), but you could think of other options.


That will flag them instead of removing them.
But typically, you do:



note that even though you have already matched a one item dataset to that selection, to use exit(), you will need to use data before. selectAll(“p”).exit() won’t work. You’ll have to re-specify the data match.

So that takes care of the case when you want to remove extraneous data items.
This leaves us with only one simple case: where you have more items in your dataset than you have elements and you don’t want to create elements for the extra data items.
That’s the simplest syntax, really.

Here, for instance, we have only one paragraph left, but there are 3 items in the text variable.
so let’s do:


(no enter, no exit, no append).
The paragraph text will now come from the new dataset (from its first item to be precise), no extra paragraphs will be created, none will be deleted.

Data joins

the last case (pass a new dataset, create new elements as needed, make some elements stay and make some elements go) requires more complexity and actually I won’t cover it in detail here, instead I will explain the principle and refer you to this tutorial on object constancy by Mike Bostock.
In the general case, when you try to match your dataset to your elements, you count them and deal with the difference. So you have 5 data items and 3 elements: you can make 2 extra elements appear by using enter. With the concept of data joins, you can assign precisely each data item to one given element, so the first data item doesn’t have to be that of the first element, etc. Well, the first time it will be, and each element will receive a key, a unique identifier from the dataset. If the dataset is subsequently updated, the element will only be matched if there is an item in the dataset with the same key. Else, it will be found by an exit() method.

And that’s the general gist of it.
At Strata, we went further – we discussed interaction and transition, but that is downward trivial once you have understood – and by that, really understood, with all the implications and nuances – the selections.


Interactive map of the subway

My last project is an interactive map of the Paris subway.
Interacive subway map

Interactive subway map

Quite some time ago, I had seen Tom Carden’s Tube Travel Times and was fascinated. So while I initially had tried to do something different I ended up borrowing a lot of his design, so it wouldn’t be fair to start this post without acknowledging that.
Then again, there are a few original things in this worth writing about.
A subway system can be seen as a network of nodes and edges. Nodes would be the stations, and edges, the subway lines that connect them. Finding the shortest path between two parts of the graph is a classic problem of computer science. That path is the sum of the length of all the edges it takes to traverse the graph from the source to the destination.
However a typical subway journey, and I should know, is not entirely spent in a train. You first have to get from the street to the platform, wait for your train; in the event of a connection, you have to go from one platform to another, wait for another train; and eventually you have to return to the street level.
So in my representation of the subway system in addition to the stations themselves, I’ve added nodes for all the platforms within a station and edges between them. So if you have to go from one station A to station B, which is three stops away on a given subway line, you would really traverse 5 edges: station A to platform, stop 1, stop 2, stop 3, then platform to station B.
While the Paris subway agency, RATP, has recently publicized its open data intent, the travel times between two stations are not public. I actually had reliable and accurate measured travel times for 90% of the network, on the basis of which I could estimate the rest. The great unknown that remains is the complexity of the stations, some are very straightforward so going from the street to a train is a matter of seconds, while some are downright mazes. While I think I have travelled through almost every stations by now, I definitely not have stopped in many of them. So as part of the project I am asking users to complement my data on the stations they know well. Actually, I didn’t use much of the datasets that RATP published. The position of the stations on the canonical plan has a lot of inaccuracies. For the geographical positions I used my own measures rather than the published data (I had been playing on and off with subway data for close to 5 years now, I did use the trafic data to give an idea of the occupancy of a given section, and the official color codes as well. Because I am using my own measures, I have not added other railway systems like the RER or tramway for which I don’t have enough data.

So when a user selects a station, the rest of the network moves according to their (shortest path) distance to the selected station. So at the heart of the exercise there is a shortest path calculation from any station to any other. Including stations, platforms, connections, entrances and exits, that’s a network of 680 nodes and 1710 edges (for 299 stations, I didn’t include a couple of the very last ones). And it’s a directed network, because there are some sections which go in only one direction. At most, a network that big could have close to 500,000 edges, so it’s fairly sparse.
There are several algorithms to compute shortest paths in the code. I am using the best known one, Dijkstra, which works on path with no negative edge length. The way it is implemented here, with binary heaps, makes it so fast that it’s not noticeable. Not using that improvement, or relying on a more versatile but slower shortest path algorithm, would result in considerably greater computing times.

When a user selects one station a few things happen. First, all of the others are arranged around the selected station. Imagine the station is the fixed point in a polar coordinate system, all the other stations retain their angular coordinate but their radii now depends on the shortest path time. That idea came directly from Tom’s applet. Another thing I took from him is the addition of rings in the background which give an indication of the travelling time to those destinations. But I settled for that after much fumbling and with the conviction that there was no better solution.
What also happens is that all the edges which are not part of any shortest path disappear. So subway lines are no longer continuous.
Once a station is selected, more things happen when hovering on a second station, the actual shortest path is revealed while all others fade out, and the station marks are hidden except those of the start, destination and all connections on the way, for which names are displayed.
Finally, an interace appears that let users enter estimations for the times I don’t know so well about the stations. I haven’t added any hard figures there so I am only asking for impressions, but I believe that with some usage the time estimations can get much more accurate.

Edit 1
the vis has been quite successful, so I decided to awesomize it a bit.
Now by clicking on any colored line, it’s possible to “block” any given subway section. The shortest routes become those which do not go through that section. You can simulate what happens, for instance, when a given line is down.
The other change is that I’ve added walking distances. For each station I’ve computed the beeline distance with its 5 nearest neighbors. I’ve translated that distance into time (knowing that pedestrians can’t traverse buildings, have to mind trafic etc.).
But in many cases it is still quicker to go from one station to the next on foot than by train.


Gun murders in America

Click for interactive visualization

I have created this map of every homicide in the USA using firearms for the latest year where detailed information was available. Every, that is from all the agencies that report homicides to the FBI, which is not an obligation – this is why the map lacks Florida data.
In the interactive version you can see how murders happen through the year and explore them according to several criteria that were available in the database. While large shooting sprees receive media attention, unfortunately there are thousands of cases each year in just about every community.
Technically this is my first foray with d3.v3 and it uses two of its major new features, topoJSON for easy, lightweight maps, and hex binning to represent many individual events in one hexagon. Thanks to Mike Bostock for tutoring on that.


Events in the Game of Thrones

Events in a Game of Thrones

Click for the interactive version

I’ve created an interactive visualization on the Game of Thrones.

(A more technical post on how this was done to follow)


La nuit blanche

TL;DR: go check the model here

So I live in Paris and try to go to the museums there as much as I can, which is often.
About ten years ago (2002, I think) the city came up with what I thought was a fantastic idea: “nuit blanche”, an art event during one night in October. There were a dozen or so exhibits planned, all of which were fairly ambitious. For instance Sophie Calle would welcome visitors on a secret bedroom on the very top of the Eiffel tower.

When I saw the programme it was almost to good to be true: all those world class artists would do something extraordinary in Paris! I thought I could just casually walk all night from one installation to another along with a few other like-minded night wanderers. so it would look a bit like this:

(how to read this: circles are installations. The large black circle represents their capacity, the inner green circle the occupancy. If too many people try to enter an installation, the green circle will grow larger than the black one and will turn red, this is when queuing appears. Dots represent visitors).

Little did I know that the city expected 100000 visitors. Actually, 1 million showed up.

If the total number of visitors that were in the streets at one given moment in time is an indication of success, Nuit Blanche has been fantastic. But to anybody who’s experienced it it was really a night of waiting and walking. There just wasn’t enough capacity in the various places that would host the event, and no way for a visitor to expect the size of the queue that they would be facing (more on that later).

Every year, it’s a struggle. I’m like, the programme is so exciting, but then I know I will spend my time queuing or walking (in later editions, public transportation got better, so let’s say queuing and moving).

And each year I end up going, and I end up thinking: never again. (but hey, hey, and hey).

In the last edition there’s been something like 2.5 million visitors. There are about 100 places to visit in Paris. Many of which are indoors with a limited capacity and so enforce queues. Some are outside, so people can just walk past it and enjoy the art (sometimes, even from a great distance).

What I found annoying though it that there was no way to tell in advance which installations were going to be crowded, and which were going to be empty.
This is what led me to build this model.

Here’s how it works.
A certain number of people and attractions are randomly distributed in the city.
People will go to the nearest attraction. If it’s possible, they will go in, else they will queue until there is room.
While inside, they will enjoy it for a certain time, then they will move out and go to the nearest one they haven’t visited.

This model, which is fairly close to how things work currently, will always end up creating large queues in some places while others will run under capacity.

Now imagine a simple change: people get an idea of how long the queues are everywhere. Instead of going to the nearest exhibit, they will go to the place where they would be able to get in the quickest, taking into account the time to get there and the time to queue if applicable. Since nobody likes to wait, people will shy away from the places with long queues, evening the load, and by the same token reducing the waiting time for other visitors. Technically the waiting time at all the installations become Lyapunov functions, and provided they have enough information agents will make it optimal.

If you run this model in its full screen incarnation with dashboards, bells and whistles, you will see that this doesn’t really help increasing the number of installations that people get to visit. What happens is that the time spent waiting in queue shifts to time spent moving between places. (the overall time spent inside installations should increase, though).

In order to reduce the time spent moving, one can improve the public transportation (represented by the average speed, in the model), or putting installations close together, which has been done to some extent since the start. But do that with a limit, because that moving time is an interesting buffer, it’s not as annoying as queuing and Paris is beautiful at night, and when you’re among 2.5 million art enthousiasts it’s as safe a city can get.

To further improve the time spent visiting exhibits, one has to improve capacity or reduce attendance (two options which I trust the city of Paris will be wary about). One other option would be to improve the percentage of art installations that can be enjoyed from the outside, another parameter in the simulation.

Lastly, here are a few things the model doesn’t include, which IMO don’t make a lot of difference in the results, but to be honest:
– the installations are presented as being equally attractive. In fact, some are always more interesting than others and get even more queue. (that would make the model where queues are ignored even more unbalanced).
– the capacities of the installations are determined by random, but uniformly distributed. That isn’t the case, the largest museums are open that night and they can welcome thousands of visitors, while some other places would be full at 100.
– opening times of installations haven’t been taken into account; most will close at some point during the night.

and on that note, enjoy the model


Some simulation models

Inspired by the coursera class on model thinking, by Scott E Page, I have implemented a bunch of simulation models with d3. It’s something I have done on and off for the past few weeks, and let’s say it’s really the first 6. Those interactive versions just graze the surface of all there is to see.

So what I have is:



d3 tutorial at visWeek 2012

Jeff Heer, Scott Murray and myself have done a d3 tutorial at visWeek 2012. You probably gathered that from the title of the post.

Here is a link to all the slides and code examples that we have presented:

d3 tutorial

For the purpose of the tutorial I have compiled a d3 cheat sheet, on 4 pages it groups some of the most common d3 functions. When I was learning d3 my number one problem was figuring out which property should be set using .attr, and which required .style. And also: which svg element support which property? All of this is addressed in the cheat sheet. It’s part of the link above, but if you want it directly without downloading a 13Mb file, here it is:

d3 cheat sheet


A game of data

Inspired by the, well, inspiring set of Lost visualizations released by Santiago Ortiz – Lostalgic, I decided to publish the one visualization on all the data I had gathered on the Song of Ice and Fire series of books.

Click to see the vis

Here’s the idea behind this one. Many books set in a fantasy world come with a map where all the places mentioned in the books are situated. I end up looking up these places very often to get an idea for the distances, for instance. But the way these locations are placed on a map is one specific convention. If two places are supposed to be fairly close from each other on a map but that it is very inconvenient to travel from one to the other, it is as if they were far, and conversely, if two places are a world apart but travel between them is fast and easy, it is as if they were close.

With that in mind I am drawing a subjective map of the Game of Thrones world.
In the books, chapters are broadly comparable. Since all chapters are narrated from the point of view of one character, I link two places between which this protagonist has travelled in the course of one chapter. I also add links for travels suggested in the chapter, even if not done by the point of view character.
Places which are linked are drawn one to the other. As a result, this creates an alternate, abstract geography, where distances represent the difficulties and obstacles in travel, rather than distance in the territory.

In addition the size of the nodes depend on the number of times they are visited in the books. A node could be large even for a relatively empty place, if a lot of the action takes place there, this is true for Castle Winterfell or Castle Black. Then again, large cities which are alluded to in the story, but where not much happens in the books, such as Casterly Rock or Sunspear, will appear as tiny dots. King’s Landing, which is the settings of roughly 25% of the books, and also probably the largest city in this world, is the largest node.


Dimensionality reduction

Following my Tableau politics contest entry, here is another view I had developed but which I didn’t include in the already full dashboard.

In the main view I have tried to show how the values of candidates relate to those of the French. It’s difficult to convey that graphically when these values are determined by the answers to as many as 19 questions (and there are many many more that could be used to that effect).

Enter a technique called dimensionality reduction. The idea is to turn a dataset with many dimensions into a dataset with much fewer dimensions, as little as one, two or three. So we compute new variables, so that they capture virtually all the variability of the original dataset. In other words, if two records have different values in the original dataset, they should have different values in the transformed dataset too.

If you’re not allergic to words like eigenvalues the math is actually pretty simple. But let’s not go into that. The point is that with this technique you can represent a complex dataset as a two-dimensional dataset with very little loss.

Of course the technique doesn’t tell you what these new variables represent. Getting a feel for the data I postulate that the one on the X axis represents the toughness of a candidate (pro-security measures, no sensitivity for minorities, etc.) and the one on the Y axis is happiness with previous government. Or possibly, lover of the capitalistic doctrine.

Now you get a better feel for how close or distant the various candidates are from individual voters. You can also see which “spaces” remain empty or which are competitive. The top-right quadrant, for instance, looks tempting, but it is really nearly empty (about 200 respondents on over 1500). The right half of the matrix, that is the one which is sensitive to strength, has only one possible competitor but also few voters (~400 respondents). It makes more sense to remain an acceptable choice for the top half (750 voters) and especially the top left (550). In other words the Sarkozy mark should drift slightly towards the top left for optimal impact.