A game of data

Inspired by Lostalgic, the, well, inspiring set of Lost visualizations released by Santiago Ortiz, I decided to publish this one visualization built on all the data I had gathered on the Song of Ice and Fire series of books.

Click to see the vis

Here’s the idea behind this one. Many books set in a fantasy world come with a map where all the places mentioned in the books are situated. I end up looking up these places very often, to get an idea of the distances, for instance. But the way these locations are placed on a map is one specific convention. If two places are fairly close to each other on the map but it is very inconvenient to travel from one to the other, it is as if they were far apart. Conversely, if two places are a world apart but travel between them is fast and easy, it is as if they were close.

With that in mind I am drawing a subjective map of the Game of Thrones world.
In the books, chapters are broadly comparable. Since each chapter is narrated from the point of view of one character, I link two places when this protagonist has travelled between them in the course of one chapter. I also add links for travels suggested in the chapter, even if they are not made by the point-of-view character.
Places which are linked are drawn toward each other. As a result, this creates an alternate, abstract geography, where distances represent the difficulties and obstacles of travel, rather than distance in the territory.

In addition, the size of a node depends on the number of times the place is visited in the books. A node can be large even for a relatively empty place if a lot of the action takes place there; this is true for Castle Winterfell or Castle Black. Then again, large cities which are alluded to in the story, but where not much happens in the books, such as Casterly Rock or Sunspear, appear as tiny dots. King’s Landing, which is the setting of roughly 25% of the books, and also probably the largest city in this world, is the largest node.
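To make the linking and sizing rules a bit more concrete, here is a minimal Python sketch. The chapter records, the field names and the build_graph function are all made up for the example; this is not the code or the data behind the visualization, just one way the counting could work.

```python
from collections import Counter

# Hypothetical chapter records, invented for illustration only.
# "places" lists the locations of a chapter in travel order.
chapters = [
    {"pov": "A", "places": ["Winterfell", "King's Landing"]},
    {"pov": "B", "places": ["Winterfell", "Castle Black"]},
    {"pov": "A", "places": ["King's Landing"]},
]

def build_graph(chapters):
    """Return (visits, links): visit counts per place and link counts per pair."""
    visits = Counter()
    links = Counter()
    for chapter in chapters:
        places = chapter["places"]
        # A place counts once per chapter, however often it is named.
        visits.update(set(places))
        # Link each place to the next one travelled to in the same chapter.
        for origin, destination in zip(places, places[1:]):
            if origin != destination:
                links[frozenset((origin, destination))] += 1
    return visits, links

visits, links = build_graph(chapters)
for place, count in visits.most_common():
    print(place, count)
```

The visit counts would drive the node sizes, and the links could be handed to any force-directed layout so that linked places pull toward each other.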


Re: flowing data contest, code for my entry

Here is the Processing applet with source code for my entry. The code is based on a charting program I’ve been working on, on and off, for a while, which I’ll publish once it’s more polished.
Select the applet then press a key to alternate between the 2 representations. The text file is the data, and the two png files are the images of the results.

Inserting Processing applets in WordPress is not obvious. Here’s a discussion post for anyone interested.

source: fdcontest.pde

media files: params.txt image.png image_matrix2.png


Flowing Data’s chart contest

This week, FlowingData organized a contest. A chart was submitted, and contestants were asked to improve it.

A lot of my job revolves around reviewing and correcting graphs, so I was more than happy to compete. 

Here is the original graph, hosted & designed by Swivel:

Immigration to the U.S. by decade

The rules of the contest stated that the new graph should use the same data. But instead of re-using the dataset hosted on Swivel, I checked the source to answer some questions I had.

Here goes: 









Period       Total     Europe       Asia    America     Africa   Oceania*
1820-30    151,824    106,487         36     11,951         17     33,333
1831-40    599,125    495,681         53     33,424         54     69,911
1841-50  1,713,251  1,597,442        141     62,469         55     53,144
1851-60  2,598,214  2,452,577     41,538     74,720        210     29,169
1861-70  2,314,824  2,065,141     64,759    166,607        312     18,005
1871-80  2,812,191  2,271,925    124,160    404,044        358     11,704
1881-90  5,246,613  4,735,484     69,942    426,967        857     13,363
1891-00  3,687,564  3,555,352     74,862     38,972        350     18,028
1901-10  8,795,386  8,056,040    323,543    361,888      7,368     46,547
1911-20  5,735,811  4,321,887    247,236  1,143,671      8,443     14,574
1921-30  4,107,209  2,463,194    112,059  1,516,716      6,286      8,954
1931-40    528,431    347,566     16,595    160,037      1,750      2,483
1941-50  1,035,039    621,147     37,028    354,804      7,367     14,693
1951-60  2,515,479  1,325,727    153,249    996,944     14,092     25,467
1961-70  3,321,677  1,123,492    427,642  1,716,374     28,954     25,215
1971-80  4,493,314    800,368  1,588,178  1,982,735     80,779     41,254
1981-90  7,338,062    761,550  2,738,157  3,615,225    176,893     46,237
1991-00  9,095,417  1,359,737  2,795,672  4,486,806    354,939     98,263
2001-06  7,009,322  1,073,726  2,265,696  3,037,122    446,792    185,986

187 Years        







* includes others unidentified by nationality


The FAIR (Federation for American Immigration Reform), who’ve published this on their website, also made a chart out of this data: 


So let’s take a look at the data. 

At first glance, it is very aggregated: data are not available per country or per year, but per continent and per decade. However, the last “decade” is only 6 years long. Also, Oceania includes all the unidentified immigrants. Immigrants from Africa and “Oceania” are a tiny fraction of the total flow so it would be difficult to draw a conclusion from their data.

So if I want to tell a story about this dataset, I would choose the following. 

The total flow of immigrants to the USA has gone through major changes. 

Looking at the composition of this flow: over 90% of the immigrants were Europeans at some point, but now that ratio is down to around 15%. 
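Those ratios can be checked directly against the table. The short Python sketch below assumes that the first figure in each row is the total and the second is the European count (the column headers did not survive, but the "over 90%" remark fits that reading). The europe_share helper is just a name picked for the example.

```python
# Three rows copied from the table above as (total, Europe).
# Assumption: first figure = all immigrants, second figure = Europeans.
rows = {
    "1820-30": (151_824, 106_487),
    "1841-50": (1_713_251, 1_597_442),
    "2001-06": (7_009_322, 1_073_726),
}

def europe_share(total, europe):
    """Return Europe's share of the period's immigrants, in percent."""
    return 100 * europe / total

shares = {period: round(europe_share(*figures), 1) for period, figures in rows.items()}
print(shares)  # {'1820-30': 70.1, '1841-50': 93.2, '2001-06': 15.3}
```

Read that way, the share starts around 70%, climbs above 90% in the 1840s, and ends at about 15% for 2001-06.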

Now, for a critique of these two graphs. 



On the original graph (Swivel):
  1. It’s not very telling to keep presenting those numbers aggregated by decade. 
  2. Especially if the last, shorter “decade” is not corrected. All curves seem to dip, although the underlying variables are actually growing.
  3. You can clearly see the point where American immigrants overtake Europeans (and later, when Asians do the same). But again, those absolute figures are not very interesting. You cannot see the share of each continent in the total. 
  4. The Africa and Oceania curves clutter the graph and bring little information. 
  5. The fact that Oceania includes immigrants of unidentified nationality is not disclosed (not that it would change the graph tremendously). 
On the FAIR chart:
  1. To do this graph, they’ve annualised the data, which is a more sensible option. 
  2. The year labels are difficult to read. 
  3. The last column (2001-2006) is drawn exactly like the others, which each cover 10 years. 
  4. Again, Oceania and Africa don’t bring much to the graph. 
  5. It’s very difficult to see the evolution of any given continent, except Europe. 
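The annualisation mentioned in the second critique is simple to sketch in Python: divide each period's figure by the number of years it covers. The sketch assumes both end years of a label are counted, which matches the 187 years between 1820 and 2006. The years_in_period function is a name invented for the example, not anything FAIR published.

```python
def years_in_period(label):
    """Number of calendar years covered by a label such as '1991-00'."""
    start_text, end_text = label.split("-")
    start = int(start_text)
    end = start - start % 100 + int(end_text)  # assume the same century first
    if end < start:                            # '1991-00' really ends in 2000
        end += 100
    return end - start + 1                     # both end years are counted

# Totals for the last two periods, copied from the table above.
totals = {"1991-00": 9_095_417, "2001-06": 7_009_322}
per_year = {period: total / years_in_period(period) for period, total in totals.items()}

for period, rate in per_year.items():
    print(period, round(rate))
```

Once the six-year period is divided by 6 rather than treated as a full decade, the yearly total for 2001-06 comes out higher than for 1991-00, so the apparent dip disappears.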
An idea that I had and discarded was to show cumulated values (stocks). 
The left graph shows the cumulated values as part of the total. The second shows the cumulated values in absolute figures.
On both graphs, one can see the decline of the share of European immigrants. It’s more striking on the second, when the blue curve suddenly flattens around the turn of the century, while the green one (America) then the red one (Asia) start to thicken. 
So we have a story there. But then, what are these numbers? What would the sum of all those migrants mean, over nearly 200 years? That’s a very different number from the stock of all migrants currently living in the USA, because over so much time, most of them are dead. And it is also a very different number from the sum of all immigrants that ever came to the USA. Starting at 1820 is quite arbitrary, and does in fact exclude most African arrivals. So based on that dataset alone, which is the rule of the contest, it’s just not possible to work with cumulated values and get meaningful results.
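For reference, the cumulated values behind those two discarded graphs are just running sums. Here is a minimal Python sketch on the first three periods of the table, again assuming the first figure of each row is the total and the second is Europe. The variable names are arbitrary.

```python
from itertools import accumulate

# First three periods from the table above (1820-30, 1831-40, 1841-50).
# Assumption: first figure = all immigrants, second figure = Europeans.
total = [151_824, 599_125, 1_713_251]
europe = [106_487, 495_681, 1_597_442]

# Running sums: what the "absolute figures" graph would plot.
cum_total = list(accumulate(total))
cum_europe = list(accumulate(europe))

# Europe's part of the running total: what the "share" graph would plot.
cum_share = [100 * e / t for e, t in zip(cum_europe, cum_total)]

print(cum_total)   # [151824, 750949, 2464200]
print(cum_europe)  # [106487, 602168, 2199610]
```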
Then, I thought of doing a matrix chart instead of the stacked column chart done by FAIR. 


Doing a matrix chart like this (several charts on top of each other, using the SAME SCALE, which can be added up vertically and visually) is the textbook way of showing variables in such a way that one can see both their evolution over time and their proportion in the total.

This kind of chart is not natively supported in Excel, so I’ve done it with Processing.

(I wrote a program to make them in Excel, but will talk about that in a later post.)

It’s an interesting graph: it shows European immigration peaking, then America taking off, followed by Asia. In the early 20th century, the Mexican Revolution caused much emigration to the US; this is the ripple in the graph.

But then, I thought it was too complex. Frankly, by glancing at it, you don’t get anything. You might learn information by examining it. 

So I have done this one which I am going to submit. 

And here I have my 2 stories in a much lighter graph. 

The blue rectangles are the total immigrants. Various laws and events have shaped that curve. I first wanted to annotate it, but I’ve decided against it. I just kept the Immigration Act, which was in force between 1924 and 1965 and which largely explains the drastic drop in immigration in that time.

Without any other variable to compete with it, you can clearly follow its story. 

Then, I’ve added the share of Europeans among all the immigrants. That’s another clear story: in the 19th century, they made up the bulk of the immigrants, but then their share dropped sharply, to around 15%. My guess, though, is that the shape of the first leg of this curve (from about 70% to over 90%) is due to the fact that many unidentified immigrants were really Europeans.

For the title of the left axis, I’ve chosen naturalization over number of immigrants or another denomination, because most of the “immigrants” of the last few decades are really people already residing in the USA who get naturalized.

But that’s another contradiction in the dataset. In 1868, when the 14th Amendment to the Constitution came into force, about 4 million former slaves became American citizens. They are not shown in the data. In 1924, the Native Americans who were not yet citizens were also granted citizenship. They too are not included in the dataset. However, since 1965, most “immigrants” are change-of-status migrants who were already in the USA. But then, we are to play with this dataset, so that’s the best I could come up with.

Lastly, a few words about the design. I took some of the colours from a chart I really liked, by Viveka Weiley. In her chart she uses the MyriadPro font (guess she’s a Mac, but I’m a PC). I am using Frutiger which is quite similar.