This week, FlowingData organized a contest. A chart was submitted, and contestants were asked to improve it.
A lot of my job revolves around reviewing and correcting graphs, so I was more than happy to compete.
Here is the original graph, hosted & designed by Swivel:
The rules of the contest stated that the new graph should use the same data. But instead of re-using the dataset hosted on Swivel, I checked the source to answer some questions I had.
* includes others unidentified by nationality
FAIR (the Federation for American Immigration Reform), which published this data on its website, also made a chart out of it:
So let’s take a look at the data.
At first glance, it is very aggregated: data are not available per country or per year, but per continent and per decade. However, the last “decade” is only 6 years long. Also, Oceania includes all the unidentified immigrants. Immigrants from Africa and “Oceania” are a tiny fraction of the total flow so it would be difficult to draw a conclusion from their data.
So if I want to tell a story about this dataset, I would choose the following.
The total flow of immigrants to the USA has gone through major changes.
Looking at the composition of this flow: over 90% of the immigrants were Europeans at some point, but now that ratio is down to around 15%.
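To make the "composition" idea concrete, here is a minimal sketch of how the share of each continent could be computed for one period. The function name and the numbers are my own illustration: the figures are made-up round values, not the FAIR or Swivel data.

```python
def continent_shares(flows):
    """Return each continent's share of one period's total, as fractions of 1."""
    total = sum(flows.values())
    if total == 0:
        raise ValueError("no immigrants recorded for this period")
    return {continent: count / total for continent, count in flows.items()}


# Made-up round numbers for a single period, NOT the real dataset.
example_period = {"Europe": 900, "America": 70, "Asia": 20, "Africa": 5, "Oceania": 5}

shares = continent_shares(example_period)
for continent, share in shares.items():
    print(f"{continent}: {share:.1%}")
```

Applied period by period, this gives one share curve per continent, which is the quantity the absolute figures hide.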
Now, for a critique of these two graphs.
- It’s not very telling to keep presenting those numbers aggregated by decade.
- Especially if the last decade is not corrected. All curves seem to dip, although the underlying variables are actually growing.
- You can clearly see the point where American immigrants overtake Europeans (and later, where Asians do the same). But again, those absolute figures are not very interesting. You cannot see the share of each continent in the total.
- The Africa and Oceania curves clutter the graph and bring little information.
- The fact that Oceania includes other countries is not disclosed (not that it would change the graph tremendously).
- To make this graph, they’ve annualised the data, which is a more sensible option.
- The year labels are difficult to read.
- The last column (2001-2006) is drawn exactly like the others, even though those cover 10 years.
- Again, Oceania and Africa don’t bring much to the graph.
- It’s very difficult to see the evolution of one given continent, except Europe.
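The annualisation mentioned in the critique above can be sketched as follows. This is only an illustration under my own assumptions: periods are keyed by their first and last year (inclusive), and the totals are invented to show the effect, not taken from the dataset.

```python
def annualise(period_totals):
    """Convert per-period totals into average immigrants per year.

    period_totals maps (first_year, last_year) to the total for that period;
    both years are counted, so 2001-2006 is 6 years long.
    """
    rates = {}
    for (first_year, last_year), total in period_totals.items():
        years = last_year - first_year + 1
        rates[(first_year, last_year)] = total / years
    return rates


# Invented totals, NOT the real figures: a full decade and a 6-year "decade".
example_totals = {(1991, 2000): 1000, (2001, 2006): 900}

rates = annualise(example_totals)
for period, rate in rates.items():
    print(period, rate)
```

In this toy case the raw total dips from 1000 to 900, while the yearly rate rises from 100 to 150. That is the kind of spurious dip an uncorrected last period produces.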
An idea that I had and discarded was to show cumulated values (stocks).
The left graph shows the cumulated values as part of the total. The second shows the cumulated values in absolute figures.
On both graphs, one can see the decline of the share of European immigrants. It’s more striking on the second, when the blue curve suddenly flattens around the turn of the century, while the green one (America) then the red one (Asia) start to thicken.
So we have a story there. But then, what are these numbers? What would the sum of all those migrants mean, over nearly 200 years? That’s a very different number from the stock of all migrants currently living in the USA, because over so much time, most of them are dead. And it is also a very different number from the sum of all immigrants that ever came to the USA. Starting at 1820 is quite arbitrary, and does in fact exclude most African arrivals. So based on that dataset alone, which is the rule of the contest, it’s just not possible to work with cumulated values and get meaningful results.
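For reference, the cumulated values discussed above could be computed along these lines. It is a rough sketch, not the code behind the discarded graphs; the function name is hypothetical and the series are made-up numbers.

```python
from itertools import accumulate


def cumulate_flows(flows_by_continent):
    """Return (cumulated absolute values, cumulated shares of the total).

    flows_by_continent maps a continent to a list of per-period flows;
    all lists are assumed to have the same length.
    """
    cumulated = {c: list(accumulate(v)) for c, v in flows_by_continent.items()}
    n_periods = len(next(iter(cumulated.values())))
    totals = [sum(series[i] for series in cumulated.values()) for i in range(n_periods)]
    shares = {
        c: [value / total if total else 0.0 for value, total in zip(series, totals)]
        for c, series in cumulated.items()
    }
    return cumulated, shares


# Made-up flows over three periods, NOT the real dataset.
example_flows = {"Europe": [90, 60, 30], "Other": [10, 40, 70]}

cumulated, cumulated_shares = cumulate_flows(example_flows)
print(cumulated["Europe"])         # running total of European arrivals
print(cumulated_shares["Europe"])  # Europe's share of the running total
```

Even in this toy example the cumulated share moves slowly (90%, 75%, 60%) while the per-period share falls from 90% to 30%, which hints at why the stock view blurs recent changes.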
Then, I thought of doing a matrix chart instead of the stacked column chart done by FAIR.
Doing a matrix chart like this (several charts one on top of the other, using the SAME SCALE, which can be added vertically, and visually) is the textbook way of showing variables in such a way that one can see their evolution over time and their proportion in the total.
This kind of chart is not natively supported in Excel, so I’ve done it with Processing.
(I wrote a program to make them in Excel, but will talk about that in a later post.)
It’s an interesting graph: it shows European immigration peaking, then America taking off, followed by Asia. In the early 20th century, the Mexican Revolution caused much emigration to the US; this is the ripple in the graph.
But then, I thought it was too complex. Frankly, by glancing at it, you don’t get anything. You only learn something by examining it closely.
So I made this one, which is the one I am going to submit.
And here I have my 2 stories in a much lighter graph.
The blue rectangles are the total immigrants. Various laws and events have shaped that curve. I first wanted to annotate them all, but decided against it. I just kept the Immigration Act, which was in force between 1924 and 1965 and which largely explains the drastic drop in immigration in that time.
Without any other variable to compete with it, you can clearly follow its story.
Then, I’ve added the share of Europeans in all the immigrants. That’s another clear story: in the early 19th century, they made up the bulk of the immigrants, but then their share dropped sharply to around 15%. My guess, though, is that the shape of the first leg of this curve (from about 70% to over 90%) is due to the fact that many unidentified immigrants were really Europeans.
For the title of the left axis, I’ve chosen naturalization over number of immigrants or another denomination because most of the “immigrants” of the last few decades are really people already residing in the USA who get naturalized.
But that’s another contradiction in the dataset. In 1868, when the 14th amendment to the Constitution came into force, about 4 million former slaves became American citizens. They are not shown in the data. In 1924, the Native Americans who were not yet citizens were also granted citizenship. They too are not included in the dataset. However, since 1965, most “immigrants” are change-of-status migrants who were already in the USA. But then, we are to play with this dataset, so that’s the best I could come up with.
Lastly, a few words about the design. I took some of the colours from a chart I really liked, by Viveka Weiley. In her chart she uses the MyriadPro font (guess she’s a Mac, but I’m a PC). I am using Frutiger which is quite similar.