Treemaps in Tableau? can be done.

Tableau can do many things natively but there are a couple of basic primitives that are not built in because they behave somewhat differently from the overall logic. And treemaps is one of them. Then again treemaps are arguably one of the best way to express complex hierarchical information, i.e. to show the proportions in a large dataset.

Fortunately, thanks to Tableau flexibility there are ways to do that. In the tutorial I'm going to cover 2 cases. First, we'll create a somewhat complex treemap off data which will not change in runtime. Then, we'll create mini-treemaps which can change dynamically.

A complex treemap

Before we go in the details the main ideas are deceptively simple.

  • we use the polygon mark,
  • we generate the treemap layout outside of tableau.
What we want (and what we'll get) is a dataset that can be directly imported in Tableau and -boom- makes a treemap in a few clicks.

To make this dataset we can use d3. The treemap I am making is directly inspired from the d3 treemap example. d3 is already computing all of the node positions so what we'll do is modify the program slightly so that it outputs them in a way that can be directly used in Tableau.

Here is the modified file which you can download and run on your computer. To work it needs to be in the same folder as a data file called data.js which will hold your hiearchical data and which has the same structure as the one linked here.

You can just copy/paste the table that's displayed below the treemap and put it in Tableau or save it in a file for good measure. Here is the output of the data file linked above.

Let's take a look at a few rows :

Id Path Top-level category Name Value Corner x y
0 flare>analytics>cluster flare AgglomerativeCluster 3938 0 89 167
0 flare>analytics>cluster flare AgglomerativeCluster 3938 1 167 167
0 flare>analytics>cluster flare AgglomerativeCluster 3938 2 167 192
0 flare>analytics>cluster flare AgglomerativeCluster 3938 3 89 192
1 flare>analytics>cluster flare CommunityStructure 3812 0 102 138
1 flare>analytics>cluster flare CommunityStructure 3812 1 167 138
1 flare>analytics>cluster flare CommunityStructure 3812 2 167 167
1 flare>analytics>cluster flare CommunityStructure 3812 3 102 167
2 flare>analytics>cluster flare HierarchicalCluster 6714 0 89 192
2 flare>analytics>cluster flare HierarchicalCluster 6714 1 167 192
2 flare>analytics>cluster flare HierarchicalCluster 6714 2 167 236
2 flare>analytics>cluster flare HierarchicalCluster 6714 3 89 236
I'm creating 4 lines per "leaf" node. So in this example which has 220 nodes, that amounts to 880 lines. Why 4? Because to draw a rectangle in Tableau you really need to define 4 corners. This is why there is a column "Corner" which is worth 0,1,2 and 3. This, we will use to tell Tableau to read our corners in bottom left, bottom right, top right, top left order which produces a nice convex rectangle and not a concave hourglass shape.

Now off to Tableau with this data. 

Now it's just a matter of doing like this screen. Unsurprisingly the columns and rows are going to be determined by x and y. You want a polygon mark, and you absolutely must use your corner measure in the path. For color, you'll have a choice, you can use the top-level category column (as I have) or the full path which will divide your treemap in finer parts. Finally, level of detail: you must use the Id and not the name in case several of your nodes have the same name. It's quite important at this point to uncheck aggregate measures in Analysis. You do NOT want aggregate measures (though it's quite pretty). To be able to use the name, you must first make a measure out of it. And finally, you'll want to update your infotip slightly.

All of this you can see if you download the tableau file.

And voilà! Treemaps for your Tableau workbooks.

Caveat: the polygon mark doesn't support labels so you can't write on top of the small rectangles what they are but that's not the point of the treemap, which is instead to give an immediate first impression of the relative size of large groups of your data, then allow you to explore them, to that end the infotip function works just fine.

Simpler but dynamic treemaps

This is fine and dandy if your data doesn't change but it won't scale if you need to make many treemaps based on selections. What to do? You could use pie charts, but let's not.

To that end I've tried to emulate the Congress speaks visualization by Periscopic. I really like it. When you've selected representatives at the end of the process you are taken to a screen which shows the following mini-treemap:

There are just 5 rectangles. But they will change for any representative that we choose. Can this be done with Tableau? Obviously.

Now the Tableau part of this is slightly trickier than above. The idea is that we are going to use formulas to generate the coordinates of all 20 corners of the rectangles, in other words we are going to let Tableau calculate the layout. We can do it because the way that rectangles are going to be arranged is quite predictible. There is one on the left, then 4 stacked on the right one on top of the other. Again, we could compute all of these coordinates outside of Tableau but that would be a hassle and so for a large number of cases it becomes easier and more reliable to do this inside of Tableau.


For this I have used completely random data. I have generated 20 names, and for each I have generated 5 values in a likely range, number of possible votes, number of votes the representative actually voted, number of times they voted yes, number of times they voted yes with their party, and the same for no. (or nay, technically).

At the end of the day I need 20 records per representative (5 rectangles of 4 corners each), so I can either replicate the line 20 times, or use linked tables. The idea is to get something like this for all of the representatives that can somehow get into Tableau.

Id representative corner rectangle possible votes total votes voted yes yes with party voted no no with party
16 Nelson Thiede 0 no against party 888 784 320 274 464 373
16 Nelson Thiede 1 no against party 888 784 320 274 464 373
16 Nelson Thiede 2 no against party 888 784 320 274 464 373
16 Nelson Thiede 3 no against party 888 784 320 274 464 373
16 Nelson Thiede 0 no vote 888 784 320 274 464 373
16 Nelson Thiede 1 no vote 888 784 320 274 464 373
16 Nelson Thiede 2 no vote 888 784 320 274 464 373
16 Nelson Thiede 3 no vote 888 784 320 274 464 373
16 Nelson Thiede 0 no with party 888 784 320 274 464 373
16 Nelson Thiede 1 no with party 888 784 320 274 464 373
16 Nelson Thiede 2 no with party 888 784 320 274 464 373
16 Nelson Thiede 3 no with party 888 784 320 274 464 373
16 Nelson Thiede 0 yes against party 888 784 320 274 464 373
16 Nelson Thiede 1 yes against party 888 784 320 274 464 373
16 Nelson Thiede 2 yes against party 888 784 320 274 464 373
16 Nelson Thiede 3 yes against party 888 784 320 274 464 373
16 Nelson Thiede 0 yes with party 888 784 320 274 464 373
16 Nelson Thiede 1 yes with party 888 784 320 274 464 373
16 Nelson Thiede 2 yes with party 888 784 320 274 464 373
16 Nelson Thiede 3 yes with party 888 784 320 274 464 373

In Tableau

In Tableau we are going to use the same idea as above: polygon mark, disable aggregate measures, and use x and y for columns and rows.

Only, x and y are going to be much more complex. Sorry about that. Well, not that complex but definitely longer.

Here's x:

case [rectangle]
when "no vote" then
     case [corner]
       when 0 then 0
       when 1 then (([possible votes]-[total votes])/[possible votes])
       when 2 then (([possible votes]-[total votes])/[possible votes])
       when 3 then 0
     case [corner]
       when 0 then (([possible votes]-[total votes])/[possible votes])
       when 1 then 1
       when 2 then 1
       when 3 then (([possible votes]-[total votes])/[possible votes])

Depending on the rectangle we are trying to draw we can find ourselves in one of two cases (hence the use of case).

If we draw "no vote" then we are on the left of our vis. The left corners are on the leftmost side of the vis (hence value: 0) and the right corners correspond to the proportion of possible votes which where not cast by this representative, which we can compute as ([possible votes]-[total votes])/[possible votes].

In the other case, we are drawing one of the 4 stacked rectangles, so the right corners are on the rightmost side of the vis (hence value: 1) and the left corners correspond to the value we just computed.

And now, y:

case [rectangle]
when "no vote" then
case [corner]
when 0 then 0
when 1 then 0
when 2 then 1
when 3 then 1
when "yes against party" then
case [corner]
when 0 then 0
when 1 then 0
when 2 then (([voted yes]-[yes with party])/[total votes])
when 3 then (([voted yes]-[yes with party])/[total votes])
when "yes with party" then
case [corner]
when 0 then (([voted yes]-[yes with party])/[total votes])
when 1 then (([voted yes]-[yes with party])/[total votes])
when 2 then ((2*[voted yes]-[yes with party])/[total votes])
when 3 then ((2*[voted yes]-[yes with party])/[total votes])
when "no with party" then
case [corner]
when 0 then ((2*[voted yes]-[yes with party])/[total votes])
when 1 then ((2*[voted yes]-[yes with party])/[total votes])
when 2 then ((2*[voted yes]+[no with party]-[yes with party])/[total votes])
when 3 then ((2*[voted yes]+[no with party]-[yes with party])/[total votes])
when "no against party" then
case [corner]
when 0 then ((2*[voted yes]+[no with party]-[yes with party])/[total votes])
when 1 then ((2*[voted yes]+[no with party]-[yes with party])/[total votes])
when 2 then 1
when 3 then 1
y is longer but this is the same general idea. For the "no vote" rectangle, the corners are either to the top or bottom of the vis. But for the other, we can predict where the rectangle will start and when it will end, as a proportion of the [possible votes] field. The values we want are going to be correspond to these proportions, plus that of all the rectangles below so we can achieve that stacked effect (as opposed to have all rectangles superimposed at the bottom of the vis). This is why I am entering the rectangles in stacking order. Each time, the bottom corners get the value of the top corners of the previous rectangle.

Here is the final result:


La france qui exporte

This week, I was made aware of a new set of maps by French ministry of Foreign Trade, called cartographie de la France qui exporte (map of France exports) (link). Since I’m interested in the topic and that I know that French public services have killer cartographers I was eager to see what was so exciting about the first set of online maps on French exports.

I was a little underwhelmed to be honest. Online here meant static pdf files, although this is a dataset that just begs to be explored and manipulated.
On top of that, those where basic choropleth maps with markers such as this one here:

Now this map has two problems. First, it’s a choropleth with a discrete scale, but the values of adjacent areas can vary a lot. So, if you look at this portion of the map, what can be deduced on the values? not much I’m afraid.

Second, it’s difficult to compare the marks on the map. Which region has the biggest? the smallest? how do two specific regions compare? with this representation, this type of question is even more difficult to answer than with a table.

Also these charts answer one partial question. So this one, here, shows which region exports most food products. But to where? and how about the imports and balance? now if one given view was the most relevant and could illustrate some important finding, it can be highlighted but here the website gives us collections of many of such maps. As a citizen I’m leaving no more informed than I was.

Not being the one to criticize without proposing an alternative, I whipped out an interactive exploratory tool of France trade flows.

(The interactive vis is too wide to conveniently fit in a blog, but clicking on the image will open it in a new tab).

I don’t have access to the same dataset so I can’t show a strict equivalent. My data comes from COMTRADE, the UN database of trade flows, and shows all exports and imports to France in 2009. They are not broken down by region or by type of company, but I got the flows by partner country and product category.
The idea is that one can select something on one treemap to update the other. Also, it’s possible to alternate between a categorical view (where all groups of products and continents look neatly separated) and a view of the balance, which quickly shows which products or which countries get the bulk of French trade.

(technical explanation follows for those interested in the code proper)
Now following last week’s tutorial, of course it had to be done in protovis.
Actually it illustrates some interesting principles of working with arrays, trees, maps etc.

First, I want to do as much data manipulation as possible in protovis as opposed to manually. So, my source data for the treemap is stored as an array of associative arrays, which is probably the preferred form in protovis. This is no different than, for instance, Protovis’s barley example.
Now how do you get something of the shape -

var data=

to something shaped like a tree like:

var tree=
{0: {
       02: 101421,
       03: 0,
       04: 9272355,
       05: 0,

The solution is to use the rollup method.

First, if you look at my individual records, they are of the shape:


where com is commodity code, cat is product category, cou is country, con is continent, imp is imports and exp is exports.

For any country + commodity combination, there will be only one record.
What I’m interested to get in the tree I will use for the treemap are exports. That is what will determine the size of the leaves of the tree.

first I am going to nest my array:

var byProduct=pv.nest(data) 
	.key(function(d) {return d.cou;})
	.key(function(d) {return;})
	.key(function(d) {return;})

once I have written this I could follow up with a .entries() statement which would return me a nested array, or with rollup() which could give me the tree I need.
Since, again, there is only one record for a combination of country (cou) and commodity (com), I can use any aggregation I want.

I define this function:

function rollup(data) {return pv.sum(data, function(d) {return d.exp;});} 

It returns the sum of all the export values. Since there is just one record, what it does is that it gives me the one export value I need in a tree form.

So the complete statement is:

function rollup(data) {return pv.sum(data, function(d) {return d.exp;});} 

var byProduct=pv.nest(data) 
	.key(function(d) {return d.cou;})
	.key(function(d) {return;})
	.key(function(d) {return;})

This creates a tree, nested by country, then by product category, then by commodity. The corresponding values are the exports.

now creating my treemap data dynamically saves me a ton of hassle compared to trying to come up with a data file of the right shape and size, not mentioning the calculation errors which creep in each manual manipulation !

Another point of interestingness: how I computed the data to create the bar charts on the side.
For the left treemap (and left bar chart) the user has selected a country. (and for the right ones, it’s a given product, but let’s focus on the left side, the reasoning is the same for the other side anyway).

so first I am going to take the tree we made earlier and just look at the selected country. We can do that with a statement like:


(so now we have a tree with just 2 levels, product category and commodity).

Now I can’t run pv.nest and all that on a tree. I need an array! so I have to use flatten to turn that section of the tree into a bona fide array which I will be able to further process.

catsByCountry = pv.flatten(myProductTree).key("cat").key("com").key("exp").array(); 

Here, note that the arguments: “cat”, “com”, “exp” are completely arbitrary. But, since I’m recreating the array almost as it originally was, I might as well use the same names for the keys.

So now, I have like a little subset of my original dataset, only the records of the selected country.
I can now proceed to sum exports by categories using a standard rollup method, just as we’ve seen here.

catsByCountry = pv.nest(catsByCountry).key(function(d) {return;}).rollup(rollup);

Conveniently, the rollup function that I defined earlier sums the records! and here I do need summing, not any aggregation.

The problem is that the rollup() method creates an associative array, and if I need to use that in a bar chart I need a proper array! so, I use pv.values() which does just that, it creates an array out of the values of an associative array.

catsByCountry = pv.values(catsByCountry);

Now the values can vary a lot in absolute terms depending on the selected country. This is why in the actual bar chart, I use pv.normalize() to have only values from 0 to 1 which are much more convenient to plot.

	.data(function() pv.normalize(catsByCountry))

one last thing, to save space in the data set (which means: bandwidth + loading time) I’ve used short keys in my data file, and I’ve used codes for countries, commodities and the like.

so I have this:


instead of

    cat:"Food and live animals",

to get the names of the countries, categories etc. I have in my data file variables that associate, say, a country code to its long name, its continent etc.
so I can have to write things like:

countries["4"].name+" ("+continents[countries["4"].continent]+")"

instead of something simpler, but it’s a good trade-off because writing those names in full in the original dataset inflates the size of the file to megabytes (there are approx 10.000 records).

Working with data in protovis: part 5 of 5

previous: reshaping complex arrays (4/5)

Working with layouts

In this final part, we’re going to look at how we can shape our data to use the protovis built-in layouts such as stacked areas, treemaps or force-directed graphs.
This is not a tutorial on how to use layouts stricto sensu, and I advise anyone interested to first look at the protovis documentation to see what can be done with this and to understand the underlying concepts.

But if there is one thing to know about layouts, it’s that they allow you to create non-trivial visualizations in even less code than regular protovis, provided that you pass them data in a form they can use, and this is precisely where we come in.

Three great categories of layouts

Currently, there are no fewer than 13 types of layouts in Protovis. Fortunately, there are examples for all of them in the gallery.
There are layouts for:

In addition, there are layouts like pv.Layout.Bullet which require data to have a certain specific shape but the example from the gallery is very explicit. (et tu, Horizon layout).

Arrays of data

In order to work with this kind of layout, the simplest thing is to put your data in a 2-dimensional array:

var data=[

For the grid layout, this gives you an array of cells divided in columns (number of elements in each line) and rows (number of lines).
The idea of the grid layout is that your cells are automatically positioned and sized, so afaik the only thing you can do is add a mark such as a pv.Bar which would fill them completely, but which you could still style with fillStyle or strokeStyle. You can’t really access the underlying data with functions but you can use methods that rely on default values, like adding labels.

For instance, you can use it to generate a QR code:

var qr=[
].map(function(i) i.split(""));

var vis = new pv.Panel()
 	    .fillStyle(pv.colors("#fff", "#000"))
(BTW, this is the QR code to this page)

On line 29, I’m using a map function to turn this array of strings, which is easier and shorter to type, into a bona fide 2-dimensional array.

That’s all there is to grids, of all the layouts they are among the easiest to reproduce with regular protovis.

Now, stacks.
The easiest way to use them is to pass them 2-dimensional arrays. Now it doesn’t have to be arrays of numbers, it can be arrays of associative arrays in case you need to do something exotic. But for the following examples let’s just assume you don’t. Here is how you’d do a stacked area, stacked columns and stacked bars respectively:

var data=[
var vis=new pv.Panel().width(200).height(200);
    .x(function() 50*this.index)
    .y(function(d) d/20)

all you need is to feed the layers, x, y properties of your stack, then say what you want to add to your layers.
Now, columns:

    .x(function() 50*this.index)
    .y(function(d) d/20)

and finally, bars:

    .x(function() 50*this.index)
    .y(function(d) d/20)

For bars, there is a little trick here. I specify that the layer orientation is horizontal (“left”) and I change the height instead of the width of the added pv.Bar.
And that all there is. You can create various streamgraphs by playing with the order and offset properties of the stack but this doesn’t change anything to the data structure, so we’re done here.

Representing networks

Protovis provides 3 cool layouts to easily exhibit relationships between nodes: arc diagrams, matrix diagrams and force-directed layouts.
The good news is that the shape of the data required by those three layouts is identical.

They require an array that correponds to the nodes. This can be as simple as a pv.range(), or as sophisticated as an array of associative arrays if you want to style your network graph according to several potential attributes of the node.

And they also require an array for the links. This array has a more rigid form, it must be an array of associative arrays of the shape: {source: #, target: #, value: #} where the values for source and target correspond to the position of a node in the node array, and value indicates the strength of the link.

So let’s do a simple one.

var nodes=pv.range(6); // why more complex, right?
var links=[
{source:0, target:1, value:2},
{source:1, target:2, value:1},
{source:1, target:3, value:1},
{source:2, target:4, value:4},
{source:3, target:5, value:1},
{source:4, target:5, value:1},
{source:1, target:5, value:3}
var vis = new pv.Panel()
var arc = vis.add(pv.Layout.Arc)

Here, by varying the strength of the link, the thickness of the arcs changes accordingly. The nodes are left unstyled, had we passed a more complicated dataset to the nodes array, we could have changed their properties (fillStyle, size, strokeStyle, labels etc.) with appropriate accessor functions.

With little modifications we can create a force-directed layout and a matrix diagram.

var force = vis.add(pv.Layout.Force)

		.text(function() this.index);


Here I labelled the nodes so one can tell which is which. This is done by adding a pv.Label to the pv.Dot that’s attached to the node, just like with any other mark.

var Matrix = vis.add(pv.Layout.Matrix)
    .fillStyle(function(d) pv.Scale.linear(0, 2, 4)
      .range('#eee', 'yellow', 'green')(d.linkValue))

Matrix.label.add(pv.Label).text(function() Math.floor(this.index/2))


For the matrix things are slightly more complex than for the previous 2. Here I opted for a directed matrix, as opposed to a bidirectional one: this means that each link is shown once, to its source from its target, and not twice (ie from its target back to its source) which is the default.
I chose to color the bar attached to my links (which are cells of the matrix) according to the strength of my links. Again, if my nodes field was more qualified, I could have used these properties.

Finally, we’ve added labels to the custom property Matrix.label. Only, the labels are numbered from 0 to 11 so to get numbers from 0 to 5 for both rows and columns I used Math.floor(this.index/2) (integer part of half of this number).

Hierarchized data

Like for networks, the shape of the data we can feed to treemaps, icicles and other hierarchical representation doesn’t change. So once you have your data in order, you can easily switch representations.

Essentially, you will be passing a tree of the form:

var myTree={
   rootnode: {
      node: {
         node: {
            leaf: value,
            leaf: value,
            leaf: value

The protovis examples use the hierarchy of flare source code as an example, which really shows what can be done with a treemap and other tree represenations.

For our purpose we are going for a simpler tree, inspired by the work of Periscopic on which Kim Rees showed at Strata.
Kim presentation featured tiny treemaps that showed the voting record for a congressperson, and whether they had voted for or against their party.

So let’s play with the voting record of an hypothetic congressperson:

var hasVoted={
	didnt: 100,
	voted: {
	    yes: {
	        yesWithParty: 241,
	        yesAgainstParty: 23
	    no: {
	        noWithParty: 73,
	        noAgainstParty: 5

Once you have your tree, you will need to pass it to your layout using pv.dom, like this:


Based on that let’s do two hierarchical representations.
Let’s start with a tree:

var vis = new pv.Panel()
var tree = vis.add(pv.Layout.Tree)
    .size(function(n) n.nodeValue)
	.anchor("center").add(pv.Label).textAlign("center").text(function(n) n.nodeName)

And here is the result:

There are many styling possibilities obviously left unexplored in this simple example (you can control properties of the, tree.node, tree.labels which we didn’t use here, etc.), but this won’t change much as far as data are concerned.

Now let’s try a treemap with the same dataset.

var vis = new pv.Panel()

var tree = vis.add(pv.Layout.Treemap)

	.fillStyle(function(d) d.nodeName=="didnt"?"darkgrey":d.nodeName.slice(0,3)=="yes"?

		   {label:"yes with party", 	color: "powderblue"},
		   {label:"yes against party", 	color: "steelblue"},
		   {label:"no with party", 		color: "lightsalmon"},
		   {label:"no against party", 	color: "salmon"},
		   {label:"didn't vote", 		color: "darkgrey"}
	.top(function() 50+20*this.index)
	.fillStyle(function(d) d.color)
	.anchor("right").add(pv.Label).textAlign("left").text(function(d) d.label)


and what took the longest part of the code was making the legend.

Here is the outcome: