Working with data in protovis – part 3 of 5

Previous : Multi-dimensional arrays, inheritance and hierarchy

Short interlude: what can be done with arrays in javascript?

Now that we have a grasp on how arrays work and how they can be used in protovis, let’s take a step back and look at some very useful methods for working with arrays in standard javascript.

Sorting arrays

Using a method called, well, sort(), javascript arrays can be reordered in a variety of ways.

var a = [1, 1.2, 1.7, 1.5, .7, .3];
a.sort();

Without any argument, this will sort the array in ascending order: [0.3, 0.7, 1, 1.2, 1.5, 1.7].

However, we can reorder the array differently with a comparison function, such as:

a.sort(function(a,b) b-a)

Note that this is protovis notation. In traditional javascript, you’d have written a.sort(function(a,b) {return b-a;});

This would sort a in descending order ([1.7, 1.5, 1.2, 1, 0.7, 0.3]). Here is how it works.

The comparison function takes 2 elements in the array. If the first element (corresponding to the first argument, a) should appear first, the function must return a negative number. If the result is positive, the second element is appearing first. With our example, b-a is negative when the first element is greater than the second. So, this function sorts an array in descending order.

Sort also works with arrays of associative arrays. For instance:

var data = [
  {key:"a", value:1},
  {key:"b", value:1.2},
  {key:"c", value:1.7}, 
  {key:"d", value:1.5},
  {key:"e", value:.7},
  {key:"f", value:.3}
];

data.sort(function(a,b) b.value-a.value);

Here, we have to specify which criterion is used to sort the array. In this example, we sort it in descending order by value. But we could have sorted it in ascending order by key, for instance.

The method reverse() turns an array upside down.

var a = [1, 1.2, 1.7, 1.5, .7, .3];

a.reverse(); // gives [0.3, 0.7, 1.5, 1.7, 1.2, 1]

So, a.sort().reverse() will also sort that array in decreasing order.

Extracting sub-arrays

The method slice() will extract a sub-array out of an array, in other words, it will retrieve certain elements. It works with one or two arguments. If you only give one argument, it will give you the last elements from the index you specified.

For instance,

[1, 1.2, 1.7, 1.5, .7, .3].slice(3) // it's [1.5, 0.7, 0.3]

The index 3 corresponds to the 4th element of the array (0,1,2,3) so slice(3) will return the 4th, 5th and 6th elements.

With 2 arguments, you get a sub-array that starts at the first index but ends just before the 2nd one. For instance,

[1, 1.2, 1.7, 1.5, .7, .3].slice(0,3) // gives [1, 1.2, 1.7]

It is also possible to use negative arguments. Instead of counting from the beginning of the array, it means counting from the end.

[1, 1.2, 1.7, 1.5, .7, .3].slice(-1) // result: [0.3]

Turning a string into an array

This can be done via the split() method.

split() normally works with a separator, for instance “07/04/1975″.split(“/”) is [“07″, “04”, “1975”], but if an empty string is used, then each character becomes an element of the new array. For instance,

"ABCDEFGHIJ".split("") // result: ["A", "B", "C", "D", "E", "F", "G", "H", "I", "J"] 

This can be very useful to generate data on the go.

And what methods does protovis has for working with arrays and data?

Many.

Some simple, some very elaborate. In addition to visualization functions protovis has impressive data processing capabilities.

The workhorse: pv.range()

pv.range() is a method that creates an array of numbers.

pv.range(10), for instance, is [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], or the 10 first integers starting from 0.
(this is the method I referred to in the example of the previous part).

By using the other arguments, it is possible to generate arrays of numbers that go from one specific index to another, or to change the step. For instance:

pv.range(5,10) // is [5, 6, 7, 8, 9]
pv.range(5,21,5) // is [5, 10, 15, 20]

While this may not look incredible at first sight, pv.range() really shines when associated with other functions or methods such as map().

map()  is an array method that creates a new array by applying a function to each of the element of the array it’s run against. For instance,

var data = pv.range(100).map(Math.random) // creates an array of 100 random numbers.

var data = pv.range(1000).map(function(a) Math.cos(a/1000)) // stores into data 1000 values 
// of cos(x) for all the numbers between 0 and 0.999 by increment of 0.001.

In other words, pv.range() and map() can quickly create very interesting datasets for protovis to visualize.

Simple statistical functions that make life easier

pv.min(), pv.max()

Protovis has functions that extract the maximum or the minimum value of an array.

pv.min([1, 1.2, 1.7, 1.5, .7, .3]) will return the value of the smallest element of the array, in that case 0.3.

Likewise, pv.max returns the value of the greatest element of the array.

Both of these can be used with an accessor function to help retrieve what will be compared.

For instance:

var data = [
  {key:"a", value:1},
  {key:"b",value:1.2},
  {key:"c", value:1.7}, 
  {key:"d", value:1.5},
  {key:"e", value:.7},
  {key:"f", value:.3}
];

pv.max(data, function(a) a.value); // should return 1.7

pv.sum(), pv.mean(), pv.median()

Likewise, protovis has aptly-named methods that return the sum, the mean and the median of the elements of an array. Note that it’s pv.mean() and not pv.average(), though.

pv.sum([1, 1.2, 1.7, 1.5, .7, .3]) // gives  6.4.
pv.mean([1, 1.2, 1.7, 1.5, .7, .3]) // gives 1.0666666666666667
pv.median([1, 1.2, 1.7, 1.5, .7, .3]) // gives 1.1.

Similarly to pv.min() you can use an optional accessor function.

pv.median(data, function(a) a.value);

pv.normalize()

pv.normalize() is a handy method that divides all elements in an array by a factor, so that the sum of all these elements is 1.

pv.normalize([1, 1.2, 1.7, 1.5, .7, .3])
// [0.15625, 0.18749999999999997, 0.265625, 0.234375, 0.10937499999999999, 0.04687499999999999]

pv.sum(pv.normalize([1, 1.2, 1.7, 1.5, .7, .3])) // result: 1.

Combining arrays

pv.blend()

pv.blend() simply turns an array of arrays into a simpler array where all elements follow each other.

pv.blend([["a", "b", "c"], [1,2,3]]) // gives  ["a", "b", "c", 1, 2, 3]

One benefit is that it is possible to run methods on this new array that wouldn’t work on an array of arrays.

For instance:

pv.max([[1,2],[3,4],[5,6]]) // result : NaN
// protovis cannot guess how to compare [1,2], [3,4] or [5,6] without any further instruction.
pv.max(pv.blend([[1,2],[3,4],[5,6]])) // result: 6.
pv.max([[1,2],[3,4],[5,6]], function(a) pv.max(a)) // also returns 6.

pv.cross()

Given two arrays a and b, pv.cross(a,b) returns an array of all possible pairs of elements.

pv.cross(["a", "b", "c"], [1,2,3]) // returns 
//[[“a”,1],[“a”,2],[“a”,3],[“b”,1],[“b”,2],[“b”,3],[“c”,1],[“c”,2],[“c”,3]]

pv.transpose()

pv.transpose() flips an array of arrays.

In our previous example –

pv.cross(["a", "b", "c"], [1,2,3])

The result was a 9×2 array. [[“a”,1],[“a”,2],[“a”,3],[“b”,1],[“b”,2],[“b”,3],[“c”,1],[“c”,2],[“c”,3]]

If we apply pv.transpose, we get a 2×9 array:

pv.transpose(pv.cross(["a", "b", "c"], [1,2,3])); // result:
//[["a","a","a","b","b","b","c","c","c"]
//[1,2,3,1,2,3,1,2,3]]

Putting it all together

This short example will show how one can work from an unprepared dataset.
For the purpose of this example, I’m retrieving GDP per capita data of OECD countries. I have a table of 34 countries on 10 years from 2000 to 2009. The countries are in alphabetical order.
I would like to do a bar chart showing the top 12 countries ranked by their GDP per capita. How complex can this be? With the array methods, not so difficult.

var data=[
["Country",2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009],
["Australia",28045.6566279103, 29241.289462815, 30444.7839453338, 32079.5610505727, 33494.6654789274, 35091.7414600449, 37098.3968669994, 39002.0335375021, 39148.4759391917, 39660.1953738791],
["Austria",28769.5911274323, 28799.1968990511, 30231.0955219759, 31080.6122212092, 32598.0698959222, 33409.3453751649, 36268.9604225683, 37801.7320979073, 39848.9481928628, 38822.6960085478],
["Belgium",27624.4454637489, 28488.7963954409, 30013.7570016445, 30241.724380444, 31151.6315179167, 32140.9239149283, 34159.4723388042, 35597.2980660662, 36878.9710906881, 36308.4784216699],
["Canada",28484.9776553362, 29331.8223245112, 29911.3226371905, 31268.5521058006, 32845.6267954986, 35105.9892865177, 36853.5554314586, 38353.0812084648, 38883.0700787846, 37807.5259929247],
["Chile",9293.72322249492, 9712.95685376592, 9973.25946014534, 10475.6486664294, 11299.9329380664, 12194.1443507847, 13036.1285137662, 13896.6946083854, 14577.5715215544, 14346.4648487517],
["Czech Republic",14992.3582327884, 16173.6387263985, 16872.3146139754, 17991.9331910401, 19303.9208098529, 20365.7571142986, 22349.7033747705, 24578.7151794413, 25845.0252652635, 25530.2001372786],
["Denmark",28822.3343389384, 29437.5232757263, 30756.3323835463, 30427.5149228283, 32301.3394496912, 33195.8834383121, 36025.8628586246, 37730.736916861, 39494.1216134705, 37688.2516868117],
["Estonia",9861.80346448344, 10693.0362387373, 11966.6746413277, 13370.127724177, 14758.4181995435, 16530.7311083966, 19134.4130457855, 21262.1415030861, 21802.2185996096, 19880.2786070771],
["Finland",25651.0005450286, 26518.293212524, 27509.1023963486, 27592.2031438268, 29855.387224753, 30689.9753400459, 33095.030855128, 36149.32963148, 37795.237097856, 35236.9462175119],
["France",25272.4127936247, 26644.8404266553, 27776.4225007831, 27399.5459524877, 28273.6330332193, 29692.129719144, 31551.3458452695, 33300.5367942194, 34233.0770304046, 33697.9698933558],
["Germany",25949.0588445755, 26855.1296905696, 27587.1573355486, 28566.8930562768, 29900.6507502683, 31365.576121107, 33713.1689185469, 35623.4147543745, 37170.693412872, 36339.9009091421],
["Greece",18410.2161113813, 19928.6672542964, 21597.5997816753, 22701.7754699285, 24088.2202877303, 24571.6085727425, 26917.7358531616, 28059.5724529461, 29919.6385697499, 29121.8306842272],
["Hungary",12134.0140204384, 13576.4413471795, 14765.2333755651, 15424.3926636975, 16316.8885518624, 16937.9383138957, 18329.1579647528, 19187.2829138188, 20699.9056965458, 20279.5060192879],
["Iceland",28840.4899286393, 30443.907492292, 31083.7723760522, 30768.0910155965, 33697.6723843766, 35025.1338769129, 35807.880160537, 37178.9762849327, 39029.2396691797, 36789.0728215933],
["Ireland",28695.0931156375, 30524.3712319885, 33052.4517849575, 34525.4148515126, 36511.518217243, 38622.5520655621, 42267.9784043449, 45293.6482834027, 42643.6362934458, 39571.204030073],
["Israel",23496.1573531847, 23455.1048827219, 23534.8524310529, 22270.5507137525, 23650.3373360675, 23390.3809761211, 24960.0508863672, 26583.387317862, 27679.3777353666, 27661.1271269539],
["Italy",25594.3456589799, 27127.4298830414, 26803.97019843, 27137.5365317475, 27416.0961790796, 28144.0379218171, 30224.2469223694, 31897.7434207435, 33270.7464832409, 32407.5033618371],
["Japan",25607.7204586644, 26156.1615068187, 26804.9303541079, 27487.123875677, 29020.8955443678, 30311.5281317716, 31865.2733339475, 33577.1846704038, 33902.3807335349, 32476.7736535378],
["Korea",17197.0800334388, 18150.876787578, 19655.5961579558, 20180.932195274, 21630.1629926488, 22783.2288432845, 24286.1803272315, 26190.6122298674, 26876.6422899926, 27099.9481192024],
["Luxembourg",53645.7229595775, 53932.4594967082, 57559.2104243171, 60723.9861616145, 65021.6940738929, 68372.258619455, 78523.2988291034, 84577.2190776183, 89732.070876649, 84802.9716853677],
["Mexico",10046.1268819126, 10135.7265293228, 10397.8915415223, 10883.8643666506, 11535.0108530091, 12460.5385178945, 13672.6034191998, 14581.5178679935, 15290.896575069, 14336.6153896301],
["Netherlands",29405.5490140191, 30788.2508953647, 31943.4974056501, 31702.5798119046, 33208.8592768987, 35110.6577619843, 38063.6877336725, 40744.3819227526, 42887.4361825772, 40813.047575264],
["New Zealand",21113.9167562199, 22128.8483527737, 23115.0676326844, 23826.8007737809, 24775.7827578032, 25460.2942993256, 27277.7654223562, 28701.3328788203, 29231.151567007, 29097.307061683],
["Norway",36125.7036078917, 37091.7150478501, 37051.9473705873, 38298.8631304819, 42257.5943785044, 47318.7844737929, 53287.9750020125, 55042.2093246253, 60634.9819502298, 55726.7456848212],
["Poland",10567.0016558591, 10950.4739587211, 11562.6245018264, 11985.0042955849, 13014.5996003514, 13785.7656450352, 15067.4402978586, 16762.2045951617, 18062.1834289945, 18928.8205162032],
["Portugal",17748.6930918065, 18464.7472033934, 19088.1764760703, 19392.2862101711, 19796.1891048601, 21294.1660710906, 22869.9272824917, 24122.9886840593, 24962.2959136026, 24980.061318413],
["Slovak Republic",10980.1169029929, 12070.70145551, 12965.6228450353, 13597.9556337584, 14659.0096334623, 16174.4605836956, 18397.5822518716, 20916.5438777989, 23241.0972994031, 22870.8998469728],
["Slovenia",17468.5101911515, 18342.8685005009, 19702.0589885459, 20448.2040741134, 22200.7770986247, 23493.9370882038, 25432.3240768864, 27228.3376725332, 29240.5471362484, 27535.6540786883],
["Spain",21320.0690041202, 22591.3855327033, 24066.5003000439, 24748.2819467008, 25958.1017090553, 27376.7644983649, 30347.9357063518, 32251.8469529105, 33173.3334547015, 32254.0895139806],
["Sweden",27948.4749835611, 28231.3958224236, 29277.7736264795, 30418.0441004337, 32505.646687298, 32701.4327422078, 35680.1787996674, 38339.6698693908, 39321.3014501815, 36996.1394001578],
["Switzerland",31618.0958082618, 32103.4812665805, 33390.9169108411, 33266.3042144918, 34536.8476164193, 35478.0395058126, 39116.3945212067, 42755.5569054368, 45516.5601956426, 44830.3719226918],
["Turkey",9169.71780977686, 8613.21159842998, 8666.90348260238, 8790.01048446305, 10166.0963401732, 11391.3768057053, 12886.6195973915, 13897.3820310325, 14962.4944915831, 14242.7276329837],
["United Kingdom",26071.3066882302, 27578.2857038985, 28887.5850874466, 29848.7319363944, 31790.9933025905, 32724.4057819337, 34970.5193042987, 35719.2060247462, 36817.493156774, 35158.8005888247],
["United States",35050.1738557741, 35866.2624634202, 36754.5543204007, 38127.524970345, 40246.0630591955, 42466.1326203714, 44594.9199470326, 46337.2237397566, 46901.0697730874, 45673.7445647402]
];

var year=10;
var rows = data.slice(1,data.length);

var x = pv.Scale.ordinal(pv.range(12)).splitBanded(0, 240, 4/5);
var y = pv.Scale.linear(0, 80000).range(0, 170);

var vis = new pv.Panel()
    .width(250)
    .height(200)
    ;
	vis.add(pv.Bar)
		.data(rows.sort(function(a,b) b[year]-a[year]).slice(0,12))
		.bottom(10)
		.width(x.range().band)
		.height(function(d) y(d[year]))
		.left(function() 5+x(this.index))
		.anchor("bottom").add(pv.Label).textAngle(Math.PI/2).text(function(d) d[0]).textBaseline("middle").textAlign("right")
		;
vis.render();

here, in the data variable, I’ve put my 2-dimensional table as I got it. The first row contains headers which I won’t need. So, in line 40, I create rows which just removes that 1st line with a slice function. data.slice(1,data.length) effectively keeps all the lines but the first.
In the next few lines I create two scales for placing my bars, a standard vis panel and a bar chart. Now, what kind of data will I pass to the bar chart?
I want to sort the rows by the value of the latest year (which happens to be the 11th column, so 10). This is what the rows.sort(function(a,b) b[year]-a[year]) part of the statement does. I’ve simply assigned 10 to the variable year, so if I want to display other years (with an HTML form for instance) it wouldn’t be difficult to modify.
And, since I only want the top 12 values, I just write .slice(0,12) after that.

In line 55, I just add a label. pv.Label inherits the data of its parent. The data item is the whole row of my 2-dimensional table, so if I write function(d) d[0], I am referring to the left-most item which will be the country name.

So, with the use of simple array functions, I can easily rework an unprepared dataset in protovis, rather than having to tailor my dataset with manual (and error-prone) manipulations in external programs. Here is the result:

Next: reshaping complex arrays