More fun with arrays in protovis

In my short tutorial on working with data and protovis I briefly covered some standard javascript and protovis methods to work with arrays. The more I work with Protovis, the more I am convinced that efficient array manipulation is key to achieving just about anything with the library. So, I would like to go into more detail in some javascript methods for building, processing and testing arrays that can really be helpful.

Going through arrays: map and forEach

I said rapidly that the map method was very useful in protovis especially used in combination with pv.range. But that's not very fair to map() to be treated this lightly. Protovis canon examples do not use many traditional loops such as for or while statements. One reason for that is that many constructs in protovis are de facto loops: when we pass an array to protovis as a data file, to create a bar chart for instance (or panels, pie wedges, you name it), it will go through each element of the array to create individul bars (panels, wedges...), to position them, style them, and so on and so forth. This is why it is so important to have your data elements in the best possible shape when you first pass them to protovis. It makes the rest of your code much nicer. Remember our early example:
var categories = ["a","b","c","d","e","f"];
var vis = new pv.Panel()
  .width(150)
  .height(150);

vis.add(pv.Bar)
  .data([1, 1.2, 1.7, 1.5, .7, .3])
  .width(20)
  .height(function(d) d * 80)
  .bottom(0)
  .left(function() this.index * 25)
  .anchor("bottom").add(pv.Label)
    .text(function() categories[this.index]);

vis.render();
This is not ideal to have values and categories in two separate places, because one could be changed without updating the other. So let's try to use map to create one single variable.
var categories = ["a","b","c","d","e","f"];
var data=[1, 1.2, 1.7, 1.5, .7, .3].map(function(d,i) {return {value:d, key: categories[i]};});
Let's look at our map() method here. First, it's right after an array. It will run against this array, so it will perform an operation on each element of this array, and the result will be another array of the same size with the outcome of each operation in the same order. The next thing here is a function with two arguments: d and i. Again the naming is arbitrary, call them what you want. And they are both optional. But it has to be a function. pv.range(3).map(3) will not return [3,3,3], you need to write pv.range(3).map(function() 3). The first argument refers to the current item of the array map is working on. So 1, then 1.2, etc. If the array is more complex, and is an array of arrays or similar, the current element can be itself an array, an object or anything. It doesn't have to be a number. Here, we want to create an array of associative arrays where the value handle corresponds to the values of the array, and where key corresponds to the category name. So we start our output by "{value: d,". This puts the value of each array element in sequence where we need it to be. The second argument corresponds to the index of the current item in the array, so - 0, 1, 2 etc. This is not unlike using "this.index" in other parts of protovis. This helps us getting the right category name, the one in the same position as the value we are fetching. So we write "key: categories[i]}". The rest of the code can then be changed to :
vis.add(pv.Bar)
  .data(data)
  .width(20)
  .height(function(d) d.value * 80)
  .bottom(0)
  .left(function() this.index * 25)
  .anchor("bottom").add(pv.Label)
    .text(function() d.key);
vis.render();
Now how about forEach()? forEach works in a very similar way to map(), the difference is that it doesn't output an array. It's just a function that runs on each element of the array. It can be used to perform an operation a number of times corresponding to the length of that array, for instance.

Testing arrays

There are some times when you need to know whether some or all the elements of your array fulfill a condition. And some other times, you need to be able to extract a subset of your array also on a conditional basis. Now, that would be possible using forEach or map methods as above, but fortunately javascript provides simpler means to achieve that.

Testing a condition on an array at once

There are two methods that do that: every() and some(). every() will return true is the condition is true for, well, every element of the array. some() will return true if the condition is true for at least one element of the array. So, they can also be used to tell if the condition is false for at least one element of the array, or all elements of the array respectively. This is how they work:
[0,1,2].every(function(d) {return d;})
// will return false: 0 is false, 1 is true and 2 is true.
[0,1,2].every(function(d) {return (d<3);})
// will return true. All elements are less than 3.

[0,1,2].some(function(d) {return d;})
// will return true. 1 is true, so at least one element in the array is true.

Creating conditional subsets of an array

It is also possible to get only the elements that fit a condition using the filter() method.
[0,1,2].filter(function(d) {return d;})
// this will return [1,2]. 0 is evaluated as "false".
[1,2,3,4,5].filter(function(d) {return (d>2);})
// this will return [3,4,5].
Of course, the more complex the array, the more interesting these functions get. With the barley array from part 4:
 barley.filter(function(d) {return d.variety=="Manchuria";}
/* this will return: 
  [{"yield":27,"variety":"Manchuria","year":1931,"site":"University Farm"},
   {"yield":48.86667,"variety":"Manchuria","year":1931,"site":"Waseca"},
   {"yield":27.43334,"variety":"Manchuria","year":1931,"site":"Morris"},
   {"yield":39.93333,"variety":"Manchuria","year":1931,"site":"Crookston"},
   {"yield":32.96667,"variety":"Manchuria","year":1931,"site":"Grand Rapids"},
   {"yield":28.96667,"variety":"Manchuria","year":1931,"site":"Duluth"},
   {"yield":26.9,"variety":"Manchuria","year":1932,"site":"University Farm"},
   {"yield":33.46667,"variety":"Manchuria","year":1932,"site":"Waseca"},
   {"yield":34.36666,"variety":"Manchuria","year":1932,"site":"Morris"},
   {"yield":32.96667,"variety":"Manchuria","year":1932,"site":"Crookston"},
   {"yield":22.13333,"variety":"Manchuria","year":1932,"site":"Grand Rapids"},
   {"yield":22.56667,"variety":"Manchuria","year":1932,"site":"Duluth"}]*/

Visualizing arrays

(without plotting them, of course) When you are manipulating arrays, turning them into maps, performing roll-ups and sorts, you may want to get a glimpse of the array. But, unless it's a single, one-dimensional array, firebug or chrome debugger will represent it as a cryptic [ > Object, > Object, > Object ]. Not being able to follow step by step what's happening to an array makes understanding the data functions much more difficult. Fortunately, you can use the JSON.stringify() method.
JSON.stringify(barley)
/*returns: 
"[
{"yield":27,"variety":"Manchuria","year":1931,"site":"University Farm"},
{"yield":48.86667,"variety":"Manchuria","year":1931,"site":"Waseca"},
{"yield":27.43334,"variety":"Manchuria","year":1931,"site":"Morris"},
{"yield":39.93333,"variety":"Manchuria","year":1931,"site":"Crookston"},
{"yield":32.96667,"variety":"Manchuria","year":1931,"site":"Grand Rapids"},
{"yield":28.96667,"variety":"Manchuria","year":1931,"site":"Duluth"},
{"yield":43.06666,"variety":"Glabron","year":1931,"site":"University Farm"},
{"yield":55.2,"variety":"Glabron","year":1931,"site":"Waseca"},
{"yield":28.76667,"variety":"Glabron","year":1931,"site":"Morris"},
{"yield":38.13333,"variety":"Glabron","year":1931,"site":"Crookston"},
{"yield":29.13333,"variety":"Glabron","year":1931,"site":"Grand Rapids"},
{"yield":29.66667,"variety":"Glabron","year":1931,"site":"Duluth"},
{"yield":35.13333,"variety":"Svansota","year":1931,"site":"University Farm"},
{"yield":47.33333,"variety":"Svansota","year":1931,"site":"Waseca"},
{"yield":25.76667,"variety":"Svansota","year":1931,"site":"Morris"},
{"yield":40.46667,"variety":"Svansota","year":1931,"site":"Crookston"},
{"yield":29.66667,"variety":"Svansota","year":1931,"site":"Grand Rapids"},
{"yield":25.7,"variety":"Svansota","year":1931,"site":"Duluth"},
{"yield":39.9,"variety":"Velvet","year":1931,"site":"University Farm"},
{"yield":50.23333,"variety":"Velvet","year":1931,"site":"Waseca"},
{"yield":26.13333,"variety":"Velvet","year":1931,"site":"Morris"},
{"yield":41.33333,"variety":"Velvet","year":1931,"site":"Crookston"},
{"yield":23.03333,"variety":"Velvet","year":1931,"site":"Grand Rapids"},
{"yield":26.3,"variety":"Velvet","year":1931,"site":"Duluth"},
{"yield":36.56666,"variety":"Trebi","year":1931,"site":"University Farm"},
{"yield":63.8333,"variety":"Trebi","year":1931,"site":"Waseca"},
{"yield":43.76667,"variety":"Trebi","year":1931,"site":"Morris"},
{"yield":46.93333,"variety":"Trebi","year":1931,"site":"Crookston"},
{"yield":29.76667,"variety":"Trebi","year":1931,"site":"Grand Rapids"},
{"yield":33.93333,"variety":"Trebi","year":1931,"site":"Duluth"},
{"yield":43.26667,"variety":"No. 457","year":1931,"site":"University Farm"},
{"yield":58.1,"variety":"No. 457","year":1931,"site":"Waseca"},
{"yield":28.7,"variety":"No. 457","year":1931,"site":"Morris"},
{"yield":45.66667,"variety":"No. 457","year":1931,"site":"Crookston"},
{"yield":32.16667,"variety":"No. 457","year":1931,"site":"Grand Rapids"},
{"yield":33.6,"variety":"No. 457","year":1931,"site":"Duluth"},
{"yield":36.6,"variety":"No. 462","year":1931,"site":"University Farm"},
{"yield":65.7667,"variety":"No. 462","year":1931,"site":"Waseca"},
{"yield":30.36667,"variety":"No. 462","year":1931,"site":"Morris"},
{"yield":48.56666,"variety":"No. 462","year":1931,"site":"Crookston"},
{"yield":24.93334,"variety":"No. 462","year":1931,"site":"Grand Rapids"},
{"yield":28.1,"variety":"No. 462","year":1931,"site":"Duluth"},
{"yield":32.76667,"variety":"Peatland","year":1931,"site":"University Farm"},
{"yield":48.56666,"variety":"Peatland","year":1931,"site":"Waseca"},
{"yield":29.86667,"variety":"Peatland","year":1931,"site":"Morris"},
{"yield":41.6,"variety":"Peatland","year":1931,"site":"Crookston"},
{"yield":34.7,"variety":"Peatland","year":1931,"site":"Grand Rapids"},
{"yield":32,"variety":"Peatland","year":1931,"site":"Duluth"},
{"yield":24.66667,"variety":"No. 475","year":1931,"site":"University Farm"},
{"yield":46.76667,"variety":"No. 475","year":1931,"site":"Waseca"},
{"yield":22.6,"variety":"No. 475","year":1931,"site":"Morris"},
{"yield":44.1,"variety":"No. 475","year":1931,"site":"Crookston"},
{"yield":19.7,"variety":"No. 475","year":1931,"site":"Grand Rapids"},
{"yield":33.06666,"variety":"No. 475","year":1931,"site":"Duluth"},
{"yield":39.3,"variety":"Wisconsin No. 38","year":1931,"site":"University Farm"},
{"yield":58.8,"variety":"Wisconsin No. 38","year":1931,"site":"Waseca"},
{"yield":29.46667,"variety":"Wisconsin No. 38","year":1931,"site":"Morris"},
{"yield":49.86667,"variety":"Wisconsin No. 38","year":1931,"site":"Crookston"},
{"yield":34.46667,"variety":"Wisconsin No. 38","year":1931,"site":"Grand Rapids"},
{"yield":31.6,"variety":"Wisconsin No. 38","year":1931,"site":"Duluth"},
{"yield":26.9,"variety":"Manchuria","year":1932,"site":"University Farm"},
{"yield":33.46667,"variety":"Manchuria","year":1932,"site":"Waseca"},
{"yield":34.36666,"variety":"Manchuria","year":1932,"site":"Morris"},
{"yield":32.96667,"variety":"Manchuria","year":1932,"site":"Crookston"},
{"yield":22.13333,"variety":"Manchuria","year":1932,"site":"Grand Rapids"},
{"yield":22.56667,"variety":"Manchuria","year":1932,"site":"Duluth"},
{"yield":36.8,"variety":"Glabron","year":1932,"site":"University Farm"},
{"yield":37.73333,"variety":"Glabron","year":1932,"site":"Waseca"},
{"yield":35.13333,"variety":"Glabron","year":1932,"site":"Morris"},
{"yield":26.16667,"variety":"Glabron","year":1932,"site":"Crookston"},
{"yield":14.43333,"variety":"Glabron","year":1932,"site":"Grand Rapids"},
{"yield":25.86667,"variety":"Glabron","year":1932,"site":"Duluth"},
{"yield":27.43334,"variety":"Svansota","year":1932,"site":"University Farm"},
{"yield":38.5,"variety":"Svansota","year":1932,"site":"Waseca"},
{"yield":35.03333,"variety":"Svansota","year":1932,"site":"Morris"},
{"yield":20.63333,"variety":"Svansota","year":1932,"site":"Crookston"},
{"yield":16.63333,"variety":"Svansota","year":1932,"site":"Grand Rapids"},
{"yield":22.23333,"variety":"Svansota","year":1932,"site":"Duluth"},
{"yield":26.8,"variety":"Velvet","year":1932,"site":"University Farm"},
{"yield":37.4,"variety":"Velvet","year":1932,"site":"Waseca"},
{"yield":38.83333,"variety":"Velvet","year":1932,"site":"Morris"},
{"yield":32.06666,"variety":"Velvet","year":1932,"site":"Crookston"},
{"yield":32.23333,"variety":"Velvet","year":1932,"site":"Grand Rapids"},
{"yield":22.46667,"variety":"Velvet","year":1932,"site":"Duluth"},
{"yield":29.06667,"variety":"Trebi","year":1932,"site":"University Farm"},
{"yield":49.2333,"variety":"Trebi","year":1932,"site":"Waseca"},
{"yield":46.63333,"variety":"Trebi","year":1932,"site":"Morris"},
{"yield":41.83333,"variety":"Trebi","year":1932,"site":"Crookston"},
{"yield":20.63333,"variety":"Trebi","year":1932,"site":"Grand Rapids"},
{"yield":30.6,"variety":"Trebi","year":1932,"site":"Duluth"},
{"yield":26.43334,"variety":"No. 457","year":1932,"site":"University Farm"},
{"yield":42.2,"variety":"No. 457","year":1932,"site":"Waseca"},
{"yield":43.53334,"variety":"No. 457","year":1932,"site":"Morris"},
{"yield":34.33333,"variety":"No. 457","year":1932,"site":"Crookston"},
{"yield":19.46667,"variety":"No. 457","year":1932,"site":"Grand Rapids"},
{"yield":22.7,"variety":"No. 457","year":1932,"site":"Duluth"},
{"yield":25.56667,"variety":"No. 462","year":1932,"site":"University Farm"},
{"yield":44.7,"variety":"No. 462","year":1932,"site":"Waseca"},
{"yield":47,"variety":"No. 462","year":1932,"site":"Morris"},
{"yield":30.53333,"variety":"No. 462","year":1932,"site":"Crookston"},
{"yield":19.9,"variety":"No. 462","year":1932,"site":"Grand Rapids"},
{"yield":22.5,"variety":"No. 462","year":1932,"site":"Duluth"},
{"yield":28.06667,"variety":"Peatland","year":1932,"site":"University Farm"},
{"yield":36.03333,"variety":"Peatland","year":1932,"site":"Waseca"},
{"yield":43.2,"variety":"Peatland","year":1932,"site":"Morris"},
{"yield":25.23333,"variety":"Peatland","year":1932,"site":"Crookston"},
{"yield":26.76667,"variety":"Peatland","year":1932,"site":"Grand Rapids"},
{"yield":31.36667,"variety":"Peatland","year":1932,"site":"Duluth"},
{"yield":30,"variety":"No. 475","year":1932,"site":"University Farm"},
{"yield":41.26667,"variety":"No. 475","year":1932,"site":"Waseca"},
{"yield":44.23333,"variety":"No. 475","year":1932,"site":"Morris"},
{"yield":32.13333,"variety":"No. 475","year":1932,"site":"Crookston"},
{"yield":15.23333,"variety":"No. 475","year":1932,"site":"Grand Rapids"},
{"yield":27.36667,"variety":"No. 475","year":1932,"site":"Duluth"},
{"yield":38,"variety":"Wisconsin No. 38","year":1932,"site":"University Farm"},
{"yield":58.16667,"variety":"Wisconsin No. 38","year":1932,"site":"Waseca"},
{"yield":47.16667,"variety":"Wisconsin No. 38","year":1932,"site":"Morris"},
{"yield":35.9,"variety":"Wisconsin No. 38","year":1932,"site":"Crookston"},
{"yield":20.66667,"variety":"Wisconsin No. 38","year":1932,"site":"Grand Rapids"},
{"yield":29.33333,"variety":"Wisconsin No. 38","year":1932,"site":"Duluth"}
]"*/
No matter the manipulations you inflict to an array you will always be able to make it reveal its innards by using this.
 

Protovis: analysis of the Map projections example

What is a map?

before we start looking at the code it may be a good idea to think of the best way to represent a country.
Countries are areas of land surrounded by borders, which are imaginary (or sometimes physical) lines going through a set of points.

Some countries are made of one of such surfaces, but many countries are not one contiguous territory (they may include islands for instance) so they could be made out of several disjointed polygons.
.

Now let’s put on our protovis hat. Let’s suppose we want to draw a map where each country could be colored differently (choropleth). What kind of data structure should be use to represent that?
First there should be a sort of array of countries. Each country should be an item in that array, so they can be indexed and assigned an individual color and various data points.
Then, at the lowest level, we would be drawing polygons, which are treated as pv.Line in protovis. For each polygon, we would require an array of coordinate pairs. To draw a country, we would need a list (array) of those polygons.

So the data structure we are looking at is:

var world=[  // an array of countries
    [ // an array of polygons
        [ // an array of pairs of coordinates
            [x0, y0], // coordinates of the first point
            [x1, y1], // coordinates of the next one
                ... 
            [xn, yn],
            [x0, y0]  // coordinates of the first point to close the polygon
        ]
       ...              // another polygon, but maybe not.
   ], 
   [                    // next country
  ...
   ]
...
]

the map projections example

Can be found here: http://vis.stanford.edu/protovis/ex/projection.html

/*
 * A diverging color scale, using previously-computed quantiles of population
 * densities; in the future, we might use a quantile scale here to do this
 * automatically. Map colors based on www.ColorBrewer.org, by Cynthia A. Brewer,
 * Penn State.
 */
var fill = pv.Scale.linear()
    .domain(140, 650, 1900)
    .range("#91bfdb", "#ffffbf", "#fc8d59");

/* Precompute the country's population density and color. */
countries.forEach(function(c) {
  c.color = stats[c.code].area
      ? fill(stats[c.code].pop / stats[c.code].area)
      : "#ccc"; // unknown
});

var w = 860,
    h = 3 / 5 * w,
    geo = pv.Geo.scale("hammer").range(w, h);

var vis = new pv.Panel()
    .width(w)
    .height(h);

/* Countries. */
vis.add(pv.Panel)
    .data(countries)
  .add(pv.Panel)
    .data(function(c) c.borders)
  .add(pv.Line)
    .data(function(b) b)
    .left(geo.x)
    .top(geo.y)
    .title(function(d, b, c) c.name)
    .fillStyle(function(d, b, c) c.color)
    .strokeStyle(function() this.fillStyle().darker())
    .lineWidth(1)
    .antialias(false);

/* Latitude ticks. */
vis.add(pv.Panel)
    .data(geo.ticks.lat())
  .add(pv.Line)
    .data(function(b) b)
    .left(geo.x)
    .top(geo.y)
    .strokeStyle("rgba(128,128,128,.3)")
    .lineWidth(1)
    .interpolate("cardinal")
    .antialias(false);

/* Longitude ticks. */
vis.add(pv.Panel)
    .data(geo.ticks.lng())
  .add(pv.Line)
    .data(function(b) b)
    .left(geo.x)
    .top(geo.y)
    .strokeStyle("rgba(128,128,128,.3)")
    .lineWidth(1)
    .interpolate("cardinal")
    .antialias(false);

vis.render();

In addition there are two arrays of the following shape:
First, stats which is an associative arrays of associative arrays, and which associate each 2-letter country code with values of population and area:

var stats = {
'AG': {pop:83039, area:44},
'DZ': {pop:32854159, area:238174},
...
'US': {pop:299846449, area:915896},
...
};

Then, countries, which is an array of associative arrays.

var countries = [
{code:'AG', name:"Antigua and Barbuda", 
borders:[ // an array of one or several areas, 
  [ // an array of coordinates, 
    [ // a pair of the form longitude, lattitude
       ...
    ]
  ]
]}
...
]

Now this second data structure looks a lot like the one we’ve drafted in the prologue. All the geographic information is tucked in a property called “borders”. The array has other properties for comfort.
Because the data is put in the right shape and order, this script can produce a very good map with a remarkable economy of code.
This example has been put together to showcase the various map projections of protovis (identity, mercator, and so on.). These projections have zero impact on the way data should be assembled for making maps, so we’ll just treat them as “magic”.

/*
 * A diverging color scale, using previously-computed quantiles of population
 * densities; in the future, we might use a quantile scale here to do this
 * automatically. Map colors based on www.ColorBrewer.org, by Cynthia A. Brewer,
 * Penn State.
 */
var fill = pv.Scale.linear()
    .domain(140, 650, 1900)
    .range("#91bfdb", "#ffffbf", "#fc8d59");

This part creates a color scale which will return a color according to the value passed to it. The color returned will be somewhere between the ones specified in the range, depending on where the value is relatively to the values specified in the domain. So a value of 140 will result in a color of #91bfdb (bluish), it will go towards the grey as the value moves up to 650, and towards #fc8d59 (redish) as the value goes up to 1900.

/* Precompute the country's population density and color. */
countries.forEach(function(c) {
  c.color = stats[c.code].area
      ? fill(stats[c.code].pop / stats[c.code].area)
      : "#ccc"; // unknown
});

As the remark says, this will precompute the country’s color once and for all.
The forEach() method goes to every element of the countries array.
the c.color = statement will add a color key to each element of that array (which, as you may recall, already has values for the code, name and borders keys.
What it does is that is retrieves the country code of that element of countries, c.code, and uses that to find out whether we have an area value for that country code (this is stats[/c].area?).
If this is the case, we are going to compute the color that should be attributed to the country, by passing the population divided by the area to the color scale we just made. Else, we just use light grey.

The next few lines are standard constants that will shape the vis.
Note however

geo = pv.Geo.scale("hammer").range(w, h)

This is a geographic scale, which will be used to convert longitudes and latitudes to X and Y coordinates on the screen.

/* Countries. */
vis.add(pv.Panel)
    .data(countries)
  .add(pv.Panel)
    .data(function(c) c.borders)
  .add(pv.Line)
    .data(function(b) b)
    .left(geo.x)
    .top(geo.y)
    .title(function(d, b, c) c.name)
    .fillStyle(function(d, b, c) c.color)
    .strokeStyle(function() this.fillStyle().darker())
    .lineWidth(1)
    .antialias(false);

This is where it all happens.
First, we create a series of panels, one for each country. So, we pass the countries array as data.
Then, we are going to create another series of panels for every country, that is, with as many panels as there are independent areas in the country. For instance, if there are islands, we are going to need extra panels to represent them. If the country is one contiguous mass of land, there will be just one panel here.
This time, we use function(c) c.borders as data. That is, we go into the borders array.

Finally, we are going to create a filled polygon for each of these independent areas. This is achieved by adding a pv.Line to the previous panels. Likewise, we use (function(b) b) as data, meaning that we go yet another level into the borders array. Now, we are accessing the pairs of longitude + latitude numbers.

geo.x and geo.y convert this pair of numbers to X and Y coordinates on the screen.
For the next two lines, title and fillStyle, we need to go back to the country level.
so, we use a function of the form function(d,b,c). d is the current item (pair of longitude, latitude), b its parent (individual area) and c, its grand-parent (the country).
so, function(d,b,c) c.name retrieves the country name, and function(d,b,c) c.color retrieves the color we had computed for that country to begin with.

For the color of the border, we wish to use a darker version of the fill color. This is what the this.fillStyle().darker() does.

The rest of the vis is longitude and latitude ticks, using the built-in properties of the scale.