You may not need d3

If you’re working on visualization on the web, the go-to choice is to use d3js (or a higher-level library). d3js is powerful and versatile. The best proof of that is the lead author of d3, Mike Bostock, worked until recently at the New York Times which many, including myself, consider the ultimate reference in terms of information graphics. At the NYT and elsewhere, d3js has powered breathtaking projects.

d3js was built as a successor to protovis. Protovis was a slightly higher-level library than d3js, more specialized in data graphics, and designed with the assumption that it could be used with little or no programming experience. And indeed this was true. When I started using protovis in 2009, my javascript skills were limited, and so I learned by deconstructing and recombining the examples. I kept on doing that when d3 came along – learning through examples. And the d3 examples, like those in protovis before, were short and standalone, demonstrating how easy it was to create something with a few lines of code.

d3js requires more technical knowledge than protovis did to get started, but not much. However, it is much more versatile – no longer constrained to charts and maps, it can handle a lot of the front-end functionalities of a web site. And so, it is possible to make visualizations by:

  • learning how to do stuff in d3js,
  • learning the essential javascript that you need to run d3js, as needed (i.e. variables, if-then statements, array methods…)
  • then, learning through doing, and obtain a feeling for how SVG, HTML and the Document Object Model works.

This approach is actually very limiting.

You would create much more robust code by:

  • learning javascript,
  • learning SVG/HTML/DOM,
  • then learning and appreciate d3 logic.

That would give you the choice when to use a d3js method, when to use another library (past, present or future), or when to use native javascript.

The point of this article is to go through the main functions of d3js and see how they can be replicated without using a library. As of this writing, I am convinced that using d3js is the most sensible way to handle certain tasks (though this may change in the future). In almost any case I can think of, even when it’s not the most efficient way to do so, using the d3js approach is the most convenient and the easiest from the developer’s perspective.  But I also believe it’s really important to know how to do all these things without d3js.

This will be a very long post. I’m trying to keep it as structured as possible so you can jump to the right section and keep it as a reference. This also assumes you have a certain familiarity with d3js and javascript – this is not an introductory post.

I’ve divided it in 4 parts:

  • Tasks that you really don’t need d3js for. And ok, you may want to use it regardless. But here’s how to do those things without it. (manipulating DOM elements)
  • interlude: a reflexion on d3js data binding. Do you really need that?
  • Tasks which are significantly easier with d3js (working with external data, animation)
  • Tasks where as of now, d3js is clearly superior to anything else (scales, array refactoring, map projections, layouts).

Selecting and manipulating elements

At the core of d3js is the notion of selection. In d3js, you select one or several elements which become a selection object. You can then do a variety of things:

  • join them with data, which is also an essential tenet of the d3 philosophy,
  • create or delete elements, especially in relation to the data you may have joined with their parent,
  • manipulate these elements by changing their attributes and their style, including in function of the data joined with each of them,
  • attach events listeners to those elements, so that when events happen (ie somebody clicks on a rectangle, or change the value of a form) a function might be triggered,
  • animate and transform those elements.

Selecting elements and parsing the DOM

d3 selection objects vs Nodes, HTML NodeList vs HTML LiveCollections

When you select something with d3js, using methods such as d3.select() or d3.selectAll(), this returns a d3 selection object. The selection object is a subclass of the javascript object Array – check here for all the nitty-gritty. The gist of it is that d3 selection objects can then be manipulated by d3 methods, such as .attr() or .style() to assign attributes or styles.

By contrast, the DOM is made of Node objects. The root of the DOM, the Document Object Model, is a special Node called Document. Nodes are organized in a tree-like structure: Document has children, which may have children etc. and encompass everything in the page. The Node objects are really the building blocks of a web page. When an element is added, be it an HTML element like a <div> or an SVG element like a <rect>, a new Node is created as well. So d3js has to interact with Node objects as well. d3 selection objects are tightly connected to their corresponding Node objects.

However, with “vanilla” javascript, you can directly access and manipulate the Node objects.

d3.select / d3.selectAll vs document.querySelector / document.querySelectorAll

In d3js, you parse the document using d3.select (to select one object) or d3.selectAll. The argument of this method is a CSS3 selector, that is, a string which is very similar to what could be found in a CSS style sheet to assign specific styles to certain situations. For instance, “g.series rect.mark” will match with all the rectangles of the class “mark” which are descendants of the SVG g groups of the class series.

When d3js was introduced in 2011, javascript didn’t have an equivalent syntax – instead you could select elements by class, or by id, or by tag name (more on that in a minute). Things have changed however and it is now possible to use CSS3 selectors using document.querySelector (which will return just one node) or document.querySelectorAll (which will return an HTML NodeList).

An HTML NodeList is a special object which is kind of like an array of Node objects, only it has almost no array methods or properties. You can still access its members using brackets, and get its length, but that’s it.

I wrote document.querySelectorAll, because you can use this method from the document, but you can use it from any Node. Those two snippets of code are parallel:

var svg = d3.select("svg"); // svg is a d3 selection object
var g = svg.selectAll("g"); // g is a d3 selection object
var svg = document.querySelector("svg"); // svg is a Node
var g = svg.querySelectorAll("g"); // g is a NodeList

 Getting elements by class name, ID, tag name, name attribute

d3js doesn’t have a special way to get all descendants of a selection of a certain class, of a certain ID, etc. The CSS3 selector syntax can indeed handle all those cases, so why have a separate way?

By contrast, javascript pre-2011 didn’t have a querySelectorAll method, and so the only way to parse the document was to use more specific method, like document.getElementsByClassName().

document.getElementsByClassName() retrieves all descendants of a certain class. document.getElementsByName() retrieves elements with a certain “name” attribute (think forms). documents.getElementsByTagName() gets all descendants of a certain type (ie all <div>s, all <rect>s, etc.).

What’s interesting about that is that what is returned is not an HTML NodeList like above with querySelectorAll, but another object called HTML Live Collection. The difference is that matching elements are created after, they would still be included in the Live Collection.

var svg = d3.select("svg");
svg.selectAll("rect").data([1,2,3]).enter().append("rect");
var mySelection = svg.selectAll("rect"); // 3 elements
mySelection[0].length // 3
svg.append("rect");
mySelection[0].length // 3
mySelection = svg.selectAll("rect"); // re-selecting to update it
mySelection[0].length // 4
var svg = document.querySelector("svg");
var ns = "http://www.w3.org/2000/svg";
var i;
for (i = 0; i < 3; i++) {
  var rect = document.createElementNS(ns, "rect"); // we'll explain creating elements later
  svg.appendChild(rect);
}
var mySelection = svg.getElementsByTagName("rect"); // 3 elements
var rect = document.createElementNS(ns, "rect");
svg.appendChild(rect);
mySelection.length // 4 - no need to reselect to update

How about IDs? there is also the getElementById (no s at elements!) which only retrieve one element. After all, IDs are supposed to be unique! if no elements match, getElementById returns null.

Children, parents, siblings…

Truth be told, if you can use selectors from the root, you can access everything. But sometimes, it’s nice to be able to go from one node to its parents or its children or its siblings, and d3js doesn’t provide that. By contrast, the Node object has an interface that does just that – node.childNodes gets a nodeList of child nodes, node.parentNode gets the parent node, node.nextSibling and node.previousSibling get the next and previous siblings. Nice.

However, most often you will really be manipulating elements (more on that in a second) and not nodes. What’s the difference? all Elements are Nodes, but the reverse is not true. One common example of Node which is not an Element is text content.

To get an Element’s children, you can use the (wait for it) children property. The added benefit is that what you get through children is a LiveCollection (dynamic), while what you get through childNodes is a NodeList (static).

var svg = document.querySelector("svg");
var ns = "http://www.w3.org/2000/svg";
var i;
for (i = 0; i < 3; i++) {
  var myRect = document.createElementNS(ns, "rect"); // we'll explain creating elements later
  svg.appendChild(rect);
}
// the variable myRect holds the last of the 3 <rect> elements that have been added
svg.childNodes; // a NodeList
myRect.parentNode; // the svg element
myRect.nextSibling; // null - myRect holds the last child of svg.
myRect.previousSibling; // the 2nd <rect> element
svg.firstChild; // the 1st <rect>. Really useful shorthand
svg.querySelector("rect"); // also the 1st <rect>.
svg.children; // a LiveCollection

Adding/reading attributes, styles, properties and events

Node, Element, EventTarget and other objects

In d3 101, right after you’ve created elements (to which we’ll come in a moment), you can start moving them around or giving them cool colors like “steelblue” by using the .attr and .style methods.

Wouldn’t that be cool if we could do the same to Node objects in vanilla javascript!

Well, we can. Technically, you can’t add style or attributes to Node objects proper, but to Element objects. The Element object inherits from the Node objects and is used to store these properties. There is also an HTMLElement and SVGElement which inherit from the Element object.

If you look at the Chrome console at an SVG element like a rect, you can see, in the properties tab, all the objects it inherits from: Object, EventTarget, Node, Element, SVGElement, SVGGraphicsElement, SVGGeometryElement, SVGRectElement, rect.

All have different roles. To simplify: Node relates to their relative place in the document hierarchy, EventTarget, to events, and Element and its children, to attributes, style and the like. The various SVG-prefixed objects all implement specific methods and properties. When we select a Node object as we’ve done above with svg.querySelector(“rect”) and the like, note that there’s not a Node object on one side, then an Element object somewhere else, a distinct SVGGeometryElement, and so on and so forth. What is retrieved is one single object that inherits all methods and properties of Nodes, Elements, EventTargets, and so on and so forth, and, as such, that behaves like a Node, like an Element, etc.

Setting and getting attributes

You can set attributes with the Element.setAttribute method.

var rect = document.querySelector("rect");
rect.setAttribute("x", 100);
rect.setAttribute("y", 100);
rect.setAttribute("width", 100);
rect.setAttribute("height", 100);

To be honest, I’m a big fan of the shorthand method in d3js,

var rect = d3.select("rect");
rect.attr({x: 100, y: 100, width: 100, height: 100});

Also, the Element.setAttribute method doesn’t return anything, which means it can’t be chained (which may or may not be a bad thing, though it’s definitely a change for d3js users). It’s not possible to set several attributes in one go either, although one could create a custom function, or, for the daring, extend the Element object for that.

Likewise, the Element object has a getAttribute method :

rect.getAttribute("x"); // 100

Classes, IDs and tag names

Classes, IDs and tag names are special properties of the Element objects. It’s extremely common to add or remove classes to elements in visualization: my favorite way to do that is to use the classed method in d3js.

d3.select("rect").classed("myRect", 1)

In vanilla javascript, you have the concept of classList.

document.querySelector("rect").classList; // ["myRect"]

ClassList has a number of cool methods. contains checks if this Element is of a certain class, add adds a class, remove removes a class, and toggles, well, toggles a class.

document.querySelector("rect").classList.contains(["myRect"]); // true
document.querySelector("rect").classList.remove("myRect");
document.querySelector("rect").classList.add("myRect");
document.querySelector("rect").classList.toggle("myRect");

How about IDs ? with d3js, you’d have to treat them as any other property (rect.attr(“id”)). In vanilla javascript, however, you can access it directly via the id property of Element. You can also do that with the name property.
Finally, you can use the tagName to get the type of element you are looking at (though you cannot change it – you can try, it just won’t do anything).

document.querySelector("rect").id = "myRect"; // true
document.querySelector("rect").name; // undefined;
document.querySelector("rect").tagName; // "rect"
document.querySelector("rect").tagName = "circle";
document.querySelector("rect").tagName; // "rect"

Text

Text is a pretty useful aspect of visualization! it is different from attributes or styles, which are set in the opening tag of an element. The text or content is what is happening in between the opening and closing tags of that element. This is why, in d3js, text isn’t set using attr or style, but either by the html method for HTML elements like DIVs or Ps, or by the text method for SVG elements like <text> and <tspan>.

Those have equivalent in the DOM + javascript world.

HTMLelements have the .innerHTML and outerHTML properties. The difference between the two is that outerHTML includes the opening and closing tags. innerHTML and outerHTML both return HTML, complete with tags and syntax.

SVG elements, however, don’t have access to this property, so they have to rely on the Node property textContent. HTML elements also have access to it, by the way. textContent returns just the plain text content of what’s in the element. All three properties can be used to either get or set text.

Style

In d3js, setting styles to elements is very similar to setting attributes, only we use the .style method instead of the .attr one. It’s so much similar that it’s a rather common mistake to pass as attribute what should be a style and vice-versa! Like with attributes, it is possible to pass an object with keys and values to set several style properties at once.

d3.selectAll("rect").style("fill", "red");
d3.selectAll("rect").style({stroke: "#222", opacity: .5});

In the world of DOM and vanilla javascript, style is a property of the HTMLElement / SVGElement objects. You can set style properties one at a time:

rect = document.querySelector("rect");
rect.style.fill = "red";
rect.style.stroke = "#222";
rect.style.opacity = .5;

Technically, .style returns a CSSStyleDeclaration object. This object maintains a “live” link to what it describes. So:

myStyle = rect.style;
rect.style.fill = "yellow";
myStyle.fill; // "yellow"

Finally, the window object has a getComputedStyle method that can get the computed styles of an element, ie how the element is actually going to get drawn. By contrast, the style property and the d3js style method only affect the inline styles of an element and are “blind” to styles of its parents.

myStyle = window.getComputedStyle(rect, null);
myStyle.fill; // "yellow"

Adding and removing events

In d3js, we have the very practical method “on” which let users interact with elements and can trigger behavior, such as transformations, filtering, or, really, any arbitrary function. This is where creating visualizations with SVG really shines because any minute interaction with any part of a scene can be elegantly intercepted and dealt with. Since in d3js, elements can be tied with data, the “on” methods takes that into account and passes the data element to the listener function. One of my favorite tricks when I’m developing with d3js and SVG is to add somewhere towards the end the line:

d3.selectAll("*").on("click", function(d) {console.log(d);})

Which, as you may have guessed, displays the data item tied to any SVG element the user could click on.

In the world of the DOM, the object to which events methods are attached in the EventTarget. Every Element is also an EventTarget (and EventTarget could be other things that handle events too, like xhr requests).

To add an event listener to an element, use the addEventListener method like so.

document
  .querySelector("rect")
  .addEventListener("click", function() {
     console.log("you clicked the rectangle!"
   }, false);

The first parameter is the type of event to listen to (just as in “on”), the second is the listener function proper. The third one, “use capture”, a Boolean, is optional. If set to true, it stops the event from propagating up and being intercepted by event listeners of the parents of this element.

There is also a “removeEventListener” method that does the opposite, and needs the same elements: in other words, yes, you need to pass the same listener function to be able to stop listening to the element. There is no native way to remove all event listeners from an element, although there are workarounds.

Creating and removing elements

Selecting and modifying elements is great, but if you are creating a visualization, chances are that you want to create elements from scratch.

Let’s first talk about how this is done in the DOM/javascript, then we’ll better understand the data joins and d3 angle.

Node objects can exist outside of the hierarchy of the DOM. Actually, they must first be created, then be assigned to a place in the DOM.

Until a Node object is positioned in the DOM, it is not visible. However, it can receive attributes, styles, etc. Likewise, a Node object can be taken from the DOM, and still manipulated.

To create an HTML element, we can use the document.createElement() method:

var myDiv = document.createElement("div");

However, that won’t work for SVG elements – remember in an earlier example, we used the createElementNS method. This is because SVG elements have to be created in the SVG namespace. d3js old-timers may remember that in the first versions, we had to deal with namespaces when creating elements in d3js as well, but now this all happens under the hood.

Anyway, in vanilla javascript, this is  how it’s done:

var svgns = "http://www.w3.org/2000/svg";
var myRect = document.createElementNS(svgns, "rect");

Warning, because document.createElement(“rect”) will not produce anything useful as of this writing.

Once the new Node objects are created, in order to be visible, they should be present in the DOM. Because the DOM is a tree, this means that they have to have a parent.

svg.appendChild(myRect);

Likewise, to remove a Node from the DOM means to sever that relationship with its parent, which is done through the removeChild method:

svg.removeChild(myRect);

Again, even after a Node has been removed, it can still be manipulated, and possibly re inserted at a later time.

Nodes don’t remove themselves, but you can write:

myRect.parentNode.removeChild(myRect);

In contrast, here is how things are done in d3js.

The append method will add one element to a parent.

d3.select("svg").append("rect");

The remove method will remove one entire selection object from the DOM.

d3.select("svg").selectAll("rect").remove(); // removes all rect elements which are children of the SVG element

But the most intriguing and the most characteristic way to create new elements in d3js is to use a data join.

d3.select("svg")
  .selectAll("rect")
  .data(d3.range(5))
  .enter()
  .append("rect");

The above snippet of code counts all the rect children of the svg element, and, if there are fewer than 5 – the number of items in d3.range(5), which is the [0,1,2,3,4] array – creates as many as needed to get to 5, and binds values to those elements – the contents of d3.range(5) in order. If there are already 5 rect elements, no new elements will be created, but the data assignment to the existing elements will still occur.

Data joins, or the lack thereof

The select / selectAll / data / enter / append sequence can sound exotic to people who learn d3js, but to its practitioners, it is its angular stone. Not only is it a quick way to create many elements (which, in vanilla javascript, takes at least 2 steps. Creating them, and assigning them to the right parent), but it also associates them with a data element. That data element can then be accessed each time the element is being manipulated, notably when setting attributes or styles and handling events.

For instance,

d3.selectAll("rect")
  .attr("x", function(d) {return 20 * d;});

the above code utilizes the fact that each of the rectangle have a different number associated with them to dynamically set an attribute, here position rectangles horizontally.

d3.selectAll("rect")
  .on("click", function(d) {console.log(d);})

A trick I had mentioned above, but which illustrates this point: here by clicking on each rectangle, we use the data join to show the associated data element.

Having data readily available when manipulating elements in d3js is extremely convenient. After all, data visualization is but the encoding of data through visual attributes. How to perform this operation without the comfort of data joins?

Simply by going back to the dataset itself.

Consider this:

var data = [];
var i;
for (i = 0; i < 100; i++) {
  data.push({x: Math.random() * 300, y: Math.random() * 300}); // random data points
}

// d3 way
var d3svg = d3.select("body").append("svg");
d3svg.selectAll("circle").data(data).enter().append("circle")
  .attr({cx: function(d) {return d.x;}, cy: function(d) {return d.y}, r: 2})
  .style({fill: "#222", opacity: .5});

// vanilla js way
var svgns = "http://www.w3.org/2000/svg";
var svg = document.createElementNS(svgns, "svg");
document.querySelector("body").appendChild(svg);
for (i = 0; i < 100; i++) {
  var myCircle = document.createElementNS(svgns, "circle");
  myCircle.setAttribute("cx", data[i].x);
  myCircle.setAttribute("cy", data[i].y);
  myCircle.setAttribute("r", 2);
  myCircle.style.fill = "#222";
  myCircle.style.opacity = .5;
  svg.appendChild(myCircle);
}

See the Pen eNzaYg by Jerome Cukier (@jckr) on CodePen.

Both codes are equivalent. Vanilla JS is also marginally faster, but d3 code is much more compact. In d3js, the process from dataset to visual is:

  • Joining dataset to container,
  • Creating as may children to container as needed, [repeat operation for as many levels of hierarchy as needed],
  • Use d3 selection objects to update the attributes and styles of underlying Node objects from the data items which have been joined to them.

    In contrast, in vanilla Javascript, the process is:

  • Loop over the dataset,
  • create, position and style elements as they are read from the dataset.

For visuals with a hierarchy of elements, the dataset may also have a hierarchy and could be nested. In this case, there may be several nested loops. While the d3js code is much more compact, the vanilla approach is actually more simple conceptually. Interestingly, this is the same logic that is at play when creating visualization with Canvas or with frameworks like React.js. To simply loop over an existing, invariant dataset enables you to implement a stateless design and take advantage of immutability. You don’t have to worry about things such as what happens if your dataset changes or the status that your nodes are in before creating or updating them. By contrast most operations in d3js assume that you are constantly updating a scene on which you are keeping tabs. In order to create elements, you would first need to me mindful on existing elements, what data is currently associated with them, etc. So while the d3js approach is much more convenient and puts the data that you need at your fingertips, the vanilla JS approach is not without merits.

Loading files

The first word in data visualization is data, and data comes in files, or database queries. If you’re plotting anything with more than a few datapoints, chances are you are not storing them as a local variable in your javascript. d3js has a number of nifty functions for that purpose, such as d3.csv or d3.json, which allow to load the files asynchronously. The trick in working with files is that it can take some time, so some operations can take place while we wait for the files to load, but some others really have to wait for the event that the file is loaded to start. I personally almost always use queue.js, also from Mike Bostock, as I typically have to load data from several files and a pure d3 approach would require nesting all those asynchronous file functions. But, for loading a simple csv file, d3js has a really simple syntax:

d3.csv("myfile.csv", function(error, csv) {
  // voila, the contents of the file is now store in the csv variable as an array
})

For reference, using queue js, this would look like

queue()
 .defer(d3.csv, "myFirstFile.csv")
 .defer(d3.csv, "mySecondFile.csv")
 .await(ready);

function ready(error, first, second) {
  // the contents of myFirstFile is stored as an array in the variable "first",
  // and the contents of mySecondFile are in the variable "second".
}

The way to do the equivalent in vanilla Javascript is to use XMLHttpRequest.

function readFile() {
  var fileLines = this.responseText.split("\n");
  var fields = fileLines[0].split(",");
  var data = fileLines.slice(1).map(function(d) {
    var item = {};
    d.split(",").forEach(function(v, i) {item[fields[i]] = v;})
    return item;
  })

  var request = new XMLHttpRequest();
  request.onload = readFile;
  request.open("get", "myFile.csv", true);
  request.send();

The syntax of loading the file isn’t that cumbersome, and there are tons of nice things that can be done through XMLHttpRequest(), but let’s admit that d3js/queue.js functions make it much more comfortable to work with csv files.

Animations

d3js transitions is one of my favorite part of the library. I understand it’s also one the things which couldn’t be done well in protovis and which caused that framework to break. It feels so natural: you define what you want to animate, all that needs to change, the time frame and the easing functions, and you’re good to go (see my previous post on animations and transitions). In native javascript, while you can have deep control of animations, it’s also, unsurprisingly, much more cumbersome. However, CSS3 provides an animation interface which is comparable in flexibility, expressiveness and ease of use to what d3js does. First let’s get a high-level view of how to do this entirely within JS. Then let’s get a sense of what CSS can do.

requestAnimationFrame and animation in JavaScript

JavaScript has timer functions, window.setTimeout and window.setTimeinterval, which let you run some code after a certain delay or every so often, respectively. But this isn’t great for animation. Your computer draws to screen a fixed number of times per second. So if you try to redraw the same element several times before in between those times, it’s a waste of resources! What requestAnimationFrame does is tell your system to wait for the next occasion to draw to execute a given function. Here’s how it will look in general.

function animate(duration) {
  var start = Date.now();
  var id = requestAnimationFrame(tick);
  function tick() {
    var time = Date.now();
    if (time - start < duration) {
       id = requestAnimationFrame(tick);
       draw(time - start / duration);
    } else {
      cancelAnimationFrame(id);
    }
  }
  function draw(frame) {
    // do your thing, update attributes, etc.
  }
}

 

See the Pen PqzgLV by Jerome Cukier (@jckr) on CodePen.

OK so in the part I commented out, you will do the drawing proper. Are you out of the woods yet? well, one great thing about d3js transitions is that they use easing functions, which transform a value between 0 and 1 into another value between 0 and 1 so that the speed of the animation isn’t necessarily uniform. In my example, you have (time – start) / duration represents the proportion of animation time that has already elapsed, so that proportion can be further transformed.

So yay we can do everything in plain javascript, but that’s a lot of things to rewrite from scratch.

CSS3, animations and transitions

(This is not intended to be an exhaustive description of animations and transitions, a subject on which whole books have been written. Just to give those who are not familiar with it a small taste).

In CSS3, anything you can do with CSS, you can time and animate. But there are some caveats.

There are two similar concepts to handle appearance change in CSS: animations and transitions. What is the difference?

  • with animations, you describe a @keyframes rule, which is a sequence of states that happen at different points in time in your transition. In each of these events, any style property can be changed. The animation will transform smoothly your elements to go from one state to the next.
  • in transitions, you specify how changes to certain properties will be timed. For instance, you can say that whenever opacity changes, that change will be staged over a 1s period, as opposed to happen immediately.

Both approaches have their uses. CSS3 animations are great to create complex sequences. In d3js, that requires to “chain transitions”, which is the more complex aspect of managing them. By contrast, going from one segment of the animation to another is fairly easy to handle in CSS3. Animations, though, require the @keyframes rule to be specified ahead of time in a CSS declaration file. And yes, that can be done programmatically, but it’s cumbersome and not the intent. The point is: animations work better for complex, pre-designed sequence of events.

Transitions, in contrast, can be set as inline styles, and work fine when one style property is changed dynamically, a scenario which is more likely to happen in interactive visualizations.

Here’s how they work. Let’s start with animations.

See the Pen vOKKYN by Jerome Cukier (@jckr) on CodePen.

Note: as of this writing, animation-related CSS properties have to be vendor prefixed, i.e. you have to repeat writing these rules for the different browsers. Here’s a transition in action.

See the Pen xGOqOR by Jerome Cukier (@jckr) on CodePen.

For transition, you specify one style property to “listen” to, and say how changes to that property will be timed, using the transition: name of property + settings. (in the above example: transition: transform ease 2s means that whenever the “transform” style of that element changes, this will happen over a 2s period with an easing function). 

One big caveat for both CSS animations and transitions is that they are limited to style properties. In HTML elements this is fine because everything that is related to their appearance is effectively handled by style: position, size, colors, etc. For SVG, however, color or opacity are styles, like in HTML, but positions, sizes and shapes are attributes, and can’t be directly controlled by CSS. There is a workaround for positions and sizes, which is to use the transform style property.

But wait: isn’t transform an SVG attribute as well? that’s right. And that’s where it can get really confusing. Many SVG elements are positioned through x and y properties (attributes). They can also have a transform property which is additive to that. For instance, if I have a <rect> which has an x property of 100 and a transform set at “translate(100)”, it will be positioned 200px right of its point of origin. But on top of that, SVG elements can have a transform style which affects pretty much the same things (position, scales, rotation…) but which has a slightly different syntax (“translate(100)”, for instance, wouldn’t work, you’d have to write “translateX(100px)”). What’s more, the transform set in the style doesn’t add to the one set in the properties, but it overrides it. If we add a “transform: translateX(50px)” to our <rect>, it will be positioned at 150px, not 200px or 250px. Another potential headache is that some SVG elements cannot support transform styles.
While any of these properties can be accessed programmatically, managing their potential conflicts and overlaps can be difficult. In the transition example above, I have used the transform/translateX syntax.

That said, a lot of awesome stuff can be done in CSS only. For scripted animations, the animation in pure CSS is definitely more powerful and flexible than the d3js equivalent, however, when dealing with dynamically changing data, while you can definitely handle most things through CSS transitions, you’ll appreciate the comfort of d3js style transitions.

See the Pen GJqrLO by Jerome Cukier (@jckr) on CodePen.

Now a common transformation handled by d3js transitions is to transform the shape of path shapes. This is impossible through CSS animations/transitions, because the shape of the path – the “d” – is definitely not a style property. And sure, we can use a purely programmatic approach with requestAnimationFrame but is there a more high level way?
It turns out there actually is – the animation element of SVG, or SMIL. Through SMIL, everything SVG can be animated with an interface, this includes moving an object along a path, which I wouldn’t know how to do on top of my head in d3js. Here is an extensive explanation of how this works and what can be done.

Data processing, scales, maps and layouts

For the end of the article let’s talk about all of which could technically be done without d3js, but in a much, much less convenient way. Therefore, I won’t be discussing alternatives with vanilla Javascript, which would be very work intensive and not necessarily inventive.

Array functions and data processing

d3js comes with a number of array functions. Some are here for convenience, such as d3.min or d3.max which can easily be replaced by using the native reduce method of arrays. When comparing only two variables, d3.max([a, b]) is not much more convenient than Math.max(a,b) or a > b ? a : b.

Likewise, d3js has many statistical functions, which saves you the trouble to implement them yourself if you need them, such as d3.quantile. There are other libraries who do that, but they’re here and it’s really not useful to recode that from scratch.

d3js comes with shims for maps and sets, which will be supported by ES6. By now, there are transpilers which can let you use ES6 data structures. But it’s nice to have them.

In my experience, d3js most useful tool in terms of data processing is the d3.nest function, which can transform an array of objects into a nested object. (I wrote this article about them).  Underscore has something similar. While you can definitely getting a dataset of any shape and size by parsing an array and performing any grouping or operations manually, this is not only very tedious but also extremely error prone.

Scales

Scales are one superb feature of d3js. They are simple to understand, yet very versatile and convenient. Oftentimes, d3 scales, especially the linear ones, are a replacement for linear algebra.

d3.scale.linear().range([100, 500]).domain([0, 24])(14);
((14 - 0) / (24 - 0)) * (500 - 100) + 100; // those two are equivalent (333.33...)

but changing the domain or the range of a scale is much safer using the scale than adhoc formulas. Add to this scale goodness such as the ticks() method or nice() to round the domain, and you get something really powerful.

So, of course it is possible (and straightforward, even) to replace the scales but that would be missing out one of the best features of d3js.

Maps

d3js comes with a full arsenal of functions and methods to handle geographic projections, ie: the transformation of longitude/latitude coordinates into x,y positions on screen. Those come in two main groups, projections proper that turn an array of two values (longitude, latitude) into an array of two values (x, y). There are also paths functions that are used to trace polygons, such as countries, from specially formatted geographic files (geoJSON, topoJSON).

The mercator projection may be straightforward to implement, But others are much less so. The degree of comfort that d3js provides when visualizing geographical data is really impressive.

Layouts

d3js layouts, that is special visual arrangements of data, were in my opinion one of the key drivers of protovis (where they originated) then d3js adoption. Through layouts, it became simple to create, with a few line of codes, complex constructions like treemaps or force-directed networks. Some layouts, like the pie chart or the dendogram, are here for convenience and could be emulated. Others, and most of all the force layout, are remarkably implemented, efficient and versatile. While they are called different names in d3js, geometry functions such as voronoi tessellation or convex hulls are similar functionally and there is little incentive in reproducing what they do in plain javascript.

Should I stop using d3?

d3js is definitely the most advanced javascript visualization library. The point of this article is not to get you to stop using it, but rather, to have a critical thinking in your code. Even with the best hammer, not everything is a nail.

To parse the DOM, manipulate classes and listen to events, you probably don’t need a library. The context of your code may make it more convenient to use d3 or jQuery or something else, but it’s useful to consider alternatives.
The concept of the data join unlocks a lot of possibilities in d3js. A good understanding of data join would lead you to implement your visualization much faster, using more concise code and replicable logic. It also makes trouble shooting easier. Data joins are especially useful if you have a dataset which is structured like your visualization should be, or if you plan to have interaction with your visualization that requires quick access with the underlying data. However, data joins are not necessary in d3js, or in visualization in general. In many cases, it’s actually perfectly sensible to parse a dataset and create a visual representation in one fell swoop, without attaching the data to its representation or overly worrying about updating.

Assuming you have d3js loaded, nothing prevents you from creating elements using d3js append methods instead of vanilla javascript. Or to listen to events using addEventListener rather than with d3js on method. It’s totally ok to mix and match.

Like data joins, transitions are a very powerful component of d3js, and, once you’re comfortable with them, they are very expressive. There are other animation frameworks available though, which can be better adapted to the task at hand.
Scale, maps, layouts and geometries are extremely helpful features however and I can think of no good reason to reimplement them.

Credit where it’s due

To write this I drew inspiration from many articles and I will try to list them all here.
The spark that led me to write was Lea Verou’s article jQuery considered harmful, as well as articles she cites (you might not need jQuery, you don’t need jQuery, Do you really need jQuery?, 10 tips for writing JavaScript without jQuery).

Most of the information I used especially in the beginning of the article comes more or less directly from MDN documentation, and direct experimentation. For CSS and animation, I found the articles of Chris Coyier (such as this one or this one) and Sara Soueidan (here) on CSS Tricks to be extremely helpful. Those are definitely among the first resources to check out to go deeper on the subject. Sara was also the inspiration behind my previous post, so thanks to her again!

Finally, I’ve read Replacing jQuery with d3 with great interest (like Fast interactive prototyping with d3js and sketch about a year ago). It may seem that what I write goes in the opposite direction, but we’re really talking about the same thing -that there are many ways to power front ends and that it’s important to maintain awareness of alternative methods.

 

Getting to “Hello world” with d3

Back when I started learning programming, it was always fairly simple to achieve the canonical first step of accomplishments, that is, to get the system to announce that you are ready to do more by displaying “hello world” on the screen.

In most systems then, there was a command prompt somewhere that would usually do that when you would type, say:

PRINT "hello world"

.

Things have changed a lot since the early 80s. In some fields like fashion, I would argue it’s a good thing, but we’re definitely not going in the way of less complexity.

Now if you’re interested in web-oriented visualization and want to do it with d3.js, it’s still fairly simple, but it is built upon a number of technologies that you’re supposed to know a little. Front-end developers live and breathe the web and have been exposed to all things javascript, HTML, CSS, you name it, in enormous doses. Many developers probably have, at some point, tried to interface with the web and know enough of that to get started. So for this crowd, the amount of things you need to know to crack d3 code seems negligible, because they know all that and they are very familiar with it, just as well as people knew the first names of Friends characters by the end of the tenth season.

But what about those who didn’t? and the people who don’t see themselves as developers ? do they have to reimmerse themselves in 10 -odd years of web development history to get started? It turns out that this sum of knowledge, while not insurmountable, is certainly not trivial.

So without further ado, let’s get started

We’re cooking an omelette

And when we do, we need a few things: a pan, a recipe, eggs and stuff, a stove and then plates, knives and forks, etc.

The pan: a text editor

The first thing is really the pan. If you don’t have one when cooking eggs, you borrow one or go buy one. In our analogy the pan is the text editor. This is the tool with which you are going to make the files that will constitute your visualization.

There was a time when it was ok to use notepad (textedit if you are of the apple persuasion). And it’s still possible, but you are not making your life easier. What I recommend instead is that you get a hold of a copy of SublimeText2. (http://www.sublimetext.com/2). There are windows versions. And Mac versions. And linux versions. For windows users, there is a mobile version so you don’t need administrator access to install it. There is a free, unlimited evaluation version,  but unless you can’t spend $69, I strongly recommend that you buy it. Sublime Text 2 has a nearly infinite amount of niceties built in. And unlike some other powerful text editors, where the best features are only understandable by the tech masters, what’s really nice about Sublime Text 2 is that it would make you gain time even if you are an absolute beginner. One such nice things that it does is detect what language you are working with, automatically color and format the words as you type them depending on the category they fall in, and when possible, suggesting the word you are trying to type, automatically format and indent your code, all in a very unobtrusive and pleasant way. This will really help you troubleshoot problems like strings not closed properly or loose closing bracket which typically consume a lot of time.

Let’s type a fairly common d3 statement to see how SublimeText2 can help. First, it recognized the var keyword as such and writes it in italics and cyan. Second, when I type my opening parenthesis, it adds a closing one, and as long as my cursor touches either it underlines them both.

Let’s carry on. The function keyword is highlighted in italics cyan too – useful. The opening/closing thing works for curly braces too.

The return statement is highlighted in red. With the cursor on the closing parenthesis, we are starting to get a feel that the underlining function is a useful safety net

New line. Joy! the indentation is aligned with the line above.

We now have four consecutive opening or closing curly braces and parentheses. Typically, this is where errors sneak in, and where sublime text 2 really shines.

And we now have 5 consecutive closing curly braces and parentheses. This is fairly common in d3 code. Is the order correct? Thank you Sublime Text 2!

we finish up writing the statement.

When moving the cursor to the left side, where the line numbers are, we notice down-pointing arrows. We know our code is correct, and we don’t want to see it again, so…

we just click on the top one to collapse this section. If we need to edit it again we can expand it.

Finally, we add a comment above. Notice the syntax highlighting, comments are colored with an unobtrusive dark grey.

The recipe: a basic file structure

In d3, you can’t really type a “print” command from a prompt. You need to write some files, which are loaded by a browser (that’s your “plate” in the metaphor, but let’s not get ahead of ourselves).

You are going to need up to 5 types of files.

First, an html file. This will be the file that your browser will read, either locally, or uploaded on a website. We’ll get to cover this in detail in a minute.

Second, believe it or not, you are going to need the d3 library, which is also a file. You may link to the version on the d3js.org site, and so not worry about having the actual file handy. That has advantages (like the one we just said, also, you’re pretty sure to always have the latest version on hand), and two problems. First, you always need to have a live internet connection, so there’s no working in the park outside of free wifi space (for example), and also, it will probably be slower than having the file locally or on your own web space. And if having your own web server seems kind of scary, I’ll show you in a short while that it’s not.

The three next kind of files are optional, but hey.

The third file is a javascript .js file which would be where you put your code. Some people would rather put all their code in the html file, which is an option, especially for short programs. Personally, I prefer having a separate file. So to make d3 work, you need some script, but it doesn’t have to be in a separate file.

The fourth file is a style sheet, or css file. This can be used to define some formatting options, for instance to make all your circles blue by default, or some circles that meet some pre-defined criterion. Like the javascript file, any style information can be contained within the html file, but unlike the script, it is completely optional. I also like to keep it separate from the html.

Finally, you may want a data file, you know, with data (csv, txt, json, xml…). If you have lots of data to visualize, it’s easier to keep it in separate files than in variables within the script. But it doesn’t have to be that way. And you could also use d3 without data.

The ingredients: contents of the files

The HTML file

So let’s see how this articulates by looking at a typical d3 html file. I am using templates which I try to change as little as possible from project to project.

<!DOCTYPE html>
<html>
 <head>
   <meta http-equiv="Content-Type" content="text/html;charset=utf-8">
   <title>My project</title>
   <script type="text/javascript" src="../d3.v2.js"></script>
   <link href="style.css" rel="stylesheet">
 </head>
 <body>
   <div id="chart">
   </div>
   <script type="text/javascript" src="script.js"></script>
 </body>
</html>

Well. That is certainly longer than the BASIC one-liner (and we haven’t even printed “hello world” yet).

Let’s take this piece by piece.

The first line is a doctype declaration. What this does is that it tells your browser that what follows should be interpreted as standard, HTML5-compliant HTML (standards mode). If you omit the doctype documentation, your browser will read the html in “quirks mode“, i.e. by replicating the non-standard behavior of Nescape 4 or IE5. You can still try to run d3 under quirks mode, but don’t be surprised if your HTML doesn’t behave as expected.

The doctype declaration doesn’t have to be more complicated than <!DOCTYPE html>.

The second line opens the html document proper. Technically, it’s ok to omit <html>, <head> and <body> tags in HTML5. The document will still be considered valid by tools like the W3C validator. But it seems that some browsers, in some complex cases, don’t like that so much, and I as a person find it more convenient to find those tags when reading code.

The next line opens the header section of the document. Again, it’s not absolutely necessary, but I consider it helpful to explicitly differentiate the header from the rest of the document.

The next line, which goes

<meta http-equiv="Content-Type" content="text/html;charset=utf-8">

is not absolutely required either. It specifies the encoding of the page, that is, what kind of characters will be seen in the page. Since I use non-ascii characters often, being French and all, I make sure to use it all the time. After all, this is a template, not something I type from beginning to end each time.

Next, we specify a title. This is what will appear in the title area of your browser, or, more likely, as the name of your tab.

In the next line, we load the d3 library. This is my preferred syntax. This is how my files are set up:

I have a directory where all my d3 projects are, and in this directory, I am also keeping (and maintaining reasonably up-to-date) a version of the d3 library, a file called d3.v2.min.js. (min stands for minified, which means that it’s not meant to be read by persons, but it’s faster to load). All my projects proper are in folders within that directory. So my html files are one level down from where the d3.v2.min.js file is kept. This is why the src attribute reads “../d3.v2.min.js”: the ../ part means, look one level up. If the d3.v2.min.js file were on the same directory where I keep my html, I would write src=”d3.v2.min.js”, if I kept it within a specific directory like “d3″, I could write src=”../d3/d3.v2.min.js”, and finally, there is always the option of getting it from the website, src=”http://d3js.org/d3.v2.min.js”.

I don’t have to load the d3 library then. I could have done it at the end of the page. The only requirement is that it should be before the script that will use it. But honestly, the file is so small that it doesn’t make much of a difference (9ms on my machine).

Next, I link to a style sheet. With this syntax, I am assuming that my style is specified in a file called style.css which will be in the same directory as this html page. And if there is no such file, it’s not a problem. It doesn’t prevent the page to load.

Instead of using this syntax, I could have written:


<style>

... // my style definitions

</style>

in the html file. And frankly, it is sometimes more convenient. But again, for the general case, it’s just as well to leave it like this.

Note that style information should always be in the header part of the file.

And that concludes the header, as noted by the closing tag </head>. Even if we use the <head> tag to mark the beginning of the header section, we may omit the closing tag </head>, and still get away with a valid (and slightly shorter) document, but I keep it for clarity’s sake.

The next part starts with <body>, and is where the content proper, which will get displayed on the screen, is described. <body> and </body>, just like <html> and <head>, are not mandatory, but do help, somewhat, to make the document easier to read.

So what do we find in the body section? Here, I’ve kept it very simple but also close to the conventions I use.

There is one <div> element, which is the basic building block of HTML, and with an id attribute – a document-wide, unique identifier – called “chart”.

Then, there is the <script> element, which is calling the javascript code we are going to use to create our visualization. It’s at the very bottom of the page, actually just before the closing tags (which, again, could be omitted, but let’s not).

Like for the style element, it is possible to leave the script inside the html document. Instead of using a src attribute – which, incidentally, assumes that the script is within the same directory as the html document with this syntax -, we can write:

<script>

// all our javascript instructions

</script>

And that’s it for the html document! A final word about the contents of the <body> element. In most of my projects, there is an interface such as buttons or controls which is also done in HTML. In that case, the contents of the <body> element get more complex. I would add a button to tweet the page, copyright notices, and other stuff. But I almost always have a <div> element with an identifier named “chart”.

ok, so now that you’re finished with writing your html file, you must save it under any name and use the “.html” extension (or .htm, but why no love for the l? why?)

The javascript file

In this section I will walk you through a very, very basic file, which includes things I do for every project.

var w=960,h=500,
svg=d3.select("#chart")
.append("svg")
.attr("width",w)
.attr("height",h);

var text=svg
.append("text")
.text("hello world")
.attr("y",50);

I like to define variables that describe the width and length of the visualization that I am creating. By putting these in variables, at the beginning of the file, I can easily modify them in case I need to. 960 and 500 work well for visualizations that should appear on their own page, by the way. No scrolling should be necessary.

The next statement use the d3.select construct. Here, it indicates that we are going to build something on top  of the element that meets the criterion that is described between the parentheses. The syntax used by that is that of css selectors, but long story short, #chart refers to whatever has an “id” attribute of “chart”. This is our lone <div> element in the html file. Then, we are going to add an svg element, which is what will hold the visualization proper in svg form, and give it a width of w and a height of h.

I always use that syntax, an “svg” variable that holds the top-level svg container, which resides in a <div> element which has an id of “chart”.

The final part of the file writes, finally, hello world proper. Note that I specify a y attribute (vertical position) else the text have its lower-left corner in the top-left corner of the browser window and will be effectively invisible.

Now, the HTML file we just created expect this file to be called “script.js”, so let’s save it under this name.

In this most simple example, we will not need a css file nor a data file. But, for the sake of discussion, let’s create a css file nonetheless.


text:{font-size:36px;}

and let’s save this under style.css (the name that, again, our HTML file expects). What this does is that it changes the size of the font to whatever the default was to a more massive 36 pixels.

The stove: a web server

As far as writing hello world, we’re done. You can load the html file you created in a browser, you should see the encouraging inscription. Congratulations!

Many visualizations can be seen in a browser directly, just by opening a local file. However, this won’t be the case for some, for instance, those who require external data. In that case, you need a web server. If you have web hosting, you may upload the files to your (remote) server, via FTP for instance, and see your visualization by typing the address of your site in the browser url bar. That said, it is a good idea to have a local web server, that is, one that runs on your computer, so you can view your files as if they were served by a web server, but with the added bonus that you can edit them and see the modifications directly without having to upload them each time you change them.

On Macs, you’re pretty much all set. All you have to do is enable web-sharing in your system preferences. Then, http://localhost/~YOURNAME will point to /Users/YOURNAME/Sites where YOURNAME is your user name. Just put your files there and go at it.

For windows, there are a bunch of solutions. The “Professional” versions of windows include the IIS web server, so, there. But beyond that, there is a lot of web server software available. I personally use EasyPHP. EasyPHP comes up with a web server (Apache), a mySql database, a PHP preprocessor and other niceties. And, as an aside, it doesn’t require administrator rights, for you corporate users.

EasyPHP installation is a breeze. When it’s on, by default, http://localhost/ points to the www/ directory in the install directory of EasyPHP, so you may want to install it in a place that suits you. Alternatively, you can create aliases in the admin panel of EasyPHP (http://localhost/home/index.php), in other words to give a name to any part of your hard drive. This is what I do, I put all my projects there and have a shortcut to that name in my browser, so whenever I want to see a project I use that shortcut and I can see the visualization as if it were on the web.

This is how you create aliases in EasyPHP.

The plate: a browser

We’ve talked browsers before, and chances are you have one (or several) on your computer.

Now I wish that by browsers, we could just skip it and mean “the latest version of chrome”, but it turns out that there are slight differences in the way that browsers handle d3 code so you should really test your work in at least chrome and firefox. As of this writing, Chrome + Firefox (version 5 and up) represent just under 50% of the browser market share. If you add all browsers that are d3-capable (Safari, earlier versions of Firefox, Opera, IE9) you reach about 75% of the market. Sadly, IE8 and IE7 which account for slightly over 20% of the market are not d3-compatible, though they can use the Google ChromeFrame free plug-in and do pretty much all that chrome does.

Knives and forks: the console

At the beginning of my dad’s engineering career, code came on a punch card. People then, allegedly, thought it through. You didn’t want to be the kid who didn’t follow your algorithm carefully enough to forecast an avoidable bug and waste a perfectly good card and oh-so precious computing time.

But now? no code is perfect by the time it hits the browser. You may want to launch incomplete code to get a feel for where you’re going. You may not be too sure of whether that should be a plus or a minus in that equation and just try either because it would be quicker to correct an unexpected outcome than to troubleshoot the formula on paper. You may want to iterate, to bring newer, more complex ideas to your visualization with each change to the code. Or just try out different aesthetic options.

Not too long ago, debugging javascript was really a pain. You’d have to fire those annoying alert boxes to understand what was the value of the variables, and dispatch them manually. Fortunately, that time is gone and now is the age of the Console.

There are console functionalities for Chrome, Firefox and Safari, and while the interface slightly varies, the idea is the same. The console allows you to do three main things:

– first, to see if your code executed without errors or warning. Some of those messages can be generated by javascript, and some can be added by you if certain unfortunate conditions are met. You get the position of the error in your code, which helps you to understand what went wrong and fix it.

I have planted an error at the end of the code and it’s been picked up by the console which explains what’s wrong and when. Notice the red cross in the lower-right corner which counts errors. If there were warning, they would be indicated by a yellow triangle.

– second, to inspect elements, that is to find out all the information about the elements displayed on screen, even if (especially if) they have been generated at runtime. So you can see if those elements you really wanted to create have been indeed added, and if the right attributes have been passed.  third, to interact with the code after it’s run (or while it runs, if you manage to pause if with breakpoints). The most common use of this is, IMO, is to check the value of variables, which you can do simply by typing their name at the console command prompt. But you can also type in one-liner javascript statements, even if they are quite complicated. So it’s a way to test your code before you write it in your script file.

What a relief! all those paths elements that were supposed to be created in the code have been added as expected.

– third, it can be used to interact with the code after it’s run (or even during run-time, because you can pause the code with breakpoints using the console, but we won’t go into that). The most common use for that IMO is to check the value of variables, which can be changed during the code execution, but it can also be used to enter one-liner statements, which can be quite complicated. Such a use allows you to test and preview code hypothesis before you write it down in your script file, or to troubleshoot a problem that you could have difficulties seeing outside of the context.

Here, I am using the console to check the value of one variable, and to enter a statement that turns all the shapes orange.

Voilà! the last thing you need when you cook food is people to share it with, same goes for visualizations!

 

d3: adding stuff. And, oh, understanding selections

From data to graphics

the d3 principle (and also the protovis principle)
d3 and protovis are built around the same principle. Take data, put it into an array, and for each element of data a graphical object can be created, whose properties are derived from the data that was provided.

Only d3 and protovis have a slightly different way of adding those graphical elements and getting data.

In protovis, you start from a panel, a protovis-specific object, to which you add various marks. Each time you add a mark, you can either:

  • not specify data and add just one,
  • or specify data and create as many as there are items in the array you pass as data.

.

How de did it in protovis

var vis=new pv.Panel().width(200).height(200); 
vis.add(pv.Panel).top(10).left(10)
  .add(pv.Bar)
    .data([1,4,3,2,5])
    .left(function() {return this.index*20;})
    .width(15)
    .bottom(0)
    .height(function(d) {return d*10;});
vis.render();

this simple bar chart in protovis
you first create a panel (first line), you may add an element without data (here, another panel, line 2), and add to this panel bars: there would be 5, one for each element in the array in line 4.

And in d3?

In d3, you also have a way to add either one object without passing data, or a series of objects – one per data element.

var vis=d3.select("body").append("svg:svg").attr("width",200).attr("height",200);
var rect=vis.selectAll("rect").data([1,4,3,2,5]).enter().append("svg:rect");
rect.attr("height",function(d) {return d*20;})
  .attr("width", 15)
  .attr("x",function(d,i) {return i*20;})
  .attr("y",function(d) {return 100-20*d;}
  .attr("fill","steelblue");

In the first line, we are creating an svg document which will be the root of our graphical creation. It behaves just as the top-level panel in protovis.

However we are not creating this out of thin air, but rather we are bolting it onto an existing part of the page, here the tag. Essentially, we are looking through the page for a tag named and once we find it (which should be the case often), that’s where we put the svg document.

Oftentimes, instead of creating our document on , we are going to add it to an existing <div> block, for instance:

<div id="chart"></div>
<script type="text/javascript">
var vis=d3.select("#chart").append("svg:svg");
...
</script>

Anyway. To add one element, regardless of data, what you do is:

The logic is : d3.select(where we would like to put our new object).append(type of new object).

Going back to our code:

var vis=d3.select("body").append("svg:svg").attr("width",200).attr("height",200);
var rect=vis.selectAll("rect").data([1,4,3,2,5]).enter().append("svg:rect");
rect.attr("height",function(d) {return d*20;})
  .attr("width", 15)
  .attr("x",function(d,i) {return i*20;})
  .attr("y",function(d) {return 100-20*d;}
  .attr("fill","steelblue");

On line 2, we see a different construct:

an existing selection, or a part of the page
.selectAll(something)
.data(an array)
.enter()
.append(an object type)

This sequence of methods (selectAll, data, enter and append) are the way to add a series of elements. If all you need to know is to create a bar chart, just remember that, but if you plan on taking your d3 skills further than where you stopped with protovis, look at the end of the post for a more thorough explanation of the selection process.

Attributes and accessor functions

At this stage, we’ve added our new rectangles, and now we are going to shape and style them.

rect.attr("height",function(d) {return d*20;})
  .attr("width", 15)
  .attr("x",function(d,i) {return i*20;})
  .attr("y",function(d) {return 100-20*d;}
  .attr("fill","steelblue");

All the attributes of a graphical element are controlled by the method attr(). You specify the attribute you want to set, and the value you want to give.
In some cases, the value doesn’t depend on the data. All the bars will be 15 pixels wide, and they will all be of the steelblue color.
In some others, the value do depend on the data. We decide that the height of each bar is 20 times the value of the underlying data, in pixels (so 1 becomes 20, 5 becomes 100 etc.). Like in protovis, once data has been attributed to an element, function(variable name) enables to return a dynamic value in function on that element. By convention, we usually write function(d) {…;} (d for data) although it could be anything. Those functions are still called accessor functions.
so for instance:

.attr("height",function(d) {return d*20;})

means that the height will be 20 times the value of the underlying data element (exactly what we said above).
In protovis, we could position the mark relatively to any corner of its parent, so we had a .top method and a .bottom method. But with SVG, objects are positioned relatively to the top-left corner. So when we specify the y position, it is also relative to the top of the document, not necessarily to the axis (and not in this case).
so –

.attr("y", function(d) {return 100-d*20;})

if we use scales (see next post), all of this will have no impact whatsoever anyway.
Finally, there is an attribue here which doesn’t so much depend on the value of the data, but of its rank in the data items: the x position.
for this, we write: function(d,i) {return i*20;}
Here is a fundamental difference with protovis. In protovis, when we passed a second argument to such a function, it meant the data of the parent element (grand parent for the third, etc.). But here in d3, the second parameter is the position of the data element in its array. By convention, we write it i (for index).
And since you have to know: there is no easy way to retrieve the data of the parent element.

Bonus: understanding selections

To add many elements at once we’ve used the sequence: selectAll, data, enter, append.
Why use 4 methods for what appears to be one elementary task? If you don’t care about manipulating nodes individually, for instance for animations, you can just remember the sequence. But if you want to know more, here is what each method does.

selectAll

the selectAll method
First, we select a point on which to add your new graphical objects. When you are creating your objects and use the selectAll method, it will return an empty selection but based on that given point. You may also use selectAll in another context, to update your objects for instance. But here, an empty selection is expected.

data

the data method
Then, you attribute data. This works quite similarly to protovis: d3 expects an array. d3 takes the concept further (with the concept of data joins) but you need not concern yourself with that until you look at transitions.
Anyway, at this stage you have an empty selection, based on a given point in the page, but with data.

enter

the enter method
The enter method updates the selection with nodes which have data, but no graphical representation. Using enter() is like creating stubs where the graphical elements will be grafted.

append

the append method
Finally, by appending we actually create the graphical objects. Each is tied to one data element, so it can be further styled (for instance, through “attr”) to derive its characteristics from the value of that data.

 

Working with data in protovis – part 3 of 5

Previous : Multi-dimensional arrays, inheritance and hierarchy

Short interlude: what can be done with arrays in javascript?

Now that we have a grasp on how arrays work and how they can be used in protovis, let’s take a step back and look at some very useful methods for working with arrays in standard javascript.

Sorting arrays

Using a method called, well, sort(), javascript arrays can be reordered in a variety of ways.

var a = [1, 1.2, 1.7, 1.5, .7, .3];
a.sort();

Without any argument, this will sort the array in ascending order: [0.3, 0.7, 1, 1.2, 1.5, 1.7].

However, we can reorder the array differently with a comparison function, such as:

a.sort(function(a,b) b-a)

Note that this is protovis notation. In traditional javascript, you’d have written a.sort(function(a,b) {return b-a;});

This would sort a in descending order ([1.7, 1.5, 1.2, 1, 0.7, 0.3]). Here is how it works.

The comparison function takes 2 elements in the array. If the first element (corresponding to the first argument, a) should appear first, the function must return a negative number. If the result is positive, the second element is appearing first. With our example, b-a is negative when the first element is greater than the second. So, this function sorts an array in descending order.

Sort also works with arrays of associative arrays. For instance:

var data = [
  {key:"a", value:1},
  {key:"b", value:1.2},
  {key:"c", value:1.7}, 
  {key:"d", value:1.5},
  {key:"e", value:.7},
  {key:"f", value:.3}
];

data.sort(function(a,b) b.value-a.value);

Here, we have to specify which criterion is used to sort the array. In this example, we sort it in descending order by value. But we could have sorted it in ascending order by key, for instance.

The method reverse() turns an array upside down.

var a = [1, 1.2, 1.7, 1.5, .7, .3];

a.reverse(); // gives [0.3, 0.7, 1.5, 1.7, 1.2, 1]

So, a.sort().reverse() will also sort that array in decreasing order.

Extracting sub-arrays

The method slice() will extract a sub-array out of an array, in other words, it will retrieve certain elements. It works with one or two arguments. If you only give one argument, it will give you the last elements from the index you specified.

For instance,

[1, 1.2, 1.7, 1.5, .7, .3].slice(3) // it's [1.5, 0.7, 0.3]

The index 3 corresponds to the 4th element of the array (0,1,2,3) so slice(3) will return the 4th, 5th and 6th elements.

With 2 arguments, you get a sub-array that starts at the first index but ends just before the 2nd one. For instance,

[1, 1.2, 1.7, 1.5, .7, .3].slice(0,3) // gives [1, 1.2, 1.7]

It is also possible to use negative arguments. Instead of counting from the beginning of the array, it means counting from the end.

[1, 1.2, 1.7, 1.5, .7, .3].slice(-1) // result: [0.3]

Turning a string into an array

This can be done via the split() method.

split() normally works with a separator, for instance “07/04/1975”.split(“/”) is [“07”, “04”, “1975”], but if an empty string is used, then each character becomes an element of the new array. For instance,

"ABCDEFGHIJ".split("") // result: ["A", "B", "C", "D", "E", "F", "G", "H", "I", "J"] 

This can be very useful to generate data on the go.

And what methods does protovis has for working with arrays and data?

Many.

Some simple, some very elaborate. In addition to visualization functions protovis has impressive data processing capabilities.

The workhorse: pv.range()

pv.range() is a method that creates an array of numbers.

pv.range(10), for instance, is [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], or the 10 first integers starting from 0.
(this is the method I referred to in the example of the previous part).

By using the other arguments, it is possible to generate arrays of numbers that go from one specific index to another, or to change the step. For instance:

pv.range(5,10) // is [5, 6, 7, 8, 9]
pv.range(5,21,5) // is [5, 10, 15, 20]

While this may not look incredible at first sight, pv.range() really shines when associated with other functions or methods such as map().

map()  is an array method that creates a new array by applying a function to each of the element of the array it’s run against. For instance,

var data = pv.range(100).map(Math.random) // creates an array of 100 random numbers.

var data = pv.range(1000).map(function(a) Math.cos(a/1000)) // stores into data 1000 values 
// of cos(x) for all the numbers between 0 and 0.999 by increment of 0.001.

In other words, pv.range() and map() can quickly create very interesting datasets for protovis to visualize.

Simple statistical functions that make life easier

pv.min(), pv.max()

Protovis has functions that extract the maximum or the minimum value of an array.

pv.min([1, 1.2, 1.7, 1.5, .7, .3]) will return the value of the smallest element of the array, in that case 0.3.

Likewise, pv.max returns the value of the greatest element of the array.

Both of these can be used with an accessor function to help retrieve what will be compared.

For instance:

var data = [
  {key:"a", value:1},
  {key:"b",value:1.2},
  {key:"c", value:1.7}, 
  {key:"d", value:1.5},
  {key:"e", value:.7},
  {key:"f", value:.3}
];

pv.max(data, function(a) a.value); // should return 1.7

pv.sum(), pv.mean(), pv.median()

Likewise, protovis has aptly-named methods that return the sum, the mean and the median of the elements of an array. Note that it’s pv.mean() and not pv.average(), though.

pv.sum([1, 1.2, 1.7, 1.5, .7, .3]) // gives  6.4.
pv.mean([1, 1.2, 1.7, 1.5, .7, .3]) // gives 1.0666666666666667
pv.median([1, 1.2, 1.7, 1.5, .7, .3]) // gives 1.1.

Similarly to pv.min() you can use an optional accessor function.

pv.median(data, function(a) a.value);

pv.normalize()

pv.normalize() is a handy method that divides all elements in an array by a factor, so that the sum of all these elements is 1.

pv.normalize([1, 1.2, 1.7, 1.5, .7, .3])
// [0.15625, 0.18749999999999997, 0.265625, 0.234375, 0.10937499999999999, 0.04687499999999999]

pv.sum(pv.normalize([1, 1.2, 1.7, 1.5, .7, .3])) // result: 1.

Combining arrays

pv.blend()

pv.blend() simply turns an array of arrays into a simpler array where all elements follow each other.

pv.blend([["a", "b", "c"], [1,2,3]]) // gives  ["a", "b", "c", 1, 2, 3]

One benefit is that it is possible to run methods on this new array that wouldn’t work on an array of arrays.

For instance:

pv.max([[1,2],[3,4],[5,6]]) // result : NaN
// protovis cannot guess how to compare [1,2], [3,4] or [5,6] without any further instruction.
pv.max(pv.blend([[1,2],[3,4],[5,6]])) // result: 6.
pv.max([[1,2],[3,4],[5,6]], function(a) pv.max(a)) // also returns 6.

pv.cross()

Given two arrays a and b, pv.cross(a,b) returns an array of all possible pairs of elements.

pv.cross(["a", "b", "c"], [1,2,3]) // returns 
//[[“a”,1],[“a”,2],[“a”,3],[“b”,1],[“b”,2],[“b”,3],[“c”,1],[“c”,2],[“c”,3]]

pv.transpose()

pv.transpose() flips an array of arrays.

In our previous example –

pv.cross(["a", "b", "c"], [1,2,3])

The result was a 9×2 array. [[“a”,1],[“a”,2],[“a”,3],[“b”,1],[“b”,2],[“b”,3],[“c”,1],[“c”,2],[“c”,3]]

If we apply pv.transpose, we get a 2×9 array:

pv.transpose(pv.cross(["a", "b", "c"], [1,2,3])); // result:
//[["a","a","a","b","b","b","c","c","c"]
//[1,2,3,1,2,3,1,2,3]]

Putting it all together

This short example will show how one can work from an unprepared dataset.
For the purpose of this example, I’m retrieving GDP per capita data of OECD countries. I have a table of 34 countries on 10 years from 2000 to 2009. The countries are in alphabetical order.
I would like to do a bar chart showing the top 12 countries ranked by their GDP per capita. How complex can this be? With the array methods, not so difficult.

var data=[
["Country",2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009],
["Australia",28045.6566279103, 29241.289462815, 30444.7839453338, 32079.5610505727, 33494.6654789274, 35091.7414600449, 37098.3968669994, 39002.0335375021, 39148.4759391917, 39660.1953738791],
["Austria",28769.5911274323, 28799.1968990511, 30231.0955219759, 31080.6122212092, 32598.0698959222, 33409.3453751649, 36268.9604225683, 37801.7320979073, 39848.9481928628, 38822.6960085478],
["Belgium",27624.4454637489, 28488.7963954409, 30013.7570016445, 30241.724380444, 31151.6315179167, 32140.9239149283, 34159.4723388042, 35597.2980660662, 36878.9710906881, 36308.4784216699],
["Canada",28484.9776553362, 29331.8223245112, 29911.3226371905, 31268.5521058006, 32845.6267954986, 35105.9892865177, 36853.5554314586, 38353.0812084648, 38883.0700787846, 37807.5259929247],
["Chile",9293.72322249492, 9712.95685376592, 9973.25946014534, 10475.6486664294, 11299.9329380664, 12194.1443507847, 13036.1285137662, 13896.6946083854, 14577.5715215544, 14346.4648487517],
["Czech Republic",14992.3582327884, 16173.6387263985, 16872.3146139754, 17991.9331910401, 19303.9208098529, 20365.7571142986, 22349.7033747705, 24578.7151794413, 25845.0252652635, 25530.2001372786],
["Denmark",28822.3343389384, 29437.5232757263, 30756.3323835463, 30427.5149228283, 32301.3394496912, 33195.8834383121, 36025.8628586246, 37730.736916861, 39494.1216134705, 37688.2516868117],
["Estonia",9861.80346448344, 10693.0362387373, 11966.6746413277, 13370.127724177, 14758.4181995435, 16530.7311083966, 19134.4130457855, 21262.1415030861, 21802.2185996096, 19880.2786070771],
["Finland",25651.0005450286, 26518.293212524, 27509.1023963486, 27592.2031438268, 29855.387224753, 30689.9753400459, 33095.030855128, 36149.32963148, 37795.237097856, 35236.9462175119],
["France",25272.4127936247, 26644.8404266553, 27776.4225007831, 27399.5459524877, 28273.6330332193, 29692.129719144, 31551.3458452695, 33300.5367942194, 34233.0770304046, 33697.9698933558],
["Germany",25949.0588445755, 26855.1296905696, 27587.1573355486, 28566.8930562768, 29900.6507502683, 31365.576121107, 33713.1689185469, 35623.4147543745, 37170.693412872, 36339.9009091421],
["Greece",18410.2161113813, 19928.6672542964, 21597.5997816753, 22701.7754699285, 24088.2202877303, 24571.6085727425, 26917.7358531616, 28059.5724529461, 29919.6385697499, 29121.8306842272],
["Hungary",12134.0140204384, 13576.4413471795, 14765.2333755651, 15424.3926636975, 16316.8885518624, 16937.9383138957, 18329.1579647528, 19187.2829138188, 20699.9056965458, 20279.5060192879],
["Iceland",28840.4899286393, 30443.907492292, 31083.7723760522, 30768.0910155965, 33697.6723843766, 35025.1338769129, 35807.880160537, 37178.9762849327, 39029.2396691797, 36789.0728215933],
["Ireland",28695.0931156375, 30524.3712319885, 33052.4517849575, 34525.4148515126, 36511.518217243, 38622.5520655621, 42267.9784043449, 45293.6482834027, 42643.6362934458, 39571.204030073],
["Israel",23496.1573531847, 23455.1048827219, 23534.8524310529, 22270.5507137525, 23650.3373360675, 23390.3809761211, 24960.0508863672, 26583.387317862, 27679.3777353666, 27661.1271269539],
["Italy",25594.3456589799, 27127.4298830414, 26803.97019843, 27137.5365317475, 27416.0961790796, 28144.0379218171, 30224.2469223694, 31897.7434207435, 33270.7464832409, 32407.5033618371],
["Japan",25607.7204586644, 26156.1615068187, 26804.9303541079, 27487.123875677, 29020.8955443678, 30311.5281317716, 31865.2733339475, 33577.1846704038, 33902.3807335349, 32476.7736535378],
["Korea",17197.0800334388, 18150.876787578, 19655.5961579558, 20180.932195274, 21630.1629926488, 22783.2288432845, 24286.1803272315, 26190.6122298674, 26876.6422899926, 27099.9481192024],
["Luxembourg",53645.7229595775, 53932.4594967082, 57559.2104243171, 60723.9861616145, 65021.6940738929, 68372.258619455, 78523.2988291034, 84577.2190776183, 89732.070876649, 84802.9716853677],
["Mexico",10046.1268819126, 10135.7265293228, 10397.8915415223, 10883.8643666506, 11535.0108530091, 12460.5385178945, 13672.6034191998, 14581.5178679935, 15290.896575069, 14336.6153896301],
["Netherlands",29405.5490140191, 30788.2508953647, 31943.4974056501, 31702.5798119046, 33208.8592768987, 35110.6577619843, 38063.6877336725, 40744.3819227526, 42887.4361825772, 40813.047575264],
["New Zealand",21113.9167562199, 22128.8483527737, 23115.0676326844, 23826.8007737809, 24775.7827578032, 25460.2942993256, 27277.7654223562, 28701.3328788203, 29231.151567007, 29097.307061683],
["Norway",36125.7036078917, 37091.7150478501, 37051.9473705873, 38298.8631304819, 42257.5943785044, 47318.7844737929, 53287.9750020125, 55042.2093246253, 60634.9819502298, 55726.7456848212],
["Poland",10567.0016558591, 10950.4739587211, 11562.6245018264, 11985.0042955849, 13014.5996003514, 13785.7656450352, 15067.4402978586, 16762.2045951617, 18062.1834289945, 18928.8205162032],
["Portugal",17748.6930918065, 18464.7472033934, 19088.1764760703, 19392.2862101711, 19796.1891048601, 21294.1660710906, 22869.9272824917, 24122.9886840593, 24962.2959136026, 24980.061318413],
["Slovak Republic",10980.1169029929, 12070.70145551, 12965.6228450353, 13597.9556337584, 14659.0096334623, 16174.4605836956, 18397.5822518716, 20916.5438777989, 23241.0972994031, 22870.8998469728],
["Slovenia",17468.5101911515, 18342.8685005009, 19702.0589885459, 20448.2040741134, 22200.7770986247, 23493.9370882038, 25432.3240768864, 27228.3376725332, 29240.5471362484, 27535.6540786883],
["Spain",21320.0690041202, 22591.3855327033, 24066.5003000439, 24748.2819467008, 25958.1017090553, 27376.7644983649, 30347.9357063518, 32251.8469529105, 33173.3334547015, 32254.0895139806],
["Sweden",27948.4749835611, 28231.3958224236, 29277.7736264795, 30418.0441004337, 32505.646687298, 32701.4327422078, 35680.1787996674, 38339.6698693908, 39321.3014501815, 36996.1394001578],
["Switzerland",31618.0958082618, 32103.4812665805, 33390.9169108411, 33266.3042144918, 34536.8476164193, 35478.0395058126, 39116.3945212067, 42755.5569054368, 45516.5601956426, 44830.3719226918],
["Turkey",9169.71780977686, 8613.21159842998, 8666.90348260238, 8790.01048446305, 10166.0963401732, 11391.3768057053, 12886.6195973915, 13897.3820310325, 14962.4944915831, 14242.7276329837],
["United Kingdom",26071.3066882302, 27578.2857038985, 28887.5850874466, 29848.7319363944, 31790.9933025905, 32724.4057819337, 34970.5193042987, 35719.2060247462, 36817.493156774, 35158.8005888247],
["United States",35050.1738557741, 35866.2624634202, 36754.5543204007, 38127.524970345, 40246.0630591955, 42466.1326203714, 44594.9199470326, 46337.2237397566, 46901.0697730874, 45673.7445647402]
];

var year=10;
var rows = data.slice(1,data.length);

var x = pv.Scale.ordinal(pv.range(12)).splitBanded(0, 240, 4/5);
var y = pv.Scale.linear(0, 80000).range(0, 170);

var vis = new pv.Panel()
    .width(250)
    .height(200)
    ;
	vis.add(pv.Bar)
		.data(rows.sort(function(a,b) b[year]-a[year]).slice(0,12))
		.bottom(10)
		.width(x.range().band)
		.height(function(d) y(d[year]))
		.left(function() 5+x(this.index))
		.anchor("bottom").add(pv.Label).textAngle(Math.PI/2).text(function(d) d[0]).textBaseline("middle").textAlign("right")
		;
vis.render();

here, in the data variable, I’ve put my 2-dimensional table as I got it. The first row contains headers which I won’t need. So, in line 40, I create rows which just removes that 1st line with a slice function. data.slice(1,data.length) effectively keeps all the lines but the first.
In the next few lines I create two scales for placing my bars, a standard vis panel and a bar chart. Now, what kind of data will I pass to the bar chart?
I want to sort the rows by the value of the latest year (which happens to be the 11th column, so 10). This is what the rows.sort(function(a,b) b[year]-a[year]) part of the statement does. I’ve simply assigned 10 to the variable year, so if I want to display other years (with an HTML form for instance) it wouldn’t be difficult to modify.
And, since I only want the top 12 values, I just write .slice(0,12) after that.

In line 55, I just add a label. pv.Label inherits the data of its parent. The data item is the whole row of my 2-dimensional table, so if I write function(d) d[0], I am referring to the left-most item which will be the country name.

So, with the use of simple array functions, I can easily rework an unprepared dataset in protovis, rather than having to tailor my dataset with manual (and error-prone) manipulations in external programs. Here is the result:

Next: reshaping complex arrays