A drio thing.

Exploring selections and data binding in d3 (Part I)

Intro

This is an ongoing article where I explore a core feature of d3: selections and data binding. Why? I realized I didn't fully understand how it works, and that was limiting the benefit I get from the library. I am writing this mostly to clarify my own thoughts on the topic.

Let's start with the basics:
import { select } from 'd3-selection';

const sel = select('body')
    .selectAll('h2');

The first part of the statement (select('body')) uses the select function from the d3-selection package. It returns a new selection: a JavaScript object with all the selection functionality attached to it. By functionality I mean a set of methods we can call on this object. Exploring the returned selection object is informative.

It is important to stop here for a second and review the constructs Mike uses to provide this kind of chainable functionality. He has a wonderful article that covers this. I am going to go over it in my own words.

He uses a "configurable" function that returns another function. The inner function has access to all the config parameters we pass to the configurable function (via a closure). The returned function has functionality attached to it (setter/getter methods). When used as a setter, each of those methods returns the function itself, which is what lets us chain calls together.

Let's build a simple example that encapsulates the logic to add two numbers to solidify these concepts.

function calculator(opts={}) {
  let {x = 0, y = 0} = opts;

  function engine() {
    return x+y;
  }

  engine.x = function (value) {
    if (!arguments.length) return x;
    x = value;
    return engine;
  }

  engine.y = function (value) {
    if (!arguments.length) return y;
    y = value;
    return engine;
  }

  return engine;
}

Let's use the code we have just created. First, we call our new function and provide the two numbers we want to add: c = calculator({x:1, y:2}). When the function executes, it assigns the x and y values from the opts parameter (an object) to the local variables x and y, defaulting to 0 when they are missing. After that, it creates a function (engine) that encapsulates the logic we need (adding two numbers in this case). Then, it attaches two getter/setter functions (methods) to the engine function. Each one checks whether it received an argument: if not, it returns the current addend (x or y); otherwise, it stores the new value and returns the engine function. Because the setters return engine, we can do things like:

c.x(100).y(200)() // 300
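
To make the getter/setter distinction concrete, here is a small sketch that reuses the calculator function from above (repeated so the block runs on its own). It shows that an accessor called without arguments returns the stored value, while an accessor called with an argument returns engine. The names c1 and c2 are only for illustration.

```javascript
// Same configurable function as above, repeated so this block is self-contained.
function calculator(opts = {}) {
  let { x = 0, y = 0 } = opts;

  function engine() {
    return x + y;
  }

  engine.x = function (value) {
    if (!arguments.length) return x; // getter: no argument, return the current value
    x = value;                       // setter: store the new value...
    return engine;                   // ...and return engine so calls can be chained
  };

  engine.y = function (value) {
    if (!arguments.length) return y;
    y = value;
    return engine;
  };

  return engine;
}

const c1 = calculator({ x: 1, y: 2 });
const c2 = calculator(); // falls back to the defaults x = 0, y = 0

console.log(c1());               // 3
console.log(c1.x());             // 1 (the getter returns the value, not engine)
console.log(c1.x(100) === c1);   // true (the setter returns engine)
console.log(c1.x(100).y(200)()); // 300
console.log(c2());               // 0 (each calculator() call has its own closure)
```

Note that c2 is unaffected by the changes made through c1: every call to calculator creates fresh x and y variables that only its own engine can see.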

Data binding

Excellent. We are ready to move on. Let's bring back the original d3 statement:

const sel = select('body')
    .selectAll('h2');

We are chaining another selection call (.selectAll('h2')), but at this stage we have already narrowed the "selection space" to descendants of the body element. That's because we run the second selection off the result of the first one (a selection object).

Now that we have selected the elements we are interested in, we can go ahead and perform data binding, that is, link our data to the elements we have selected. We do it by calling the data method of our selection object.

const sel = select('body')
    .selectAll('h2')
    .data(data);

data is an array of values. Those values will be assigned to the selected elements. But what happens if we have fewer elements than data values? Or more elements than values in our data array? The data method of our selection returns an object that knows how to deal with that. It does so by providing methods (operations) for these different cases. data() itself returns the elements that already exist and are now linked to our dataset. That's called the update selection. We can access the other selections via enter() and exit(). enter() gives us access to elements that do not exist yet but for which we have data, and exit() yields existing elements that have no data associated with them. Let's write some code to exercise these concepts.
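
Before touching the DOM, it may help to see the three groups computed on plain arrays. The following is a simplified model of a join by index, not d3's actual implementation: joinByIndex is a hypothetical helper, and the strings stand in for DOM elements.

```javascript
// Simplified model of a join by index. NOT d3's implementation;
// joinByIndex and the placeholder "elements" are assumptions for illustration.
function joinByIndex(elements, data) {
  const shared = Math.min(elements.length, data.length);
  return {
    // existing elements that receive a datum (paired by position)
    update: elements.slice(0, shared).map((element, i) => ({ element, datum: data[i] })),
    // data values for which no element exists yet
    enter: data.slice(shared),
    // existing elements left without a datum
    exit: elements.slice(shared),
  };
}

// More data than elements: the extra datum 'c' lands in enter.
const moreData = joinByIndex(['h2#0', 'h2#1'], ['a', 'b', 'c']);
console.log(moreData.update.length, moreData.enter, moreData.exit);

// More elements than data: the two extra elements land in exit.
const moreElements = joinByIndex(['h2#0', 'h2#1', 'h2#2'], ['a']);
console.log(moreElements.update.length, moreElements.enter, moreElements.exit);
```

In the first call, update holds two pairs and enter holds the leftover datum. In the second, update holds one pair and exit holds the two leftover elements.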

This is a helper function that performs the data binding and exercises the update, enter, and exit selections:

function basicDataJoin(selector, data) {
  const sel = select(selector)
    .selectAll('span')
    .data(data);

  sel.text(d => d)
    .attr('class', 'update');

  sel.enter()
    .append('span')
    .text(d => d)
    .attr('class', 'enter');

  sel.exit().remove();
}

Now, let's run basicDataJoin('#numbers_example_1', ['a', 'b', 'c']);. Elements in the update and enter selections get different classes, so they can be told apart by color (salmon and dark red).

Excellent, all our new elements are there and they are properly bound to our data. Let's now call the same function twice on the same container, with different data the second time:

basicDataJoin('#numbers_example_2', ['a', 'b', 'c']);
basicDataJoin('#numbers_example_2', ['a', 'x', 'y']);

And we get:

That may look strange to you. The new letters x and y show up as if they were existing data. What is happening here is that the first element is assigned to the first datum, the second element to the second datum, and so on. We are joining by index. But d3 provides an alternative way to perform this binding. We can pass a key function as the second argument to data(). It is evaluated for each selected element (using its bound datum) and for each new datum, and the values it returns are used to match elements with data. Let's write another helper function:

function advanceDataJoin(selector, data) {
  const f = (d) => d.letter;
  const sel = select(selector).selectAll('span').data(data, f);

  sel.text(f)
    .attr('class', 'update');

  sel.enter()
    .append('span')
    .text(f)
    .attr('class', 'enter');

  sel.exit().remove();
}

Let's run that now like this:

advanceDataJoin('#numbers_example_3', [{letter: 'a'}, {letter: 'b'}]);
advanceDataJoin('#numbers_example_3', [{letter: 'b'}, {letter: 'c'}]);

And that's what we wanted. Now, after the second call, a is gone, b is part of the update selection, and c is part of the enter selection.
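
The same outcome can be reproduced on plain arrays. This is a rough sketch of what a keyed join computes, not d3's actual implementation: joinByKey is a hypothetical helper, bound stands in for the data already attached to existing elements, and keys are assumed to be unique.

```javascript
// Simplified model of a keyed join. NOT d3's implementation;
// joinByKey is an assumption for illustration and expects unique keys.
function joinByKey(bound, data, key) {
  const existingKeys = new Set(bound.map(key)); // keys of data already on elements
  const incomingKeys = new Set(data.map(key));  // keys of the new data
  return {
    update: data.filter((d) => existingKeys.has(key(d))),  // key present before and after
    enter: data.filter((d) => !existingKeys.has(key(d))),  // key only in the new data
    exit: bound.filter((d) => !incomingKeys.has(key(d))),  // key only on existing elements
  };
}

const byLetter = (d) => d.letter;
const firstCall = [{ letter: 'a' }, { letter: 'b' }];
const secondCall = [{ letter: 'b' }, { letter: 'c' }];

const result = joinByKey(firstCall, secondCall, byLetter);
console.log(result.update.map(byLetter)); // [ 'b' ]
console.log(result.enter.map(byLetter));  // [ 'c' ]
console.log(result.exit.map(byLetter));   // [ 'a' ]
```

Matching on the key rather than the position is what keeps b tied to its element even though it moved from index 1 to index 0.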