
Reusable custom reduce functions #102

Closed
jefffriesen opened this issue Jan 4, 2014 · 7 comments

Comments

@jefffriesen

I want to calculate the average of a lot of attributes, such as cost, savings, carbon emissions, and so on.

Here is an example of calculating it with a single attribute (savings):

function reduceAddAvg(p, v) {
  ++p.count;
  p.sum += v.savings;
  p.avg = p.sum / p.count;
  return p;
}
function reduceRemoveAvg(p, v) {
  --p.count;
  p.sum -= v.savings;
  p.avg = p.sum / p.count;
  return p;
}
function reduceInitAvg() {
  return {count: 0, sum: 0, avg: 0};
}

var statesAvgDimension = xf.dimension(function(d) { return d.state; });
var statesAvgGroup = statesAvgDimension.group().reduce(reduceAddAvg, reduceRemoveAvg, reduceInitAvg);

But I am going to have to write a set of these reduce functions for every attribute I want to calculate (up to a dozen attributes). I am hoping there is a way to make these functions reusable. For example, something like this:

function reduceAddAvg(p, v, attr) {
  ++p.count;
  p.sum += v[attr];
  p.avg = p.sum / p.count;
  return p;
}
function reduceRemoveAvg(p, v, attr) {
  --p.count;
  p.sum -= v[attr];
  p.avg = p.sum / p.count;
  return p;
}
function reduceInitAvg() {
  return {count: 0, sum: 0, avg: 0};
}
var statesAvgGroup = statesAvgDimension.group().reduce(reduceAddAvg, reduceRemoveAvg, reduceInitAvg, 'savings');
var statesAvgGroup = statesAvgDimension.group().reduce(reduceAddAvg, reduceRemoveAvg, reduceInitAvg, 'cost');
// ... and so on

Would this be a feature request, impossible, or am I missing something in the docs?

Thanks!

@RandomEtc
Collaborator

What about:

function reduceAddAvg(attr) {
  return function(p, v) {
    ++p.count;
    p.sum += v[attr];
    p.avg = p.sum / p.count;
    return p;
  };
}
function reduceRemoveAvg(attr) {
  return function(p, v) {
    --p.count;
    p.sum -= v[attr];
    p.avg = p.sum / p.count;
    return p;
  };
}
function reduceInitAvg() {
  return {count: 0, sum: 0, avg: 0};
}
var statesAvgGroup = statesAvgDimension.group().reduce(reduceAddAvg('savings'), reduceRemoveAvg('savings'), reduceInitAvg);
var statesAvgGroup = statesAvgDimension.group().reduce(reduceAddAvg('cost'), reduceRemoveAvg('cost'), reduceInitAvg);
// ... and so on

Would that work?
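For readers who want to sanity-check the closure pattern without wiring up crossfilter, here is a minimal standalone sketch that drives the factory-produced reducers by hand (the records are made up for illustration):

```javascript
// Factory-produced reducers, as in the comment above: the attribute
// name is captured in a closure instead of passed on every call.
function reduceAddAvg(attr) {
  return function(p, v) {
    ++p.count;
    p.sum += v[attr];
    p.avg = p.sum / p.count;
    return p;
  };
}
function reduceRemoveAvg(attr) {
  return function(p, v) {
    --p.count;
    p.sum -= v[attr];
    p.avg = p.sum / p.count;
    return p;
  };
}
function reduceInitAvg() {
  return {count: 0, sum: 0, avg: 0};
}

// Drive the reducers by hand, the way crossfilter would as records
// enter and leave the current filter.
var records = [
  {state: 'CO', savings: 10, cost: 4},
  {state: 'CO', savings: 20, cost: 6}
];
var add = reduceAddAvg('savings');
var remove = reduceRemoveAvg('savings');
var p = records.reduce(add, reduceInitAvg());  // avg is 15 after both adds
p = remove(p, records[1]);                     // avg is 10 after one remove
```

Crossfilter invokes the add/remove reducers in exactly this shape, so a factory that closes over the attribute name is all that is needed to reuse them across attributes.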

@jefffriesen
Author

Yep, this works great. Maybe a good one for the docs? It's obvious when I see it but wasn't obvious before!
Thanks

@RandomEtc
Collaborator

Great! I added a note linking here from https://github.com/square/crossfilter/wiki/API-Reference#wiki-group_reduce

@jefffriesen
Author

nice

@brandones

Why isn't statesAvgGroup overwritten by the subsequent declarations?

@gordonwoodhull

Note that you need to guard against a divide by zero in reduceRemove:

    p.avg = p.count ? p.sum/p.count : 0;
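Folded into the factory version from earlier in the thread, the guarded remove reducer would read:

```javascript
function reduceRemoveAvg(attr) {
  return function(p, v) {
    --p.count;
    p.sum -= v[attr];
    // Guard: when the last record leaves the group, count is 0
    // and sum / count would be NaN.
    p.avg = p.count ? p.sum / p.count : 0;
    return p;
  };
}
```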

@monfera

monfera commented May 29, 2017

Derived calculations, such as avg here, can be moved out of the reducers, since recomputing them on every add and remove wastes CPU cycles. Non-numeric aggregations may even stress the GC, in the worst case yielding janky crossfiltering.

The final reduction can be done at the end of an interaction, i.e. after applying a filter or adding an array of new elements. The aggregate is just a tuple of count and sum. On a project we even made a shallow abstraction for the final reduction of the aggregates.

It conveniently makes the user responsible for handling the degenerate case of the empty set (e.g. they can check count or use isNaN). A NaN is usually preferable to a zero avg, because the value 0 implies a legitimate average where there isn't one, and typical crossfiltering aggregates on the empty set don't make sense (e.g. values that characterize distributions).
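A sketch of that split (the helper names here are made up for illustration): the incremental reducers maintain only the count/sum tuple, and a separate finalizer, run once per interaction, derives the average and returns NaN for the empty set:

```javascript
// Cheap incremental reducers: maintain only {count, sum}.
function addCountSum(attr) {
  return function(p, v) { ++p.count; p.sum += v[attr]; return p; };
}
function removeCountSum(attr) {
  return function(p, v) { --p.count; p.sum -= v[attr]; return p; };
}
function initCountSum() {
  return {count: 0, sum: 0};
}

// Final reduction, run once after a filter change rather than per record.
// NaN signals "no data" instead of a misleading zero average.
function finalizeAvg(p) {
  return p.count ? p.sum / p.count : NaN;
}
```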

With some aggregates, non-empty sets of reasonable values can still hit edge cases, e.g. computing the extent when all N numbers have the same value. The consumer of the values can decide what to do with the edge case, e.g. just not rendering axis ticks, or adding 1 to the max, or subtracting 1 from the min to render some ticks, whatever makes sense.

As a more complicated, yet still incrementally computable aggregate, consider the standard deviation or its cousin, variance. The final reduction is division and taking the square root.
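For instance, variance can be maintained from count, sum, and sum of squares; the final reduction is one division (and a square root for standard deviation). A sketch, using the naive E[x^2] - E[x]^2 form:

```javascript
// Incremental reducers for variance: track count, sum, and sum of squares.
function addVar(attr) {
  return function(p, v) {
    ++p.count;
    p.sum += v[attr];
    p.sumSq += v[attr] * v[attr];
    return p;
  };
}
function removeVar(attr) {
  return function(p, v) {
    --p.count;
    p.sum -= v[attr];
    p.sumSq -= v[attr] * v[attr];
    return p;
  };
}
function initVar() {
  return {count: 0, sum: 0, sumSq: 0};
}

// Final reduction: population variance E[x^2] - E[x]^2;
// standard deviation is its square root.
function finalizeStdDev(p) {
  if (!p.count) return NaN;
  var mean = p.sum / p.count;
  return Math.sqrt(p.sumSq / p.count - mean * mean);
}
```

Note that this naive form can lose precision when values are large relative to their spread; Welford-style accumulators are the numerically stable alternative, at the cost of a slightly more involved remove step.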

A more challenging group aggregate, surprisingly, is the calculation of the extent (or just a minimum or maximum). The difficulty is updating the aggregate with items removed. A basic option is to maintain a sorted value vector but it gets expensive.
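To illustrate why removal is the hard part, here is a sketch of the sorted-vector option mentioned above (helper names are made up): every remove is a binary search plus an O(n) splice, which is what makes it expensive on large groups:

```javascript
// Binary search: index of the leftmost position where x fits in sorted arr.
function bisect(arr, x) {
  var lo = 0, hi = arr.length;
  while (lo < hi) {
    var mid = (lo + hi) >> 1;
    if (arr[mid] < x) lo = mid + 1; else hi = mid;
  }
  return lo;
}
// Maintain a sorted array of values so min/max survive removals.
function addExtent(attr) {
  return function(p, v) {
    p.values.splice(bisect(p.values, v[attr]), 0, v[attr]);
    return p;
  };
}
function removeExtent(attr) {
  return function(p, v) {
    p.values.splice(bisect(p.values, v[attr]), 1);  // O(n) per removal
    return p;
  };
}
function initExtent() {
  return {values: []};
}
// Final reduction: the extent is just the ends of the sorted vector.
function finalizeExtent(p) {
  return p.values.length
    ? [p.values[0], p.values[p.values.length - 1]]
    : [NaN, NaN];
}
```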

jdar pushed a commit to jdar/crossfilter that referenced this issue Nov 19, 2019
* Fix type definition for dimension.group()

* Fix casing of word 'TKey'