
Reusable custom reduce functions #102

Closed
jefffriesen opened this issue Jan 4, 2014 · 7 comments

Comments

@jefffriesen

I want to calculate the average of a lot of attributes, such as cost, savings, carbon emissions, and so on.

Here is an example of calculating it with a single attribute (savings):

function reduceAddAvg(p, v) {
  ++p.count;
  p.sum += v.savings;
  p.avg = p.sum / p.count;
  return p;
}
function reduceRemoveAvg(p, v) {
  --p.count;
  p.sum -= v.savings;
  p.avg = p.sum / p.count;
  return p;
}
function reduceInitAvg() {
  return {count: 0, sum: 0, avg: 0};
}

var statesAvgDimension = xf.dimension(function(d) { return d.state; });
var statesAvgGroup = statesAvgDimension.group().reduce(reduceAddAvg, reduceRemoveAvg, reduceInitAvg);

But I am going to have to write a set of these reduce functions for every attribute I want to calculate (up to a dozen attributes). I am hoping there is a way to make these functions reusable. For example, something like this:

function reduceAddAvg(p, v, attr) {
  ++p.count;
  p.sum += v[attr];
  p.avg = p.sum / p.count;
  return p;
}
function reduceRemoveAvg(p, v, attr) {
  --p.count;
  p.sum -= v[attr];
  p.avg = p.sum / p.count;
  return p;
}
function reduceInitAvg() {
  return {count: 0, sum: 0, avg: 0};
}
var statesAvgGroup = statesAvgDimension.group().reduce(reduceAddAvg, reduceRemoveAvg, reduceInitAvg, 'savings');
var statesAvgGroup = statesAvgDimension.group().reduce(reduceAddAvg, reduceRemoveAvg, reduceInitAvg, 'cost');
// ... and so on

Would this be a feature request, impossible, or am I missing something in the docs?

Thanks!

@RandomEtc
Collaborator

What about:

function reduceAddAvg(attr) {
  return function(p, v) {
    ++p.count;
    p.sum += v[attr];
    p.avg = p.sum / p.count;
    return p;
  };
}
function reduceRemoveAvg(attr) {
  return function(p, v) {
    --p.count;
    p.sum -= v[attr];
    p.avg = p.sum / p.count;
    return p;
  };
}
function reduceInitAvg() {
  return {count: 0, sum: 0, avg: 0};
}
var statesAvgGroup = statesAvgDimension.group().reduce(reduceAddAvg('savings'), reduceRemoveAvg('savings'), reduceInitAvg);
var statesAvgGroup = statesAvgDimension.group().reduce(reduceAddAvg('cost'), reduceRemoveAvg('cost'), reduceInitAvg);
// ... and so on

Would that work?
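For readers who want to sanity-check the closure pattern without wiring up crossfilter, here is a minimal standalone sketch that drives the factory-produced reducers by hand (the records are made up for illustration):

```javascript
// Factory-produced reducers, as in the comment above: the attribute
// name is captured in a closure instead of passed on every call.
function reduceAddAvg(attr) {
  return function(p, v) {
    ++p.count;
    p.sum += v[attr];
    p.avg = p.sum / p.count;
    return p;
  };
}
function reduceRemoveAvg(attr) {
  return function(p, v) {
    --p.count;
    p.sum -= v[attr];
    p.avg = p.sum / p.count;
    return p;
  };
}
function reduceInitAvg() {
  return {count: 0, sum: 0, avg: 0};
}

// Drive the reducers by hand, the way crossfilter would as records
// enter and leave the current filter.
var records = [
  {state: 'CO', savings: 10, cost: 4},
  {state: 'CO', savings: 20, cost: 6}
];
var add = reduceAddAvg('savings');
var remove = reduceRemoveAvg('savings');
var p = records.reduce(add, reduceInitAvg());  // avg is 15 after both adds
p = remove(p, records[1]);                     // avg is 10 after one remove
```

Crossfilter invokes the add/remove reducers in exactly this shape, so a factory that closes over the attribute name is all that is needed to reuse them across attributes.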

@jefffriesen
Author

Yep, this works great. Maybe a good one for the docs? It's obvious when I see it but wasn't obvious before!
Thanks

@RandomEtc
Collaborator

Great! I added a note linking here from https://github.com/square/crossfilter/wiki/API-Reference#wiki-group_reduce

@jefffriesen
Author

nice

@brandones

Why isn't statesAvgGroup overwritten by the subsequent declarations?

@gordonwoodhull

Note that you need to guard against a divide by zero in reduceRemove:

    p.avg = p.count ? p.sum/p.count : 0;
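Folded into the factory version from earlier in the thread, the guarded remove reducer would read:

```javascript
function reduceRemoveAvg(attr) {
  return function(p, v) {
    --p.count;
    p.sum -= v[attr];
    // Guard: when the last record leaves the group, count is 0
    // and sum / count would be NaN.
    p.avg = p.count ? p.sum / p.count : 0;
    return p;
  };
}
```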

@monfera

monfera commented May 29, 2017

Derived calculations, such as avg here, can be moved out of the reducers, since recomputing them on every add and remove wastes CPU cycles. Non-numeric aggregations may even stress the GC, in the worst case yielding janky crossfiltering.

The final reduction can be done at the end of an interaction, i.e. after applying a filter or adding an array of new elements. The aggregate is just a tuple of count and sum. On a project we even made a shallow abstraction for the final reduction of the aggregates.

It conveniently makes the user responsible for handling the degenerate case of the empty set (e.g. they can check count or use isNaN). A NaN is usually preferable to a zero avg, because the value 0 implies a legitimate average where there isn't one, and typical crossfiltering aggregates on the empty set don't make sense (e.g. values that characterize distributions).
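A sketch of that split (the helper names here are made up for illustration): the incremental reducers maintain only the count/sum tuple, and a separate finalizer, run once per interaction, derives the average and returns NaN for the empty set:

```javascript
// Cheap incremental reducers: maintain only {count, sum}.
function addCountSum(attr) {
  return function(p, v) { ++p.count; p.sum += v[attr]; return p; };
}
function removeCountSum(attr) {
  return function(p, v) { --p.count; p.sum -= v[attr]; return p; };
}
function initCountSum() {
  return {count: 0, sum: 0};
}

// Final reduction, run once after a filter change rather than per record.
// NaN signals "no data" instead of a misleading zero average.
function finalizeAvg(p) {
  return p.count ? p.sum / p.count : NaN;
}
```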

With some aggregates, non-empty sets of reasonable values can still hit edge cases, e.g. computing the extent when all N numbers have the same value. The consumer of the values can decide what to do with the edge case, e.g. just not rendering axis ticks, or adding 1 to the max, or subtracting 1 from the min to render some ticks, whatever makes sense.

As a more complicated, yet still incrementally computable aggregate, consider the standard deviation or its cousin, variance. The final reduction is division and taking the square root.
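For instance, variance can be maintained from count, sum, and sum of squares; the final reduction is one division (and a square root for standard deviation). A sketch, using the naive E[x^2] - E[x]^2 form:

```javascript
// Incremental reducers for variance: track count, sum, and sum of squares.
function addVar(attr) {
  return function(p, v) {
    ++p.count;
    p.sum += v[attr];
    p.sumSq += v[attr] * v[attr];
    return p;
  };
}
function removeVar(attr) {
  return function(p, v) {
    --p.count;
    p.sum -= v[attr];
    p.sumSq -= v[attr] * v[attr];
    return p;
  };
}
function initVar() {
  return {count: 0, sum: 0, sumSq: 0};
}

// Final reduction: population variance E[x^2] - E[x]^2;
// standard deviation is its square root.
function finalizeStdDev(p) {
  if (!p.count) return NaN;
  var mean = p.sum / p.count;
  return Math.sqrt(p.sumSq / p.count - mean * mean);
}
```

Note that this naive form can lose precision when values are large relative to their spread; Welford-style accumulators are the numerically stable alternative, at the cost of a slightly more involved remove step.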

A more challenging group aggregate, surprisingly, is the calculation of the extent (or just a minimum or maximum). The difficulty is updating the aggregate with items removed. A basic option is to maintain a sorted value vector but it gets expensive.
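To illustrate why removal is the hard part, here is a sketch of the sorted-vector option mentioned above (helper names are made up): every remove is a binary search plus an O(n) splice, which is what makes it expensive on large groups:

```javascript
// Binary search: index of the leftmost position where x fits in sorted arr.
function bisect(arr, x) {
  var lo = 0, hi = arr.length;
  while (lo < hi) {
    var mid = (lo + hi) >> 1;
    if (arr[mid] < x) lo = mid + 1; else hi = mid;
  }
  return lo;
}
// Maintain a sorted array of values so min/max survive removals.
function addExtent(attr) {
  return function(p, v) {
    p.values.splice(bisect(p.values, v[attr]), 0, v[attr]);
    return p;
  };
}
function removeExtent(attr) {
  return function(p, v) {
    p.values.splice(bisect(p.values, v[attr]), 1);  // O(n) per removal
    return p;
  };
}
function initExtent() {
  return {values: []};
}
// Final reduction: the extent is just the ends of the sorted vector.
function finalizeExtent(p) {
  return p.values.length
    ? [p.values[0], p.values[p.values.length - 1]]
    : [NaN, NaN];
}
```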

jdar pushed a commit to jdar/crossfilter that referenced this issue Nov 19, 2019
* Fix type definition for dimension.group()

* Fix casing of word 'TKey'