-
Notifications
You must be signed in to change notification settings - Fork 56
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize percentile calculation #67
Conversation
6b41671
to
bcf701b
Compare
Oh, we can also remove another step (extracting values from the heap in the order of priority). The values are re-sorted by value anyway. |
I pushed a commit and updated the PR description accordingly. |
Hi @shuhei, thanks for contributing! At first glance this looks great, I'll give it a closer look tonight. |
This looks great, and the benchmark was super helpful for showing the nice improvement from this. I think there is still a case where there is some value doing a clone/copy, and that's when used with If someone were to use that to essentially get a 'snapshot' of the sample for future reference, i.e. show me the values at this point in time, if we don't do a clone future updates to the I think one work around for this would be to add an optional boolean parameter to both We can name the parameter What do you think of that solution? I think this way we could maintain the previous behavior if someone was using |
// A little hacky, but effective. | ||
heap.content = JSON.parse(JSON.stringify(this.content)); | ||
return heap; | ||
getValues: function() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing that is odd to me is why it does:
heap.content = JSON.parse(JSON.stringify(this.content));
This looks really inefficient, instead this should work better:
heap.content = this.content.slice(0);
I updated percentiles.js benchmark to try this out, changing the cycle method to:
.on("cycle", function(event) {
console.log(String(event.target));
BinaryHeap.prototype.clone = function() {
var heap = new BinaryHeap(this.scoreFunction);
heap.content = this.content.slice(0);
return heap;
};
})
Results:
original x 755 ops/sec ±2.86% (89 runs sampled)
updated x 3,365 ops/sec ±2.01% (93 runs sampled)
The performance is not as good as the non-copy approach, but it's definitely better than before. If we go the route of adding the snapshot
parameter, we can retain a clone
implementation and change it to use slice
instead of using the JSON trick.
values.push(elt.val); | ||
} | ||
return values; | ||
return this.values.getValues().map(function(v) { return v.val; }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
D'oh, now that I think of it, this very line here of using map and returning the underlying value works fine! We still essentially get a snapshot, so ignore my previous comments as this is sufficient. Sorry for any confusion!
Thanks again for contributing this improvement! Released as v0.1.21 |
@tolbertam Thanks a lot for your thorough review and publishing a new version! |
Happy Holidays!
The deep-clone of the binary heap seems to be the bottleneck of
Histogram#percentiles()
.I think we can just use the heap items without cloning or sorting by priority because they will be sorted by value in
Histogram
anyway.Benchmark
The updated version is roughly 5x-6x faster on my Macbook Air, Early 2015, 2,2 GHz Intel Core i7.
$ node -v v8.14.1 $ node percentiles.js original x 509 ops/sec ±0.76% (89 runs sampled) updated x 2,883 ops/sec ±0.89% (92 runs sampled) done
$ node -v v10.14.2 $ node percentiles.js original x 565 ops/sec ±1.94% (87 runs sampled) updated x 3,548 ops/sec ±5.29% (82 runs sampled) done
$ node -v v11.5.0 $ node percentiles.js original x 522 ops/sec ±1.78% (87 runs sampled) updated x 2,470 ops/sec ±1.38% (90 runs sampled) done
with the following benchmark code: