Optimisation and benchmarking of backing data structure for exponential histogram #3848
Benchmarking PR: #3986
A few things I found when benchmarking and investigating the initial prototype:
Here's a quick comparison of a circular buffer (backed by a preallocated array of integers) vs. MapCounter: main...jsuereth:wip-exponential-counter-perf. It's a 2x performance difference. In practice, I think the number of measurements per histogram means we'll be better off preallocating a bounded array for buckets than using a Map in almost all cases. I'll add benchmarks for the byte->short->int->long conversions once I've had a chance to push some better-shaped input data. (Ideally we'd use incoming measurements recorded from a live server, but for now I'll synthesize similarly shaped data using distributions assumed from what I see in histograms.)
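To make the circular-buffer idea concrete, here is a minimal sketch assuming a fixed maxSize and int bucket indices. The class name CircularBufferCounter and its methods are hypothetical; this is not the code on the linked branch. The first recorded index is pinned to slot 0, and every other index maps to a slot with a floor-modulo, so negative indices and a window that grows in either direction both work without shifting data. Unlike a Map<Integer, Long>, an increment here does no hashing, boxing, or allocation, which is one plausible contributor to the measured gap.

```java
/**
 * Hypothetical sketch: bucket counts in a preallocated long[] used as a
 * circular buffer. The populated window [indexStart, indexEnd] only grows,
 * and never beyond maxSize buckets.
 */
final class CircularBufferCounter {
  private final long[] counts;
  private int indexStart; // lowest populated bucket index
  private int indexEnd;   // highest populated bucket index
  private int baseIndex;  // bucket index mapped to slot 0 (set on first increment)
  private boolean empty = true;

  CircularBufferCounter(int maxSize) {
    this.counts = new long[maxSize];
  }

  /** Adds delta to a bucket; returns false if the window would exceed maxSize. */
  boolean increment(int index, long delta) {
    if (empty) {
      baseIndex = index;
      indexStart = index;
      indexEnd = index;
      empty = false;
    } else if (index > indexEnd) {
      if ((long) index - indexStart + 1 > counts.length) return false;
      indexEnd = index;
    } else if (index < indexStart) {
      if ((long) indexEnd - index + 1 > counts.length) return false;
      indexStart = index;
    }
    counts[slot(index)] += delta;
    return true;
  }

  /** Returns the count for a bucket, or 0 if it is outside the populated window. */
  long get(int index) {
    if (empty || index < indexStart || index > indexEnd) return 0;
    return counts[slot(index)];
  }

  int getIndexStart() { return indexStart; }
  int getIndexEnd() { return indexEnd; }
  boolean isEmpty() { return empty; }

  /** Maps a bucket index to an array slot; floorMod keeps it non-negative. */
  private int slot(int index) {
    return (int) Math.floorMod((long) index - baseIndex, (long) counts.length);
  }
}
```

Because the window never exceeds counts.length consecutive indices, distinct indices in the window always land in distinct slots, so no explicit rotation or copying is needed.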
Yeah, I initially used the variable-sized counter + circular array that NrSketch has in the aggregator, but took it out in response to review feedback and to reduce the scope of the initial aggregation PR. I do have some doubts about whether the extra code to convert byte->short->int->long is worth it, though. It's a CPU/memory trade-off, I guess.
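The byte->short->int->long conversion can be sketched as an array that starts with 1-byte cells and widens its representation only when some count would overflow. This is a simplified illustration, not NrSketch's implementation; the class name AdaptiveCountArray is hypothetical, and it assumes counts only ever increase (non-negative deltas).

```java
/**
 * Hypothetical sketch: non-negative counts stored in the narrowest primitive
 * array that can hold them. Exactly one of the four arrays is non-null.
 */
final class AdaptiveCountArray {
  private final int size;
  private byte[] bytes;
  private short[] shorts;
  private int[] ints;
  private long[] longs;

  AdaptiveCountArray(int size) {
    this.size = size;
    this.bytes = new byte[size];
  }

  /** Adds delta (assumed >= 0), widening all cells first if the result won't fit. */
  void increment(int i, long delta) {
    long next = get(i) + delta;
    if (bytes != null) {
      if (next <= Byte.MAX_VALUE) { bytes[i] = (byte) next; return; }
      shorts = new short[size];
      for (int j = 0; j < size; j++) shorts[j] = bytes[j];
      bytes = null;
    }
    if (shorts != null) {
      if (next <= Short.MAX_VALUE) { shorts[i] = (short) next; return; }
      ints = new int[size];
      for (int j = 0; j < size; j++) ints[j] = shorts[j];
      shorts = null;
    }
    if (ints != null) {
      if (next <= Integer.MAX_VALUE) { ints[i] = (int) next; return; }
      longs = new long[size];
      for (int j = 0; j < size; j++) longs[j] = ints[j];
      ints = null;
    }
    longs[i] = next;
  }

  long get(int i) {
    if (bytes != null) return bytes[i];
    if (shorts != null) return shorts[i];
    if (ints != null) return ints[i];
    return longs[i];
  }

  /** Bytes per cell in the current representation: 1, 2, 4 or 8. */
  int cellBytes() {
    if (bytes != null) return Byte.BYTES;
    if (shorts != null) return Short.BYTES;
    if (ints != null) return Integer.BYTES;
    return Long.BYTES;
  }
}
```

This makes the trade-off visible: each increment pays a few extra branches and each widening copies the whole array, in exchange for cells up to 8x smaller while counts stay small.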
But yeah, I saw the comments and understand the current state.
True, worst case the memory difference is quite high. It's even doubly worse than that since there's
For the exponential histogram, we decided to start with the simplest possible implementation for the backing data structure, and then beat it from there. Some context here: #3724 (comment). This is the ticket to beat it.
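For reference, a "simplest possible" backing structure might look like the sketch below, assuming (as the MapCounter comparison in this thread suggests) a map keyed by bucket index. The class MapCounter here and its methods are hypothetical illustrations, not the SDK's actual code. Every increment boxes the index and count and hashes the key, which is the overhead an array-based structure would try to beat.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: bucket index -> count, with a bounded index window. */
final class MapCounter {
  private final int maxSize;
  private final Map<Integer, Long> counts = new HashMap<>();
  private int indexStart = Integer.MAX_VALUE; // sentinel values while empty
  private int indexEnd = Integer.MIN_VALUE;

  MapCounter(int maxSize) {
    this.maxSize = maxSize;
  }

  /** Adds delta to a bucket; returns false if the window would exceed maxSize. */
  boolean increment(int index, long delta) {
    int newStart = Math.min(indexStart, index);
    int newEnd = Math.max(indexEnd, index);
    if ((long) newEnd - newStart + 1 > maxSize) return false;
    indexStart = newStart;
    indexEnd = newEnd;
    counts.merge(index, delta, Long::sum); // boxes key and value on each call
    return true;
  }

  long get(int index) {
    return counts.getOrDefault(index, 0L);
  }

  int getIndexStart() { return indexStart; }
  int getIndexEnd() { return indexEnd; }
  boolean isEmpty() { return counts.isEmpty(); }
}
```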
This ticket is to track the work for the optimisation, testing, and benchmarking of these data structures. There are notable reference implementations such as NrSketch and DynaHist that the author may draw from.
To implement the backing data structure, the author should implement ExponentialCounter.