AVLTreeDigest with a lot of datas : integer overflow #81
Comments
Nikko,
Thanks for the feedback. I think that this has been fixed in the latest
version. Need to get that released.
Also, there is a new algorithm, MergingDigest that should give you a big
speedup. That may not be interesting given that you have done your work
already, but keep it in mind if you need it later.
Do you have a reference for your work yet? I would love to include it as an
example.
On Tue, Apr 11, 2017 at 8:08 AM, Nikko78 wrote:
For a space project, we use TDigest to compute statistics on 5 billion
stars. Thanks for your work, it's very useful!
With integer fields, AVLTreeDigest gives wrong results while TreeDigest
gives correct results.
The problem comes from "int[] aggregatedCounts" in AVLGroupTree:
sometimes the sum overflows the int range and the result becomes negative.
I changed "int" to "long", and after that AVLTreeDigest gave correct results (the
same results as TreeDigest).
AVLGroupTree_patch.txt
<https://github.com/tdunning/t-digest/files/913642/AVLGroupTree_patch.txt>
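The wraparound described in the report can be reproduced without t-digest at all. The following is a minimal sketch, not the AVLGroupTree code: the class and method names are hypothetical, and it only assumes that per-node counts are summed into an int accumulator, as the report describes.

```java
// Hypothetical demo class; it is not part of t-digest.
class AggregatedCountOverflow {

    // Sums counts into an int accumulator, the way an int[] aggregate would.
    static int sumAsInt(int[] counts) {
        int total = 0;
        for (int c : counts) {
            total += c; // silently wraps once the sum passes Integer.MAX_VALUE
        }
        return total;
    }

    // Same sum with a long accumulator, mirroring the "int" to "long" change.
    static long sumAsLong(int[] counts) {
        long total = 0L;
        for (int c : counts) {
            total += c;
        }
        return total;
    }

    public static void main(String[] args) {
        // Three nodes holding one billion points each (illustrative numbers).
        int[] counts = {1_000_000_000, 1_000_000_000, 1_000_000_000};
        System.out.println("int  sum: " + sumAsInt(counts));  // -1294967296
        System.out.println("long sum: " + sumAsLong(counts)); // 3000000000
    }
}
```

If silent wraparound is the main worry, replacing `total += c` with `total = Math.addExact(total, c)` makes the int version throw an ArithmeticException instead of returning a negative count.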
Thanks for suggesting MergingDigest. We will have a lot of statistics to compute in the future on new data that the satellite will send.
I just checked and the MergingDigest accumulates the weights in double. AVLTreeDigest uses int. This means that MergingDigest will work for your case with many counts. I will document the limitation for AVLTreeDigest, but I don't expect that I will change it in the near future. Thanks for the references!
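To illustrate why a double accumulator copes with counts of this size, here is a small sketch of the arithmetic only. It is not MergingDigest code; the class name, method name, and chunk sizes are made up for the example. A double represents every integer exactly up to 2^53 (about 9e15), so a total weight of 5 billion is far inside the exact range even though it does not fit in an int.

```java
// Hypothetical demo class; it only illustrates double arithmetic, not t-digest internals.
class DoubleWeightAccumulation {

    // Adds the same weight repeatedly into a double accumulator.
    static double totalWeight(int chunks, double weightPerChunk) {
        double total = 0.0;
        for (int i = 0; i < chunks; i++) {
            total += weightPerChunk;
        }
        return total;
    }

    public static void main(String[] args) {
        // 5,000 chunks of one million points each: 5 billion in total.
        double total = totalWeight(5_000, 1_000_000.0);
        System.out.println(total == 5_000_000_000.0); // true: the sum is exact
        System.out.println(total > Integer.MAX_VALUE); // true: too large for an int

        // Above 2^53, adding 1.0 no longer changes the value.
        double limit = 9_007_199_254_740_992.0; // 2^53
        System.out.println(limit + 1.0 == limit); // true
    }
}
```

The trade-off is that a double never goes negative on overflow, but past 2^53 it stops counting single increments exactly.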
Also, check out the MegaMergeTest for an example of merging lots of digests at once. This is what you need for parallelism. Let me know if you publish any papers that describe your use of t-digest so I can make sure to reference them!