You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
{{ message }}
This repository has been archived by the owner on Jan 20, 2023. It is now read-only.
i've been playing around with MergeInto, and noticed some strange behaviour when merging into an empty t-digest. this might be an edge case, i'm not quite sure.
here is the test program:
package main
import (
"fmt"
"math/rand"
"github.com/spenczar/tdigest"
)
func main() {
rand.Seed(5678)
values := make(map[float64]interface{})
// Generate 100k uniform random data between 0 and 100
var (
n int = 100000
min, max float64 = 0, 100
)
for i := 0; i < n; i++ {
val := min + rand.Float64()*(max-min)
values[val] = nil
}
for i := 0; i < 10; i++ {
a := tdigest.New()
for v := range values {
a.Add(v, 1)
}
b := tdigest.New()
a.MergeInto(b)
fmt.Println(a.Quantile(1), b.Quantile(1))
}
}
i would expect a and b to be similar if not identical, but that is not the case. a is around 99.99 which is the expected result. however, b is all over the place.
any ideas what might be going on here?
The text was updated successfully, but these errors were encountered:
Thanks for the detailed issue report, and thanks especially for the reproducer.
You're right that this looks quite wrong. The empty tdigest is getting very confused when it receives any of the heavy, high-weight centroids from the middle of a.
This is either a problem in the algorithm, or (much more likely) a serious problem in the implementation of it. I'm afraid this might take longer to turn around than #2 did, this looks pretty nasty.
This might actually be a problem with the algorithm: tdunning/t-digest#78. There's a blog post that suggests a workaround I'll play with.
Since I wrote this library, Ted Dunning rewrote much of the paper to describe a different implementation which he calls a "merging TDigest" which solves most of these problems. Implementing merging TDigests would be a lot more work though - it would be a rewrite of this package. I'm up for that, it could just take a while.
Sign up for freeto subscribe to this conversation on GitHub.
Already have an account?
Sign in.
i've been playing around with
MergeInto
, and noticed some strange behaviour when merging into an empty t-digest. this might be an edge case, i'm not quite sure.here is the test program:
i would expect
a
andb
to be similar if not identical, but that is not the case.a
is around99.99
which is the expected result. however,b
is all over the place.any ideas what might be going on here?
The text was updated successfully, but these errors were encountered: