Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ByteSizeForCompression underestimates memory consumption #33

Open
vdobler opened this issue Jun 9, 2023 · 0 comments
Open

ByteSizeForCompression underestimates memory consumption #33

vdobler opened this issue Jun 9, 2023 · 0 comments

Comments

@vdobler
Copy link

vdobler commented Jun 9, 2023

The calculation for ByteSizeForCompression states

Unprocessed and processed can grow up to length c

https://github.com/influxdata/tdigest/blob/master/tdigest.go#L50

But unprocessed and processed are allocated in https://github.com/influxdata/tdigest/blob/master/tdigest.go#L30 with a capacity of maxUnprocessed and maxProcessed which are 8 and 2 times c (see e.g. https://github.com/influxdata/tdigest/blob/master/tdigest.go#L304)

This leads to ByteSizeForCompression underestimating the memory consumption of these two buffers by a factor of 10 which is not neglible.

I think the result should be

8 * (2 * (8c + 2c) + c)   //  8bytes/float * ( (8*c unprocessed + 2*c compresses) * 2 values   +  c values 

cumulative )
= 168 * c

(Btw: the 40 is not correct even if the factors 8 and 2 for the capacity of the unprocessed and processed buffers are ignored.)

BUT: This 168 * c still underestimates the memory consumpotion as in
https://github.com/influxdata/tdigest/blob/master/tdigest.go#L121

t.unprocessed = append(t.unprocessed, t.processed...)

processed gets appended to unprocessed and both slices might be full resulting in unprocessed to grow to (maxUnprocessed + maxProcessed) = 10 * c

So the correct value should be 200 * c.

This is quite a number. I'm wondering if float32 and smaller unprocessed buffer would allow to reduce memory consumption while keeping enough numerical stability and accuracy.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant