When integrating t-digest into Solr, the first patch contributed had code which looked like this...

...the contributor who wrote that code did so based on the javadocs of `ArrayDigest.smallByteSize`, not realizing that under the covers the internals of this method are:

1. allocate a `ByteBuffer` based on the result of `byteSize()`
2. do a complete serialization using `asSmallBytes(ByteBuffer)`
3. return the final position of the `ByteBuffer`

...meaning that ultimately at least 2x `smallByteSize` bytes of RAM were being allocated every time, and the `ArrayDigest` was being serialized twice.

Methods like `ArrayDigest.smallByteSize` (`AVLTreeDigest.smallByteSize`, etc.) that have such a high internal cost should either be changed to just call `byteSize()` (since the over-allocation the user will have to do is still better for space/time performance than doing that same over-allocation internally plus a behind-the-scenes serialization that is thrown away) or warn users about this heavy internal cost in the javadocs.

I am opting to warn users away from these methods. I see no way to find out the size without doing the compression. There is probably a better API based on streams that would avoid this need. Or maybe over-allocation idioms are the rule of the day.