DL Compress get_keys_values output by hash. #17243
Conversation
Removing the coverage diff: the two lines in the data store correspond to assertions failing, and the benchmarks line is just changing the type to an already existing, not-covered structure.
These thoughts may apply to a lot of existing code as well, I'm not sure, but... This assumes that on average the keys and values are at least the size of the hash. If they are not then this makes the situation actually worse. Using a smaller hash size is limited by the assumptions around not having hash collisions. I'm guessing that these tradeoffs were already considered and we don't expect the 32 byte effective minimum key/value size to be an issue?
Yeah, it's true that in the case of smaller k,v items this increases the size to 32 bytes - but I think it is worth it to normalize to a hash size (for consistency, code simplicity, etc). Even with a small k,v tree, the increase in size is likely quite small in absolute terms, so I don't expect any practical issue with doing this. I'm not sure what the memory footprint is for the recursive SQL query here, though, as that is mostly unchanged - but at least we don't have to keep large k,v items in memory.
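To make the size tradeoff above concrete, here is a rough back-of-the-envelope sketch (not from the PR); the 32-byte digest size and the example key/value sizes are assumptions chosen for illustration.

```python
HASH_SIZE = 32  # bytes per digest; assumes 32-byte hashes as in the discussion above

def raw_entry_size(key: bytes, value: bytes) -> int:
    # Memory needed to keep the raw (key, value) pair in the mapping.
    return len(key) + len(value)

def compressed_entry_size(key: bytes, value: bytes) -> int:
    # Memory needed to keep hash(key) -> hash(leaf): two fixed-size digests.
    return 2 * HASH_SIZE

# Small items get larger: 8 + 8 = 16 bytes grows to 64 bytes per entry.
small = (b"k" * 8, b"v" * 8)
# Large items shrink: 1 KiB + 4 KiB = 5120 bytes drops to 64 bytes per entry.
large = (b"k" * 1024, b"v" * 4096)

for key, value in (small, large):
    print(raw_entry_size(key, value), "->", compressed_entry_size(key, value))
```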
Introduces `get_keys_values_compressed`, which maps from `hash(key)` to `hash(leaf)`. `hint_keys_values` is modified to store this instead of all (key, value) data, to optimize memory usage.
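As a rough illustration of the mapping described above, the sketch below builds a `hash(key) -> hash(leaf)` dictionary from in-memory pairs. This is not the actual chia-blockchain implementation: the function name mirrors the PR, but the signature, the use of SHA-256, and the leaf-hash derivation (`sha256(key + value)`) are assumptions made for the example.

```python
from hashlib import sha256
from typing import Dict


def get_keys_values_compressed_sketch(pairs: Dict[bytes, bytes]) -> Dict[bytes, bytes]:
    """Return hash(key) -> hash(leaf) instead of full (key, value) pairs.

    The leaf hash is modeled here as the hash of key and value together;
    the real data store may derive it differently.
    """
    compressed: Dict[bytes, bytes] = {}
    for key, value in pairs.items():
        key_hash = sha256(key).digest()
        leaf_hash = sha256(key + value).digest()
        compressed[key_hash] = leaf_hash
    return compressed


# A caller in the spirit of hint_keys_values could then hold only fixed-size
# digests, so memory scales with the number of entries rather than their size.
compressed = get_keys_values_compressed_sketch({b"alice": b"balance=10"})
print(len(next(iter(compressed))), len(next(iter(compressed.values()))))  # 32 32
```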