-
Notifications
You must be signed in to change notification settings - Fork 785
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add simple GC for view array types #5885
Conversation
part 1: implement checker to check if current buffer is compact Signed-off-by: 蔡略 <[email protected]>
part2: implement actual gc algorithm Signed-off-by: 蔡略 <[email protected]>
Signed-off-by: 蔡略 <[email protected]>
Signed-off-by: 蔡略 <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @XiangpengHao -- this looks very close to me. I think this PR needs a few more test cases and then it would be ok to merge
Bonus points if you added a benchmark for gc as well
/// This method will compact the data buffers by recreating the view array and only include the data | ||
/// that is pointed to by the views. | ||
/// | ||
/// Note that it will copy the array regardless of whether the original array is compact. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
let mut builder = GenericByteViewBuilder::<T>::with_capacity(self.len()); | ||
|
||
for v in self.iter() { | ||
builder.append_option(v); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is reasonable (and correct) first version
However, it could likely be made faster by special casing inline views -- the code here is going to be doing several checks on length only to copy the bytes back into a u128
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I understand how the checks can be avoided; even if the view is inlined, we need to change the offset and the block id, so we need to copy that u128 from the original view buffer to the new buffer anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess I was imagining that special casing copying inlined views could potentially be faster than converting to &str and then checking
But I don't think we should make any changes unless we have benchmarks supporting that hypothesis
Co-authored-by: Andrew Lamb <[email protected]>
I overhauled the test case, and it should be more straightforward now. Also added a simple benchmark -- in case we want to make a more performant version :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @XiangpengHao -- I had two minor suggestions but I think they could both be done as follow on PRs as well
})) | ||
} | ||
|
||
fn criterion_benchmark(c: &mut Criterion) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would also be interesting to have a "array with no nulls" benchmark as I could imagine special casing that case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM -- thanks again @XiangpengHao
Which issue does this PR close?
Closes #5513 .
Rationale for this change
This PR is largely based on excellent work from @ClSlaid in #5720
I simplified the GC process and the test cases, and now they are dead simple (but dumb)!
Note that this pr does not check whether the original array is already compact (and thus does not avoid unnecessary copies). Once we get this in, we can continue the compact check discussion as in #5720
What changes are included in this PR?
Are there any user-facing changes?