Minor API adjustments for StringViewBuilder #6047

Merged 5 commits on Jul 15, 2024
Changes from 2 commits
3 changes: 1 addition & 2 deletions arrow-array/src/array/byte_view_array.rs
@@ -325,8 +325,7 @@ impl<T: ByteViewType + ?Sized> GenericByteViewArray<T> {
     /// Use with caution as this can be an expensive operation, only use it when you are sure that the view
     /// array is significantly smaller than when it is originally created, e.g., after filtering or slicing.
     pub fn gc(&self) -> Self {
-        let mut builder =
-            GenericByteViewBuilder::<T>::with_capacity(self.len()).with_deduplicate_strings();
+        let mut builder = GenericByteViewBuilder::<T>::with_capacity(self.len());

         for v in self.iter() {
             builder.append_option(v);
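
To illustrate why the doc comment recommends `gc` only after filtering or slicing, here is a simplified, std-only model of the idea. It is not the arrow-rs implementation: the `View` struct and the `compact` function are hypothetical names, and a view is reduced to an `(offset, len)` window into one shared buffer. Filtering shrinks the list of views but leaves the shared buffer untouched, so compaction copies only the bytes that are still referenced.

```rust
/// Hypothetical stand-in for a view: an (offset, len) window into a shared buffer.
#[derive(Clone, Copy, Debug, PartialEq)]
struct View {
    offset: usize,
    len: usize,
}

/// Copy only the bytes still referenced by `views` into a fresh buffer,
/// rewriting each view so it points at the new location.
fn compact(buffer: &[u8], views: &[View]) -> (Vec<u8>, Vec<View>) {
    let needed: usize = views.iter().map(|v| v.len).sum();
    let mut new_buffer = Vec::with_capacity(needed);
    let mut new_views = Vec::with_capacity(views.len());
    for v in views {
        let start = new_buffer.len();
        new_buffer.extend_from_slice(&buffer[v.offset..v.offset + v.len]);
        new_views.push(View { offset: start, len: v.len });
    }
    (new_buffer, new_views)
}

fn main() {
    // Store three values back to back in one shared buffer.
    let values = ["a fairly long first value", "second", "the third value is long too"];
    let mut buffer = Vec::new();
    let mut views = Vec::new();
    for v in values {
        views.push(View { offset: buffer.len(), len: v.len() });
        buffer.extend_from_slice(v.as_bytes());
    }

    // Simulate a filter that keeps only the middle value.
    // The view list shrinks, but the shared buffer is unchanged.
    let kept = vec![views[1]];
    let (new_buffer, new_views) = compact(&buffer, &kept);

    println!("before: {} bytes, after: {} bytes", buffer.len(), new_buffer.len());
    let v = new_views[0];
    assert_eq!(&new_buffer[v.offset..v.offset + v.len], b"second");
}
```

The copy touches every surviving value, which is why the operation is expensive when little has been filtered out.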
16 changes: 15 additions & 1 deletion arrow-array/src/builder/generic_bytes_view_builder.rs
@@ -201,7 +201,8 @@ impl<T: ByteViewType + ?Sized> GenericByteViewBuilder<T> {

     /// Returns the value at the given index
     /// Useful if we want to know what value has been inserted to the builder
-    fn get_value(&self, index: usize) -> &[u8] {
+    /// The index has to be smaller than `self.len()`, otherwise it will panic
+    pub fn get_value(&self, index: usize) -> &[u8] {
         let view = self.views_builder.as_slice().get(index).unwrap();
         let len = *view as u32;
         if len <= 12 {
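
The `len <= 12` branch above follows from the Arrow view layout: a view is 16 bytes, the low 32 bits hold the length, and values of at most 12 bytes are stored inline in the remaining bytes. Longer values keep a 4-byte prefix, a buffer index, and an offset instead. The sketch below shows that decoding with std only. `make_view` and `decode_view` are hypothetical helpers written for illustration, not arrow-rs functions, and they return owned bytes rather than borrowed slices to keep the example short.

```rust
/// Values of at most this many bytes are stored inline in the view.
const MAX_INLINE_LEN: u32 = 12;

/// Build a view for `value`. Long values are appended to `buffer`,
/// which is treated as data buffer number `buffer_index`.
fn make_view(value: &[u8], buffer: &mut Vec<u8>, buffer_index: u32) -> u128 {
    let len = value.len() as u32;
    let mut bytes = [0u8; 16];
    bytes[0..4].copy_from_slice(&len.to_le_bytes());
    if len <= MAX_INLINE_LEN {
        // Short value: the data lives directly after the length.
        bytes[4..4 + value.len()].copy_from_slice(value);
    } else {
        // Long value: store prefix, buffer index, and offset.
        let offset = buffer.len() as u32;
        buffer.extend_from_slice(value);
        bytes[4..8].copy_from_slice(&value[..4]);
        bytes[8..12].copy_from_slice(&buffer_index.to_le_bytes());
        bytes[12..16].copy_from_slice(&offset.to_le_bytes());
    }
    u128::from_le_bytes(bytes)
}

/// Decode a view back into its bytes.
fn decode_view(view: u128, buffers: &[Vec<u8>]) -> Vec<u8> {
    let len = view as u32; // truncation keeps the low 32 bits, i.e. the length
    if len <= MAX_INLINE_LEN {
        let bytes = view.to_le_bytes();
        bytes[4..4 + len as usize].to_vec()
    } else {
        let buffer_index = (view >> 64) as u32 as usize;
        let offset = (view >> 96) as u32 as usize;
        buffers[buffer_index][offset..offset + len as usize].to_vec()
    }
}

fn main() {
    let mut buffers = vec![Vec::new()];
    let short = make_view(b"hello", &mut buffers[0], 0);
    let long = make_view(b"a value longer than twelve bytes", &mut buffers[0], 0);

    assert_eq!(decode_view(short, &buffers), b"hello".to_vec());
    assert_eq!(
        decode_view(long, &buffers),
        b"a value longer than twelve bytes".to_vec()
    );
    // The short value never touched the data buffer.
    assert_eq!(buffers[0].len(), b"a value longer than twelve bytes".len());
}
```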
@@ -337,6 +338,19 @@ impl<T: ByteViewType + ?Sized> GenericByteViewBuilder<T> {
     pub fn validity_slice(&self) -> Option<&[u8]> {
         self.null_buffer_builder.as_slice()
     }
+
+    /// Return the allocated size of this builder, useful for memory accounting.
+    pub fn allocated_size(&self) -> usize {
+        let buffer_size = self.completed.iter().map(|b| b.capacity()).sum::<usize>();
+        let in_progress = self.in_progress.capacity();
+        let null = self.null_buffer_builder.allocated_size();
+        let tracker = match &self.string_tracker {
+            Some((ht, _)) => ht.capacity() * std::mem::size_of::<usize>(),
+            None => 0,
+        };
+        let views = self.views_builder.capacity() * std::mem::size_of::<u128>();
+        buffer_size + in_progress + tracker + views + null
+    }
 }

 impl<T: ByteViewType + ?Sized> Default for GenericByteViewBuilder<T> {
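
The new `allocated_size` sums capacities rather than lengths, so it reports what the builder has reserved, not what it has filled. A minimal std-only sketch of that accounting pattern follows. `ToyViewBuilder` and its fields are hypothetical stand-ins for the builder's internal state, and the deduplication hash table is left out for brevity.

```rust
use std::mem::size_of;

/// Hypothetical, simplified stand-in for the builder's internal state.
struct ToyViewBuilder {
    views: Vec<u128>,
    completed: Vec<Vec<u8>>,
    in_progress: Vec<u8>,
    validity: Option<Vec<u8>>,
}

impl ToyViewBuilder {
    /// Bytes reserved by all owned allocations (capacity, not length).
    fn allocated_size(&self) -> usize {
        let completed: usize = self.completed.iter().map(|b| b.capacity()).sum();
        let in_progress = self.in_progress.capacity();
        let views = self.views.capacity() * size_of::<u128>();
        let validity = self.validity.as_ref().map(|b| b.capacity()).unwrap_or(0);
        completed + in_progress + views + validity
    }

    /// Bytes actually holding data (length, not capacity).
    fn used_size(&self) -> usize {
        let completed: usize = self.completed.iter().map(|b| b.len()).sum();
        let in_progress = self.in_progress.len();
        let views = self.views.len() * size_of::<u128>();
        let validity = self.validity.as_ref().map(|b| b.len()).unwrap_or(0);
        completed + in_progress + views + validity
    }
}

fn main() {
    let mut builder = ToyViewBuilder {
        views: Vec::with_capacity(1024),
        completed: Vec::new(),
        in_progress: Vec::with_capacity(8 * 1024),
        validity: None,
    };
    builder.views.push(0);
    builder.in_progress.extend_from_slice(b"some long value stored out of line");

    // Reserved memory is at least as large as the memory in use.
    assert!(builder.allocated_size() >= builder.used_size());
    println!(
        "allocated: {} bytes, used: {} bytes",
        builder.allocated_size(),
        builder.used_size()
    );
}
```

Counting capacity is the conservative choice for memory accounting, since reserved but unused space still occupies the process's memory budget.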
8 changes: 8 additions & 0 deletions arrow-buffer/src/builder/null.rs
@@ -161,6 +161,14 @@ impl NullBufferBuilder {
     pub fn as_slice_mut(&mut self) -> Option<&mut [u8]> {
         self.bitmap_builder.as_mut().map(|b| b.as_slice_mut())
     }
+
+    /// Return the allocated size of this builder, useful for memory accounting.
+    pub fn allocated_size(&self) -> usize {
+        self.bitmap_builder
+            .as_ref()
+            .map(|b| b.capacity())
+            .unwrap_or(0)
+    }
 }

 impl NullBufferBuilder {
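
The `.map(...).unwrap_or(0)` shape above reflects that `bitmap_builder` is an `Option`: the bitmap is only materialized once a null is appended, so a builder that has seen no nulls reports zero allocated bytes. The following std-only sketch models that behaviour. `LazyValidity` and `push_bit` are hypothetical names for illustration, and the bit handling is a simplification rather than the arrow-buffer code.

```rust
/// Hypothetical model of a validity bitmap that is allocated lazily.
struct LazyValidity {
    bitmap: Option<Vec<u8>>,
    len: usize,
}

/// Append one bit at `index` (least significant bit first), growing by one byte when needed.
fn push_bit(bits: &mut Vec<u8>, index: usize, valid: bool) {
    if index / 8 >= bits.len() {
        bits.push(0);
    }
    if valid {
        bits[index / 8] |= 1 << (index % 8);
    }
}

impl LazyValidity {
    fn new() -> Self {
        Self { bitmap: None, len: 0 }
    }

    fn append_non_null(&mut self) {
        // Only touch the bitmap if it has already been materialized.
        if let Some(bits) = self.bitmap.as_mut() {
            push_bit(bits, self.len, true);
        }
        self.len += 1;
    }

    fn append_null(&mut self) {
        let len = self.len;
        // First null: materialize the bitmap and mark all earlier slots valid.
        let bits = self.bitmap.get_or_insert_with(|| {
            let mut bits = Vec::new();
            for i in 0..len {
                push_bit(&mut bits, i, true);
            }
            bits
        });
        push_bit(bits, len, false);
        self.len += 1;
    }

    /// Zero until the bitmap exists, then its reserved capacity in bytes.
    fn allocated_size(&self) -> usize {
        self.bitmap.as_ref().map(|b| b.capacity()).unwrap_or(0)
    }
}

fn main() {
    let mut validity = LazyValidity::new();
    validity.append_non_null();
    validity.append_non_null();
    assert_eq!(validity.allocated_size(), 0); // no nulls yet, nothing allocated

    validity.append_null();
    assert!(validity.allocated_size() >= 1); // bitmap now exists
}
```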