Inconsistent Builder Constructors #2054
Comments
Sounds good, especially forcing users to specify both item and nested capacities for List/String/Binary arrays. My gut feeling also says that 1024 could be a good default, I'm assuming that arrays will usually be much larger than general purpose collection types. |
I agree the proposal sounds good as well |
While working on #2104, I find it might be good to let users decide both the
pub struct GenericBinaryBuilder<OffsetSize: OffsetSizeTrait> {
    value_builder: UInt8BufferBuilder,
    offsets_builder: BufferBuilder<OffsetSize>,
    bitmap_builder: BooleanBufferBuilder,
}
impl<OffsetSize: OffsetSizeTrait> GenericBinaryBuilder<OffsetSize> {
    /// Creates a new `GenericBinaryBuilder` with at least
    /// `num_elements` binary slots in the array and
    /// `value_capacity` bytes in the values buffer.
    pub fn new(num_elements: Option<usize>, value_capacity: Option<usize>) -> Self {
        Self {
            value_builder: UInt8BufferBuilder::new(value_capacity.unwrap_or(1024)),
            // The offsets buffer holds one more entry than there are elements.
            offsets_builder: BufferBuilder::<OffsetSize>::new(num_elements.unwrap_or(1023) + 1),
            bitmap_builder: BooleanBufferBuilder::new(num_elements.unwrap_or(1024)),
        }
    }
} |
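To make the defaults in the snippet above easier to see, here is a minimal standalone sketch of how the two `Option` arguments would resolve into the three buffer sizes. The helper `resolve_capacities` is hypothetical and exists only for illustration; it copies the `unwrap_or` values from the comment and does not use any arrow-rs types.

```rust
/// Hypothetical helper (not part of arrow-rs): resolves the optional
/// capacities the same way the constructor in the comment above does.
/// Returns (value bytes, offset slots, validity bits).
fn resolve_capacities(
    num_elements: Option<usize>,
    value_capacity: Option<usize>,
) -> (usize, usize, usize) {
    // Bytes reserved for the concatenated values.
    let value_bytes = value_capacity.unwrap_or(1024);
    // One offset per element plus one extra slot for the leading offset.
    let offset_slots = num_elements.unwrap_or(1023) + 1;
    // One validity bit per element.
    let validity_bits = num_elements.unwrap_or(1024);
    (value_bytes, offset_slots, validity_bits)
}

fn main() {
    let (v, o, b) = resolve_capacities(None, None);
    println!("defaults: {v} value bytes, {o} offset slots, {b} validity bits");

    let (v, o, b) = resolve_capacities(Some(100), Some(4096));
    println!("explicit: {v} value bytes, {o} offset slots, {b} validity bits");
}
```

Note that with this shape a caller who wants the defaults has to write `new(None, None)`, which is one of the call-site costs of the `Option`-based signature.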
Thanks, I've updated the ticket to be hopefully slightly clearer. We should not remove the ability to pass a value_capacity to with_capacity, as we currently support this for StringArray 👍 |
Hello, is this still a valid issue? If so, I would like to pick this up. Given that this can lead to big code changes, may I request that the team split this issue into multiple smaller issues or tasks so that it's easier to implement and review? |
Hi, this is definitely still an issue. I would recommend working on each builder separately in turn, perhaps starting with the more esoteric builders such as unions, lists, and then working through to strings and eventually primitives. The latter builders are likely to cause significantly more churn and so leaving them to last will help I think? |
So based on this issue, the following PRs have been created. Some I have kept in draft, but the rest are ready for review. I will take up the buffer builder refactoring last, once the rest are approved.
On a side note, I have replaced all Only in the test module have I used both APIs to test them. |
Hello @tustvold, just confirming, since it was not mentioned in the issue description: should I also refactor the buffer builder constructors? I think that's the only one left. |
I think we can leave the BufferBuilder constructors for now, I think all that remains now are the builders for BinaryArray and StringArray |
Those 2 builders already have new and with_capacity, but since the sizes of strings are not constant, the API was taking the number of bytes to pre-allocate for the buffer as a parameter. What's the best way to handle this? Shall we remove the buffer_capacity parameters from the constructors and allocate a fixed buffer of size 1024? |
I think it is just a case of removing the capacity parameter from |
Okay |
* Simplify DictionaryBuilder constructors (#2684) (#2054)
* Apply suggestions from code review
Co-authored-by: Liang-Chi Hsieh
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
The various builder constructors take capacities to pre-allocate. However, they aren't consistent about whether they take a capacity in terms of elements or bytes, or what these are capacities for.
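As a rough illustration of why the unit matters, the sketch below shows how far apart the two readings of the same number are for a fixed-width type. The helper names are hypothetical and only the standard library is used; nothing here is arrow-rs API.

```rust
use std::mem::size_of;

/// Hypothetical helper: bytes needed to hold `elements` values of type `T`.
fn bytes_for_elements<T>(elements: usize) -> usize {
    elements * size_of::<T>()
}

/// Hypothetical helper: how many values of type `T` fit in `bytes` bytes.
/// Assumes `T` is not zero-sized (otherwise this would divide by zero).
fn elements_for_bytes<T>(bytes: usize) -> usize {
    bytes / size_of::<T>()
}

fn main() {
    // The same literal leads to very different allocations for an i64 column,
    // depending on whether a constructor interprets it as elements or bytes.
    let needed = bytes_for_elements::<i64>(1024);
    let fits = elements_for_bytes::<i64>(1024);
    println!("1024 i64 elements need {needed} bytes");
    println!("1024 bytes hold {fits} i64 elements");
}
```

If a constructor silently treats an element count as a byte count, the builder under-allocates and has to grow repeatedly, which is the kind of performance bug that is easy to miss in review.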
Describe the solution you'd like
I would like to propose the following:
* Remove `capacity` from the `new` constructors, instead using a static default capacity (e.g. 1024)
* `with_capacity` constructors take capacities in terms of elements, potentially with an additional number of bytes for variable length arrays
This has a couple of advantages:
* `Vec` (Add `Vec`-inspired APIs to `BufferBuilder` #1850)
The only major disadvantage is that it results in API churn.
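The two constructor shapes proposed above could be sketched roughly as follows. This is a toy builder backed by `Vec`, not the arrow-rs implementation: the type `ToyStringBuilder`, the constant `DEFAULT_CAPACITY`, the parameter names, and the choice to reuse the same default for the byte buffer are all assumptions made for illustration.

```rust
/// Assumed static default; the issue only suggests "e.g. 1024".
const DEFAULT_CAPACITY: usize = 1024;

/// Toy stand-in for a variable-length builder (hypothetical, not arrow-rs).
struct ToyStringBuilder {
    /// Concatenated UTF-8 bytes of all appended strings.
    values: Vec<u8>,
    /// End offset of each element, preceded by a leading 0.
    offsets: Vec<i32>,
}

impl ToyStringBuilder {
    /// No capacity argument: falls back to the static default.
    fn new() -> Self {
        Self::with_capacity(DEFAULT_CAPACITY, DEFAULT_CAPACITY)
    }

    /// `item_capacity` is in elements, `data_capacity` is in bytes.
    fn with_capacity(item_capacity: usize, data_capacity: usize) -> Self {
        // One extra slot for the leading zero offset.
        let mut offsets = Vec::with_capacity(item_capacity + 1);
        offsets.push(0);
        Self {
            values: Vec::with_capacity(data_capacity),
            offsets,
        }
    }

    /// Appends one string and records its end offset.
    fn append_value(&mut self, s: &str) {
        self.values.extend_from_slice(s.as_bytes());
        self.offsets.push(self.values.len() as i32);
    }

    /// Number of elements appended so far.
    fn len(&self) -> usize {
        self.offsets.len() - 1
    }
}

fn main() {
    let mut builder = ToyStringBuilder::with_capacity(2, 16);
    builder.append_value("ab");
    builder.append_value("cde");
    println!("{} elements, offsets {:?}", builder.len(), builder.offsets);

    let default_builder = ToyStringBuilder::new();
    println!("default builder holds {} elements", default_builder.len());
}
```

Keeping the element count as the first argument everywhere, and adding the byte count only for variable-length builders, mirrors how `Vec::with_capacity` counts elements rather than bytes.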
Describe alternatives you've considered
We could leave things as they are, but the current situation leads to hard-to-spot performance bugs.
Additional context
Noticed whilst reviewing #2038
Thoughts @alamb @viirya @jhorstmann