migrate ArrayBuilder to new GrowableBuffer #1542
@jpivarski - while debugging the failing test - it looks like

    def test_big():
        a = ak._v2.highlevel.ArrayBuilder(initial=90)
        for i in range(2000):
            if i == 200:
                tmp = a.snapshot()
            a.boolean(i % 2 == 0)
        print("to_list(a)")
        assert to_list(a) == [True, False] * 1000
        print("----> to_list(tmp)")
        assert to_list(tmp) == [True, False] * 100
        print("DONE!")

also, the buffers dictionary would have two |
Being a dict, Usually, the buffers are NumPy arrays or Py_Buffers with a correct length (200 or 2000). If that's the case here, it would prevent a segfault: it would be a "buffer is too small" error rather than a segfault.
All of the nodes in a Form must have distinct ids from each other. They don't have to be distinct from other Forms, but without uniqueness-per-tree, you'd get these sorts of collisions.
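To illustrate the collision described above, here is a minimal sketch in plain Python. It does not use Awkward Array itself: the `collect_buffers` helper, the tuple layout, and the `node{id}-{attribute}` key pattern are assumptions chosen for illustration. It only shows that when two nodes in one tree share an id, the second buffer silently replaces the first in the dictionary.

```python
def collect_buffers(nodes):
    """Store each node's data under a key derived from its id.

    `nodes` is a list of (node_id, attribute, data) tuples; this layout
    and the key pattern are hypothetical, for illustration only.
    """
    buffers = {}
    for node_id, attribute, data in nodes:
        key = f"node{node_id}-{attribute}"
        buffers[key] = data  # a repeated key overwrites the earlier buffer
    return buffers


# Distinct ids: both buffers survive under separate keys.
unique = collect_buffers([
    (0, "data", [True] * 200),
    (1, "data", [True] * 2000),
])

# Repeated id: only the buffer stored last remains.
clashing = collect_buffers([
    (0, "data", [True] * 200),
    (0, "data", [True] * 2000),
])
```

With distinct ids the dictionary holds two entries; with a repeated id it holds one, and a consumer expecting the 200-element buffer would find the 2000-element one instead.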
Thanks! I should have mentioned that the example is taken from the main branch. I saw this behaviour in the PR and re-checked that it is the same without it. |
This is understood - the second |
@jpivarski - I'm done with this PR. Please, have a look when you have time. Thanks! |
I'm approving this, though I have questions about it. See below: if you're going to keep the idea of having new panels be bigger than old panels (potentially a good idea, but one I hadn't thought of), you'll need a way to pass that parameter down, rather than hard-code it at 1.5. (I have no idea if that's a good value or not.)
These things could be addressed in a new PR, but if you're going to revert the many instances of options → initial back to options, then perhaps it should be in this PR so the main history will have a smaller diff. The new options could be named BuilderOptions, rather than ArrayBuilderOptions, because of the generality.
-    if (length_[ptr_.size()-1] == reserved_[ptr_.size()-1]) {
-      add_panel(reserved_[ptr_.size()-1]);
+    if (ptr_->length() == ptr_->reserved()) {
+      add_panel((size_t)ceil(ptr_->reserved() * 1.5));
The panels grow as they're added? That's not a bad idea: at first, you don't know how big they need to be, but if you've been adding a lot of panels, you can be more confident that large panels are better.
However, if that's the case, then it needs to be possible to parameterize the resize, not just fix it as 1.5. The old algorithm's resize=1.5 came from the fact that it was a replacement algorithm, and there's some mathematical reason why the golden ratio is the best resize factor (for defragmentation). Adding panels and not removing the old ones is a different dynamic (there's less fragmentation, for one thing), so the best factor might not be 1.5. Eventually, you'll want to do a performance test, and that would involve varying these two parameters, initial and resize.
Those are the two parameters that the old ArrayBuilderOptions carried, and this PR converts a lot of options (which can be extended by modifying ArrayBuilderOptions) into initial (which can't be extended). If we're already wanting to reintroduce a factor, are you sure you want to do this?
On the other hand, I don't yet know that the optimal resize isn't 1.0. Maybe you won't be needing it. But having an ArrayBuilderOptions, or at least a "BuilderOptions", could be more future-proof.
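The idea of passing both parameters down through an options object could be sketched as follows. This is a plain-Python model, not the C++ GrowableBuffer from this PR: the `PanelBuffer` class and its methods are hypothetical, and `BuilderOptions` only borrows the name suggested above, carrying the two fields (`initial`, `resize`) that the old ArrayBuilderOptions is said to have carried. Old panels are never reallocated; a new panel is added only when the last one is full.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class BuilderOptions:
    """Hypothetical options object holding both growth parameters."""
    initial: int = 1024   # capacity of the first panel
    resize: float = 1.5   # each new panel is ceil(previous capacity * resize)


class PanelBuffer:
    """Illustrative panel-based buffer: full panels are kept, never copied."""

    def __init__(self, options):
        self._options = options
        self._panels = [[]]
        self._reserved = [options.initial]

    def append(self, value):
        # Add a new panel only when the last one has reached its capacity.
        if len(self._panels[-1]) == self._reserved[-1]:
            next_size = math.ceil(self._reserved[-1] * self._options.resize)
            self._panels.append([])
            self._reserved.append(next_size)
        self._panels[-1].append(value)

    def panel_sizes(self):
        return list(self._reserved)

    def snapshot(self):
        # Concatenate all panels into a new, independent list.
        return [value for panel in self._panels for value in panel]


growing = PanelBuffer(BuilderOptions(initial=90, resize=1.5))
constant = PanelBuffer(BuilderOptions(initial=90, resize=1.0))
early = None
for i in range(2000):
    if i == 200:
        early = growing.snapshot()
    growing.append(i % 2 == 0)
    constant.append(i % 2 == 0)
```

Because the factor lives in the options object rather than in the `append` logic, a performance test can vary `initial` and `resize` without touching the buffer code. In this model, 2000 values with `initial=90` need 7 panels at `resize=1.5` and 23 panels at `resize=1.0`.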
I think the resize parameter should not be set by a user. Also, perhaps, it should not be a fixed number. I agree, we should profile this first. GrowableBuffer can have a "growable strategy" hint, for example. I'll merge it as is - this will help @ManasviGoyal to rebase her PR.
replaces #1529
std containers and utilities.
awkward subdirectory. The cmake INTERFACE has been updated to allow the following include:
ArrayBuilder has been updated to use the new GrowableBuffer.