ARROW-5588: [C++] Better support for building union arrays #4781
bkietz commented Jul 2, 2019
- Simplify DenseUnionBuilder
- Add SparseUnionBuilder
- MakeBuilder can now produce a {Sparse,Dense}UnionBuilder
- ArrayFromJSON can now produce union arrays
I'll disable that test until I can address the issue in ListBuilder.
cpp/src/arrow/array/builder_union.cc
Outdated
type_id_to_child_num_.resize(union_type->max_type_code() + 1, -1);
DCHECK_EQ(union_type->mode(), UnionMode::DENSE);
int child_i = 0;
for (auto type_id : union_type->type_codes()) {
Would it simplify things to require children.size() to be the same as max_type_code()? If it isn't, you're asking for trouble. The user can always fill the gaps with NullBuilder.
I can let children_[i] be the builder for type_id == i instead of the builder which will finish into child_data[i]. The latter seems like it might be an implicit contract though; I'll investigate.
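To make the two layouts in this exchange concrete, here is a minimal standalone sketch. It does not use Arrow: `ChildSketch`, `IndirectLayout`, `DirectLayout`, `MaxTypeCode` and the `"<null>"` placeholder are hypothetical names chosen for illustration, and type codes are assumed to be non-negative. The first struct keeps children in child_data order plus a lookup table; the second indexes children by type_id and pads unused slots, roughly as the NullBuilder suggestion describes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a child builder; not an Arrow class.
struct ChildSketch {
  std::string name;
};

// Largest type code, or -1 when there are none (codes assumed non-negative).
inline int MaxTypeCode(const std::vector<int8_t>& type_codes) {
  int max_code = -1;
  for (int8_t code : type_codes) max_code = std::max<int>(max_code, code);
  return max_code;
}

// Layout A: children stored in child_data order, with a side table mapping
// type_id to a position in `children` (-1 marks an unused type_id).
struct IndirectLayout {
  std::vector<ChildSketch> children;
  std::vector<int8_t> type_id_to_child_num;

  IndirectLayout(const std::vector<int8_t>& type_codes,
                 const std::vector<std::string>& names) {
    assert(type_codes.size() == names.size());
    type_id_to_child_num.assign(
        static_cast<std::size_t>(MaxTypeCode(type_codes) + 1), -1);
    for (std::size_t i = 0; i < type_codes.size(); ++i) {
      children.push_back(ChildSketch{names[i]});
      type_id_to_child_num[static_cast<std::size_t>(type_codes[i])] =
          static_cast<int8_t>(i);
    }
  }

  // Two hops: type_id -> child position -> child.
  const ChildSketch* Lookup(int8_t type_id) const {
    const auto idx = static_cast<std::size_t>(type_id);
    if (type_id < 0 || idx >= type_id_to_child_num.size()) return nullptr;
    const int8_t child_num = type_id_to_child_num[idx];
    return child_num < 0 ? nullptr
                         : &children[static_cast<std::size_t>(child_num)];
  }
};

// Layout B: children indexed directly by type_id; unused slots hold a
// placeholder child, so no side table is needed.
struct DirectLayout {
  std::vector<ChildSketch> children_by_type_id;

  DirectLayout(const std::vector<int8_t>& type_codes,
               const std::vector<std::string>& names) {
    assert(type_codes.size() == names.size());
    children_by_type_id.assign(
        static_cast<std::size_t>(MaxTypeCode(type_codes) + 1),
        ChildSketch{"<null>"});
    for (std::size_t i = 0; i < type_codes.size(); ++i) {
      children_by_type_id[static_cast<std::size_t>(type_codes[i])] =
          ChildSketch{names[i]};
    }
  }

  // One hop: type_id is the index.
  const ChildSketch* Lookup(int8_t type_id) const {
    const auto idx = static_cast<std::size_t>(type_id);
    if (type_id < 0 || idx >= children_by_type_id.size()) return nullptr;
    return &children_by_type_id[idx];
  }
};
```

With type codes {5, 2}, the indirect layout stores two children and a six-entry table, while the direct layout stores six children of which four are placeholders. The trade-off is a smaller child list against a simpler lookup.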
cpp/src/arrow/array/builder_union.h
Outdated
}

private:
TypedBufferBuilder<int8_t> types_builder_;
TypedBufferBuilder<int32_t> offsets_builder_;
std::vector<std::string> field_names_;
std::vector<int8_t> type_id_to_child_num_;
This might be unnecessary if we maintain an invariant on children_.
@fsaintjacques Travis is green https://travis-ci.org/bkietz/arrow/builds/556909415
LGTM, some small changes and nothing serious. Good job on simplifying the indirection in the builders.
ASSERT_OK(ValidateArray(*array));

auto expected_type = list(dictionary(int16(), utf8()));
ASSERT_EQ(array->type()->ToString(), expected_type->ToString());
ToString? Applies to other reference.
This was a hack to get more info out of the assertion failure than "types are unequal".
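An alternative to comparing string forms is to compare for equality and use the string forms only in the failure message. The sketch below shows that idea without any test framework or Arrow dependency; `TypeSketch` and `DescribeTypeMismatch` are hypothetical names, not part of Arrow or of this PR.

```cpp
#include <sstream>
#include <string>

// Hypothetical stand-in for a data type; not Arrow's DataType.
struct TypeSketch {
  std::string repr;
  bool Equals(const TypeSketch& other) const { return repr == other.repr; }
  std::string ToString() const { return repr; }
};

// Returns an empty string when the types are equal; otherwise a message
// that names both types, so a failure says more than "types are unequal".
std::string DescribeTypeMismatch(const TypeSketch& actual,
                                 const TypeSketch& expected) {
  if (actual.Equals(expected)) return "";
  std::ostringstream msg;
  msg << "expected " << expected.ToString() << " but got "
      << actual.ToString();
  return msg.str();
}
```

A test could then assert that the returned message is empty and print it on failure, which keeps the comparison on the types themselves.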
NB: serialize.cc is probably broken
Ok, it passes locally with lint, so I'll proceed to merge.
- Simplify DenseUnionBuilder
- Add SparseUnionBuilder
- MakeBuilder can now produce a {Sparse,Dense}UnionBuilder
- ArrayFromJSON can now produce union arrays

Author: Benjamin Kietzman <[email protected]>

Closes #4781 from bkietz/5588-Better-support-for-building-UnionArrays and squashes the following commits:

38e2828 <Benjamin Kietzman> iwyu #include <limits>
0efe91d <Benjamin Kietzman> address review comments
17e6e27 <Benjamin Kietzman> construct offset_builder_ with a MemoryPool
4131fe3 <Benjamin Kietzman> separate child builder array indexable by type_id
fd64c1b <Benjamin Kietzman> rewrite union builder to share a base class, let children_ be indexed by type_id
37de5f2 <Benjamin Kietzman> explicit uint8_t for msvc
673916e <Benjamin Kietzman> Disable ListOfDictionary test until ListBuilder is updated
cf1c5be <Benjamin Kietzman> revert changes to reader.cc
5742db9 <Benjamin Kietzman> debugging: highlight the broken case and a similar one
5b1ec93 <Benjamin Kietzman> improve doccomments, dedupe test code
33fade1 <Benjamin Kietzman> Adding support for DenseUnions to ArrayFromJSON
6245c82 <Benjamin Kietzman> add SparseUnionBuilder and MakeBuilder case
8d4f36d <Benjamin Kietzman> add tests for building lists where the value builder has mutable type
351905d <Benjamin Kietzman> add test for lazily typed union builder
7902d12 <Benjamin Kietzman> first pass at updating DenseUnionBuilder
d20055a <Benjamin Kietzman> minor refactors, adding some asserts