Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-5588: [C++] Better support for building union arrays #4781

Closed

Conversation

bkietz
Copy link
Member

@bkietz bkietz commented Jul 2, 2019

  • Simplify DenseUnionBuilder
  • Add SparesUnionBuilder
  • MakeBuilder can now produce a {Sparse,Dense}UnionBuilder
  • ArrayFromJSON can now produce union arrays

@wesm
Copy link
Member

wesm commented Jul 2, 2019

[ RUN      ] TestDictionary.ListOfDictionary
WARNING: Logging before InitGoogleLogging() is written to STDERR
F0702 23:55:33.778414  6046 array.cc:287]  Check failed: list_type_->value_type()->Equals(data->child_data[0]->type)
*** Check failure stack trace: ***
/buildbot/AMD64_Conda_C__/cpp/build-support/run-test.sh: line 97:  6046 Aborted                 (core dumped) $TEST_EXECUTABLE "$@" 2>&1
      6047 Done                    | $ROOT/build-support/asan_symbolize.py
      6048 Done                    | ${CXXFILT:-c++filt}
      6049 Done                    | $ROOT/build-support/stacktrace_addr2line.pl $TEST_EXECUTABLE
      6050 Done                    | $pipe_cmd 2>&1
      6051 Done                    | tee $LOGFILE
/buildbot/AMD64_Conda_C__/cpp/build/src/arrow

@bkietz
Copy link
Member Author

bkietz commented Jul 3, 2019

I'll disable that test until I can address the issue in ListBuilder

@kszucs kszucs force-pushed the 5588-Better-support-for-building-UnionArrays branch from 566014c to 7b2b41d Compare July 4, 2019 07:57
@kszucs kszucs changed the title ARROW-5588: [C++] better support for building union arrays ARROW-5588: [C++] Better support for building union arrays Jul 4, 2019
@kszucs kszucs force-pushed the 5588-Better-support-for-building-UnionArrays branch from 7b2b41d to a20f739 Compare July 5, 2019 07:11
@emkornfield emkornfield force-pushed the 5588-Better-support-for-building-UnionArrays branch from a20f739 to a1c6225 Compare July 5, 2019 09:04
@fsaintjacques fsaintjacques self-requested a review July 5, 2019 18:42
cpp/src/arrow/array/builder_union.cc Outdated Show resolved Hide resolved
cpp/src/arrow/array/builder_union.h Show resolved Hide resolved
cpp/src/arrow/array/builder_union.h Outdated Show resolved Hide resolved
type_id_to_child_num_.resize(union_type->max_type_code() + 1, -1);
DCHECK_EQ(union_type->mode(), UnionMode::DENSE);
int child_i = 0;
for (auto type_id : union_type->type_codes()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it simplify to require for children.size() the same as max_type_code()? If you're not, you're asking for trouble. The user can always fill with NullBuilder.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can let children_[i] be the builder for type_id == i instead of the builder which will finish into child_data[i]. The latter seems like it might be an implicit contract though; I'll investigate

cpp/src/arrow/array/builder_union.cc Outdated Show resolved Hide resolved
}

private:
TypedBufferBuilder<int8_t> types_builder_;
TypedBufferBuilder<int32_t> offsets_builder_;
std::vector<std::string> field_names_;
std::vector<int8_t> type_id_to_child_num_;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be unnecessary if we maintain an invariant on children_.

@bkietz bkietz force-pushed the 5588-Better-support-for-building-UnionArrays branch from a1c6225 to bab294c Compare July 10, 2019 15:16
@bkietz
Copy link
Member Author

bkietz commented Jul 10, 2019

Copy link
Contributor

@fsaintjacques fsaintjacques left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, some small changes and nothing serious. Good job on simplifying the indirection in the builders.

cpp/src/arrow/array/builder_union.cc Outdated Show resolved Hide resolved
cpp/src/arrow/array/builder_union.h Show resolved Hide resolved
cpp/src/arrow/compare.cc Show resolved Hide resolved
cpp/src/arrow/array/builder_union.cc Show resolved Hide resolved
ASSERT_OK(ValidateArray(*array));

auto expected_type = list(dictionary(int16(), utf8()));
ASSERT_EQ(array->type()->ToString(), expected_type->ToString());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ToString? Applies to other reference.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a hack to get more info out of the assertion failure than "types are unequal".

@bkietz bkietz force-pushed the 5588-Better-support-for-building-UnionArrays branch from cdcd799 to 0efe91d Compare July 11, 2019 21:58
@fsaintjacques
Copy link
Contributor

Ok, it passes locally with lint, so I'll proceed to merge.

kszucs pushed a commit that referenced this pull request Jul 22, 2019
- Simplify DenseUnionBuilder
- Add SparesUnionBuilder
- MakeBuilder can now produce a {Sparse,Dense}UnionBuilder
- ArrayFromJSON can now produce union arrays

Author: Benjamin Kietzman <[email protected]>

Closes #4781 from bkietz/5588-Better-support-for-building-UnionArrays and squashes the following commits:

38e2828 <Benjamin Kietzman> iwyu #include <limits>
0efe91d <Benjamin Kietzman> address review comments
17e6e27 <Benjamin Kietzman> construct offset_builder_ with a MemoryPool
4131fe3 <Benjamin Kietzman> separate child builder array indexable by type_id
fd64c1b <Benjamin Kietzman> rewrite union builder to share a base class, let children_ be indexed by type_id
37de5f2 <Benjamin Kietzman> explicit uint8_t for msvc
673916e <Benjamin Kietzman> Disable ListOfDictionary test until ListBuilder is updated
cf1c5be <Benjamin Kietzman> revert changes to reader.cc
5742db9 <Benjamin Kietzman> debugging: highlight the broken case and a similar one
5b1ec93 <Benjamin Kietzman> improve doccomments, dedupe test code
33fade1 <Benjamin Kietzman> Adding support for DenseUnions to ArrayFromJSON
6245c82 <Benjamin Kietzman> add SparseUnionBuilder and MakeBuilder case
8d4f36d <Benjamin Kietzman> add tests for building lists where the value builder has mutable type
351905d <Benjamin Kietzman> add test for lazily typed union builder
7902d12 <Benjamin Kietzman> first pass at updating DenseUnionBuilder
d20055a <Benjamin Kietzman> minor refactors, adding some asserts
@bkietz bkietz deleted the 5588-Better-support-for-building-UnionArrays branch February 25, 2021 16:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants