-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add RowVector::pushDictionaryToRowVectorLeaves utility #10892
Conversation
✅ Deploy Preview for meta-velox canceled.
|
This pull request was exported from Phabricator. Differential Revision: D61988576 |
velox/vector/ComplexVector.cpp
Outdated
} | ||
if (!wrappers.back().indices) { | ||
VELOX_CHECK_NULL(wrappers.back().nulls); | ||
std::vector<BufferPtr> wrapInfos(wrappers.size()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apparently wrapInfos are just buffer ptrs for indices. Should we call them indiceHolders/indiceBuffers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could, but wrapInfos
is also aligned with the method name BaseVector::wrapInfo
so I would like keep that.
velox/vector/ComplexVector.h
Outdated
@@ -245,6 +245,15 @@ class RowVector : public BaseVector { | |||
/// Note : If the child is null, then it will stay null after the resize. | |||
void resize(vector_size_t newSize, bool setNotNull = true) override; | |||
|
|||
/// Push all dictionary encoding to the leave vectors of a RowVector tree. If |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO, it would still really help to explicitly mention in the comment that this includes flat map semantics and doesn't push down for other complex types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The flat map is going to be in the type of RowVector
for this use case, so I will just update the comment to emphasize that it is only traversing the RowVector
nodes.
velox/vector/ComplexVector.cpp
Outdated
for (vector_size_t j = 0; j < size; ++j) { | ||
auto index = j; | ||
bool isNull = false; | ||
for (int i = 0; i < wrappers.size(); ++i) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess we can't simply combine the null buffers bit wise because we need to also include non-null indices pointing to null alphabets. And combining to "combined nulls" at each layer first would require additional memory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes the order of nulls bits in each layer is different so we cannot do batch combination.
@@ -681,6 +681,125 @@ void RowVector::resize(vector_size_t newSize, bool setNotNull) { | |||
} | |||
} | |||
|
|||
namespace { | |||
|
|||
struct Wrapper { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LOL, I can't think of a better name either. It's like a Peer/ParentDictionaryStates at the level of the traversal. Would it be appropriate to call it DictionaryVisitor lol.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DictionaryVisitor
is not appropriate IMO, a visitor is unique during the traversal, this is created per dictionary node.
This pull request was exported from Phabricator. Differential Revision: D61988576 |
70097f1
to
e0e3dc2
Compare
…ator#10892) Summary: Pull Request resolved: facebookincubator#10892 In Nimble writer we need to merge multiple dictionary layers in different levels of a `RowVector` into one dictionary layer at the lowest possible vectors, so that `ArrayWithOffsets` can be encoded efficiently. Add a utility to achieve this. Reviewed By: HuamengJiang Differential Revision: D61988576
…ator#10892) Summary: Pull Request resolved: facebookincubator#10892 In Nimble writer we need to merge multiple dictionary layers in different levels of a `RowVector` into one dictionary layer at the lowest possible vectors, so that `ArrayWithOffsets` can be encoded efficiently. Add a utility to achieve this. Reviewed By: HuamengJiang Differential Revision: D61988576
This pull request was exported from Phabricator. Differential Revision: D61988576 |
e0e3dc2
to
92876be
Compare
This pull request has been merged in e1d8307. |
Conbench analyzed the 1 benchmark run on commit There were no benchmark performance regressions. 🎉 The full Conbench report has more details. |
Summary:
In Nimble writer we need to merge multiple dictionary layers in
different levels of a
RowVector
into one dictionary layer at the lowestpossible vectors, so that
ArrayWithOffsets
can be encoded efficiently. Add autility to achieve this.
Differential Revision: D61988576