Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add RowVector::pushDictionaryToRowVectorLeaves utility #10892

Closed
wants to merge 1 commit into from

Conversation

Yuhta
Copy link
Contributor

@Yuhta Yuhta commented Aug 29, 2024

Summary:
In Nimble writer we need to merge multiple dictionary layers in
different levels of a RowVector into one dictionary layer at the lowest
possible vectors, so that ArrayWithOffsets can be encoded efficiently. Add a
utility to achieve this.

Differential Revision: D61988576

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Aug 29, 2024
Copy link

netlify bot commented Aug 29, 2024

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit 92876be
🔍 Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/66db09860afc720008f14445

@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D61988576

}
if (!wrappers.back().indices) {
VELOX_CHECK_NULL(wrappers.back().nulls);
std::vector<BufferPtr> wrapInfos(wrappers.size());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apparently wrapInfos are just buffer ptrs for indices. Should we call them indiceHolders/indiceBuffers?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could, but wrapInfos is also aligned with the method name BaseVector::wrapInfo so I would like keep that.

@@ -245,6 +245,15 @@ class RowVector : public BaseVector {
/// Note : If the child is null, then it will stay null after the resize.
void resize(vector_size_t newSize, bool setNotNull = true) override;

/// Push all dictionary encoding to the leave vectors of a RowVector tree. If
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, it would still really help to explicitly mention in the comment that this includes flat map semantics and doesn't push down for other complex types.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The flat map is going to be in the type of RowVector for this use case, so I will just update the comment to emphasize that it is only traversing the RowVector nodes.

velox/vector/ComplexVector.cpp Outdated Show resolved Hide resolved
velox/vector/ComplexVector.cpp Show resolved Hide resolved
for (vector_size_t j = 0; j < size; ++j) {
auto index = j;
bool isNull = false;
for (int i = 0; i < wrappers.size(); ++i) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we can't simply combine the null buffers bit wise because we need to also include non-null indices pointing to null alphabets. And combining to "combined nulls" at each layer first would require additional memory.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes the order of nulls bits in each layer is different so we cannot do batch combination.

velox/vector/ComplexVector.cpp Show resolved Hide resolved
@@ -681,6 +681,125 @@ void RowVector::resize(vector_size_t newSize, bool setNotNull) {
}
}

namespace {

struct Wrapper {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LOL, I can't think of a better name either. It's like a Peer/ParentDictionaryStates at the level of the traversal. Would it be appropriate to call it DictionaryVisitor lol.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DictionaryVisitor is not appropriate IMO, a visitor is unique during the traversal, this is created per dictionary node.

@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D61988576

Yuhta added a commit to Yuhta/velox that referenced this pull request Sep 6, 2024
…ator#10892)

Summary:
Pull Request resolved: facebookincubator#10892

In Nimble writer we need to merge multiple dictionary layers in
different levels of a `RowVector` into one dictionary layer at the lowest
possible vectors, so that `ArrayWithOffsets` can be encoded efficiently.  Add a
utility to achieve this.

Reviewed By: HuamengJiang

Differential Revision: D61988576
…ator#10892)

Summary:
Pull Request resolved: facebookincubator#10892

In Nimble writer we need to merge multiple dictionary layers in
different levels of a `RowVector` into one dictionary layer at the lowest
possible vectors, so that `ArrayWithOffsets` can be encoded efficiently.  Add a
utility to achieve this.

Reviewed By: HuamengJiang

Differential Revision: D61988576
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D61988576

@facebook-github-bot
Copy link
Contributor

This pull request has been merged in e1d8307.

Copy link

Conbench analyzed the 1 benchmark run on commit e1d83072.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. fb-exported Merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants