Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add check for overlapping ranges of ARRAY and MAP vectors #10960

Closed
wants to merge 1 commit into from

Conversation

Yuhta
Copy link
Contributor

@Yuhta Yuhta commented Sep 10, 2024

Summary: We don't allow overlapping ranges in ARRAY and MAP vectors. However this is not clear in the code, so we add a method for user to check that, and also add the check to BaseVector::validate() method to make it clear it is not valid. Later we will also add the check to ArrayVectorBase constructor.

Differential Revision: D62212238

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Sep 10, 2024
Copy link

netlify bot commented Sep 10, 2024

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit ad6d14d
🔍 Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/66e8a5c943bb830008d2f712

@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D62212238

@Yuhta Yuhta changed the title Add check to forbid overlap ranges for ARRAY and MAP vectors Add check for overlapping ranges of ARRAY and MAP vectors Sep 10, 2024
Copy link
Contributor

@xiaoxmeng xiaoxmeng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Yuhta thanks for the change % minors.

contains offsets and sizes buffers and an elements vector. Offsets and sizes
are 32-bit integers.
contains offsets and sizes buffers and an elements vector. Offsets and sizes are
32-bit integers. The non-null non-empty ranges formed by offsets and sizes in a
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does offsets need to in order in elements vector? Thanks!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No it does not. But by default this is the case for most vectors, so we leverage that in the fast path.

velox/vector/ComplexVector.h Show resolved Hide resolved
template <bool kHasNulls>
vector_size_t ArrayVectorBase::nextNonEmpty(vector_size_t i) const {
while (i < size() &&
((kHasNulls && bits::isBitNull(rawNulls(), i)) || rawSizes_[i] <= 0)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rawSizes_ could be < 0? Thanks!

Copy link
Contributor Author

@Yuhta Yuhta Sep 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should not, but some application may generate such a thing as intermediate state. This is checked in ArrayVectorBase::validateArrayVectorBase before calling this check.

((kHasNulls && bits::isBitNull(rawNulls(), i)) || rawSizes_[i] <= 0)) {
++i;
}
return i;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we return std::optional if not found?

Copy link
Contributor Author

@Yuhta Yuhta Sep 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably no need, we will need to check i < size() once again anyway and this won't save us anything in terms of efficiency (it's probably even one more registry used) or code readability.

velox/vector/ComplexVector.cpp Outdated Show resolved Hide resolved
i = nextNonEmpty<kHasNulls>(i);
if (i >= size()) {
return false;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about

vector_size_t prev{-1};
for (;;) {
  const auto next = ...;
  ...
  if (prev != -1 && ...) {
    return false;
  }
  prev = next;
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated with curr and next. Feels a little bit weird since these are usually used for pointers.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you merge the code in l824 into the loop? curr{-1} can indicate the initial case? Thanks!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is call to rawOffsets_[curr] so curr = -1 is not valid here

velox/vector/ComplexVector.cpp Outdated Show resolved Hide resolved
indices.push_back(i);
}
}
std::sort(indices.begin(), indices.end(), [&](auto i, auto j) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rawSizes_ could be < 0? If not, why maybeHaveOverlappingRanges check is not sufficient? Thanks!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rawOffsets_ could be out of order

Yuhta added a commit to Yuhta/velox that referenced this pull request Sep 16, 2024
…cubator#10960)

Summary:
Pull Request resolved: facebookincubator#10960

We don't allow overlapping ranges in ARRAY and MAP vectors.  However this is not clear in the code, so we add a method for user to check that, and also add the check to `BaseVector::validate()` method to make it clear it is not valid.  Later we will also add the check to `ArrayVectorBase` constructor.

Differential Revision: D62212238
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D62212238

1 similar comment
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D62212238

Yuhta added a commit to Yuhta/velox that referenced this pull request Sep 16, 2024
…cubator#10960)

Summary:
Pull Request resolved: facebookincubator#10960

We don't allow overlapping ranges in ARRAY and MAP vectors.  However this is not clear in the code, so we add a method for user to check that, and also add the check to `BaseVector::validate()` method to make it clear it is not valid.  Later we will also add the check to `ArrayVectorBase` constructor.

Differential Revision: D62212238
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D62212238

Yuhta added a commit to Yuhta/velox that referenced this pull request Sep 16, 2024
…cubator#10960)

Summary:
Pull Request resolved: facebookincubator#10960

We don't allow overlapping ranges in ARRAY and MAP vectors.  However this is not clear in the code, so we add a method for user to check that.

In the future we will also add the check to `BaseVector::validate()` and `ArrayVectorBase` constructor to make it clear it is not allowed.

Differential Revision: D62212238
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D62212238

Yuhta added a commit to Yuhta/velox that referenced this pull request Sep 16, 2024
…cubator#10960)

Summary:
Pull Request resolved: facebookincubator#10960

We don't allow overlapping ranges in ARRAY and MAP vectors.  However this is not clear in the code, so we add a method for user to check that.

In the future we will also add the check to `BaseVector::validate()` and `ArrayVectorBase` constructor to make it clear it is not allowed.

Differential Revision: D62212238
template <bool kHasNulls>
bool maybeHaveOverlappingRanges() const;

// Return the next non-null non-empty array/map after `index'.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/Return/Returns/

i = nextNonEmpty<kHasNulls>(i);
if (i >= size()) {
return false;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you merge the code in l824 into the loop? curr{-1} can indicate the initial case? Thanks!

Copy link
Contributor

@xiaoxmeng xiaoxmeng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Yuhta LGTM. Thanks!

@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D62212238

Yuhta added a commit to Yuhta/velox that referenced this pull request Sep 16, 2024
…cubator#10960)

Summary:
Pull Request resolved: facebookincubator#10960

We don't allow overlapping ranges in ARRAY and MAP vectors.  However this is not clear in the code, so we add a method for user to check that.

In the future we will also add the check to `BaseVector::validate()` and `ArrayVectorBase` constructor to make it clear it is not allowed.

Reviewed By: xiaoxmeng

Differential Revision: D62212238
Yuhta added a commit to Yuhta/velox that referenced this pull request Sep 16, 2024
…cubator#10960)

Summary:
Pull Request resolved: facebookincubator#10960

We don't allow overlapping ranges in ARRAY and MAP vectors.  However this is not clear in the code, so we add a method for user to check that.

In the future we will also add the check to `BaseVector::validate()` and `ArrayVectorBase` constructor to make it clear it is not allowed.

Reviewed By: xiaoxmeng

Differential Revision: D62212238
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D62212238

Yuhta added a commit to Yuhta/velox that referenced this pull request Sep 16, 2024
…cubator#10960)

Summary:
Pull Request resolved: facebookincubator#10960

We don't allow overlapping ranges in ARRAY and MAP vectors.  However this is not clear in the code, so we add a method for user to check that.

In the future we will also add the check to `BaseVector::validate()` and `ArrayVectorBase` constructor to make it clear it is not allowed.

Reviewed By: xiaoxmeng

Differential Revision: D62212238
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D62212238

@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D62212238

Yuhta added a commit to Yuhta/velox that referenced this pull request Sep 16, 2024
…cubator#10960)

Summary:
Pull Request resolved: facebookincubator#10960

We don't allow overlapping ranges in ARRAY and MAP vectors.  However this is not clear in the code, so we add a method for user to check that.

In the future we will also add the check to `BaseVector::validate()` and `ArrayVectorBase` constructor to make it clear it is not allowed.

Reviewed By: xiaoxmeng

Differential Revision: D62212238
…cubator#10960)

Summary:
Pull Request resolved: facebookincubator#10960

We don't allow overlapping ranges in ARRAY and MAP vectors.  However this is not clear in the code, so we add a method for user to check that.

In the future we will also add the check to `BaseVector::validate()` and `ArrayVectorBase` constructor to make it clear it is not allowed.

Reviewed By: xiaoxmeng

Differential Revision: D62212238
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D62212238

@facebook-github-bot
Copy link
Contributor

This pull request has been merged in 0a9c481.

Copy link

Conbench analyzed the 1 benchmark run on commit 0a9c4812.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. fb-exported Merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants