Add check for overlapping ranges of ARRAY and MAP vectors #10960
✅ Deploy Preview for meta-velox canceled.
This pull request was exported from Phabricator. Differential Revision: D62212238
@Yuhta thanks for the change % minors.
contains offsets and sizes buffers and an elements vector. Offsets and sizes
are 32-bit integers.
contains offsets and sizes buffers and an elements vector. Offsets and sizes are
32-bit integers. The non-null non-empty ranges formed by offsets and sizes in a
Do offsets need to be in order in the elements vector? Thanks!
No, they do not. But by default this is the case for most vectors, so we leverage that in the fast path.
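To make the invariant concrete, here is a minimal sketch of a brute-force reference check, written against plain `std::vector` buffers rather than Velox types. The function name `hasOverlappingRangesBruteForce` is hypothetical and nulls are left out for brevity; it only illustrates that out-of-order offsets are fine as long as no two non-empty ranges intersect.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical reference check (not Velox code): returns true if any two
// non-empty ranges [offset, offset + size) intersect. O(n^2), for clarity only.
bool hasOverlappingRangesBruteForce(
    const std::vector<int32_t>& offsets,
    const std::vector<int32_t>& sizes) {
  for (size_t i = 0; i < sizes.size(); ++i) {
    if (sizes[i] <= 0) {
      continue; // Empty rows never overlap anything.
    }
    const int64_t endI = static_cast<int64_t>(offsets[i]) + sizes[i];
    for (size_t j = i + 1; j < sizes.size(); ++j) {
      if (sizes[j] <= 0) {
        continue;
      }
      const int64_t endJ = static_cast<int64_t>(offsets[j]) + sizes[j];
      // Two half-open ranges intersect when each starts before the other ends.
      if (offsets[i] < endJ && offsets[j] < endI) {
        return true;
      }
    }
  }
  return false;
}
```

For example, offsets `{3, 0}` with sizes `{2, 3}` are out of order but valid, while offsets `{0, 2}` with sizes `{3, 2}` share element 2 and are not.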
template <bool kHasNulls>
vector_size_t ArrayVectorBase::nextNonEmpty(vector_size_t i) const {
  while (i < size() &&
         ((kHasNulls && bits::isBitNull(rawNulls(), i)) || rawSizes_[i] <= 0)) {
Could rawSizes_ be < 0? Thanks!
It should not, but some applications may generate such values as an intermediate state. This is checked in ArrayVectorBase::validateArrayVectorBase before calling this check.
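As an illustration of the skipping logic discussed here, the following is a sketch over plain `std::vector` buffers. The name `nextNonEmptySketch` and the `std::vector<bool>` null mask are assumptions made for the example; the actual method reads the vector's nulls bitmap and raw sizes buffer.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-alone version of the skip loop: starting at row i,
// returns the first row that is non-null and has a positive size, or
// sizes.size() if there is none. Rows with size <= 0 are treated as empty,
// which also tolerates negative sizes left behind as intermediate state.
template <bool kHasNulls>
int32_t nextNonEmptySketch(
    const std::vector<bool>& isNull,
    const std::vector<int32_t>& sizes,
    int32_t i) {
  const auto numRows = static_cast<int32_t>(sizes.size());
  // isNull is only read when kHasNulls is true, so it may be empty otherwise.
  while (i < numRows && ((kHasNulls && isNull[i]) || sizes[i] <= 0)) {
    ++i;
  }
  return i;
}
```

Callers compare the result against the row count to detect "not found", which is the same convention the snippet under review uses.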
         ((kHasNulls && bits::isBitNull(rawNulls(), i)) || rawSizes_[i] <= 0)) {
    ++i;
  }
  return i;
Shall we return std::optional if not found?
Probably no need: we would have to check i < size() once again anyway, and this would not save us anything in terms of efficiency (it probably even uses one more register) or code readability.
  i = nextNonEmpty<kHasNulls>(i);
  if (i >= size()) {
    return false;
  }
How about:
vector_size_t prev{-1};
for (;;) {
  const auto next = ...;
  ...
  if (prev != -1 && ...) {
    return false;
  }
  prev = next;
}
Updated with curr and next. Feels a little bit weird since these are usually used for pointers.
Can you merge the code at line 824 into the loop? curr{-1} could indicate the initial case. Thanks!
There is a call to rawOffsets_[curr], so curr = -1 is not valid here.
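The loop shape being discussed can be sketched as follows. This is a simplified illustration over plain `std::vector` buffers without nulls, and the names `nextNonEmptyRow` and `maybeHaveOverlappingRangesSketch` are hypothetical; it is not the code in the PR. It shows why `curr` is resolved to a real row before the loop: `offsets[curr]` is read on every iteration, so a `-1` sentinel would index out of bounds.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: first row at or after i with a positive size.
int32_t nextNonEmptyRow(const std::vector<int32_t>& sizes, int32_t i) {
  const auto numRows = static_cast<int32_t>(sizes.size());
  while (i < numRows && sizes[i] <= 0) {
    ++i;
  }
  return i;
}

// Fast-path sketch: returns false only when non-empty rows appear in
// increasing offset order with no overlap. A true result means "maybe":
// the ranges are either overlapping or merely out of order.
bool maybeHaveOverlappingRangesSketch(
    const std::vector<int32_t>& offsets,
    const std::vector<int32_t>& sizes) {
  const auto numRows = static_cast<int32_t>(sizes.size());
  // curr must point at a real row before the loop, since offsets[curr]
  // is dereferenced below.
  int32_t curr = nextNonEmptyRow(sizes, 0);
  if (curr >= numRows) {
    return false;
  }
  for (;;) {
    const int32_t next = nextNonEmptyRow(sizes, curr + 1);
    if (next >= numRows) {
      return false;
    }
    // Widen to 64 bits so offset + size cannot overflow.
    if (static_cast<int64_t>(offsets[curr]) + sizes[curr] > offsets[next]) {
      return true;
    }
    curr = next;
  }
}
```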
      indices.push_back(i);
    }
  }
  std::sort(indices.begin(), indices.end(), [&](auto i, auto j) {
Could rawSizes_ be < 0? If not, why is the maybeHaveOverlappingRanges check not sufficient? Thanks!
rawOffsets_ could be out of order.
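A sketch of why a sorting pass is needed when offsets are out of order: the in-order scan can only say "maybe", so the definitive answer comes from ordering the non-empty rows by offset and comparing neighbours. The function below is a hypothetical stand-alone version over plain `std::vector` buffers, with nulls omitted, not the PR's implementation.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical slow-path check: collects rows with positive size, sorts them
// by offset, and reports an overlap if any range ends after the next begins.
bool hasOverlappingRangesSorted(
    const std::vector<int32_t>& offsets,
    const std::vector<int32_t>& sizes) {
  std::vector<int32_t> indices;
  for (int32_t i = 0; i < static_cast<int32_t>(sizes.size()); ++i) {
    if (sizes[i] > 0) {
      indices.push_back(i);
    }
  }
  std::sort(indices.begin(), indices.end(), [&](int32_t i, int32_t j) {
    return offsets[i] < offsets[j];
  });
  for (size_t k = 1; k < indices.size(); ++k) {
    const int32_t prev = indices[k - 1];
    const int32_t curr = indices[k];
    // Widen to 64 bits so offset + size cannot overflow.
    if (static_cast<int64_t>(offsets[prev]) + sizes[prev] > offsets[curr]) {
      return true;
    }
  }
  return false;
}
```

With offsets `{5, 0}` and sizes `{1, 2}` the in-order scan would flag a possible overlap, but this check returns false because the ranges `[0, 2)` and `[5, 6)` are disjoint once sorted.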
…cubator#10960) Summary: Pull Request resolved: facebookincubator#10960 We don't allow overlapping ranges in ARRAY and MAP vectors. However this is not clear in the code, so we add a method for user to check that, and also add the check to `BaseVector::validate()` method to make it clear it is not valid. Later we will also add the check to `ArrayVectorBase` constructor. Differential Revision: D62212238
876919c to 851c3e1
851c3e1 to 396b8ea
…cubator#10960) Summary: Pull Request resolved: facebookincubator#10960 We don't allow overlapping ranges in ARRAY and MAP vectors. However this is not clear in the code, so we add a method for user to check that. In the future we will also add the check to `BaseVector::validate()` and `ArrayVectorBase` constructor to make it clear it is not allowed. Differential Revision: D62212238
396b8ea to d3bcb9e
d3bcb9e to 77e95dd
velox/vector/ComplexVector.h
Outdated
  template <bool kHasNulls>
  bool maybeHaveOverlappingRanges() const;

  // Return the next non-null non-empty array/map after `index'.
s/Return/Returns/
@Yuhta LGTM. Thanks!
…cubator#10960) Summary: Pull Request resolved: facebookincubator#10960 We don't allow overlapping ranges in ARRAY and MAP vectors. However this is not clear in the code, so we add a method for user to check that. In the future we will also add the check to `BaseVector::validate()` and `ArrayVectorBase` constructor to make it clear it is not allowed. Reviewed By: xiaoxmeng Differential Revision: D62212238
77e95dd to 1387e1f
1387e1f to e021540
e021540 to 888c044
888c044 to 78e74e8
78e74e8 to ad6d14d
This pull request has been merged in 0a9c481.
Conbench analyzed the 1 benchmark run on the commit. There were no benchmark performance regressions. 🎉 The full Conbench report has more details.
Summary: We don't allow overlapping ranges in ARRAY and MAP vectors. However this is not clear in the code, so we add a method for user to check that, and also add the check to `BaseVector::validate()` method to make it clear it is not valid. Later we will also add the check to `ArrayVectorBase` constructor.
Differential Revision: D62212238