Add check for overlapping ranges of ARRAY and MAP vectors #10960
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -809,31 +809,59 @@ VectorPtr RowVector::pushDictionaryToRowVectorLeaves(const VectorPtr& input) { | |
wrappers, input->size(), input, input->pool()); | ||
} | ||
|
||
void ArrayVectorBase::checkRanges() const { | ||
std::unordered_map<vector_size_t, vector_size_t> seenElements; | ||
seenElements.reserve(size()); | ||
template <bool kHasNulls> | ||
vector_size_t ArrayVectorBase::nextNonEmpty(vector_size_t i) const { | ||
while (i < size() && | ||
((kHasNulls && bits::isBitNull(rawNulls(), i)) || rawSizes_[i] <= 0)) { | ||
Review comment: Could rawSizes_ be < 0? Thanks!
Reply: It should not be, but some applications may generate such a state as an intermediate result. This is checked in |
||
++i; | ||
} | ||
return i; | ||
Review comment: Shall we return std::optional if not found?
Reply: Probably no need, we will need to check |
||
} | ||
|
||
template <bool kHasNulls> | ||
bool ArrayVectorBase::maybeHaveOverlappingRanges() const { | ||
vector_size_t curr = 0; | ||
curr = nextNonEmpty<kHasNulls>(curr); | ||
if (curr >= size()) { | ||
return false; | ||
} | ||
Review comment: How about
Reply: Updated with
Review comment: Can you merge the code at l824 into the loop? curr{-1} can indicate the initial case. Thanks!
Reply: There is a call to |
||
for (;;) { | ||
auto next = nextNonEmpty<kHasNulls>(curr + 1); | ||
if (next >= size()) { | ||
return false; | ||
} | ||
// This also implicitly ensures rawOffsets_[curr] <= rawOffsets_[next]. | ||
if (rawOffsets_[curr] + rawSizes_[curr] > rawOffsets_[next]) { | ||
return true; | ||
} | ||
curr = next; | ||
} | ||
} | ||
|
||
bool ArrayVectorBase::hasOverlappingRanges() const { | ||
if (!(rawNulls() ? maybeHaveOverlappingRanges<true>() | ||
: maybeHaveOverlappingRanges<false>())) { | ||
return false; | ||
} | ||
std::vector<vector_size_t> indices; | ||
indices.reserve(size()); | ||
for (vector_size_t i = 0; i < size(); ++i) { | ||
auto size = sizeAt(i); | ||
auto offset = offsetAt(i); | ||
|
||
for (vector_size_t j = 0; j < size; ++j) { | ||
auto it = seenElements.find(offset + j); | ||
if (it != seenElements.end()) { | ||
VELOX_FAIL( | ||
"checkRanges() found overlap at idx {}: element {} has offset {} " | ||
"and size {}, and element {} has offset {} and size {}.", | ||
offset + j, | ||
it->second, | ||
offsetAt(it->second), | ||
sizeAt(it->second), | ||
i, | ||
offset, | ||
size); | ||
} | ||
seenElements.emplace(offset + j, i); | ||
const bool isNull = rawNulls() && bits::isBitNull(rawNulls(), i); | ||
if (!isNull && rawSizes_[i] > 0) { | ||
indices.push_back(i); | ||
} | ||
} | ||
std::sort(indices.begin(), indices.end(), [&](auto i, auto j) { | ||
Review comment: Could rawSizes_ be < 0? If not, why is the maybeHaveOverlappingRanges check not sufficient? Thanks!
|
||
return rawOffsets_[i] < rawOffsets_[j]; | ||
}); | ||
for (vector_size_t i = 1; i < indices.size(); ++i) { | ||
auto j = indices[i - 1]; | ||
auto k = indices[i]; | ||
if (rawOffsets_[j] + rawSizes_[j] > rawOffsets_[k]) { | ||
return true; | ||
} | ||
} | ||
return false; | ||
} | ||
|
||
void ArrayVectorBase::validateArrayVectorBase( | ||
|
Review comment: Do offsets need to be in order in the elements vector? Thanks!
Reply: No, they do not. But by default this is the case for most vectors, so we leverage that in the fast path.
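
To illustrate why the fast path can only answer "maybe", here is a minimal standalone sketch of the same two-phase idea. It is not the Velox code: `Range`, `maybeOverlapping`, and `hasOverlapping` are hypothetical names, and plain `std::vector` stands in for the offsets, sizes, and nulls buffers. The fast path compares each live row with the next live row in row order, so it is exact only when offsets are already ascending; otherwise the slow path sorts by offset and rechecks.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one row of an ARRAY or MAP vector.
struct Range {
  int32_t offset;
  int32_t size;
  bool isNull;
};

// A row takes part in the check only if it is non-null and non-empty.
inline bool isLive(const Range& r) {
  return !r.isNull && r.size > 0;
}

// True if range a extends past the start of range b.
inline bool runsInto(const Range& a, const Range& b) {
  return static_cast<int64_t>(a.offset) + a.size > b.offset;
}

// Fast path: one pass in row order. Returning false proves there is no
// overlap (offsets are ascending and adjacent ranges are disjoint).
// Returning true only means an overlap is possible.
bool maybeOverlapping(const std::vector<Range>& rows) {
  const Range* prev = nullptr;
  for (const auto& row : rows) {
    if (!isLive(row)) {
      continue;
    }
    if (prev != nullptr && runsInto(*prev, row)) {
      return true;
    }
    prev = &row;
  }
  return false;
}

// Slow path: runs only when the fast path is inconclusive. Sorts the live
// rows by offset, then compares neighbours.
bool hasOverlapping(const std::vector<Range>& rows) {
  if (!maybeOverlapping(rows)) {
    return false;
  }
  std::vector<Range> live;
  live.reserve(rows.size());
  for (const auto& row : rows) {
    if (isLive(row)) {
      live.push_back(row);
    }
  }
  std::sort(live.begin(), live.end(), [](const Range& a, const Range& b) {
    return a.offset < b.offset;
  });
  for (std::size_t i = 1; i < live.size(); ++i) {
    if (runsInto(live[i - 1], live[i])) {
      return true;
    }
  }
  return false;
}
```

In this sketch, rows `{offset 10, size 2}` followed by `{offset 0, size 5}` make `maybeOverlapping` return true because the offsets are out of order, yet `hasOverlapping` returns false once the rows are sorted. That is the case the reply above refers to: out-of-order offsets are legal, so the adjacent-row scan alone is not sufficient.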