-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[C++] run_end_encode
segfaults on arrays with an all-set validity buffer
#36708
Comments
cc @felipecrv |
Chunked arrays are handled by the kernel and even part of the unit tests, but the repro code above does indeed trigger the SIGSEGV. I isolated the cause of the SIGSEGV to some expectations in the allocation code not being preserved. Result<std::shared_ptr<ArrayData>> PreallocateValuesArray(
const std::shared_ptr<DataType>& value_type, bool has_validity_buffer, int64_t length,
int64_t null_count, MemoryPool* pool, int64_t data_buffer_size) {
std::vector<std::shared_ptr<Buffer>> values_data_buffers;
std::shared_ptr<Buffer> validity_buffer = NULLPTR;
if (has_validity_buffer) {
ARROW_ASSIGN_OR_RAISE(validity_buffer, AllocateEmptyBitmap(length, pool));
DCHECK(validity_buffer);
}
ARROW_ASSIGN_OR_RAISE(auto values_buffer, AllocateValuesBuffer(length, *value_type,
pool, data_buffer_size));
if (is_base_binary_like(value_type->id())) {
const int offset_byte_width = offset_bit_width(value_type->id()) / 8;
ARROW_ASSIGN_OR_RAISE(auto offsets_buffer,
AllocateBuffer((length + 1) * offset_byte_width, pool));
// Ensure the first offset is zero
memset(offsets_buffer->mutable_data(), 0, offset_byte_width);
offsets_buffer->ZeroPadding();
values_data_buffers = {validity_buffer, std::move(offsets_buffer),
std::move(values_buffer)};
} else {
values_data_buffers = {validity_buffer, std::move(values_buffer)};
}
auto data = ArrayData::Make(value_type, length, values_data_buffers, null_count);
DCHECK(!has_validity_buffer || validity_buffer.use_count() == 2);
DCHECK(!has_validity_buffer || validity_buffer != NULLPTR);
DCHECK(!has_validity_buffer || data->buffers[0] != NULLPTR);
return data;
}
|
Found the reason: static inline void AdjustNonNullable(Type::type type_id, int64_t length,
std::vector<std::shared_ptr<Buffer>>* buffers,
int64_t* null_count) {
if (type_id == Type::NA) {
*null_count = length;
(*buffers)[0] = nullptr;
} else if (internal::HasValidityBitmap(type_id)) {
if (*null_count == 0) {
// In case there are no nulls, don't keep an allocated null bitmap around
(*buffers)[0] = nullptr;
} else if (*null_count == kUnknownNullCount && buffers->at(0) == nullptr) {
// Conversely, if no null bitmap is provided, set the null count to 0
*null_count = 0;
}
} else {
*null_count = 0;
}
} |
@coady the issue is triggered by giving an input array without nulls that has a non-null validity bitmap buffer. The PR I've sent fixes that. @mapleFU it might be nice to investigate if the reading of Parquet files is unnecessarily producing validity buffers when |
But to be clear: validity bitmap set when |
I guess not:
|
run_end_encode
segfaults on chunked arrays.run_end_encode
segfaults on chunked arrays.
@westonpace for reference, the title of the issue should probably be: "run_end_encode segfaults on arrays with an all-set validity buffer" |
run_end_encode
segfaults on chunked arrays.run_end_encode
segfaults on arrays with an all-set validity buffer
Updated the title. |
…ke sense (#36740) ### Rationale for this change When `has_validity_buffer` is true, we expect validity buffers to be allocated, but if null_count is calculated and ends up being 0, `ArrayData::Make()` will sneakily remove the validity buffer from the physical array for us and the assumption that it exists stops holding and causes a crash. Forcing `null_count` calculation with `input.GetNullCount()` ensures `has_validity_buffer` won't be `true` if the `null_count` on the input ends up being 0. ### What changes are included in this PR? The fix and tests to reproduce it. ### Are these changes tested? Yes. ### Are there any user-facing changes? No. * Closes: #36708 Authored-by: Felipe Oliveira Carvalho <[email protected]> Signed-off-by: Sutou Kouhei <[email protected]>
…ons make sense (apache#36740) ### Rationale for this change When `has_validity_buffer` is true, we expect validity buffers to be allocated, but if null_count is calculated and ends up being 0, `ArrayData::Make()` will sneakily remove the validity buffer from the physical array for us and the assumption that it exists stops holding and causes a crash. Forcing `null_count` calculation with `input.GetNullCount()` ensures `has_validity_buffer` won't be `true` if the `null_count` on the input ends up being 0. ### What changes are included in this PR? The fix and tests to reproduce it. ### Are these changes tested? Yes. ### Are there any user-facing changes? No. * Closes: apache#36708 Authored-by: Felipe Oliveira Carvalho <[email protected]> Signed-off-by: Sutou Kouhei <[email protected]>
…ons make sense (apache#36740) ### Rationale for this change When `has_validity_buffer` is true, we expect validity buffers to be allocated, but if null_count is calculated and ends up being 0, `ArrayData::Make()` will sneakily remove the validity buffer from the physical array for us and the assumption that it exists stops holding and causes a crash. Forcing `null_count` calculation with `input.GetNullCount()` ensures `has_validity_buffer` won't be `true` if the `null_count` on the input ends up being 0. ### What changes are included in this PR? The fix and tests to reproduce it. ### Are these changes tested? Yes. ### Are there any user-facing changes? No. * Closes: apache#36708 Authored-by: Felipe Oliveira Carvalho <[email protected]> Signed-off-by: Sutou Kouhei <[email protected]>
Describe the bug, including details regarding any error messages, version, and platform.
Seems to only reproduce when the array is read. Tested on the latest nightly build: 13.0.0.dev516.
Component(s)
C++, Python
The text was updated successfully, but these errors were encountered: