ARROW-4178: [C++] Fix TSan and UBSan errors #3334
Unfortunately, UBSan flags some flatbuffers usage that I haven't been able to suppress or work around. Typically when serializing a 0-size structure, it seems flatbuffers can call
24981ae to d151354
@xhochy @fsaintjacques This might interest you.
d151354 to ce4cf44
ce4cf44 to b836f73
Codecov Report
@@            Coverage Diff             @@
##           master    #3334      +/-   ##
==========================================
+ Coverage   88.63%   89.74%   +1.11%
==========================================
  Files         546      489      -57
  Lines       73054    69074    -3980
==========================================
- Hits        64749    61993    -2756
+ Misses       8198     7081    -1117
+ Partials      107        0     -107
if (left.null_count() > 0) {
  if (byte_width == 0) {
    // Special case 0-width data, as the data pointers may be null
    for (int64_t i = 0; i < left.length(); ++i) {
Should you use the BitmapEquals util instead?
I had forgotten about it. Too late :-/
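For zero-width data there are no value bytes to compare, so equality reduces to comparing validity bits. A minimal sketch of that comparison follows; `GetBitAt` and `ValidityBitmapsEqual` are hypothetical names for illustration and are not Arrow's `BitmapEquals` utility, whose signature is not shown in this thread. The sketch assumes LSB-first packed bitmaps.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not an Arrow API): reads bit i of an LSB-first
// packed bitmap.
inline bool GetBitAt(const uint8_t* bits, int64_t i) {
  return ((bits[i / 8] >> (i % 8)) & 1) != 0;
}

// Hypothetical helper: compares `length` bits of two bitmaps, each starting
// at its own bit offset. With length == 0 neither pointer is dereferenced,
// so null bitmaps are tolerated in that case.
inline bool ValidityBitmapsEqual(const uint8_t* left, int64_t left_offset,
                                 const uint8_t* right, int64_t right_offset,
                                 int64_t length) {
  assert(length >= 0);
  for (int64_t i = 0; i < length; ++i) {
    if (GetBitAt(left, left_offset + i) != GetBitAt(right, right_offset + i)) {
      return false;
    }
  }
  return true;
}
```

A bit-by-bit loop like this is simple but slow; a shared utility would typically compare whole bytes or words when the offsets are aligned.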
@@ -160,7 +160,8 @@ struct ByteArray {
 };

 inline bool operator==(const ByteArray& left, const ByteArray& right) {
-  return left.len == right.len && 0 == std::memcmp(left.ptr, right.ptr, left.len);
+  return left.len == right.len &&
+         (left.len == 0 || std::memcmp(left.ptr, right.ptr, left.len) == 0);
In theory you'd have to limit the memcmp to the smallest length.
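The guarded comparison from the diff can be sketched in isolation as follows. `ByteArraySketch` and `BytesEqual` are hypothetical stand-ins for illustration, not the Parquet `ByteArray` type itself. The sketch relies on the C library rule that pointer arguments to `memcmp` must be valid even when the size is zero, which is what UBSan reports on.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical, simplified stand-in for the ByteArray struct in the diff.
struct ByteArraySketch {
  uint32_t len;
  const uint8_t* ptr;
};

inline bool BytesEqual(const ByteArraySketch& left, const ByteArraySketch& right) {
  // Different lengths can never be equal, so memcmp is only reached
  // when both lengths match.
  if (left.len != right.len) return false;
  // Empty arrays may carry null pointers; skip memcmp entirely.
  if (left.len == 0) return true;
  assert(left.ptr != nullptr && right.ptr != nullptr);
  return std::memcmp(left.ptr, right.ptr, left.len) == 0;
}
```

Because the length check short-circuits first, the number of bytes passed to `memcmp` is the length of both operands at that point.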
@@ -83,7 +83,10 @@ inline int DecodePlain(const uint8_t* data, int64_t data_size, int num_values,
   if (data_size < bytes_to_decode) {
     ParquetException::EofException();
   }
-  memcpy(out, data, bytes_to_decode);
+  // If bytes_to_decode == 0, data could be null
+  if (bytes_to_decode > 0) {
Worth adding an ARROW_PREDICT_TRUE.
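A rough sketch of what the suggested hint could look like on the guarded copy. `SKETCH_PREDICT_TRUE` and `CopyBytes` are hypothetical names; the macro is assumed to behave like a typical `__builtin_expect` wrapper and is not copied from Arrow's headers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a branch-prediction hint such as
// ARROW_PREDICT_TRUE; falls back to a plain expression elsewhere.
#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_PREDICT_TRUE(x) (__builtin_expect(!!(x), 1))
#else
#define SKETCH_PREDICT_TRUE(x) (x)
#endif

// Copies nbytes from data to out, skipping memcpy for empty input so that
// null pointers are never passed to it.
inline void CopyBytes(uint8_t* out, const uint8_t* data, int64_t nbytes) {
  assert(nbytes >= 0);
  if (SKETCH_PREDICT_TRUE(nbytes > 0)) {
    std::memcpy(out, data, static_cast<std::size_t>(nbytes));
  }
}
```

The hint only tells the compiler that the non-empty case is the common one; the behavior is the same with or without it.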
memcpy(Head(), data, length);
size_ += length;
// If length == 0, data may be null
if (length > 0) {
Worth adding an ARROW_PREDICT_TRUE.
@@ -102,7 +102,7 @@ class TestPrimitiveReader : public ::testing::Test {
           &vresult[0] + total_values_read, &values_read));
       total_values_read += static_cast<int>(values_read);
       batch_actual += batch;
-      batch_size = std::max(batch_size * 2, 4096);
+      batch_size = std::min(1 << 24, std::max(batch_size * 2, 4096));
Worth adding kMinimumBatchSize and kMaximumBatchSize, bonus point for arrow::clamp.
+1, thanks for looking into these tests.
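The named-constant suggestion could look roughly like the sketch below. The constant names come from the review comment and their values mirror the literals in the diff; `Clamp` and `NextBatchSize` are hypothetical helpers written here for illustration (C++17's `std::clamp` in `<algorithm>` has the same semantics). The sketch assumes the cap exists to keep the repeated doubling from overflowing `int`, so it widens before multiplying.

```cpp
#include <cassert>
#include <cstdint>

// Names taken from the review comment; values mirror the diff's literals.
constexpr int64_t kMinimumBatchSize = 4096;
constexpr int64_t kMaximumBatchSize = int64_t{1} << 24;

// Hypothetical helper: returns value limited to the range [lo, hi].
template <typename T>
constexpr const T& Clamp(const T& value, const T& lo, const T& hi) {
  return value < lo ? lo : (hi < value ? hi : value);
}

// Doubles the batch size in 64-bit arithmetic, then clamps it so the
// result always fits back into int.
inline int NextBatchSize(int batch_size) {
  assert(batch_size >= 0);
  const int64_t doubled = int64_t{2} * batch_size;
  return static_cast<int>(Clamp(doubled, kMinimumBatchSize, kMaximumBatchSize));
}
```

With these bounds, `NextBatchSize(5000)` yields 10000, small inputs are raised to 4096, and anything at or above `1 << 23` saturates at `1 << 24`.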
Author: Antoine Pitrou
Closes apache#3334 from pitrou/ARROW-4178-tsan-ubsan-fixes and squashes the following commits:
b836f73 <Antoine Pitrou> ARROW-4178: Fix TSan and UBSan errors