Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-4178: [C++] Fix TSan and UBSan errors #3334

Closed
wants to merge 1 commit into from

Conversation

pitrou
Copy link
Member

@pitrou pitrou commented Jan 7, 2019

No description provided.

@pitrou
Copy link
Member Author

pitrou commented Jan 7, 2019

Unfortunately, UBSan flags some flatbuffers usage that I haven't been able to suppress or work around. Typically when serializing a 0-size structure, it seems flatbuffers can call memcpy with a null pointer (which is technically undefined behaviour as per the C++ spec). Here is a typical stack snippet:

/home/antoine/miniconda3/envs/pyarrow/include/flatbuffers/flatbuffers.h:617:18: runtime error: null pointer passed as argument 2, which is declared to never be null
/usr/include/string.h:43:28: note: nonnull attribute specified here
    #0 0x7f6d637ba4e7 in flatbuffers::vector_downward::push(unsigned char const*, unsigned long) /home/antoine/miniconda3/envs/pyarrow/include/flatbuffers/flatbuffers.h:617:5
    #1 0x7f6d637b9cea in flatbuffers::FlatBufferBuilder::PushBytes(unsigned char const*, unsigned long) /home/antoine/miniconda3/envs/pyarrow/include/flatbuffers/flatbuffers.h:813:10
    #2 0x7f6d638e4308 in flatbuffers::Offset<flatbuffers::Vector<org::apache::arrow::flatbuf::Block const*> > flatbuffers::FlatBufferBuilder::CreateVectorOfStructs<org::apache::arrow::flatbuf::Block>(org::apache::arrow::flatbuf::Block const*, unsigned long) /home/antoine/miniconda3/envs/pyarrow/include/flatbuffers/flatbuffers.h:1208:5
    #3 0x7f6d638e235f in flatbuffers::Offset<flatbuffers::Vector<org::apache::arrow::flatbuf::Block const*> > flatbuffers::FlatBufferBuilder::CreateVectorOfStructs<org::apache::arrow::flatbuf::Block>(std::vector<org::apache::arrow::flatbuf::Block, std::allocator<org::apache::arrow::flatbuf::Block> > const&) /home/antoine/miniconda3/envs/pyarrow/include/flatbuffers/flatbuffers.h:1259:12
    #4 0x7f6d638c6e01 in arrow::ipc::internal::FileBlocksToFlatbuffer(flatbuffers::FlatBufferBuilder&, std::vector<arrow::ipc::internal::FileBlock, std::allocator<arrow::ipc::internal::FileBlock> > const&) /home/antoine/arrow/cpp/build-test/../src/arrow/ipc/metadata-internal.cc:804:14

@pitrou pitrou force-pushed the ARROW-4178-tsan-ubsan-fixes branch 2 times, most recently from 24981ae to d151354 Compare January 7, 2019 18:39
@pitrou
Copy link
Member Author

pitrou commented Jan 7, 2019

@xhochy @fsaintjacques This might interest you.

@pitrou pitrou force-pushed the ARROW-4178-tsan-ubsan-fixes branch from d151354 to ce4cf44 Compare January 7, 2019 19:01
@pitrou pitrou force-pushed the ARROW-4178-tsan-ubsan-fixes branch from ce4cf44 to b836f73 Compare January 7, 2019 19:44
@codecov-io
Copy link

Codecov Report

Merging #3334 into master will increase coverage by 1.11%.
The diff coverage is 84.52%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #3334      +/-   ##
==========================================
+ Coverage   88.63%   89.74%   +1.11%     
==========================================
  Files         546      489      -57     
  Lines       73054    69074    -3980     
==========================================
- Hits        64749    61993    -2756     
+ Misses       8198     7081    -1117     
+ Partials      107        0     -107
Impacted Files Coverage Δ
cpp/src/arrow/compute/kernels/cast.cc 91.87% <ø> (ø) ⬆️
cpp/src/arrow/util/thread-pool-test.cc 98.14% <ø> (ø) ⬆️
cpp/src/arrow/util/bit-util-test.cc 99.56% <100%> (ø) ⬆️
cpp/src/arrow/util/parsing.h 95.6% <100%> (ø) ⬆️
cpp/src/parquet/util/memory.cc 86.53% <100%> (+0.08%) ⬆️
cpp/src/parquet/types.h 100% <100%> (ø) ⬆️
cpp/src/arrow/csv/column-builder.cc 95.45% <100%> (-1.95%) ⬇️
cpp/src/arrow/util/int-util.h 100% <100%> (ø)
cpp/src/parquet/arrow/reader.cc 86.94% <100%> (+0.14%) ⬆️
cpp/src/arrow/util/decimal.cc 87.92% <100%> (ø) ⬆️
... and 70 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1aecb98...b836f73. Read the comment docs.

if (left.null_count() > 0) {
if (byte_width == 0) {
// Special case 0-width data, as the data pointers may be null
for (int64_t i = 0; i < left.length(); ++i) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should you use the BitmapEquals util instead?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had forgotten about it. Too late :-/

@@ -160,7 +160,8 @@ struct ByteArray {
};

inline bool operator==(const ByteArray& left, const ByteArray& right) {
return left.len == right.len && 0 == std::memcmp(left.ptr, right.ptr, left.len);
return left.len == right.len &&
(left.len == 0 || std::memcmp(left.ptr, right.ptr, left.len) == 0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In theory you'd have to limit the memcmp to the smallest length.

@@ -83,7 +83,10 @@ inline int DecodePlain(const uint8_t* data, int64_t data_size, int num_values,
if (data_size < bytes_to_decode) {
ParquetException::EofException();
}
memcpy(out, data, bytes_to_decode);
// If bytes_to_decode == 0, data could be null
if (bytes_to_decode > 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth adding a ARROW_PREDICT_TRUE.

memcpy(Head(), data, length);
size_ += length;
// If length == 0, data may be null
if (length > 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth adding a ARROW_PREDICT_TRUE.

@@ -102,7 +102,7 @@ class TestPrimitiveReader : public ::testing::Test {
&vresult[0] + total_values_read, &values_read));
total_values_read += static_cast<int>(values_read);
batch_actual += batch;
batch_size = std::max(batch_size * 2, 4096);
batch_size = std::min(1 << 24, std::max(batch_size * 2, 4096));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth adding kMinimumBatchSize kMaximumBatchSize, bonus point for arrow::clamp.

Copy link
Member

@xhochy xhochy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, thanks looking into these tests.

@xhochy xhochy closed this in 4f2f533 Jan 8, 2019
@pitrou pitrou deleted the ARROW-4178-tsan-ubsan-fixes branch January 8, 2019 14:11
emkornfield pushed a commit to emkornfield/arrow that referenced this pull request Jan 10, 2019
Author: Antoine Pitrou <[email protected]>

Closes apache#3334 from pitrou/ARROW-4178-tsan-ubsan-fixes and squashes the following commits:

b836f73 <Antoine Pitrou> ARROW-4178:  Fix TSan and UBSan errors
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants