Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Resolve racecheck errors in ORC kernels #9916

Merged
merged 8 commits into from
Jan 7, 2022

Conversation

vuule
Copy link
Contributor

@vuule vuule commented Dec 16, 2021

Running ORC Python tests with compute-sanitizer --tool racecheck results in a number of errors/warnings.
This PR resolves the errors originating in ORC kernels. Remaining errors come from gpu_inflate.

Adds a few missing block/warp syncs and minor clean up in the affected code.

Causes 42% slowdown on average in ORC reader benchmarks. Not negligible, will double check whether the changes are required, or just resolving false positives in racecheck.
Ran the benchmarks many more times, and the average time difference is smaller than variations between runs.

@vuule vuule added bug Something isn't working cuIO cuIO issue labels Dec 16, 2021
@vuule vuule self-assigned this Dec 16, 2021
@github-actions github-actions bot added the libcudf Affects libcudf (C++/CUDA) code. label Dec 16, 2021
@vuule vuule added the non-breaking Non-breaking change label Dec 16, 2021
@codecov
Copy link

codecov bot commented Dec 16, 2021

Codecov Report

Merging #9916 (79c3eee) into branch-22.02 (967a333) will decrease coverage by 0.07%.
The diff coverage is n/a.

❗ Current head 79c3eee differs from pull request most recent head 76cc2b0. Consider uploading reports for the commit 76cc2b0 to get more accurate results
Impacted file tree graph

@@               Coverage Diff                @@
##           branch-22.02    #9916      +/-   ##
================================================
- Coverage         10.49%   10.41%   -0.08%     
================================================
  Files               119      119              
  Lines             20305    20480     +175     
================================================
+ Hits               2130     2134       +4     
- Misses            18175    18346     +171     
Impacted Files Coverage Δ
python/dask_cudf/dask_cudf/sorting.py 92.30% <0.00%> (-0.61%) ⬇️
python/cudf/cudf/__init__.py 0.00% <0.00%> (ø)
python/cudf/cudf/core/frame.py 0.00% <0.00%> (ø)
python/cudf/cudf/core/index.py 0.00% <0.00%> (ø)
python/cudf/cudf/io/parquet.py 0.00% <0.00%> (ø)
python/cudf/cudf/core/series.py 0.00% <0.00%> (ø)
python/cudf/cudf/utils/utils.py 0.00% <0.00%> (ø)
python/cudf/cudf/utils/dtypes.py 0.00% <0.00%> (ø)
python/cudf/cudf/utils/ioutils.py 0.00% <0.00%> (ø)
python/cudf/cudf/core/dataframe.py 0.00% <0.00%> (ø)
... and 14 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d69ea61...76cc2b0. Read the comment docs.

@vuule vuule requested a review from elstehle December 16, 2021 06:46
Copy link
Contributor

@elstehle elstehle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fine. Maybe you can double-check whether that syncwarp after the shfl_sync is really needed.

@@ -782,6 +780,7 @@ static __device__ uint32_t Integer_RLEv2(orc_bytestream_s* bs,
pos = shuffle(pos);
n = shuffle(n);
w = shuffle(w);
__syncwarp();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure this one is needed here, as our shuffle is an alias for __shfl_sync, which, to my understanding, would converge threads participating in the shuffle (in our case: there is no mask, so all threads participate).
If, despite, __syncwarp should be required, we should leave a note that clarifies why we need __syncwarp here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add a comment. Really want to go towards error-free memcheck/racecheck reports.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one resolves the following racecheck warnings, presumably because the tool does not recognize shuffle_sync as a sync point.

Warning: Race reported between Write access at 0x19520 in /cudf/cpp/src/io/orc/stripe_data.cu:735:unsigned int Integer_RLEv2
and Read access at 0x19be0 in /cudf/cpp/src/io/orc/stripe_data.cu:807:unsigned int Integer_RLEv2[488 hazards]

Warning: Race reported between Write access at 0x19050 in /cudf/cpp/src/io/orc/stripe_data.cu:773:unsigned int Integer_RLEv2
and Read access at 0x196d0 in /cudf/cpp/src/io/orc/stripe_data.cu:816:unsigned int Integer_RLEv2 [16 hazards]

cpp/src/io/orc/stripe_data.cu Show resolved Hide resolved
baseval = rle->baseval.u32[r];
else
baseval = rle->baseval.u64[r];
for (uint32_t j = tr; j < n; j += 32) {
vals[base + j] += baseval;
}
}
__syncwarp();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one fixes the following warning:

Warning: Race reported between Write access at 0x19520 in /cudf/cpp/src/io/orc/stripe_data.cu:735:unsigned int Integer_RLEv2
and Read access at 0x1a4e0 in /cudf/cpp/src/io/orc/stripe_data.cu:865:unsigned int Integer_RLEv2 [8 hazards]

if (s->chunk.type_kind == TIMESTAMP) {
s->top.data.buffered_count = s->top.data.max_vals - numvals;
if (t == 0 && numvals + vals_skipped > 0) {
auto const max_vals = s->top.data.max_vals;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Workaround for a presumable false positive:

Warning: Race reported between Write access at 0x19520 in /cudf/cpp/src/io/orc/stripe_data.cu:735:unsigned int Integer_RLEv2
and Read access at 0x1a4e0 in /cudf/cpp/src/io/orc/stripe_data.cu:865:unsigned int Integer_RLEv2 [8 hazards]

s->nnz = 0;
s->numvals = 0;
}
if (t == 0) { s->nnz = 0; }
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixes the error:

Error: Race reported between Read access at 0x2ce0 in /cudf/cpp/src/io/orc/stripe_enc.cu:629: encode_null_mask
and Write access at 0x2d30 in /cudf/cpp/src/io/orc/stripe_enc.cu:709:encode_null_mask [8 hazards]

Resetting numvals can be skipped because it is guaranteed to be zero after the loop above.

@vuule vuule marked this pull request as ready for review December 17, 2021 23:17
@vuule vuule requested a review from a team as a code owner December 17, 2021 23:17
pos = min((__ffs(lit_mask) - 1) & 0xff, 32);
auto const symt = (t < batch_len) ? b[t] : 256;
auto const lit_mask = ballot(symt >= 256);
auto pos = min((__ffs(lit_mask) - 1) & 0xff, 32);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't spot the fix in this file. Is this code cleanup only?

@vuule vuule added the 5 - Ready to Merge Testing and reviews complete, ready to merge label Jan 4, 2022
@vuule
Copy link
Contributor Author

vuule commented Jan 5, 2022

rerun tests

1 similar comment
@vuule
Copy link
Contributor Author

vuule commented Jan 7, 2022

rerun tests

@vuule
Copy link
Contributor Author

vuule commented Jan 7, 2022

@gpucibot merge

@rapids-bot rapids-bot bot merged commit de8c0b8 into rapidsai:branch-22.02 Jan 7, 2022
@vuule vuule deleted the bug-racecheck-orc branch January 7, 2022 04:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
5 - Ready to Merge Testing and reviews complete, ready to merge bug Something isn't working cuIO cuIO issue libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants