-
Notifications
You must be signed in to change notification settings - Fork 902
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Resolve racecheck errors in ORC kernels #9916
Conversation
Codecov Report
@@ Coverage Diff @@
## branch-22.02 #9916 +/- ##
================================================
- Coverage 10.49% 10.41% -0.08%
================================================
Files 119 119
Lines 20305 20480 +175
================================================
+ Hits 2130 2134 +4
- Misses 18175 18346 +171
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks fine. Maybe you can double-check whether that syncwarp
after the shfl_sync
is really needed.
cpp/src/io/orc/stripe_data.cu
Outdated
@@ -782,6 +780,7 @@ static __device__ uint32_t Integer_RLEv2(orc_bytestream_s* bs, | |||
pos = shuffle(pos); | |||
n = shuffle(n); | |||
w = shuffle(w); | |||
__syncwarp(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure this one is needed here, as our shuffle
is an alias for __shfl_sync
, which, to my understanding, would converge threads participating in the shuffle (in our case: there is no mask, so all threads participate).
If, despite, __syncwarp
should be required, we should leave a note that clarifies why we need __syncwarp
here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll add a comment. Really want to go towards error-free memcheck/racecheck reports.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one resolves the following racecheck
warnings, presumably because the tool does not recognize shuffle_sync
as a sync point.
Warning: Race reported between Write access at 0x19520 in /cudf/cpp/src/io/orc/stripe_data.cu:735:unsigned int Integer_RLEv2
and Read access at 0x19be0 in /cudf/cpp/src/io/orc/stripe_data.cu:807:unsigned int Integer_RLEv2[488 hazards]
Warning: Race reported between Write access at 0x19050 in /cudf/cpp/src/io/orc/stripe_data.cu:773:unsigned int Integer_RLEv2
and Read access at 0x196d0 in /cudf/cpp/src/io/orc/stripe_data.cu:816:unsigned int Integer_RLEv2 [16 hazards]
…bug-racecheck-orc
baseval = rle->baseval.u32[r]; | ||
else | ||
baseval = rle->baseval.u64[r]; | ||
for (uint32_t j = tr; j < n; j += 32) { | ||
vals[base + j] += baseval; | ||
} | ||
} | ||
__syncwarp(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one fixes the following warning:
Warning: Race reported between Write access at 0x19520 in /cudf/cpp/src/io/orc/stripe_data.cu:735:unsigned int Integer_RLEv2
and Read access at 0x1a4e0 in /cudf/cpp/src/io/orc/stripe_data.cu:865:unsigned int Integer_RLEv2 [8 hazards]
if (s->chunk.type_kind == TIMESTAMP) { | ||
s->top.data.buffered_count = s->top.data.max_vals - numvals; | ||
if (t == 0 && numvals + vals_skipped > 0) { | ||
auto const max_vals = s->top.data.max_vals; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Workaround for a presumable false positive:
Warning: Race reported between Write access at 0x19520 in /cudf/cpp/src/io/orc/stripe_data.cu:735:unsigned int Integer_RLEv2
and Read access at 0x1a4e0 in /cudf/cpp/src/io/orc/stripe_data.cu:865:unsigned int Integer_RLEv2 [8 hazards]
s->nnz = 0; | ||
s->numvals = 0; | ||
} | ||
if (t == 0) { s->nnz = 0; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixes the error:
Error: Race reported between Read access at 0x2ce0 in /cudf/cpp/src/io/orc/stripe_enc.cu:629: encode_null_mask
and Write access at 0x2d30 in /cudf/cpp/src/io/orc/stripe_enc.cu:709:encode_null_mask [8 hazards]
Resetting numvals
can be skipped because it is guaranteed to be zero after the loop above.
pos = min((__ffs(lit_mask) - 1) & 0xff, 32); | ||
auto const symt = (t < batch_len) ? b[t] : 256; | ||
auto const lit_mask = ballot(symt >= 256); | ||
auto pos = min((__ffs(lit_mask) - 1) & 0xff, 32); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't spot the fix in this file. Is this code cleanup only?
rerun tests |
1 similar comment
rerun tests |
@gpucibot merge |
Running ORC Python tests with
compute-sanitizer --tool racecheck
results in a number of errors/warnings.This PR resolves the errors originating in ORC kernels. Remaining errors come from
gpu_inflate
.Adds a few missing block/warp syncs and minor clean up in the affected code.
Causes
42% slowdown on average in ORC reader benchmarks. Not negligible, will double check whether the changes are required, or just resolving false positives inracecheck
.Ran the benchmarks many more times, and the average time difference is smaller than variations between runs.