Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix debrotli issue on CUDA 11.5 #9632

Merged
merged 10 commits into from
Nov 11, 2021

Conversation

vuule
Copy link
Contributor

@vuule vuule commented Nov 8, 2021

Closes #9546

This PR fixes the issue likely through elimination of undefined behavior.
Modified local heap implementation to return void* instead on uint8_t*. This greatly reduces the number of reinterpret_casts. Also changed heap type to char*, presumably reducing/eliminating aliasing issues.
Some other clean up in related code included.

@vuule vuule added bug Something isn't working cuIO cuIO issue non-breaking Non-breaking change labels Nov 8, 2021
@vuule vuule self-assigned this Nov 8, 2021
@github-actions github-actions bot added Python Affects Python cuDF API. libcudf Affects libcudf (C++/CUDA) code. labels Nov 8, 2021
@galipremsagar
Copy link
Contributor

I can confirm that this fixes issue in both cuda 11.4 & 11.5 on T4 gpus.

@vuule
Copy link
Contributor Author

vuule commented Nov 8, 2021

rerun tests

@codecov
Copy link

codecov bot commented Nov 8, 2021

Codecov Report

Merging #9632 (e13c2ec) into branch-21.12 (ab4bfaa) will decrease coverage by 0.10%.
The diff coverage is n/a.

❗ Current head e13c2ec differs from pull request most recent head 725f9d9. Consider uploading reports for the commit 725f9d9 to get more accurate results
Impacted file tree graph

@@               Coverage Diff                @@
##           branch-21.12    #9632      +/-   ##
================================================
- Coverage         10.79%   10.68%   -0.11%     
================================================
  Files               116      117       +1     
  Lines             18869    19872    +1003     
================================================
+ Hits               2036     2123      +87     
- Misses            16833    17749     +916     
Impacted Files Coverage Δ
python/dask_cudf/dask_cudf/sorting.py 92.90% <0.00%> (-1.21%) ⬇️
python/cudf/cudf/io/csv.py 0.00% <0.00%> (ø)
python/cudf/cudf/io/hdf.py 0.00% <0.00%> (ø)
python/cudf/cudf/io/orc.py 0.00% <0.00%> (ø)
python/cudf/cudf/__init__.py 0.00% <0.00%> (ø)
python/cudf/cudf/_version.py 0.00% <0.00%> (ø)
python/cudf/cudf/core/abc.py 0.00% <0.00%> (ø)
python/cudf/cudf/api/types.py 0.00% <0.00%> (ø)
python/cudf/cudf/io/dlpack.py 0.00% <0.00%> (ø)
python/cudf/cudf/core/frame.py 0.00% <0.00%> (ø)
... and 67 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d60e2e6...725f9d9. Read the comment docs.

@galipremsagar
Copy link
Contributor

rerun tests

@vuule
Copy link
Contributor Author

vuule commented Nov 9, 2021

@galipremsagar could you run a round of fuzz tests with brotli to confirm the fix?

@vuule vuule marked this pull request as ready for review November 9, 2021 22:05
@vuule vuule requested review from a team as code owners November 9, 2021 22:05
volatile uint32_t* heap_ptr = reinterpret_cast<volatile uint32_t*>(ext_heap_base);
uint32_t first_free_block = ~0;
auto const len = (bytes + 0xf) & ~0xf;
auto const heap_ptr = static_cast<volatile uint32_t*>(ext_heap_base);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is const and volatile at the same time?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pointer is const, and the data it points to is volatile.
I can avoid auto here so the type becomes uint32_t volatile* const, which reads fine (right to left).

Copy link
Contributor

@davidwendt davidwendt Nov 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, that makes sense. I'd rather see the const on the left of the equal sign.

@galipremsagar
Copy link
Contributor

galipremsagar commented Nov 10, 2021

@vuule In fuzz-testing we ran into a data-corruption issue with brotli encoding in reader:
test_data.zip - Unzip this file, it will contain two parquet files. We will use only cpu_pdf.parquet which is brotli encoded. The gpu_pdf.parquet is not encoded.

>>> import cudf
>>> actual = cudf.read_parquet('cpu_pdf.parquet')
>>> expected = pd.read_parquet('cpu_pdf.parquet')
(Pdb) actual['21'] # See the last value, it should be 27195 instead of 0.
0       -28871
1         <NA>
2        18224
3       -13182
4         <NA>
         ...  
9426      6841
9427      3473
9428     31914
9429    -22488
9430         0
Name: 21, Length: 9431, dtype: int16
(Pdb) expected['21']
0       -28871
1         <NA>
2        18224
3       -13182
4         <NA>
         ...  
9426      6841
9427      3473
9428     31914
9429    -22488
9430     27195
Name: 21, Length: 9431, dtype: Int16

(Pdb) expected['21'][actual['21'].to_pandas(nullable=True) != expected['21']] # 1339 out of 9431 values are corrupted.
6571    -25740
6573      6433
6575    -21601
6576     19348
6577      4142
         ...  
9418     28837
9420      4155
9423      8501
9424     28381
9430     27195
Name: 21, Length: 1339, dtype: Int16

Copy link
Member

@charlesbluca charlesbluca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving python changes

@vuule vuule added the 5 - DO NOT MERGE Hold off on merging; see PR for details label Nov 10, 2021
@vuule vuule removed the 5 - DO NOT MERGE Hold off on merging; see PR for details label Nov 10, 2021
@ttnghia
Copy link
Contributor

ttnghia commented Nov 11, 2021

Rerun tests.

@vuule vuule added the 5 - DO NOT MERGE Hold off on merging; see PR for details label Nov 11, 2021
@galipremsagar
Copy link
Contributor

Verified with fuzz-testing, no issues found. 🎉

@vuule vuule added 5 - Ready to Merge Testing and reviews complete, ready to merge and removed 5 - DO NOT MERGE Hold off on merging; see PR for details labels Nov 11, 2021
@vuule
Copy link
Contributor Author

vuule commented Nov 11, 2021

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 507a3c5 into rapidsai:branch-21.12 Nov 11, 2021
@vuule vuule deleted the bug-debrotli-115 branch April 20, 2022 23:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
5 - Ready to Merge Testing and reviews complete, ready to merge bug Something isn't working cuIO cuIO issue libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change Python Affects Python cuDF API.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] Data-corruption while reading a parquet file in cudf built with cuda 11.5
6 participants