Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix string length in stripe dictionary building #7744

Merged
merged 2 commits into from
Mar 27, 2021
Merged

Fix string length in stripe dictionary building #7744

merged 2 commits into from
Mar 27, 2021

Conversation

kaatish
Copy link
Contributor

@kaatish kaatish commented Mar 26, 2021

In PR #7676 the length of the current string being referred to while building stripe dictionaries was always set to 0 while incrementing the dictionary character count of a StripeDictionary. This led to corrupted strings when the dictionary encoding was used as noted in issue #7741. This has been fixed in this PR.

Fixes #7741

@kaatish kaatish added bug Something isn't working 2 - In Progress Currently a work in progress libcudf Affects libcudf (C++/CUDA) code. cuIO cuIO issue non-breaking Non-breaking change labels Mar 26, 2021
@kaatish kaatish requested review from devavret and vuule March 26, 2021 22:47
@kaatish kaatish self-assigned this Mar 26, 2021
@kaatish kaatish requested a review from a team as a code owner March 26, 2021 22:47
@vuule
Copy link
Contributor

vuule commented Mar 27, 2021

Note: no new tests added because the code path is disabled due to the issue fixed in #7737. The fix will be verified by CI on #7737 (once this is merged). Verified locally by combining the two fixes.

@codecov
Copy link

codecov bot commented Mar 27, 2021

Codecov Report

Merging #7744 (9a67f5a) into branch-0.19 (7871e7a) will increase coverage by 0.67%.
The diff coverage is n/a.

Impacted file tree graph

@@               Coverage Diff               @@
##           branch-0.19    #7744      +/-   ##
===============================================
+ Coverage        81.86%   82.54%   +0.67%     
===============================================
  Files              101      101              
  Lines            16884    17461     +577     
===============================================
+ Hits             13822    14413     +591     
+ Misses            3062     3048      -14     
Impacted Files Coverage Δ
python/cudf/cudf/core/column/lists.py 87.68% <0.00%> (-3.72%) ⬇️
python/cudf/cudf/core/column/decimal.py 92.95% <0.00%> (-1.92%) ⬇️
python/cudf/cudf/core/column/numerical.py 94.83% <0.00%> (-0.20%) ⬇️
python/cudf/cudf/core/column/column.py 87.61% <0.00%> (-0.15%) ⬇️
python/cudf/cudf/utils/utils.py 85.36% <0.00%> (-0.07%) ⬇️
python/cudf/cudf/io/feather.py 100.00% <0.00%> (ø)
python/cudf/cudf/utils/ioutils.py 78.71% <0.00%> (ø)
python/cudf/cudf/comm/serialize.py 0.00% <0.00%> (ø)
python/cudf/cudf/_fuzz_testing/io.py 0.00% <0.00%> (ø)
python/cudf/cudf/core/column/struct.py 100.00% <0.00%> (ø)
... and 42 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b0586c4...9a67f5a. Read the comment docs.

@vuule
Copy link
Contributor

vuule commented Mar 27, 2021

@gpucibot merge

@rapids-bot rapids-bot bot merged commit add4b45 into rapidsai:branch-0.19 Mar 27, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2 - In Progress Currently a work in progress bug Something isn't working cuIO cuIO issue libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] ORC string dictionary encoding corrupts some characters in the column
3 participants