Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Fix orc reader RLEv2 reader #5473

Merged
merged 9 commits into from
Jun 22, 2020

Conversation

devavret
Copy link
Contributor

@devavret devavret commented Jun 15, 2020

Fixes #5440
Fixes #5324

Opened ORC-642 to track issue in apache's docs.

Patch bit width is only allowed to be from a fixed set of values.
Patch width is to be selected as the smallest value from the set that
fit the required patch size (pw + pgw)
Fixes the narrowing conversion in bytestream reading in patched RLE
@devavret devavret requested review from a team as code owners June 15, 2020 21:12
@GPUtester
Copy link
Collaborator

Please update the changelog in order to start CI tests.

View the gpuCI docs here.

@vuule vuule self-requested a review June 15, 2020 21:36
Copy link
Collaborator

@kkraus14 kkraus14 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

python approval

@harrism harrism added the 2 - In Progress Currently a work in progress label Jun 16, 2020
@harrism harrism changed the title Fix orc reader RLEv2 reader [WIP] Fix orc reader RLEv2 reader Jun 16, 2020
@codecov
Copy link

codecov bot commented Jun 16, 2020

Codecov Report

Merging #5473 into branch-0.15 will increase coverage by 0.12%.
The diff coverage is 88.05%.

Impacted file tree graph

@@               Coverage Diff               @@
##           branch-0.15    #5473      +/-   ##
===============================================
+ Coverage        88.65%   88.77%   +0.12%     
===============================================
  Files               57       57              
  Lines            10945    11003      +58     
===============================================
+ Hits              9703     9768      +65     
+ Misses            1242     1235       -7     
Impacted Files Coverage Δ
python/cudf/cudf/core/reshape.py 86.66% <ø> (-0.16%) ⬇️
python/cudf/cudf/utils/cudautils.py 48.32% <0.00%> (+2.11%) ⬆️
python/cudf/cudf/core/index.py 91.53% <50.00%> (ø)
python/cudf/cudf/utils/dtypes.py 87.58% <80.00%> (+1.15%) ⬆️
python/cudf/cudf/core/series.py 91.26% <94.73%> (+0.10%) ⬆️
python/cudf/cudf/core/column/numerical.py 94.42% <100.00%> (+0.04%) ⬆️
python/cudf/cudf/core/column/string.py 85.82% <100.00%> (+0.63%) ⬆️
python/cudf/cudf/core/column/column.py 87.23% <0.00%> (-0.14%) ⬇️
... and 2 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9200be6...9fc167c. Read the comment docs.

Copy link
Contributor

@OlivierNV OlivierNV left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At line 709, l += (pll * (pgw + pw) + 7) >> 3; also needs to be modified to use ClosestFixedBitsMap[] instead of pw + pgw.
Also, we should probably use ClosestFixedBitsMap[min(pw + pgw, 64u)] when indexing to prevent crashing on bad data.

@kkraus14 kkraus14 added the cuIO cuIO issue label Jun 17, 2020
@devavret devavret merged commit 55f213b into rapidsai:branch-0.15 Jun 22, 2020
@harrism
Copy link
Member

harrism commented Jun 22, 2020

Great job and thanks for the fix, @devavret

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2 - In Progress Currently a work in progress cuIO cuIO issue
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] cudf.read_orc reads incorrect data for one row [BUG] read_orc reads incorrect data on one row
7 participants