Codecov Report

Merging #9496 (3c0d862) into branch-21.12 (ab4bfaa) will decrease coverage by 0.12%.
The diff coverage is n/a.

@@               Coverage Diff                @@
##           branch-21.12    #9496      +/-   ##
================================================
- Coverage         10.79%   10.66%   -0.13%     
================================================
  Files               116      117       +1     
  Lines             18869    19725     +856     
================================================
+ Hits               2036     2104      +68     
- Misses            16833    17621     +788

Impacted Files	Coverage Δ
python/dask_cudf/dask_cudf/sorting.py	`92.90% <0.00%> (-1.21%)`	⬇️
python/cudf/cudf/io/csv.py	`0.00% <0.00%> (ø)`
python/cudf/cudf/io/hdf.py	`0.00% <0.00%> (ø)`
python/cudf/cudf/io/orc.py	`0.00% <0.00%> (ø)`
python/cudf/cudf/__init__.py	`0.00% <0.00%> (ø)`
python/cudf/cudf/_version.py	`0.00% <0.00%> (ø)`
python/cudf/cudf/core/abc.py	`0.00% <0.00%> (ø)`
python/cudf/cudf/api/types.py	`0.00% <0.00%> (ø)`
python/cudf/cudf/io/dlpack.py	`0.00% <0.00%> (ø)`
python/cudf/cudf/core/frame.py	`0.00% <0.00%> (ø)`
... and 66 more

Continue to review full report at Codecov.

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c0951ba...3c0d862. Read the comment docs.

robertmaynard

CMake changes 👍

rgsl888prabhu

First pass, looks pretty good; I just had a few questions.

cpp/src/io/orc/aggregate_orc_metadata.hpp

python/cudf/cudf/tests/test_orc.py

vyasr

Good to see more code in cpp files, and I like the new feature (although I had a question about nesting depth that I've left inline). One overarching question: there's a lot of int32_t usage in this PR, is that because of the ORC specification or should the code be making use of cudf::size_type? Aside from that, I don't know this code too well so I left some questions in places that I didn't understand and comments on things that weren't really changes in this PR. Hopefully they're not too troublesome, feel free to ignore things that you need to.

cpp/src/io/orc/orc.cpp

cpp/src/io/orc/orc.h

cpp/src/io/orc/aggregate_orc_metadata.hpp

cpp/src/io/orc/aggregate_orc_metadata.cpp

cpp/src/io/orc/orc.cpp

vuule · 2021-10-27T00:20:38Z

@vyasr Thank you for the detailed review :)
Regarding

there's a lot of int32_t usage in this PR, is that because of the ORC specification or should the code be making use of cudf::size_type?

I did consider this option. My understanding is that size_type is used to represent the number of rows or a row index. The uses here are all column indices. I wasn't sure whether size_type applies there too, so I left them as int32_t. The ORC specs suggest uint32_t, which is more error-prone. Open to suggestions; maybe there should even be a new alias in use here.
Edit: switched to size_type :)

vyasr · 2021-10-28T00:41:27Z

@vyasr Thank you for the detailed review :) Regarding

there's a lot of int32_t usage in this PR, is that because of the ORC specification or should the code be making use of cudf::size_type?

I did consider this option. My understanding is that size_type is used to represent the number of rows or a row index. The uses here are all column indices. I wasn't sure whether size_type applies there too, so I left them as int32_t. The ORC specs suggest uint32_t, which is more error-prone. Open to suggestions; maybe there should even be a new alias in use here. Edit: switched to size_type :)

Just following up even though you did already make the change: I think size_type is also appropriate for column indices. My reasoning is that when we index into columns the expected input is a size_type (for example, via table_view::column), and size_type defines the bounds for the number of columns. I've also seen enough code switching from raw for loops to algorithms where the original for loops used size_type to make me think that was the expected use case for the type.
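The convention described above can be sketched in a few lines. This is only an illustration and does not use libcudf: `size_type` is a local alias standing in for cudf::size_type (assumed here to be a 32-bit signed integer), and `toy_table`, its `column()` and `num_columns()` members, and `select_columns` are hypothetical names made up for this example rather than the real table_view API.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Local stand-in for cudf::size_type (assumption: 32-bit signed integer).
using size_type = std::int32_t;

// Hypothetical toy table holding only column names; NOT the real cudf::table_view.
class toy_table {
 public:
  explicit toy_table(std::vector<std::string> names) : names_(std::move(names)) {}

  // The column count is reported as size_type, so it bounds the valid indices.
  size_type num_columns() const { return static_cast<size_type>(names_.size()); }

  // Column access takes a size_type index, mirroring the convention discussed above.
  std::string const& column(size_type idx) const
  {
    if (idx < 0 || idx >= num_columns()) { throw std::out_of_range("column index"); }
    return names_[static_cast<std::size_t>(idx)];
  }

 private:
  std::vector<std::string> names_;
};

// Collect the indices of all columns whose name starts with `prefix`.
// Loop counter, bound, and stored indices all share the same index type.
std::vector<size_type> select_columns(toy_table const& tbl, std::string const& prefix)
{
  std::vector<size_type> selected;
  for (size_type i = 0; i < tbl.num_columns(); ++i) {
    if (tbl.column(i).rfind(prefix, 0) == 0) { selected.push_back(i); }
  }
  return selected;
}
```

Because the loop counter, the column count, and the accessor argument are all the same type, the sketch needs no signed/unsigned conversions apart from the single cast at the container boundary.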

vyasr

Thanks for answering my questions! This PR LGTM, but would probably benefit from an additional CPP review from a cuIO expert.

vuule · 2021-10-28T06:44:37Z

Thanks for answering my questions! This PR LGTM, but would probably benefit from an additional CPP review from a cuIO expert.

@rgsl888prabhu is the expert on this code, so this should be covered :)

rgsl888prabhu

Code looks good; I only have a couple of queries on testing.

python/cudf/cudf/tests/test_orc.py

vuule · 2021-10-28T23:18:32Z

@gpucibot merge

vuule added 17 commits October 14, 2021 14:11

add column path

2bdba7e

Merge branch 'branch-21.12' of https://github.com/rapidsai/cudf into fea-select-nested-cols-orc

865b05f

simplify add_column; remove _has_nested_column

5af011d

further simplify add_column

d76abd0

Merge branch 'orc-reader-remove-has-nested' of https://github.com/vuule/cudf into fea-select-nested-cols-orc

0251f43

cast

7c8af1f

Merge branch 'orc-reader-remove-has-nested' of https://github.com/vuule/cudf into fea-select-nested-cols-orc

fca9668

Merge branch 'branch-21.12' of https://github.com/rapidsai/cudf into fea-select-nested-cols-orc

90e1e84

Merge branch 'branch-21.12' of https://github.com/rapidsai/cudf into fea-select-nested-cols-orc

30049bc

remove has_timestamp_column

06fca7a

switch to new slection - works for existing cases

06f7cdb

remove old add_column

03bc301

bit of refactor

23287af

levelize -> lambda

8e18a0d

functional column selection

536857f

refactor selected columns into a hierarchy class

1de85ba

"final" fixes!

a32eb96

vuule added the feature request, cuIO, and non-breaking labels Oct 22, 2021

vuule self-assigned this Oct 22, 2021

github-actions bot added the libcudf label Oct 22, 2021

vuule added 4 commits October 22, 2021 13:36

select nested even if a subset has previously been selected

fb79740

Merge branch 'branch-21.12' of https://github.com/rapidsai/cudf into fea-select-nested-cols-orc

eff4dbc

fix C++ test; clean up start

7ceaac4

metadata clean up

c2b087f

final(?) clean up; python test

7cf566d

github-actions bot added the Python label Oct 22, 2021

vuule requested review from a team as code owners October 25, 2021 23:46

vuule requested review from vyasr and rgsl888prabhu October 25, 2021 23:46

vuule added the 3 - Ready for Review label Oct 26, 2021

robertmaynard approved these changes Oct 26, 2021

rgsl888prabhu reviewed Oct 26, 2021

cpp/src/io/orc/aggregate_orc_metadata.hpp (outdated, resolved)

python/cudf/cudf/tests/test_orc.py (resolved)

vuule added 2 commits October 26, 2021 13:09

Merge branch 'branch-21.12' of https://github.com/rapidsai/cudf into fea-select-nested-cols-orc

4cf5903

add comments

1281df1

vuule requested a review from rgsl888prabhu October 26, 2021 21:00

vyasr requested changes Oct 27, 2021

vuule and others added 4 commits October 26, 2021 23:32

Apply suggestions from code review

5c47de3

Co-authored-by: Vyas Ramasubramani <[email protected]>

add docs; remove unused members

03b3932

use size_type

2148b94

more size_type use

3c0d862

vuule requested a review from vyasr October 27, 2021 20:14

vyasr approved these changes Oct 28, 2021

rgsl888prabhu reviewed Oct 28, 2021

python/cudf/cudf/tests/test_orc.py (resolved)

rgsl888prabhu approved these changes Oct 28, 2021

vuule added the 5 - Ready to Merge label and removed the 3 - Ready for Review label Oct 28, 2021

rapids-bot bot merged commit 8e0e70d into rapidsai:branch-21.12 Oct 28, 2021

vuule deleted the fea-select-nested-cols-orc branch October 28, 2021 23:18

More granular column selection in ORC reader #9496

vuule commented Oct 22, 2021 (edited)

codecov bot commented Oct 22, 2021 (edited)

robertmaynard left a comment

rgsl888prabhu left a comment

vyasr left a comment

vuule commented Oct 27, 2021 (edited)

vyasr commented Oct 28, 2021 (edited)

vyasr left a comment

vuule commented Oct 28, 2021

rgsl888prabhu left a comment

vuule commented Oct 28, 2021
