[REVIEW] ORC - Support reading multiple orc files/buffers in a single operation #8142

Merged: 32 commits merged into rapidsai:branch-21.08 on Jun 25, 2021

jdye64 (Contributor) commented May 3, 2021

This PR modifies the Python, Cython, C++, and CUDA code to support reading multiple files in read_orc. I have tried to change as little of the logic code as possible and have simply added wrappers where I could.

This closes #7828

jdye64 requested review from a team as code owners May 3, 2021 15:28
github-actions bot added the Python and libcudf labels May 3, 2021
jdye64 marked this pull request as draft May 3, 2021 15:28
vuule added the cuIO, feature request, and non-breaking labels May 3, 2021
jdye64 changed the title from "Orc list files" to "ORC - Support reading multiple orc files/buffers in a single operation" May 3, 2021
vuule (Contributor) commented May 3, 2021

@jdye64, there are failing C++ tests.

jdye64 (Contributor, Author) commented May 3, 2021 via email

karthikeyann added the 2 - In Progress label May 10, 2021
github-actions bot added the CMake and conda labels May 21, 2021
github-actions bot removed the conda and CMake labels May 24, 2021
….read_orc(...). This allows a single call to cudf.read_orc(...) to batch multiple read operations into one read operation, and therefore into a single resulting dataframe.
jdye64 (Contributor, Author) commented Jun 24, 2021

rerun tests

rgsl888prabhu (Contributor) commented

rerun tests

github-actions bot added the CMake label Jun 24, 2021
robertmaynard (Contributor) commented

CMake changes LGTM

Two review threads on python/cudf/cudf/io/orc.py (outdated, resolved)
isVoid (Contributor) left a comment

Just a few small nitpicks

rgsl888prabhu (Contributor) commented

@gpucibot merge

galipremsagar added the 5 - Ready to Merge label and removed the 3 - Ready for Review label Jun 24, 2021
rapids-bot bot merged commit 58438c0 into rapidsai:branch-21.08 Jun 25, 2021
rapids-bot bot pushed a commit that referenced this pull request Jun 25, 2021
Recent changes in #8142 cause `cpp/benchmarks/io/orc/orc_reader_benchmark.cpp` to fail to compile:

```
Building CXX object benchmarks/CMakeFiles/ORC_READER_BENCH.dir/io/orc/orc_reader_benchmark.cpp.o
FAILED: benchmarks/CMakeFiles/ORC_READER_BENCH.dir/io/orc/orc_reader_benchmark.cpp.o 
/usr/local/bin/g++ -DCUDF_VERSION=21.08.00 -DGTEST_LINKED_AS_SHARED_LIBRARY=1 -DJITIFY_USE_CACHE -DSPDLOG_ACTIVE_LEVEL=SPDLOG_LEVEL_INFO -DTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_CUDA -DTHRUST_HOST_SYSTEM=THRUST_HOST_SYSTEM_CPP -I../benchmarks -I../ -I../src -I_deps/benchmark-src/src/../include -I_deps/jitify-src -I_deps/libcudacxx-src/include -I../include -Iinclude -I_deps/thrust-src -I_deps/thrust-src/dependencies/cub -isystem /conda/envs/rapids/include -isystem /usr/local/cuda/include -O3 -DNDEBUG -fPIE -Wall -Werror -Wno-unknown-pragmas -Wno-error=deprecated-declarations -Wno-deprecated-declarations -pthread -std=gnu++1z -MD -MT benchmarks/CMakeFiles/ORC_READER_BENCH.dir/io/orc/orc_reader_benchmark.cpp.o -MF benchmarks/CMakeFiles/ORC_READER_BENCH.dir/io/orc/orc_reader_benchmark.cpp.o.d -o benchmarks/CMakeFiles/ORC_READER_BENCH.dir/io/orc/orc_reader_benchmark.cpp.o -c ../benchmarks/io/orc/orc_reader_benchmark.cpp
../benchmarks/io/orc/orc_reader_benchmark.cpp: In function ‘void BM_orc_read_varying_options(benchmark::State&)’:
../benchmarks/io/orc/orc_reader_benchmark.cpp:127:36: error: cannot convert ‘vector<int>’ to ‘vector<std::vector<int>>’
  127 |           read_options.set_stripes(stripes_to_read);
      |                                    ^~~~~~~~~~~~~~~
      |                                    |
      |                                    vector<int>
In file included from ../benchmarks/io/orc/orc_reader_benchmark.cpp:24:
../include/cudf/io/orc.hpp:145:56: note:   initializing argument 1 of ‘void cudf::io::orc_reader_options::set_stripes(std::vector<std::vector<int> >)’
  145 |   void set_stripes(std::vector<std::vector<size_type>> stripes)
      |                    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~

```

This PR fixes the call to `read_options.set_stripes` in the source file.

Authors:
  - David Wendt (https://github.com/davidwendt)

Approvers:
  - Robert Maynard (https://github.com/robertmaynard)
  - Christopher Harris (https://github.com/cwharris)
  - MithunR (https://github.com/mythrocks)

URL: #8609
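The compiler note above shows that after #8142, `set_stripes` takes one list of stripe indices per input source (`std::vector<std::vector<size_type>>`) instead of a single flat list. A minimal sketch of adapting a single-source call site follows. It assumes a hypothetical `orc_reader_options_stub` standing in for `cudf::io::orc_reader_options` (so it builds without libcudf) and a hypothetical helper `select_stripes_single_source`; it illustrates the change in argument shape and is not necessarily the exact diff in #8609.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for cudf::size_type (a 32-bit signed integer in libcudf).
using size_type = std::int32_t;

// Hypothetical stub mirroring only the set_stripes signature from the compiler
// note: the outer vector has one entry per input source, and each inner vector
// lists the stripe indices to read from that source.
struct orc_reader_options_stub {
  std::vector<std::vector<size_type>> stripes;

  void set_stripes(std::vector<std::vector<size_type>> s) { stripes = std::move(s); }
};

// Hypothetical helper: adapt a flat, single-source stripe list to the
// per-source shape. Passing stripes_to_read directly no longer compiles;
// brace-wrapping it builds an outer vector holding exactly one inner vector.
void select_stripes_single_source(orc_reader_options_stub& read_options,
                                  std::vector<size_type> const& stripes_to_read)
{
  read_options.set_stripes({stripes_to_read});
}
```

Brace-wrapping the flat list selects the `std::initializer_list` constructor of the outer vector, so the single source's stripe list becomes element 0. With several sources, each source would contribute its own inner list.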
galipremsagar added the breaking label and removed the non-breaking label Jun 29, 2021
rapids-bot bot pushed a commit that referenced this pull request Jul 9, 2021
The skip_row issue went unnoticed because there was no test case covering it, and an essential line of code was removed by mistake in #8142.

The crash was a corner case; it has been resolved, and a valid test case has been added.

closes #8665 
closes #8690

Authors:
  - Ram (Ramakrishna Prabhu) (https://github.com/rgsl888prabhu)

Approvers:
  - GALI PREM SAGAR (https://github.com/galipremsagar)
  - Devavret Makkar (https://github.com/devavret)
  - Vukasin Milovanovic (https://github.com/vuule)

URL: #8700
Labels: 5 - Ready to Merge, breaking, CMake, cuIO, feature request, libcudf, Python
Linked issue:

[FEA] Support multiple inputs in ORC reader
9 participants