-
Notifications
You must be signed in to change notification settings - Fork 906
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[REVIEW] ORC - Support reading multiple orc files/buffers in a single operation #8142
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
github-actions
bot
added
Python
Affects Python cuDF API.
libcudf
Affects libcudf (C++/CUDA) code.
labels
May 3, 2021
vuule
added
cuIO
cuIO issue
feature request
New feature or request
non-breaking
Non-breaking change
labels
May 3, 2021
jdye64
changed the title
Orc list files
ORC - Support reading multiple orc files/buffers in a single operation
May 3, 2021
@jdye64, there are failing C++ tests. |
Yes, this PR is still work in progress. I know for sure a few things around column metadata are not working. I am working on those however. Waiting on a local debug build to do more granular fixes just want to get the initial PR out there
Get Outlook for iOS<https://aka.ms/o0ukef>
…________________________________
From: Vukasin Milovanovic ***@***.***>
Sent: Monday, May 3, 2021 2:44:32 PM
To: rapidsai/cudf ***@***.***>
Cc: Jeremy Dyer ***@***.***>; Mention ***@***.***>
Subject: Re: [rapidsai/cudf] ORC - Support reading multiple orc files/buffers in a single operation (#8142)
@jdye64<https://github.com/jdye64>, there are failing C++ tests.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub<#8142 (comment)>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/AAQHLA6OTRIETEPJZA7UHEDTL3VJBANCNFSM44BAQDEQ>.
|
vuule
reviewed
May 24, 2021
….read_orc(...). This allows for single calls to cudf.read_orc(...) and batching multiple read operations into a single read operation and therefore a single resulting dataframe
…t be specified multiple times
…ies multiple stripes from a single ORC file
rerun tests |
1 similar comment
rerun tests |
robertmaynard
approved these changes
Jun 24, 2021
CMake changes LGTM |
isVoid
reviewed
Jun 24, 2021
isVoid
reviewed
Jun 24, 2021
isVoid
reviewed
Jun 24, 2021
isVoid
reviewed
Jun 24, 2021
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a few small nitpicks
isVoid
approved these changes
Jun 24, 2021
galipremsagar
approved these changes
Jun 24, 2021
@gpucibot merge |
galipremsagar
added
5 - Ready to Merge
Testing and reviews complete, ready to merge
and removed
3 - Ready for Review
Ready for review by team
labels
Jun 24, 2021
gerashegalov
approved these changes
Jun 24, 2021
rapids-bot bot
pushed a commit
that referenced
this pull request
Jun 25, 2021
Recent changes in #8142 causes the `cpp/benchmarks/io/orc/orc_reader_benchmark.cpp` compile to fail ``` Building CXX object benchmarks/CMakeFiles/ORC_READER_BENCH.dir/io/orc/orc_reader_benchmark.cpp.o FAILED: benchmarks/CMakeFiles/ORC_READER_BENCH.dir/io/orc/orc_reader_benchmark.cpp.o /usr/local/bin/g++ -DCUDF_VERSION=21.08.00 -DGTEST_LINKED_AS_SHARED_LIBRARY=1 -DJITIFY_USE_CACHE -DSPDLOG_ACTIVE_LEVEL=SPDLOG_LEVEL_INFO -DTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_CUDA -DTHRUST_HOST_SYSTEM=THRUST_HOST_SYSTEM_CPP -I../benchmarks -I../ -I../src -I_deps/benchmark-src/src/../include -I_deps/jitify-src -I_deps/libcudacxx-src/include -I../include -Iinclude -I_deps/thrust-src -I_deps/thrust-src/dependencies/cub -isystem /conda/envs/rapids/include -isystem /usr/local/cuda/include -O3 -DNDEBUG -fPIE -Wall -Werror -Wno-unknown-pragmas -Wno-error=deprecated-declarations -Wno-deprecated-declarations -pthread -std=gnu++1z -MD -MT benchmarks/CMakeFiles/ORC_READER_BENCH.dir/io/orc/orc_reader_benchmark.cpp.o -MF benchmarks/CMakeFiles/ORC_READER_BENCH.dir/io/orc/orc_reader_benchmark.cpp.o.d -o benchmarks/CMakeFiles/ORC_READER_BENCH.dir/io/orc/orc_reader_benchmark.cpp.o -c ../benchmarks/io/orc/orc_reader_benchmark.cpp ../benchmarks/io/orc/orc_reader_benchmark.cpp: In function ‘void BM_orc_read_varying_options(benchmark::State&)’: ../benchmarks/io/orc/orc_reader_benchmark.cpp:127:36: error: cannot convert ‘vector<int>’ to ‘vector<std::vector<int>>’ 127 | read_options.set_stripes(stripes_to_read); | ^~~~~~~~~~~~~~~ | | | vector<int> In file included from ../benchmarks/io/orc/orc_reader_benchmark.cpp:24: ../include/cudf/io/orc.hpp:145:56: note: initializing argument 1 of ‘void cudf::io::orc_reader_options::set_stripes(std::vector<std::vector<int> >)’ 145 | void set_stripes(std::vector<std::vector<size_type>> stripes) | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~ ``` This PR fixes the call to `read_options.set_stripes` in the source file. Authors: - David Wendt (https://github.com/davidwendt) Approvers: - Robert Maynard (https://github.com/robertmaynard) - Christopher Harris (https://github.com/cwharris) - MithunR (https://github.com/mythrocks) URL: #8609
galipremsagar
added
breaking
Breaking change
and removed
non-breaking
Non-breaking change
labels
Jun 29, 2021
rapids-bot bot
pushed a commit
that referenced
this pull request
Jul 9, 2021
The skip_row issue was due to lack of test case and by mistake the essential line of code was removed in #8142. The crash was a corner issue which has been resolved and valid test case has been added. closes #8665 closes #8690 Authors: - Ram (Ramakrishna Prabhu) (https://github.com/rgsl888prabhu) Approvers: - GALI PREM SAGAR (https://github.com/galipremsagar) - Devavret Makkar (https://github.com/devavret) - Vukasin Milovanovic (https://github.com/vuule) URL: #8700
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
5 - Ready to Merge
Testing and reviews complete, ready to merge
breaking
Breaking change
CMake
CMake build issue
cuIO
cuIO issue
feature request
New feature or request
libcudf
Affects libcudf (C++/CUDA) code.
Python
Affects Python cuDF API.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This PR modifies the python, cython, cpp, and cuda code to support for reading multiple files in
read_orc
I have tried to change as little of the logic code as possible and simply added wrappers where possible.This closes #7828