Replace parquet writer api with class #7058

Merged

@rgsl888prabhu rgsl888prabhu commented Dec 30, 2020

This PR contains changes pertaining only to Parquet.

Instead of a function-based API, a class is now used to hold the writer's state and options, which reduces the burden on the user. For more information, see #6911.

These changes will break the Java bindings, since the main API has changed.
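
As a rough illustration of the design change described above, the sketch below contrasts a function-style chunked writer, where the caller carries a state object between calls, with a class that owns its state and options. It is not the cuDF API: `ChunkedState`, `write_chunked_begin`, `write_chunked`, `write_chunked_end`, `ChunkedWriter`, and the `compression` option are hypothetical names, and the "file" is just an in-memory list.

```python
from dataclasses import dataclass, field


@dataclass
class ChunkedState:
    """State the caller must carry between calls (hypothetical)."""
    sink: list = field(default_factory=list)
    closed: bool = False


# Function-style API: the caller threads `state` through every call.
def write_chunked_begin() -> ChunkedState:
    return ChunkedState()


def write_chunked(state: ChunkedState, chunk: list) -> None:
    if state.closed:
        raise RuntimeError("writer already closed")
    state.sink.append(list(chunk))


def write_chunked_end(state: ChunkedState) -> list:
    state.closed = True
    return state.sink


class ChunkedWriter:
    """Class-style API: state and options live inside the object."""

    def __init__(self, compression: str = "none"):
        self.compression = compression  # hypothetical option, unused here
        self._chunks = []
        self._closed = False

    def write(self, chunk: list) -> "ChunkedWriter":
        if self._closed:
            raise RuntimeError("writer already closed")
        self._chunks.append(list(chunk))
        return self  # returning self allows chained writes

    def close(self) -> list:
        self._closed = True
        return self._chunks


# Function style: three calls, one state object passed around.
state = write_chunked_begin()
write_chunked(state, [1, 2])
write_chunked(state, [3])
old_result = write_chunked_end(state)

# Class style: the writer keeps its own state.
new_result = ChunkedWriter().write([1, 2]).write([3]).close()
```

Both paths produce the same chunks; the difference is only in who is responsible for holding the state between calls.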

@rgsl888prabhu added the feature request, 2 - In Progress, Python, 4 - Needs cuDF (Python) Reviewer, and breaking labels Dec 30, 2020
@rgsl888prabhu self-assigned this Dec 30, 2020
@rgsl888prabhu requested review from a team as code owners December 30, 2020 22:38
@rgsl888prabhu added the 3 - Ready for Review label and removed the 2 - In Progress label Dec 30, 2020

@vuule vuule left a comment

Looks so much better without the state being dragged around!
Not a thorough review, got some high-level design comments for now.

cpp/tests/io/parquet_test.cpp (outdated, resolved)
cpp/include/cudf/io/parquet.hpp (resolved)
cpp/include/cudf/io/parquet.hpp (outdated, resolved)
cpp/include/cudf/io/parquet.hpp (outdated, resolved)
cpp/src/io/functions.cpp (outdated, resolved)

codecov bot commented Dec 31, 2020

Codecov Report

Merging #7058 (c4fed94) into branch-0.18 (8860baf) will increase coverage by 0.08%.
The diff coverage is 87.08%.

@@               Coverage Diff               @@
##           branch-0.18    #7058      +/-   ##
===============================================
+ Coverage        82.09%   82.17%   +0.08%     
===============================================
  Files               97       97              
  Lines            16474    16589     +115     
===============================================
+ Hits             13524    13632     +108     
- Misses            2950     2957       +7     
Impacted Files                               Coverage Δ
python/cudf/cudf/_lib/__init__.py            100.00% <ø> (ø)
python/cudf/cudf/core/column/lists.py        91.66% <ø> (-0.09%) ⬇️
python/cudf/cudf/utils/ioutils.py            78.71% <ø> (ø)
python/cudf/cudf/io/orc.py                   86.89% <63.63%> (-1.51%) ⬇️
python/cudf/cudf/core/dtypes.py              89.50% <66.66%> (-0.88%) ⬇️
python/cudf/cudf/core/dataframe.py           90.49% <72.41%> (-0.22%) ⬇️
python/cudf/cudf/core/series.py              91.10% <80.00%> (-0.06%) ⬇️
python/cudf/cudf/core/column/numerical.py    94.08% <83.33%> (-0.33%) ⬇️
python/cudf/cudf/io/csv.py                   91.66% <86.66%> (-1.67%) ⬇️
python/cudf/cudf/core/frame.py               89.90% <88.88%> (-0.08%) ⬇️
... and 11 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 6c116e3...c4fed94.

@devavret devavret left a comment

Mostly LGTM.

cpp/src/io/parquet/writer_impl.cu (resolved)
cpp/src/io/parquet/writer_impl.cu (resolved)
@rgsl888prabhu (Contributor Author) commented

@brandon-b-miller

@rgsl888prabhu (Contributor Author) commented

@rapidsai/cudf-java-codeowners This might break the Java bindings, since the API for the Parquet chunked writer is changing, so a follow-up PR may be needed to accommodate these changes.

jlowe commented Jan 21, 2021

This might break the Java bindings, since the API for the Parquet chunked writer is changing, so a follow-up PR may be needed to accommodate these changes.

Thanks for the update! I hope to have a PR posted tomorrow with the necessary JNI changes.

@davidwendt davidwendt left a comment

A few suggestions on some #includes.

cpp/include/cudf/io/detail/parquet.hpp (outdated, resolved)
cpp/include/cudf/io/detail/parquet.hpp (outdated, resolved)
cpp/include/cudf/io/detail/parquet.hpp (outdated, resolved)

jlowe commented Jan 22, 2021

The JNI changes for the new Parquet writer API have been posted to #7193.

@rgsl888prabhu rgsl888prabhu requested a review from a team as a code owner January 22, 2021 16:46
@rgsl888prabhu (Contributor Author) commented

@brandon-b-miller

@brandon-b-miller brandon-b-miller left a comment

Cython approval

@rgsl888prabhu (Contributor Author) commented

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 6a4c760 into rapidsai:branch-0.18 Jan 26, 2021
@vyasr added the 4 - Needs Review label and removed the 4 - Needs cuDF (Python) Reviewer label Feb 23, 2024
Labels: 3 - Ready for Review, 4 - Needs Review, breaking, feature request, Python