JNI: Support nested types in ORC writer #9334

Merged
merged 9 commits into from
Oct 13, 2021

Conversation

firestarman
Contributor

@firestarman firestarman commented Sep 29, 2021

This fixes #9233.

Besides structs, it should also cover lists and maps.

Signed-off-by: Firestarman [email protected]

Including list and struct.

Signed-off-by: Firestarman <[email protected]>
@firestarman firestarman requested a review from a team as a code owner September 29, 2021 06:08
@github-actions github-actions bot added the Java Affects Java cuDF API. label Sep 29, 2021
@firestarman firestarman added 4 - Needs cuDF (Java) Reviewer breaking Breaking change and removed Java Affects Java cuDF API. labels Sep 29, 2021
@firestarman firestarman added Java Affects Java cuDF API. feature request New feature or request labels Sep 29, 2021
@firestarman
Contributor Author

NOTE: This is a breaking change for spark-rapids; the corresponding PR is NVIDIA/spark-rapids#3696.

Signed-off-by: Firestarman <[email protected]>
@firestarman
Contributor Author

rerun tests

Contributor

@razajafri razajafri left a comment

Just a nit. Why CondensedMetadataWriterOptions? Would FlattenedMetadataWriterOptions make more sense? Or maybe even just MetadataWriterOptions?

@firestarman
Contributor Author

firestarman commented Oct 8, 2021

> Just a nit. Why CondensedMetadataWriterOptions? Would FlattenedMetadataWriterOptions make more sense? Or maybe even just MetadataWriterOptions?

I wanted to use CompressedMetadataWriterOptions to cover the common info for both the compressionType and the metadata inside, but that name is already in use, hence CondensedMetadataWriterOptions.

The flattening is only used internally to pass the nested data to JNI, so I don't want it in the class name. But your suggestion made me think that using two nouns here would be better and less confusing, e.g. CompressionMetadataWriterOptions.
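
For readers unfamiliar with the idea, here is a minimal sketch of what flattening nested column metadata for a JNI call can look like. It is not the code in this PR: `ColumnMeta`, `FlatMeta` and `FlattenSketch` are hypothetical names invented for the illustration, and the depth-first layout with per-column child counts is an assumption about one reasonable encoding.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for nested column metadata (not a cuDF class).
final class ColumnMeta {
    final String name;
    final boolean nullable;
    final List<ColumnMeta> children;

    ColumnMeta(String name, boolean nullable, ColumnMeta... children) {
        this.name = name;
        this.nullable = nullable;
        this.children = Arrays.asList(children);
    }
}

// Hypothetical flat form: parallel lists that are easy to hand to native
// code as plain arrays.
final class FlatMeta {
    final List<String> names = new ArrayList<>();
    final List<Integer> numChildren = new ArrayList<>();
    final List<Boolean> nullable = new ArrayList<>();
}

final class FlattenSketch {
    // Pre-order walk: each parent is emitted before its children, so the
    // receiver can rebuild the tree from the child counts alone.
    private static void flatten(ColumnMeta col, FlatMeta out) {
        out.names.add(col.name);
        out.numChildren.add(col.children.size());
        out.nullable.add(col.nullable);
        for (ColumnMeta child : col.children) {
            flatten(child, out);
        }
    }

    static FlatMeta flattenAll(ColumnMeta... topLevel) {
        FlatMeta out = new FlatMeta();
        for (ColumnMeta col : topLevel) {
            flatten(col, out);
        }
        return out;
    }

    public static void main(String[] args) {
        // Schema: id, s: struct<a, b: list<element>>
        FlatMeta flat = flattenAll(
            new ColumnMeta("id", false),
            new ColumnMeta("s", true,
                new ColumnMeta("a", true),
                new ColumnMeta("b", true, new ColumnMeta("element", true))));
        System.out.println(flat.names);       // [id, s, a, b, element]
        System.out.println(flat.numChildren); // [0, 2, 0, 1, 0]
    }
}
```

Since this encoding is an internal transport detail, it supports the point above about keeping "flattened" out of the public class name.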

@codecov

codecov bot commented Oct 8, 2021

Codecov Report

Merging #9334 (db2e510) into branch-21.12 (ab4bfaa) will decrease coverage by 0.03%.
The diff coverage is 0.00%.

@@               Coverage Diff                @@
##           branch-21.12    #9334      +/-   ##
================================================
- Coverage         10.79%   10.75%   -0.04%     
================================================
  Files               116      116              
  Lines             18869    19483     +614     
================================================
+ Hits               2036     2096      +60     
- Misses            16833    17387     +554     
| Impacted Files | Coverage Δ |
|---|---|
| python/cudf/cudf/__init__.py | 0.00% <0.00%> (ø) |
| python/cudf/cudf/_lib/__init__.py | 0.00% <ø> (ø) |
| python/cudf/cudf/core/_base_index.py | 0.00% <0.00%> (ø) |
| python/cudf/cudf/core/column/categorical.py | 0.00% <0.00%> (ø) |
| python/cudf/cudf/core/column/column.py | 0.00% <0.00%> (ø) |
| python/cudf/cudf/core/column/datetime.py | 0.00% <0.00%> (ø) |
| python/cudf/cudf/core/column/lists.py | 0.00% <0.00%> (ø) |
| python/cudf/cudf/core/column/numerical.py | 0.00% <0.00%> (ø) |
| python/cudf/cudf/core/column/string.py | 0.00% <0.00%> (ø) |
| python/cudf/cudf/core/column/struct.py | 0.00% <0.00%> (ø) |
| ... and 69 more | |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 56eb91a...db2e510. Read the comment docs.

@firestarman
Contributor Author

rerun tests

@firestarman
Contributor Author

Strangely, the CI tests failed at random.

@firestarman
Contributor Author

CI is blocked by #9408.

Signed-off-by: Firestarman <[email protected]>
@firestarman firestarman changed the title from "JNI: Support structs and lists in ORC writer" to "JNI: Support nested types in ORC writer" Oct 9, 2021
@firestarman
Contributor Author

The test failures are not relevant.

Now call the sync version directly.

Signed-off-by: Firestarman <[email protected]>
@abellina
Contributor

FYI @firestarman #9406 (comment). I think your diff is good for now (minus a style check I fixed in mine).

Signed-off-by: Firestarman <[email protected]>
@firestarman firestarman added the 5 - Ready to Merge Testing and reviews complete, ready to merge label Oct 12, 2021
@firestarman
Contributor Author

DO NOT MERGE this until NVIDIA/spark-rapids#3696 gets approvals.

@firestarman firestarman added 5 - Ready to Merge Testing and reviews complete, ready to merge and removed 5 - Ready to Merge Testing and reviews complete, ready to merge labels Oct 12, 2021
@firestarman
Contributor Author

@gpucibot merge

@rapids-bot rapids-bot bot merged commit df27da2 into rapidsai:branch-21.12 Oct 13, 2021
@firestarman firestarman deleted the orc-write branch October 13, 2021 00:10
@vyasr vyasr added 4 - Needs Review Waiting for reviewer to review or respond and removed 4 - Needs cuDF (Java) Reviewer labels Feb 23, 2024
Labels
4 - Needs Review Waiting for reviewer to review or respond 5 - Ready to Merge Testing and reviews complete, ready to merge breaking Breaking change feature request New feature or request Java Affects Java cuDF API.
Development

Successfully merging this pull request may close these issues.

[FEA] Java APIs for writing structs to ORC
4 participants