Enable decimal support in parquet writer #7673

Merged


devavret
Contributor

@devavret devavret commented Mar 22, 2021

Resolves #7669

@devavret requested a review from a team as a code owner March 22, 2021 22:33
@github-actions bot added the Python label Mar 22, 2021
@devavret added the non-breaking label Mar 22, 2021
@kkraus14 added the feature request label Mar 22, 2021
@kkraus14
Collaborator

@devavret are there any issues with round tripping decimal64 vs decimal32 in generating Arrow / Pandas compatible metadata, or is that handled by the logical / physical types in Parquet anyway?

@codecov

codecov bot commented Mar 23, 2021

Codecov Report

Merging #7673 (44a2457) into branch-0.19 (7871e7a) will increase coverage by 0.22%.
The diff coverage is n/a.

❗ Current head 44a2457 differs from pull request most recent head 38c0c49. Consider uploading reports for the commit 38c0c49 to get more accurate results

@@               Coverage Diff               @@
##           branch-0.19    #7673      +/-   ##
===============================================
+ Coverage        81.86%   82.08%   +0.22%     
===============================================
  Files              101      101              
  Lines            16884    17036     +152     
===============================================
+ Hits             13822    13984     +162     
+ Misses            3062     3052      -10     
| Impacted Files | Coverage Δ |
|---|---|
| python/cudf/cudf/core/column/categorical.py | 91.62% <ø> (+0.23%) ⬆️ |
| python/cudf/cudf/core/column/column.py | 87.77% <ø> (+0.01%) ⬆️ |
| python/cudf/cudf/core/column/datetime.py | 89.09% <ø> (ø) |
| python/cudf/cudf/core/column/decimal.py | 92.75% <ø> (-2.12%) ⬇️ |
| python/cudf/cudf/core/column/lists.py | 91.81% <ø> (+0.42%) ⬆️ |
| python/cudf/cudf/core/column/numerical.py | 94.83% <ø> (-0.20%) ⬇️ |
| python/cudf/cudf/core/column/string.py | 86.58% <ø> (+0.08%) ⬆️ |
| python/cudf/cudf/core/column/timedelta.py | 88.23% <ø> (ø) |
| python/cudf/cudf/core/column_accessor.py | 95.28% <ø> (-0.03%) ⬇️ |
| python/cudf/cudf/core/dataframe.py | 90.79% <ø> (+0.32%) ⬆️ |
| ... and 30 more | |

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c21bd0e...38c0c49. Read the comment docs.

@devavret
Contributor Author

> @devavret are there any issues with round tripping decimal64 vs decimal32 in generating Arrow / Pandas compatible metadata, or is that handled by the logical / physical types in Parquet anyway?

Correct me if I'm wrong but I thought cudf only had decimal64

class Decimal64Dtype(ExtensionDtype):

@kkraus14
Collaborator

kkraus14 commented Mar 23, 2021

> > @devavret are there any issues with round tripping decimal64 vs decimal32 in generating Arrow / Pandas compatible metadata, or is that handled by the logical / physical types in Parquet anyway?
>
> Correct me if I'm wrong but I thought cudf only had decimal64
>
> class Decimal64Dtype(ExtensionDtype):

For now yes, but Decimal32 is planned to be supported in the future and we want to make sure that we don't introduce changes into the Parquet metadata that aren't forward compatible if possible.

@devavret
Contributor Author

> > > @devavret are there any issues with round tripping decimal64 vs decimal32 in generating Arrow / Pandas compatible metadata, or is that handled by the logical / physical types in Parquet anyway?
> >
> > Correct me if I'm wrong but I thought cudf only had decimal64
> >
> > class Decimal64Dtype(ExtensionDtype):
>
> For now yes, but Decimal32 is planned to be supported in the future and we want to make sure that we don't introduce changes into the Parquet metadata that aren't forward compatible if possible.

Alright, then yes: that will be taken care of during the conversion of Column to cudf::column_view. And as long as the dtype class for Decimal32 has a precision member, no changes will be required to the bindings either.
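
To illustrate the point about the `precision` member, here is a minimal pure-Python sketch. It is not the cuDF binding code: the two dtype classes are hypothetical stand-ins (a `Decimal32Dtype` did not exist in cuDF at the time of this thread), and `collect_decimal_precisions` and `parquet_physical_type` are made-up helper names. The only assumption taken from the comment above is that the bindings need nothing beyond a `precision` attribute. The physical-type mapping follows the Parquet format rule that DECIMAL may annotate INT32 for precision up to 9 and INT64 for precision up to 18; which type the cuDF writer actually picks is not stated in this thread.

```python
from dataclasses import dataclass


# Hypothetical stand-ins for the dtype classes discussed in the thread.
# Only the `precision` attribute matters for this sketch.
@dataclass(frozen=True)
class Decimal64Dtype:
    precision: int
    scale: int = 0


@dataclass(frozen=True)
class Decimal32Dtype:
    precision: int
    scale: int = 0


def collect_decimal_precisions(dtypes):
    """Return each dtype's precision, or None for non-decimal dtypes.

    Duck-typed on purpose: any dtype exposing `precision` works, so adding
    a new decimal width needs no change here.
    """
    return [getattr(dt, "precision", None) for dt in dtypes]


def parquet_physical_type(precision):
    """Smallest physical type the Parquet spec allows for this DECIMAL precision."""
    if precision < 1:
        raise ValueError("precision must be at least 1")
    if precision <= 9:
        return "INT32"
    if precision <= 18:
        return "INT64"
    return "FIXED_LEN_BYTE_ARRAY"


if __name__ == "__main__":
    dtypes = [Decimal64Dtype(18, 2), Decimal32Dtype(9, 2), "int64"]
    for dt, precision in zip(dtypes, collect_decimal_precisions(dtypes)):
        kind = parquet_physical_type(precision) if precision else "n/a"
        print(dt, precision, kind)
```

Because the helper looks only at the attribute and never at the class, a future 32-bit decimal dtype would pass through unchanged, which is the forward-compatibility property raised earlier in the conversation.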

@kkraus14
Collaborator

@gpucibot merge

@rapids-bot bot merged commit 5cd90a0 into rapidsai:branch-0.19 Mar 23, 2021
@vyasr added the 4 - Needs Review label and removed the 4 - Needs cuDF (Python) Reviewer label Feb 23, 2024
Labels: 3 - Ready for Review, 4 - Needs Review, cuIO, feature request, non-breaking, Python
Development

Successfully merging this pull request may close these issues.

[BUG] Can't write Parquet file containing Decimal/Fixed Point column(s)
3 participants