Use default value for decimal precision in parquet writer when not specified #9963
After fixing the missing decimal precision, this still fails with
because
One possible fix is to add a libcudf API for reading metadata, but that could take longer. Apart from this failure, I see some more in the benchmarks for cc @vuule
Overall this change seems fine, but it sounds like there is more work to be done here as always.
@@ -101,8 +109,17 @@ void BM_parq_read_varying_options(benchmark::State& state)
  auto const view = tbl->view();

  std::vector<char> parquet_data;
  auto table_meta = cudf::io::table_input_metadata(view);
  // Precision is required for decimal columns but the value doesn't affect the performance
  for (cudf::size_type c = 0; c < view.num_columns(); ++c) {
- for (cudf::size_type c = 0; c < view.num_columns(); ++c) {
+ for (auto c = 0; c < view.num_columns(); ++c) {
This might be easier to read.
I'm fine with temporarily disabling the Why does the writer require precision to be manually specified? Can we default to the max precision for the input decimal type?
I think this was avoided because there were some Spark rules that limited precision based on decimal type width, and I figured other libcudf users might not agree with those rules. The precursor to the precision setting in table_input_metadata was a writer option, and that also threw when precision wasn't specified. @hyperbolic2346 would know why. That said, a default wouldn't hurt, because precision is only schema information: it doesn't affect the data written into the pages, which still hold the underlying representation.
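To make the "default to the max precision" idea concrete, here is a minimal sketch. It assumes the default is simply the number of decimal digits that the integer storage type can always represent (9 for a 32-bit rep, 18 for a 64-bit rep). The names max_decimal_precision and resolve_precision are hypothetical helpers for illustration, not libcudf APIs, and this is not necessarily how the writer implements it.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>
#include <stdexcept>

// Hypothetical helper (not a libcudf API): the largest number of decimal
// digits that every value of the integer storage type Rep can hold.
template <typename Rep>
constexpr int max_decimal_precision()
{
  return std::numeric_limits<Rep>::digits10;
}

// Hypothetical helper: use the caller's precision when one is given,
// otherwise fall back to the maximum for the storage type.
template <typename Rep>
int resolve_precision(std::optional<int> requested)
{
  constexpr int max_precision = max_decimal_precision<Rep>();
  if (!requested.has_value()) { return max_precision; }
  if (*requested < 1 || *requested > max_precision) {
    throw std::invalid_argument("precision out of range for the decimal storage type");
  }
  return *requested;
}

// Usage sketch: a 32-bit decimal column with no precision set falls back to
// 9 digits, while an explicit, valid precision is kept unchanged.
inline void usage_example()
{
  assert(resolve_precision<std::int32_t>(std::nullopt) == 9);
  assert(resolve_precision<std::int64_t>(12) == 12);
}
```

Since precision only ends up in the schema, picking the maximum as the fallback should not change the bytes written to the pages; it only changes what readers are told about the column.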
Codecov Report
@@ Coverage Diff @@
## branch-22.02 #9963 +/- ##
================================================
- Coverage 10.49% 10.45% -0.04%
================================================
Files 119 119
Lines 20305 20417 +112
================================================
+ Hits 2130 2134 +4
- Misses 18175 18283 +108
Continue to review full report at Codecov.
rerun tests
@gpucibot merge
Fixes #9962