Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor RecordBatch::validate_new_batch #1361

Merged

Conversation

HaoYang670
Copy link
Contributor

Which issue does this PR close?

Closes #1350.

Rationale for this change

There is some repetitive code in the function validate_new_batch, which are not easy to maintain. We rewrite this function to make it more clear.

Are there any user-facing changes?

No.

simplify the iterator

Signed-off-by: remzi <[email protected]>
@github-actions github-actions bot added the arrow Changes to the arrow crate label Feb 25, 2022
@HaoYang670
Copy link
Contributor Author

Performance

run cargo bench -- json_list_primitive_to_record_batch

Before

json_list_primitive_to_record_batch                                                                             
                        time:   [17.513 us 17.536 us 17.563 us]
                        change: [+0.0716% +0.4880% +0.8879%] (p = 0.01 < 0.05)
                        Change within noise threshold.
Found 11 outliers among 100 measurements (11.00%)
  2 (2.00%) low mild
  4 (4.00%) high mild
  5 (5.00%) high severe

After

json_list_primitive_to_record_batch                                                                             
                        time:   [16.754 us 16.769 us 16.785 us]
                        change: [+0.4428% +1.0700% +1.7171%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 10 outliers among 100 measurements (10.00%)
  4 (4.00%) high mild
  6 (6.00%) high severe

No performance penalty.

@codecov-commenter
Copy link

Codecov Report

Merging #1361 (c119805) into master (a69f0d1) will decrease coverage by 0.04%.
The diff coverage is 68.42%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1361      +/-   ##
==========================================
- Coverage   83.04%   82.99%   -0.05%     
==========================================
  Files         181      181              
  Lines       52925    52929       +4     
==========================================
- Hits        43950    43929      -21     
- Misses       8975     9000      +25     
Impacted Files Coverage Δ
arrow/src/record_batch.rs 94.00% <68.42%> (+2.02%) ⬆️
parquet/src/file/statistics.rs 91.73% <0.00%> (-2.07%) ⬇️
parquet/src/util/cursor.rs 77.31% <0.00%> (-1.69%) ⬇️
parquet/src/schema/types.rs 85.64% <0.00%> (-1.51%) ⬇️
arrow/src/csv/writer.rs 71.32% <0.00%> (-0.82%) ⬇️
arrow/src/json/writer.rs 91.77% <0.00%> (-0.70%) ⬇️
arrow/src/array/array_dictionary.rs 91.12% <0.00%> (-0.47%) ⬇️
arrow/src/array/data.rs 83.15% <0.00%> (-0.44%) ⬇️
arrow/src/datatypes/datatype.rs 66.40% <0.00%> (-0.40%) ⬇️
arrow/src/ipc/writer.rs 83.14% <0.00%> (-0.32%) ⬇️
... and 11 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a69f0d1...c119805. Read the comment docs.

@HaoYang670
Copy link
Contributor Author

Rebuild after #1362 is fixed.

@HaoYang670 HaoYang670 closed this Feb 25, 2022
@HaoYang670 HaoYang670 reopened this Feb 25, 2022
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a nice cleanup to me. Thank you @HaoYang670

@HaoYang670
Copy link
Contributor Author

Is it ready to be merged?

@alamb alamb merged commit 6de3887 into apache:master Mar 2, 2022
@alamb
Copy link
Contributor

alamb commented Mar 2, 2022

Yes, thank you @HaoYang670 !

@HaoYang670 HaoYang670 deleted the issue1350_refactor_validate_new_batch branch March 2, 2022 22:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Refactor RecordBatch::validate_new_batch
3 participants