Minor: name some constant values in arrow writer, parquet writer #8642

Merged
merged 3 commits into from
Dec 27, 2023

Conversation

alamb

@alamb alamb commented Dec 24, 2023

Which issue does this PR close?

Follow on to #8608

Rationale for this change

The meaning of the literal numbers in the source code was not immediately obvious to me, and I think the code is clearer when they are named and have comments to guide understanding. See #8608 (comment)

What changes are included in this PR?

Give two constants a name and add some docstrings to explain what they do

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added the core Core DataFusion crate label Dec 24, 2023
@devinjdangelo

devinjdangelo commented Dec 24, 2023

LGTM. I believe the parallel parquet writer has the same two constants as well.

@@ -193,6 +193,13 @@ impl DisplayAs for ArrowFileSink {
}
}

/// Initial writing buffer size. Note this is just a size hint for efficiency. It
/// will grow beyond the set value if needed.
const INITIAL_BUFFER_BYTES: usize = 1048576;

@alamb alamb Dec 26, 2023

Filed follow on PR #8656

Update: since this PR wasn't yet approved, I'll just push commits here
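The doc comment on `INITIAL_BUFFER_BYTES` describes the value as a size hint rather than a hard limit. A minimal sketch of that behavior, assuming the buffer is a plain `Vec<u8>` (the surrounding writer code is not shown in this conversation, and `fill_buffer` is a hypothetical helper used only for illustration):

```rust
/// Value copied from the diff above.
const INITIAL_BUFFER_BYTES: usize = 1048576;

/// Hypothetical helper: allocate a buffer using the constant as a capacity
/// hint, then fill it with `payload_len` zero bytes.
fn fill_buffer(payload_len: usize) -> Vec<u8> {
    // `with_capacity` reserves space up front but does not cap the size.
    let mut buffer: Vec<u8> = Vec::with_capacity(INITIAL_BUFFER_BYTES);
    // Growing past the hint triggers a reallocation instead of an error.
    buffer.resize(payload_len, 0u8);
    buffer
}

fn main() {
    // A small payload fits inside the pre-allocated space.
    let small = fill_buffer(16);
    assert_eq!(small.len(), 16);
    assert!(small.capacity() >= INITIAL_BUFFER_BYTES);

    // A payload larger than the hint still succeeds; the buffer just grows.
    let large = fill_buffer(INITIAL_BUFFER_BYTES + 1);
    assert_eq!(large.len(), INITIAL_BUFFER_BYTES + 1);
    assert!(large.capacity() > INITIAL_BUFFER_BYTES);
}
```

The hint only avoids repeated reallocations for writes up to that size; it does not bound memory use.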

@alamb alamb changed the title Minor: name some constant values in arrow writer Minor: name some constant values in arrow writer, parquet writer Dec 26, 2023
@@ -75,6 +75,17 @@ use crate::physical_plan::{
Statistics,
};

/// Size of the buffer for [`AsyncArrowWriter`].
const PARQUET_WRITER_BUFFER_SIZE: usize = 10485760;

I don't know why these values are (slightly) different, but I figured we could start by keeping them as they are and then unify them as a follow-on if needed.
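For reference, the two literals shown in the diff hunks of this PR can be expressed in MiB. The check below is a quick sketch and not part of the PR; `MIB` is a hypothetical helper constant introduced here only for readability:

```rust
/// Values copied from the diff hunks in this PR.
const INITIAL_BUFFER_BYTES: usize = 1048576;
const PARQUET_WRITER_BUFFER_SIZE: usize = 10485760;

/// Hypothetical helper constant: one mebibyte in bytes.
const MIB: usize = 1024 * 1024;

fn main() {
    // The arrow writer's initial buffer hint is exactly 1 MiB.
    assert_eq!(INITIAL_BUFFER_BYTES, MIB);
    // The parquet writer's buffer size is exactly 10 MiB.
    assert_eq!(PARQUET_WRITER_BUFFER_SIZE, 10 * MIB);

    println!(
        "arrow: {} MiB, parquet: {} MiB",
        INITIAL_BUFFER_BYTES / MIB,
        PARQUET_WRITER_BUFFER_SIZE / MIB
    );
}
```

Writing such constants as `10 * 1024 * 1024` in the source would be one way to make the unit visible at a glance, though the PR keeps the plain literals.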

@alamb alamb merged commit 28ca6d1 into apache:main Dec 27, 2023
1 check passed
@alamb

alamb commented Dec 27, 2023

Thank you @andygrove

@alamb alamb deleted the alamb/const branch December 27, 2023 15:08
appletreeisyellow pushed a commit to appletreeisyellow/datafusion that referenced this pull request Jan 4, 2024
…che#8642)

* Minor: name some constant values in arrow writer

* Add constants to parquet.rs, update doc comments

* fix
4 participants