Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cleanup ArrowNativeType (#1918) #2793

Merged
merged 2 commits into from
Sep 28, 2022

Conversation

tustvold
Copy link
Contributor

@tustvold tustvold commented Sep 27, 2022

Which issue does this PR close?

Closes #1918

Rationale for this change

The ArrowNativeType has been a pet peeve of mine for a while, this cleans it up a bit:

  • Deprecates some methods that don't seem to serve a purpose
  • Remove the use of num as it is unnecessary
  • Adds an unchecked conversion to usize

What changes are included in this PR?

Are there any user-facing changes?

ArrowNativeType::to_usize and ArrowNativeType::to_isize take values by value, instead of reference

@github-actions github-actions bot added the arrow Changes to the arrow crate label Sep 27, 2022
arrow-buffer/src/native.rs Outdated Show resolved Hide resolved
impl ArrowNativeType for $t {
#[inline]
fn from_usize(v: usize) -> Option<Self> {
v.try_into().ok()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TryInto behaves the same for integer types as num::FromPrimitive

@@ -102,7 +101,7 @@ pub unsafe fn unset_bit_raw(data: *mut u8, i: usize) {
pub fn ceil(value: usize, divisor: usize) -> usize {
// Rewrite as `value.div_ceil(&divisor)` after
// https://github.com/rust-lang/rust/issues/88581 is merged.
Integer::div_ceil(&value, &divisor)
value / divisor + (0 != value % divisor) as usize
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is copied wholesale from Integer::div_ceil

@tustvold tustvold changed the title Remove num arrow buffer Remove num dependency from arrow-buffer Sep 27, 2022
Remove num dependency from arrow-buffer

Deprecate unnecessary methods
@tustvold tustvold changed the title Remove num dependency from arrow-buffer Replace checked casts with as for performance (#1918) Sep 27, 2022
@@ -38,7 +38,6 @@ path = "src/lib.rs"
bench = false

[dependencies]
num = { version = "0.4", default-features = false, features = ["std"] }
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still a dependency elsewhere, but every little helps 😅

@@ -308,20 +308,13 @@ impl<K: ArrowPrimitiveType> DictionaryArray<K> {

/// Return an iterator over the keys (indexes into the dictionary)
pub fn keys_iter(&self) -> impl Iterator<Item = Option<usize>> + '_ {
self.keys
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These checks make no sense as we have already validated the dictionary offsets are non-negative

/// Convert to usize according to the [`as`] operator
///
/// [`as`]: https://doc.rust-lang.org/reference/expressions/operator-expr.html#numeric-cast
fn as_usize(self) -> usize;
Copy link
Contributor Author

@tustvold tustvold Sep 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't entirely sure about this, but taking a Copy type by reference seems strange, so I changed to take it by value, happy to revert if people feel strongly

@tustvold tustvold added the api-change Changes to the arrow API label Sep 27, 2022
fn from_usize(_: usize) -> Option<Self> {
None
}
/// Convert native integer type from usize
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't change the behaviour, just documents it

@github-actions github-actions bot added the parquet Changes to the parquet crate label Sep 27, 2022
@tustvold tustvold changed the title Replace checked casts with as for performance (#1918) Cleanup ArrowNativeType (#1918) Sep 27, 2022
@tustvold tustvold marked this pull request as ready for review September 27, 2022 21:28
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

impl private::Sealed for f32 {}
impl ArrowNativeType for f64 {}
impl private::Sealed for f64 {}
native_float!(f16, self, self.to_f32() as _);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIL that as usize on a f32 does the right thing (truncates to integer) rather than reinterpreting as bytes

https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=e8c093e2d8a2b83c9669c16ee04905ee

@@ -97,20 +97,20 @@ impl Parser for Date32Type {
fn parse(string: &str) -> Option<i32> {
use chrono::Datelike;
let date = string.parse::<chrono::NaiveDate>().ok()?;
Self::Native::from_i32(date.num_days_from_ce() - EPOCH_DAYS_FROM_CE)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, this is redundant.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes? That's why I removed it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, wondering why we had it before. 😄

#[inline]
///
/// Returns `None` if [`Self`] is not `i32`
#[deprecated(note = "please use `Option::Some` instead")]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The deprecation message looks a bit confused at first glance. I think it means that like Self::Native::from_i32(date.num_days_from_ce() - EPOCH_DAYS_FROM_CE) such a redundant case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only case where the method returned something other than None is when the type matches, in which case Some is the same behaviour

@tustvold tustvold merged commit 178319b into apache:master Sep 28, 2022
@ursabot
Copy link

ursabot commented Sep 28, 2022

Benchmark runs are scheduled for baseline = 7639f28 and contender = 178319b. 178319b is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

@alamb alamb changed the title Cleanup ArrowNativeType (#1918) Cleanup ArrowNativeType (#1918) Sep 30, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-change Changes to the arrow API arrow Changes to the arrow crate parquet Changes to the parquet crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Replace checked casts with as for performance
4 participants