Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve ergonomics of Scalar #4704

Merged
merged 2 commits into from
Aug 17, 2023
Merged

Conversation

tustvold
Copy link
Contributor

Which issue does this PR close?

Closes #.

Rationale for this change

Originally I wanted to avoid Scalar permitting ownership to discourage constructions like Vec<Scalar>. Unfortunately this had the side-effect of making Scalar quite cumbersome to use. This PR instead changes Scalar to be generic over T: Array, this allows constructions like

Scalar<Int32Array>
Scalar<&Int32Array>
Scalar<ArrayRef>
Scalar<&ArrayRef>

It also allows adding constructor methods to PrimitiveArray and ByteArray, that improve the ergonomics of constructing such scalars.

What changes are included in this PR?

Are there any user-facing changes?

Yes, as &dyn Array: ?Array this is a breaking change.

@tustvold tustvold added the api-change Changes to the arrow API label Aug 16, 2023
@github-actions github-actions bot added the arrow Changes to the arrow crate label Aug 16, 2023
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like an improvement to me. I sill think ScalarValue::new_int etc would be valuable

@@ -101,22 +101,23 @@ impl Datum for &dyn Array {
/// A wrapper around a single value [`Array`] indicating kernels should treat it as a scalar value
///
/// See [`Datum`] for more information
pub struct Scalar<'a>(&'a dyn Array);
#[derive(Debug, Copy, Clone)]
pub struct Scalar<T: Array>(T);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think what a user would be looking for is Scalar::new_primitive() and ScalarValue::new_string(). Without thos signatures function, I predict pretty much anyone using Scalars` are going to make some version of the function from #4701

fn make_primitive_scalar<T: num::ToPrimitive + std::fmt::Debug>(
    d: &DataType,
    scalar: T,
) -> Result<ArrayRef, ArrowError> {
    match d {
        DataType::Int8 => {
            let right = try_to_type!(scalar, to_i8)?;
            Ok(Arc::new(PrimitiveArray::<Int8Type>::from(vec![right])))
        }
        DataType::Int16 => {
            let right = try_to_type!(scalar, to_i16)?;
            Ok(Arc::new(PrimitiveArray::<Int16Type>::from(vec![right])))
        }
        DataType::Int32 => {
            let right = try_to_type!(scalar, to_i32)?;
            Ok(Arc::new(PrimitiveArray::<Int32Type>::from(vec![right])))
        }
        DataType::Int64 => {
            let right = try_to_type!(scalar, to_i64)?;
            Ok(Arc::new(PrimitiveArray::<Int64Type>::from(vec![right])))
        }
        DataType::UInt8 => {
            let right = try_to_type!(scalar, to_u8)?;
            Ok(Arc::new(PrimitiveArray::<UInt8Type>::from(vec![right])))
        }
...

Copy link
Contributor Author

@tustvold tustvold Aug 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make some version of the function from

I don't follow this, why would they need to be creating a scalar from a mix of a concrete rust type and an arrow DataType? That function is a temporary compatibility shim to map the old scalar comparison kernels onto the new scalar APIs, and will be deleted in the near future.

The major reason I'm pushing back on this is it opens up a can of worms about how do you specify which particular ArrowPrimitiveType, or OffsetSize, you want, or any metadata like timezones, etc... By keeping things array based we avoid all that nonsense

@@ -182,6 +182,11 @@ impl<T: ByteArrayType> GenericByteArray<T> {
}
}

/// Create a new [`Scalar`] from `v`
pub fn new_scalar(value: impl AsRef<T::Native>) -> Scalar<Self> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about BooleanArray and DictonaryArray and ListArray?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will add BooleanArray, not sure what the signature for nested types should be, and they aren't needed by any kernels anyway so seems unnecessary

@tustvold tustvold merged commit d838194 into apache:master Aug 17, 2023
25 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-change Changes to the arrow API arrow Changes to the arrow crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants