-
Notifications
You must be signed in to change notification settings - Fork 811
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ArrayData Enumeration for Primitive, Binary and UTF8 #3749
Conversation
|
||
/// A slice-able [`Buffer`] containing bit-packed booleans | ||
#[derive(Debug, Clone)] | ||
pub struct BooleanBuffer { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mod private { | ||
use super::*; | ||
|
||
pub trait BytesSealed { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This two-level formulation is somewhat arcane, but allows BytesArrayData
to be typed solely on actual physical types, without needing a logical ByteArrayType
.
|
||
/// Describes the physical representation of a given [`DataType`] | ||
#[derive(Debug, Copy, Clone, Ord, PartialOrd, Eq, PartialEq, Hash)] | ||
pub enum PhysicalType { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This mirrors what the eventual ArrayData
enumeration will look like, with each of these variants corresponding to an ArrayData
variant
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this only used for validating physical type when constructing typed ArrayData like PrimitiveArrayData.new
does?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I anticipate it being used in other safe array data constructors, but yes this is the only current use-case
07c79dc
to
c1527b5
Compare
|
||
/// Cast [`BytesArrayData`] to [`ArrayDataBytesOffset`] | ||
fn upcast<B: Bytes + ?Sized>( | ||
v: BytesArrayData<Self, B>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What difference between ArrayDataBytes
and BytesArrayData
? Seems ArrayDataBytes
is the data of bytes types (binary and str). But BytesArrayData
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I got it after reading the enums and struct BytesArrayData
below. Maybe a descriptive comment at the top helpful.
pub trait Primitive: private::PrimitiveSealed + ArrowNativeType { | ||
const VARIANT: PrimitiveType; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems this trait and the const need document.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will add some high-level documentation of the ArrayData "pattern" when I switch ArrayData over (and make these public)
arrow-data/src/data/primitive.rs
Outdated
assert!( | ||
matches!(physical, PhysicalType::Primitive(p) if p == T::VARIANT), | ||
"Illegal physical type for PrimitiveArrayData, expected {:?} got {:?}", | ||
T::VARIANT, | ||
physical | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assert!( | |
matches!(physical, PhysicalType::Primitive(p) if p == T::VARIANT), | |
"Illegal physical type for PrimitiveArrayData, expected {:?} got {:?}", | |
T::VARIANT, | |
physical | |
); | |
assert!( | |
matches!(physical, PhysicalType::Primitive(p) if p == T::VARIANT), | |
"Illegal physical type for PrimitiveArrayData of datatype {:?}, expected {:?} got {:?}", | |
data_type, | |
T::VARIANT, | |
physical | |
); |
arrow-data/src/data/types.rs
Outdated
pub enum OffsetType { | ||
Small, | ||
Large, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, as this is for physical type, should it be something like Int32, Int64?
arrow-data/src/data/types.rs
Outdated
DataType::Binary => Self::Bytes(OffsetType::Small, BytesType::Binary), | ||
DataType::FixedSizeBinary(_) => Self::FixedSizeBinary, | ||
DataType::LargeBinary => Self::Bytes(OffsetType::Large, BytesType::Binary), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to keep variable length binary close.
DataType::Binary => Self::Bytes(OffsetType::Small, BytesType::Binary), | |
DataType::FixedSizeBinary(_) => Self::FixedSizeBinary, | |
DataType::LargeBinary => Self::Bytes(OffsetType::Large, BytesType::Binary), | |
DataType::Binary => Self::Bytes(OffsetType::Small, BytesType::Binary), | |
DataType::LargeBinary => Self::Bytes(OffsetType::Large, BytesType::Binary), | |
DataType::FixedSizeBinary(_) => Self::FixedSizeBinary, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like a good start.
Benchmark runs are scheduled for baseline = 96791ea and contender = dae7a71. dae7a71 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
Which issue does this PR close?
Part of #1799
Relates to #1176
Closes #1802
Rationale for this change
This starts the process of fleshing out what the elements of the
ArrayData
enumeration will consist of, with implementations for primitive and byte arrays. I wanted to get something up for review to get feedback on the general direction, and to also avoid a single massive PR.What changes are included in this PR?
This PR adds enumerations
ArrayDataPrimitive
andArrayDataBytes
, that consist of the enumerable variants of the genericPrimitiveArrayData
andBytesArrayData
.These provide for an idiomatic way to enumerate the possible array types, with the view that this would eventually replace the current trait object approach.
Additionally
downcast
/downcast_ref
/From
are provided to facilitate going from the enumeration to/from the generic form. This is critical to allowing types such asPrimitiveArray
to be wrappers aroundPrimitiveArrayData
whilst being constructable from anArrayData
enumeration containingArrayDataPrimitive
.I anticipate functionality gradually being moved onto these ArrayData abstractions, to allow code-sharing with arrow2, and to reduce monomorphisation overheads for PrimitiveArray, but I've purposefully kept them simple for now
Are there any user-facing changes?