Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Seal ArrowNativeType and OffsetSizeTrait for safety (#1028) #1819

Merged
merged 2 commits into from
Jun 8, 2022
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
34 changes: 34 additions & 0 deletions arrow/src/datatypes/native.rs
Original file line number Diff line number Diff line change
Expand Up @@ -19,15 +19,36 @@ use super::DataType;
use half::f16;
use serde_json::{Number, Value};

mod private {
pub trait Sealed {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could implement the Seal trait in the private mod:

mod private {
    pub trait Sealed {}
    impl Sealed for i8 {}
    impl Sealed for i16 {}
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I prefer implementing it with the rest of the implementations for the type. It makes it clearer why it is implemented, and theoretically we may macroify it later

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should add some docstrings about why it is not allowed to ArrowNativeType

Using the text from the PR description "ArrowNativeType is used to indicate "trivially safely transmutable" within the buffer implementations. If client can create their own implementations they can do potentially unwise things."

would be a good starting point

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added in a64575e

}

/// Trait declaring any type that is serializable to JSON. This includes all primitive types (bool, i32, etc.).
pub trait JsonSerializable: 'static {
fn into_json_value(self) -> Option<Value>;
}

/// Trait expressing a Rust type that has the same in-memory representation
/// as Arrow. This includes `i16`, `f32`, but excludes `bool` (which in arrow is represented in bits).
///
/// In little endian machines, types that implement [`ArrowNativeType`] can be memcopied to arrow buffers
/// as is.
///
/// # Transmute Safety
///
/// A type T implementing this trait means that any arbitrary slice of bytes of length and
/// alignment `size_of::<T>()` can be safely interpreted as a value of that type without
/// being unsound, i.e. potentially resulting in undefined behaviour.
///
/// Note: in the case of floating point numbers this transmutation can result in a signalling
/// NaN, which, whilst sound, can be unwieldy. In general, whilst it is perfectly sound to
/// reinterpret bytes as different types using this trait, it is likely unwise
///
/// Note: `bool` is restricted to `0` or `1`, and so `bool: !ArrowNativeType`
///
/// # Sealed
///
/// Due to the above restrictions, this trait is sealed to prevent accidental misuse
pub trait ArrowNativeType:
std::fmt::Debug
+ Send
Expand All @@ -37,6 +58,7 @@ pub trait ArrowNativeType:
+ std::str::FromStr
+ Default
+ JsonSerializable
+ private::Sealed
{
/// Convert native type from usize.
#[inline]
Expand Down Expand Up @@ -109,6 +131,7 @@ impl JsonSerializable for i8 {
}
}

impl private::Sealed for i8 {}
impl ArrowNativeType for i8 {
#[inline]
fn from_usize(v: usize) -> Option<Self> {
Expand All @@ -132,6 +155,7 @@ impl JsonSerializable for i16 {
}
}

impl private::Sealed for i16 {}
impl ArrowNativeType for i16 {
#[inline]
fn from_usize(v: usize) -> Option<Self> {
Expand All @@ -155,6 +179,7 @@ impl JsonSerializable for i32 {
}
}

impl private::Sealed for i32 {}
impl ArrowNativeType for i32 {
#[inline]
fn from_usize(v: usize) -> Option<Self> {
Expand Down Expand Up @@ -184,6 +209,7 @@ impl JsonSerializable for i64 {
}
}

impl private::Sealed for i64 {}
impl ArrowNativeType for i64 {
#[inline]
fn from_usize(v: usize) -> Option<Self> {
Expand Down Expand Up @@ -217,6 +243,7 @@ impl JsonSerializable for i128 {
}
}

impl private::Sealed for i128 {}
impl ArrowNativeType for i128 {
#[inline]
fn from_usize(v: usize) -> Option<Self> {
Expand Down Expand Up @@ -246,6 +273,7 @@ impl JsonSerializable for u8 {
}
}

impl private::Sealed for u8 {}
impl ArrowNativeType for u8 {
#[inline]
fn from_usize(v: usize) -> Option<Self> {
Expand All @@ -269,6 +297,7 @@ impl JsonSerializable for u16 {
}
}

impl private::Sealed for u16 {}
impl ArrowNativeType for u16 {
#[inline]
fn from_usize(v: usize) -> Option<Self> {
Expand All @@ -292,6 +321,7 @@ impl JsonSerializable for u32 {
}
}

impl private::Sealed for u32 {}
impl ArrowNativeType for u32 {
#[inline]
fn from_usize(v: usize) -> Option<Self> {
Expand All @@ -315,6 +345,7 @@ impl JsonSerializable for u64 {
}
}

impl private::Sealed for u64 {}
impl ArrowNativeType for u64 {
#[inline]
fn from_usize(v: usize) -> Option<Self> {
Expand Down Expand Up @@ -351,8 +382,11 @@ impl JsonSerializable for f64 {
}

impl ArrowNativeType for f16 {}
impl private::Sealed for f16 {}
impl ArrowNativeType for f32 {}
impl private::Sealed for f32 {}
impl ArrowNativeType for f64 {}
impl private::Sealed for f64 {}

/// Allows conversion from supported Arrow types to a byte slice.
pub trait ToByteSlice {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ToByteSlice trait would also be a candiate for sealing as it also used for transmuting and should only be used on ArrowNativeTypes and slices of ArrowNativeType.

Copy link
Contributor Author

@tustvold tustvold Jun 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think ToByteSlice assumes transmutability as far as I can tell, although the blanket impl for ArrowNativeType is (which is fine)? Am I missing something?

pub trait ToByteSlice {
    /// Converts this instance into a byte slice
    fn to_byte_slice(&self) -> &[u8];
}

impl<T: ArrowNativeType> ToByteSlice for [T] {
    #[inline]
    fn to_byte_slice(&self) -> &[u8] {
        let raw_ptr = self.as_ptr() as *const T as *const u8;
        unsafe {
            std::slice::from_raw_parts(raw_ptr, self.len() * std::mem::size_of::<T>())
        }
    }
}

impl<T: ArrowNativeType> ToByteSlice for T {
    #[inline]
    fn to_byte_slice(&self) -> &[u8] {
        let raw_ptr = self as *const T as *const u8;
        unsafe { std::slice::from_raw_parts(raw_ptr, std::mem::size_of::<T>()) }
    }
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I did not word that right. The question is rather whether we want to support other (external) types to be used for example in MutableBuffer:

pub fn push<T: ToByteSlice>(&mut self, item: T)

It's probably fine as it would only be unsound if the trait impl does something explicitly unsound.

Expand Down