Improve performance when converting python bytes/bytearray to `Vec<u8>` #2888

vxgmichel · 2023-01-18T16:41:33Z

Hi all, I noticed a performance issue when extracting a PyBytes or a PyByteArray object into a Vec<u8>.

This is an issue one can easily run into without realizing it. Here's a scenario, let's say we'd like to expose a simple checksum function:

#[pyfunction]
fn checksum(data: &[u8]) -> PyResult<u8> {
   let mut result = 0;
   for x in data {
       result ^= x;
   }
   Ok(result)
}

See how it performs against the equivalent python implementation, processing 1MB a hundred times:

2.65s call     test_checksum.py::test_perf[py-bytes]
0.00s call     test_checksum.py::test_perf[rs-bytes]

Looks really fast! However, it won't accept a bytearray as an argument:

TypeError: argument 'data': 'bytearray' object cannot be converted to 'PyBytes'

So we update our implementation to take a Vec<u8> instead:

#[pyfunction]
fn checksum(data: Vec<u8>) -> PyResult<u8> {
    let mut result = 0;
    for x in data {
        result ^= x;
    }
    Ok(result)
}

And now the results:

2.61s call     test_checksum.py::test_perf[py-bytearray]
2.55s call     test_checksum.py::test_perf[py-bytes]
1.92s call     test_checksum.py::test_perf[rs-bytearray]
1.87s call     test_checksum.py::test_perf[rs-bytes]

It performs roughly the same as python, which makes sense if we look at the FromPyObject implementation for Vec<T>:

pyo3/src/types/sequence.rs

Lines 314 to 318 in bed4f9d

    
           let mut v = Vec::with_capacity(seq.len().unwrap_or(0)); 
        
           for item in seq.iter()? { 
        
               v.push(item?.extract::<T>()?); 
        
           } 
        
           Ok(v)

The bytes/bytearray object is iterated and each item (i.e a python integer) is separately extracted into a u8.

This could be fixed by specializing the extract logic in the case of a Vec<u8> and use specific methods such as PyBytes::as_bytes().to_vec() and PyByteArray::to_vec(). Here's a possible patch:
https://gist.github.com/vxgmichel/367e01e8504cb9c9e700a22525e8b68d

With this patch applied, the performance is now similar to what we had with the &[u8] slice:

2.70s call     test_checksum.py::test_perf[py-bytearray]
2.65s call     test_checksum.py::test_perf[py-bytes]
0.00s call     test_checksum.py::test_perf[rs-bytes]
0.00s call     test_checksum.py::test_perf[rs-bytearray]

The text was updated successfully, but these errors were encountered:

adamreichold · 2023-01-18T17:41:07Z

I understand this might be unsatisfying, but I think without stable specialization, the preferred course of action is manual downcasting to the relevant types like PyBytes and PyByteArray. Using an enum with #[derive(FromPyObject)] can make limit the required boilerplate. Especially since downcasting to PyByteArray opens the unsafe route towards not copying at all.

The Option-based specialization seems to be a bit of wart for this special case. And if applied, I would probably suggest limiting it to Vec<u8> as anything beyond "bytes" goes right into buffer protocol and/or NumPy array territory.

davidhewitt · 2023-01-18T20:32:05Z

I agree that adding extract_sequence to FromPyObject as proposed in the gist is very unsatisfying.

That said, I've hit the Vec<u8> performance cliff myself in production (it can also cause OOM), which took me some time to debug, so I would be open to a discussion if there's something we can do to make this case less of a footgun.

vxgmichel · 2023-01-20T11:38:20Z

the preferred course of action is manual downcasting to the relevant types like PyBytes and PyByteArray. Using an enum with #[derive(FromPyObject)] can make limit the required boilerplate.

I'm not sure I understand this part correctly, is that a suggestion for pyo3 itself or the user code? At the moment the best solution I found for the user code is using a dedicated wrapper type that implements FromPyObject:

struct BytesWrapper(Vec<u8>);

impl<'source> FromPyObject<'source> for BytesWrapper {
    fn extract(object: &'source PyAny) -> PyResult<Self> {
        Ok(BytesWrapper(match object.extract::<&PyByteArray>() {
            Ok(x) => x.to_vec(),
            Err(_) => object.extract::<&PyBytes>()?.as_bytes().to_vec(),
        }))
    }
}

impl From<BytesWrapper> for Vec<u8> {
    fn from(data: BytesWrapper) -> Self {
        data.0
    }
}

Then it can be used as such:

#[pyfunction]
fn checksum_fixed(data: BytesWrapper) -> PyResult<u8> {
    let data: Vec<u8> = data.into();
    let mut result = 0;
    for x in data {
        result ^= x;
    }
    Ok(result)
}

It's not entirely satisfactory though because the unwrapping can be quite annoying in some cases, say if the argument is something like Option<Vec<Vec<u8>>>. In this case, the type conversion would look like this:

#[pyfunction]
fn some_function(data: Option<Vec<BytesWrapper>>) -> PyResult<u8> {
    let data: Option<Vec<Vec<u8>>> = data.map(|x| x.into_iter().map(|y| y.into()).collect());
    ...
}

Any suggestion on how to address this issue?

adamreichold · 2023-01-20T12:20:25Z

I'm not sure I understand this part correctly, is that a suggestion for pyo3 itself or the user code?

This was a suggestion for user code, but less about wrapping the result, but more about manually writing the "extract chains", e.g.

#[derive(FromPyObject)]
enum BytesWrapper<'py> {
  Bytes(&'py PyBytes),
  ByteArray(&'py PyByteArray),
}

impl From<BytesWrapper<'_>> for Vec<u8> {
  fn from(wrapper: BytesWrapper) -> Self {
    match wrapper {
      BytesWrapper::Bytes(bytes) => bytes.as_bytes().to_vec(),
      BytesWrapper::ByteArray(byte_array) => byte_array.to_vec(),
    }
  }
}

but of course this does not help with getting rid of the wrapper type. It might enable zero-copy usage though, e.g.

impl<'py> From<BytesWrapper<'py>> for Cow<'py, [u8]> {
  fn from(wrapper: BytesWrapper<'py>) -> Self {
    match wrapper {
      BytesWrapper::Bytes(bytes) => Cow::Borrowed(bytes.as_bytes()),
      BytesWrapper::ByteArray(byte_array) => Cow::Owned(byte_array.to_vec()),
    }
  }
}

In the approach you outlined above, I think implementing Deref and DerefMut might help. It would not remove the wrapper, maybe you could use it transparently in most places? (Beyond that, your wrapper could use the exact same representation as Vec<u8> via #[repr(transparent)] so complex types built on top could be unsafely transmuted, but I would not recommend that.)

As written above, the proper solution here would be specialization so that we could provide specialized impls for impl FromPyObject<'a> for Vec<u8> which considers PyBytes and PyByteArray.

That said, I've hit the Vec performance cliff myself in production (it can also cause OOM), which took me some time to debug, so I would be open to a discussion if there's something we can do to make this case less of a footgun.

For now, I see two hacks which would be palatable to me personally:

impl FromPyObject<'a> for Cow<'a, [u8]> which explicitly handles PyBytes and PyByteArray. I think this would work as we have no more general impl for Cow<'a, T> with which it could conflict ATM.
Use TypeId::of to check TypeId::of::<T>() == TypeId::of::<u8>() in the generic impl FromPyObject<'a> for Vec<T> and if that condition holds, handle PyBytes and PyByteArray and unsafely transmute the resulting Vec<u8> into Vec<T> (which are the same in that case). Ideally, the compiler optimize away this poor man's specialization as both TypeId are known at compile time.

adamreichold · 2023-01-20T12:21:40Z

Use TypeId::of to check TypeId::of::() == TypeId::of::() in the generic impl FromPyObject<'a> for Vec and if that condition holds, handle PyBytes and PyByteArray and unsafely transmute the resulting Vec into Vec (which are the same in that case). Ideally, the compiler optimize away this poor man's specialization as both TypeId are known at compile time.

Ah, this one of course does not work as the T: 'static bound is not fulfilled. Sorry for the noise...

vxgmichel · 2023-01-20T12:41:36Z

This was a suggestion for user code, but less about wrapping the result, but more about manually writing the "extract chains", e.g. [...]

Looks nice, I'll do that :)

In the approach you outlined above, I think implementing Deref and DerefMut might help. It would not remove the wrapper, maybe you could use it transparently in most places? (Beyond that, your wrapper could use the exact same representation as Vec via #[repr(transparent)] so complex types built on top could be unsafely transmuted, but I would not recommend that.)

I think I'll implement an UnwrapBytesWrapper trait as such:

pub trait UnwrapBytesWrapper {
    type ResultType;
    fn unwrap_bytes(self) -> Self::ResultType;
}

impl UnwrapBytesWrapper for BytesWrapper<'_> {
    type ResultType = Vec<u8>;
    fn unwrap_bytes(self) -> Self::ResultType {
        self.into()
    }
}

impl<T> UnwrapBytesWrapper for Option<T>
where
    T: UnwrapBytesWrapper,
{
    type ResultType = Option<T::ResultType>;
    fn unwrap_bytes(self) -> Self::ResultType {
        self.map(|x| x.unwrap_bytes())
    }
}

impl<T> UnwrapBytesWrapper for Vec<T>
where
    T: UnwrapBytesWrapper,
{
    type ResultType = Vec<T::ResultType>;
    fn unwrap_bytes(self) -> Self::ResultType {
        self.into_iter().map(|x| x.unwrap_bytes()).collect()
    }
}

This way I can use the same conversion code everywhere:

#[pyfunction]
fn some_function(data: Option<Vec<BytesWrapper>>) -> PyResult<u8> {
    let data = data.unwrap_bytes();
    ...
}

As written above, the proper solution here would be specialization so that we could provide specialized impls for impl FromPyObject<'a> for Vec which considers PyBytes and PyByteArray.

I'm looking forward to that, thanks for your help :)

See PyO3/pyo3#2888

2899: RFC: Provide a special purpose FromPyObject impl for byte slices r=davidhewitt a=adamreichold This enables efficiently and safely getting a byte slice from either bytes or byte arrays. The main issue I see here is discoverability, i.e. should this be mention in the docs of `PyBytes` and `PyByteArray` or in the guide? It is also not completely clear whether this really _fixes_ the issue. Closes #2888 Co-authored-by: Adam Reichold <[email protected]>

vxgmichel · 2023-02-27T11:24:38Z

Hi all, I think the bot went a bit too quick on closing this issue.

PR #2899 is an interesting feature to PyO3 but I don't think it addresses the specific performance issue I reported. In my opinion, either bytes -> Vec<u8> is supported or it's not.

If it is, the python iteration over the individual bytes should really be avoided as it makes programs much slower than their pure python alternative, which defeats an important purpose of PyO3.

If it is not, it should not appear in the docs like it does today:

pyo3/guide/src/conversions/tables.md

Line 17 in 0a48859

| `bytes` | `Vec<u8>`, `&[u8]`, `Cow<[u8]>` | `&PyBytes` |

I also quickly read through the comments in #2899 about banning Vec<u8> using a linter. I'm not an experienced rust developer but isn't it more convoluted than the fix I suggested in my original post, for specializing the specific bytes -> Vec<u8> conversion using as_bytes().to_vec()? I understand it does not conform to your quality standards, but it does achieve its goals as far as I could tell. Maybe someone more experienced could get a satisfying implementation of this specialization?

davidhewitt · 2023-02-28T19:29:51Z

I am happy to reopen this, as I agree the current situation around Vec<u8> is a performance footgun, both as an input and output argument to #[pyfunction].

Agreed that either we need to make this clear in the documentation (presumably recommending the new Cow implementation) or change the design.

vxgmichel added the enhancement label Jan 18, 2023

vxgmichel mentioned this issue Jan 18, 2023

Do not use Vec<u8> as argument type in the binding Scille/parsec-cloud#3925

Closed

adamreichold mentioned this issue Jan 20, 2023

RFC: Provide a special purpose FromPyObject impl for byte slices #2899

Merged

vxgmichel added a commit to Scille/parsec-cloud that referenced this issue Jan 20, 2023

Do not use Vec<u8> as argument type in the binding

4f02a4b

See PyO3/pyo3#2888

vxgmichel added a commit to Scille/parsec-cloud that referenced this issue Jan 20, 2023

Do not use Vec<u8> as argument type in the binding

a462810

See PyO3/pyo3#2888

vxgmichel added a commit to Scille/parsec-cloud that referenced this issue Jan 26, 2023

Do not use Vec<u8> as argument type in the binding

d7b5587

See PyO3/pyo3#2888

bors bot closed this as completed in bbaceb8 Feb 22, 2023

davidhewitt reopened this Feb 28, 2023

davidhewitt mentioned this issue Mar 10, 2024

docs: major rewrite for Bound API #3906

Merged

davidhewitt mentioned this issue May 24, 2024

[RFC] standardize PyBytes <-> Vec<u8> or &[u8] or Cow<[u8]> #4182

Closed

davidhewitt mentioned this issue Oct 11, 2024

add second lifetime to FromPyObject #4390

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve performance when converting python bytes/bytearray to `Vec<u8>` #2888

Improve performance when converting python bytes/bytearray to `Vec<u8>` #2888

vxgmichel commented Jan 18, 2023

adamreichold commented Jan 18, 2023

davidhewitt commented Jan 18, 2023

vxgmichel commented Jan 20, 2023

adamreichold commented Jan 20, 2023 •

edited

Loading

adamreichold commented Jan 20, 2023

vxgmichel commented Jan 20, 2023 •

edited

Loading

vxgmichel commented Feb 27, 2023 •

edited

Loading

davidhewitt commented Feb 28, 2023

Improve performance when converting python bytes/bytearray to Vec<u8> #2888

Improve performance when converting python bytes/bytearray to Vec<u8> #2888

Comments

vxgmichel commented Jan 18, 2023

adamreichold commented Jan 18, 2023

davidhewitt commented Jan 18, 2023

vxgmichel commented Jan 20, 2023

adamreichold commented Jan 20, 2023 • edited Loading

adamreichold commented Jan 20, 2023

vxgmichel commented Jan 20, 2023 • edited Loading

vxgmichel commented Feb 27, 2023 • edited Loading

davidhewitt commented Feb 28, 2023

Improve performance when converting python bytes/bytearray to `Vec<u8>` #2888

Improve performance when converting python bytes/bytearray to `Vec<u8>` #2888

adamreichold commented Jan 20, 2023 •

edited

Loading

vxgmichel commented Jan 20, 2023 •

edited

Loading

vxgmichel commented Feb 27, 2023 •

edited

Loading