Support for memoryview and PickleBuffer #99

Closed
crusaderky opened this issue Apr 26, 2023 · 17 comments · Fixed by #100

@crusaderky

As of 2.7.0rc1, cramjam seems to be incompatible with memoryview and PickleBuffer objects.
This is a blocker for adoption in dask/distributed.

>>> cramjam.bzip2.compress(memoryview(b"123"))
TypeError: argument 'data': failed to extract enum BytesType ('bytes | bytearray | File | Buffer | numpy')
- variant Bytes (bytes): TypeError: failed to extract field BytesType::Bytes.0, caused by TypeError: 'memoryview' object cannot be converted to 'PyBytes'
- variant ByteArray (bytearray): TypeError: failed to extract field BytesType::ByteArray.0, caused by TypeError: 'memoryview' object cannot be converted to 'PyByteArray'
- variant RustyFile (File): TypeError: failed to extract field BytesType::RustyFile.0, caused by TypeError: 'memoryview' object cannot be converted to 'File'
- variant RustyBuffer (Buffer): TypeError: failed to extract field BytesType::RustyBuffer.0, caused by TypeError: 'memoryview' object cannot be converted to 'Buffer'
- variant NumpyArray (numpy): TypeError: failed to extract field BytesType::NumpyArray.0, caused by TypeError: 'memoryview' object cannot be converted to 'PyArray<T, D>'

>>> import pickle, numpy
>>> a = numpy.ones(10)
>>> buffers = []
>>> pickle.dumps(a, buffer_callback=buffers.append)
>>> buffers
[<pickle.PickleBuffer at 0x7ff1f5b28540>]
>>> cramjam.bzip2.compress(buffers[0])
TypeError: argument 'data': failed to extract enum BytesType ('bytes | bytearray | File | Buffer | numpy')
- variant Bytes (bytes): TypeError: failed to extract field BytesType::Bytes.0, caused by TypeError: 'PickleBuffer' object cannot be converted to 'PyBytes'
- variant ByteArray (bytearray): TypeError: failed to extract field BytesType::ByteArray.0, caused by TypeError: 'PickleBuffer' object cannot be converted to 'PyByteArray'
- variant RustyFile (File): TypeError: failed to extract field BytesType::RustyFile.0, caused by TypeError: 'PickleBuffer' object cannot be converted to 'File'
- variant RustyBuffer (Buffer): TypeError: failed to extract field BytesType::RustyBuffer.0, caused by TypeError: 'PickleBuffer' object cannot be converted to 'Buffer'
- variant NumpyArray (numpy): TypeError: failed to extract field BytesType::NumpyArray.0, caused by TypeError: 'PickleBuffer' object cannot be converted to 'PyArray<T, D>'
@milesgranger
Owner

Can leverage the buffer protocol to make a zero-copy view into numpy and then pass that; would this be suitable? (The same goes for PickleBuffer.)

Prefer to steer clear of implementing support for an arbitrary number of types, especially if those types already implement the buffer protocol.

>>> import numpy as np
>>> data = memoryview(b'data')
>>> buf = np.frombuffer(data, dtype=np.uint8)
>>> cramjam.lz4.compress(buf)
cramjam.Buffer(len=23)
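
The same trick should cover the PickleBuffer case from the report above, since PickleBuffer also exposes the buffer protocol; a rough sketch (illustrative, not run against this exact version):

>>> import pickle
>>> import numpy as np
>>> a = np.ones(10)
>>> buffers = []
>>> _ = pickle.dumps(a, protocol=5, buffer_callback=buffers.append)
>>> buf = np.frombuffer(buffers[0], dtype=np.uint8)  # zero-copy byte view of the PickleBuffer
>>> compressed = cramjam.lz4.compress(buf)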

@milesgranger
Owner

In retrospect, can likely add a generic PyObject variant, poke at it at runtime to see if it implements the buffer protocol, and get the buffer from there. Suspect it may be slightly slower, but it would be a good general solution for arbitrary objects implementing it.

@crusaderky
Author

Suspect it may be slightly slower

I seriously doubt the slowdown will be noticeable as long as the compression itself deals with 10kiB+

@crusaderky
Author

Prefer to steer clear of implementing support for an arbitrary amount of types

I agree that the individual types should not be explicitly implemented (numpy shouldn't either!)

@milesgranger
Owner

Will the suggested numpy.frombuffer work for you for now?

@milesgranger
Owner

Was interested and prototyped it this morning:

In [1]: import cramjam

In [2]: data = memoryview(b'data')

In [3]: cramjam.lz4.compress(data)
Out[3]: cramjam.Buffer(len=23)

In [4]: import array

In [5]: out = array.array('B', list(range(23)))

In [6]: out
Out[6]: array('B', [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22])

In [7]: cramjam.lz4.compress_into(data, out)
Out[7]: 23

In [8]: out
Out[8]: array('B', [4, 34, 77, 24, 68, 64, 94, 4, 0, 0, 128, 100, 97, 116, 97, 0, 0, 0, 0, 53, 138, 34, 33])

Guess a proper implementation would take a day or two. It may take some time for me to get around to it in my free time; hopefully np.frombuffer is sufficient for you in the meantime.

@martindurant

Like this?

use pyo3::buffer::PyBuffer;
use std::slice;

// value: &PyAny
let buf: PyBuffer<u8> = value.extract().unwrap(); // expose possible error
let rustbuf: &[u8] = unsafe {
    slice::from_raw_parts(buf.buf_ptr() as *const u8, buf.len_bytes())
};

@milesgranger
Owner

milesgranger commented Apr 27, 2023

Roughly, that's basically how the prototype goes. Probably would need to wrap it similarly to PyBytes/PyByteArray, since all variants in BytesType have Read/Write implemented and would need to maintain some positional state between the read/write call(s). PyBuffer has the handy checks for things like readonly, but as_slice gives &[ReadOnlyCell<T>], which is less ergonomic than getting the buffer directly like you did here.


Think whatever the wrapper ends up being, BytesType(PythonBuffer) for example, PythonBuffer ought to have a TryFrom<PyBuffer<u8>> impl which checks for c_contiguous, dimensions == 1, that readonly is appropriate for the context (i.e. writable is required for the _into calls), and maybe some other invariants. If all that checks out, use the underlying buffer directly from there.
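
A minimal sketch of what that validation could look like (the PythonBuffer name and the exact checks are illustrative, not cramjam's actual implementation):

use pyo3::buffer::PyBuffer;
use pyo3::exceptions::PyBufferError;
use pyo3::PyErr;

// Illustrative wrapper; `pos` holds the positional state shared by Read/Write impls.
struct PythonBuffer {
    inner: PyBuffer<u8>,
    pos: usize,
}

impl TryFrom<PyBuffer<u8>> for PythonBuffer {
    type Error = PyErr;

    fn try_from(buf: PyBuffer<u8>) -> Result<Self, Self::Error> {
        // Check the invariants discussed above before trusting the raw pointer.
        if !buf.is_c_contiguous() {
            return Err(PyBufferError::new_err("buffer must be C-contiguous"));
        }
        if buf.dimensions() != 1 {
            return Err(PyBufferError::new_err("buffer must be one-dimensional"));
        }
        // *_into calls would additionally require !buf.readonly().
        Ok(PythonBuffer { inner: buf, pos: 0 })
    }
}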

@crusaderky
Author

Will the suggested numpy.frombuffer work for you for now?

Yes, the hack works. I didn't performance-test it, but the overhead should be negligible.

@milesgranger
Owner

Can try out 2.7.0rc2 for the buffer updates done in #100

@crusaderky
Author

It doesn't seem to work:

>>> cramjam.__version__
'2.7.0-rc2'
>>> cramjam.lz4.compress_block(bytearray(b"123"))
cramjam.Buffer<len=8>
>>> cramjam.lz4.compress_block(memoryview(b"123"))
cramjam.Buffer<len=8>
>>> cramjam.lz4.compress_block(numpy.ones(10))
TypeError: argument 'data': failed to extract enum BytesType ('Buffer | File | pybuffer')
- variant RustyBuffer (Buffer): TypeError: failed to extract field BytesType::RustyBuffer.0, caused by TypeError: 'ndarray' object cannot be converted to 'Buffer'
- variant RustyFile (File): TypeError: failed to extract field BytesType::RustyFile.0, caused by TypeError: 'ndarray' object cannot be converted to 'File'
- variant PyBuffer (pybuffer): TypeError: failed to extract field BytesType::PyBuffer.0, caused by BufferError: buffer contents are not compatible with u8
>>> cramjam.lz4.compress_block(memoryview(numpy.ones(10)))
TypeError: argument 'data': failed to extract enum BytesType ('Buffer | File | pybuffer')
- variant RustyBuffer (Buffer): TypeError: failed to extract field BytesType::RustyBuffer.0, caused by TypeError: 'memoryview' object cannot be converted to 'Buffer'
- variant RustyFile (File): TypeError: failed to extract field BytesType::RustyFile.0, caused by TypeError: 'memoryview' object cannot be converted to 'File'
- variant PyBuffer (pybuffer): TypeError: failed to extract field BytesType::PyBuffer.0, caused by BufferError: buffer contents are not compatible with u8

@milesgranger
Owner

milesgranger commented May 2, 2023

The buffer contents need to be u8 (bytes):

>>> np.ones(10).dtype
dtype('float64')
>>> cramjam.lz4.compress_block(np.ones(10, dtype=np.uint8))  # or np.ones(10).tobytes()
cramjam.Buffer<len=15>

@milesgranger
Owner

Or better yet np.ones(10).view(np.uint8) for no copies.

@crusaderky
Author

Or

m = memoryview(a).cast("B")
cramjam.lz4.compress_block(m)

None of the other compression libraries have this caveat; they're happy to ingest any PickleBuffer. Could you fix it? (no rush)

@milesgranger
Owner

milesgranger commented May 2, 2023

All other compression libraries don't have this caveat and are happy to ingest any PickleBuffer; could you fix it?

I agree, this is not great.

I'd like to get the bytes view of buffers directly, which can be done, but the current implementation of PyBuffer<T> requires an element type, in this case u8, and we don't actually need that since we can always just get the bytes view. So it would 'only' need a refactor to use the C FFI more directly and avoid using PyBuffer<T> at all.
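
For what it's worth, a rough sketch of what going through the C FFI directly might look like (hypothetical helper, not the actual refactor): request a plain byte view via PyObject_GetBuffer so the exporter's element type never matters, then hand the bytes to the compressor.

use pyo3::prelude::*;
use pyo3::{ffi, AsPyPointer};

// Illustrative only: obtain a contiguous byte view of any buffer-protocol object,
// pass it to `f`, and release the view afterwards.
fn with_byte_view<R>(obj: &PyAny, f: impl FnOnce(&[u8]) -> R) -> PyResult<R> {
    unsafe {
        let mut view: ffi::Py_buffer = std::mem::zeroed();
        if ffi::PyObject_GetBuffer(obj.as_ptr(), &mut view, ffi::PyBUF_SIMPLE) != 0 {
            // A Python BufferError is already set; surface it to the caller.
            return Err(PyErr::fetch(obj.py()));
        }
        let bytes = std::slice::from_raw_parts(view.buf as *const u8, view.len as usize);
        let result = f(bytes);
        ffi::PyBuffer_Release(&mut view);
        Ok(result)
    }
}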

@milesgranger
Owner

@crusaderky v2.7.0rc3 ought to work for you.

@crusaderky
Author

Yep works great! 👍
