(feat): minimum working codec pipeline #19

Merged: 15 commits from ld/codec_pipeline_mvp into ld/codec_pipeline, Nov 5, 2024

Conversation

@LDeakin (Collaborator) commented Oct 23, 2024

- Add internal `get_chunk_representation`, `retrieve_chunk_bytes`, and `store_chunk_bytes`
- Separate `retrieve_chunk` and `retrieve_chunk_subset`
- Separate `store_chunk` and `store_chunk_subset`
- Add assertions to simple.py

@ilan-gold (Owner) left a comment

Two quick comments; I'd also like to hear from @flying-sheep. I am updating https://github.com/ilan-gold/zarrs-python/pull/20/files because I realized that keeping zarr-python's tests in sync with the release we use is probably more trouble than it's worth. So I copied over only their codec tests, which looked relatively comprehensive. I think getting those working is a good call.

Also, in terms of the MVP, I was thinking of dropping in-memory support; at least for now, I am doing that in the testing PR to keep things simple.

src/lib.rs (Outdated):
)
}

fn selection_to_array_subset(selection: &PyTuple, shape: &[u64]) -> PyResult<ArraySubset> {
@ilan-gold (Owner)

I would tend towards doing this in Python, but I'd be curious to see what @flying-sheep would say, so let's put a pin in this for now.

For example, I would tend towards initially converting everything to either an ndarray of ints, slices, or ints on the Python side, to minimize the Rust/pyo3 business. I think the Rust route is a bit more verbose than doing this stuff in Python directly.

@LDeakin (Collaborator, Author) commented Oct 30, 2024

Yea, sounds reasonable. It would probably be necessary to separate the unstrided slices/ellipsis selections from the others though. The latter probably needs a new specialised retrieve_chunks method that lets numpy do the indexing on the output.
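
For illustration, here is a rough Python sketch (the helper name and logic are hypothetical, not from this PR) of the split being discussed: "basic" selections made of ints, unstrided slices, and Ellipsis can be mapped directly onto a contiguous array subset on the Rust side, while anything else would fall back to numpy indexing on the decoded output.

```python
import numpy as np

def is_basic_selection(selection: tuple) -> bool:
    """True if every component can be handled as an unstrided array subset."""
    for dim_sel in selection:
        if dim_sel is Ellipsis or isinstance(dim_sel, int):
            continue
        if isinstance(dim_sel, slice) and dim_sel.step in (None, 1):
            continue
        return False  # integer arrays, boolean masks, strided slices, ...
    return True

print(is_basic_selection((slice(0, 64), Ellipsis)))             # True
print(is_basic_selection((slice(0, 64, 2), np.array([3, 1]))))  # False
```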

@LDeakin (Collaborator, Author)

I've moved some of this logic out in 33037bf

)
.collect::<PyResult<Vec<_>>>()?;

py.allow_threads(move || {
@ilan-gold (Owner) commented Oct 30, 2024

On my laptop, performance seems quite similar without this py.allow_threads. Could you see what you get on the benchmark without it? It would line up more with my mental map from https://pyo3.rs/main/parallelism if this actually wasn't necessary, i.e., if rayon just "works" from within pyo3.

I still don't fully get why we need asyncio.to_thread either TBH but that one does seem necessary.

@LDeakin (Collaborator, Author)

Here are benchmarks with the allow_threads commented out:
LDeakin/zarr_benchmarks@zarrs_python...zarrs_python_wo_allow_threads

It is definitely needed

@ilan-gold (Owner)

Interesting. I don't think I have the time now to create a repro for this, but I'd be curious what the pyo3 people have to say. I don't really get why both of these are needed; I would think we should be able to get the same performance without using either, but I must be missing something in how I conceive of things.

@LDeakin (Collaborator, Author)

Concurrent calls to retrieve_chunks would spend more time waiting for the GIL without the allow_threads. benchmark_read_chunks.svg shows that if multiple chunks are requested concurrently, the performance is relatively flat without allow_threads.

The asyncio.to_thread lets read immediately return a future that can be awaited, without even kicking off any real work. Otherwise, read would block until the data has been retrieved.
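
A minimal, self-contained illustration of that second point (this is not the project's actual read path; blocking_retrieve is a made-up stand-in for the synchronous call into Rust):

```python
import asyncio
import time

def blocking_retrieve(chunk: int) -> bytes:
    time.sleep(0.1)  # stand-in for the synchronous retrieval in the extension
    return b"chunk %d" % chunk

async def read(chunk: int) -> bytes:
    # Awaiting to_thread hands the blocking call to a worker thread, so the
    # event loop stays free to schedule other reads instead of blocking here.
    return await asyncio.to_thread(blocking_retrieve, chunk)

async def main() -> None:
    print(await asyncio.gather(*(read(i) for i in range(4))))

asyncio.run(main())
```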

@ilan-gold (Owner) commented Nov 5, 2024

I get conceptually why you need to release the GIL, I suppose, but what I don't get is why we have to do it here when the PyO3 docs specifically do not.

Here's their "rust_parallel" example:
https://github.com/PyO3/pyo3/blob/main/examples/word-count/src/lib.rs#L5-L11
And this is timed from python (since it is a pyfunction):
https://github.com/PyO3/pyo3/blob/9f955e4ebf5f38d1f9de4837951588ef6c850622/examples/word-count/tests/test_word_count.py#L35-L37

This doesn't require allow_threads (which, I agree, doesn't make a crazy amount of sense to me) but apparently it's fastest. Maybe it has something to do with the asyncio.to_thread we're using, but I would think that also has no impact, because the underlying function passed to to_thread is not actually asynchronous... Also, to_thread releases the GIL, at least for I/O tasks, no? So I would think that maybe that propagates to Rust?

@LDeakin (Collaborator, Author)

> This doesn't require allow_threads (which I agree, to me, doesn't make a crazy amount of sense) but apparently it's fastest

The rust_parallel example uses all cores via the Rayon thread pool, whereas the others are all sequential on the Rust side, just for the purposes of the example.

In practice, the Rust methods are being called concurrently (as in run_rust_sequential_twice) by zarr-python, which has its own thread pool. The Rust thread pool is independent of that.
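
To make the difference concrete, here is a sketch (purely illustrative; process_chunk is a made-up stand-in for an extension call such as retrieve_chunks) of the calling pattern that zarr-python's thread pool produces:

```python
from concurrent.futures import ThreadPoolExecutor

def process_chunk(chunk_id: int) -> int:
    # Stand-in for a call into the extension. If the extension holds the GIL
    # for the full duration of each call, these concurrently submitted calls
    # effectively run one after another; releasing the GIL (py.allow_threads)
    # lets the Rust-side work from several Python threads overlap.
    return chunk_id * 2

with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(process_chunk, range(32)))
print(results[:4])  # [0, 2, 4, 6]
```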

@ilan-gold (Owner)

> The rust_parallel example uses all cores via the Rayon thread pool.

But aren't we doing that as well? I guess that's my confusion. Why does theirs work without py.allow_threads and ours does not?

@LDeakin (Collaborator, Author)

They only call the parallel search function from one Python thread in the benchmarks, whereas our array methods can get called by multiple Python threads concurrently, like in the test_word_count_rust_sequential_twice_with_threads benchmark.

@ilan-gold (Owner)

Got it! Makes sense!

ilan-gold and others added 2 commits October 31, 2024 07:27
* (fix): ci for running tests

* (fix): no need to extract tests

* (Fix): remove duplicate name

* (fix): use a submodule...

* (chore): remove memory store + port zarr codec tests

* (chore): remove `dlpark`

* (fix): getattr for itemsize

* (chore): remove runtime F ordering

* (chore): skip vlen

* (feat): parse int

---------

Co-authored-by: Lachlan Deakin <[email protected]>
@@ -100,6 +100,8 @@ async def read(
# )

out = out.as_ndarray_like() # FIXME: Error if array is not in host memory
if not out.dtype.isnative:
@LDeakin (Collaborator, Author)

Can out be changed in such situations? Or does that just create a new array that is ignored by the caller?

@ilan-gold (Owner)

That's a good point. as_ndarray_like returns the underlying data, so I think you could change it as long as you do a 0-copy numpy (or similar) operation.

@ilan-gold (Owner)

NDBuffer is just a wrapper class/interface around array libraries. I am not sure what the design decision was behind using it over dlpack (https://github.com/dmlc/dlpack, which I had here before), but undoing it is probably not going to happen now.

@ilan-gold (Owner)

> 0-copy numpy (or similar) operation.

And it would have to be in-place on some level, as well
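
A small numpy sketch of that trade-off (the array is made up; assume a non-native-endian buffer on a little-endian machine):

```python
import numpy as np

out = np.arange(4, dtype=np.dtype("float64").newbyteorder(">"))  # non-native

# byteswap(inplace=True) mutates the caller's buffer in place, but the matching
# dtype change comes from .view(), which returns a *new* ndarray object. A
# caller still holding `out` would now see swapped bytes under the old dtype;
# that is the "has to be in-place on some level" concern.
native = out.byteswap(inplace=True).view(out.dtype.newbyteorder())
print(native.dtype.isnative, out.dtype.isnative)  # True False
```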

@ilan-gold (Owner) commented Nov 5, 2024

@LDeakin I think the issue here, looking at the zarr code, would not be the endianness of the out array (which, both from how I think of it and from the fact that no tests are failing, makes some sense to me). Rather, I'd be worried about the incoming data from disk. But I think your underlying implementation would handle that, since it parses the codec metadata, no? I.e., you have a codec (or equivalent) in zarrs to swap bytes if need be, as per the metadata.

So I think this is a reasonable check, but I am not worried; I don't see any handling of this in the zarr-python codebase.

@LDeakin (Collaborator, Author)

Yep, there is a codec that takes care of swapping bytes from disk if needed, but zarrs will always decode to native endian.
Looks like zarr-python does not have this entirely sorted anyway: zarr-developers/zarr-python#2324
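
For context, a Zarr v3 array declares the stored byte order via the `bytes` codec in its metadata; an illustrative fragment (not taken from this repo) looks roughly like:

```python
# The "bytes" codec records the on-disk endianness; decoders (zarrs included)
# swap to native byte order as needed when reading.
codecs = [
    {"name": "bytes", "configuration": {"endian": "big"}},
]
```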

@ilan-gold changed the title from "(fix): minimum working codec pipeline" to "(feat): minimum working codec pipeline" on Oct 31, 2024
@LDeakin (Collaborator, Author) commented Nov 4, 2024

Shall we rebase-merge this onto ld/codec_pipeline and then review an actual PR on main?

@ilan-gold (Owner)

@LDeakin Yup, sounds good to me.

@LDeakin merged commit bd71529 into ld/codec_pipeline on Nov 5, 2024 (1 check passed).
@LDeakin deleted the ld/codec_pipeline_mvp branch on November 5, 2024 at 18:49.