How widely is column-major supported? #41
Comments
Good question @tbenst! The Zarr.jl thread is interesting. I think it makes sense for Julia applications to write F order arrays, since this will give the best performance. However, I share your concern that such arrays may not be interoperable with other Zarr implementations. One great thing about Zarr is its native JavaScript support, which opens up all kinds of cool usage patterns on the web. It looks like zarr.js does not support F order (http://guido.io/zarr.js/#/getting-started/remote-data?id=ordering). Same for zarr-js (https://github.com/freeman-lab/zarr-js#zarr-js). 😞 I wonder how much effort would be required to enable F order for these implementations. |
@gzuidhof @manzt @freeman-lab @jhamman : any thoughts? |
Likewise with Z5: https://github.com/constantinpape/z5#current-limitations--todos
Someone could easily go through the other implementations listed here and check: https://github.com/zarr-developers/zarr_implementations |
and/or add a |
cc @gzuidhof |
This is correct. I chatted briefly about this with @freeman-lab today. Our general consensus is that it should be possible to implement F-order array support in zarr-js but we would need to do some extra diligence to confirm things work with the compression libs and whatnot. |
I reckon that it's possible to add support for column-ordered arrays in zarr.js by adding some sort of transpose (or a transposed view); I imagine that would be a similar amount of effort as for zarr-js. A PR is welcome for it of course, but I don't think anyone has actually needed that feature so far. Usually one has at least some control over the dataset that should be served to the browser, so writing your data as a |
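The "transposed view" idea mentioned above can be sketched in plain JavaScript. This is only an illustration and not code from zarr.js or zarr-js: the helper names `computeStrides` and `stridedView` are hypothetical, and the sketch assumes the chunk has already been decompressed into a typed array.

```javascript
// Hypothetical helper: element strides for a shape in "C" or "F" memory order.
function computeStrides(shape, order) {
  const strides = new Array(shape.length);
  let step = 1;
  if (order === "F") {
    // Column-major: the first axis varies fastest.
    for (let i = 0; i < shape.length; i++) {
      strides[i] = step;
      step *= shape[i];
    }
  } else {
    // Row-major: the last axis varies fastest.
    for (let i = shape.length - 1; i >= 0; i--) {
      strides[i] = step;
      step *= shape[i];
    }
  }
  return strides;
}

// Hypothetical helper: wrap a decoded chunk buffer without copying it.
function stridedView(data, shape, order) {
  const strides = computeStrides(shape, order);
  return {
    shape,
    strides,
    // Translate a multi-dimensional index into a flat buffer offset.
    get(...index) {
      let offset = 0;
      for (let i = 0; i < index.length; i++) {
        offset += index[i] * strides[i];
      }
      return data[offset];
    },
  };
}

// The 2 x 3 array [[1, 2, 3], [4, 5, 6]] stored column by column (F order).
const fBuffer = new Float64Array([1, 4, 2, 5, 3, 6]);
const fView = stridedView(fBuffer, [2, 3], "F");
console.log(fView.get(0, 1)); // 2
console.log(fView.get(1, 2)); // 6
```

Because only the strides change, the same accessor serves both orders; the open question raised in the thread (how this interacts with compression libraries and the rest of each implementation) is not addressed by this sketch.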
I'd imagine this is more a result of there being far fewer datasets currently that are |
Just agreeing with what's already been stated here. Certainly possible to add to current implementations, but fewer
It's worth mentioning that
import { get_array } from "zarrita/v2"; // version 2 protocol
import FetchStore from "zarrita/storage/fetch";
import { get } from "zarrita/ndarray";
let store = new FetchStore("http://localhost:8080/data.F.zarr");
let f = await get_array(store).then(get); // returns F-strided array
let c = await get_array(store).then(arr => get(arr, null, { order: "C" })); // force C ordering
I'd spoken to @jhamman & @freeman-lab previously about this implementation but have been too busy with grad school things to advertise, etc. Perhaps it would be a good time soon to get together and share updates. |
Thanks a lot for all the replies. In no way did we intend to push people into implementing F-ordering; we just wanted to know how reasonable it would be to make F-order the default for the Julia implementation. I think most people will use the default, and it might cause some confusion if the exported arrays cannot be accessed with other libraries. So I think this leaves us with the following options, and it would be good to find some consensus for column-major languages (R, Matlab, Octave, Fortran people here?):
Personally I still prefer option 1) but I am happy to be convinced by a majority saying that this behavior is not according to spec. |
Based on the discussion at yesterday's Zarr dev meeting with @meggart, I now realize that my earlier comment:
is completely wrong. There is no performance benefit for column-major languages to use F-order. Fortran has been writing C-order netCDF / HDF5 arrays for decades without any performance penalty. This is purely about the language-specific conventions regarding what order dimensions should appear in. Given that, I'm inclined to support the proposal in zarr-developers/zarr-specs#126. |
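The point about dimension conventions can be checked with a small, self-contained sketch: a column-major array of shape (2, 3) occupies memory in exactly the same sequence as the row-major array with the reversed shape (3, 2). The helpers `flatten2d` and `transpose2d` are hypothetical names used only for this illustration.

```javascript
// Hypothetical helper: flatten a nested 2-D array in "C" (row-major)
// or "F" (column-major) order.
function flatten2d(rows, order) {
  const nRows = rows.length;
  const nCols = rows[0].length;
  const out = [];
  if (order === "C") {
    for (let i = 0; i < nRows; i++) {
      for (let j = 0; j < nCols; j++) out.push(rows[i][j]);
    }
  } else {
    for (let j = 0; j < nCols; j++) {
      for (let i = 0; i < nRows; i++) out.push(rows[i][j]);
    }
  }
  return out;
}

// Hypothetical helper: swap the two axes of a nested 2-D array.
function transpose2d(rows) {
  return rows[0].map((_, j) => rows.map((row) => row[j]));
}

const original = [
  [1, 2, 3],
  [4, 5, 6],
]; // shape (2, 3)

// Column-major layout of the (2, 3) array.
const columnMajor = flatten2d(original, "F");

// Row-major layout of the same data with reversed dimensions, shape (3, 2).
const reversedRowMajor = flatten2d(transpose2d(original), "C");

console.log(columnMajor); // [ 1, 4, 2, 5, 3, 6 ]
console.log(reversedRowMajor); // [ 1, 4, 2, 5, 3, 6 ]
```

Both layouts are identical element for element, so a column-major language can keep its natural memory order while storing C-order data, provided it reports the dimensions in reverse.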
I also like the current behavior of While numpy does support both ordering schemes, I am wondering how well arrays with F ordering are supported in the Python (extension) ecosystem. Should a C extension typically provide an implementation for both cases (as the loop ordering is different)? (Requiring support for both C and F layouts pushes a lot of complexity into upstream applications. I would be fine with settling on the C layout. And I say that as somebody who has used almost exclusively Fortran-layout languages: Fortran, Matlab, Octave, and now Julia :-). But I realize that we have to deal with it, as it is standardized in the Zarr v2 format.) |
We’ve been having a discussion over at Zarr.jl as to whether the package should default to writing order=“F” aka column-major data instead of order=“C” aka row-major. We have the impression that most Zarr datasets out in the wild are saved in the row-major order, as that is the default choice of the popular Zarr-Python package. Currently, Zarr.jl writes to row-major, permuting the dimensions on read & write.
Since Julia is column-major, it would seem natural to write in this order by default, as the desired memory ordering can be maintained without permuting dimensions. However, there is some concern about how widely Zarr implementations support column-major data.
I know that zarr-python supports reading row-major or column-major. Curious if folks could advise about other major implementations? Would writing to column major by default reduce interoperability in practice?
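In the Zarr v2 format, each array's `.zarray` metadata records its layout in an `order` field, either "C" or "F", so a reader can tell up front whether it can handle a given array. A minimal sketch of such a check follows; the helper `checkOrderSupport` and the example metadata values are hypothetical and do not come from any particular implementation.

```javascript
// Hypothetical helper: read the `order` field from Zarr v2 `.zarray` metadata
// text and report whether a reader with the given capabilities can handle it.
function checkOrderSupport(zarrayText, supportedOrders) {
  const meta = JSON.parse(zarrayText);
  return {
    order: meta.order,
    supported: supportedOrders.includes(meta.order),
  };
}

// Example metadata for a column-major array (values chosen for illustration).
const exampleMetadata = JSON.stringify({
  zarr_format: 2,
  shape: [100, 200],
  chunks: [10, 20],
  dtype: "<f8",
  compressor: null,
  fill_value: 0,
  filters: null,
  order: "F",
});

// A reader that only implements row-major access.
const cOnlyResult = checkOrderSupport(exampleMetadata, ["C"]);
console.log(cOnlyResult); // { order: 'F', supported: false }

// A reader that implements both layouts.
const bothResult = checkOrderSupport(exampleMetadata, ["C", "F"]);
console.log(bothResult); // { order: 'F', supported: true }
```

A check like this only detects the mismatch; whether an implementation then fails, transposes, or exposes a strided view is the interoperability question being asked here.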