-
-
Notifications
You must be signed in to change notification settings - Fork 289
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add N5 Support #309
Add N5 Support #309
Conversation
This requires numcodecs GZip support to be compatible with the N5 standard. Added with 410af66be0ea470d77b923e516d63cf5238114db to numcodecs.
As we are only using the logger to issue warnings, just issue warnings instead and drop the logger. Picked `RuntimeWarning`s for these two warnings as they are in regard to "dubious runtime behavior". Namely using compressors that are not explicitly supported by many N5 implementations to write N5 files.
Make sure that `N5Store` is raising `RuntimeWarnings` for some compressors that are not widely supported.
As N5 stores attributes and metadata in the same JSON object, there are some keys that are simply not allowed to be attributes as they would conflict with the metadata (and possibly corrupt the data if they were written in). The `N5Store` correctly raises for these cases, but this was not being tested previously. Here we add a test to cover this case.
Check that valid `fill_values` work with N5Store-backed arrays and that invalid `fill_values` raise.
We are explicitly setting the `blocksize` to `0` internally. So this assertion is not a user-facing error. It would also be difficult to test from the API. As we set the `blocksize` to `0` in a different function, it does make sense to `assert` this here.
Alright have updated all user-facing exceptions to be something other than asserts. Mostly these are Also played around with some sample data that @constantinpape and @hanslovsky have compiled in the zarr_implementations repo. This worked pretty well, but had a few issues around LZ4 data from N5 (have dropped LZ4 support from this PR). Otherwise was able to load all other N5 files generated by N5 or z5py. More details about how this was tested and the results found in this comment. In my view, this code is ready to be merged. The LZ4 issues should be explored, but they can be placed in a separate issue. Would appreciate it if some other people did some final reviews of the PR. Note: Previously this mentioned there were issues with XZ. Some recent commits have been pushed to support XZ as a special case of Numcodecs' LMZA, which works well. Also LZ4 support has been disabled in N5Store for now. Raised issue ( zarr-developers/numcodecs#175 ) about the LZ4 differences. |
Numcodecs' LZMA compressor is able to handle XZ compressed data as a special case. This maps XZ compression in N5 to Zarr's LZMA compression with the proper arguments as well as maps Zarr's LZMA support back to N5's XZ support if the right options are set.
Adds some tests for XZ support in N5 using Numcodecs' LZMA compressor with specific options. Also tests LZMA options that are not currently supported by N5 to ensure they still warn and are handled correctly.
There is not a clear mapping between LZ4 in the Zarr and N5 implementations at present. So this drops LZ4 from the supported codecs in `N5Store` for now.
As far as I understand it this is not fully compatible with the n5 format yet, |
@constantinpape your're right that this only implements a subset of n5, i.e. it does not store arbitrary block-sizes in any case and can not deal with multiset types or other varlength based tricks. As such, however, it is already useful. N5 has no spec that requires blocks to have any specific size. N5-ImgLib2 until recently expected that trailing blocks are cropped to size but I have changed this there saalfeldlab/n5-imglib2@dd26251 because this seemed to be easier than to support variable size blocks in zarr. The corresponding N5-ImgLib2 release is 3.2.0 https://github.com/saalfeldlab/n5-imglib2/tree/3.2.0. |
That's very cool! Thanks for sharing that, @axtimwalde. Is N5-ImgLib2 the primary way you are reading in N5 files or are there other libraries we should also be thinking about? I understand some things are still not supported, @constantinpape, but in my view this is a first pass, which will be easier to iterate on once merged. Raising issues for additional functionality is always possible and can be more easily pursued in parallel once this is integrated. Also have talked to several people already who are |
FWIW I have no objections to this getting released as an "experimental
feature" which is how we've been labelling new stuff that may need API
changes or other fixes following broader use.
Another question I've been meaning to raise is whether support for N5
should be implemented as a transformation layer over another store, rather
than as something that interacts directly with the file system. If it was
implemented just as a transformation layer, then passed through to an inner
store, it could be immediately used with cloud object stores as well as
file systems. If there is any existing N5 data in the cloud then wondering
if this might be of value to the community. This could also be tackled
after release so no objections to releasing as-is.
…On Sun, 24 Feb 2019, 19:16 jakirkham, ***@***.***> wrote:
That's very cool! Thanks for sharing that, @axtimwalde
<https://github.com/axtimwalde>. Is N5-ImgLib2 the primary way you are
reading in N5 files or are there other libraries we should also be thinking
about?
I understand some things are still not supported, @constantinpape
<https://github.com/constantinpape>, but in my view this is a first pass,
which will be easier to iterate on once merged. Raising issues for
additional functionality is always possible and can be more easily pursued
in parallel once this is integrated.
Also have talked to several people already who are pip installing this
branch as opposed to a normal Zarr release. While it's nice that they are
able to get started so easily, this is not great from a support
perspective. For instance they may be missing other bug fixes that are in
master or they may be getting a different product each time they (or
their colleagues) install. A release would really help set a baseline for
how things work going forward.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#309 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAq8QnHgfnzwgoWoqLA8J_C8Br4OT-MOks5vQuT6gaJpZM4XvU9Z>
.
|
N5-ImgLib2 is the preferred way to load actual tensor data, but we also use the core block loading/ storing API directly, e.g. mutliset pixels are still elsewhere, Paintera uses it for indexing and agglomeration. Thanks to @alimanfoo for the support to add this as an experimental feature. I also like the idea to decouple the backend from access API. I do indeed have a bunch of N5 containers sit on Google Cloud, and our stitching pipeline https://github.com/saalfeldlab/stitching-spark is meant to run on and load from and export to AWS. |
Fully agree. My concern was that this would not be compatible with core n5-java for edge chunks that do not align. @axtimwalde has clarified that this is not the case, so no objections to merge from my side. |
Thanks for the info. Is it pretty easy for the other libraries to use the same edge chunk handling strategy? Thanks @alimanfoo. Have already added some notes to the docs explaining this is experimental and subject to change. The transformational layer is probably useful for a few cases and does sound interesting. That said, I lack the time to pursue this personally and am guessing @funkey is similarly busy. Could I suggest that you raise a new issue with some more info about what you have in mind? Maybe we can discuss this a bit more after this is merged. |
SGTM, have raised zarr-developers/n5py#9, hopefully that makes sense but happy to elaborate. |
Thanks everyone! 🍾 🎉 😄 As it sounds like we are ok with this as is for now, have gone ahead and merged it. If you encounter any problems, please feel free to raise an issue and we can discuss how best to address it. |
Thanks a lot everybody! |
This adds support to read and write from and to N5 containers. The
N5Store
handling the conversion between the zarr and N5 format will automatically be selected whenever thepath
for a container ends in.n5
(similar to how theZipStore
is used for files ending in.zip
).The conversion is done mostly transparently, with one exception being the
N5ChunkWrapper
: This is aCodec
with idn5_wrapper
that will automatically be wrapped around the requested compressor. For example, if you create an array with azlib
compressor, in fact then5_wrapper
codec will be used that delegates to thezlib
codec internally. The additional codec was necessary to introduce N5's chunk headers and ensure big endian storage.In a related note,
gzip
compressed N5 arrays can at the moment not be read, since numcodecs treatszlib
andgzip
as synonyms, which they aren't (their compression headers differ). PR zarr-developers/numcodecs#87 solves this issue.See also https://github.com/zarr-developers/zarr/issues/231
TODO:
tox -e py36
orpytest -v --doctest-modules zarr
)tox -e py27
orpytest -v zarr
)tox -e py36
orflake8 --max-line-length=100 zarr
)tox -e py36
orpython -m doctest -o NORMALIZE_WHITESPACE -o ELLIPSIS docs/tutorial.rst
)tox -e docs
)