Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

N5 support #6466

Merged
merged 19 commits into from
Sep 22, 2022
Merged

N5 support #6466

merged 19 commits into from
Sep 22, 2022

Conversation

leowe
Copy link
Contributor

@leowe leowe commented Sep 13, 2022

  • adds support for N5
  • refactors data readers for Zarr and N5 so that they share a common architecture
  • extracts functionality into DatasetArray and DatasetHeader which ZarrArray, N5Array, ZarrHeader, and N5Header subclass and adapt the corresponding spec
  • the reading process of a N5 or Zarr dataset comprises three steps: (1) reading the dataset from file, (2) uncompressing it, and (3) building a multi-dimensional (in our case 3d) array in the right type from the uncompressed data
  • all 3 steps are done by separate components: (1) by FileSystemStore, (2) by Compressor, and (3) by MultiArray
  • the ChunkReader composes those components so that they are easier to swap out

URL of deployed dev instance (used for testing):

Steps to test:

  • Open any N5 dataset, e.g. jrc_cos7-11.n5.zip
  • NB: This dataset only uses values from 0-255

Issues:



TODO

  • Allow Compressor properties in attribute.json to also contain boolean values
  • Test dataset with data type different from uint8

@leowe leowe self-assigned this Sep 13, 2022
…r in simpler composition instead of inheritance
# Conflicts:
#	project/Dependencies.scala
@leowe
Copy link
Contributor Author

leowe commented Sep 15, 2022

Open questions:

  • How to handle the datatypes? Currently the code just uses the ZarrDataType as the "universal" one? Should I introduce another one? That only feels like redundant information...
  • Should I also add a N5 dataset to the wk dev instance and to the Notion page?

@leowe leowe requested a review from fm3 September 15, 2022 08:18
@fm3
Copy link
Member

fm3 commented Sep 15, 2022

How to handle the datatypes? Currently the code just uses the ZarrDataType as the "universal" one? Should I introduce another one? That only feels like redundant information...

If the ZarrDataType also works for N5, maybe you can come up with a good name and place for it so that it can be used for both with good conscience.

Should I also add a N5 dataset to the wk dev instance and to the Notion page?

If that is relatively easy to do, I’d say yes :)

@leowe leowe marked this pull request as ready for review September 15, 2022 14:23
Copy link
Member

@fm3 fm3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First scroll-through looks very promising :) I’ll review in more detail on monday, but the abstraction concept seems very nice!
Couple of naming suggestions:
FileSystemStoreN5 (and other things with N5 at the end) → N5FileSystemStore (I’d say it is nicer to have the most general at the end, this thing is a store, after all, not an n5)
resolvedDataType → dataType (is this a problem? you changed this for zarr)
chunkSize → chunkShape (size feels one-dimensional for me. now that we have this freedom, I’d go for shape, compare scalableminds/webknossos-libs#706)

@normanrz
Copy link
Member

What would be missing for users to add remote N5 dataset (e.g. https://janelia-cosem-datasets.s3.amazonaws.com/jrc_macrophage-2/jrc_macrophage-2.n5) in webKnossos?

@fm3
Copy link
Member

fm3 commented Sep 16, 2022

At the very least the ExploreRemoteLayer functionality (where we detect ngff headers and create datasource-properties.jsons from them) needs to be extended to explore N5 datasets as well. That is not yet part of this PR. Edit: I wrote #6480 for that

@normanrz
Copy link
Member

Should I also add a N5 dataset to the wk dev instance and to the Notion page?

If that is relatively easy to do, I’d say yes :)

Instead of putting the actual files on asterix, could also put a datasource-properties.json that references a remote dataset on asterix? Remote datasets are definitely more interesting to test than local ones.

Copy link
Member

@fm3 fm3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this great PR! I really like that you took the time to think about how this can be integrated with the existing zarr reading code.

I added some more comments in addition to my renaming requests above :)

Also I’m looking forward to test this with remote datasets, and with multiple mags. (Maybe you could prepare a datasource-properties.json, as Norman already suggested?)

Comment on lines 17 to 25
lazy val pathWithFallback: String =
path.getOrElse(if (mag.isIsotropic) s"${mag.x}" else s"${mag.x}-${mag.y}-${mag.z}")
private lazy val uri: URI = new URI(pathWithFallback)
private lazy val isRemote: Boolean = FileSystemsHolder.isSupportedRemoteScheme(uri.getScheme)
lazy val remoteSource: Option[RemoteSourceDescriptor] =
if (isRemote)
Some(RemoteSourceDescriptor(uri, credentials.map(_.user), credentials.flatMap(_.password)))
else
None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This reads a lot like ZarrMag. Maybe there can be a common trait? Not sure about a good name, though, ExtendedMag? Maybe you have a better Idea :D

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is exactly ZarrMag. I went ahead and used hte new class in both cases (DatasetLocatorMag)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you have better ideas for the name, feel free to share ;)

@leowe
Copy link
Contributor Author

leowe commented Sep 21, 2022

resolvedDataType → dataType (is this a problem? you changed this for zarr)

This is actually a problem because dataType is used by the N5 attribute.json and if we try to use it another way we will run into some overlap.

@leowe leowe requested a review from fm3 September 21, 2022 14:30
Copy link
Member

@fm3 fm3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support Reading N5 Datasets
3 participants