NGFF dataset validator #58

Open
thewtex opened this issue Sep 7, 2021 · 9 comments


@thewtex
Contributor

thewtex commented Sep 7, 2021

A tool to validate whether a dataset follows the NGFF spec, with per-version validation. It should generate a visual and programmatic summary of required and optional features and of any errors related to types, etc.

@constantinpape
Contributor

Related: ome/ome-zarr-py#102
Also, @joshmoore is considering JSON-LD to define the NGFF spec, so that there is a static definition that could be used for language-independent validation.

@will-moore
Member

In discussion with @joshmoore and @jburel (e.g. see #31 (comment)), it seems there are 2 types of validation that we're going to need:

  • JSON validation - check that correct types and attributes are present etc. Hopefully this can be achieved with a schema and existing validation tools.
  • Validation against the Zarr arrays. E.g. check that "datasets" are ordered from largest to smallest, check that the "axes" list is the same length as the array dimensions etc. This will likely need custom validation code.
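As a rough illustration of the second kind of check, here is a minimal sketch that uses only the Python standard library. The function name `check_multiscale` and its inputs are hypothetical: it assumes the axis names and the array shapes have already been read from the metadata and the Zarr arrays, and it only covers the two checks named above.

```python
from typing import List, Sequence, Tuple


def check_multiscale(
    axes: Sequence[str], shapes: Sequence[Tuple[int, ...]]
) -> List[str]:
    """Return a list of problems; an empty list means both checks passed.

    `axes` holds the axis names from the metadata. `shapes` holds one array
    shape per entry in "datasets", in the order the entries are listed.
    (Hypothetical helper, not part of any existing library.)
    """
    errors: List[str] = []

    # Each array needs exactly one dimension per declared axis.
    for index, shape in enumerate(shapes):
        if len(shape) != len(axes):
            errors.append(
                f"dataset {index}: {len(shape)} dimensions but {len(axes)} axes"
            )

    # "datasets" should go from largest to smallest, so no dimension may
    # grow from one level to the next.
    for index in range(1, len(shapes)):
        previous, current = shapes[index - 1], shapes[index]
        if len(previous) != len(current):
            continue  # a dimension mismatch was already reported above
        if any(c > p for p, c in zip(previous, current)):
            errors.append(f"dataset {index}: larger than dataset {index - 1}")

    return errors
```

A real validator would obtain `shapes` by opening each array under the image group; that step is left out here because it depends on the Zarr library in use.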

For the JSON validation, I started looking at https://www.commonwl.org/v1.2/SchemaSalad.html; see #69.

cc @glyg

@glyg
Contributor

glyg commented Oct 27, 2021

I can start looking at the 2nd aspect by coding something in Python.

@will-moore
Member

@glyg That would be great, thanks!
We imagine that an ome_zarr validate command would work in a similar way to the info command. In due course the info command could include validation (see ome/ome-zarr-py#102).

@constantinpape
Contributor

We have a JSON schema now, thanks to efforts by @will-moore and @sbesson: https://github.com/ome/ngff/tree/main/0.4/schemas. Usage examples will follow.

@thewtex
Contributor Author

thewtex commented Mar 29, 2022

@will-moore @sbesson thanks for your good work on the validation! 🙏 👏

Following:

"$id": "https://ngff.openmicroscopy.org/0.4/schemas/image.schema",

it looks like the *.schema files are intended to be published to gh-pages? Is this still to do?

@sbesson
Member

sbesson commented Mar 31, 2022

@thewtex you are right, the schemas currently live in the GitHub repository alongside the samples, but they are not published to gh-pages yet. There were several considerations around the URL naming in the original thread (#76 (review)), and the publication step was deferred to a round of review (#76 (comment)), but the need to come back to it was never captured.

I don't know if we want to (ab-)use this issue or create a separate issue to go over the current URL proposal and make sure we are all happy with the decisions.

On a related note, there is also ongoing work on making these schemas available as artifacts so that downstream tools could bundle them and use them for validation e.g. when working offline or simply for performance reasons - see #77. So far, most of the work has been driven by the Python drivers but there are also design decisions to be made that should be fully language agnostic?
Would you have some use case for caching these schemas and using them for validation and would that mandate some particular constraints in terms of layout or distribution?

@thewtex
Contributor Author

thewtex commented Mar 31, 2022

@sbesson thanks for the information!

Yes, I am looking to validate in Python, JavaScript, and C++, so both the HTTP and package distribution, like #77, would be helpful. #77 looks good to me. I will fetch from the GitHub repository for now and report what works well.
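Until the schemas are published at their `$id` URLs, fetching from the repository could look something like the sketch below. It only builds the URL and does not download anything. The helper name `repository_url_for` is hypothetical, and the raw-file base URL is an assumption derived from the repository link earlier in this thread (main branch, same `0.4/schemas` layout).

```python
from urllib.parse import urlparse

# Assumption: raw-file base URL for the main branch of the ome/ngff repository.
RAW_BASE = "https://raw.githubusercontent.com/ome/ngff/main"


def repository_url_for(schema_id: str) -> str:
    """Map a schema "$id" to the matching raw file in the GitHub repository.

    Hypothetical helper: it assumes the path below the host in "$id"
    (e.g. "/0.4/schemas/image.schema") mirrors the repository layout.
    """
    path = urlparse(schema_id).path
    if "/schemas/" not in path:
        raise ValueError(f"not a schema id: {schema_id!r}")
    return RAW_BASE + path
```

The same path could also serve as the key for a local cache, which may be relevant to the layout question raised for #77.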

@sbesson
Member

sbesson commented Apr 13, 2022

As a follow-up, I just opened ome/spec-prod#2 to update the logic to publish the JSON schemas. Now is probably a good time for anyone to suggest alternate permanent URLs for these schemas before we start deploying them to the gh-pages branch.

I assume we'll also want the existing schemas to be listed at https://ngff.openmicroscopy.org/latest/, maybe as a separate section, or within each specification section?

5 participants