expanding validator and auto-conversion #107

FynnBe · 2021-06-11T14:38:24Z

after
the models all pass the spec, but there are two remaining problems:

{'format_version': ['Must be one of: 0.1.0, 0.2.0.'],
 'notebook': {0: {'authors': {0: {'_schema': ['Invalid input type.']},
                              1: {'_schema': ['Invalid input type.']}},
                  'documentation': ['Invalid suffix '
                                    '(https:/github.com/miura/NEUBIAS_AnalystSchool2020/tree/master/Ignacio): '
                                    'tuple index out of range',
                                    'Invalid as_posix '
                                    '(https:/github.com/miura/NEUBIAS_AnalystSchool2020/tree/master/Ignacio): '
                                    'expected local, relative file path.']},
              1: {'authors': {0: {'_schema': ['Invalid input type.']},
                              1: {'_schema': ['Invalid input type.']},
                              2: {'_schema': ['Invalid input type.']},
                              3: {'_schema': ['Invalid input type.']},
                              4: {'_schema': ['Invalid input type.']}},
                  'documentation': ['Invalid suffix '
                                    '(https:/cbia.fi.muni.cz/research/segmentation/fru-net.html): '
                                    'tuple index out of range',
                                    'Invalid as_posix '
                                    '(https:/cbia.fi.muni.cz/research/segmentation/fru-net.html): '
                                    'expected local, relative file path.']}}}

format_version of manifest is the model format version, not the rdf format version. Unifying the format versions would be great (see _maybe_ update manifest versions #101)
validation for notebook entries in the manifest is still somewhat rudimentary, in particular no auto-conversion is applied. This is also due to me taking the liberty of applying the changes to authors and documentation to the notebooks as well (to all other rdfs really).

I propose to extend the auto-conversion mechanism and a unified format version to all rdf by specifying all rdfs with marshmallow based schemas and generate their respective documentations like the model one.
The schema.Model

spec-bioimage-io/bioimageio/spec/v0_3/schema.py

Line 481 in 666db26

class Model(RDF):

is already based on a schema.RDF(

spec-bioimage-io/bioimageio/spec/v0_3/schema.py

Line 54 in 666db26

class RDF(PyBioSchema):

base.
#106 aims at correcting this base RDF schema

For manifest validation

spec-bioimage-io/bioimageio/spec/v0_3/schema.py

Line 715 in 666db26

class BioImageIoManifest(PyBioSchema):

and

spec-bioimage-io/bioimageio/spec/v0_3/schema.py

Line 693 in 666db26

class BioImageIoManifestNotebookEntry(PyBioSchema):

already exist and only need minor improvements, e.g. update documentation.

The text was updated successfully, but these errors were encountered:

constantinpape · 2021-06-11T19:35:57Z

2\. validation for notebook entries in the manifest is still somewhat rudimentary, in particular no auto-conversion is applied. This is also due to me taking the liberty of applying the changes to `authors` and `documentation` to the notebooks as well (to all other rdfs really).

Sorry, but I think this is fundamentally wrong. We should not use the marshmallow spec here, but use the jsonschema, which is considered the source of truth for the (non-model) rdf. Otherwise we are back to the same situation of having different references.

I think all validation of (non-model) rdfs should be removed from the marshmallow schema and done through the jsonschema.
(Maybe you can use marshmallow to load the jsonschema and automatically convert it to valid marshmallow, otherwise use https://python-jsonschema.readthedocs.io/en/stable/).

Once that is done we can think about updating the author and documentation field in the jsonschema to have it in sync between the model.rdf and the other rdfs.

FynnBe · 2021-06-14T08:34:36Z

I think it would be easier to maintain and develop if we viewed all RDFs under one format version and define and validate all of it in marshmallow. Why should authors of notebooks be defined differently from model ones, etc? imo, it makes a lot of sense to share some fields across all rdfs, to reduce the 'syncing effort'.
jsonschema can be generated from marshmallow, but afaik going the other way is infeasible.

FynnBe · 2021-06-14T09:00:06Z

I can make a draft of the general rdf in marshmallow at the end of this week. Maybe then it'll be easier to discuss...

constantinpape · 2021-06-14T09:00:30Z

if we viewed all RDFs under one format version

Yes, definitely.

define and validate all of it in marshmallow

Having it under the same format version does not mean we have to do it in marshmallow. Putting the RDF in jsonschema is something we have decided on a while ago with @oeway (we can of course reconsider that decision). So given that decision validating it with marshmallow is wrong, and leads to the failures in the validator we are seeing.

imo, it makes a lot of sense to share some fields across all rdfs, to reduce the 'syncing effort'.

I think this would indeed be good, but currently the definitions are different and we cannot just silently assume that they were updated. Also, for some of the fields the definition between model and rdf is contradicting, e.g. documentation:

In the rdf we say "url to the documentation in markdown format".
In the model we demand that it is a local path.

Edit: Just to make my point clear here: I am not against unifying this, but we need to be careful with introducing very large changes (e.g. validating the rdf based on marshmallow instead of the jsonschema), without coordinating this, as this yields a lot of confusion. If we decide to use marshmallow as the "source of truth" for the rdf as well, this definitely requires a bump in the version.

FynnBe · 2021-06-14T09:08:54Z

Also, for some of the fields the definition between model and rdf is contradicting, e.g. documentation

I suggest that more specialized specs like the model spec may overwrite the general rdf spec, e.g. for documentation. Of course it is desirable to have them as homogenous as straight forwardly possible

oeway · 2021-06-14T10:16:40Z

I would actually prefer not moving the generic RDF spec to marshmallow, although there is no fundamental difference since we can generate the jsonschema ultimately. But I felt it somewhat more correct if we keep it in jsonschema, because:

jsonschema is a language-agnostic standard which is more suitable for a generic RDF
the generic RDF are mostly used by the website now, which is mostly javascript.
we have simple fields in the generic RDF, we don't need sophisticated validation comes with marshmallow

Also consider that we may define more spec for dataset/notebooks/applications, I felt it's better to do it in jsonschema, more strightforward (no need to use CI to generate the josnschema), and also aviod touching the model spec part.

FynnBe · 2021-06-14T11:20:25Z

my 4 cents:

jsonschema is a language-agnostic standard which is more suitable for a generic RDF

model packages don't need to use python/marshmallow either, e.g. for javascript based models...

the generic RDF are mostly used by the website now, which is mostly javascript.

again, the exported jsonschema could serve as missionary for the single source marshmallow truth

we have simple fields in the generic RDF, we don't need sophisticated validation comes with marshmallow

that makes the marshmallow code very simple, that's great! It's not longer than jsonschema and we can reuse fields like authors across RDFs

Also consider that we may define more spec for dataset/notebooks/applications, I felt it's better to do it in jsonschema, more strightforward (no need to use CI to generate the josnschema), and also aviod touching the model spec part.

for me more more reason to specify everything in marshmallow. Also note that the jsonschema could be generated just like I setup generation of the documentation: with a pre-commit ensuring that the commited file is "freshly generated" from the current HEAD. For the documentation this has worked really nicely and would result in a commited jsonschema with history in jsonschema, etc..

constantinpape · 2021-06-14T12:17:14Z

From my side there are arguments for both positions, just to mention the two main points:

Pro "everything in marshmallow": everything will be defined in the same framework, which makes the structure of the spec easier and makes it easier to keep rdf and model spec in sync; we can auto-generate jsonschema for the rdf from marshmallow
Contra: contributing to the marshmallow schema is more difficult than to the jsonschema, currently only @FynnBe really knows how this works. jsonschema is a more common, language agnostic and easier approach to spec generation.

I think overall my vote would be to unify, but if we do this we need some proper developer docs on the marshmallow schema, because having two different frameworks for this is confusing in the long run.
We should probably discuss this in the next bioimage.io meeting though.

FynnBe · 2021-06-14T14:04:41Z

we need some proper developer docs on the marshmallow schema

👍 I'd say we need those anyway!

oeway · 2021-06-14T19:43:24Z

OK, let's do it in marshmallow then, it won't be too hard to grasp I guess. However, I would rather implement the generation of docs and schema in CI in order to make sure they are synchronized.

Edit: I am working on the CI now #115

constantinpape · 2021-09-10T08:56:42Z

I think this issue is resolved.

FynnBe mentioned this issue Jun 11, 2021

Manifest validation fails deepimagej/models#23

Closed

constantinpape mentioned this issue Jun 11, 2021

futher improvements #105

Closed

4 tasks

FynnBe mentioned this issue Jun 14, 2021

Generate docs from json schema #110

Closed

constantinpape mentioned this issue Jun 14, 2021

Move 0.3.2 tag #114

Closed

FynnBe mentioned this issue Jun 22, 2021

Define RDF with Marshmallow #125

Merged

constantinpape closed this as completed Sep 10, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

expanding validator and auto-conversion #107

expanding validator and auto-conversion #107

FynnBe commented Jun 11, 2021

constantinpape commented Jun 11, 2021

FynnBe commented Jun 14, 2021

FynnBe commented Jun 14, 2021

constantinpape commented Jun 14, 2021 •

edited

Loading

FynnBe commented Jun 14, 2021

oeway commented Jun 14, 2021

FynnBe commented Jun 14, 2021

constantinpape commented Jun 14, 2021 •

edited

Loading

FynnBe commented Jun 14, 2021

oeway commented Jun 14, 2021 •

edited

Loading

constantinpape commented Sep 10, 2021

expanding validator and auto-conversion #107

expanding validator and auto-conversion #107

Comments

FynnBe commented Jun 11, 2021

constantinpape commented Jun 11, 2021

FynnBe commented Jun 14, 2021

FynnBe commented Jun 14, 2021

constantinpape commented Jun 14, 2021 • edited Loading

FynnBe commented Jun 14, 2021

oeway commented Jun 14, 2021

FynnBe commented Jun 14, 2021

constantinpape commented Jun 14, 2021 • edited Loading

FynnBe commented Jun 14, 2021

oeway commented Jun 14, 2021 • edited Loading

constantinpape commented Sep 10, 2021

constantinpape commented Jun 14, 2021 •

edited

Loading

constantinpape commented Jun 14, 2021 •

edited

Loading

oeway commented Jun 14, 2021 •

edited

Loading