Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

expanding validator and auto-conversion #107

Closed
FynnBe opened this issue Jun 11, 2021 · 11 comments
Closed

expanding validator and auto-conversion #107

FynnBe opened this issue Jun 11, 2021 · 11 comments

Comments

@FynnBe
Copy link
Member

FynnBe commented Jun 11, 2021

after
the models all pass the spec, but there are two remaining problems:

{'format_version': ['Must be one of: 0.1.0, 0.2.0.'],
 'notebook': {0: {'authors': {0: {'_schema': ['Invalid input type.']},
                              1: {'_schema': ['Invalid input type.']}},
                  'documentation': ['Invalid suffix '
                                    '(https:/github.com/miura/NEUBIAS_AnalystSchool2020/tree/master/Ignacio): '
                                    'tuple index out of range',
                                    'Invalid as_posix '
                                    '(https:/github.com/miura/NEUBIAS_AnalystSchool2020/tree/master/Ignacio): '
                                    'expected local, relative file path.']},
              1: {'authors': {0: {'_schema': ['Invalid input type.']},
                              1: {'_schema': ['Invalid input type.']},
                              2: {'_schema': ['Invalid input type.']},
                              3: {'_schema': ['Invalid input type.']},
                              4: {'_schema': ['Invalid input type.']}},
                  'documentation': ['Invalid suffix '
                                    '(https:/cbia.fi.muni.cz/research/segmentation/fru-net.html): '
                                    'tuple index out of range',
                                    'Invalid as_posix '
                                    '(https:/cbia.fi.muni.cz/research/segmentation/fru-net.html): '
                                    'expected local, relative file path.']}}}
  1. format_version of manifest is the model format version, not the rdf format version. Unifying the format versions would be great (see _maybe_ update manifest versions #101)
  2. validation for notebook entries in the manifest is still somewhat rudimentary, in particular no auto-conversion is applied. This is also due to me taking the liberty of applying the changes to authors and documentation to the notebooks as well (to all other rdfs really).

I propose to extend the auto-conversion mechanism and a unified format version to all rdf by specifying all rdfs with marshmallow based schemas and generate their respective documentations like the model one.
The schema.Model

is already based on a schema.RDF(
class RDF(PyBioSchema):
base.
#106 aims at correcting this base RDF schema

For manifest validation

class BioImageIoManifest(PyBioSchema):

and
class BioImageIoManifestNotebookEntry(PyBioSchema):

already exist and only need minor improvements, e.g. update documentation.

@constantinpape
Copy link
Collaborator

2\. validation for notebook entries in the manifest is still somewhat rudimentary, in particular no auto-conversion is applied. This is also due to me taking the liberty of applying the changes to `authors` and `documentation` to the notebooks as well (to all other rdfs really).

Sorry, but I think this is fundamentally wrong. We should not use the marshmallow spec here, but use the jsonschema, which is considered the source of truth for the (non-model) rdf. Otherwise we are back to the same situation of having different references.

I think all validation of (non-model) rdfs should be removed from the marshmallow schema and done through the jsonschema.
(Maybe you can use marshmallow to load the jsonschema and automatically convert it to valid marshmallow, otherwise use https://python-jsonschema.readthedocs.io/en/stable/).

Once that is done we can think about updating the author and documentation field in the jsonschema to have it in sync between the model.rdf and the other rdfs.

@constantinpape constantinpape mentioned this issue Jun 11, 2021
4 tasks
@FynnBe
Copy link
Member Author

FynnBe commented Jun 14, 2021

I think it would be easier to maintain and develop if we viewed all RDFs under one format version and define and validate all of it in marshmallow. Why should authors of notebooks be defined differently from model ones, etc? imo, it makes a lot of sense to share some fields across all rdfs, to reduce the 'syncing effort'.
jsonschema can be generated from marshmallow, but afaik going the other way is infeasible.

@FynnBe
Copy link
Member Author

FynnBe commented Jun 14, 2021

I can make a draft of the general rdf in marshmallow at the end of this week. Maybe then it'll be easier to discuss...

@constantinpape
Copy link
Collaborator

constantinpape commented Jun 14, 2021

if we viewed all RDFs under one format version

Yes, definitely.

define and validate all of it in marshmallow

Having it under the same format version does not mean we have to do it in marshmallow. Putting the RDF in jsonschema is something we have decided on a while ago with @oeway (we can of course reconsider that decision). So given that decision validating it with marshmallow is wrong, and leads to the failures in the validator we are seeing.

imo, it makes a lot of sense to share some fields across all rdfs, to reduce the 'syncing effort'.

I think this would indeed be good, but currently the definitions are different and we cannot just silently assume that they were updated. Also, for some of the fields the definition between model and rdf is contradicting, e.g. documentation:

Edit: Just to make my point clear here: I am not against unifying this, but we need to be careful with introducing very large changes (e.g. validating the rdf based on marshmallow instead of the jsonschema), without coordinating this, as this yields a lot of confusion. If we decide to use marshmallow as the "source of truth" for the rdf as well, this definitely requires a bump in the version.

@FynnBe
Copy link
Member Author

FynnBe commented Jun 14, 2021

Also, for some of the fields the definition between model and rdf is contradicting, e.g. documentation

I suggest that more specialized specs like the model spec may overwrite the general rdf spec, e.g. for documentation. Of course it is desirable to have them as homogenous as straight forwardly possible

@oeway
Copy link
Contributor

oeway commented Jun 14, 2021

I would actually prefer not moving the generic RDF spec to marshmallow, although there is no fundamental difference since we can generate the jsonschema ultimately. But I felt it somewhat more correct if we keep it in jsonschema, because:

  1. jsonschema is a language-agnostic standard which is more suitable for a generic RDF
  2. the generic RDF are mostly used by the website now, which is mostly javascript.
  3. we have simple fields in the generic RDF, we don't need sophisticated validation comes with marshmallow

Also consider that we may define more spec for dataset/notebooks/applications, I felt it's better to do it in jsonschema, more strightforward (no need to use CI to generate the josnschema), and also aviod touching the model spec part.

@FynnBe
Copy link
Member Author

FynnBe commented Jun 14, 2021

my 4 cents:

  1. jsonschema is a language-agnostic standard which is more suitable for a generic RDF

model packages don't need to use python/marshmallow either, e.g. for javascript based models...

  1. the generic RDF are mostly used by the website now, which is mostly javascript.

again, the exported jsonschema could serve as missionary for the single source marshmallow truth

  1. we have simple fields in the generic RDF, we don't need sophisticated validation comes with marshmallow

that makes the marshmallow code very simple, that's great! It's not longer than jsonschema and we can reuse fields like authors across RDFs

Also consider that we may define more spec for dataset/notebooks/applications, I felt it's better to do it in jsonschema, more strightforward (no need to use CI to generate the josnschema), and also aviod touching the model spec part.

for me more more reason to specify everything in marshmallow. Also note that the jsonschema could be generated just like I setup generation of the documentation: with a pre-commit ensuring that the commited file is "freshly generated" from the current HEAD. For the documentation this has worked really nicely and would result in a commited jsonschema with history in jsonschema, etc..

@constantinpape
Copy link
Collaborator

constantinpape commented Jun 14, 2021

From my side there are arguments for both positions, just to mention the two main points:

  • Pro "everything in marshmallow": everything will be defined in the same framework, which makes the structure of the spec easier and makes it easier to keep rdf and model spec in sync; we can auto-generate jsonschema for the rdf from marshmallow
  • Contra: contributing to the marshmallow schema is more difficult than to the jsonschema, currently only @FynnBe really knows how this works. jsonschema is a more common, language agnostic and easier approach to spec generation.

I think overall my vote would be to unify, but if we do this we need some proper developer docs on the marshmallow schema, because having two different frameworks for this is confusing in the long run.
We should probably discuss this in the next bioimage.io meeting though.

@FynnBe
Copy link
Member Author

FynnBe commented Jun 14, 2021

we need some proper developer docs on the marshmallow schema

👍 I'd say we need those anyway!

@oeway
Copy link
Contributor

oeway commented Jun 14, 2021

OK, let's do it in marshmallow then, it won't be too hard to grasp I guess. However, I would rather implement the generation of docs and schema in CI in order to make sure they are synchronized.

Edit: I am working on the CI now #115

@constantinpape
Copy link
Collaborator

I think this issue is resolved.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants