Inconsistent schema validation between /v1/workflows/activities and /metadata/json:validate API endpoints
#462
I was able to confirm that a record that does not validate with json:validate can be entered into mongo with v1/workflows/activities. The desired behavior is consistent validation across POST endpoints. When I try this record with json:validate I get a few errors; with v1/workflows/activities I am able to submit the record, testing with the dev version of runtime.
cc @Michal-Babins @mbthornton-lbl for coordination. The related ticket to fix things on the record generation side is microbiomedata/nmdc_automation#24. I'd like to get this addressed this sprint if possible to prevent more invalid data from getting into mongo. @ssarrafan @shreddd
Test record
{
"metagenome_annotation_activity_set": [
{
"id": "nmdc:wfmgan-11-mmt28267.2",
"name": "Metagenome Annotation Analysis Activity for nmdc:wfmgan-11-mmt28267.2",
"started_at_time": "2024-02-07T22:56:28.922223+00:00",
"ended_at_time": "2024-02-08T06:03:41.210740+00:00",
"was_informed_by": "nmdc:omprc-11-9mvz7z22",
"used": null,
"execution_resource": "NERSC-Perlmutter",
"git_url": "https://github.com/microbiomedata/mg_annotation",
"has_input": [
"nmdc:dobj-11-5eb6v689"
],
"type": "nmdc:MetagenomeAnnotationActivity",
"has_output": [
"nmdc:dobj-11-xg79v192",
"nmdc:dobj-11-ztbqrz10",
"nmdc:dobj-11-v8ktb336",
"nmdc:dobj-11-7bwq0v72",
"nmdc:dobj-11-9fthdp31",
"nmdc:dobj-11-xpfkx256",
"nmdc:dobj-11-47q33b43",
"nmdc:dobj-11-7k704568",
"nmdc:dobj-11-5qfq9q36",
"nmdc:dobj-11-5sfp4s54",
"nmdc:dobj-11-fge4rm69",
"nmdc:dobj-11-a2fwy597",
"nmdc:dobj-11-0qywmr14",
"nmdc:dobj-11-aqf88r67",
"nmdc:dobj-11-42bfmh62",
"nmdc:dobj-11-jkfqv467",
"nmdc:dobj-11-te9q8e26",
"nmdc:dobj-11-r403jz94",
"nmdc:dobj-11-f26zbz28",
"nmdc:dobj-11-qg6mr936",
"nmdc:dobj-11-1mek9m87",
"nmdc:dobj-11-wycjgd54",
"nmdc:dobj-11-x8r3q961"
],
"part_of": [
"nmdc:omprc-11-9mvz7z22"
],
"version": "v1.0.4",
"qc_status": null,
"qc_comment": null,
"has_failure_categorization": [],
"gold_analysis_project_identifiers": []
}
]
}
runtime dev json:validate response body
{
"result": "errors",
"detail": {
"metagenome_annotation_activity_set": [
"None is not of type 'string'",
"None is not one of ['pass', 'fail']",
"None is not of type 'string'",
"None is not of type 'string'"
]
}
}
runtime dev v1/workflows/activities response body
{
"message": "jobs accepted"
}
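For the record-generation side, here is a minimal sketch of one possible workaround: strip null-valued fields before submitting. It assumes the null slots in the test record above (`used`, `qc_status`, `qc_comment`) are optional in the schema, which is an assumption and not something confirmed in this thread. `drop_null_fields` is a hypothetical helper, not part of nmdc-runtime or nmdc_automation.

```python
from typing import Any


def drop_null_fields(value: Any) -> Any:
    """Return a copy of `value` with every None-valued dict entry removed.

    Hypothetical helper. Empty lists and other falsy values are kept;
    only dict entries whose value is exactly None are dropped.
    """
    if isinstance(value, dict):
        return {k: drop_null_fields(v) for k, v in value.items() if v is not None}
    if isinstance(value, list):
        return [drop_null_fields(item) for item in value]
    return value


# Trimmed-down version of the test record from this issue.
payload = {
    "metagenome_annotation_activity_set": [
        {
            "id": "nmdc:wfmgan-11-mmt28267.2",
            "used": None,
            "qc_status": None,
            "qc_comment": None,
            "has_failure_categorization": [],
        }
    ]
}

cleaned = drop_null_fields(payload)
```

This only hides the symptom on the client side. The endpoint still needs to reject records that fail schema validation.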
Duplicates: #478 Solution:
The changes that were made to the endpoints make it so you can't submit data_object_set records, only workflow execution subclasses. The typical submission is a mix of data_object_set records and workflow subclass records. This needs to be hotfixed as it blocks workflows from using these endpoints.
Example record that is expected to pass
curl -X 'POST' \
'https://api-dev.microbiomedata.org/workflows/activities' \
-H 'accept: application/json' \
-H "Authorization: Bearer $TOKEN" \
-H 'Content-Type: application/json' \
-d '{"mags_activity_set": [
{
"id": "nmdc:wfmag-11-zcwca422.5",
"name": "TEST nmdc:wfmag-11-zcwca422.5",
"started_at_time": "2024-03-13T21:45:28.521604+00:00",
"ended_at_time": "2024-03-13T23:56:16.431104+00:00",
"was_informed_by": "nmdc:omprc-11-9mvz7z22",
"execution_resource": "NERSC-Perlmutter",
"git_url": "https://github.com/microbiomedata/metaMAGs",
"has_input": [
"nmdc:dobj-11-5eb6v689",
"nmdc:dobj-11-7k80qv75",
"nmdc:dobj-11-y563v150",
"nmdc:dobj-11-72e7f129",
"nmdc:dobj-11-vn9pwz37",
"nmdc:dobj-11-9x2zaf16",
"nmdc:dobj-11-9kpz9641",
"nmdc:dobj-11-9prnyr33",
"nmdc:dobj-11-feses595",
"nmdc:dobj-11-0z5rhk53",
"nmdc:dobj-11-sb28nx57",
"nmdc:dobj-11-vrcm1x60",
"nmdc:dobj-11-xx1tb938",
"nmdc:dobj-11-y3f47w18",
"nmdc:dobj-11-k62fk420"
],
"type": "nmdc:MagsAnalysisActivity",
"has_output": [
"nmdc:dobj-12-gvntvq90"],
"part_of": [
"nmdc:omprc-11-9mvz7z22"
],
"version": "v1.0.8"
}
],
"data_object_set": [
{
"id": "nmdc:dobj-12-gvntvq90",
"name": "nmdc_wfmag-11-zcwca422.5_checkm_qa.out",
"description": "CheckM for nmdc:wfmag-11-zcwca422.5",
"file_size_bytes": 14027,
"md5_checksum": "dcaa3973977aac97c74eb1610ffdba45",
"data_object_type": "CheckM Statistics",
"type": "nmdc:DataObject"
}]
}'
current error on runtime-dev
{
"detail": "keys must be nmdc-schema activity collection names`"
}
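To illustrate what a more permissive key check could look like, here is a minimal sketch that accepts activity collections plus `data_object_set` in the same payload. The function name `check_collection_names`, the `other_allowed` parameter, and the hard-coded set of activity collection names are all assumptions for illustration. This is not the fix that went into nmdc-runtime.

```python
from typing import Dict, Iterable, List


def check_collection_names(
    docs: Dict[str, List[dict]],
    activity_names: Iterable[str],
    other_allowed: Iterable[str] = ("data_object_set",),
) -> None:
    """Raise ValueError if `docs` has a key that is neither an activity
    collection name nor one of the explicitly allowed extra collections.

    Hypothetical helper, sketched for this issue only.
    """
    allowed = set(activity_names) | set(other_allowed)
    unknown = sorted(name for name in docs if name not in allowed)
    if unknown:
        raise ValueError(f"unrecognized collection names: {unknown}")


# Assumed subset of activity collection names, taken from the payloads in this issue.
ACTIVITY_NAMES = {"mags_activity_set", "metagenome_annotation_activity_set"}

# Same shape as the curl payload above, with the documents trimmed to their ids.
mixed_payload = {
    "mags_activity_set": [{"id": "nmdc:wfmag-11-zcwca422.5"}],
    "data_object_set": [{"id": "nmdc:dobj-12-gvntvq90"}],
}

check_collection_names(mixed_payload, ACTIVITY_NAMES)  # does not raise
```

A key check like this only gates which collections may appear. Each document would still need to be validated against the schema separately.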
Here's a link to a Slack conversation about this issue:
I will rename the issue so its name shows the endpoints as URL paths. For reference, here are the Swagger UI links the original issue name was based upon:
Here's the validation code used by the nmdc-runtime/nmdc_runtime/api/endpoints/workflows.py Lines 57 to 91 in 1007026
Here's the validation code used by the nmdc-runtime/nmdc_runtime/api/endpoints/metadata.py Lines 220 to 228 in 1007026
Finally, here's the nmdc-runtime/nmdc_runtime/util.py Lines 519 to 561 in 1007026
Of the two endpoints listed above, only
# verify activities in activity_set are nmdc-schema compliant
for collection_name in activity_set:
    if collection_name not in activity_collection_names(mdb):
        raise ValueError("keys must be nmdc-schema activity collection names`")
I don't know why the additional check is necessary or why the
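One way to keep the two endpoints from drifting apart is to have both call a single validation function. Below is a rough sketch of that idea. Every name in it (`validate_docs`, `no_nulls`, `handle_validate`, `handle_submit`) is hypothetical, the `no_nulls` check is a stand-in for real nmdc-schema validation, and the `{"result": "ok"}` success body is a placeholder.

```python
from typing import Callable, Dict, List

Docs = Dict[str, List[dict]]
Errors = Dict[str, List[str]]


def validate_docs(docs: Docs, check: Callable[[dict], List[str]]) -> Errors:
    """Run `check` on every document and group the messages by collection."""
    errors: Errors = {}
    for collection_name, documents in docs.items():
        messages = [msg for doc in documents for msg in check(doc)]
        if messages:
            errors[collection_name] = messages
    return errors


def no_nulls(doc: dict) -> List[str]:
    """Stand-in check (assumption): flag every top-level field set to None."""
    return [f"{key} is None" for key, value in doc.items() if value is None]


def handle_validate(docs: Docs) -> dict:
    """Hypothetical validate-only handler."""
    errors = validate_docs(docs, no_nulls)
    if errors:
        return {"result": "errors", "detail": errors}
    return {"result": "ok"}  # placeholder success body


def handle_submit(docs: Docs) -> dict:
    """Hypothetical submit handler: refuses anything the validator rejects."""
    errors = validate_docs(docs, no_nulls)
    if errors:
        raise ValueError(errors)
    return {"message": "jobs accepted"}
```

With this shape, a record that the validate handler reports as invalid can never be accepted by the submit handler, because both go through `validate_docs` with the same check.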
…ivities Documents other than activities may be generated and submittable at the same time. closes #462
Does a data object record with a url value of null submit without errors? This doesn't validate with json:submit. This record ended up in mongo, we believe via the workflows v1 POST endpoint, so now we need to double-check. If this endpoint doesn't validate against the schema, this needs to be fixed.