Add initial CWL JSON schema definition #329
description: |
  Hint indicating that the Application Package corresponds to an
  OGC API - Processes provider that should be remotely executed and monitored
  by this instance. (note: can only be an 'hint' as it is unofficial CWL specification).
If the process is not otherwise executable without making use of the OGCAPIRequirement, then it should be under requirements, not hints. Non-standard extensions are allowed in both fields.
I agree.
I think it is implementation-dependent whether this would be required or optional on the target platform where the process is being deployed. However, I would like this proposed definition to be established well enough that different implementers are guided toward an interoperable reference.
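To make the requirements-versus-hints distinction above concrete, here is a minimal sketch that reports under which field an extension class is declared. The helper name `find_extension`, the `process` field and the example URL are assumptions made for illustration; only the class name OGCAPIRequirement comes from this thread.

```python
from typing import Any, Dict, Optional


def find_extension(cwl: Dict[str, Any], class_name: str) -> Optional[str]:
    """Return "requirements", "hints" or None, depending on where the class is declared."""
    for field in ("requirements", "hints"):
        entries = cwl.get(field) or {}
        if isinstance(entries, dict):
            # map form: {"ClassName": {...}}
            if class_name in entries:
                return field
        elif any(isinstance(e, dict) and e.get("class") == class_name for e in entries):
            # list form: [{"class": "ClassName", ...}]
            return field
    return None


# Hypothetical documents: the "process" field and its URL are illustrative only.
base = {"cwlVersion": "v1.2", "class": "CommandLineTool", "inputs": {}, "outputs": {}}
as_hint = {**base, "hints": {"OGCAPIRequirement": {"process": "https://example.com/processes/echo"}}}
as_requirement = {
    **base,
    "requirements": [{"class": "OGCAPIRequirement", "process": "https://example.com/processes/echo"}],
}

print(find_extension(as_hint, "OGCAPIRequirement"))
print(find_extension(as_requirement, "OGCAPIRequirement"))
```

A platform that cannot run the process without the extension would expect the second form, while a platform that treats it as optional could accept either.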
Thank you @fmigneault ; this looks like a lot of work!
A bonus would be to try to load each of the CWL v1.2 conformance tests (except those with should_fail: true) with this schema to ensure that there are no obvious things missing, either as a CI process or as a one-time test.
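The selection suggested above could be sketched as follows, assuming the conformance test entries are already loaded as a list of dictionaries. The function name `select_schema_test_cases` and the sample entries are hypothetical; only the `should_fail` key is taken from the suggestion.

```python
from typing import Any, Dict, List


def select_schema_test_cases(tests: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep only the conformance tests that are not flagged with should_fail: true."""
    return [test for test in tests if not test.get("should_fail", False)]


# Hypothetical sample entries, shaped like items of a conformance test listing.
sample_tests = [
    {"id": "example_ok", "tool": "tests/ok.cwl", "tags": ["required"]},
    {"id": "example_bad", "tool": "tests/bad.cwl", "tags": ["required"], "should_fail": True},
    {"id": "example_explicit", "tool": "tests/explicit.cwl", "tags": [], "should_fail": False},
]

selected = select_schema_test_cases(sample_tests)
print([test["id"] for test in selected])
```

Entries without the key are kept, since the flag is optional in the listing.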
Coverage so far:
import logging
import os
import posixpath
from functools import cache
from typing import Dict, List, Optional, TypeAlias, TypedDict, TypeGuard, Union
from typing_extensions import NotRequired
from urllib.parse import urljoin, urlparse
import jsonschema
import requests
import pytest
import yaml
from jsonschema.exceptions import ValidationError
from yaml.scanner import ScannerError
CONFORMANCE_TESTS_FILE = "https://raw.githubusercontent.com/common-workflow-language/cwl-v1.2/1.2.1_proposed/conformance_tests.yaml"
CWL_JSON_SCHEMA_FILE = "https://raw.githubusercontent.com/crim-ca/ogcapi-processes/cwl-schema/openapi/schemas/processes-dru/cwl.yaml#$definitions/CWL"
LOGGER = logging.getLogger(__name__)
# pylint: disable=C0103,invalid-name
Number = Union[int, float]
ValueType = Union[str, Number, bool]
AnyValueType = Optional[ValueType]
# add more levels of explicit definitions than necessary to simulate JSON recursive structure better than 'Any'
# amount of repeated equivalent definition makes typing analysis 'work well enough' for most use cases
_JSON: TypeAlias = "JSON"
_JsonObjectItemAlias: TypeAlias = "_JsonObjectItem"
_JsonListItemAlias: TypeAlias = "_JsonListItem"
_JsonObjectItem = Dict[str, Union[AnyValueType, _JSON, _JsonObjectItemAlias, _JsonListItemAlias]]
_JsonListItem = List[Union[AnyValueType, _JSON, _JsonObjectItem, _JsonListItemAlias]]
_JsonItem = Union[AnyValueType, _JSON, _JsonObjectItem, _JsonListItem]
JSON = Union[Dict[str, Union[_JSON, _JsonItem]], List[Union[_JSON, _JsonItem]], AnyValueType]
ConformanceTestDef = TypedDict(
    "ConformanceTestDef",
    {
        "id": str,
        "doc": str,
        "tags": List[str],
        "tool": str,
        "job": NotRequired[str],  # not used, for running the actual CWL
        "output": JSON,  # not used, output of CWL execution
        "should_fail": NotRequired[bool],  # indicates failure as "execute failing", but still a valid CWL
    }
)
@cache
def is_remote_file(file_location: str) -> TypeGuard[str]:
    """
    Parses the file location to figure out if it is remotely available or a local path.
    """
    cwl_file_path_or_url = file_location.replace("file://", "")
    scheme = urlparse(cwl_file_path_or_url).scheme
    return scheme != "" and not posixpath.ismount(f"{scheme}:")  # windows partition
@cache
def load_file(file_path: str, text: bool = False) -> Union[JSON, str]:
    """
    Load :term:`JSON` or :term:`YAML` file contents from local path or remote URL.
    If URL, get the content and validate it by loading, otherwise load file directly.
    :param file_path: Local path or URL endpoint where file to load is located.
    :param text: load contents as plain text rather than parsing it from :term:`JSON`/:term:`YAML`.
    :returns: loaded contents either parsed and converted to Python objects or as plain text.
    :raises ValueError: if YAML or JSON cannot be parsed or loaded from location.
    """
    try:
        if is_remote_file(file_path):
            headers = {"Accept": "text/plain"}
            resp = requests.get(file_path, headers=headers)
            if resp.status_code != 200:
                # format the message directly; ValueError does not interpolate "%s" arguments
                raise ValueError(f"Loading error: [{file_path}]")
            # use the decoded text (not raw bytes) when plain text is requested
            return resp.text if text else yaml.safe_load(resp.content)
        with open(file_path, mode="r", encoding="utf-8") as f:
            return f.read() if text else yaml.safe_load(f)
    except OSError as exc:
        LOGGER.debug("Loading error: %s", exc, exc_info=exc)
        raise
    except ScannerError as exc:  # pragma: no cover
        LOGGER.debug("Parsing error: %s", exc, exc_info=exc)
        raise ValueError("Failed parsing file content as JSON or YAML.") from exc
def load_conformance_tests(test_file: str) -> List[ConformanceTestDef]:
    conformance_tests = load_file(test_file)
    assert isinstance(conformance_tests, list)
    assert all(isinstance(conf_test, dict) for conf_test in conformance_tests)
    test_base = os.path.dirname(test_file)
    all_conf_test = []
    for conf_test in conformance_tests:
        if "$import" in conf_test:
            conf_path = urljoin(f"{test_base}/", conf_test["$import"])
            conf_test = load_conformance_tests(conf_path)
            all_conf_test.extend(conf_test)
            continue
        for ref in ["job", "tool"]:
            if ref not in conf_test:
                continue
            if not urlparse(conf_test[ref]).scheme:
                conf_test[ref] = urljoin(f"{test_base}/", conf_test[ref])
        all_conf_test.append(conf_test)
    return all_conf_test
@pytest.mark.parametrize(
    ["schema_file", "conformance_test"],
    [
        (CWL_JSON_SCHEMA_FILE, conf_test)
        for conf_test in load_conformance_tests(CONFORMANCE_TESTS_FILE)
    ]
)
def test_conformance(schema_file: str, conformance_test: ConformanceTestDef) -> None:
    instance_file = conformance_test["tool"]
    instance = load_file(instance_file)
    schema_path = []
    schema_ref = ""
    if "#" in schema_file:
        schema_file, schema_ref = schema_file.split("#", 1)
        schema_path = [ref for ref in schema_ref.split("/") if ref]
        schema_ref = f"#{schema_ref}"
    schema_base = schema = load_file(schema_file)
    if schema_path:
        for part in schema_path:
            schema = schema[part]
    # ensure local schema can find relative $ref, since the provided reference can be a sub-schema (with "#/...")
    scheme_uri = f"file://{schema_file}" if schema_file.startswith("/") else schema_file
    validator = jsonschema.validators.validator_for(schema_base)
    validator.resolver = jsonschema.RefResolver(base_uri=scheme_uri, referrer=schema_base)
    validator(schema_base).validate(instance, schema)  # raises if invalid
Should this be made part of https://github.com/common-workflow-language/cwl-v1.2 directly?
And those definitions would refer to the official CWL JSON schema in https://github.com/common-workflow-language/cwl-v1.2.
You're so fast, thanks!
Yes, we would be happy to host OpenAPI schemas at https://w3id.org/cwl/v1.2 once the validation passes. I assume it is all hand-written? Maybe we can get an intern or junior dev to build an automated schema-salad to OpenAPI converter based upon your work, to keep it updated with future revisions of the CWL standards.
I'm about to start checking them.
It's generated from a subset of definitions from https://github.com/crim-ca/weaver/blob/master/weaver/wps_restapi/swagger_definitions.py + a few manual edits to patch some formatting. Many of the documented descriptions are copy-pasted from CWL docs with minor formatting edits. They could reuse a reference document.
…discriminator' + reduce some duplicated objects
Update as of a67e7ec
I've not gone through all the remaining failing cases, but I saw a few that did not look really complicated. For the rest, items not yet supported are:
Great progress, I'm impressed! I don't think your schema is "minimalistic" anymore, and that is a good thing!
👍
Correct, it is just a flag that there might be a nested workflow.
For all three of these: you probably should just accept them and not validate them too deeply.
…le or not in CWL Record type
…inds schema def for arguments/inputs
…g a local file path
08fb2c4
…c/label annotations for all tools, workflows and I/O
bc3063e
This should now be fully supported CWL if those issues get addressed! 🚀
@mr-c
Send a PR to https://github.com/common-workflow-language/cwl-v1.2/ ; thanks!
PR is now open: common-workflow-language/cwl-v1.2#256
Closing this in favour of common-workflow-language/cwl-v1.2#256
Add a "minimalistic" (not fully feature-complete, but with relatively high coverage of the most common definitions) CWL schema using a JSON schema representation.
The main schema definition of interest is DeployCWL, which represents one of the content combinations that should be POST'd to /processes to deploy using CWL in OGC API - Processes - Part 2: Deploy, Replace, Update (DRU).
We are also working on porting the EchoProcess to its CWL equivalent (https://github.com/crim-ca/weaver/blob/implement-example-process/weaver/processes/builtin/echo_process.cwl). This could be added as an example as well.
Relates to #319
to its CWL equivalent (https://github.com/crim-ca/weaver/blob/implement-example-process/weaver/processes/builtin/echo_process.cwl). This could be added as example as well.Relates to #319