Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add initial CWL JSON schema definition #329

Closed
wants to merge 43 commits into from

Conversation

fmigneault
Copy link
Contributor

@fmigneault fmigneault commented May 31, 2023

Add a "minimalistic" (as not fully feature complete, but relatively high coverage of most common definitions) CWL schema using JSON schema representation.

The main schema definition of interest is DeployCWL which represents one of the content combintion that should be POST'd to /processes to deploy using CWL in OGC API - Processes - Part 2: Deploy, Replace, Update (DRU).

We are also working on porting the EchoProcess to its CWL equivalent (https://github.com/crim-ca/weaver/blob/implement-example-process/weaver/processes/builtin/echo_process.cwl). This could be added as example as well.

Relates to #319

@fmigneault fmigneault mentioned this pull request May 31, 2023
description: |
Hint indicating that the Application Package corresponds to an
OGC API - Processes provider that should be remotely executed and monitored
by this instance. (note: can only be an 'hint' as it is unofficial CWL specification).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the process is not otherwise executable without making use of the OGCAPIRequirement then it should be under requirements not hints. Non-standard extensions are allowed in both fields

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree.
I think it is implementation dependent whether this would be required or optional on the target platform where the process is being deployed. However, I would like to make this definition proposal somewhat established such that different implementators could be guided toward a somewhat interoperable reference.

Copy link
Contributor

@mr-c mr-c left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @fmigneault ; this looks like a lot of work!

A bonus would be to try to load each of the CWL v1.2 conformance tests (except those with should_fail: true) with this schema to ensure that there are no obvious things missing. Either as a CI process or a one-time test.

https://github.com/common-workflow-language/cwl-v1.2/blob/1.2.1_proposed/conformance_tests.yaml

@fmigneault
Copy link
Contributor Author

fmigneault commented Jun 1, 2023

@mr-c

Coverage so far:

================ 188 failed, 185 passed, 373 warnings in 28.33s ================
import logging
import os
import posixpath
from functools import cache
from typing import Dict, List, Optional, TypeAlias, TypedDict, TypeGuard, Union
from typing_extensions import NotRequired
from urllib.parse import urljoin, urlparse

import jsonschema
import requests
import pytest
import yaml
from jsonschema.exceptions import ValidationError
from yaml.scanner import ScannerError

CONFORMANCE_TESTS_FILE = "https://raw.githubusercontent.com/common-workflow-language/cwl-v1.2/1.2.1_proposed/conformance_tests.yaml"
CWL_JSON_SCHEMA_FILE = "https://raw.githubusercontent.com/crim-ca/ogcapi-processes/cwl-schema/openapi/schemas/processes-dru/cwl.yaml#$definitions/CWL"

LOGGER = logging.getLogger(__name__)

# pylint: disable=C0103,invalid-name
Number = Union[int, float]
ValueType = Union[str, Number, bool]
AnyValueType = Optional[ValueType]
# add more levels of explicit definitions than necessary to simulate JSON recursive structure better than 'Any'
# amount of repeated equivalent definition makes typing analysis 'work well enough' for most use cases
_JSON: TypeAlias = "JSON"
_JsonObjectItemAlias: TypeAlias = "_JsonObjectItem"
_JsonListItemAlias: TypeAlias = "_JsonListItem"
_JsonObjectItem = Dict[str, Union[AnyValueType, _JSON, _JsonObjectItemAlias, _JsonListItemAlias]]
_JsonListItem = List[Union[AnyValueType, _JSON, _JsonObjectItem, _JsonListItemAlias]]
_JsonItem = Union[AnyValueType, _JSON, _JsonObjectItem, _JsonListItem]
JSON = Union[Dict[str, Union[_JSON, _JsonItem]], List[Union[_JSON, _JsonItem]], AnyValueType]

ConformanceTestDef = TypedDict(
    "ConformanceTestDef",
    {
        "id": str,
        "doc": str,
        "tags": List[str],
        "tool": str,
        "job": NotRequired[str],  # not used, for running the actual CWL
        "output": JSON,  # not used, output of CWL execution
        "should_fail": NotRequired[bool],  # indicates failure as "execute failing", but still a valid CWL
    }
)


@cache
def is_remote_file(file_location: str) -> TypeGuard[str]:
    """
    Parses to file location to figure out if it is remotely available or a local path.
    """
    cwl_file_path_or_url = file_location.replace("file://", "")
    scheme = urlparse(cwl_file_path_or_url).scheme
    return scheme != "" and not posixpath.ismount(f"{scheme}:")  # windows partition


@cache
def load_file(file_path: str, text: bool = False) -> Union[JSON, str]:
    """
    Load :term:`JSON` or :term:`YAML` file contents from local path or remote URL.

    If URL, get the content and validate it by loading, otherwise load file directly.

    :param file_path: Local path or URL endpoint where file to load is located.
    :param text: load contents as plain text rather than parsing it from :term:`JSON`/:term:`YAML`.
    :returns: loaded contents either parsed and converted to Python objects or as plain text.
    :raises ValueError: if YAML or JSON cannot be parsed or loaded from location.
    """
    try:
        if is_remote_file(file_path):
            headers = {"Accept": "text/plain"}
            resp = requests.get(file_path, headers=headers)
            if resp.status_code != 200:
                raise ValueError("Loading error: [%s]", file_path)
            return resp.content if text else yaml.safe_load(resp.content)
        with open(file_path, mode="r", encoding="utf-8") as f:
            return f.read() if text else yaml.safe_load(f)
    except OSError as exc:
        LOGGER.debug("Loading error: %s", exc, exc_info=exc)
        raise
    except ScannerError as exc:  # pragma: no cover
        LOGGER.debug("Parsing error: %s", exc, exc_info=exc)
        raise ValueError("Failed parsing file content as JSON or YAML.")


def load_conformance_tests(test_file: str) -> List[ConformanceTestDef]:
    conformance_tests = load_file(test_file)
    assert isinstance(conformance_tests, list)
    assert all(isinstance(conf_test, dict) for conf_test in conformance_tests)
    test_base = os.path.dirname(test_file)
    all_conf_test = []
    for conf_test in conformance_tests:
        if "$import" in conf_test:
            conf_path = urljoin(f"{test_base}/", conf_test["$import"])
            conf_test = load_conformance_tests(conf_path)
            all_conf_test.extend(conf_test)
            continue
        for ref in ["job", "tool"]:
            if ref not in conf_test:
                continue
            if not urlparse(conf_test[ref]).scheme:
                conf_test[ref] = urljoin(f"{test_base}/", conf_test[ref])
        all_conf_test.append(conf_test)
    return all_conf_test


@pytest.mark.parametrize(
    ["schema_file", "conformance_test"],
    [
        (CWL_JSON_SCHEMA_FILE, conf_test)
        for conf_test in load_conformance_tests(CONFORMANCE_TESTS_FILE)
    ]
)
def test_conformance(schema_file: str, conformance_test: ConformanceTestDef) -> None:
    instance_file = conformance_test["tool"]
    instance = load_file(instance_file)
    schema_path = []
    schema_ref = ""
    if "#" in schema_file:
        schema_file, schema_ref = schema_file.split("#", 1)
        schema_path = [ref for ref in schema_ref.split("/") if ref]
        schema_ref = f"#{schema_ref}"
    schema_base = schema = load_file(schema_file)
    if schema_path:
        for part in schema_path:
            schema = schema[part]
    
    # ensure local schema can find relative $ref, since the provided reference can be a sub-schema (with "#/...")
    scheme_uri = f"file://{schema_file}" if schema_file.startswith("/") else schema_file
    validator = jsonschema.validators.validator_for(schema_base)
    validator.resolver = jsonschema.RefResolver(base_uri=scheme_uri, referrer=schema_base)
    validator(schema_base).validate(instance, schema)  # raises if invalid

Should this be made part of https://github.com/common-workflow-language/cwl-v1.2 directly?
Only items that would be defined here would be:

  • DeployCWL,
  • DeployOGCAppPackageCWL
  • CWLExecutionUnit
  • some form of extended requirements/hints for OGCAPIRequirement/WPS1Requirement/BuiltinRequirement (and their map variantion)

And those definitions would refer to the official CWL JSON schema in https://github.com/common-workflow-language/cwl-v1.2.

@mr-c
Copy link
Contributor

mr-c commented Jun 1, 2023

@mr-c

Coverage so far:

================ 188 failed, 185 passed, 373 warnings in 28.33s ================

You're so fast, thanks!
Are you working through the failures, or should I take a stab?

Should this be made part of https://github.com/common-workflow-language/cwl-v1.2 directly? Only items that would be defined here would be:

* `DeployCWL`,

* `DeployOGCAppPackageCWL`

* `CWLExecutionUnit`

* some form of extended `requirements`/`hints` for `OGCAPIRequirement`/`WPS1Requirement`/`BuiltinRequirement` (and their map variantion)

And those definitions would refer to the official CWL JSON schema in https://github.com/common-workflow-language/cwl-v1.2.

Yes, we would be happy to host OpenAPI schemas at https://w3id.org/cwl/v1.2 once the validation passes. I assume it is all hand-written? Maybe we can get an intern or junior dev to build an automated schema-salad to OpenAPI converter based upon your work; to keep it updated with future revisions of the CWL standards.

@fmigneault
Copy link
Contributor Author

fmigneault commented Jun 1, 2023

Are you working through the failures, or should I take a stab?

I'm about to start checking them.

I assume it is all hand-written?

It's generated from a subset of definitions from https://github.com/crim-ca/weaver/blob/master/weaver/wps_restapi/swagger_definitions.py + a few manual edits to patch some formatting. Many of the documented descriptions are copy-pasted from CWL docs with minor formatting edits. They could reuse a reference document.

@fmigneault
Copy link
Contributor Author

fmigneault commented Jun 2, 2023

Update as of a67e7ec
=========== 88 failed, 281 passed, 4 xfailed, 371 warnings in 57.31s ===========

I've not gone through all the remaining failing cases, but I saw a few that did not look really complicated.
Probably just missing a field here and there.

For the rest, items not yet supported are:

  • SchemaDefRequirement (xfailed cases)
  • SubWorkflowRequirement (any other technicality to validate than just nesting a Workflow as a step?)
  • Records inputs/outputs format (I've never used those, I will need to investigate further what they imply)

openapi/schemas/processes-dru/cwl.yaml Outdated Show resolved Hide resolved
openapi/schemas/processes-dru/cwl.yaml Outdated Show resolved Hide resolved
@mr-c
Copy link
Contributor

mr-c commented Jun 2, 2023

Update as of a67e7ec =========== 88 failed, 281 passed, 4 xfailed, 371 warnings in 57.31s ===========

Great progress, I'm impressed! I don't think your schema is "minimalistic" anymore, and that is a good thing!

I've not gone through all the remaining failing cases, but I saw a few that did not look really complicated. Probably just missing a field here and there.

👍

For the rest, items not yet supported are:

* `SchemaDefRequirement` (xfailed cases)

* `SubWorkflowRequirement` (any other technicality to validate than just nesting a `Workflow` as a step?)

Correct, it is just a flag that there might be a nested workflow.

* `Records` inputs/outputs format (I've never used those, I will need to investigate further what they imply)

For all three of these: you probably should just accept them and not validate them too deeply.

@fmigneault
Copy link
Contributor Author

@fmigneault
Copy link
Contributor Author

bc3063e
================= 5 failed, 369 passed, 374 warnings in 54.60s =================
Only remaining items are caused by:

This should now be fully suported CWL if those issues get addressed! 🚀

@fmigneault
Copy link
Contributor Author

@mr-c
Where should I migrate the CWL schema definition (repo and directory) and the python test function?

@mr-c
Copy link
Contributor

mr-c commented Jun 7, 2023

@mr-c Where should I migrate the CWL schema definition (repo and directory) and the python test function?

Send a PR to https://github.com/common-workflow-language/cwl-v1.2/ ; thanks!

@fmigneault
Copy link
Contributor Author

PR is now open: common-workflow-language/cwl-v1.2#256
Until merged, the actual file can be found here: https://github.com/fmigneault/cwl-v1.2/blob/cwl-json-schema/json-schema/cwl.yaml
The local CWL schema for OGC API - Processes is updated starting from 6844490 to use the (expected) remote reference.

@fmigneault
Copy link
Contributor Author

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants