add linkml validation to unit tests where nmdc-schema database records are made #270

Closed
aclum opened this issue Oct 10, 2024 · 3 comments

aclum commented Oct 10, 2024

See documentation on how to do this here

We discovered this week that even when the Python classes are used, unit tests can produce records that the runtime would not accept; see #267 for details. LinkML validation can check these records without adding runtime API dependencies to the unit tests.

Alternatives considered: use the json:validate endpoint to validate records.

@mbthornton-lbl

Validation against the NMDC schema is implemented in watch_nmdc.py:

import yaml
import linkml.validator
import importlib.resources
from functools import lru_cache
from linkml_runtime.dumpers import yaml_dumper  # used below to serialize the Database object

# import the materialized schema - schema version defined in pyproject.toml - and cache it
@lru_cache(maxsize=None)
def _get_nmdc_materialized():
    with importlib.resources.open_text("nmdc_schema", "nmdc_materialized_patterns.yaml") as f:
        return yaml.safe_load(f)

# Validation of the nmdc Database before posting to the API
# (excerpt from inside a loop over jobs, hence the `continue`):
job_dict = yaml.safe_load(yaml_dumper.dumps(job_database))
# validate the database object against the schema
validation_report = linkml.validator.validate(
    job_dict, self.nmdc_materialized, "Database"
)
if validation_report.results:
    logger.error(f"Validation error: {validation_report.results[0].message}")
    logger.error(f"job_dict: {job_dict}")
    continue
else:
    logger.info(f"Database object validated for job {job.opid}")

@mbthornton-lbl

Validation in run_import.py is basically the same:

# validate the database
logger.info("Validating imported data")
db_dict = yaml.safe_load(yaml_dumper.dumps(db))
validation_report = linkml.validator.validate(db_dict, nmdc_materialized)
if validation_report.results:
    logger.error("Validation Failed")
    # log every validation message, not only the first
    for result in validation_report.results:
        logger.error(result.message)
    raise Exception("Validation Failed")
else:
    logger.info("Validation Passed")

@mbthornton-lbl

The same basic logic is used in one unit test:

  • test_imports.test_gold_mapper_map_workflow_executions
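To reuse the check in the other unit tests that build Database records, the logic could be factored into a small assertion helper. The following is only a sketch, not code from the repository: `assert_database_valid` and `fake_validate` are hypothetical names, and the validator is passed in as a parameter so the example runs without linkml installed. It assumes only what the snippets above show, namely that the report has a `results` list whose items carry a `message`.

```python
from types import SimpleNamespace
from typing import Any, Callable


def assert_database_valid(
    db_dict: dict,
    schema: Any,
    validate: Callable[..., Any],
    target_class: str = "Database",
) -> None:
    """Raise AssertionError listing every validation message in the report."""
    report = validate(db_dict, schema, target_class)
    messages = [result.message for result in report.results]
    if messages:
        raise AssertionError("Schema validation failed:\n" + "\n".join(messages))


# Hypothetical stand-in validator so this sketch is self-contained.
# It only checks for an "id" key; it does not reflect the real schema rules.
def fake_validate(instance, schema, target_class):
    problems = []
    if "id" not in instance:
        problems.append(SimpleNamespace(message="'id' is a required property"))
    return SimpleNamespace(results=problems)


# Passes silently because the stand-in finds no problems.
assert_database_valid({"id": "nmdc:example"}, schema=None, validate=fake_validate)
```

In an actual test, `validate` would be `linkml.validator.validate` and `schema` the cached materialized schema, with `db_dict` produced by the same `yaml.safe_load(yaml_dumper.dumps(db))` round trip used above. Raising on the full list of messages keeps a failing test's output complete instead of reporting only the first problem.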
