This repository contains digital object definitions and a data processor (do-processor) to process conforming digital objects. The primary use case for this processor is to generate the Human Reference Atlas Knowledge Graph (hra-kg).
All that's required for execution using our Docker-based setup is Docker, with the docker command available on the command line. See the Docker website for details on installation based on your platform.
If you want to use a pre-built container, run:
docker pull ghcr.io/hubmapconsortium/hra-do-processor:main
If you are not using a pre-built container, build one locally from the root of this repository:
docker build . -t hra-do-processor
To run using a pre-built container, use the following command:
docker run --mount type=bind,source=./digital-objects,target=/digital-objects --mount type=bind,source=./dist,target=/dist -t ghcr.io/hubmapconsortium/hra-do-processor:main help
If you are using a locally built container, replace ghcr.io/hubmapconsortium/hra-do-processor:main with hra-do-processor:latest.
Replace ./digital-objects and ./dist with the paths to your own digital-objects and dist folders if they differ.
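The substitutions described above can be sketched as a small shell wrapper. This is only an illustration: the function name do_processor_cmd and the variables DO_DIR, DIST_DIR and IMAGE are hypothetical and not part of this project. The function prints the docker command instead of running it, so it can be inspected first.

```shell
# Sketch: print the `docker run` command line for the do-processor image.
# DO_DIR, DIST_DIR and IMAGE are hypothetical names chosen for this example;
# when unset, they fall back to the values used in the command shown above.
do_processor_cmd() {
  echo docker run \
    --mount "type=bind,source=${DO_DIR:-./digital-objects},target=/digital-objects" \
    --mount "type=bind,source=${DIST_DIR:-./dist},target=/dist" \
    -t "${IMAGE:-ghcr.io/hubmapconsortium/hra-do-processor:main}" "$@"
}

# Print the default command (pre-built image, default folders).
do_processor_cmd help

# Print the command for a locally built image and custom folders.
DO_DIR=/data/digital-objects DIST_DIR=/data/dist IMAGE=hra-do-processor:latest \
  do_processor_cmd help
```

To execute rather than print, the output can be piped to sh once it looks right.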
You can also use the docker-compose.yaml file as an example if you want to run it with Docker Compose:
docker compose run do-processor help
You will need to install Python 3, Docker (or another container system supported by CWL), and a CWL runner. The default CWL runner can be installed with Python 3's pip module like so:
python3 -m pip install cwltool cwl-runner
For running with a pre-built container (no git checkout required), use this command:
cwl-runner https://raw.githubusercontent.com/hubmapconsortium/hra-do-processor/main/do-processor.cwl example-cwl-job.yaml
For running with a local container, use this command:
cwl-runner do-processor.local.cwl example-cwl-job.yaml
- Python 3.x
- Node.js 16+ (if not using the virtualenv, which has it installed)
- Java 11
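Before setting up a local environment, the requirements listed above can be checked with a short shell sketch. The function name missing_tools is hypothetical, and the executable names python3, node and java are assumptions about how these tools are installed on your system. The check only tests that each tool is on the PATH; it does not verify versions such as Node.js 16+ or Java 11.

```shell
# Sketch: report which required tools are not available on the PATH.
# Prints each missing tool name on its own line; prints nothing if all are found.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Executable names are assumptions based on the requirements list.
# node can be dropped from the list if you use the virtualenv, which has it installed.
missing="$(missing_tools python3 node java)"
if [ -n "$missing" ]; then
  echo "Missing prerequisites:" $missing
else
  echo "All prerequisites found."
fi
```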
First create a virtual environment by running:
./scripts/setup-environment.sh
After finishing, you can enter the virtual environment by running:
source .venv/bin/activate
While in the virtual environment (or if installed globally), use the do-processor command.
$ do-processor --help
Usage: do-processor [options] [command]

Digital Object Processing Command-Line Interface

Options:
  -V, --version               output the version number
  --base-iri <string>         Base IRI for Digital Objects
  --do-home <string>          Digital Objects home directory
  --processor-home <string>   DO Processor home
  --deployment-home <string>  DO deployment home
  --skip-validation           Skip validation in each command (default: false)
  --exclude-bad-values        Do not pass invalid values from data processors (default: false)
  --remove-individuals        Remove OWL individuals (data instances) from the graph (default: false)
  -h, --help                  display help for command

Commands:
  normalize <digital-object-path>        Mechanically normalizes a Digital Object from it's raw form. Minimally, it converts the source DO type +
                                         integrates the metadata into a single linkml-compatible JSON file.
  enrich <digital-object-path>           Enriches a Normalized Digital Object, optionally pulling in data from other sources like Uberon, CL, Ubergraph,
                                         or other external resources. Minimally, it converts the Normalized JSON file into RDF. Optionally enriches data
                                         from the original form (ie add Metadata to nodes in the SVG or GLB files).
  build [options] <digital-object-path>  Given a Digital Object, checks for and runs normalization, enrichment, and packaging in one command.
  deploy <digital-object-path>           Deploys a given Digital Object to the deployment home (default ./site)
  finalize [options]                     Finalize the deployment home before sending to the live server
  list                                   Lists all digital objects in the DO_HOME directory
  help [command]                         display help for command
Each subcommand typically takes a path to the digital object and processes it according to the command's requirements. The digital object path usually looks like ${DO_TYPE}/${DO_NAME}/${DO_VERSION}. For example, the ASCT+B Table for Kidney, v1.2 looks like this: asct-b/kidney/v1.2.
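As a small illustration of the path convention, the following shell sketch composes a digital object path from its three parts and splits one back apart. The function names do_path and do_path_parts are hypothetical helpers, not commands provided by do-processor, and the sketch assumes the path has exactly three slash-separated segments.

```shell
# Sketch: compose a digital object path from type, name and version.
do_path() {
  echo "$1/$2/$3"
}

# Sketch: split a digital object path into its parts, separated by spaces.
# Assumes exactly three segments and no spaces inside any segment.
do_path_parts() {
  echo "$1" | tr '/' ' '
}

do_path asct-b kidney v1.2        # prints: asct-b/kidney/v1.2
do_path_parts asct-b/kidney/v1.2  # prints: asct-b kidney v1.2
```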
normalize - Mechanically normalizes a Digital Object from its raw form. Minimally, it converts the source DO type and integrates the metadata into a single linkml-compatible JSON file.
enrich - Enriches a Normalized Digital Object, optionally pulling in data from other sources such as Uberon, CL, Ubergraph, or any other external resource. Minimally, it converts the Normalized JSON file into RDF. Optionally enriches data from the original form (i.e., adds metadata to nodes in the SVG or GLB files).
build - Given a Digital Object, checks for and runs normalization, enrichment, and packaging in one command.
deploy - Deploys a given Digital Object to the deployment home (default ./site)
finalize - Finalizes the deployment home before sending to the live server
list - Lists digital object information in the DO_HOME directory
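The commands above can be chained for a single digital object. The sketch below assumes that a typical sequence is build, then deploy, then finalize; that order is inferred from the command descriptions and is not stated by the project. The function name process_digital_object and the DO_PROCESSOR variable are hypothetical; DO_PROCESSOR exists only so the sequence can be dry-run by printing the commands.

```shell
# Sketch: run build, deploy and finalize in sequence for one digital object.
# The order is an assumption based on the command descriptions above.
# DO_PROCESSOR is a hypothetical override; it defaults to the do-processor command.
process_digital_object() {
  ${DO_PROCESSOR:-do-processor} build "$1" &&
    ${DO_PROCESSOR:-do-processor} deploy "$1" &&
    ${DO_PROCESSOR:-do-processor} finalize
}

# Dry run: print the commands instead of executing them.
DO_PROCESSOR="echo do-processor" process_digital_object asct-b/kidney/v1.2
```

Because the steps are joined with &&, a failing step stops the sequence before anything is deployed or finalized.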