Node Normalization takes a CURIE and returns:
- The preferred CURIE for this entity
- All other known equivalent identifiers for the entity
- Semantic types for the entity as defined by the Biolink Model
The data currently served by Node Normalization is created by the prototype project Babel, which attempts to find identifier equivalences and ensures that CURIE prefixes are Biolink Model compliant. The Node Normalization service itself, however, is independent of Babel, so results from improved identifier equivalence tools can be incorporated easily as they are developed.
To determine whether Node Normalization is likely to be useful, check /get_semantic_types, which lists the Biolink semantic types for which normalization has been attempted, and /get_curie_prefixes, which lists the number of times each prefix is used for a semantic type.
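As a minimal sketch, the two endpoints could be queried from Python as shown here. It assumes an instance is reachable at http://localhost:8000 (the local address used elsewhere in this document). The helper names `endpoint_url` and `fetch_json` are illustrative and not part of the service, and nothing is assumed about the structure of the JSON that comes back.

```python
import json
from urllib.parse import urljoin
from urllib.request import urlopen

# Assumption: a locally running instance; substitute your own deployment's address.
BASE_URL = "http://localhost:8000/"


def endpoint_url(path: str, base_url: str = BASE_URL) -> str:
    """Join the server root and an endpoint path into a full URL."""
    return urljoin(base_url.rstrip("/") + "/", path.lstrip("/"))


def fetch_json(path: str, base_url: str = BASE_URL, timeout: float = 30.0):
    """GET an endpoint and decode the JSON body it returns."""
    with urlopen(endpoint_url(path, base_url), timeout=timeout) as response:
        return json.load(response)


# Example usage (requires a running server):
#   print(fetch_json("/get_semantic_types"))
#   print(fetch_json("/get_curie_prefixes"))
```

The responses are returned as decoded JSON without interpretation, so the sketch keeps working if the response layout differs between deployments.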
For examples of service usage, see the example notebook.
The Node normalization website leverages the R3 (Redis-REST with referencing) Redis data design and configuration.
Users can find the publicly available website at service.
Create a virtual environment
python -m venv nodeNormalization-env
Activate the virtual environment
source nodeNormalization-env/bin/activate
Install requirements
pip install -r requirements.txt
The equivalence data can be generated by running Babel. An example of the contents of a compendia file is shown below:
{"id": {"identifier": "PUBCHEM:50986940"}, "equivalent_identifiers": [{"identifier": "PUBCHEM:50986940"}, {"identifier": "INCHIKEY:CYMOSKLLKPIPCD-UHFFFAOYSA-N"}], "type": ["chemical_substance", "named_thing", "biological_entity", "molecular_entity"]}
{"id": {"identifier": "CHEMBL.COMPOUND:CHEMBL1546789", "label": "CHEMBL1546789"}, "equivalent_identifiers": [{"identifier": "CHEMBL.COMPOUND:CHEMBL1546789", "label": "CHEMBL1546789"}, {"identifier": "PUBCHEM:4879549"}, {"identifier": "INCHIKEY:FUIYIXDZTPMQEH-UHFFFAOYSA-N"}], "type": ["chemical_substance", "named_thing", "biological_entity", "molecular_entity"]}
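To make the record layout concrete, here is a small sketch that reads compendia records like the ones above. It assumes one JSON object per line, as in the example, and the function names are hypothetical. It does not describe how the loader actually stores data in Redis.

```python
import json
from typing import Dict, Iterable, List, Tuple


def parse_compendium_line(line: str) -> Tuple[str, List[str], List[str]]:
    """Return (preferred CURIE, equivalent CURIEs, semantic types) for one record."""
    record = json.loads(line)
    preferred = record["id"]["identifier"]
    equivalents = [entry["identifier"] for entry in record["equivalent_identifiers"]]
    return preferred, equivalents, record["type"]


def build_lookup(lines: Iterable[str]) -> Dict[str, str]:
    """Map every equivalent identifier to the preferred CURIE of its record."""
    lookup: Dict[str, str] = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        preferred, equivalents, _types = parse_compendium_line(line)
        for curie in equivalents:
            lookup[curie] = preferred
    return lookup


# First example record from this document, split across lines for readability.
sample = (
    '{"id": {"identifier": "PUBCHEM:50986940"}, '
    '"equivalent_identifiers": [{"identifier": "PUBCHEM:50986940"}, '
    '{"identifier": "INCHIKEY:CYMOSKLLKPIPCD-UHFFFAOYSA-N"}], '
    '"type": ["chemical_substance", "named_thing", "biological_entity", "molecular_entity"]}'
)
lookup = build_lookup([sample])
print(lookup["INCHIKEY:CYMOSKLLKPIPCD-UHFFFAOYSA-N"])  # PUBCHEM:50986940
```

Note that the optional "label" field appears only on some identifiers, so the sketch reads only the "identifier" field, which every entry carries.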
A running instance of Redis is needed to house the node normalization data. A Redis Docker container image can be downloaded from Docker Hub. The Redis container can be started with the following docker command:
docker run --name node-norm-redis -p 6379:6379 -d redis redis-server --appendonly yes
Note that the dataset for Node Normalization is quite large: 256 GB of memory and disk space should be made available to the Redis instance to ensure that the complete compendia load properly.
Ensure that the ./config.json file is created and contains the parameters for the node normalization load specific to your environment.
The configuration parameters compendium_directory and data_files specify the location of the compendia files. An example of the file's contents is shown below:
{
"compendium_directory": "<path to files>",
"data_files": "anatomy.txt,BiologicalProcess.txt,cell.txt,cellular_component.txt,disease.txt,gene_compendium.txt,gene_family_compendium.txt,MolecularActivity.txt,pathways.txt,phenotypes.txt,taxon_compendium.txt",
"redis_host": "<Redis host server name>",
"redis_port": <Redis connection port>,
"redis_password": "<Redis password>",
"test_mode": 1,
"debug_messages": 0
}
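Before starting a long load, it can help to check the configuration file first. The sketch below reads ./config.json, verifies that the keys shown in the example are present, and expands data_files into full paths. Treating those five keys as required is an assumption made for illustration, and the function names are hypothetical. This is not the logic used by the loader itself.

```python
import json
import os
from typing import List

# Assumption: these keys, taken from the example above, are all required.
REQUIRED_KEYS = (
    "compendium_directory",
    "data_files",
    "redis_host",
    "redis_port",
    "redis_password",
)


def load_config(path: str = "./config.json") -> dict:
    """Read the JSON configuration and fail early if a key is missing."""
    with open(path, encoding="utf-8") as handle:
        config = json.load(handle)
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError(f"config is missing: {', '.join(missing)}")
    return config


def compendium_paths(config: dict) -> List[str]:
    """Split the comma-separated data_files value into full file paths."""
    names = [name.strip() for name in config["data_files"].split(",") if name.strip()]
    return [os.path.join(config["compendium_directory"], name) for name in names]


# Example usage:
#   for path in compendium_paths(load_config()):
#       print(path, os.path.exists(path))
```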
The load.py script reads the configuration file for load parameters and then loads the compendia data into the Redis instance.
It is possible to observe the progress of the load by opening a command line within the container and issuing Redis commands.
View the number of keys loaded so far.
redis-cli info keyspace
Once the database has completed loading it is recommended that the Redis database be persisted to disk.
redis-cli save
Monitor the database to determine if the save has completed.
redis-cli info persistence
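The same checks can be scripted. The sketch below parses the "field:value" text that redis-cli info prints, and extracts per-database key counts from Keyspace lines of the form db0:keys=42,expires=0,avg_ttl=0. The function names are hypothetical, and redis_info assumes that redis-cli is on the PATH and can reach the instance without extra connection options.

```python
import subprocess
from typing import Dict


def parse_info(text: str) -> Dict[str, str]:
    """Parse `redis-cli info` output ("field:value" lines) into a dict."""
    fields: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and section headers such as "# Keyspace"
        name, _, value = line.partition(":")
        fields[name] = value
    return fields


def key_counts(keyspace_text: str) -> Dict[str, int]:
    """Return the number of keys per database from the Keyspace section."""
    counts: Dict[str, int] = {}
    for name, value in parse_info(keyspace_text).items():
        # Each value looks like "keys=42,expires=0,avg_ttl=0".
        stats = dict(item.partition("=")[::2] for item in value.split(","))
        if "keys" in stats:
            counts[name] = int(stats["keys"])
    return counts


def redis_info(section: str) -> str:
    """Run `redis-cli info <section>` and return its text output."""
    result = subprocess.run(
        ["redis-cli", "info", section], capture_output=True, text=True, check=True
    )
    return result.stdout


# Example usage (requires a reachable Redis instance):
#   print(key_counts(redis_info("keyspace")))
#   print(parse_info(redis_info("persistence")))
```

In the persistence output, fields such as rdb_bgsave_in_progress and rdb_last_bgsave_status indicate whether a background save is still running and how the last one ended.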
The web server can be started after successful completion of the load.
cd <Node normalization code root>
pip install -r requirements.txt
uvicorn --host 0.0.0.0 --port 8000 --workers 1 node_normalizer.server:app
Then navigate to http://localhost:8000/docs to run the application
Much like the Redis Docker container noted above, a Docker container can also be created and executed to run the webserver.
cd <Node normalization code root>
docker build --tag <image_tag> .
Note the Dockerfile specifies port 6380 for the webservice container.
docker run --name Node-normalization -p 8000:6380 <image_tag>
Then navigate to: http://localhost:8000/docs to run the application
Kubernetes configurations and helm charts for this project can be found at:
https://github.com/helxplatform/translator-devops/helm/r3
NodeNorm can be configured by setting environment variables:
- SERVER_NAME: The name of this server (defaults to infores:sri-node-normalizer)
- SERVER_ROOT: The server root (defaults to /)
- LOG_LEVEL: The log level (defaults to ERROR)
- TRAPI_VERSION: The TRAPI version this version of NodeNorm supports.
- MATURITY_VALUE: The maturity of this NodeNorm instance (defaults to maturity, e.g. development)
- LOCATION_VALUE: Where this NodeNorm instance is set up (defaults to location, e.g. RENCI)
- EQ_BATCH_SIZE: The batch size used by get_eqids_and_types() (defaults to 2500)
- OTEL_ENABLED: Turn on OpenTelemetry (default: 'false'); only 'true' will turn this on.
- JAEGER_HOST and JAEGER_PORT: Hostname and port of the Jaeger instance to send telemetry to.
- JAEGER_SERVICE_NAME: The name of this service (defaults to the value of SERVER_NAME)
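The defaulting rules above can be illustrated with a short sketch. It is not NodeNorm's own code: the function name read_settings and the dictionary keys are hypothetical, and only the variables whose defaults are stated above are covered.

```python
import os
from typing import Mapping, Optional


def read_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Resolve settings from environment variables, applying the documented defaults."""
    env = os.environ if env is None else env
    server_name = env.get("SERVER_NAME", "infores:sri-node-normalizer")
    return {
        "server_name": server_name,
        "server_root": env.get("SERVER_ROOT", "/"),
        "log_level": env.get("LOG_LEVEL", "ERROR"),
        "eq_batch_size": int(env.get("EQ_BATCH_SIZE", "2500")),
        # Only the exact string 'true' enables OpenTelemetry.
        "otel_enabled": env.get("OTEL_ENABLED", "false") == "true",
        # Falls back to the resolved SERVER_NAME when not set explicitly.
        "jaeger_service_name": env.get("JAEGER_SERVICE_NAME", server_name),
    }


# Example usage with an explicit mapping instead of the real environment:
settings = read_settings({"SERVER_NAME": "my-nodenorm", "OTEL_ENABLED": "true"})
print(settings["jaeger_service_name"])  # my-nodenorm
```

Passing a mapping instead of reading os.environ directly keeps the function easy to test without changing the process environment.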