Home
Motivation
A significant promise of electronic health records (EHRs) lies in the ability to perform large-scale investigations of the mechanistic drivers of complex diseases. Despite significant progress in biomarker discovery, this promise remains largely aspirational because EHR data are disconnected from biomedical knowledge (PMID:32335224, PMID:30304648). Linking molecular data to the clinical data stored in EHRs will support biologically meaningful analysis of those data, and can be achieved by integrating knowledge about biology and pathophysiology from multiple ontologies. Like clinical terminologies, computational ontologies are classification systems that provide detailed representations of a specific domain of knowledge, consisting of a set of concepts and logically defined relationships. Unlike most clinical terminologies, ontologies are computable and interoperable: they can be logically verified using description logics and easily integrated with other ontologies and with non-ontological data, including data from basic science and clinical research (PMID:30304648).
Normalizing (i.e. mapping or annotating) clinical data to ontologies, like those in the Open Biological and Biomedical Ontology (OBO) Foundry, has been recognized as a fundamental need for the future of deep phenotyping (PMID:32335224). Existing work has largely focused on using ontologies to improve phenotyping in specific diseases (e.g. infectious diseases, PMID:31160594; rare diseases, PMID:31231902) and to enhance specific biological and clinical domains (e.g. laboratory tests, PMID:31119199; diagnoses, PMID:29295235).
Prior work has been largely limited to one-to-one mappings (e.g. mapping a single clinical term to a single ontology concept) and rarely includes external validation. Unfortunately, learning algorithms are not yet able to capture the complex clinical and biological semantics underlying these concepts and their relationships. Until a comprehensive, robust resource that includes mappings between multiple clinical domains and biomedical ontologies is created and validated, it will not be possible to automatically generate inferences between patient-level clinical observations and biological knowledge.
Objective
We have developed OMOP2OBO, the first health system-wide integration and alignment between the Observational Health Data Sciences and Informatics' Observational Medical Outcomes Partnership (OMOP) standardized clinical terminologies and eight OBO biomedical ontologies spanning diseases, phenotypes, anatomical entities, cell types, organisms, chemicals, metabolites, hormones, vaccines, and proteins.
To ensure that the mappings are both clinically and biologically meaningful, we have performed extensive experiments to verify the accuracy, generalizability, and logical consistency of each released mapping set.
Through this repository we provide the following:
- Open source (coming soon) omop2obo mappings that can be used out of the box (no coding required) for 92,367 OMOP Conditions, 8,615 Drug Exposure ingredients, and 3,827 Measurements (10,673 measurement test results). These mappings are available in several formats, including .txt, .xlsx, and .dump.
- A semantic representation, or ontologized version, of the mappings, integrated with the OBO biomedical ontologies and available as an edge list (.txt) and as an .owl file. See the current release for more details.
- An algorithm and mapping pipeline that enables users to construct their own set of omop2obo mappings. The figure below provides a high-level overview of the algorithm workflow. The code provided in this repository facilitates all of the automatic steps shown in this figure except for the manual mapping (for now; we are currently working on a deep learning model to address this).
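As a rough illustration of how an edge-list release might be consumed, the sketch below reads tab-delimited rows into a dictionary keyed by clinical concept and then picks out the concepts that map to more than one ontology term. The column names (subject, predicate, object), the predicate label, and every identifier in the sample are hypothetical placeholders; the actual layout of the released .txt file may differ, so check the release before adapting this.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample data: the column names and identifiers below are
# placeholders, not values taken from an actual omop2obo release.
SAMPLE_EDGE_LIST = (
    "subject\tpredicate\tobject\n"
    "OMOP:EXAMPLE_1\tmapped_to\tONT:EXAMPLE_A\n"
    "OMOP:EXAMPLE_1\tmapped_to\tONT:EXAMPLE_B\n"
    "OMOP:EXAMPLE_2\tmapped_to\tONT:EXAMPLE_C\n"
)


def load_edge_list(handle):
    """Group (predicate, object) pairs by subject from a tab-delimited file."""
    reader = csv.DictReader(handle, delimiter="\t")
    graph = defaultdict(list)
    for row in reader:
        graph[row["subject"]].append((row["predicate"], row["object"]))
    return dict(graph)


# Parse the in-memory sample; a real file would be opened with open(path).
mappings = load_edge_list(io.StringIO(SAMPLE_EDGE_LIST))

# Concepts with more than one target are one-to-many mappings.
one_to_many = {s: targets for s, targets in mappings.items() if len(targets) > 1}

for subject, targets in sorted(mappings.items()):
    print(subject, "->", [obj for _, obj in targets])
```

Keeping the mappings keyed by clinical concept makes it easy to separate one-to-one from one-to-many mappings, which matters here because prior work has mostly been limited to the former.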
If you would like to explore or use the mappings please see our dashboard, which includes mapping statistics, interactive plots and tables, and access to links to download the latest release.
- We submitted a podium abstract to the 2021 Virtual Joint Summits of the American Medical Informatics Association
- We presented preliminary results from OMOP2OBO at the 2020 OHDSI Symposium
- We presented the results of our validation for mapping LOINC lab results to the Human Phenotype Ontology group.
- Vasilevsky N, Zhang A, Yates A et al. LOINC2HPO: Curation of Phenotype Data from the Electronic Health Records using the Human Phenotype Ontology [version 1; not peer reviewed]. F1000Research 2019, 8:383 (slides) (doi: 10.7490/f1000research.1116524.1)
- Vasilevsky N, Zhang A, Gourdine J et al. LOINC2HPO: Curation of Phenotype Data from the Electronic Health Records using the Human Phenotype Ontology [version 1; not peer reviewed]. F1000Research 2019, 8:382 (poster) (doi: 10.7490/f1000research.1116517.1)
We are always looking for ways to make this resource useful for the community. If you have ideas or suggestions, we’d love to hear from you! To get in touch with us, please create an issue or send us an email 💌