
Translate Bioconductor resources into AI/ML-ready format + Manual curation/harmonization of the metadata for improved usability


waldronlab/OmicsMLRepoData


OmicsMLRepo project

Clinical and epidemiological data tend to explain most of the variation in health-related traits, and modeling them jointly with Omics data is crucial for increasing an algorithm’s predictive ability. However, the nature of non-Omics data, including heterogeneity, lack of standardization, high complexity, and loose links to Omics data types, makes it hard to use both Omics and non-Omics data in ML analyses.

The OmicsMLRepo project aims to build the first large-scale, platform-independent, curated, ML-ready data repository for diverse Omics and associated non-Omics data, starting from two Bioconductor data packages: curatedMetagenomicData, which contains human microbiome data, and cBioPortalData, which covers cancer genomics data.

This repository, OmicsMLRepoData, documents the harmonization/curation processes and the artifacts generated throughout. We are also developing a software package, OmicsMLRepoR, that lets users leverage ontologies in metadata search.

In summary, the OmicsMLRepo project simplifies the process of cross-study, multi-faceted data analyses through metadata harmonization and standardization, making Omics data more AI/ML-ready.


Harmonized metadata

You can access the harmonized version of the metadata using the OmicsMLRepoR::getMetadata function:

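# Install OmicsMLRepoR from GitHub (devtools is needed only for installation)
if (!requireNamespace("devtools", quietly = TRUE))
    install.packages("devtools")
devtools::install_github("shbrief/OmicsMLRepoR")

library(OmicsMLRepoR)
# Harmonized metadata for curatedMetagenomicData
cmd <- getMetadata("cMD")
# Harmonized metadata for cBioPortalData
cbio <- getMetadata("cBioPortal")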
