potential metadata resources to improve data labelling #9

Open
taylorreiter opened this issue Mar 14, 2022 · 1 comment

Comments

@taylorreiter
Member

taylorreiter commented Mar 14, 2022

Curated

  • HumanMetagenomeDB
    • Description: HumanMetagenomeDB version 1.0 contains metadata of 69 822 metagenomes. We standardized 203 attributes, based on standardized ontologies, describing host characteristics (e.g. sex, age and body mass index), diagnosis information (e.g. cancer, Crohn's disease and Parkinson), location (e.g. country, longitude and latitude), sampling site (e.g. gut, lung and skin) and sequencing attributes (e.g. sequencing platform, average length and sequence quality). Further, HumanMetagenomeDB version 1.0 metagenomes encompass 58 countries, 9 main sample sites (i.e. body parts), 58 diagnoses and multiple ages, ranging from just born to 91 years old. The HumanMetagenomeDB is publicly available at https://webapp.ufz.de/hmgdb/.
    • paper: https://academic.oup.com/nar/article/49/D1/D743/5998395?login=true
    • doi: 10.1093/nar/gkaa1031
  • TerrestrialMetagenomeDB
  • Planet Microbe
    • https://www.planetmicrobe.org/
    • Description: Here, we present Planet Microbe, a web-based portal for the open sharing and discovery of historical and ongoing oceanographic sequencing efforts. Planet Microbe integrates historical oceanographic ‘omics datasets (Hawaii Ocean Time-series (HOT) [17–21], Bermuda Atlantic Time-series (BATS) [22], Global Ocean Sampling Expedition (GOS) [23], Amazon continuum dataset (ANACONDAS) [24],[25] and Center for Dark Energy Biosphere Investigations (C-DEBI) [26]) along with datasets from large-scale ocean expeditions such as the TARA Oceans [27] and Arctic Expeditions [28] and Ocean Sampling Day (OSD) [29]. In Planet Microbe, these ‘omics data have been reintegrated with their in-situ environmental contextual data, including biological and physicochemical measurements, and information about sampling events, and sampling stations. Finally, cruise tracks, protocols, and instrumentation are also linked to these datasets to provide users with a comprehensive view of the metadata.
    • paper: https://academic.oup.com/nar/article/49/D1/D792/5879428?login=true
    • doi: 10.1093/nar/gkaa637
  • AncientMetagenomeDir
    • https://zenodo.org/record/5547234#.YjSfprhlD0o
    • Description: Ancient DNA and RNA are valuable data sources for a wide range of disciplines. Within the field of ancient metagenomics, the number of published genetic datasets has risen dramatically in recent years, and tracking this data for reuse is particularly important for large-scale ecological and evolutionary studies of individual taxa and communities of both microbes and eukaryotes. AncientMetagenomeDir (archived at https://doi.org/10.5281/zenodo.3980833) is a collection of annotated metagenomic sample lists derived from published studies that provide basic, standardised metadata and accession numbers to allow rapid data retrieval from online repositories. These tables are community-curated and span multiple sub-disciplines to ensure adequate breadth and consensus in metadata definitions, as well as longevity of the database.
    • Paper: Community-curated and standardised metadata of published ancient metagenomic samples with AncientMetagenomeDir
    • doi: 10.1038/s41597-021-00816-y
  • EBI metagenomes/biomes

Learned

Metadata Retrieval

@taylorreiter
Member Author

taylorreiter commented Mar 14, 2022

looks like consortia-specific metadata might still be more info rich

e.g. I just went to HumanMetagenomeDB and downloaded the data. I filtered to samples labelled as having Crohn's disease. The information was incomplete, e.g. samples aren't labelled as having come from the same individual, and antibiotic information is not natively included [not that it even is in the iHMP metadata sheet... I had to go look at other samples, like serology, that were taken at the same time as the metagenome samples to figure out what antibiotic the patient was on when they had an mgx sample]

BUT, it looks like disease, study, sample type, etc. have a good amount of metadata. At the very least, I think any SRA id that is in HumanMetagenomeDB is probably from a human sample, so it might be a good and easy cross check
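
A minimal sketch of that cross check, assuming the HumanMetagenomeDB download is a tab-separated table with one accession column. The column name `sra_run_id`, the label strings, and the accessions below are all made up for illustration; the real export may name and format things differently. Note that the check only works in one direction: an accession that is in HumanMetagenomeDB is probably human, but an accession that is missing from it tells us nothing.

```python
import csv
import io


def load_run_ids(handle, column="sra_run_id", delimiter="\t"):
    """Collect the non-empty values of one accession column from a delimited table.

    The default column name is an assumption, not the real HumanMetagenomeDB header.
    """
    ids = set()
    for row in csv.DictReader(handle, delimiter=delimiter):
        value = (row.get(column) or "").strip()
        if value:
            ids.add(value)
    return ids


def cross_check(labels, human_run_ids):
    """Return accessions present in the curated human set whose label is not 'human'."""
    return sorted(
        acc
        for acc, label in labels.items()
        if acc in human_run_ids and label.lower() != "human"
    )


# Toy stand-in for the downloaded table (accessions are invented).
hmgdb_tsv = (
    "sra_run_id\tdiagnosis\n"
    "SRR0000001\tcrohn's disease\n"
    "SRR0000002\thealthy\n"
)
human_ids = load_run_ids(io.StringIO(hmgdb_tsv))

# Hypothetical labels assigned elsewhere in this project.
labels = {"SRR0000001": "human", "SRR0000002": "soil", "SRR0000003": "marine"}

# Accessions the curated database says are human but that we labelled otherwise.
conflicts = cross_check(labels, human_ids)
print(conflicts)
```

With a real download, `load_run_ids(open(path, newline=""))` would replace the in-memory table, with `column` set to whatever the file actually calls its run accession field.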
