Gut microbiome study of multiple diseases (GM_common_diseases)
A population-scale meta-analysis of 36 gut microbiome studies reveals universal species signatures for common diseases
Numerous studies have implicated the gut microbiome in human diseases, but their results have been highly variable, and how to integrate and compare microbial signatures across different diseases remains a fundamental question. In this study, we profiled the gut microbiomes of 6,314 available fecal metagenomes from 36 case-control studies spanning 28 diseases or unhealthy statuses, using a unified reference-based pipeline. We found that gut microbial diversity is reduced in multiple diseases but increased in a few, and that most of the investigated diseases are associated with profound alterations in the overall microbial community. Cross-study meta-analysis identified 260 gut species signatures that may be prevalent across disease states. These signatures may be related to the enrichment of opportunistic pathogens in some common diseases and the depletion of beneficial microorganisms, such as short-chain fatty acid producers. We then built a random forest classifier based on the relative abundances of the differential species. This classifier achieved an area under the receiver operating characteristic curve (AUC) of 0.767 (95% confidence interval [CI], 0.755-0.779) and an overall accuracy of 70.1% (95% CI, 68.7%-71.9%) in classifying cases and controls across all investigated datasets. It also achieved an AUC of 0.785 (95% CI, 0.772-0.798) and an accuracy of 70.8% (95% CI, 69.7%-73.5%) in distinguishing patients with high-risk diseases from controls. The reliability of the classifier was further validated on fecal metagenomes from newly recruited independent cohorts. Our results help build a fuller picture of the associations between the gut microbiome and common diseases, which will guide further mechanistic, therapeutic, and disease-specific studies.
Contains the corresponding code and data.
Contains the code used for processing data.
Contains the MetaPhlAn4 abundance table for all samples, as well as the sample grouping file, sample.group. When using sample.group, you only need the columns c("Sample","Group","Project","Project_1"): Sample is the sample name, Group is the sample group (Control or Disease), Project is the project classification (author + year + disease type), and Project_1 is the project source (author + year).
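The R sketch below shows one way to load sample.group, keep only the four columns listed above, and align it with the abundance table. It is a minimal illustration under stated assumptions, not the repository's own code. It assumes that sample.group is a tab-delimited text file with a header row, and that the abundance table has been read into R with taxa as row names and sample names as column names (the usual layout of a merged MetaPhlAn table). The README does not confirm either of these. The helper names load_sample_groups and align_abundance are hypothetical, and the toy sample IDs and values are invented for demonstration.

```r
# Hypothetical helper: read sample.group and keep only the four columns
# the README says are needed. ASSUMPTION: tab-delimited file with a header.
load_sample_groups <- function(path = "sample.group") {
  groups <- read.delim(path, sep = "\t", header = TRUE,
                       stringsAsFactors = FALSE, check.names = FALSE)
  needed <- c("Sample", "Group", "Project", "Project_1")
  missing <- setdiff(needed, colnames(groups))
  if (length(missing) > 0) {
    stop("sample.group is missing columns: ", paste(missing, collapse = ", "))
  }
  groups[, needed]
}

# Hypothetical helper: keep samples present in both tables and put the
# abundance columns in the same order as the rows of the grouping table.
# ASSUMPTION: abund has taxa as row names and samples as column names.
align_abundance <- function(abund, groups) {
  shared <- intersect(groups$Sample, colnames(abund))
  list(abundance = abund[, shared, drop = FALSE],
       groups = groups[match(shared, groups$Sample), , drop = FALSE])
}

# Toy example with made-up sample IDs (illustration only, not study data).
tmp <- tempfile(fileext = ".txt")
write.table(data.frame(Sample = c("S1", "S2", "S3"),
                       Group = c("Control", "Disease", "Disease"),
                       Project = c("A2020_X", "A2020_X", "B2021_Y"),
                       Project_1 = c("A2020", "A2020", "B2021"),
                       Extra = 1:3),
            tmp, sep = "\t", quote = FALSE, row.names = FALSE)
groups <- load_sample_groups(tmp)

# Abundance columns are deliberately out of order to show the alignment.
abund <- matrix(c(10, 90, 50, 50, 0, 100), nrow = 2,
                dimnames = list(c("s__Taxon_a", "s__Taxon_b"),
                                c("S2", "S1", "S3")))
aligned <- align_abundance(abund, groups)

# Cross-tabulate cases and controls per project source.
print(table(aligned$groups$Group, aligned$groups$Project_1))
```

Ordering by the grouping table keeps the sample labels (Group) and the abundance columns paired position by position. Downstream steps such as per-project (Project_1) comparisons can then rely on that pairing without another join.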