IMPORTANT: This repository is no longer maintained. In addition to running a TAD_Pathways analysis for Bone Mineral Density GWAS, this analysis pipeline downloads genomic data and explores distributions across TADs. A more streamlined analysis pipeline that does not require data preprocessing is available at https://github.com/greenelab/tad_pathways_pipeline
NOTE: Several files built by this pipeline are used in the above repository, including the TAD-based gene index and the hg19-converted NHGRI-EBI GWAS Catalog.
Gregory P. Way and Casey S. Greene 2016
The repository contains methods for manipulating, observing, and visualizing topologically associating domains (TADs) in the context of SNPs, genes, and repeat elements for the human (hg19) and mouse (mm9) genomes.
The repository also proposes methods and tools for incorporating TADs into the prioritization of GWAS signals, using publicly available GWAS data. We introduce TAD_Pathways as a method to identify the likely causal genes from a GWAS independent of their distance to the sentinel SNP.
A preprint describing our method is available on bioRxiv.
For questions and bug reports, please file a GitHub issue.
For all other questions contact Casey Greene at [email protected] or Struan Grant at [email protected]
There are two ways to implement a TAD_Pathways analysis:
- Disease/Trait Specific - Uses GWAS identified SNPs
- Custom - Uses custom SNP list
The pipeline curates the GWAS Catalog and TAD boundaries to visualize TADs and generate TAD-based gene lists. It also performs a TAD_Pathways analysis for the Bone Mineral Density GWAS, reproducing the analysis and figures used in the paper.
# Using python dependencies
conda env create --quiet --force --file environment.yml
source activate tad_pathways
bash scripts/run_pipeline.sh
This will download data, perform analyses, and output several genomic figures. The command will also output TAD based genes for 299 different GWAS traits. Our TAD_Pathways method can be applied directly using these gene lists.
TAD_Pathways is customizable and allows a user to specify any SNP list of interest to test for TAD-based pathway associations. To perform a custom analysis, create a comma-separated file with one column per SNP group: the first row holds the group name, and each subsequent row holds one SNP (rsID) belonging to that group.
E.g.: custom_example.csv
Group 1 | Group 2 |
---|---|
rs12345 | rs67891 |
rs19876 | rs54321 |
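The table above shows the logical layout; the file itself is comma separated. As a minimal sketch, assuming the SNP groups are held in a Python dictionary (the `groups` variable and the `write_snp_csv` helper are illustrative names, not part of this repository), such a file could be written like this:

```python
import csv
from itertools import zip_longest

# Hypothetical SNP groups mirroring the example table; replace with real rsIDs
groups = {
    "Group 1": ["rs12345", "rs19876"],
    "Group 2": ["rs67891", "rs54321"],
}


def write_snp_csv(groups, path):
    """Write one column per group, with the group name in the header row."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(groups.keys())
        # Groups of unequal length are padded with empty cells (an assumption;
        # check that the downstream R script tolerates empty entries)
        for row in zip_longest(*groups.values(), fillvalue=""):
            writer.writerow(row)


write_snp_csv(groups, "custom_example.csv")
```

If the groups differ in length, it may be safer to confirm how `build_snp_list.R` treats blank cells before relying on the padding shown here.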
Then, perform the following steps:
# Extract locations for SNP list
Rscript --vanilla scripts/tad_util/build_snp_list.R \
--snp_file "custom_example.csv" \
--output_file "mapped_results.tsv"
# Build TAD based genelists for each group
python scripts/build_custom_TAD_genelist.py \
--snp_data_file "mapped_results.tsv" \
--output_file "custom_tad_genelist.tsv"
# The output file is then ready for the manual "TAD_Pathways" steps below
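To illustrate the idea behind a TAD-based gene list, the sketch below assigns each SNP to the TAD that contains it and collects every gene overlapping that TAD, regardless of distance to the SNP. It is a simplified illustration under stated assumptions, not the code in `build_custom_TAD_genelist.py`: the toy coordinates, gene names, and function names are hypothetical, TADs are assumed to be sorted, non-overlapping half-open intervals, and genes spanning a boundary are assigned by overlap.

```python
from bisect import bisect_right

# Hypothetical toy data; coordinates are illustrative, not real hg19 positions
tads = {"chr1": [(0, 1000), (1000, 2500), (2500, 4000)]}  # sorted [start, end)
genes = [
    ("GENE_A", "chr1", 200, 400),
    ("GENE_B", "chr1", 1200, 1800),
    ("GENE_C", "chr1", 2400, 2600),  # spans a TAD boundary
]
snps = {"rs_demo": ("chr1", 1500)}


def find_tad(chrom, pos):
    """Return the (start, end) TAD containing pos, or None if there is none."""
    bounds = tads.get(chrom, [])
    starts = [start for start, _ in bounds]
    i = bisect_right(starts, pos) - 1
    if i >= 0 and bounds[i][0] <= pos < bounds[i][1]:
        return bounds[i]
    return None


def genes_in_tad(chrom, tad):
    """Return names of genes on chrom that overlap the TAD interval."""
    start, end = tad
    return [
        name
        for name, g_chrom, g_start, g_end in genes
        if g_chrom == chrom and g_start < end and g_end > start
    ]


def tad_genes_for_snps(snps):
    """Collect all genes sharing a TAD with at least one SNP."""
    found = set()
    for chrom, pos in snps.values():
        tad = find_tad(chrom, pos)
        if tad is not None:
            found.update(genes_in_tad(chrom, tad))
    return sorted(found)


print(tad_genes_for_snps(snps))
```

Here the single SNP falls in the second TAD, so both the gene fully inside it and the gene straddling its right boundary are returned, while the gene in the first TAD is not.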
As a case study demonstrating the utility of a TAD-based approach, use the TAD-based gene list for Bone Mineral Density (1,297 genes) as input to a pathway analysis.
Run a WebGestalt pathway analysis on the gene list with the following parameters:
Parameter | Input |
---|---|
Select gene ID type | hsapiens__gene_symbol |
Enrichment Analysis | GO Analysis |
GO Slim Classification | Yes |
Reference Set | hsapiens__genome |
Statistical Method | Hypergeometric |
Multiple Test Adjustment | BH |
Significance Level | Top10 |
Minimum Number of Genes for a Category | 4 |
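For readers unfamiliar with the statistical settings in the table, the sketch below shows what a hypergeometric enrichment test and a Benjamini-Hochberg (BH) adjustment compute. It is a generic textbook illustration with hypothetical function names, not WebGestalt's implementation, whose details may differ.

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """P(X >= k): chance of seeing at least k category genes in a list of n,
    when K of the N reference genes belong to the category."""
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Toy example: 5 of 5 listed genes fall in a 5-gene category out of 10 genes
p = hypergeom_pvalue(k=5, n=5, K=5, N=10)
print(p, benjamini_hochberg([p, 0.04, 0.03]))
```

In this reading, each GO category tested yields one hypergeometric p-value, and the BH step adjusts the whole set of p-values to account for testing many categories at once.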
Note - The output of scripts/run_pipeline.sh in data/TAD_based_genes/ for all traits is ready for TAD Pathway Analysis.
After performing the WebGestalt analysis, click Export TSV Only and save the file in data/gestalt/<TRAIT>_gestalt.tsv where <TRAIT> is "BMD" for the example.
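As a small sketch of this naming convention, the helpers below build the expected path for a trait and read the exported file as tab-separated rows. The function names are hypothetical, and no particular column names are assumed because the layout of the WebGestalt export is not described here.

```python
import csv
from pathlib import Path


def gestalt_path(trait, base_dir="data/gestalt"):
    """Build the expected location of a WebGestalt export for a trait."""
    return Path(base_dir) / f"{trait}_gestalt.tsv"


def read_gestalt(trait, base_dir="data/gestalt"):
    """Read the export as a list of rows; columns are left uninterpreted."""
    path = gestalt_path(trait, base_dir)
    if not path.exists():
        raise FileNotFoundError(f"Expected WebGestalt export at {path}")
    with open(path, newline="") as handle:
        return list(csv.reader(handle, delimiter="\t"))


print(gestalt_path("BMD"))
```

Checking for the file before later steps run gives a clearer error than a failure deep inside the analysis if the export was saved under a different name.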
- GWAS Catalog (2016-02-25)
- eQTL (2016-05-09) eQTL Browser
- Richards et al. 2008 Lancet
- Rivadeneira et al. 2009 Nature Genetics
- Estrada et al. 2012 Nature Genetics
- Styrkarsdottir et al. 2013 Nature
- Analysis ID (All)
- Association Test Significance Filters (p-value 1 x 10^-1)
- Phenotype Traits (Bone Mineral Density)
All analyses were performed in the Anaconda python distribution (3.5.1). For specific package versions please refer to environment.yml. R version 3.3.0 was used for visualization. For more specific environment dependencies refer to our accompanying docker file at docker/Dockerfile