BioHack_Haplotype

Team members: Hongsheng Lai, Shijie Tang, Xirui Liu, Zhiwen Bian, Hairuo Wang

Advisor: Ben Busby

Project Description GitHub

Google document: https://docs.google.com/document/d/1khYK22Mcprc-Kxlw6h2K9sU7ytjht80P/edit

GitHub: https://github.com/collaborativebioinformatics/Haploblock_Clusters?tab=readme-ov-file

Project GitHub

https://github.com/ShijieTang/BioHack_Haplotype

Background

Over recent decades, methods such as linkage mapping and genome-wide association studies (GWAS) have been developed to identify genetic loci associated with traits. GWAS has become a major research endeavor, with millions of participants and the discovery of hundreds of thousands of trait-associated genetic variants. However, GWAS struggles to account for the evolutionary history and interaction between genetic variants. To address this, we are using ancestral recombination graphs (ARGs), which capture shared ancestry through local trees, to improve quantitative trait locus (QTL) mapping. This approach leverages the genealogical structures within genomes to identify genetic variants related to specific traits, especially in cases of allelic heterogeneity or in understudied populations.

Goal

Teams will select two sections (haploblocks) of chromosome 6 – one including HLA, and one not, as large as possible and not crossed by known recombination hotspots (~50-100 kbp). Teams will compete to generate with the most computationally efficient approach to create a similarity matrix of these two haploblocks for 300 individuals or some representative subpopulation. Ideally, this approach will take into account both copies of each haploblock. One example approach may be to take phased data from beagle or shapeit(x) and then feed it into an ancestral calculator such as relate or beagle. Another may be to use some approximate alignment techniques to generate trees/matrices. It is up to individual teams to determine how many rare variants they take into account.

EGRM Repo

https://github.com/Ephraim-usc/egrm

Definitions

A haplotype is a physical grouping of genomic variants (or polymorphisms) that tend to be inherited together. A specific haplotype typically reflects a unique combination of variants that reside near each other on a chromosome. ref: https://www.genome.gov/genetics-glossary/haplotype
VCF file: https://en.wikipedia.org/wiki/Variant_Call_Format
GENESIS: https://www.bioconductor.org/packages/release/bioc/html/GENESIS.html
KING: https://www.kingrelatedness.com/

GENESIS Pipeline:

https://www.bioconductor.org/packages/release/bioc/vignettes/GENESIS/inst/doc/assoc_test_seq.html https://bioconductor.org/packages/devel/bioc/vignettes/GENESIS/inst/doc/assoc_test.html

Dataset

Data | 1000 Genomes: https://www.internationalgenome.org/data

British in England and Scotland (GBR): https://www.internationalgenome.org/data-portal/population/GBR

Puerto Rican in Puerto Rico (PUR): https://www.internationalgenome.org/data-portal/population/PUR

Chinese Dai in Xishuangbanna, China (CDX): https://www.internationalgenome.org/data-portal/population/CDX

Reference

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10716520/pdf/main.pdf

Link, V., et al., Tree-based QTL mapping with expected local genetic relatedness matrices. The American Journal of Human Genetics, 2023. 110(12): p. 2077-2091.
Gogarten, S.M., et al., Genetic association testing using the GENESIS R/Bioconductor package. Bioinformatics, 2019. 35(24): p. 5346-5348.
Conomos M.P., Miller M., & Thornton T. (2015). Robust Inference of Population Structure for Ancestry Prediction and Correction of Stratification in the Presence of Relatedness. Genetic Epidemiology, 39(4), 276-293.

Name		Name	Last commit message	Last commit date
Latest commit History 10 Commits
Figures		Figures
GDS		GDS
VCF		VCF
.DS_Store		.DS_Store
GRM-hlaa.jpg		GRM-hlaa.jpg
GRM-hlaa.pdf		GRM-hlaa.pdf
LICENSE		LICENSE
README.md		README.md
Rplot.png		Rplot.png
VCF2GRM.R		VCF2GRM.R
grm.hlaa.RData		grm.hlaa.RData
kingMat.hlaa.Rdata		kingMat.hlaa.Rdata
pcair.hlaa.csv		pcair.hlaa.csv

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

BioHack_Haplotype

Project Description GitHub

Project GitHub

Background

Goal

EGRM Repo

Definitions

GENESIS Pipeline:

Dataset

Reference

About

Releases

Packages

Languages

License

ShijieTang/BioHack_Haplotype

Folders and files

Latest commit

History

Repository files navigation

BioHack_Haplotype

Project Description GitHub

Project GitHub

Background

Goal

EGRM Repo

Definitions

GENESIS Pipeline:

Dataset

Reference

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages