Systematic comparison of modern variant callers

Companion repository for the human variant calling pipelines' benchmark.

Repository contains the following analysis data and scripts:

pipelines folder contains all scripts used to run read aligners and variant callers

eval_happy.sh script was used to run the final benchmarking. eval_happy_42.sh is an analagous script used to evaluate the calls against the newer GIAB v.4.2 reference.

gather_stats.sh was used to gather per-sample hap.py stats into one file.

benchmarking_analysis.R script was used to run all statistical analyses described in the paper.

concordance folder contains scripts used in the analysis of variant call concordance between callers. RTGtools vcfeval was used for all GIAB samples to create a VCF file with TP, FP, and FN calls prior to all downstream analyses.

CustomRegions folder contains stratification BED files that were generated by the authors.

master.bed and master_42.bed files correspond to common regions in GIAB v. 3.3 and GIAB v. 4.2 reference releases.

IMPORTANT! Main files with the statistics generated by hap.py are available through DropBox:

https://www.dropbox.com/s/jsm6o6dh39oeowj/all_stats_v4_by_strata_rework.tsv.gz - GIAB v. 4.2
https://www.dropbox.com/s/pr7beov3xzsc4w3/all_stats_by_strata_rework.tsv.gz - GIAB v. 3.3

Other data files in this repository correspond to hybrid selection metrics used to prepare Figure 1 and similar Supplementary figures, variant counts (used in Table 1 and Table 4), results of the concordance analysis (i.e., the final concordance matrix for all GIAB and non-GIAB samples),and coverage model data from Barbitoff et al., 2020.

Citation: Barbitoff Y.A., Abasov R., Tvorogova V.E., Glotov A.S., Predeus A.V. (2022) Systematic benchmark of state-of-the-art variant calling pipelines identifies major factors affecting accuracy of coding sequence variant discovery. BMC Genomics.

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
CustomRegions		CustomRegions
concordance		concordance
pipelines		pipelines
All_HS.metrics		All_HS.metrics
GIAB_CDS_GIABhiconf_nodownsample.metrics		GIAB_CDS_GIABhiconf_nodownsample.metrics
LICENSE		LICENSE
README.md		README.md
all_concordance.tsv		all_concordance.tsv
all_exo_q10_mf.tsv		all_exo_q10_mf.tsv
benchmark_analysis.R		benchmark_analysis.R
eval_happy.sh		eval_happy.sh
eval_happy_42.sh		eval_happy_42.sh
gather_stats.sh		gather_stats.sh
master.bed		master.bed
master_42.bed		master_42.bed
nongiab_v4_variant_counts.tsv		nongiab_v4_variant_counts.tsv
v4_variant_counts.tsv		v4_variant_counts.tsv
workflow.png		workflow.png

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Systematic comparison of modern variant callers

About

Releases

Packages

Contributors 2

Languages

License

bioinf/caller_benchmark

Folders and files

Latest commit

History

Repository files navigation

Systematic comparison of modern variant callers

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages