This repository has been archived by the owner on Jan 24, 2024. It is now read-only.
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Adding end-to-end tests hg19-chr22 (#61)
Showing 38 changed files with 527 additions and 6 deletions.
`@@ -0,0 +1,3 @@`

```
/tests/**/*.fa* filter=lfs diff=lfs merge=lfs -text
/tests/**/*.ser filter=lfs diff=lfs merge=lfs -text
/tests/**/*.vcf* filter=lfs diff=lfs merge=lfs -text
```
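To see which test files these three patterns route through Git LFS, `git check-attr` can evaluate them without any LFS tooling installed. A minimal sketch, assuming `git` is on `PATH`; the paths queried are example file names from this commit, and the throwaway repository stands in for the real checkout:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Throwaway repository, removed on exit, so no real checkout is touched.
repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
git -C "$repo" init -q

# The three patterns from the diff above.
cat >"$repo/.gitattributes" <<'EOF'
/tests/**/*.fa* filter=lfs diff=lfs merge=lfs -text
/tests/**/*.ser filter=lfs diff=lfs merge=lfs -text
/tests/**/*.vcf* filter=lfs diff=lfs merge=lfs -text
EOF

# check-attr only matches path names against the patterns;
# the files do not need to exist.
git -C "$repo" check-attr filter text -- \
    tests/hg19-chr22/chr22_part.fa.gz \
    tests/hg19-chr22/hg19_refseq.ser \
    tests/hg19-chr22/Clinvar.tsv.gz
```

The `.fa.gz` and `.ser` paths report `filter: lfs` and `text: unset`; `Clinvar.tsv.gz` matches none of the patterns, so it is stored as a regular Git object.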
`@@ -0,0 +1,6 @@`

```
*.tmp

chr22.*
chr22_part.fa

/data
```
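With these rules, the downloaded `chr22.fa`, the uncompressed `chr22_part.fa` and any `*.tmp` file written by the preparation script in this commit are ignored, while `chr22_part.fa.gz` and `chr22_part.fa.fai` are not. A minimal sketch checking that with `git check-ignore`, assuming `git` is on `PATH`; the ignore file's real name and location are not shown in this view, so the sketch places it at the root of a throwaway repository:

```bash
#!/usr/bin/env bash
set -euo pipefail

repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
git -C "$repo" init -q

# The six lines from the diff above (blank lines included).
printf '%s\n' '*.tmp' '' 'chr22.*' 'chr22_part.fa' '' '/data' >"$repo/.gitignore"

# Prints only the paths that are ignored; the files need not exist.
# check-ignore exits 1 when nothing matches, hence `|| true`.
git -C "$repo" check-ignore \
    chr22.fa chr22.fa.gz.tmp chr22_part.fa chr22_part.fa.gz chr22_part.fa.fai \
    || true
```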
`tests/hg19-chr22/ALL.chr22.phase3_shapeit2_mvncall_integrated_v5b.20130502.sites.vcf.gz`: 3 additions, 0 deletions (Git LFS file not shown)
`tests/hg19-chr22/ALL.chr22.phase3_shapeit2_mvncall_integrated_v5b.20130502.sites.vcf.gz.tbi`: 3 additions, 0 deletions (Git LFS file not shown)
`@@ -0,0 +1,9 @@`

```
genomebuild db_name release
GRCh37 clinvar for-testing
GRCh37 exac r1.0
GRCh37 gnomad_exomes r2.1.1
GRCh37 gnomad_genomes r2.1.1
GRCh37 hgmd_public for-testing
GRCh37 thousand_genomes v3.20101123
GRCh37 varfish-annotator 0.26-SNAPSHOT
GRCh37 varfish-annotator-db for-testing
```
`tests/hg19-chr22/Case_1_index.delly2.feature-effects.tsv-expected`: 1 addition, 0 deletions
`@@ -0,0 +1 @@`

```
case_id set_id sv_uuid refseq_gene_id refseq_transcript_id refseq_transcript_coding refseq_effect ensembl_gene_id ensembl_transcript_id ensembl_transcript_coding ensembl_effect
```
`@@ -0,0 +1 @@`

```
release chromosome chromosome_no bin chromosome2 chromosome_no2 bin2 pe_orientation start end start_ci_left start_ci_right end_ci_left end_ci_right case_id set_id sv_uuid caller sv_type sv_sub_type info num_hom_alt num_hom_ref num_het num_hemi_alt num_hemi_ref genotype
```
`@@ -0,0 +1,9 @@`

```
genomebuild db_name release
GRCh37 clinvar for-testing
GRCh37 exac r1.0
GRCh37 gnomad_exomes r2.1.1
GRCh37 gnomad_genomes r2.1.1
GRCh37 hgmd_public for-testing
GRCh37 thousand_genomes v3.20101123
GRCh37 varfish-annotator 0.26-SNAPSHOT
GRCh37 varfish-annotator-db for-testing
```
Large diffs are not rendered by default.
`@@ -0,0 +1 @@`

```
release chromosome start end bin reference alternative clinvar_version set_type variation_type symbols hgnc_ids vcv summary_clinvar_review_status_label summary_clinvar_pathogenicity_label summary_clinvar_pathogenicity summary_clinvar_gold_stars summary_paranoid_review_status_label summary_paranoid_pathogenicity_label summary_paranoid_pathogenicity summary_paranoid_gold_stars details
```
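In this header, `chromosome`, `start` and `end` are columns 2, 3 and 4. That is the layout the preparation script's `sort -k2,2V -k3,3n -k4,4n` and `tabix -S 1 -s 2 -b 3 -e 4` calls assume, provided `/tmp/Clinvar.tsv` shares this header. A sketch of a small awk helper that looks the numbers up instead of hard-coding them, assuming tab-separated columns; `col_index` is a hypothetical name, not part of the commit:

```bash
#!/usr/bin/env bash
set -euo pipefail

# col_index NAME...
# Reads a tab-separated header line from stdin and prints "NAME N" for each
# requested column, where N is its 1-based position (0 if the column is absent).
col_index() {
    awk -F'\t' -v names="$*" '
        NR == 1 {
            for (i = 1; i <= NF; i++) pos[$i] = i
            n = split(names, want, " ")
            for (j = 1; j <= n; j++)
                print want[j], ((want[j] in pos) ? pos[want[j]] : 0)
            exit
        }'
}

# First columns of the header shown above.
header=$'release\tchromosome\tstart\tend\tbin\treference\talternative'
col_index chromosome start end <<<"$header"
```

With the real input, `head -n 1 /tmp/Clinvar.tsv | col_index chromosome start end` would give the values to pass to `tabix -s`, `-b` and `-e`.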
`@@ -0,0 +1,119 @@`

```bash
#!/usr/bin/env bash

set -euo pipefail
set -x

if [[ ! -e chr22.fa ]]; then
    wget -O chr22.fa.gz.tmp https://hgdownload.cse.ucsc.edu/goldenpath/hg19/chromosomes/chr22.fa.gz
    zcat chr22.fa.gz.tmp >chr22.fa.tmp
    mv chr22.fa.tmp chr22.fa
fi

if [[ ! -e chr22_part.fa.fai ]]; then
    samtools faidx chr22.fa chr22:1-22,000,000 > chr22_part.fa.tmp
    perl -p -i -e 's/^>chr.*/>22/g' chr22_part.fa.tmp
    gzip -c chr22_part.fa.tmp >chr22_part.fa.gz
    mv chr22_part.fa.tmp chr22_part.fa
    samtools faidx chr22_part.fa
fi

# ADA2(hg19): 22:17,660,192-17,680,545
# GAB4(hg19): 22:17,442,827-17,489,112

# RefSeq transcripts, genes selected by Entrez Gene ID.
if [[ ! -e hg19_refseq.ser ]]; then
    jannovar -Xmx4096m download -d hg19/refseq --gene-ids 51816 128954 # ADA2 GAB4
    cp data/hg19_refseq.ser hg19_refseq.ser.tmp
    mv hg19_refseq.ser.tmp hg19_refseq.ser
fi

# Ensembl transcripts, genes selected by Ensembl gene ID.
if [[ ! -e hg19_ensembl.ser ]]; then
    jannovar -Xmx4096m download -d hg19/ensembl --gene-ids ENSG00000093072 ENSG00000215568 # ADA2 GAB4
    cp data/hg19_ensembl.ser hg19_ensembl.ser.tmp
    mv hg19_ensembl.ser.tmp hg19_ensembl.ser
fi

REGIONS="22:17,660,192-17,680,545 22:17,442,827-17,489,112"
BASEDIR=/fast/groups/cubi/work/projects/2021-07-20_varfish-db-downloader-holtgrewe/varfish-db-downloader

# Each block below writes the VCF header plus the records overlapping
# $REGIONS (sorted, duplicates removed), bgzip-compressed and tabix-indexed.
( \
    tabix --only-header $BASEDIR/GRCh37/ExAC/r1/download/ExAC.r1.sites.vep.vcf.gz $REGIONS; \
    tabix $BASEDIR/GRCh37/ExAC/r1/download/ExAC.r1.sites.vep.vcf.gz $REGIONS \
        | sort -k1,1V -k2,2n \
        | uniq; \
) \
    | bgzip -c \
    > ExAC.r1.sites.vep.vcf.gz
tabix -f ExAC.r1.sites.vep.vcf.gz

( \
    tabix --only-header $BASEDIR/GRCh37/gnomAD_exomes/r2.1.1/download/gnomad.exomes.r2.1.1.sites.chr22.vcf.bgz $REGIONS; \
    tabix $BASEDIR/GRCh37/gnomAD_exomes/r2.1.1/download/gnomad.exomes.r2.1.1.sites.chr22.vcf.bgz $REGIONS \
        | sort -k1,1V -k2,2n \
        | uniq; \
) \
    | bgzip -c \
    > gnomad.exomes.r2.1.1.sites.chr22.vcf.bgz
tabix -f gnomad.exomes.r2.1.1.sites.chr22.vcf.bgz

( \
    tabix --only-header $BASEDIR/GRCh37/gnomAD_genomes/r2.1.1/download/gnomad.genomes.r2.1.1.sites.chr22.vcf.bgz $REGIONS; \
    tabix $BASEDIR/GRCh37/gnomAD_genomes/r2.1.1/download/gnomad.genomes.r2.1.1.sites.chr22.vcf.bgz $REGIONS \
        | sort -k1,1V -k2,2n \
        | uniq; \
) \
    | bgzip -c \
    > gnomad.genomes.r2.1.1.sites.chr22.vcf.bgz
tabix -f gnomad.genomes.r2.1.1.sites.chr22.vcf.bgz

( \
    tabix --only-header $BASEDIR/GRCh37/thousand_genomes/phase3/ALL.chr22.phase3_shapeit2_mvncall_integrated_v5b.20130502.sites.vcf.gz; \
    tabix $BASEDIR/GRCh37/thousand_genomes/phase3/ALL.chr22.phase3_shapeit2_mvncall_integrated_v5b.20130502.sites.vcf.gz $REGIONS \
        | sort -k1,1V -k2,2n \
        | uniq; \
) \
    | bgzip -c \
    > ALL.chr22.phase3_shapeit2_mvncall_integrated_v5b.20130502.sites.vcf.gz
tabix -f ALL.chr22.phase3_shapeit2_mvncall_integrated_v5b.20130502.sites.vcf.gz

# Expects full ClinVar and HGMD-public TSV dumps at /tmp/Clinvar.tsv and
# /tmp/HgmdPublicLocus.tsv; this script does not create them. Both are sorted
# and indexed with one header line, chromosome in column 2, start in column 3
# and end in column 4; only the header and the rows in $REGIONS are kept.
(head -n 1 /tmp/Clinvar.tsv; tail -n +2 /tmp/Clinvar.tsv | sort -k2,2V -k3,3n -k4,4n) \
    | bgzip -c >/tmp/Clinvar.tsv.gz
tabix -S 1 -b 3 -e 4 -s 2 -f /tmp/Clinvar.tsv.gz
head -n 1 /tmp/Clinvar.tsv >Clinvar.tsv.tmp
tabix /tmp/Clinvar.tsv.gz $REGIONS \
    >>Clinvar.tsv.tmp
gzip Clinvar.tsv.tmp
mv Clinvar.tsv.tmp.gz Clinvar.tsv.gz

(head -n 1 /tmp/HgmdPublicLocus.tsv; tail -n +2 /tmp/HgmdPublicLocus.tsv | sort -k2,2V -k3,3n -k4,4n) \
    | bgzip -c >/tmp/HgmdPublicLocus.tsv.gz
tabix -S 1 -b 3 -e 4 -s 2 -f /tmp/HgmdPublicLocus.tsv.gz
head -n 1 /tmp/HgmdPublicLocus.tsv >HgmdPublicLocus.tsv.tmp
tabix /tmp/HgmdPublicLocus.tsv.gz $REGIONS \
    >>HgmdPublicLocus.tsv.tmp
gzip HgmdPublicLocus.tsv.tmp
mv HgmdPublicLocus.tsv.tmp.gz HgmdPublicLocus.tsv.gz

BASEDIR=/fast/groups/cubi/work/projects/2022-07-06_VarFish_Course_Data/snappy-processing/
VCF=$BASEDIR/variant_calling/output/bwa.gatk_hc.Case_1_index-N1-DNA1-WGS1/out/bwa.gatk_hc.Case_1_index-N1-DNA1-WGS1.vcf.gz

( \
    tabix --only-header $VCF $REGIONS; \
    tabix $VCF $REGIONS \
        | sort -k1,1V -k2,2n \
        | uniq; \
) \
    | bgzip -c \
    > Case_1_index.gatk_hc.vcf.gz
tabix -f Case_1_index.gatk_hc.vcf.gz

VCF=$BASEDIR/wgs_sv_calling/output/bwa.delly2.Case_1_index-N1-DNA1-WGS1/out/bwa.delly2.Case_1_index-N1-DNA1-WGS1.vcf.gz

( \
    tabix --only-header $VCF $REGIONS; \
    tabix $VCF $REGIONS \
        | sort -k1,1V -k2,2n \
        | uniq; \
) \
    | bgzip -c \
    > Case_1_index.delly2.vcf.gz
tabix -f Case_1_index.delly2.vcf.gz
```
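The six region-subsetting blocks in this script repeat one pipeline: header, records overlapping `$REGIONS`, re-sorted and de-duplicated, bgzip-compressed, then indexed. Re-sorting matters because `$REGIONS` lists ADA2 before GAB4, which lies upstream of it. A sketch of a helper that factors the pipeline out; `subset_vcf` is a hypothetical name, not part of the commit, and htslib's `tabix` and `bgzip` are assumed to be on `PATH`:

```bash
#!/usr/bin/env bash
set -euo pipefail

# subset_vcf SRC DST REGION...
# Hypothetical helper: writes the header of SRC plus every record overlapping
# one of the REGIONs to the bgzip-compressed DST, then indexes DST.
subset_vcf() {
    local src=$1 dst=$2
    shift 2
    {
        tabix --only-header "$src"
        # Regions are queried in the order given, and a record overlapping two
        # regions can be reported twice, so re-sort and drop the repeats.
        tabix "$src" "$@" | sort -k1,1V -k2,2n | uniq
    } | bgzip -c >"$dst"
    tabix -f "$dst"
}

# Usage, mirroring the ExAC block ($REGIONS deliberately unquoted so that it
# splits into one argument per region):
#   subset_vcf "$BASEDIR/GRCh37/ExAC/r1/download/ExAC.r1.sites.vep.vcf.gz" \
#       ExAC.r1.sites.vep.vcf.gz $REGIONS
```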
`@@ -0,0 +1,12 @@`

```
Table clinvar_var
TABLE RELEASE CHROM COUNT
clinvar_var GRCh37 22 124
Table hgmd_locus
TABLE RELEASE CHROM COUNT
hgmd_locus GRCh37 22 11
Table gnomad_exome_var
TABLE RELEASE CHROM COUNT
gnomad_exome_var GRCh37 22 1437
Table gnomad_genome_var
TABLE RELEASE CHROM COUNT
gnomad_genome_var GRCh37 22 6072
```
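The expected statistics above list, for each table, a `Table` line, a `TABLE RELEASE CHROM COUNT` header and one data row. When a plain `diff` on such output fails, comparing only the per-table counts can narrow things down. A sketch, assuming whitespace-separated columns as rendered here; `row_counts` is a hypothetical name:

```bash
#!/usr/bin/env bash
set -euo pipefail

# row_counts: read the statistics on stdin and print "TABLE COUNT" for each
# data row, skipping the "Table ..." and "TABLE RELEASE CHROM COUNT" lines.
row_counts() {
    awk '$1 != "Table" && $1 != "TABLE" && NF == 4 { print $1, $4 }'
}

# Sample input copied from the expected file above.
stats=$(cat <<'EOF'
Table clinvar_var
TABLE RELEASE CHROM COUNT
clinvar_var GRCh37 22 124
Table hgmd_locus
TABLE RELEASE CHROM COUNT
hgmd_locus GRCh37 22 11
Table gnomad_exome_var
TABLE RELEASE CHROM COUNT
gnomad_exome_var GRCh37 22 1437
Table gnomad_genome_var
TABLE RELEASE CHROM COUNT
gnomad_genome_var GRCh37 22 6072
EOF
)

row_counts <<<"$stats"
```

Assuming this file is the `db-info.txt-expected` used by the test script, `diff <(row_counts </tmp/db-info.txt) <(row_counts <db-info.txt-expected)` would compare only the counts.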
`tests/hg19-chr22/gnomad.exomes.r2.1.1.sites.chr22.vcf.bgz.tbi`: 3 additions, 0 deletions (Git LFS file not shown)
`tests/hg19-chr22/gnomad.genomes.r2.1.1.sites.chr22.vcf.bgz.tbi`: 3 additions, 0 deletions (Git LFS file not shown)
`@@ -0,0 +1,90 @@`

```bash
#!/usr/bin/env bash

set -euo pipefail
set -x

JAR=$(ls ../../varfish-annotator-cli/target/varfish-annotator-cli-*.jar | grep -v sources | tail -n 1)

## step 0: help

java -jar "$JAR" --help >/tmp/the-output
test -s /tmp/the-output

# Without a sub-command the CLI must exit with code 1 and still write output.
set +e
java -jar "$JAR" > /tmp/the-output
retcode=$?
set -e
test 1 -eq $retcode
test -s /tmp/the-output

## step 1: init-db

java -jar "$JAR" init-db \
    --release GRCh37 \
    --db-release-info varfish-annotator:main \
    --db-release-info varfish-annotator-db:for-testing \
    --db-path /tmp/out \
    \
    --ref-path chr22_part.fa \
    \
    --db-release-info exac:r1.0 \
    --exac-path ExAC.r1.sites.vep.vcf.gz \
    \
    --db-release-info thousand_genomes:v3.20101123 \
    --thousand-genomes-path ALL.chr22.phase3_shapeit2_mvncall_integrated_v5b.20130502.sites.vcf.gz \
    \
    --db-release-info clinvar:for-testing \
    --clinvar-path Clinvar.tsv.gz \
    \
    --db-release-info gnomad_exomes:r2.1.1 \
    --gnomad-exomes-path gnomad.exomes.r2.1.1.sites.chr22.vcf.bgz \
    \
    --db-release-info gnomad_genomes:r2.1.1 \
    --gnomad-genomes-path gnomad.genomes.r2.1.1.sites.chr22.vcf.bgz \
    \
    --db-release-info hgmd_public:for-testing \
    --hgmd-public HgmdPublicLocus.tsv.gz

## step 2: db-stats (output compared against db-info.txt-expected)

java -jar "$JAR" db-stats --db-path /tmp/out.h2.db --parseable \
    > /tmp/db-info.txt
diff /tmp/db-info.txt db-info.txt-expected

## step 3: annotate

java -jar "$JAR" annotate \
    --release GRCh37 \
    --input-vcf Case_1_index.gatk_hc.vcf.gz \
    --output-gts /tmp/Case_1_index.gatk_hc.gts.tsv \
    --output-db-info /tmp/Case_1_index.gatk_hc.db-info.tsv \
    --ref-path chr22_part.fa \
    --refseq-ser-path hg19_refseq.ser \
    --ensembl-ser-path hg19_ensembl.ser \
    --db-path /tmp/out.h2.db \
    --self-test-chr22-only

diff /tmp/Case_1_index.gatk_hc.gts.tsv Case_1_index.gatk_hc.gts.tsv-expected
diff /tmp/Case_1_index.gatk_hc.db-info.tsv Case_1_index.gatk_hc.db-info.tsv-expected

## step 4: annotate-svs

java -jar "$JAR" annotate-svs \
    --release GRCh37 \
    --input-vcf Case_1_index.delly2.vcf.gz \
    --output-gts /tmp/Case_1_index.delly2.gts.tsv \
    --output-feature-effects /tmp/Case_1_index.delly2.feature-effects.tsv \
    --output-db-info /tmp/Case_1_index.delly2.db-info.tsv \
    --refseq-ser-path hg19_refseq.ser \
    --ensembl-ser-path hg19_ensembl.ser \
    --db-path /tmp/out.h2.db \
    --self-test-chr22-only

diff /tmp/Case_1_index.delly2.gts.tsv Case_1_index.delly2.gts.tsv-expected
diff /tmp/Case_1_index.delly2.db-info.tsv Case_1_index.delly2.db-info.tsv-expected
diff /tmp/Case_1_index.delly2.feature-effects.tsv Case_1_index.delly2.feature-effects.tsv-expected

## if we reach here, everything is fine

echo "-- ALL TESTS PASSED --"
exit 0
```
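Step 0 toggles `set +e`/`set -e` around the bare invocation to capture its exit code. An alternative that stays under `set -e` throughout is to run the command on the left of `||`. A sketch with a hypothetical helper `expect_exit` (not part of the commit):

```bash
#!/usr/bin/env bash
set -euo pipefail

# expect_exit CODE CMD [ARGS...]
# Runs CMD and succeeds only if it exits with CODE. Because CMD runs on the
# left of `||`, a non-zero status does not trigger `set -e`.
expect_exit() {
    local want=$1 got=0
    shift
    "$@" || got=$?
    if [[ $got -ne $want ]]; then
        echo "expected exit code $want, got $got: $*" >&2
        return 1
    fi
}

# In the test script, step 0 could then read:
#   expect_exit 1 java -jar "$JAR" >/tmp/the-output
#   test -s /tmp/the-output
expect_exit 1 false
expect_exit 0 true
echo "expect_exit: ok"
```

The redirection in the usage comment applies to the whole helper call, so the CLI's output still lands in `/tmp/the-output`.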