@woaishiye Some of the sites that vcftools outputs as indels but SelectVariants does not are excluded because SelectVariants considers them mixed sites containing both SNPs and INDELs (chr1:37, for example). However, other sites, like chr1:61, seem to highlight an issue with how the spanning deletion allele (*) is treated in htsjdk.
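To make the difference concrete, here is a minimal Python sketch of two site-classification rules. It is an illustration under stated assumptions, not the actual htsjdk or vcftools code: `site_type` assumes each ALT allele is typed against REF by length alone and that a site with differing allele types is called MIXED, while `any_length_change` assumes a site counts as an indel whenever any ALT allele differs in length from REF. The function names and the example records are hypothetical and are not taken from the file in this issue.

```python
def allele_type(ref, alt):
    """Classify one ALT allele against REF using allele lengths only (assumed rule)."""
    if alt.startswith("<") and alt.endswith(">"):
        return "SYMBOLIC"
    if len(alt) == len(ref):
        # Note: under this length-only rule, "*" against a 1-base REF looks like a SNP.
        return "SNP" if len(ref) == 1 else "MNP"
    return "INDEL"


def site_type(ref, alts):
    """Return the shared type of all ALT alleles, or MIXED if they disagree."""
    types = {allele_type(ref, alt) for alt in alts}
    if not types:
        return "NO_VARIATION"
    return types.pop() if len(types) == 1 else "MIXED"


def any_length_change(ref, alts):
    """Assumed site-level rule: indel if any ALT changes the length of REF."""
    return any(len(alt) != len(ref) for alt in alts)


# Hypothetical (REF, ALT list) records for illustration only.
records = [
    ("A", ["G"]),        # plain SNP
    ("AT", ["A"]),       # plain deletion
    ("A", ["G", "AT"]),  # SNP and insertion at the same site
    ("A", ["AT", "*"]),  # insertion plus spanning deletion allele
]

for ref, alts in records:
    print(ref, ",".join(alts), site_type(ref, alts), any_length_change(ref, alts))
```

Under these assumptions, the last two records are indels by the site-level rule but MIXED by the per-allele rule, so a filter that keeps only sites typed INDEL would drop them. That would be consistent with SNP counts matching while indel counts differ.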
Affected tool(s) or class(es)
GATK
VCFTOOLS
Affected version(s)
GATK-v4.1.1.0
VCFTOOLS-v0.1.15
Description
I used vcftools (--remove-indels and --keep-only-indels) and GATK SelectVariants to extract SNPs and InDels from an original VCF file generated by GATK GenotypeGVCFs, but I got different results. Both tools extracted the same number of SNPs, but different numbers of InDels. As I understand it, the original VCF file contains only two types of variants, SNPs and InDels, so the number of SNPs plus the number of InDels should equal the total number of variants in the original file. With vcftools, SNPs plus InDels do add up to the total; with GATK SelectVariants they do not. Does SelectVariants apply special rules when extracting InDels from a VCF file that make its count smaller than expected? Any help will be appreciated. Below are two pictures generated by GATK SelectVariants (left) and vcftools (right).
Steps to reproduce
java -Xmx3990m -Djava.io.tmpdir=./JavaTmpDir \
    -jar gatk-package-4.1.1.0-local.jar SelectVariants \
    -R ./a_a_ref/Gmax_275_v2.0.fa \
    -V original.vcf.gz \
    -select-type INDEL \
    -O original.InDel.vcf.gz

$VCFTOOLS --gzvcf original.vcf.gz \
    --keep-only-indels --out original.InDel \
    --recode --recode-INFO-all
Expected behavior
We should get the same number of InDels from vcftools and GATK SelectVariants.
Actual behavior
The number of InDels produced by GATK SelectVariants was smaller than the number produced by vcftools.