gatk GenotypeGVCFs USER ERROR: The list of input alleles must contain <NON_REF> as an allele but that is not the case #7147

Closed
lubocoix opened this issue Mar 16, 2021 · 13 comments

@lubocoix

I have already generated a ".g.vcf" file with GATK HaplotypeCaller, but when I ran the command "./gatk GenotypeGVCFs -R /Users/lubo/sorgum/GCF_000003195.3_Sorghum_bicolor_NCBIv3_genomic.fna.
-V /Users/lubo/sorgum/propinquum_variation.g.vcf
-O /Users/lubo/sorgum/propinquum.vcf" to generate the output file "propinquum.vcf", I got: A USER ERROR has occurred: The list of input alleles must contain <NON_REF> as an allele but that is not the case at position 11733; please use the Haplotype Caller with gVCF output to generate appropriate records.
I don't know what is wrong with my command. Does this mean something is wrong with my input file?

@droazen
Contributor

droazen commented Mar 22, 2021

@lubocoix Are you running HaplotypeCaller with the -ERC GVCF argument?
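
For anyone hitting the same message, one way to check whether a file really is a gVCF is to look for records whose ALT column lacks <NON_REF>. The following is a minimal Python sketch and not part of GATK. The names `open_vcf` and `records_missing_non_ref` are hypothetical, and the code assumes a standard tab-delimited VCF body, either plain text or gzip/bgzip-compressed.

```python
import gzip

NON_REF = "<NON_REF>"


def open_vcf(path):
    """Open a plain-text or gzip/bgzip-compressed VCF for reading as text."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def records_missing_non_ref(lines):
    """Yield (chrom, pos, alt) for each record whose ALT lacks <NON_REF>."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header lines and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # not a well-formed VCF record
        chrom, pos, alt = fields[0], int(fields[1]), fields[4]
        if NON_REF not in alt.split(","):
            yield chrom, pos, alt
```

Iterating over `records_missing_non_ref(open_vcf(path))` lists the offending positions. A file written by HaplotypeCaller with -ERC GVCF is expected to yield nothing, while a regular VCF would typically be reported at nearly every record.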

@droazen changed the title from "gatk GenotypeGVCFs" to "gatk GenotypeGVCFs USER ERROR: The list of input alleles must contain <NON_REF> as an allele but that is not the case" on Apr 12, 2021
@droazen self-assigned this on Apr 12, 2021
@ShuwenXia

Hi lubocoix, did you fix the error? I have the same issue. Please let me know how you fixed the problem. Many thanks!

@lubocoix
Author

lubocoix commented Oct 8, 2021 via email

@ShuwenXia

I used HaplotypeCaller to produce the husheep_reseq.g.vcf file and ran "./share/nas1/comput5/Tools/GATK/gatk-4.2.0.0/gatk --java-options -Xmx80G GenotypeGVCFs -R /share/nas1/comput5/ref_genome/husheep_ref/GCA_011170295.1_ASM1117029v1_genomic.fna -V husheep_reseq.g.vcf -O husheep_reseq.vcf" to generate husheep_reseq.vcf, but I got the error "The list of input alleles must contain <NON_REF> as an allele but that is not the case at position 100; please use the Haplotype Caller with gVCF output to generate appropriate records". It seems similar to your case.

@PlatonB

PlatonB commented Oct 10, 2023

@droazen In my case, CombineGVCFs crashes on multiallelic variants.

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  mother_DV
chrX    10001   .       C       <NON_REF>       0       .       END=10060       GT:GQ:MIN_DP:PL 0/0:1:0:0,0,0
chrX    10061   .       G       <NON_REF>       0       .       END=10068       GT:GQ:MIN_DP:PL 0/0:6:2:0,6,59
chrX    10069   .       T       A,<NON_REF>     14.7    PASS    .       GT:GQ:DP:AD:VAF:PL      1/1:13:2:0,2,0:1,0:14,18,0,990,990,990
chrX    10070   .       G       <NON_REF>       0       .       END=10070       GT:GQ:MIN_DP:PL 0/0:6:2:0,6,59
chrX    10071   .       G       <NON_REF>       0       .       END=10111       GT:GQ:MIN_DP:PL 0/0:15:5:0,15,149
chrX    10112   .       G       <NON_REF>       0       .       END=10120       GT:GQ:MIN_DP:PL 0/0:18:6:0,18,179
<...>
org.broadinstitute.hellbender.exceptions.GATKException: Exception thrown at chrX:10071 [VC /home/pbykadorov/family/02_merged/mother_DV_chrX.g.vcf.gz @ chrX:10071-10111 Q0.00 of type=SYMBOLIC alleles=[G*, <NON_REF>] attr={END=10111} GT=GT:GQ:MIN_DP:PL   0/0:15:5:0,15,149 filters=
Caused by: org.broadinstitute.hellbender.exceptions.UserException: The list of input alleles must contain <NON_REF> as an allele but that is not the case at position 10069; please use the Haplotype Caller with gVCF output to generate appropriate records

Note.
<NON_REF> was inserted in place of DeepVariant's <*> manually:

sed -e "s/<\*>/<NON_REF>/g"
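
The sed command above rewrites every literal `<*>` on every line, header text included. A narrower variant that touches only the ALT column of data records could look like the Python sketch below. The function name `rename_star_allele` is hypothetical, and the code assumes tab-delimited records with ALT in the fifth column. Renaming the allele only changes its label and says nothing about whether GATK will accept the resulting file.

```python
def rename_star_allele(line, old="<*>", new="<NON_REF>"):
    """Return a VCF line with `old` renamed to `new` in the ALT column only."""
    if line.startswith("#"):
        return line  # header lines are passed through untouched
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 5:
        return line  # leave blank or malformed lines as they are
    # Compare whole alleles so that only an exact "<*>" entry is renamed.
    alts = [new if allele == old else allele for allele in fields[4].split(",")]
    fields[4] = ",".join(alts)
    newline = "\n" if line.endswith("\n") else ""
    return "\t".join(fields) + newline
```

Applied line by line to a gVCF stream, this leaves the header and all other columns byte-for-byte unchanged.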

@gokalpcelik
Contributor

Hi @PlatonB
GATK GVCF tools are designed to work only with GVCFs from GATK HaplotypeCaller. If you wish to work with GVCFs generated by DeepVariant or other tools, you may use GLnexus.

@PlatonB

PlatonB commented Oct 10, 2023

@gokalpcelik The same error occurs when working with HaplotypeCaller results.

@gokalpcelik
Contributor

Can you also post a line from the HaplotypeCaller-generated GVCF that causes this issue?

@PlatonB

PlatonB commented Oct 10, 2023

chrX 10069 . T A,<NON_REF> 63.31 . DP=3;ExcessHet=0;MLEAC=1,0;MLEAF=0.5,0;RAW_MQandDP=5337,3 GT:<...>

@gokalpcelik
Contributor

How was this GVCF created? Can you post your steps as well? Also, are you trying to combine GVCFs generated by different tools with HaplotypeCaller's GVCF?
What version of GATK are you using?

@PlatonB

PlatonB commented Oct 10, 2023

How was this GVCF created? Can you post your steps as well?

I'm afraid that's non-public information. Perhaps after consulting with my employer, I will publish some of the pipelines' code.

Also, are you trying to combine GVCFs generated by different tools with HaplotypeCaller's GVCF?

Yes, I am merging the gVCFs generated by HaplotypeCaller, DeepVariant, BCFtools mpileup&call and FreeBayes. Before merging, I try to unify the source gVCFs using self-written Bash and Python scripts.

What version of GATK are you using?

gatk --version
Using GATK jar /home/pbykadorov/miniconda3/share/gatk4-4.4.0.0-0/gatk-package-4.4.0.0-local.jar
Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /home/pbykadorov/miniconda3/share/gatk4-4.4.0.0-0/gatk-package-4.4.0.0-local.jar --version
The Genome Analysis Toolkit (GATK) v4.4.0.0
HTSJDK Version: 3.0.5
Picard Version: 3.0.0

@gokalpcelik
Contributor

gokalpcelik commented Oct 10, 2023

This is the key point:
"Merging the gVCFs generated by HaplotypeCaller, DeepVariant, BCFtools mpileup&call and FreeBayes."

CombineGVCFs is not meant to combine GVCFs from different sources. The tool only works with GVCFs from HaplotypeCaller.

To understand the issue clearly, we need the actual HaplotypeCaller command line used to generate the GVCF file. GVCFs generated by other tools are outside our scope, so if you are doing something outside the GATK domain we cannot help.

Other tools may do this kind of work, but we don't have extensive information on them and cannot debug their issues either.
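
When the original command is not at hand, it can often be recovered from the gVCF header. The Python sketch below assumes the file still carries header lines of the form `##GATKCommandLine=<ID=...,CommandLine="...",...>` and that GVCF mode appears in the recorded command as `--emit-ref-confidence GVCF` or `-ERC GVCF`. Both are assumptions about the header layout rather than guarantees, and the function names are hypothetical.

```python
import re

# Matches header lines such as:
# ##GATKCommandLine=<ID=HaplotypeCaller,CommandLine="HaplotypeCaller ...",Version=...>
_CMD_PATTERN = re.compile(r'^##GATKCommandLine=<ID=([^,>]+),CommandLine="(.*?)"[,>]')


def gatk_command_lines(header_lines):
    """Return {tool_id: command_line} parsed from ##GATKCommandLine header lines."""
    found = {}
    for line in header_lines:
        if not line.startswith("##"):
            break  # metadata ends at the #CHROM line
        match = _CMD_PATTERN.match(line)
        if match:
            found[match.group(1)] = match.group(2)
    return found


def used_gvcf_mode(command_line):
    """Return True if the recorded command requests GVCF reference confidence."""
    tokens = command_line.split()
    for flag in ("--emit-ref-confidence", "-ERC"):
        if flag in tokens:
            idx = tokens.index(flag)
            if idx + 1 < len(tokens) and tokens[idx + 1] == "GVCF":
                return True
    return False
```

If the returned dictionary has no `HaplotypeCaller` entry, or `used_gvcf_mode` returns False for it, the file was probably not produced in the way GenotypeGVCFs and CombineGVCFs expect.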

@lbergelson
Member

@PlatonB Since this is a complex issue involving multiple non-GATK tools and a proprietary pipeline, we can't really do much to debug it without a way to reproduce the problem. If you could provide (ideally minimal) inputs that reproduce the issue, that would give us much more of a handle to look into it. Otherwise I don't think we can do much with the information you provided.
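
One way to prepare such a minimal input is to keep the header and only the records around the failing position. The Python sketch below is an illustration, not a GATK utility. The function name `slice_vcf` is hypothetical, and it assumes tab-delimited records with at least eight columns, with reference blocks carrying `END=` in INFO.

```python
def slice_vcf(lines, chrom, start, end):
    """Yield header lines plus records on `chrom` that overlap [start, end]."""
    for line in lines:
        if line.startswith("#"):
            yield line  # keep the full header so the slice stays a valid VCF
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8 or fields[0] != chrom:
            continue
        pos = int(fields[1])
        rec_end = pos + len(fields[3]) - 1  # default span: length of REF
        for item in fields[7].split(";"):
            if item.startswith("END="):
                rec_end = int(item[4:])  # reference blocks give END in INFO
        if pos <= end and rec_end >= start:
            yield line
```

For the error reported at position 10069, calling `slice_vcf(lines, "chrX", 10065, 10075)` on each input would keep just the handful of records surrounding that site, which is usually small enough to share.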
