java.lang.IllegalArgumentException: Cigar cannot be null with HaplotypeCaller in GENOTYPE_GIVEN_ALLELES mode #6037

Closed
freeseek opened this issue Jul 15, 2019 · 5 comments · Fixed by #6047

@freeseek

I can reproduce the following issue with GATK 4.1.2.0.

Using the following code:

wget https://github.com/broadinstitute/picard/releases/download/2.19.0/picard.jar

wget https://github.com/broadinstitute/gatk/releases/download/4.1.2.0/gatk-4.1.2.0.zip
unzip gatk-4.1.2.0.zip

wget -O- ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz | \
  gzip -d > GCA_000001405.15_GRCh38_no_alt_analysis_set.fna

samtools faidx GCA_000001405.15_GRCh38_no_alt_analysis_set.fna

java -jar picard.jar \
  CreateSequenceDictionary \
  R=GCA_000001405.15_GRCh38_no_alt_analysis_set.fna \
  O=GCA_000001405.15_GRCh38_no_alt_analysis_set.dict

(echo "##fileformat=VCFv4.2"; \
echo "##contig=<ID=chrX,length=156040895>"; \
echo -e "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO"; \
echo -e "chrX\t1052617\t.\tC\tCAAAGGCTGCAATGTGAATGAATTTTTGGAAATAGCCCTAATGCTCATCTATGAAGGAGTGATAAACACAGCATCCTTTATCCATGCAATGGAATATTATGCAGTCTAGAAAAGGAATAAGGCTCTGACAAAAGACTGCAATATGTATGAATTTTGGAAACAGCCCTACTGCCCATCTATAAAGGAATGGATAAACACAGCATAGTTCATCTATACAATGCAATATTATAATGGAATATTATGCAGCCTGGAACAGGAACAAGGCTCTGAG\t.\t.\t.") | \
  bgzip > input.vcf.gz; \
tabix -f input.vcf.gz

(echo -e "@HD\tVN:1.6\tGO:none\tSO:coordinate"; \
echo -e "@SQ\tSN:chrX\tLN:156040895"; \
echo -e "@RG\tID:ID\tPL:ILLUMINA\tPU:ID\tLB:LIBRARY\tSM:SAMPLE") | \
  samtools view -Sb -o input.bam; \
samtools index input.bam

gatk-4.1.2.0/gatk HaplotypeCaller \
  -R GCA_000001405.15_GRCh38_no_alt_analysis_set.fna \
  -I input.bam \
  -O output.vcf.gz \
  --genotyping-mode GENOTYPE_GIVEN_ALLELES \
  --alleles input.vcf.gz

I get the following error:

java.lang.IllegalArgumentException: Cigar cannot be null
	at org.broadinstitute.hellbender.utils.read.AlignmentUtils.consolidateCigar(AlignmentUtils.java:716)
	at org.broadinstitute.hellbender.utils.haplotype.Haplotype.setCigar(Haplotype.java:193)
	at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.AssemblyBasedCallerUtils.addGivenAlleles(AssemblyBasedCallerUtils.java:350)
	at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.AssemblyBasedCallerUtils.assembleReads(AssemblyBasedCallerUtils.java:291)
	at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCallerEngine.callRegion(HaplotypeCallerEngine.java:542)
	at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCaller.apply(HaplotypeCaller.java:240)
	at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.processReadShard(AssemblyRegionWalker.java:308)
	at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.traverse(AssemblyRegionWalker.java:281)
	at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1039)
	at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:139)
	at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191)
	at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)
	at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:162)
	at org.broadinstitute.hellbender.Main.mainEntry(Main.java:205)
	at org.broadinstitute.hellbender.Main.main(Main.java:291)

HaplotypeCaller appears to have a bug in GENOTYPE_GIVEN_ALLELES mode: when the VCF file of given alleles contains a very large indel, it fails with this cryptic error, regardless of the contents of the BAM file.

@cwhelan
Member

cwhelan commented Jul 15, 2019

After some initial debugging, it looks like the alternate haplotype is not being aligned back to the reference haplotype with an insertion CIGAR as expected: the SWPairwiseAlignmentResult returned by the Smith-Waterman (SW) aligner has a CIGAR of 285M10D280S. Perhaps we need a different set of SW alignment parameters for this code path. @davidbenjamin, any thoughts on this? I think you've been looking at SW parameters recently.
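
To make the CIGAR 285M10D280S reported above more concrete, here is a minimal sketch that tallies how many haplotype (query) and reference bases a CIGAR string consumes, and how many bases are soft-clipped or inserted. The function name `cigar_summary` is hypothetical and is not part of GATK; it only does the arithmetic implied by the standard SAM CIGAR operators and assumes a `grep` that supports `-o`.

```shell
# Hypothetical helper (not part of GATK): summarize a SAM CIGAR string.
# M, I, S, = and X consume query bases; M, D, N, = and X consume reference bases.
cigar_summary() {
  printf '%s\n' "$1" | grep -oE '[0-9]+[MIDNSHP=X]' | awk '
    {
      op  = substr($0, length($0), 1)            # trailing operator letter
      len = substr($0, 1, length($0) - 1) + 0    # leading run length
      if (index("MIS=X", op) > 0) query += len
      if (index("MDN=X", op) > 0) ref += len
      if (op == "S") clipped += len
      if (op == "I") inserted += len
    }
    END {
      printf "query=%d ref=%d soft_clipped=%d inserted=%d\n", query, ref, clipped, inserted
    }'
}

cigar_summary 285M10D280S
# prints: query=565 ref=295 soft_clipped=280 inserted=0
```

For this CIGAR, 280 of the 565 haplotype bases are soft-clipped and none are reported as inserted, so the alignment contains no I operator even though the given allele is a large insertion.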

@freeseek
Author

I get the same error with --genotyping-mode GENOTYPE_GIVEN_ALLELES even when the --alleles file contains only SNPs and no indels. It is very difficult to work around or debug, because I don't know which variants HaplotypeCaller was processing when it failed. Was this bug introduced in version 4.1.0.0?

@davidbenjamin
Contributor

@freeseek I believe I accidentally introduced it in 4.1.2 while fixing a different bug in GGA mode. It's on my to-do list for this week.

@freeseek
Author

@davidbenjamin thank you for letting me know. I will try version 4.1.1.0 for now and I will report if I identify any further issues. Happy to test intermediate versions if that is of any help.

@davidbenjamin
Contributor

@freeseek @cwhelan The issue was that our Smith-Waterman parameters were permitting soft-clips, which in this case is incorrect. It's an easy fix.
