java.lang.NullPointerException during Bubble Processing #23

ariane-lozachmeur · 2018-11-29T20:41:38Z

Hi,

I'm was running Kourami on a dozens of samples and for a few of them, I get this java error:

Bubble Processing and Path Assembly for:        A
Bubble Processing and Path Assembly for:        B
Bubble Processing and Path Assembly for:        C
Bubble Processing and Path Assembly for:        DQA1
Bubble Processing and Path Assembly for:        DQB1
java.lang.NullPointerException
        at Bubble.removeUnsupported(Bubble.java:842)
        at Bubble.<init>(Bubble.java:379)
        at Bubble.<init>(Bubble.java:384)
        at HLAGraph.countBubbles(HLAGraph.java:2050)
        at HLAGraph.countBubblesAndMerge(HLAGraph.java:900)
        at HLA.countBubblesAndMerge(HLA.java:357)
        at HLA.main(HLA.java:585)

In this case it happened during the Bubble Processing of DQB1 but it sometimes happens for other genes.

I should specify that I changed slightly the preprocessing scripts to work on the hg19 reference and include unmapped reads.
Any idea of what might cause that?
Thank you for your help and amazing work on Kourami!

The text was updated successfully, but these errors were encountered:

heewookl · 2018-12-05T17:39:29Z

Hi,

Could you please explain what you have changed in the preprocessing scripts? My guess you changed the loci information for extracting HLA alleles from hg19? If you can share what you exactly changed, that will be helpful.

Also, could you please share the bam file aligned to the kourami panel? Thank you!

ariane-lozachmeur · 2018-12-05T19:31:48Z

You guessed correctly, I changed the loci information to extract reads from hg19. In addition, our testing showed better performances when we added back the unmapped reads to the extracted reads.

The resulting code is the following (for the extraction part of the preprocessing)

echo ">>>>>>>>>>>>>>>> extracting reads mapping to HLA loci and ALT contigs"
$samtools_bin view -b -o hla.bam $bam_path 6:29900000-29914000 6:31230000-31241000 6:31320000-31326000 \
                                6:33032000-33042000 6:33043000-33058000 6:32604000-32612000 \
                                6:32629000-32634500 6:32407550-32413000 \
                                6:32520000-32558000
$samtools_bin sort -n -@8 -o hla.sorted.bam hla.bam
$samtools_bin view -b -f4 -@8 -o unmapped.bam $bam_path
$samtools_bin sort -n -@8 -o unmapped.sorted.bam unmapped.bam
$samtools_bin merge -p -f all.bam hla.sorted.bam unmapped.sorted.bam
$samtools_bin sort --thread $num_processors -m $samtools_sort_memory_per_thread -O BAM -o $sampleid.extract.bam all.bam

The rest of the script is the same.

Unfortunately, I am not sure I will be able to send you the file. I am waiting for approval and I'll get back to you when I know more.
In the meantime, here is what I tried to do to diagnose the issue, maybe that will help.

To try and see where the NullPointerException was coming from I made a little experiment.
Say the Bubble Processing failed during assembly for DQA1, I isolated the reads mapped to DQA1 from the bam file aligned to the Kourami panel (using samtools view -o new.bam in.bam DQA1*01:01:01:01:0-100000 [...] DQA1*06:02:0-100000)
Then I downsampled this bam by 10% (samtools view -s0.9 -o new.0.9.bam new.bam)
Depending on the seed I used in samtools, those smaller bam file sometimes worked with Kourami and sometimes didn't.
At that point I figured that some specific reads were probably causing the issue. So I identified which reads were in the files that didn't work and not in the files that worked.
Removing those reads from the original on_KouramiPanel.bam file didn't solve the problem. Moreover, a bam file containing only those suspicious reads didn't cause the bug.
Now my guess is that it is probably a certain subset of reads that together create this edge case.

Lastly, the end of the .log file looks like that:

=========================
=  DQB1
=========================
[k] = 2113
trying new k: [k] = 2112
BASE:   [A,2113]
BASE:   [C,2113]
BASE:   [G,2113]
BASE:   [T,2113]
trying new k: [k] = 2111
BASE:   [A,2112]
BASE:   [C,2112]
BASE:   [G,2112]
[...]
trying new k: [k] = 2065
BASE:   [G,2066]
BASE:   [T,2066]
trying new k: [k] = 2064
BASE:   [G,2065]
Found the new start!
Setting Trimming length(header):        49

I'm sorry I can't give you the bam file to reproduce this yet. Please let me know if you have any ideas or things you would like me to test.
Thank you!

heewookl · 2018-12-05T22:22:51Z

Thank you for providing me with much details.

I am guessing you updated those coordinates from hg19 annotation for HLA genes for the extraction part. Including unmapped reads can certainly improve the results especially when the initial mapping is done on hg19, however, sometimes spurious mappings can creep in which can lead to mistyping.

I have a few questions:

Are all alignments performed by bwa?
Is this WGS? and what are the coverages for these samples?

I understand that you can't send me the extracted bam. In the meantime, I will think of what test you can run on your end to get this problem sorted out. I shall be in touch soon. Thank you.

ariane-lozachmeur · 2018-12-06T23:03:15Z

The first alignment (to hg19) is performed by SNAP. But during the preprocessing for Kourami I only use bwa.
This is exome sequencing on around 600 genes with a 100x coverage. By adding back the unmapped reads, I get around 500k reads mapped to the Kourami reference.

heewookl · 2019-02-26T22:01:54Z

Hi, I would like to follow up on this issue and would it be possible for you to drop me an email to arrange a secure data transfer? Thanks.

Rashesh7 · 2019-04-16T15:03:43Z

Hi,
Sorry for jumping in, but I am also interested in editing the preprocessing scripts to work on the hg19 reference.

Please do let me know if this was resolved and there is a consensus on what reads should it extract.

Thanks.

NicolaLady · 2019-10-07T17:57:52Z

Hi,

I am having the same issue as the original poster. Any updates on how to resolve this?

I am working with whole exome sequencing. Could the issue be due to possibly low coverage at DQB1?

Issue:

----------------REF GRAPH CONSTRUCTION--------------
---------------- READ LOADING --------------
---------------- GRAPH CLEANING --------------
Bubble Processing and Path Assembly for: A
Bubble Processing and Path Assembly for: B
Bubble Processing and Path Assembly for: C
Bubble Processing and Path Assembly for: DQA1
Bubble Processing and Path Assembly for: DQB1
java.lang.NullPointerException
at Bubble.removeUnsupported(Bubble.java:842)
at Bubble.(Bubble.java:379)
at Bubble.(Bubble.java:384)
at HLAGraph.countBubbles(HLAGraph.java:2040)
at HLAGraph.countBubblesAndMerge(HLAGraph.java:890)
at HLA.countBubblesAndMerge(HLA.java:357)
at HLA.main(HLA.java:585)

The tail end of my log file

DQA1

[k] = 4757
trying new k: [k] = 4756
BASE: [G,4757]
Found the new start!
Setting Trimming length(header): 1
NumBubbles: 36 found

<---------------------------------->
CHECKING INTER-SUPERBUBBLE PHASING:
<---------------------------------->

Printing 1 fractured super bubbles.
SUPER BUBBLE [0]
FractureEndIndex:[ 0 ]
IntersectionScore: 27.701600143119084 -8.23921481214836

candidate_0-0
CTGACCACGTTGCCTCTTGTGGTGTAAACTTGTACCAGTTTTACGGTCCCTCTGGCCAGTACACCCATGAATTTGATGGAGATGAGGAGTTCTACGTGGACC
TGGAGAGGAAGGAGACTGCCTGGCGGTGGCCTGAGTTCAGCAAATTTGGAGGTTTTGACCCGCAGGGTGCACTGAGAAACATGGCTGTGGCAAAACACAACT
TGAACATCATGATTAAACGCTACAACTCTACCGCTGCTACCAATG

FractureEndIndex:[ 0 ]
IntersectionScore: 8.298399856880916 -52.64326228050601

candidate_0-1
CTGACCACGTCGCCTCTTATGGTGTAAACTTGTACCAGTCTTACGGTCCCTCTGGCCAGTACACCCATGAATTTGATGGAGATGAGCAGTTCTACGTGGACC
TGGGGAGGAAGGAGACTGTCTGGTGTTTGCCTGTTCTCAGACAATTTAGATTTGACCCGCAATTTGCACTGACAAACATCGCTGTCCTAAAACATAACTTGA
ACAGTCTGATTAAACGCTCCAACTCTACCGCTGCTACCAATG

AllelePair [0:0] {PAIRSCORE:-65.01606974352967 E_SUM:63724.0 MAXFLOW:59.0}
AllelePair [0:1] {PAIRSCORE:-62.28610893381102 E_SUM:66484.0 MAXFLOW:65.0}
AllelePair [1:1] {PAIRSCORE:-153.82416468024502 E_SUM:54037.0 MAXFLOW:17.0}

BEST PAIR[0:1]: -62.28610893381102
[0 0.7914742898034024 -8.23921481214836]BEST MATCH: DQA101:01:01G 249 1.0 [MaxFl
ow:65.0] [MinDepth1:33.0] [MinDepth2:17.0]
[1 0.23709713876802618 -52.64326228050601]BEST MATCH: DQA105:01:01G 246 1.0 [MaxFl
ow:65.0] [MinDepth1:33.0] [MinDepth2:17.0]

DQB1

[k] = 2113
trying new k: [k] = 2112
BASE: [A,2113]
BASE: [G,2113]
BASE: [N,2113]
trying new k: [k] = 2111
BASE: [A,2112]
BASE: [N,2112]
trying new k: [k] = 2110
BASE: [C,2111]
BASE: [N,2111]
trying new k: [k] = 2109
BASE: [G,2110]
BASE: [N,2110]
trying new k: [k] = 2108
BASE: [C,2109]
BASE: [G,2109]
BASE: [T,2109]
trying new k: [k] = 2107
BASE: [A,2108]
BASE: [C,2108]
BASE: [T,2108]
trying new k: [k] = 2106
BASE: [C,2107]
BASE: [G,2107]
trying new k: [k] = 2105
BASE: [A,2106]
BASE: [C,2106]
BASE: [T,2106]
BASE: [N,2106]
trying new k: [k] = 2104
BASE: [G,2105]
BASE: [T,2105]
BASE: [N,2105]
trying new k: [k] = 2103
BASE: [A,2104]
BASE: [T,2104]
BASE: [N,2104]
trying new k: [k] = 2102
BASE: [A,2103]
BASE: [C,2103]
BASE: [T,2103]
trying new k: [k] = 2101
BASE: [G,2102]
BASE: [N,2102]
trying new k: [k] = 2100
BASE: [A,2101]
BASE: [C,2101]
BASE: [G,2101]
BASE: [T,2101]
BASE: [N,2101]
trying new k: [k] = 2099
BASE: [A,2100]
BASE: [G,2100]
BASE: [T,2100]
trying new k: [k] = 2098
BASE: [C,2099]
BASE: [G,2099]
BASE: [N,2099]
trying new k: [k] = 2097
BASE: [C,2098]
BASE: [G,2098]
BASE: [N,2098]
trying new k: [k] = 2096
BASE: [A,2097]
BASE: [C,2097]
BASE: [G,2097]
BASE: [N,2097]
trying new k: [k] = 2095
BASE: [C,2096]
BASE: [G,2096]
BASE: [T,2096]
BASE: [N,2096]
trying new k: [k] = 2094
BASE: [A,2095]
BASE: [G,2095]
BASE: [T,2095]
trying new k: [k] = 2093
BASE: [C,2094]
BASE: [T,2094]
trying new k: [k] = 2092
BASE: [C,2093]
Found the new start!
Setting Trimming length(header): 21

Kind regards,

Nicola

nikolasthuesen · 2020-10-21T12:19:43Z

I got the same error, when running Kourami on extracted WES data from the 1000 genomes samples NA19346 and NA19471. These were the only two giving this error out of the 829 samples I ran the workflow for. The errors were not caused by using hg19 reference - I was using GRCh38.
The error happened during the typing of DQB1 as well, meaning that Kourami had already typed A, B, C and DQA1. As the error happened, both samples had a missing ".result" file and the predictions, that it did manage to do before the error, could therefore not be found here. However, the typing results of A, B, C and DQA1 were still present in the ".log" file.
If anyone is still having this problem, and like me don't have time to wait for the fix, the predictions for A, B, C and DQA1 can still be retrieved.

masen1991 · 2021-03-17T08:18:26Z

Can Kourami work on hg19 ref samples?
and by the way,how can i use Kourami on WES samples (maybe hg19 ref or hg38 ref)?

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

java.lang.NullPointerException during Bubble Processing #23

java.lang.NullPointerException during Bubble Processing #23

ariane-lozachmeur commented Nov 29, 2018 •

edited

Loading

heewookl commented Dec 5, 2018

ariane-lozachmeur commented Dec 5, 2018

heewookl commented Dec 5, 2018

ariane-lozachmeur commented Dec 6, 2018

heewookl commented Feb 26, 2019

Rashesh7 commented Apr 16, 2019

NicolaLady commented Oct 7, 2019 •

edited

Loading

nikolasthuesen commented Oct 21, 2020 •

edited

Loading

masen1991 commented Mar 17, 2021

java.lang.NullPointerException during Bubble Processing #23

java.lang.NullPointerException during Bubble Processing #23

Comments

ariane-lozachmeur commented Nov 29, 2018 • edited Loading

heewookl commented Dec 5, 2018

ariane-lozachmeur commented Dec 5, 2018

heewookl commented Dec 5, 2018

ariane-lozachmeur commented Dec 6, 2018

heewookl commented Feb 26, 2019

Rashesh7 commented Apr 16, 2019

NicolaLady commented Oct 7, 2019 • edited Loading

Issue:

**The tail end of my log file **

DQA1

DQB1

nikolasthuesen commented Oct 21, 2020 • edited Loading

masen1991 commented Mar 17, 2021

ariane-lozachmeur commented Nov 29, 2018 •

edited

Loading

NicolaLady commented Oct 7, 2019 •

edited

Loading

The tail end of my log file

nikolasthuesen commented Oct 21, 2020 •

edited

Loading