formatIMGT.sh NullPointerException #19

zezzipa · 2018-10-15T18:35:55Z

Hi again,

I have downloaded Alignments_Rel_3330.zip from https://github.com/ANHIG/IMGTHLA and put hla_nom_g.txt in that folder. At first formatIMGT.sh seems to work fine when creating the new reference, but then it crashes. The last part of the log is as follows:

Processing [Y] <<<<<<<<<<
nucRefAl: Y*01:01 genRefAl: Y*01:01
refGeneName on nuc and gen are same
Wrting to : /proj/uppstore2018100/kourami-0.9.6/scripts/../custom_db/3.33.0/Y_gen.txt
Wrting to : /proj/uppstore2018100/kourami-0.9.6/scripts/../custom_db/3.33.0/Y_nuc.txt
REF SEQ names differs :
(nuc):Y*01:01
(gen):Y
java.lang.NullPointerException
at Sequence.processBlock(Sequence.java:554)
at Sequence.<init>(Sequence.java:498)
at MergeMSFs.mergeAndAdd(MergeMSFs.java:383)
at MergeMSFs.mergeAndAdd(MergeMSFs.java:372)
at MergeMSFs.merge(MergeMSFs.java:298)
at FormatIMGT.processGene(FormatIMGT.java:199)
at FormatIMGT.main(FormatIMGT.java:100)

Any idea what the problem might be?
Thank you in advance for the help!

zezzipa · 2018-10-18T06:43:11Z

As a follow-up to that question: is there a way to include MICA, MICB, TAP1 and TAP2 in the output? These genes are of interest to us in the disease we are studying, and it would be great if we could get information for all genes from the same software.

heewookl · 2018-10-18T17:26:34Z

I have updated the code to handle a few minor changes in IMGT alignment file format.

Please clone the latest commit in the repo and it should run fine.

Adding MICA/B and TAP1/2 can probably be done, but I am not sure when I can get to it.

zezzipa · 2018-10-19T08:39:34Z

Thank you for looking into this. So irritating when they change format. I already had that problem another time this week.

I downloaded the new version, but now I have a problem with DRB5_gen.txt (with IMGT/HLA version 3.34.0):

Processing [DRB5] <<<<<<<<<<
nucRefAl: DRB1*01:01:01 genRefAl: DRB5*01:01:01
refGeneName on nuc and gen are NOT same
Reference sequence entry [DRB5*01:01:01] is NOT found in nuc alignments.
Check the alignment files.

I fixed it by renaming the allele in the DRB5_gen.txt file; since I don't care about DRB5, that worked for me. But for someone who does care about DRB5, it would be good to look into.
Now I have a new reference, thank you so much!
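Before editing allele names by hand, it may help to check whether the reference allele of the gen file actually appears in the nuc alignment, which is what the error message above complains about. The following is only a diagnostic sketch: the helper names `first_allele` and `ref_in_nuc` are hypothetical and not part of Kourami, and it assumes that the reference is the first allele listed in an alignment file and that the allele name is the first whitespace-separated field on its line.

```bash
#!/usr/bin/env bash
# Hypothetical helpers, not part of Kourami.
# Assumption: the reference allele is the first allele line in the file,
# and allele lines start with a name such as DRB5*01:01:01.

# Print the first allele name found in an alignment file.
first_allele() {
    awk '$1 ~ /^[A-Za-z0-9]+\*[0-9:A-Z]+$/ { print $1; exit }' "$1"
}

# Succeed (exit 0) if the gen file's first allele also occurs in the nuc file.
# Returns 2 when no allele line can be found in the gen file.
ref_in_nuc() {
    local gen_file="$1" nuc_file="$2" ref
    ref=$(first_allele "$gen_file")
    [ -n "$ref" ] || return 2
    awk -v ref="$ref" '$1 == ref { found = 1; exit } END { exit !found }' "$nuc_file"
}
```

A call would look like `ref_in_nuc path/to/gen_file path/to/nuc_file || echo "gen reference not found in nuc alignment"`, where both paths are placeholders for the files in your unzipped alignment directory.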

I understand, it is not possible to do everything at once.

heewookl · 2018-10-19T17:10:50Z

Hi,

I hadn't checked what was new in the 3.34.0 release. It adds an allele of DRB5, and DRB5_gen.txt is newly included in the release. I understand you don't care about DRB5 sequences, but I suggest you use the 3.33.0 release for the time being rather than modifying allele names to get around the error. I should be able to get to this in early November, along with possibly supporting MICA/B and TAP1/2.

mmaiers-nmdp · 2019-04-29T15:53:21Z

Actually, the DRB5_gen.txt that comes in Alignments_Rel_3360.zip has the right name, so everything works if you just comment out the part of scripts/formatIMGT.sh where it overwrites that file with the copy from the resources directory.

davetang · 2020-05-01T06:05:48Z

Thank you @mmaiers-nmdp. scripts/formatIMGT.sh works with release 3.40.0 if I comment out the else branch as shown below (or remove the whole code block).

if [ ! -e "$resource_dir/DRB5_gen.txt" ];then
    echo "Missing DRB5_gen.txt in the resource directory. Please git pull or git clone"
    exit 1
# else
# cp $resource_dir/DRB5_gen.txt $input_msa/.
fi
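A variant of this workaround that avoids commenting code in and out could copy the bundled file only when the downloaded release does not ship its own DRB5_gen.txt. This is a sketch under assumptions, not the repository's code: the function name `copy_drb5_if_missing` is hypothetical, the two arguments stand in for the `resource_dir` and `input_msa` variables from the snippet above, and keeping the release's own file is only reported to work in this thread for releases 3.36.0 and 3.40.0.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of formatIMGT.sh.
# Copies the bundled DRB5_gen.txt into the alignment directory only when the
# downloaded release does not already provide its own copy.
copy_drb5_if_missing() {
    local resource_dir="$1"   # directory holding the bundled DRB5_gen.txt
    local input_msa="$2"      # directory holding the unzipped IMGT alignments

    # Prefer the file that came with the release, if there is one.
    if [ -e "$input_msa/DRB5_gen.txt" ]; then
        echo "Keeping DRB5_gen.txt shipped with the release"
        return 0
    fi
    # Otherwise fall back to the bundled copy, failing loudly if it is absent.
    if [ ! -e "$resource_dir/DRB5_gen.txt" ]; then
        echo "Missing DRB5_gen.txt in the resource directory. Please git pull or git clone" >&2
        return 1
    fi
    cp "$resource_dir/DRB5_gen.txt" "$input_msa/"
}
```

It would be called as `copy_drb5_if_missing "$resource_dir" "$input_msa" || exit 1`, which keeps the original error message for a missing resource file.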

freshfischer · 2020-08-27T02:34:11Z

This problem happens because the Java code cannot identify allele "Y*01:01" correctly, due to the '*' in the first base position in the file Y_gen.txt. The script works after deleting that first alignment position.

danilovkiri · 2020-12-21T10:23:17Z

@freshfischer hi, thank you for your comment. Could you please explain what you mean by "deleting the first alignment position base"? Have I understood it correctly for the file below? Do I need to change the -1 gDNA position to 0 and remove the asterisks/G and the spaces that come before the pipe in the sequence lines starting with Y*...?

# file: Y_gen.txt
# date: 2020-10-15
# version: IPD-IMGT/HLA 3.42.0
# origin: http://hla.alleles.org/wmda/Y_gen.txt
# repository: https://raw.githubusercontent.com/ANHIG/IMGTHLA/Latest/alignments/Y_gen.t>
# author: Steven G. E. Marsh ([email protected])

 gDNA              -1
                    |
 Y*01:01           * | ATGGCGGTC GTGGCGCCCC GAACCCTCCT CCTGCTACTC TCGGGGGCCC TGGCCCTGAC>
 Y*02:01           G | --------- ---------- ---------- ---------- ---------- ---------->
 Y*03:01           * | --------- ---------- ---------- ---------- ---------- ---------->

freshfischer · 2020-12-21T10:44:04Z

For this problem, I just re-edited Y_gen.txt like this so that the Java code can parse it:

> Nuc+Gen merged MSA for Kourami
> # file: Y_gen.txt
> # date: 2020-04-20
> # version: IPD-IMGT/HLA 3.40.0
> # origin: http://hla.alleles.org/wmda/Y_gen.txt
> # repository: https://raw.githubusercontent.com/ANHIG/IMGTHLA/Latest/alignments/Y_gen.txt
> # author: WHO, Steven G. E. Marsh (steven.marsh.ac.uk)
>                    
>  gDNA              0                                                                                   
>                    |                                                                                 
>  Y*01:01            ATGGCGGTC GTGGCGCCCC GAACCCTCCT CCTGCTACTC TCGGGGGCCC TGGCCCTGAC CCAGACCTGG GCGG 
>  Y*02:01            --------- ---------- ---------- ---------- ---------- ---------- ---------- ---- 
>  Y*03:01            --------- ---------- ---------- ---------- ---------- ---------- ---------- ----
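The hand edit above could be scripted roughly as follows. This is a sketch that mimics the edit, not something confirmed by the Kourami authors: the function name `strip_leading_column` is hypothetical, and it assumes that the extra column occurs only in the block labelled `gDNA -1`, directly after the allele name, in the form of one character followed by ` | `. The pipe-only marker line is left untouched, so the column alignment of the result should be checked by eye (for example with `diff`) before the file is used.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of Kourami: mimics the hand edit shown above.
# Within the block that starts at the "gDNA -1" line (up to the next gDNA line):
#   - the position label -1 is rewritten to 0;
#   - on allele lines, the single leading character and its " | " separator
#     (for example "* | " or "G | ") are replaced by one space.
# Lines outside that block are printed unchanged.
strip_leading_column() {
    sed -E '
        /^[[:space:]]*gDNA[[:space:]]+-1[[:space:]]*$/,/^[[:space:]]*gDNA[[:space:]]/ {
            /^[[:space:]]*gDNA[[:space:]]/ s/-1[[:space:]]*$/0/
            s/^([[:space:]]*[A-Za-z0-9]+\*[0-9:A-Z]+[[:space:]]+)[^[:space:]|] [|] /\1 /
        }
    ' "$1"
}
```

The function writes to standard output, so a cautious way to use it is `strip_leading_column Y_gen.txt > Y_gen.fixed.txt`, followed by a comparison of the two files before replacing the original.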

davetang · 2021-03-01T14:00:03Z

If you use the latest update of this repository (commit 545c770) instead of the latest release tag (v0.9.6), you don't get that particular error.
