Segmentation fault (core dumped) #41

Open
gill0110 opened this issue Jul 7, 2021 · 7 comments

Comments

@gill0110

gill0110 commented Jul 7, 2021

Hi! I have done a de novo transcriptome assembly, and now I'm trying to map the assembled transcriptome to the genome of a related species. I ran the 'makeidx.pl' command in seqdb:
makeidx.pl -inp Pgenome.mfa
and then:
spaln -Q7 -O3 -dPgenome -t40 N83.Trinity.fasta -oN83.spaln
Some results are printed, but then spaln reports an error (Segmentation fault (core dumped)):

Pp06 4338791 4351464 TRINITY_DN18_c0_g1_i2 941 + 4338791 4351464 255,0,0 23 398,258,261,66,85,167,108,102,201,111,68,388,156,111,54,258,110,154,454,215,239,127,488, 0,682,1166,1512,2027,2758,3043,3926,4118,4480,5198,5469,6319,6553,6907,7620,8442,9122,10188,11045,11349,11982,12185
Pp01 28537321 28540192 TRINITY_DN28_c0_g2_i1 966 + 28537321 28540192 255,0,0 3 749,1289,601, 0,834,2270
Pp08 21474939 21479094 TRINITY_DN84_c0_g1_i2 927 - 21474939 21479094 0,255,255 19 12,641,115,60,54,60,63,44,85,81,93,65,76,66,66,60,90,63,319, 0,116,857,1055,1223,1362,1516,1656,1826,2004,2190,2369,2531,2767,2948,3127,3278,3689,3836
Segmentation fault (core dumped)

How can I resolve this?

@ogotoh
Owner

ogotoh commented Jul 9, 2021

Thank you for your report, but there is too little information to resolve the problem. First, I suggest placing all options (-oN83.spaln in your example) before the argument (N83.Trinity.fasta). Second, please select the proper species-specific parameter set with the -T option (e.g. -Thomosapi). If 'Pgenome.mfa' does not represent the entire genomic sequence, set the expected maximum gene length with the -XG option (e.g. -XG1M).
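As a rough illustration of the ordering suggested above, the sketch below assembles the command line so that every option precedes the query file. The flags are only those quoted in this thread; the function name `build_spaln_argv` and its parameters are hypothetical, and no species is passed because the right -T value for this genome is not stated here.

```python
import shlex
from typing import List, Optional


def build_spaln_argv(query: str, db: str, out: str, threads: int,
                     species: Optional[str] = None,
                     max_gene_len: Optional[str] = None) -> List[str]:
    """Assemble a spaln command line with all options before the query file.

    Hypothetical helper: only flags quoted in this thread are used.
    """
    argv = ["spaln", "-Q7", "-O3", f"-d{db}", f"-t{threads}", f"-o{out}"]
    if species:
        # e.g. species="homosapi" gives -Thomosapi, as in the example above
        argv.append(f"-T{species}")
    if max_gene_len:
        # e.g. max_gene_len="1M" gives -XG1M, as in the example above
        argv.append(f"-XG{max_gene_len}")
    argv.append(query)  # the positional argument goes last
    return argv


argv = build_spaln_argv("N83.Trinity.fasta", "Pgenome", "N83.spaln", 40)
print(shlex.join(argv))
```

The resulting list can be handed to `subprocess.run(argv)` once the appropriate -T and -XG values for the genome are known.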

By the way, how large is your genome? Spaln may fail if the genome size exceeds some threshold.

Osamu,

@ogotoh
Owner

ogotoh commented Jul 11, 2021

I have found a serious bug affecting DNA queries with the -S3 option (the default), which might be causing the segmentation faults on your side. I have just uploaded Ver.2.4.5, which fixes this error. In addition, the default setting of the formatting mode was accidentally changed in a few recent releases; the original setting has been restored in this new version. Please try the new version and, if possible, let me know your results.

Osamu,

@zfuller5280

Hello, I am getting a similar "Segmentation fault" error, but with protein sequences as the query against a whole-genome sequence (~475 Mb). I am using the most recent version of spaln, referenced above. Please let me know if I should open this as a new issue instead.

I make my genomic database as follows:
spaln -W -KP -E -t4 Amil_superscaffold.gf

and then try to get protein alignments with the following (I have tried running with and without -T option and have used different species-specific parameters, all fail at the same point):
spaln -O0 -Q7 -M1 -dAmil_superscaffold -T Mollusca -R /moto/palab/users/zlf2101/software/spaln/seqdb/Amil_superscaffold.bkp -E ${INFILE} > ${INFILE}.gff

Here $INFILE refers to my input .faa sequences. The first few lines, as an example, look like this:

>Amillepora16770-RA protein AED:0.13 eAED:0.13 QI:51|0.5|1|1|1|1|3|427|72
MTSTDIKGNRRVLKNRPPPEEPEVDLYKLPLMKSTSAQRQSIHSINGAIESHHSASTNGG
YSRKHDGGFYTC
>Amillepora16769-RA protein AED:0.01 eAED:0.01 QI:90|1|1|1|1|1|7|1213|1096
MGIQGLQEYLEANCPDACEEVDLKQVIIGQESTNENAVVLLVDTRSCLKHLYGPNTDWVC
GGQWNEMLRAVENFTRSFRQQSIQIVMYFDGEGESRKLHQWIRNQNDKRQLARQILTHVM
KMNCYPGKRLYFPPPAVETCLRLAFLSCGVSVCSSTEDLHKEMATYCLAEGYAGVISHHA
DFLIFDVPNYFSSDHLKFSKKEITTVRFKREAMLSVLQLHHDRLGLFASLLGTDFIPEEI
LGSFYWNLLGPDHPLAKVQVKEKHQPIFPANEIIITSVVSFIKSLADPHNLPWIARQVFR
SEKVDLAEITEKLENAVQHYSGLTQGKEHAPQSPGIAGRKQNEHQYQKMWQHWQQHQFVP

My output .gff file ends up being ~6.7M and the output to stdout is ~405k. Every core that gets dumped is <2G. I'm currently running on a node with 720G of memory available, so I don't think it is an out-of-memory (OOM) error. Can you please advise? Let me know if I need to provide any further information.

@ogotoh
Owner

ogotoh commented Jul 13, 2021

Dear Zach,
Thank you for your report. Here are a few things to try.

  1. Remove the -E option in both the formatting and search modes. Because it consumes a large amount of memory, I could not fully test its behavior on my personal computer at home. I will remove the -E option in future releases.
  2. In the search mode, the -R option is unnecessary. Also try without the -M1 option, which behaves somewhat differently from the other -Mn (n > 1) options.
  3. Your input file looks fine. If you still encounter a segmentation fault, please let me know the site from which your genomic sequence was downloaded, together with the part of the input aa sequence file for which the segmentation fault occurs. You can locate the problematic part by running spaln like this:
    $ spaln -O0 -Q7 -d Amil_superscaffold -T Mollusca 'query.faa (from to)'
    where from and to (from <= to) are the start and end entry numbers of query.faa.
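Once a failing range of entries has been found, the matching records have to be cut out of the query file before they can be shared. A minimal sketch of that step, assuming plain multi-line FASTA input; the helper names `fasta_records` and `slice_fasta` and the toy sequences are made up for illustration and are not part of spaln:

```python
from typing import Iterable, Iterator, List


def fasta_records(lines: Iterable[str]) -> Iterator[str]:
    """Yield each FASTA record (header line plus its sequence lines)."""
    record: List[str] = []
    for line in lines:
        if line.startswith(">"):
            if record:
                yield "".join(record)
            record = [line]
        elif record:
            # Sequence line belonging to the current record
            record.append(line)
    if record:
        yield "".join(record)


def slice_fasta(lines: Iterable[str], first: int, last: int) -> List[str]:
    """Return records numbered first..last (1-based, inclusive)."""
    if not 1 <= first <= last:
        raise ValueError("expected 1 <= first <= last")
    return [rec for i, rec in enumerate(fasta_records(lines), start=1)
            if first <= i <= last]


# Toy input, invented for this example only
DEMO = ">a\nMT\n>b\nMG\nIQ\n>c\nKM\n"
subset = slice_fasta(DEMO.splitlines(keepends=True), 2, 3)
print("".join(subset), end="")
```

For a real file, pass an open file handle instead of the toy lines and write the joined records to a new .faa file.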

Osamu

@gill0110
Author

> I have found a serious bug affecting DNA queries with the -S3 option (the default), which might be causing the segmentation faults on your side. I have just uploaded Ver.2.4.5, which fixes this error. In addition, the default setting of the formatting mode was accidentally changed in a few recent releases; the original setting has been restored in this new version. Please try the new version and, if possible, let me know your results.
>
> Osamu,

Thanks for your reply!
I downloaded Ver.2.4.5 and tried it, but it still didn't work:
free(): invalid size
Aborted (core dumped)
Then I picked 10,000 sequences at random and found that it worked. What could the problem be?
Thanks!

@ogotoh
Owner

ogotoh commented Jul 19, 2021

I tried to figure out the source of the segmentation faults. I have found a few points that might slightly improve the performance of Spaln. However, they do not seem to be relevant to the faults.

If you have time, please try the following:

  1. Run spaln again, without the -t option, on the genome and queries for which spaln failed.
  2. Run grep '^>' query.fna | grep -n last_entry, where last_entry stands for the last successfully processed query entry. Applying the 'tail' command to your output file will reveal last_entry.
  3. Let N be the entry number reported by grep -n. For simplicity, I assume N = 100 here.
  4. Run spaln -Q7 -O4 -d genome_g … -pw 'query.fna (101)'
  5. If you see the same segmentation fault as before, please send me the 101st sequence, together with the site from which you downloaded the genomic sequence.
  6. If spaln reports some results without a segmentation fault, please try spaln -Q7 -O4 -d genome_g … -pw 'query.fna (101-)'
  7. Go back to step 2.
I would appreciate your kind cooperation.

Osamu,

@ogotoh
Owner

ogotoh commented Sep 14, 2021

Although it took an unexpectedly long time, I have finished modifying spaln. Tested on more than 100 pairs of genomic and assembled transcript DNA sequences from the DDBJ database, spanning various levels of sequence similarity, the new version (Ver.2.4.6) runs without segmentation faults. Protein queries have not been tested in the same detail, but the new version works fine on a few examples, so I did not want to delay this release any further.

I thank you for your patience. If you encounter any problems with this or previous versions of spaln, please let me know at your convenience.

Osamu,


3 participants