core dumped #64

minjinhan · 2023-08-01T16:08:27Z

Hi,
We recently used spaln (version 2.4.13f) to align CDS sequences (~25,000 sequences) to a genome (~450 Mb) on a CentOS 8 system (our server has 1 TB of memory), but we encountered the following problems:

The run is interrupted with a "core dumped" message, for example:
(1) double free or corruption (!prev)
Aborted (core dumped)

(2)15046 > 179: Vmf out of range !
151127305 > 481: Vmf out of range !
free(): invalid size
Aborted (core dumped)

(3) 33753347 > 3443: Vmf out of range !
64576 > 14242: Vmf out of range !
Segmentation fault (core dumped)

(4) malloc(): invalid size (unsorted)
Aborted (core dumped)

In addition, more than 2,000 CDS sequences did not give any result. The resulting files are empty, but the following information is displayed on the terminal. I do not understand what this information means, and I do not know why these sequences did not produce any mapping hits.

BMnr21259 > 0 2664 Contig29_118 13717 13720 5.17 0.00 1310 1380 57 56 130 124
BMnr21250 < 0 1368 Contig49_118 18167 18147 17.75 5.20 658 711 28 27 68 62
BMnr21295 > 0 1242 Contig32_118 14382 14387 0.00 9.28 642 664 26 25 61 57
.....

We generated the genomic database as follows:
spaln -W -KD -E -t10 Sample118A.gf

We have tried the following commands, but the "core dumped" error still occurs:
spaln -Q4 -O0,5,6,7 -M5 -t120 -d Sample118A BMnr_CDS.fas
spaln -Q7 -O0,5,6,7 -M5 -t120 -d Sample118A BMnr_CDS.fas
spaln -Q4 -O0,5,6,7 -M300 -t120 -d Sample118A BMnr_CDS.fas
spaln -Q7 -O0,5,6,7 -M300 -t120 -d Sample118A BMnr_CDS.fas
spaln -Q4 -O0,5,6,7 -po -yX1 -M500 -S3 -LS -t120 -d Sample118A BMnr_CDS.fas
spaln -Q7 -O0,5,6,7 -po -yX1 -M500 -S3 -LS -t120 -d Sample118A BMnr_CDS.fas
....

We've also tried running with different query sequences, and it seems that some of them are causing "core dump". But there are so many CDS sequences that it's hard to figure out exactly which query sequences out of 25,000 will cause "core dump".
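One way to narrow this down is to bisect the query file: run the same command on halves of the input and recurse into any half that is killed by a signal. The following is only a minimal sketch under stated assumptions. The helper names (`read_fasta`, `crashes`, `find_crashing`) are hypothetical, and it assumes that a crash is reproducible when the offending query is run in a smaller batch. A crash that depends on the combination or order of queries would not be isolated this way.

```python
import subprocess
import tempfile
from pathlib import Path


def read_fasta(path):
    """Yield (header, sequence) for each record in a FASTA file."""
    header, lines = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if header is not None:
                    yield header, "\n".join(lines)
                header, lines = line, []
            elif header is not None and line:
                lines.append(line)
    if header is not None:
        yield header, "\n".join(lines)


def crashes(records, base_cmd):
    """Run base_cmd on a temporary FASTA file holding `records`.

    Returns True if the process was killed by a signal.
    """
    with tempfile.TemporaryDirectory() as tmp:
        fasta = Path(tmp) / "batch.fas"
        fasta.write_text("".join(f"{h}\n{s}\n" for h, s in records))
        result = subprocess.run(
            base_cmd + [str(fasta)],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    # On POSIX, a negative return code means death by signal
    # (SIGABRT for "Aborted", SIGSEGV for "Segmentation fault").
    return result.returncode < 0


def find_crashing(records, base_cmd):
    """Bisect recursively and return headers of queries that crash alone."""
    if not records or not crashes(records, base_cmd):
        return []
    if len(records) == 1:
        return [records[0][0]]
    mid = len(records) // 2
    return (find_crashing(records[:mid], base_cmd)
            + find_crashing(records[mid:], base_cmd))
```

A call could look like `find_crashing(list(read_fasta("BMnr_CDS.fas")), ["spaln", "-Q7", "-O0,5,6,7", "-M5", "-t16", "-d", "Sample118A"])`, reusing the options from the commands above with a smaller thread count. With only a few bad queries among 25,000, the number of runs grows roughly with the logarithm of the file size per bad query rather than with the number of sequences.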

Any suggestions?

Thanks,
Min-Jin

ogotoh · 2023-08-04T01:39:00Z

Thank you for your report.

I wonder whether some of the options you set might be causing trouble.

The -E option is obsolete and should not be used.
The argument to the -t option should not be larger than the number of cores in your system. Even if your machine is equipped with many cores, too many threads will soon exhaust the available memory. Please try a moderate number, say 16 or 32.
The argument to the -M option should also be moderate. I expect it (the expected number of close paralogs) to be only a few for DNA queries.
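The advice on -t and -M can be sketched as a small wrapper that clamps the thread count before building the command line. This is an illustration only. The function names are hypothetical, the cap of 32 is taken from the "16 or 32" suggestion above, the default of 4 for -M is my own reading of "only a few", and the remaining options are copied from the commands in the original report.

```python
import os


def moderate_threads(requested, cap=32):
    """Clamp a requested thread count to the core count and a moderate cap."""
    cores = os.cpu_count() or 1  # cpu_count() may return None
    return max(1, min(requested, cores, cap))


def spaln_command(db, query, requested_threads, paralogs=4):
    """Build a spaln argument list with moderate -t and -M values.

    paralogs=4 is an assumed stand-in for "only a few".
    """
    threads = moderate_threads(requested_threads)
    return [
        "spaln", "-Q7", "-O0,5,6,7",
        f"-M{paralogs}", f"-t{threads}",
        "-d", db, query,
    ]


if __name__ == "__main__":
    # Asking for 120 threads yields at most 32, and never more than the cores.
    print(" ".join(spaln_command("Sample118A", "BMnr_CDS.fas", 120)))
```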

> There are >2000 CDS sequences that did not give the result.

It is not unusual for some queries to fail to output results and instead show messages like your examples. This happens when, for some reason, the query is not similar to any part of the genomic sequence.

Although I don't think they are the major cause of your trouble, I have found a few minor bugs. I have just uploaded a new version, spaln2.4.13g, that fixes them.

Osamu,

minjinhan · 2023-08-08T02:47:55Z

Dear Osamu,

Thank you very much for your help and suggestions. It worked relatively well when I used spaln2.4.13g and set the -T parameter, but we found two other problems.

First, when searching the genomes of different samples with the same queries and parameters, a few samples still ended in a core dump.

Second, the program runs very slowly towards the end. Although I allocate several CPUs, for example 10, and the program uses all 10 at the beginning, it usually ends up using only one CPU for a very long time.

Thanks,
Min-Jin

Min-Jin Han, Associate Professor, Ph.D.
State Key Laboratory of Silkworm Genome Biology
Southwest University
No. 2 Tiansheng Road, BeiBei District
Chongqing 400715, P.R. China

ogotoh · 2023-08-10T03:03:13Z

Dear Min-Jin,

I am trying a larger scale test to find when spaln fails. Please wait a while till I can figure out the source of the trouble.

Because of the nature of the queue used by spaln, it is normal that only a small fraction of the CPUs actually work at the end of a run. A small number of queries may account for the prolonged execution.
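The tail effect can be illustrated with a toy model of a shared work queue. This is not spaln's actual scheduler, only a sketch that assumes each idle worker takes the next query in input order and that per-query run times are known in advance. The function name `simulate_queue` and the example durations are made up for illustration.

```python
import heapq


def simulate_queue(durations, workers):
    """Greedy shared queue: each idle worker takes the next job in order.

    Returns (makespan, tail), where tail is the time at the end of the
    run during which at most one worker is still busy.
    """
    # Min-heap of the times at which each worker next becomes free.
    free_at = [0.0] * workers
    heapq.heapify(free_at)
    for duration in durations:
        start = heapq.heappop(free_at)  # earliest available worker
        heapq.heappush(free_at, start + duration)
    ordered = sorted(free_at)
    makespan = ordered[-1]
    tail = makespan - ordered[-2] if workers > 1 else makespan
    return makespan, tail


if __name__ == "__main__":
    # 99 quick queries and one slow query that happens to come last.
    makespan, tail = simulate_queue([1] * 99 + [50], workers=10)
    print(f"makespan={makespan}, single-worker tail={tail}")
```

In this made-up example, ten workers finish the 99 short jobs in about 10 time units, and the single long job then keeps one worker busy for most of the remaining run.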

Osamu,

ogotoh · 2023-09-13T06:30:56Z

Dear Min-Jin,

I have just uploaded a new version of spaln, ver3.0.0. Compared with previous versions, the computation speed has been considerably improved, partly due to modified algorithms and vectorization. For DNA queries, the speed-up is most prominent with the -LS (local similarity) option. Please try the new version and, if possible, let me know what you think of it.

Osamu,

minjinhan · 2023-09-13T06:49:50Z

Sounds great, thank you!

