Reduce memory usage of scaffolding step - Limit size of SAM file from aligning contigs against complete ref db #97

ppericard · 2019-11-04T14:56:43Z

Right now, we use SortMeRNA to align contigs against the complete ref db, and we output all alignments. This can produce very large SAM files, because some highly conserved contigs align against almost every ref sequence. In the next sub-step, when reading that SAM file with Python, we load all alignments of a given contig into memory, which can lead to huge memory usage.

We can imagine several complementary solutions to reduce this RAM and disk space usage:

optimise batch SAM reading in Python, by storing only relevant data in memory
store alignments in a BAM file instead of a SAM file. We would then probably need a combination of samtools and Python libraries to read it properly.
rethink our scaffolding strategy so that we do not have to output all possible alignments. This strategy would probably reduce memory usage the most, but it changes the algorithm and needs to be well thought out in advance.
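
As an illustration of the first option, here is a minimal sketch of streaming the SAM file and keeping only a few columns per alignment. It is not the current implementation. It assumes that all alignments of one contig appear on consecutive lines of the SAM file, and that the reference name, position and CIGAR string are the only fields needed downstream. The names `SlimAlignment` and `iter_contig_alignments` are hypothetical.

```python
import io
from itertools import groupby
from typing import Iterator, List, NamedTuple, TextIO, Tuple


class SlimAlignment(NamedTuple):
    """Only the SAM columns assumed to be needed downstream (hypothetical choice)."""
    ref_name: str   # RNAME, SAM column 3
    ref_start: int  # POS, SAM column 4 (1-based)
    cigar: str      # CIGAR, SAM column 6


def iter_contig_alignments(handle: TextIO) -> Iterator[Tuple[str, List[SlimAlignment]]]:
    """Yield (contig_name, alignments) one contig at a time.

    Assumes alignments of the same contig are on consecutive lines.
    """
    # Lazily split each record; sequence, qualities and tags stay in the
    # unsplit tail (index 6) and are discarded with the line.
    records = (
        line.rstrip("\n").split("\t", 6)
        for line in handle
        if line.strip() and not line.startswith("@")  # skip blanks and header lines
    )
    for contig, group in groupby(records, key=lambda fields: fields[0]):
        # Skip unmapped records, which have "*" as reference name.
        yield contig, [
            SlimAlignment(f[2], int(f[3]), f[5]) for f in group if f[2] != "*"
        ]


# Tiny made-up SAM content, for demonstration only.
demo_sam = (
    "@HD\tVN:1.0\n"
    "contig_1\t0\tref_A\t10\t255\t50M\t*\t0\t0\tACGT\t*\n"
    "contig_1\t16\tref_B\t7\t255\t48M2S\t*\t0\t0\tACGT\t*\n"
    "contig_2\t0\tref_A\t120\t255\t30M\t*\t0\t0\tACGT\t*\n"
)

for contig, alignments in iter_contig_alignments(io.StringIO(demo_sam)):
    print(contig, len(alignments))
```

With this approach, the alignments of a single contig are still held in memory together, but each one is reduced to three small fields, and only one contig is loaded at a time.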

ppericard added the enhancement label Nov 4, 2019

ppericard added this to the Release v2.0.0 milestone Nov 4, 2019

ppericard mentioned this issue Nov 6, 2019

Memory Error #96

Closed
