Reduce memory usage of scaffolding step - Limit size of SAM file from aligning contigs against complete ref db #97

Open
3 tasks
ppericard opened this issue Nov 4, 2019 · 0 comments

ppericard commented Nov 4, 2019

Right now, we use SortMeRNA to align contigs against the complete reference database, and we output all alignments. This can produce very large SAM files, because highly conserved contigs align against almost every reference sequence. In the next sub-step, when reading that SAM file in Python, we load all alignments of a given contig into memory at once, which can lead to huge memory usage.

We can imagine several complementary solutions to reduce this RAM and disk space usage:

  • optimise batch SAM reading in Python, by storing only the relevant data in memory
  • store alignments in a BAM file instead of a SAM file. We would then probably need a combination of samtools and Python libraries to read it properly.
  • rethink our scaffolding strategy so that we do not have to output all possible alignments. This option would probably reduce memory usage the most, but it changes the algorithm and needs to be thought through carefully in advance.
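
To make the first option more concrete, here is a minimal sketch of streaming a SAM file one contig at a time while keeping only a few fields per alignment. It is an illustration, not the project's current reader: the names `SlimAlignment`, `iter_sam_records` and `iter_contig_alignments` are hypothetical, the choice of fields to keep (reference name, position, CIGAR) is an assumption about what the scaffolding step needs, and it assumes that all alignments of a contig appear on consecutive lines of the SAM file.

```python
from itertools import groupby
from typing import Iterator, List, NamedTuple, TextIO, Tuple


class SlimAlignment(NamedTuple):
    """Only the fields assumed to be needed downstream (hypothetical choice)."""
    ref_name: str
    ref_start: int  # 1-based leftmost position (SAM POS column)
    cigar: str


def iter_sam_records(handle: TextIO) -> Iterator[Tuple[str, SlimAlignment]]:
    """Yield (contig name, slim alignment) pairs, one SAM line at a time."""
    for line in handle:
        if line.startswith("@"):
            continue  # skip header lines
        # Split only as far as the CIGAR column; SEQ, QUAL and tags are dropped.
        fields = line.rstrip("\n").split("\t", 6)
        if len(fields) < 6:
            continue  # skip malformed or empty lines
        qname, flag, rname, pos, _mapq, cigar = fields[:6]
        if int(flag) & 0x4 or rname == "*":
            continue  # skip unmapped records
        yield qname, SlimAlignment(rname, int(pos), cigar)


def iter_contig_alignments(
    handle: TextIO,
) -> Iterator[Tuple[str, List[SlimAlignment]]]:
    """Group consecutive records by contig; only one group is held at a time."""
    records = iter_sam_records(handle)
    for contig, group in groupby(records, key=lambda rec: rec[0]):
        yield contig, [aln for _, aln in group]
```

With this approach, peak memory depends on the contig with the most alignments rather than on the whole file, and each stored alignment no longer carries the sequence, quality string or optional tags. A highly conserved contig would still produce a long list, so this reduces the problem without removing it.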
@ppericard ppericard added this to the Release v2.0.0 milestone Nov 4, 2019
@ppericard ppericard mentioned this issue Nov 6, 2019