Reading plaintext alignments consumes a lot of memory (workaround inside) #24
Comments
I'm using this workaround, and memory usage dropped from over 1TB to only 20GB. However, mSWEEP is still very slow to read the pseudoalignment file: it has now been reading the file for 11 CPU hours on 4 threads. The compacted alignment file is only about 250MB, so that does not sound right.
The "reading" part also includes deserializing the pseudoalignment into memory, constructing equivalence classes, and assigning the reads to the equivalence classes, so it is more than just reading the file. Still, this probably needs some design changes to better handle large input (100 000 000 reads x 60 000 references in this example).
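To make the equivalence-class step concrete, here is a minimal Python sketch, not mSWEEP's actual C++ implementation. It assumes each read is represented by the collection of reference ids it pseudoaligns to; the function name `build_equivalence_classes` and the return layout are hypothetical and chosen only for illustration.

```python
def build_equivalence_classes(alignments):
    """Group reads by the set of references they pseudoalign to.

    `alignments` maps a read id to an iterable of reference ids
    (an assumed in-memory layout, not mSWEEP's own).
    Returns (class_index, counts, read_to_class):
      class_index   maps each distinct reference set to a class number,
      counts[i]     is the number of reads in class i,
      read_to_class maps each read id to its class number.
    """
    class_index = {}
    counts = []
    read_to_class = {}
    for read_id, refs in alignments.items():
        # Reads with the same set of hits share one class,
        # regardless of the order the hits were listed in.
        key = frozenset(refs)
        if key not in class_index:
            class_index[key] = len(counts)
            counts.append(0)
        idx = class_index[key]
        counts[idx] += 1
        read_to_class[read_id] = idx
    return class_index, counts, read_to_class


# Four reads, three distinct alignment patterns.
example = {0: [1, 2], 1: [2, 1], 2: [3], 3: []}
class_index, counts, read_to_class = build_equivalence_classes(example)
```

Each read costs one hash of its full hit set in this sketch, so with 100 000 000 reads the grouping step alone is substantial work even when the input file is small.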
Relevant functions for this issue: deserializing the file, memory use in plaintext data, and equivalence classes.
v2.1.0 should contain a fix for this. I've also implemented a flag to filter out targets that have 0 alignments across reads; this can reduce memory and CPU use significantly for sparse inputs. Filtering can be toggled with
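The idea behind filtering targets with 0 alignments can be sketched in Python as follows. This is an illustration under assumptions, not the v2.1.0 code: the function name `filter_empty_targets` is hypothetical, and reads are assumed to be stored as lists of 0-based target ids.

```python
def filter_empty_targets(alignments, n_targets):
    """Drop targets that no read aligns to and renumber the rest.

    `alignments` is a list with one list of 0-based target ids per read.
    Returns (filtered, kept): `filtered` holds the same reads with target
    ids renumbered compactly, and `kept[new_id]` gives the original id.
    """
    hit = [False] * n_targets
    for refs in alignments:
        for r in refs:
            hit[r] = True
    # Original ids of targets with at least one alignment, in order.
    kept = [t for t in range(n_targets) if hit[t]]
    remap = {old: new for new, old in enumerate(kept)}
    filtered = [[remap[r] for r in refs] for refs in alignments]
    return filtered, kept


# Five targets, but only targets 0 and 3 are ever hit.
filtered, kept = filter_empty_targets([[0, 3], [3], []], n_targets=5)
```

Later steps then work with `len(kept)` targets instead of `n_targets`, which is where the savings for sparse inputs would come from.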
Reading a plaintext pseudoalignment from Themisto consumes a lot more memory than is necessary because plaintext input disables the internal encoding of the pseudoalignments as a sparse vector.
Workaround: use alignment-writer to compact the alignment file and then read in the compact alignment file instead of the plaintext one.
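As a rough back-of-the-envelope comparison of dense and sparse storage, the following Python sketch estimates memory for the input size mentioned in this issue. The storage costs are assumptions for illustration (1 bit per read-reference cell for the dense case, 8 bytes per stored hit for the sparse case), and the average of 10 hits per read is a made-up figure; none of these describe mSWEEP's actual internals.

```python
def dense_bytes(n_reads, n_refs, bits_per_cell=1):
    """Bytes needed to store every read x reference cell explicitly."""
    total_bits = n_reads * n_refs * bits_per_cell
    return (total_bits + 7) // 8  # round up to whole bytes


def sparse_bytes(n_hits, bytes_per_index=8):
    """Bytes needed to store only the positions of the hits."""
    return n_hits * bytes_per_index


n_reads, n_refs = 100_000_000, 60_000
dense = dense_bytes(n_reads, n_refs)   # 750_000_000_000 bytes (750 GB)

# Hypothetical sparsity: 10 hits per read on average.
sparse = sparse_bytes(n_reads * 10)    # 8_000_000_000 bytes (8 GB)
```

Even at one bit per cell the dense layout grows with reads x references, while the sparse layout grows only with the number of actual hits.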