Path id and clustering optimization #6

jonassibbesen · 2020-07-14T23:41:07Z

This PR contains the following major changes:

Path ids are now located during inference instead of when parsing the alignment paths. With this change, all ids are no longer stored in memory at the same time, which significantly decreases overall memory usage.
Inference path clusters are now inferred from the paths by default rather than from the reads. Read-based clustering is only needed if multi-maps are used, which is currently not supported. The clustering is now also multi-threaded.
Probabilities are now collapsed during matrix construction. This reduces peak memory usage for very large clusters.
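
For readers unfamiliar with path-based clustering, here is a minimal sketch of one way it could work. It is not the code from this PR. It assumes each path is a plain list of integer node ids and that two paths belong to the same cluster whenever they share at least one node. The names `DisjointSets` and `clusterPathsByNodes` are hypothetical, and the sketch is single-threaded for brevity.

```cpp
#include <cstddef>
#include <numeric>
#include <unordered_map>
#include <vector>

// Hypothetical union-find helper (illustration only, not from the PR).
class DisjointSets {
public:
    explicit DisjointSets(std::size_t n) : parent_(n) {
        // Every element starts as the root of its own set.
        std::iota(parent_.begin(), parent_.end(), std::size_t{0});
    }

    std::size_t find(std::size_t x) {
        while (parent_[x] != x) {
            parent_[x] = parent_[parent_[x]];  // path halving
            x = parent_[x];
        }
        return x;
    }

    void unite(std::size_t a, std::size_t b) {
        parent_[find(a)] = find(b);
    }

private:
    std::vector<std::size_t> parent_;
};

// Assumption: a path is a vector of integer node ids.
// Returns one cluster label per path; paths that share a node
// (directly or through a chain of other paths) get the same label.
std::vector<std::size_t> clusterPathsByNodes(
        const std::vector<std::vector<int>>& paths) {
    DisjointSets sets(paths.size());

    // Remember the first path seen for each node, and merge every
    // later path that visits the same node into that path's set.
    std::unordered_map<int, std::size_t> first_path_with_node;
    for (std::size_t i = 0; i < paths.size(); ++i) {
        for (int node : paths[i]) {
            auto result = first_path_with_node.emplace(node, i);
            if (!result.second) {
                sets.unite(i, result.first->second);
            }
        }
    }

    std::vector<std::size_t> labels(paths.size());
    for (std::size_t i = 0; i < paths.size(); ++i) {
        labels[i] = sets.find(i);
    }
    return labels;
}
```

Because the clusters depend only on the paths themselves, no read alignments need to be scanned, which matches the description above of read-based clustering being needed only for multi-maps.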

jonassibbesen added 8 commits June 26, 2020 19:10

restructure clustering

1f8ab2b

use only node clustering

7aa46c2

id refactor

348edae

compacted construct

f1f6dde

faster clustering

680346d

slower but lower memory clustering

93f92e9

thread clustering

0b13cd4

fix unit tests

0324839

jonassibbesen merged commit 1bc29e7 into master Jul 14, 2020

jonassibbesen deleted the clustering-optim branch July 21, 2020 22:03

CarlosAmadeo7 mentioned this pull request Nov 2, 2024

Inflate operation failed: invalid distance too far back terminate called after throwing an instance of 'std::runtime_error' #64

Open
