Parallelise collapse_paralogs? #126

johnlees · 2021-07-28T13:51:25Z

I'm running this and finding the "Processing paralogs..." stage quite slow:

Processing paralogs...
  2%|██▉                                                    | 92/3815 [1:37:47<153:39:25, 148.58s/it]

It looks single-threaded in the code – is it possible to do this in a multiprocessing loop, or are there interactions which make this difficult?


gtonkinhill · 2021-07-29T04:42:10Z

This is definitely a bottleneck in the pipeline. It is not trivial to parallelise as the processing of one paralogous family can depend on the result of another.

The algorithm initially attempts to collapse paralogous genes by identifying the nearest neighbour of a paralogous gene within the graph and collapsing them if they are from separate samples. This is tricky to parallelise, as you need to be sure that a previous step has not already included a gene from the same sample in that cluster.
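To make the ordering problem concrete, here is a minimal sketch of that kind of greedy step. It is an illustration only, not the actual collapse_paralogs code: the function name, the precomputed `nearest_cluster` lookup and the `cluster_samples` bookkeeping are all hypothetical, and the shortest-path search is assumed to have been done already.

```python
from typing import Dict, List, Set, Tuple


def collapse_by_nearest(
    paralogs: List[Tuple[str, str]],       # (gene_id, sample_id), processed in order
    nearest_cluster: Dict[str, str],       # hypothetical: nearest cluster for each gene
    cluster_samples: Dict[str, Set[str]],  # samples already represented per cluster
) -> Tuple[Dict[str, str], List[str]]:
    """Greedily merge each gene into its nearest cluster unless that
    cluster already holds a gene from the same sample."""
    merged: Dict[str, str] = {}
    deferred: List[str] = []  # genes left for a fallback such as gene context
    for gene, sample in paralogs:
        target = nearest_cluster[gene]
        if sample in cluster_samples[target]:
            # An earlier iteration already placed this sample in the cluster.
            deferred.append(gene)
        else:
            cluster_samples[target].add(sample)
            merged[gene] = target
    return merged, deferred


paralogs = [("g1", "s1"), ("g2", "s1"), ("g3", "s2")]
nearest = {"g1": "A", "g2": "A", "g3": "A"}

# Same input, two processing orders, two different outcomes.
forward = collapse_by_nearest(paralogs, nearest, {"A": set()})
backward = collapse_by_nearest(paralogs[::-1], nearest, {"A": set()})
```

In `forward`, g2 is the gene that gets deferred; in `backward` it is g1. Each iteration reads state written by the ones before it, so the result depends on processing order, and splitting the loop across workers without coordination would give different answers from run to run.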

When this approach fails, the algorithm defaults to using gene context. This stage would be much easier to parallelise and is already faster, as it does not rely on calculating shortest paths. I was hoping to experiment with how much the results would change if we only used this approach, but I might not get a chance for a couple of months.

johnlees · 2021-07-29T09:46:58Z

To keep using the first approach, could you use some shared memory to mark genes which have been included? The shared memory manager in Python 3.8 has made this kind of thing easier to get to work in poppunk.
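Roughly what I have in mind, as a sketch only: one byte per (cluster, sample) slot in a shared block, claimed under a lock. The slot layout and the `try_claim` helper are hypothetical, and to keep it short this uses `multiprocessing.shared_memory.SharedMemory` directly and attaches a second handle in the same process instead of starting real workers or a `SharedMemoryManager`.

```python
from multiprocessing import Lock
from multiprocessing.shared_memory import SharedMemory


def try_claim(buf, slot, lock):
    """Mark `slot` as used; return True only for the first claimant."""
    with lock:
        if buf[slot]:
            return False
        buf[slot] = 1
        return True


def demo():
    n_clusters, n_samples = 2, 3
    n_slots = n_clusters * n_samples
    lock = Lock()
    shm = SharedMemory(create=True, size=n_slots)
    try:
        shm.buf[:n_slots] = bytes(n_slots)  # explicitly zero the flags
        # A worker process would attach by name; here the second handle
        # lives in the same process purely for illustration.
        view = SharedMemory(name=shm.name)
        try:
            slot = 0 * n_samples + 1  # cluster 0, sample 1
            first = try_claim(shm.buf, slot, lock)
            second = try_claim(view.buf, slot, lock)
        finally:
            view.close()
    finally:
        shm.close()
        shm.unlink()
    return first, second
```

The second claim sees the flag set through the other handle and backs off. This only stops two genes from the same sample landing in one cluster; which of them wins would still depend on worker timing, so the output may not be deterministic.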

Just a suggestion though, sounds like this was already on your radar!

gtonkinhill self-assigned this Jul 29, 2021

gtonkinhill added the enhancement New feature or request label Jul 29, 2021

rbeiko mentioned this issue Oct 24, 2022

Question about multithreading during paralog phase #207

Closed

nzmacalasdair mentioned this issue Apr 13, 2024

Increase in processing time #284

Closed
