Hey! I've been running pileups, both observed and observed/expected, with `cooltools.pileup`. I noticed that obs/exp pileups are generated 20x slower than obs pileups (at least for my data) when setting `nproc=8`: it took 40s to create a single obs/exp pileup and only 2s to create a single obs pileup. But when setting `nproc=1`, the speed is almost the same: both pileups took ~12s to run.
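For reference, here is a minimal sketch of how I time the two kinds of pileups. The file paths, feature format, and flank size are placeholders from my setup, not something the report depends on:

```python
import time

import cooler
import cooltools
import pandas as pd

clr = cooler.Cooler("my_data.mcool::resolutions/10000")          # placeholder cooler
features = pd.read_table(
    "features.bed", names=["chrom", "start", "end"]              # placeholder features
)
expected_df = pd.read_table("expected_cis.tsv")                  # precomputed expected table

# Observed pileup: fast with nproc=8 (~2s for my data).
t0 = time.perf_counter()
obs = cooltools.pileup(clr, features, flank=100_000, nproc=8)
print(f"obs pileup:     {time.perf_counter() - t0:.1f}s")

# Observed/expected pileup: ~20x slower with nproc=8 (~40s for my data).
t0 = time.perf_counter()
obs_exp = cooltools.pileup(clr, features, expected_df=expected_df, flank=100_000, nproc=8)
print(f"obs/exp pileup: {time.perf_counter() - t0:.1f}s")
```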
I explored the issue, and it seems like the problem lies with the multiprocessing provider: cooltools uses the `multiprocess` library internally, while I also tried `multiprocessing` and `concurrent.futures` from the standard library. The benchmarking results for obs/exp pileup creation are below:
| Provider of the map functor | nproc | time |
| --- | --- | --- |
| stdlib (`map`) | 1 | 11.5s |
| `concurrent.futures.ProcessPoolExecutor` | 1 | 15.8s |
| `concurrent.futures.ProcessPoolExecutor` | 8 | 4.52s |
| `multiprocess` | 8 | 21s |
| `multiprocessing` | 8 | 3.31s |
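To give an idea of the kind of comparison behind the table, here is a stand-alone sketch that mimics shipping an `expected_df`-sized payload to workers through each provider. The payload and the worker function are made up for illustration; the numbers above come from timing `cooltools.pileup` itself with the map provider swapped out:

```python
import time
from functools import partial

import numpy as np
import pandas as pd


def worker(i, df):
    """Trivial per-task work; the interesting cost is moving `df` to the child process."""
    return float(df["balanced.avg"].values[i % len(df)])


def bench(map_like, payload, label):
    t0 = time.perf_counter()
    list(map_like(partial(worker, df=payload), range(64)))
    print(f"{label}: {time.perf_counter() - t0:.2f}s")


if __name__ == "__main__":
    # A DataFrame comparable in spirit to expected_df: large, and sent to every worker.
    payload = pd.DataFrame(
        {"dist": np.arange(1_000_000), "balanced.avg": np.random.rand(1_000_000)}
    )

    bench(map, payload, "stdlib map, nproc=1")

    from concurrent.futures import ProcessPoolExecutor
    with ProcessPoolExecutor(max_workers=8) as ex:
        bench(ex.map, payload, "concurrent.futures.ProcessPoolExecutor, nproc=8")

    import multiprocessing
    with multiprocessing.Pool(8) as pool:
        bench(pool.map, payload, "multiprocessing.Pool, nproc=8")

    import multiprocess
    with multiprocess.Pool(8) as pool:
        bench(pool.map, payload, "multiprocess.Pool, nproc=8")
```

(Process start methods differ across platforms, which can affect the absolute numbers.)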
Both `multiprocessing` and `concurrent.futures` outperform `multiprocess` with `nproc=8`. Also note that the stdlib `map` in single-core mode is ~2x faster than `multiprocess.Pool.map` with `nproc=8`. I suspect the reason lies in how `expected_df` is serialized and/or passed to the child processes in `multiprocess`.
`multiprocessing` uses `pickle` for serialization, while `multiprocess` uses `dill`. I checked that the serialization speed and byte length of the serialized `expected_df` are very similar between `dill` and `pickle`, so the problem might instead be in how `multiprocess` implements `Pool` or `Queue`.
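For completeness, the serializer check was along these lines; the DataFrame below is just a stand-in for the real expected table:

```python
import pickle
import time

import dill
import numpy as np
import pandas as pd

# Stand-in for the real expected_df (the actual one is a cooltools cis expected table).
expected_df = pd.DataFrame(
    {
        "region1": "chr1",
        "region2": "chr1",
        "dist": np.arange(500_000),
        "balanced.avg": np.random.rand(500_000),
    }
)

for name, module in [("pickle", pickle), ("dill", dill)]:
    t0 = time.perf_counter()
    blob = module.dumps(expected_df)
    elapsed = time.perf_counter() - t0
    print(f"{name}: {len(blob) / 1e6:.1f} MB in {elapsed * 1000:.1f} ms")
```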
Anyway, it seems like changing the multiprocessing framework would be beneficial for the performance of cooltools, especially when using the CLI. Within the API, users can provide a custom `map_functor` (but it is currently broken, see #535). I would be glad to start a discussion on this matter.
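For concreteness, this is the kind of call I have in mind once #535 is fixed. It assumes `cooltools.pileup` exposes the `map_functor` keyword mentioned above and accepts any map-like callable, so treat it as a sketch rather than working code:

```python
from concurrent.futures import ProcessPoolExecutor

import cooler
import cooltools
import pandas as pd

clr = cooler.Cooler("my_data.mcool::resolutions/10000")            # placeholder
features = pd.read_table("features.bed", names=["chrom", "start", "end"])
expected_df = pd.read_table("expected_cis.tsv")

# Hypothetical: route the parallel work through a stdlib-backed executor
# instead of multiprocess.Pool.map.
with ProcessPoolExecutor(max_workers=8) as executor:
    obs_exp = cooltools.pileup(
        clr,
        features,
        expected_df=expected_df,
        flank=100_000,
        map_functor=executor.map,  # assumed keyword; currently broken per #535
    )
```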
P.S. I don't know how reproducible the issue is in general, but it is consistent for my data: 8 coolers, 2 different resolutions, and various numbers of features and `nproc` values.