Parallelization #46
Comments
Hi @ale94mleon, it's possible to parallelize the run method of ProLIF, and it's something I plan on including in the code at some point. In the meantime, here's a script to do that:
import multiprocessing as mp
from tqdm.auto import tqdm
import prolif as plf
import MDAnalysis as mda
# setup the mda.Universe, lig and prot selections
# ...
# parameters for the parallel run
N_PROCESSES = 8
frames = list(range(u.trajectory.n_frames))
interactions = ['HBDonor', 'HBAcceptor', 'PiStacking', 'Anionic', 'Cationic', 'CationPi', 'PiCation']
# run in parallel
def job(frame):
    fp = plf.Fingerprint(interactions)
    fp.run(u.trajectory[frame:frame+1], lig, prot, progress=False)
    return fp.ifp[0]

with mp.Pool(N_PROCESSES) as pool:
    results = []
    # trigger MDAnalysis caching
    lig.convert_to.rdkit()
    prot.convert_to.rdkit()
    for ifp in tqdm(pool.imap_unordered(job, frames),
                    total=len(frames)):
        results.append(ifp)

df = plf.to_dataframe(results, interactions)
This will run on all frames of your trajectory. If you only want a subset of the trajectory, make sure to change
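One detail of the script above is that `imap_unordered` yields results in completion order, so the collected list is not guaranteed to follow frame order. Below is a minimal sketch of one way to restore it, by returning the frame index alongside each result and sorting afterwards. The names `job` (here a placeholder, not the ProLIF computation), `run_in_frame_order` and `n_interactions` are hypothetical, and `ThreadPool` is used only as a stand-in that shares the `Pool` interface.

```python
from multiprocessing.pool import ThreadPool  # same interface as multiprocessing.Pool


def job(frame):
    """Hypothetical stand-in for the per-frame fingerprint computation."""
    # Return the frame index with the result so the order can be restored later.
    return frame, {"Frame": frame, "n_interactions": frame % 3}


def run_in_frame_order(pool, frames):
    """Run `job` over `frames` and return the results sorted by frame index."""
    tagged = pool.imap_unordered(job, frames)
    return [result for _, result in sorted(tagged, key=lambda pair: pair[0])]


if __name__ == "__main__":
    # ThreadPool keeps the sketch runnable anywhere; for CPU-bound work,
    # an mp.Pool(N_PROCESSES) instance can be passed in instead.
    with ThreadPool(4) as pool:
        results = run_in_frame_order(pool, range(10))
    print([r["Frame"] for r in results])
```

Using `pool.imap` instead of `pool.imap_unordered` would also preserve order, at the cost of results being delivered only as fast as the slowest earlier frame allows.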
Cool! This looks very nice. Thanks @cbouy!!
## [1.0.0] - 2022-06-07

### Added
- Support for multiprocessing, enabled by default (Issue #46). The number of processes can be controlled through `n_jobs` in `fp.run` and `fp.run_from_iterable`.
- New interaction: van der Waals contact, based on the sum of vdW radii of two atoms.
- Saving/loading the fingerprint object as a pickle with `fp.to_pickle` and `Fingerprint.from_pickle` (Issue #40).

### Changed
- Molecule suppliers can now be indexed, reused and can return their length, instead of being single-use generators.

### Fixed
- ProLIF can now be installed through pip and conda (Issue #6).
- If no interaction is detected in the first frame, `to_dataframe` will not complain about a `KeyError` anymore (Issue #44).
- When creating a `plf.Fingerprint`, unknown interactions will no longer fail silently.

Something I noticed when trying to create ProLIF molecules is that the user-assigned RDKit mol property 'map index' was missing if I used mp.Pool. I imagine this is the case for other user-assigned properties, if they exist. I believe this issue arose from the pickling of the molecule objects when multiprocessing is run. I fixed this by running:
Just to come back to this: it seems the solution I posted above has its issues. If I try to access the map index property on a mol run through multiprocessing (with Chem DefaultPickleProperties set to All), the map index is available, but it doesn't correspond to the correct atomic numbering in the input file. If I do the same without multiprocessing, the atomic numbering is correct.
That doesn't sound right! Thanks for reporting it, I'll try to have a look soon |
This is not an issue; it is more of a question/suggestion. Is it possible to parallelize the `run` method of the `Fingerprint` class? I looked at the MDAnalysis documentation, and this is not so straightforward because of how `MDAnalysis.core.universe.Universe.trajectory` is designed. But I also read about PMDA. So, shouldn't it be possible to incorporate parallelization into ProLIF? I think this feature would really improve the package and its usability.
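One common approach to parallel trajectory analysis is to split the frame indices into contiguous blocks, one per worker, so that each worker reads only its own slice of the trajectory. The sketch below illustrates just the splitting step using the standard library. `split_frames` is a hypothetical helper, not part of ProLIF, MDAnalysis or PMDA.

```python
def split_frames(n_frames, n_blocks):
    """Split range(n_frames) into at most n_blocks contiguous, near-equal ranges."""
    if n_frames < 0 or n_blocks < 1:
        raise ValueError("n_frames must be >= 0 and n_blocks must be >= 1")
    base, extra = divmod(n_frames, n_blocks)
    blocks, start = [], 0
    for i in range(n_blocks):
        # The first `extra` blocks each take one additional frame.
        size = base + (1 if i < extra else 0)
        if size == 0:
            break  # fewer frames than blocks: stop once the frames run out
        blocks.append(range(start, start + size))
        start += size
    return blocks


if __name__ == "__main__":
    for block in split_frames(10, 4):
        print(block.start, block.stop)
```

Each block could then be handed to a worker as a trajectory slice of the form `u.trajectory[block.start:block.stop]`, the same slicing form used in the script posted in this thread.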