Parallelization #46

Closed
ale94mleon opened this issue Mar 15, 2022 · 5 comments · Fixed by #128
Labels
enhancement New feature or request

Comments

@ale94mleon

ale94mleon commented Mar 15, 2022

This is not an issue so much as a question/suggestion. Is it possible to parallelize the run method of the Fingerprint class? I looked through the MDAnalysis documentation, and it is not straightforward because of how MDAnalysis.core.universe.Universe.trajectory is designed. But I also read about PMDanalysis. So shouldn't it be possible to incorporate parallelization into ProLIF? I think this feature would really improve the package and its usability.

@cbouy
Member

cbouy commented Mar 15, 2022

Hi @ale94mleon,

It's possible to parallelize the run method of ProLIF, and it's something I plan on including in the code at some point. In the meantime, here's a script to do that:

import multiprocessing as mp
from tqdm.auto import tqdm
import prolif as plf
import MDAnalysis as mda

# setup the mda.Universe, lig and prot selections
# ...

# parameters for the parallel run
N_PROCESSES = 8
frames = list(range(u.trajectory.n_frames)) 
interactions = ['HBDonor', 'HBAcceptor', 'PiStacking', 'Anionic', 'Cationic', 'CationPi', 'PiCation']

# run in parallel 
def job(frame):
    fp = plf.Fingerprint(interactions)
    fp.run(u.trajectory[frame:frame+1], lig, prot, progress=False)
    return fp.ifp[0]

with mp.Pool(N_PROCESSES) as pool:
    results = []
    # trigger MDAnalysis caching
    lig.convert_to.rdkit()
    prot.convert_to.rdkit()
 
    for ifp in tqdm(pool.imap_unordered(job, frames),
                    total=len(frames)):
        results.append(ifp)

df = plf.to_dataframe(results, interactions)

This will run on all frames of your trajectory. If you only want a subset of the trajectory, change frames = list(range(u.trajectory.n_frames)) to what you need.
It will run 8 processes in parallel; adjust N_PROCESSES according to your machine.
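As a minimal sketch of the two adjustments mentioned above, the snippet below builds a subset of frame indices and puts results back in frame order. Restoring the order matters because `pool.imap_unordered` yields results in completion order rather than submission order. The helper names `select_frames` and `restore_frame_order` are hypothetical and are not part of ProLIF or MDAnalysis; the example uses only the standard library and placeholder strings in place of real fingerprints.

```python
def select_frames(n_frames, start=0, stop=None, step=1):
    """Return the frame indices to analyse, clipped to the trajectory length."""
    return list(range(n_frames)[start:stop:step])


def restore_frame_order(pairs):
    """Sort (frame_index, result) pairs that arrived in completion order."""
    return [result for _, result in sorted(pairs, key=lambda pair: pair[0])]


# Every 10th frame among the first 100 frames of a 250-frame trajectory
frames = select_frames(250, stop=100, step=10)

# Stand-in for what a pool might yield: results arrive in arbitrary order
unordered = [(20, "ifp-20"), (0, "ifp-0"), (10, "ifp-10")]
ordered = restore_frame_order(unordered)
```

For the sorting step to apply to the script above, `job` would have to return the frame index alongside its result, for example `(frame, fp.ifp[0])`; that change is an assumption of this sketch, not something the original script does.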

@ale94mleon
Author

Cool! This looks very nice. Thanks @cbouy !!

cbouy added the enhancement label on Mar 15, 2022
cbouy mentioned this issue on Jun 5, 2022
cbouy added a commit that referenced this issue on Jun 7, 2022
## [1.0.0] - 2022-06-07

### Added
- Support for multiprocessing, enabled by default (Issue #46). The number of processes can
  be controlled through `n_jobs` in `fp.run` and `fp.run_from_iterable`.
- New interaction: van der Waals contact, based on the sum of vdW radii of two atoms.
- Saving/loading the fingerprint object as a pickle with `fp.to_pickle` and
  `Fingerprint.from_pickle` (Issue #40).
### Changed
- Molecule suppliers can now be indexed, reused and can return their length, instead of
  being single-use generators.
### Fixed
- ProLIF can now be installed through pip and conda (Issue #6).
- If no interaction is detected in the first frame, `to_dataframe` will not complain about
  a `KeyError` anymore (Issue #44).
- When creating a `plf.Fingerprint`, unknown interactions will no longer fail silently.
@noahharrison64

Something I noticed when creating ProLIF molecules is that the user-assigned RDKit mol property 'map index' was missing when I used mp.Pool. I imagine the same applies to any other user-assigned properties. I believe this happens because the molecule objects are pickled when multiprocessing is used. I fixed it by running:
Chem.SetDefaultPickleProperties(Chem.PropertyPickleOptions.AllProps)
Thought I'd just point this out in case this was something you weren't aware of!

@noahharrison64

noahharrison64 commented Jul 8, 2022

Just to come back to this: the solution I posted above seems to have its own issues. If I access the map index property on a mol that went through multiprocessing (with the Chem default pickle properties set to AllProps), the map index is available but doesn't correspond to the correct atom numbering in the input file. If I do the same without multiprocessing, the atom numbering is correct.

@cbouy
Member

cbouy commented Jul 10, 2022

That doesn't sound right! Thanks for reporting it, I'll try to have a look soon
