potential performance improvements for H-bond analysis 'between' #4130
Comments
Hello, I'd be happy to try and work on a PR for this if it is okay? I'm not a GSOC person though, just keen to start contributing to some open source software. Assuming yes, my understanding of the inefficiency is: Inside the function:
If a user has provided a selection to the input However, before this filtering, some calculations are run on atoms that will later be filtered.
The solution that I think you would like to see is:
I have had a look at your documentation on how to contribute, and it's very clear, so thank you for that. In summary, is it okay for me to start working on this, and if so, have I understood the issue correctly? If so, I'll start working on a solution and a PR. Thanks!
@RMCrean you're more than welcome to work on the issue, outside of GSOC. The tag just means that we think it's suitable for potential GSOC candidates. I think your understanding of the problem is generally correct but @p-j-smith is the expert here. I don't think that code in Please put up a PR! (EDIT: and please ensure that the PR mentions this issue so that the PR gets auto-linked with the issue) |
Great, thanks. I'll start working on this later today then! Thanks for the point about |
Sounds like a decent start. I suggest you just put up a PR with what you have. You can mark it as draft if you like but it's generally a lot easier to discuss when code is visible. For doing performance comparison, I would create two virtual environments ( You can paste the code that you use for comparison into the comments of the PR. We also have ASV benchmarking code in benchmarks/benchmarks that are run nightly. These benchmarks are published at https://www.mdanalysis.org/benchmarks/ and it would be really great to have some for hydrogen bond analysis. |
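For reference, ASV picks up methods whose names start with time_ and calls setup() before timing them. Below is a rough sketch of that layout, not a ready-made MDAnalysis benchmark: the class name HBondBetweenBench, the two count_* functions, and the 1D toy coordinates are all hypothetical stand-ins for the real analysis. The quick_compare helper only uses timeit so the two variants can be compared locally without asv.

```python
import timeit


def count_filter_after(xs, selected, cutoff):
    """Find all close pairs first, then discard pairs outside the selection."""
    pairs = [
        (i, j)
        for i, xi in enumerate(xs)
        for j, xj in enumerate(xs)
        if i < j and abs(xi - xj) <= cutoff
    ]
    return sum(1 for i, j in pairs if i in selected and j in selected)


def count_filter_before(xs, selected, cutoff):
    """Restrict to the selection first, then search for close pairs."""
    kept = [xs[i] for i in sorted(selected)]
    return sum(
        1
        for a, xi in enumerate(kept)
        for xj in kept[a + 1:]
        if abs(xi - xj) <= cutoff
    )


class HBondBetweenBench:
    """Hypothetical ASV-style benchmark: setup() runs before each time_* method."""

    params = [100, 400]
    param_names = ["n_atoms"]

    def setup(self, n_atoms):
        # Toy system: atoms on a line, every tenth atom is "selected".
        self.xs = [float(i) for i in range(n_atoms)]
        self.selected = set(range(0, n_atoms, 10))

    def time_filter_after(self, n_atoms):
        count_filter_after(self.xs, self.selected, 10.0)

    def time_filter_before(self, n_atoms):
        count_filter_before(self.xs, self.selected, 10.0)


def quick_compare(n_atoms=200, number=3):
    """Time both variants locally with timeit (no asv needed)."""
    bench = HBondBetweenBench()
    bench.setup(n_atoms)
    return {
        name: timeit.timeit(lambda m=getattr(bench, name): m(n_atoms), number=number)
        for name in ("time_filter_after", "time_filter_before")
    }
```

In a real benchmark, setup() would build the Universe and the analysis object, and the time_* methods would call its run().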
I would like to work on this if no one else is working on it right now. @orbeckst
So, here is what I have found.
I ran cProfile to check which part was relatively taking the most cumulative time. Here is the result The
takes the most time, enumerating all possible donors in the found hydrogens. To leverage the Please let me know if my understanding is correct so I can work on it further. @orbeckst |
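For anyone who wants to repeat this kind of profiling, here is a minimal sketch of collecting a cProfile report sorted by cumulative time. The functions slow_inner and analysis are toy placeholders (assumptions for illustration), not the hydrogen bond code; only the cProfile and pstats calls are standard library API.

```python
import cProfile
import io
import pstats


def slow_inner(n):
    """Toy hot spot standing in for an expensive inner step."""
    return sum(i * i for i in range(n))


def analysis(n=20000):
    """Toy driver standing in for an analysis run."""
    return [slow_inner(n) for _ in range(5)]


def profile_cumulative(func, top=10):
    """Profile func() and return a text report sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(top)
    return stream.getvalue()
```

Calling print(profile_cumulative(analysis)) lists the functions with the largest cumulative time first, which is the column to read when looking for the dominant step.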
Without directly looking at the code I don't know if this will solve the problem. Try it out and open a PR. Check that your changes don't break the tests. Benchmark.
Is your feature request related to a problem?
In the review of PR #4092, @richardjgowers observed that there's an opportunity for improving the performance of the "between" keyword of HydrogenBondAnalysis #4092, namely to filter atomgroups for "between" before capped_distances() is called to calculate distances.
Describe the solution you'd like
Follow up on the suggestion and implement the suggested improvement and benchmark the old code vs the updated one.
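To make the idea concrete, here is a minimal sketch in plain Python of filtering before versus after the distance search. It is not the MDAnalysis implementation: the atoms are toy tuples, pairs_within is a brute-force stand-in for capped_distances(), and the names allowed, filter_after, and filter_before are hypothetical. A counter records how many distances each variant evaluates.

```python
import math

# Toy atoms as (index, group label, position); not MDAnalysis objects.
DONORS = [
    (0, "protein", (0.0, 0.0, 0.0)),
    (1, "SOL", (5.0, 0.0, 0.0)),
    (2, "lipid", (9.0, 0.0, 0.0)),
]
ACCEPTORS = [
    (3, "SOL", (1.0, 0.0, 0.0)),
    (4, "protein", (5.5, 0.0, 0.0)),
    (5, "lipid", (9.5, 0.0, 0.0)),
]
BETWEEN = [("protein", "SOL")]


def allowed(d_label, a_label, between):
    """True if the donor/acceptor groups match one of the between pairs."""
    return any({d_label, a_label} == {g1, g2} for g1, g2 in between)


def pairs_within(donors, acceptors, cutoff, counter):
    """Brute-force stand-in for a capped distance search; counts evaluations."""
    pairs = []
    for d in donors:
        for a in acceptors:
            counter[0] += 1
            if math.dist(d[2], a[2]) <= cutoff:
                pairs.append((d, a))
    return pairs


def filter_after(donors, acceptors, cutoff, between):
    """Search all atoms, then drop pairs not covered by between."""
    counter = [0]
    pairs = pairs_within(donors, acceptors, cutoff, counter)
    kept = [(d[0], a[0]) for d, a in pairs if allowed(d[1], a[1], between)]
    return sorted(kept), counter[0]


def filter_before(donors, acceptors, cutoff, between):
    """Drop atoms that appear in no between pair, then search."""
    labels = {g for pair in between for g in pair}
    donors = [d for d in donors if d[1] in labels]
    acceptors = [a for a in acceptors if a[1] in labels]
    counter = [0]
    pairs = pairs_within(donors, acceptors, cutoff, counter)
    # Still needed: the union of labels can admit combinations not listed in between.
    kept = [(d[0], a[0]) for d, a in pairs if allowed(d[1], a[1], between)]
    return sorted(kept), counter[0]
```

With a cutoff of 3.0, both variants return the same pairs for this toy system, but filter_before evaluates 4 distances instead of 9.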
Describe alternatives you've considered
Do nothing; that's fine, the update is not mission-critical.
Additional context
Discussion #4092 around mdanalysis/package/MDAnalysis/analysis/hydrogenbonds/hbond_analysis.py (line 735 in 3ebd5ec):
@richardjgowers: [...] would it make more sense to filter before the call to capped_distances?
mdanalysis/package/MDAnalysis/analysis/hydrogenbonds/hbond_analysis.py (line 712 in 3ebd5ec)
@p-j-smith: good point - that would definitely be better, but it's probably not a straightforward change and I imagine there are lots of ways to do it. One way would be to iterate over the pairs of atom selections passed to between and calculate donor-acceptor distances for each pair. So e.g. if between=[["protein", "SOL"], ["protein", "protein"]], we would calculate the donor-acceptor distances for each of those pairs and concatenate them.
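The per-pair approach could look roughly like the sketch below. This is an illustration under assumptions, not the library code: atoms are toy (index, group label, position) tuples, close_pairs is a brute-force stand-in for the distance search, and between_pairs is a hypothetical helper. Each between pair is searched in both donor/acceptor directions, and the concatenated results are de-duplicated in case selections overlap.

```python
import math


def close_pairs(donors, acceptors, cutoff):
    """Brute-force stand-in for a capped distance search."""
    return [
        (d[0], a[0])
        for d in donors
        for a in acceptors
        if math.dist(d[2], a[2]) <= cutoff
    ]


def between_pairs(donors, acceptors, between, cutoff):
    """Search each between pair separately and concatenate the results."""
    found = []
    for g1, g2 in between:
        # Either group may supply the donor, so check both directions.
        for don_label, acc_label in sorted({(g1, g2), (g2, g1)}):
            sub_donors = [d for d in donors if d[1] == don_label]
            sub_acceptors = [a for a in acceptors if a[1] == acc_label]
            found.extend(close_pairs(sub_donors, sub_acceptors, cutoff))
    # De-duplicate in case the same pair is reachable via several entries.
    return sorted(set(found))


# Toy system (hypothetical data).
donors = [
    (0, "protein", (0.0, 0.0, 0.0)),
    (1, "SOL", (5.0, 0.0, 0.0)),
    (2, "lipid", (9.0, 0.0, 0.0)),
]
acceptors = [
    (3, "SOL", (1.0, 0.0, 0.0)),
    (4, "protein", (5.5, 0.0, 0.0)),
    (5, "lipid", (9.5, 0.0, 0.0)),
    (6, "protein", (0.0, 2.0, 0.0)),
]
between = [["protein", "SOL"], ["protein", "protein"]]
result = between_pairs(donors, acceptors, between, cutoff=3.0)
```

Here result contains the protein-SOL pairs (0, 3) and (1, 4) plus the protein-protein pair (0, 6); the nearby lipid-lipid pair is never searched at all.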