Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactoring HydrogenBondAnalysis #2238

Closed
orbeckst opened this issue Apr 6, 2019 · 7 comments
Closed

refactoring HydrogenBondAnalysis #2238

orbeckst opened this issue Apr 6, 2019 · 7 comments

Comments

@orbeckst
Copy link
Member

orbeckst commented Apr 6, 2019

Is your feature request related to a problem? Please describe.
The analysis.hbonds.hbond_analysis.HydrogenBondAnalysis class is very useful and widely used but the code is messy and old and thus hard to maintain, to debug (see e.g. #1687), or to extend. It also does not follow the Analysis API. Performance is also not stellar (e.g. does not use capped_distances())

Describe the solution you'd like
Refactor/rewrite, ideally without breaking the API, which includes

  • signature of HydrogenBondAnalysis
  • stored time series (attributes timeseries and table
  • convenience functions such as count_by_time(), count_by_type(), timesteps_by_type()

The data structures should be clean, see #2177 for a discussion.

The class should implement the Analysis API ("Bauhaus") from #719.

Describe alternatives you've considered

  1. Leave the old code in place. This has the advantage that all the users that have been using this piece of code heavily won't get their code broken... See @kain88-de 's The format of timerseries for hydrogen bond and water bridge analysis #2177 (comment)

Before you make any changes to the water bridge output please consider [HydrogenBondAnalysis] is a very widely used feature of MDAnalysis. It is the most frequently cited analysis in papers! I don’t have exact numbers but I looked at almost every paper this year citing MDAnalysis.

Maybe we keep the old time series and add the improved version with a new name.

  1. Keep old version of the code and deprecated it for release 2.0 (!), i.e., keep it around forever in maintenance mode. Add the new module under a different name.

Current work

Currently there are two PRs with different solutions

@orbeckst
Copy link
Member Author

orbeckst commented Apr 6, 2019

@xiki-tempula you said about the new proposed HydrogenBondAnalysis (PR #2237 by @p-j-smith ) #2237 (comment)

The code looks really nice, the capped distance is a very neat way of finding the atoms compared with the old neighbour search. I will try to use capped_distance in the water bridge we well. I guess the only problem is that the interface and the outputs are very different from the previous one. Personally, I hate the original representation as well.

Do you think there is a way that you can build/refactor WaterBridgeAnalysis on the new class?

I think we are starting a discussion about what the API of a new HydrogenBondAnalysis ought to look like so things would likely change. But you would think that you could – in principle – work with the new class? And do you think that would actually have the time to do it if necessary?

@xiki-tempula
Copy link
Contributor

xiki-tempula commented Apr 6, 2019

@orbeckst I guess the major obstacle is the underlying data structure. In wba, the data is stored as a network to represent the nature of the water network, so the analysis part acts upon the underlying network data structure.
The other though a bit trivial problem is the amount of redundancy removed by combing two classes. If expressed in simple pseudocode, hba works as

for frame in frames:
    find_hb(sele1, sele2)

whereas wba works as:

for frame in frames:
    find_hb(sele1, sele2)
    find_hb(sele1, water1)
    for water_no in number_of_water:
        find_hb(water[water_no], sele2)
        find_hb(water[water_no], water[water_no+1])

So the only common part is find_hb(sele1, sele2), which is only 3 lines in the current PR.

@orbeckst
Copy link
Member Author

Update:

  1. We want to go ahead with developing @p-j-smith and @bieniekmateusz 's new implementation PR Hbond analysis #2237 as the new version of hydrogen bond analysis.
  2. The new version will live in MDAnalysis.analysis.hydrogenbonds.hbond_analysis so that we can keep the old version in MDAnalysis.analysis.hbonds.hbond_analysis and slowly move it into deprecated and legacy status (by 1.0?).
  3. The new version does not have to be fully API compatible with the old version (because we are keeping the old one).
  4. We will keep the widely used hbonds.hbond_analysis around for a while (with its tests) but won't continue developing it and will strive to replace it with the new version within MDAnalysis (e.g., for WaterBridgeAnalysis, PR Move the water bridge analysis to the new analysis class #2087).

@xiki-tempula
Copy link
Contributor

xiki-tempula commented Jul 31, 2019

The PR #2087 is fully compatible with the old hbond_analysis and has passed all the test. I wonder if it is possible to review the PR, please? Thank you.

@orbeckst
Copy link
Member Author

See my #2087 (comment).

@lilyminium
Copy link
Member

lilyminium commented Feb 22, 2020

As part of this waterdynamics.HydrogenBondLifetimes should be refactored to use hydrogenbonds.HydrogenBondAnalysis and/or turned into a subclass. Edit: #2547

@orbeckst
Copy link
Member Author

orbeckst commented Mar 5, 2020

This issue was fixed by PR #2237.

Anything else can go into their own issues.

@orbeckst orbeckst closed this as completed Mar 5, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants