Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consistent Autocorrelation and Intermittency - Across Different Analysis #2256

Merged
merged 47 commits into from
May 12, 2020

Conversation

bieniekmateusz
Copy link
Member

@bieniekmateusz bieniekmateusz commented Apr 18, 2019

Fixes #2247

Changes made in this Pull Request:

  • Provides well defined autocorrelation which can be used for both, SurvivalProbability and HydrogenLifetimes (and more?)
  • Autocorrelation function and intermittency function are now independant with their own test cases
  • Removed the class HydrogenBondLifetimes and moved the autocorrelation functionality into the hbond_analysis class as a .autocorrelation method. This method should replace also hbond_autocorrel.py file in the future.
  • New test cases for the method .autocorrelation (HydrogenBondAnalysis) were added
  • Added a test cases that, with our definition of intermittency, would fail in the case of the hbond_autocorrel.py.

The main issue with HydrogenBondLifetimes (besides the bug) is that it defined its own definition of a hydrogen bond, without allowing the user to change it. It should be clear that the hydrogen bonds first have to be computed by the user, and then the autocorrelation for the bonds can be found. This is achieved with the method .autocorrelation.

PR Checklist

  • Tests?
  • Docs?
  • CHANGELOG updated?
  • Issue raised/referenced?

Work done in duo @p-j-smith

Now the function can be embedded into HydrogenBondLifetime.
should ideally be extracted from the HydrogenBondLifetime and fed into
it in the constructor.
test. Our end index is inclusive (5th frame, giving 6 frames). But
the other author's analysis was done on 0-4 frames (5 frames).
…ions

with their own test cases.

Removed the class HydrogenBondLifetimes and moved the autocorrelation
functionality into the hbond_analysis class as a .autocorrelation
method. This method should replace also hbond_autocorrel.py file. Test
cases for the new method .autocorrelation are added.

Added a test cases that, with our definition
of intermittency, would fail in the case of the hbond_autocorrel.py.
@codecov
Copy link

codecov bot commented Apr 18, 2019

Codecov Report

Merging #2256 into develop will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff            @@
##           develop    #2256   +/-   ##
========================================
  Coverage    91.03%   91.03%           
========================================
  Files          160      160           
  Lines        21612    21612           
  Branches      3087     3087           
========================================
  Hits         19675    19675           
  Misses        1318     1318           
  Partials       619      619           

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 95641de...95641de. Read the comment docs.

@bieniekmateusz
Copy link
Member Author

It might be good to consider joining this later with PR #2237 by @p-j-smith, which should be easy to do since the functions stand alone.

Linking #2238 for discussion.

Copy link
Member

@richardjgowers richardjgowers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to consolidate a few of these classes. First the peformance of the hbondanalysis class needs to get addressed, then before autocorrel can get squashed in its functionality needs to be replicated in the analysis modue.

@@ -296,6 +296,11 @@ def __init__(self, universe,
nruns=1, # number of times to iterate through the trajectory
nsamples=50, # number of different points to sample in a run
pbc=True):

warnings.warn("This class is deprecated, use analysis.hbonds.HydrogenBondAnalysis "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The HBondAnalysis class doesn't replicate the definition of hbonds in this class (eg here you can pass AtomGroups). I think we can eventually deprecate this, but it will have to be after PRs like #2237 which modernise the hbond detection code a little.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can merge the changes together from #2237 and here and submit it as one large commit. The two converge together but we tried to separate them and make them incremental to simplify the decision making.

The definition of hbonds in HBL was defined internally (hiding it from the user) and could not be changed previously from the point of view of the user, which I think is problematic and could lead to a naive use of the class. This, and due to the existing bug #2247, and the overall design of the code, where only one Hydrogen Bond analysis tool exists (analysis.hbonds.HydrogenBondAnalysis) would be much cleaner.

Could I maybe tell the user to run analysis.hbonds.HydrogenBondAnalysis and explicitly give the user the parameters that were previously used in HBL? This way they could reproduce partially the same analysis.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bieniekmateusz I'd rather do the opposite, get #2237 done (which is large enough as is), then rebase these changes on top of that and see what these changes then look like.

One other concern I've got is performance, which is easier to make decisions on once hbondanalysis is given a fair chance of being fast.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd hope the changes to be minimal. We have a clear interface for autocorrelation and would simply plug that into the new hbonds.HydrogenBondAnalysis. That's partly why we're doing this - to simplify it. We would have to replace only one line in the new code.

Performance wise, we're not making anything slower. Previously, HBL used internally hbonds.HydrogenBondAnalysis, and the code for autocorrelation I am certain was slower (the overhead of function calling, and multiple pass over the dataset, in comparison to our linear implementation with the "window").

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the confusion over the deprecation warning - we understood that we should first implement the hydrogen bond lifetime functionality in the new hydrogen bond class before pressing ahead with this PR. The avoid over-complicating this current pull request, we've removed this deprecation warning and will add a hydrogen bond lifetime method that uses this autocorrelation function in a future pull request.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @richardjgowers, we temporarily removed the changed made to the part of the code and the deprecation warning, as @p-j-smith describes. Could comment on this when you have a second. We assume this is what was preventing this pull from being merged. Thanks

timeseries_data = [[] for _ in range(tau_max)]

# calculate autocorrelation
for t in range(0, len(list_of_sets), window_step):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this many loops this will need to be cythonised (like similar utility things in lib/util/_cutil)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd love to have a look at the cython part since I never wrote any actual code in it. However, it might be important to ask if this is necessary right now. This code has a linear complexity (window_step is always small) and is significantly faster than the previous implementations. The performance tests I run showed that 99.9% of the running time is spent in atom_select, and the actual Autocorrelation took less than a second.

So I'd advise against making this code more complex with cython, at least until we have a clear case that the performance optimisation here is useful (so on the ground of Premature Optimisation).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @richardjgowers , I finally had a look at the performance and I designed this little unit test to see whether I should look more into cython. It took below 1 second to compute 100 k frames with the same atoms (worst case), and around 8 seconds for 1 million frames. With this number of frames, it is going to be atom selection that takes most of the time. For this reason I would recommend against cythonising this function for now.

Also, it appears that having a list of sets as an argument in the function complicates cythonizing.

import time
def test_autocorrelation_perfTest():
    # generate 1 million input, worst case scenario (50 same atoms found each time)
    input = [{x for x in range(50)} for x in range(1000 * 1000)]

    st = time.time()
    autocorrelation(input, tau_max=3)
    print('Comuptation Time', time.time() - st)

@bieniekmateusz
Copy link
Member Author

Regarding the API. We realise it is a sensitive class and that you care about the users and we are more than happy to try to account for that. At the same time, by ensuring that only one class takes care neatly of finding Hydrogen Bond, we remove some of the confusion. So overall we hope to simplify the interface and the underlying code, which in the long run should bring more users.

For example, right now someone might use hbonds.HydrogenBondAnalysis to find hydrogen bonds with a certain set of parameters, just to realise that they cannot calculate HBondLifetime - because it has to find its own Hydrogen Bonds from scratch, without letting the user define them.

In other words, the temporary pain for the users to switch will hopefully be minimal with the new HBAnalysis in #2237 and allow directly to calculate Lifetime from their dataset.

@bieniekmateusz
Copy link
Member Author

The separation of the more intuitive Autocorrelation and Intermittency in now separate functions has the potential to provide a good basis for consistency, as the functions are shared across Hydrogen Bond Lifetimes and Survival Probability.

This is particularly the case with Intermittency which we found to be a tricky concept. By ensuring that atoms are only counted if they comeback, we do not overestimate the presence of atoms. Also, by using the idea of consecutive absence, we ensure that whether it is tau=8 or tau=20, the intermittency of (ie) 2 has exactly the same meaning and significance.

@bieniekmateusz
Copy link
Member Author

Hi @richardjgowers

Is there anything we can do to facilitate the discussion or progress here?

Thanks, Mat

@richardjgowers
Copy link
Member

@bieniekmateusz we can't deprecate hbondautocorrel into hbondanalysis.autocorrel until the API for defining/finding hydrogen bonds is identical between the two, so that can't merge currently.

I (or someone else) needs to review the autocorrelation code here and PR #2237 , they can probably get merged soon

bieniekmateusz and others added 10 commits October 29, 2019 22:03
Now the function can be embedded into HydrogenBondLifetime.
should ideally be extracted from the HydrogenBondLifetime and fed into
it in the constructor.
test. Our end index is inclusive (5th frame, giving 6 frames). But
the other author's analysis was done on 0-4 frames (5 frames).
…ions

with their own test cases.

Removed the class HydrogenBondLifetimes and moved the autocorrelation
functionality into the hbond_analysis class as a .autocorrelation
method. This method should replace also hbond_autocorrel.py file. Test
cases for the new method .autocorrelation are added.

Added a test cases that, with our definition
of intermittency, would fail in the case of the hbond_autocorrel.py.
@orbeckst
Copy link
Member

Given that PR #2237 was merged, what's the status on this PR?

@p-j-smith
Copy link
Member

We should have some time to wrap this up over the next week or so. We'll rebase onto the latest develop and then add a hydrogenbonds.lifetime method (which will use our standalone autocorrelation function in this PR) to the new HydrogenBonds class.

This function will be removed in a future PR once a replacement that uses the new hydrogen bond class is in place.
@bieniekmateusz
Copy link
Member Author

Thanks @orbeckst for taking the time. We believe we have answered all of your questions now with the latest push. Please let us know if you have any other comments.

@bieniekmateusz
Copy link
Member Author

We've added the module under 4.12 Utils, inside of the analysis modules. Please let us know if you are happy with the location or if we should move it somewhere more suitable. I think we now addressed all of your questions. Thanks

@bieniekmateusz
Copy link
Member Author

bieniekmateusz commented Apr 13, 2020

After further thought, we think that MDAnalysis.lib.mdamath might be a more suitable location for these functions, but we'll defer to you @orbeckst on this.

Copy link
Member

@orbeckst orbeckst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking really good. Thank you for the docs — they improve the usefulness of the code a lot.

I suggest you make a new module MDAnalysis.lib.correlations – see comments.

There are also some minor comments (mainly on citing), please have a look. Thank you!!

acount for intermittency before passing the results to
:func:`autocorrelation`.

See Gowers and Carbonne, 2015, (DOI:10.1063.1.4922445) for a further
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please cite using reST syntax and put the paper in references.rst (see References.

If the whole module is based on the paper, please also add a duecredit citation to the module or the appropriate function (look elsewhere in the code base where duecredit is used – you will only need the DOI for that).

package/MDAnalysis/analysis/utils/autocorrelation.py Outdated Show resolved Hide resolved
package/MDAnalysis/analysis/utils/autocorrelation.py Outdated Show resolved Hide resolved
this feature is present at every frame from :math:`t_0` to :math:`N(t0, t_0 + \tau)`.
The angular brackets represent an average over all time origins, :math:`t_0`.

See Araya-Secchi et al., 2014, (https://doi.org/10.1016/j.bpj.2014.05.037)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cite with reST.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If use of this function should cite a paper then you can use a duecredit decorator. I don't know the history of the function, so your call.

package/MDAnalysis/analysis/utils/autocorrelation.py Outdated Show resolved Hide resolved
package/MDAnalysis/analysis/utils/autocorrelation.py Outdated Show resolved Hide resolved
package/MDAnalysis/analysis/utils/autocorrelation.py Outdated Show resolved Hide resolved
package/MDAnalysis/analysis/utils/autocorrelation.py Outdated Show resolved Hide resolved
@orbeckst
Copy link
Member

@bieniekmateusz looking good. Will you also add duecredit?

Ping me when I should review again.

@richardjgowers – if you have a moment to also look over the final PR then that would be much appreciated.

@bieniekmateusz
Copy link
Member Author

Thanks @orbeckst . Our understanding is that currently when someone makes use of SurvivalProbability or hbond.hbond_analysis, the previous duecredit will be triggered.

@bieniekmateusz
Copy link
Member Author

We have now further corrected the citations and we hope to write a small paper in which we will discuss the generalisations we introduce.

Thanks @orbeckst and @richardjgowers for your help. Please let us know if you have any further comments.

@bieniekmateusz
Copy link
Member Author

The sphinx warnings have been now corrected, so that is all on our side now. Thanks

@orbeckst
Copy link
Member

@richardjgowers if you have anything to add here then please do; if I don't hear anything over the next 2 days I'll just dismiss your review to move forward. Thanks!

@orbeckst
Copy link
Member

(Maybe we get the PR merged in under a year... thank you @bieniekmateusz and @p-j-smith for not giving up!)

Copy link
Member

@richardjgowers richardjgowers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, Will need .. versionchanged and .. versionadded tags added, and a CHANGELOG entry

@bieniekmateusz
Copy link
Member Author

We have updated the tags and the CHANGELOG file.

In the next stage we hope to use these functions in places that currently duplicate this functionality. In particular, we will add a new HydrogenBondLifetimes function to HydrogenBondsAnalysis, which will also remove the issue #2247. It would also be nice to add a feature that leverages the LeafletFinder tool, combined with this autocorrelation function, to calculate the rate of cholesterol flip-flop in lipid bilayers.

We are planning to write a short software development paper explaining the intermittency function and the generalisation of the autocorrelation function. Given your experience in this field @richardjgowers , the fact that your HBL code was part of our inspiration for this module, and your input in developing this module, we were wondering whether you would be willing to provide feedback and guidance on the technical aspects of the paper as an author on the publication? We realise you must be super busy at the minute, so no worries if not!

Big thanks to both of you @orbeckst and @richardjgowers for your help in getting this PR through!

@orbeckst
Copy link
Member

I restarted one CI task that ran out of time.

@richardjgowers are you happy & approve the changes?

@bieniekmateusz
Copy link
Member Author

Hi again, is there else you'd like us to have a look at?

Thanks, Mat

@orbeckst orbeckst self-assigned this May 12, 2020
@orbeckst
Copy link
Member

@bieniekmateusz @p-j-smith – apologies, this dropped off my radar. Merging/squashing now...

@bieniekmateusz
Copy link
Member Author

Thanks!

@orbeckst orbeckst merged commit d692fcd into MDAnalysis:develop May 12, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

HydrogenBondLifetimes (Autocorrelation) has an increasing value
5 participants