-
Notifications
You must be signed in to change notification settings - Fork 663
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Consistent Autocorrelation and Intermittency - Across Different Analysis #2256
Conversation
Now the function can be embedded into HydrogenBondLifetime.
should ideally be extracted from the HydrogenBondLifetime and fed into it in the constructor.
test. Our end index is inclusive (5th frame, giving 6 frames). But the other author's analysis was done on 0-4 frames (5 frames).
…ions with their own test cases. Removed the class HydrogenBondLifetimes and moved the autocorrelation functionality into the hbond_analysis class as a .autocorrelation method. This method should replace also hbond_autocorrel.py file. Test cases for the new method .autocorrelation are added. Added a test cases that, with our definition of intermittency, would fail in the case of the hbond_autocorrel.py.
Codecov Report
@@ Coverage Diff @@
## develop #2256 +/- ##
========================================
Coverage 91.03% 91.03%
========================================
Files 160 160
Lines 21612 21612
Branches 3087 3087
========================================
Hits 19675 19675
Misses 1318 1318
Partials 619 619 Continue to review full report at Codecov.
|
It might be good to consider joining this later with PR #2237 by @p-j-smith, which should be easy to do since the functions stand alone. Linking #2238 for discussion. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to consolidate a few of these classes. First the peformance of the hbondanalysis class needs to get addressed, then before autocorrel can get squashed in its functionality needs to be replicated in the analysis modue.
@@ -296,6 +296,11 @@ def __init__(self, universe, | |||
nruns=1, # number of times to iterate through the trajectory | |||
nsamples=50, # number of different points to sample in a run | |||
pbc=True): | |||
|
|||
warnings.warn("This class is deprecated, use analysis.hbonds.HydrogenBondAnalysis " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The HBondAnalysis class doesn't replicate the definition of hbonds in this class (eg here you can pass AtomGroups). I think we can eventually deprecate this, but it will have to be after PRs like #2237 which modernise the hbond detection code a little.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can merge the changes together from #2237 and here and submit it as one large commit. The two converge together but we tried to separate them and make them incremental to simplify the decision making.
The definition of hbonds in HBL was defined internally (hiding it from the user) and could not be changed previously from the point of view of the user, which I think is problematic and could lead to a naive use of the class. This, and due to the existing bug #2247, and the overall design of the code, where only one Hydrogen Bond analysis tool exists (analysis.hbonds.HydrogenBondAnalysis) would be much cleaner.
Could I maybe tell the user to run analysis.hbonds.HydrogenBondAnalysis and explicitly give the user the parameters that were previously used in HBL? This way they could reproduce partially the same analysis.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bieniekmateusz I'd rather do the opposite, get #2237 done (which is large enough as is), then rebase these changes on top of that and see what these changes then look like.
One other concern I've got is performance, which is easier to make decisions on once hbondanalysis is given a fair chance of being fast.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd hope the changes to be minimal. We have a clear interface for autocorrelation and would simply plug that into the new hbonds.HydrogenBondAnalysis. That's partly why we're doing this - to simplify it. We would have to replace only one line in the new code.
Performance wise, we're not making anything slower. Previously, HBL used internally hbonds.HydrogenBondAnalysis, and the code for autocorrelation I am certain was slower (the overhead of function calling, and multiple pass over the dataset, in comparison to our linear implementation with the "window").
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the confusion over the deprecation warning - we understood that we should first implement the hydrogen bond lifetime functionality in the new hydrogen bond class before pressing ahead with this PR. The avoid over-complicating this current pull request, we've removed this deprecation warning and will add a hydrogen bond lifetime method that uses this autocorrelation function in a future pull request.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @richardjgowers, we temporarily removed the changed made to the part of the code and the deprecation warning, as @p-j-smith describes. Could comment on this when you have a second. We assume this is what was preventing this pull from being merged. Thanks
timeseries_data = [[] for _ in range(tau_max)] | ||
|
||
# calculate autocorrelation | ||
for t in range(0, len(list_of_sets), window_step): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With this many loops this will need to be cythonised (like similar utility things in lib/util/_cutil
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd love to have a look at the cython part since I never wrote any actual code in it. However, it might be important to ask if this is necessary right now. This code has a linear complexity (window_step is always small) and is significantly faster than the previous implementations. The performance tests I run showed that 99.9% of the running time is spent in atom_select, and the actual Autocorrelation took less than a second.
So I'd advise against making this code more complex with cython, at least until we have a clear case that the performance optimisation here is useful (so on the ground of Premature Optimisation).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @richardjgowers , I finally had a look at the performance and I designed this little unit test to see whether I should look more into cython. It took below 1 second to compute 100 k frames with the same atoms (worst case), and around 8 seconds for 1 million frames. With this number of frames, it is going to be atom selection that takes most of the time. For this reason I would recommend against cythonising this function for now.
Also, it appears that having a list of sets as an argument in the function complicates cythonizing.
import time
def test_autocorrelation_perfTest():
# generate 1 million input, worst case scenario (50 same atoms found each time)
input = [{x for x in range(50)} for x in range(1000 * 1000)]
st = time.time()
autocorrelation(input, tau_max=3)
print('Comuptation Time', time.time() - st)
Regarding the API. We realise it is a sensitive class and that you care about the users and we are more than happy to try to account for that. At the same time, by ensuring that only one class takes care neatly of finding Hydrogen Bond, we remove some of the confusion. So overall we hope to simplify the interface and the underlying code, which in the long run should bring more users. For example, right now someone might use hbonds.HydrogenBondAnalysis to find hydrogen bonds with a certain set of parameters, just to realise that they cannot calculate HBondLifetime - because it has to find its own Hydrogen Bonds from scratch, without letting the user define them. In other words, the temporary pain for the users to switch will hopefully be minimal with the new HBAnalysis in #2237 and allow directly to calculate Lifetime from their dataset. |
The separation of the more intuitive Autocorrelation and Intermittency in now separate functions has the potential to provide a good basis for consistency, as the functions are shared across Hydrogen Bond Lifetimes and Survival Probability. This is particularly the case with Intermittency which we found to be a tricky concept. By ensuring that atoms are only counted if they comeback, we do not overestimate the presence of atoms. Also, by using the idea of consecutive absence, we ensure that whether it is tau=8 or tau=20, the intermittency of (ie) 2 has exactly the same meaning and significance. |
Is there anything we can do to facilitate the discussion or progress here? Thanks, Mat |
@bieniekmateusz we can't deprecate I (or someone else) needs to review the autocorrelation code here and PR #2237 , they can probably get merged soon |
Now the function can be embedded into HydrogenBondLifetime.
should ideally be extracted from the HydrogenBondLifetime and fed into it in the constructor.
test. Our end index is inclusive (5th frame, giving 6 frames). But the other author's analysis was done on 0-4 frames (5 frames).
…ions with their own test cases. Removed the class HydrogenBondLifetimes and moved the autocorrelation functionality into the hbond_analysis class as a .autocorrelation method. This method should replace also hbond_autocorrel.py file. Test cases for the new method .autocorrelation are added. Added a test cases that, with our definition of intermittency, would fail in the case of the hbond_autocorrel.py.
e649ba7
to
f04eb69
Compare
Given that PR #2237 was merged, what's the status on this PR? |
We should have some time to wrap this up over the next week or so. We'll rebase onto the latest develop and then add a |
This function will be removed in a future PR once a replacement that uses the new hydrogen bond class is in place.
co-authored-by: p-j-smith <[email protected]>
Thanks @orbeckst for taking the time. We believe we have answered all of your questions now with the latest push. Please let us know if you have any other comments. |
co-authored-by: p-j-smith <[email protected]>
We've added the module under 4.12 Utils, inside of the analysis modules. Please let us know if you are happy with the location or if we should move it somewhere more suitable. I think we now addressed all of your questions. Thanks |
After further thought, we think that MDAnalysis.lib.mdamath might be a more suitable location for these functions, but we'll defer to you @orbeckst on this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking really good. Thank you for the docs — they improve the usefulness of the code a lot.
I suggest you make a new module MDAnalysis.lib.correlations
– see comments.
There are also some minor comments (mainly on citing), please have a look. Thank you!!
acount for intermittency before passing the results to | ||
:func:`autocorrelation`. | ||
|
||
See Gowers and Carbonne, 2015, (DOI:10.1063.1.4922445) for a further |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please cite using reST syntax and put the paper in references.rst (see References.
If the whole module is based on the paper, please also add a duecredit citation to the module or the appropriate function (look elsewhere in the code base where duecredit
is used – you will only need the DOI for that).
this feature is present at every frame from :math:`t_0` to :math:`N(t0, t_0 + \tau)`. | ||
The angular brackets represent an average over all time origins, :math:`t_0`. | ||
|
||
See Araya-Secchi et al., 2014, (https://doi.org/10.1016/j.bpj.2014.05.037) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cite with reST.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If use of this function should cite a paper then you can use a duecredit
decorator. I don't know the history of the function, so your call.
package/doc/sphinx/source/documentation_pages/analysis/autocorrelation.rst
Outdated
Show resolved
Hide resolved
package/doc/sphinx/source/documentation_pages/analysis_modules.rst
Outdated
Show resolved
Hide resolved
…is#2256) co-authored-by: p-j-smith <[email protected]>
@bieniekmateusz looking good. Will you also add Ping me when I should review again. @richardjgowers – if you have a moment to also look over the final PR then that would be much appreciated. |
co-authored-by: p-j-smith <[email protected]>
Thanks @orbeckst . Our understanding is that currently when someone makes use of SurvivalProbability or hbond.hbond_analysis, the previous |
We have now further corrected the citations and we hope to write a small paper in which we will discuss the generalisations we introduce. Thanks @orbeckst and @richardjgowers for your help. Please let us know if you have any further comments. |
co-authored-by: p-j-smith <[email protected]>
co-authored-by: p-j-smith <[email protected]>
co-authored-by: p-j-smith <[email protected]>
The sphinx warnings have been now corrected, so that is all on our side now. Thanks |
@richardjgowers if you have anything to add here then please do; if I don't hear anything over the next 2 days I'll just dismiss your review to move forward. Thanks! |
(Maybe we get the PR merged in under a year... thank you @bieniekmateusz and @p-j-smith for not giving up!) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, Will need .. versionchanged
and .. versionadded
tags added, and a CHANGELOG entry
co-authored-by: p-j-smith <[email protected]>
We have updated the tags and the CHANGELOG file. In the next stage we hope to use these functions in places that currently duplicate this functionality. In particular, we will add a new HydrogenBondLifetimes function to HydrogenBondsAnalysis, which will also remove the issue #2247. It would also be nice to add a feature that leverages the LeafletFinder tool, combined with this autocorrelation function, to calculate the rate of cholesterol flip-flop in lipid bilayers. We are planning to write a short software development paper explaining the intermittency function and the generalisation of the autocorrelation function. Given your experience in this field @richardjgowers , the fact that your HBL code was part of our inspiration for this module, and your input in developing this module, we were wondering whether you would be willing to provide feedback and guidance on the technical aspects of the paper as an author on the publication? We realise you must be super busy at the minute, so no worries if not! Big thanks to both of you @orbeckst and @richardjgowers for your help in getting this PR through! |
I restarted one CI task that ran out of time. @richardjgowers are you happy & approve the changes? |
Hi again, is there else you'd like us to have a look at? Thanks, Mat |
@bieniekmateusz @p-j-smith – apologies, this dropped off my radar. Merging/squashing now... |
Thanks! |
Fixes #2247
Changes made in this Pull Request:
The main issue with HydrogenBondLifetimes (besides the bug) is that it defined its own definition of a hydrogen bond, without allowing the user to change it. It should be clear that the hydrogen bonds first have to be computed by the user, and then the autocorrelation for the bonds can be found. This is achieved with the method .autocorrelation.
PR Checklist
Work done in duo @p-j-smith