-
Notifications
You must be signed in to change notification settings - Fork 663
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SurvivalProbability - Readability, performance, and algorithm changes #1995
SurvivalProbability - Readability, performance, and algorithm changes #1995
Conversation
Atom selections at each frame are now stored as sets rather than lists, so determining the intersection of selections is faster. Survival probability is now calculated by a single function.
The survival probability is a measure of how likely a selected molecule is to remain within a specified region over a period of time. If at t0 the selected molecule is not in the specified region, no survival probability can be calculated. For example, consider that we would like to calculate the survival probability of an ion at an interface. If the ion never approaches the interface, the previous test would return a survival probability of 0, which suggests that the ion comes to the interface but leaves immediately. However, as the ion was never at the interface, no survival probability can be calculated from this data.
-return the list rather than require the user to access the internal list -use tau_max variable name which is consistent with the equation, in contrast to dtmax
… and more times over tau. This will lead to faster code (local memory/cache optimisation). Additionally, opens the door for combining multiple taus together.
…+tau. Refactoring.
…Or should the first data point reflect the first tau which is non-zero and therefore the first change is quantified? I believe it is the latter. This breaks the current tests which, for example, by asking for 4 datapoints, always get the first datapoint for tau=0 that is 1.
code and no performance drawbacks. Tests: the user should not be forced to rely on the internal data structures .timeseries. Returning the results with the function run().
numpy arrays. Changed the name of tau_timeseries to sp_timeseries as it is a more accurate descriptor. Removed some of the +/- 1 in indices to improve readability.
Saving the extracted data for the user to be able to dig deeper, if necessary.
an atom group, to reduce the memory load. Updated the docstring.
Waterdynamics: removed an index bug that went 1 too high tau.
@bieniekmateusz thanks, this looks like it will make some good changes. WRT "faster shift for dt", are you talking about doing every 2nd frame rather than every frame? |
@richardjgowers Thanks. Also, I am still to add your test case for the t0case which I hope to add in the next version with the progress meter. Your question: Yes, every nth frame. We are considering letting the user specify the jump over the trajectory. However, we want to first check how, for example, tau = 20 is affected by being calculated for overlapping datasets (t=0-20, t=1-21, ...). We are not sure how (and if) that affects the SP yet. Any thoughts on it are welcome. |
For the case where no molecules are found, it makes sense to disregard this run as you can't measure anything. It's not that the molecule had 0 survival probability. So for hydrogen bond autocorrelation I let people define a total duration (tau_max here) and a number of frames within that duration to consider. I'm not completely happy with that solution either, so it would be good to define a clear & comprehensive way/language to define this sort of stuff. I need to read through this module fully but.. Reducing the resolution of frames should be fine in some cases. We can actually check after the calculation whether the calculated tau makes sense with respect to how we sampled it. Then issue some sort of warning if the data is probably junk (and suggest better settings). Overlapping frames should be fine. The concentration in an area will remain constant, so by starting a new sample in the same window will identify different "starting molecules". It will be slightly correlated with the old sample though (so isn't 100% new data) |
…f the first two frames are skipped.
Codecov Report
@@ Coverage Diff @@
## develop #1995 +/- ##
===========================================
- Coverage 88.93% 88.84% -0.09%
===========================================
Files 144 144
Lines 17490 17497 +7
Branches 2693 2702 +9
===========================================
- Hits 15554 15545 -9
- Misses 1323 1332 +9
- Partials 613 620 +7
Continue to review full report at Codecov.
|
…ults will be given.
I've updated the last test case (t0) to avoid arbitrary datasets and to test tf in the same test case. |
Saving the extracted data for the user to be able to dig deeper, if necessary.
an atom group, to reduce the memory load. Updated the docstring.
Waterdynamics: removed an index bug that went 1 too high tau.
…f the first two frames are skipped.
…ults will be given.
a064686
to
8536d54
Compare
I did the rebase which was not too much trouble at the end. Cheers |
…lysis into survivalprobability
Error in testsuite/MDAnalysisTests/topology/test_pqr.py:93: Could it be that the rebasing introduced this issue since I have not touched the topology files. Do you have any suggestions on what I should do to correct the tests? Thanks |
I've updated the example to match the data being stored in the object's fields. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks @bieniekmateusz
and @p-j-smith! We're happy to hear that. And thanks to you too! |
Small changes:
-- issue due to the unnecessary function overhead,
-- iterating over the dataset only once (RAM memory optimisation)
Validity Test Cases Changes:
Algorithmic Changes made in this Pull Request - Request for Contribution from the Original Author:
Questions:
Future considerations:
Edit: Acknowledgement - This work was done together with @p-j-smith, but we submitted most of it on my account.
PR Checklist