Speed up the MBAR calculation #357
Conversation
Codecov Report
All modified and coverable lines are covered by tests ✅

@@ Coverage Diff @@
##           master     #357   +/-   ##
=======================================
  Coverage   98.82%   98.83%
=======================================
  Files          28       28
  Lines        1875     1890    +15
  Branches      405      409     +4
=======================================
+ Hits         1853     1868    +15
  Misses          2        2
  Partials       20       20
Neat idea and cool that it gives a real performance boost.
Definitely needs documentation!
.github/workflows/ci.yaml
 with:
-  token: ${{ secrets.CODECOV_TOKEN }}
+  token: ${{ secrets.CODECOV }}
Is this a change due to a merge or did you smuggle it into this PR??
As you could see, the CI failed due to Codecov token issues, so I tried to see if I could fix it.
-result = estimator_fit(sample)
+result = estimator_fit.fit(sample)
+if estimator == "MBAR":
+    estimator_fit.initial_f_k = result.delta_f_.iloc[0, :]
I didn't fully follow the code path, but how is delta_f_ already initialized with the BAR results?
An MBAR calculation is done first (result = estimator_fit.fit(sample)), which populates result.delta_f_ with the current MBAR results. These MBAR results are then propagated back as the initial guess for the next calculation.
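The benefit of seeding each fit with the previous converged result can be sketched with a toy self-consistent iteration. This is purely illustrative (a generic contraction-mapping fixed point, not alchemlyb's actual MBAR solver); the point is just that a warm start near the solution converges in far fewer iterations than a cold start:

```python
import numpy as np

rng = np.random.default_rng(0)
K = 8                                   # toy stand-in for K lambda states
M = rng.normal(size=(K, K)) / K         # small matrix -> contraction mapping
c = rng.normal(size=K)

def solve(f0, tol=1e-12, max_iter=10_000):
    """Iterate the toy fixed point f <- c + 0.3*tanh(M @ f) until the
    estimate stops changing; return the solution and the iteration count."""
    f = np.asarray(f0, dtype=float).copy()
    for n in range(1, max_iter + 1):
        f_new = c + 0.3 * np.tanh(M @ f)
        if np.max(np.abs(f_new - f)) < tol:
            return f_new, n
        f = f_new
    raise RuntimeError("iteration did not converge")

# Cold start from zeros (the old MBAR default behaviour).
f_cold, n_cold = solve(np.zeros(K))
# Warm start from a slightly perturbed previous solution, mimicking
# refitting after the data set has grown a little.
f_warm, n_warm = solve(f_cold + 1e-6)
```

With the warm start the iteration begins much closer to the fixed point, so `n_warm` is well below `n_cold` while both converge to the same answer.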
Some Codecov checks are still failing due to the token issue, but otherwise this seems fine. When this PR is merged, do you mind if I do a new 2.2.1 release? I think the speed increase is quite significant.
Let's make a 2.3.0 release (see my suggested changes; you can just accept the whole bunch) and carefully document the API change (new default).
Please add a test for MBAR(initial_f_k=None)
for the old behavior, given that the default changed.
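The new default's three accepted values (an array, the string "BAR", or None for the old behavior) can be sketched with a minimal stand-in class. This is a hypothetical illustration of the dispatch logic, not alchemlyb's real MBAR class, and `bar_estimate` here stands in for whatever BAR result the estimator would compute internally:

```python
import numpy as np

class ToyMBAR:
    """Hypothetical sketch of the changed default: initial_f_k="BAR"
    warm-starts from a BAR-like estimate, initial_f_k=None restores the
    old zero initial guess, and an array is passed through unchanged."""

    def __init__(self, initial_f_k="BAR"):
        self.initial_f_k = initial_f_k

    def _initial_guess(self, n_states, bar_estimate):
        if self.initial_f_k is None:
            return np.zeros(n_states)            # old behavior: cold start
        if isinstance(self.initial_f_k, str) and self.initial_f_k == "BAR":
            return np.asarray(bar_estimate)      # new default: BAR warm start
        return np.asarray(self.initial_f_k)      # user-supplied array

bar = np.array([0.0, 1.2, 2.5])
old_style = ToyMBAR(initial_f_k=None)._initial_guess(3, bar)
new_default = ToyMBAR()._initial_guess(3, bar)
```

A test for the old behavior then only needs to assert that `initial_f_k=None` yields the zero guess while the default yields the BAR estimate.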
@@ -71,14 +80,19 @@ def __init__(
     self,
     maximum_iterations=10000,
     relative_tolerance=1.0e-7,
-    initial_f_k=None,
+    initial_f_k: np.ndarray | Literal["BAR"] | None = "BAR",
Changing the default is, technically speaking, an API change, so we can't just do it in a patch release.
The conservative approach would be to keep the original default.
However, I think we want to make this speed-up available in a release, so my suggestion is to make a 2.3 release and make it clearer that this changed; see my other comments.
Co-authored-by: Oliver Beckstein <[email protected]>
I have adopted @mrshirts's advice of using the BAR results as the initial guess for MBAR, which seems to provide a 5-fold speed-up.
I also tried to optimise the convergence detection by using the result of the previous MBAR run as the initial guess for the next MBAR run. Compared to using BAR as input, this approach provides a further speed-up.
Using BAR as the initial guess:
3.58 s ± 35.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Using the previous step as the initial guess:
2.25 s ± 23.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
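The convergence-detection loop described above (each MBAR run on a growing data slice seeding the next) can be sketched as follows. This is a hedged outline of the pattern, not the project's actual convergence code; `fit(data, initial)` is a hypothetical callable standing in for an estimator fit that accepts an initial guess:

```python
import numpy as np

def forward_convergence(series, fit):
    """Fit an estimator on growing slices of `series`, propagating each
    converged result as the initial guess for the next, larger slice."""
    initial = None                 # first slice: no previous result yet
    results = []
    for frac in (0.25, 0.5, 0.75, 1.0):
        n = max(1, int(len(series) * frac))
        est = fit(series[:n], initial)
        results.append(est)
        initial = est              # warm-start the next slice's fit
    return results

# Dummy stand-in "estimator" for demonstration: just the running mean.
series = np.arange(8.0)
results = forward_convergence(series, lambda data, f0: float(np.mean(data)))
```

Because consecutive slices share most of their data, consecutive estimates are close, which is why the warm start pays off more than seeding every fit with BAR from scratch.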