Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

update qcprot to cython.floating types #4273

Merged
merged 11 commits into from
Sep 4, 2023
Merged

Conversation

richardjgowers
Copy link
Member

@richardjgowers richardjgowers commented Sep 2, 2023

use memoryviews throughout for potential performance

allows either float32 or float64 inputs

Related #3927

Changes made in this Pull Request:

  • qcprot functions take either float32 or float64 inputs
  • modernised qcprot to use memoryviews

PR Checklist

  • Tests?
  • Docs?
  • CHANGELOG updated?
  • Issue raised/referenced?

Developers certificate of origin


📚 Documentation preview 📚: https://mdanalysis--4273.org.readthedocs.build/en/4273/

use memoryviews throughout for potential performance

allows either float32 or float64 inputs
@richardjgowers richardjgowers added the CZI-performance performance track of CZIEOSS4 grant label Sep 2, 2023
@github-actions
Copy link

github-actions bot commented Sep 2, 2023

Linter Bot Results:

Hi @richardjgowers! Thanks for making this PR. We linted your code and found the following:

Some issues were found with the formatting of your code.

Code Location Outcome
main package ✅ Passed
testsuite ⚠️ Possible failure

Please have a look at the darker-main-code and darker-test-code steps here for more details: https://github.com/MDAnalysis/mdanalysis/actions/runs/6066435327/job/16457020106


Please note: The black linter is purely informational, you can safely ignore these outcomes if there are no flake8 failures!

@richardjgowers
Copy link
Member Author

Still need to do some tests here. rms.RMSD looks like it's converting/copying everything up to float64, but I imagine that not forcing this would be some sort of precision-regression. Instead I can probably write some tests that using lib.qcprot directly with floats/doubles work as expected and we can have some future discussion about precision and dtypes in the future.

Copy link
Member

@IAlibay IAlibay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks reasonable to me. Is there an associated issue this is working towards? (edit: I know there were a few out there that had been opened about fp precision).

@richardjgowers
Copy link
Member Author

@IAlibay I've tagged a related issue, I think we should be moving towards numpy-like behaviour where output dtype matches input dtype and either float or double are allowed. A future discussion is the next major release of MDA will probably have a single precision option, since we're able to get 2x performance & 1/2 memory footprint for this.

The memoryview change is just a best-practice and we should gradually modernise to this as it's documented as the recommended route for best performance.

@@ -13,7 +13,7 @@ The rules for this file:
* release numbers follow "Semantic Versioning" http://semver.org

------------------------------------------------------------------------------
??/??/?? IAlibay, ianmkenney, PicoCentauri
??/??/?? IAlibay, ianmkenney, PicoCentauri, richardjgowers
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Odd, I thought you had conttributed in 2.6 for the GSD stuff, maybe that was 2.5... time flies by.

previously rmsd was being returned as None for some cases, this didn't make sense

removed test for int inputs, nonsensical
@@ -494,7 +509,7 @@ def FastCalcRMSDAndRotation(np.ndarray[np.float64_t, ndim=1] rot,
rot[0] = rot[4] = rot[8] = 1.0
rot[1] = rot[2] = rot[3] = rot[5] = rot[6] = rot[7] = 0.0

return
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this branch looks like it was a bug. at this point the RMSD has been calculated and the code has failed to determine a suitable rotation matrix, returning None doesn't seem to make sense

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@codecov
Copy link

codecov bot commented Sep 2, 2023

Codecov Report

Patch and project coverage have no change.

Comparison is base (f50a097) 93.40% compared to head (a6d45dd) 93.40%.

Additional details and impacted files
@@            Coverage Diff             @@
##           develop    #4273     +/-   ##
==========================================
  Coverage    93.40%   93.40%             
==========================================
  Files          170      184     +14     
  Lines        22250    23358   +1108     
  Branches      4071     4071             
==========================================
+ Hits         20783    21818   +1035     
- Misses         951     1024     +73     
  Partials       516      516             

see 14 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

using float precision seems to break filling the rotation matrix.  Not sure why, this works though.
@richardjgowers richardjgowers changed the title [WIP] update qcprot to cython.floating types update qcprot to cython.floating types Sep 2, 2023
cython.floating[:, :] coords1,
cython.floating[:, :] coords2,
int N,
cython.floating[:] weight):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remember that a while back I tried to improve performance in SciPy by switching from the old NumPy buffer syntax, to memoryviews, and it actually caused a performance regression: scipy/scipy#12379 (comment)

Anyway, you may want to measure that you remain at least performance neutral, my experience with following that best practice hasn't historically been great, but hopefully the situation is better now.


err = 0.001 if dtype is np.float32 else 0.000001
assert r == pytest.approx(0.6057544485785074, abs=err)
assert_array_almost_equal(rot, rot_ref)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe use assert_allclose for new test code per the Note at https://numpy.org/doc/stable/reference/generated/numpy.testing.assert_array_almost_equal.html ?

Copy link
Member

@orbeckst orbeckst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The RMSD functions are quite time-critical for many applications so if there's any chance for a performance impact then please provide some benchmark data before merge. Thanks!

@orbeckst
Copy link
Member

orbeckst commented Sep 3, 2023

About float vs double. I remember vaguely that during one of the REU projects we played around with single vs double precision and arrived at the conclusion that float32 is insufficient for the algorithm to properly converge. Of course, I don't remember any details and have no idea if there are any notes anywhere. Primarily I want to say that it simply might not be as easy to say that we just run the algorithm at whatever dtype the rest of the code wants to run and that this will require careful testing.

@richardjgowers
Copy link
Member Author

@orbeckst benchmarks

from MDAnalysisTests.datafiles import PSF, DCD
import MDAnalysis as mda
from MDAnalysis.lib import qcprot
import numpy as np

u = mda.Universe(PSF, DCD)

prot = u.select_atoms('protein')

w = (prot.masses / np.mean(prot.masses)).astype(np.float64)

ref = prot.positions.astype(np.float64)
u.trajectory[1]
conf = prot.positions.astype(np.float64)

rot = np.zeros(9, dtype=np.float64)
A = np.zeros(9)

N = len(prot)

then:

%%timeit

qcprot.CalcRMSDRotationalMatrix(ref, conf, len(prot), rot, w)

old code:
10.1 µs ± 46.4 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
new code:
9.97 µs ± 29.1 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

The number fluctuate a bit, I think there's no performance gain/loss.

I was able to reproduce the memoryview regression that @tylerjereddy flagged up, but I think it's an order of magnitude smaller than the numbers measured here, so it won't show up. It's something to keep in mind if we're writing code that has a lot of function calls (there's 3 here).

re: precision, yes I had to make the qcp function use doubles throughout to keep it working. The InnerProduct can use either float or double now. Strangely these take the same amount of time, which means the compiler isn't able to do anything clever here (where it ought to be) so probably some future digging. This does some ground work at least.

@richardjgowers richardjgowers merged commit 2853201 into develop Sep 4, 2023
@richardjgowers richardjgowers deleted the qcprot_floating branch October 14, 2023 20:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component-lib CZI-performance performance track of CZIEOSS4 grant enhancement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants