update qcprot to cython.floating types #4273

richardjgowers · 2023-09-02T10:46:54Z

use memoryviews throughout for potential performance

allows either float32 or float64 inputs

Related #3927

Changes made in this Pull Request:

qcprot functions take either float32 or float64 inputs
modernised qcprot to use memoryviews

PR Checklist

Tests?
Docs?
CHANGELOG updated?
Issue raised/referenced?

Developers certificate of origin

I certify that this contribution is covered by the LGPLv2.1+ license as defined in our LICENSE and adheres to the Developer Certificate of Origin.

📚 Documentation preview 📚: https://mdanalysis--4273.org.readthedocs.build/en/4273/

github-actions · 2023-09-02T10:49:19Z

Linter Bot Results:

Hi @richardjgowers! Thanks for making this PR. We linted your code and found the following:

Some issues were found with the formatting of your code.

Code Location	Outcome
main package	✅ Passed
testsuite	⚠️ Possible failure

Please have a look at the darker-main-code and darker-test-code steps here for more details: https://github.com/MDAnalysis/mdanalysis/actions/runs/6066435327/job/16457020106

Please note: The black linter is purely informational, you can safely ignore these outcomes if there are no flake8 failures!

richardjgowers · 2023-09-02T10:49:54Z

Still need to do some tests here. rms.RMSD looks like it's converting/copying everything up to float64, and I imagine that not forcing this would be some sort of precision regression. Instead, I can probably write some tests checking that using lib.qcprot directly with floats/doubles works as expected, and we can discuss precision and dtypes in the future.

IAlibay

This looks reasonable to me. Is there an associated issue this is working towards? (edit: I know there were a few out there that had been opened about fp precision).

richardjgowers · 2023-09-02T11:36:27Z

@IAlibay I've tagged a related issue. I think we should be moving towards numpy-like behaviour, where the output dtype matches the input dtype and either float or double is allowed. A topic for future discussion is that the next major release of MDA will probably have a single-precision option, since we're able to get 2x performance and half the memory footprint from it.
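As a rough illustration of the "output dtype matches input dtype" behaviour described above, here is a minimal NumPy sketch. The `center` helper is hypothetical and is not part of MDAnalysis; it only shows that a float32 input can stay float32 and that the float32 array occupies half the memory of its float64 copy.

```python
import numpy as np


def center(coords):
    """Subtract the centroid; the result keeps the input's floating dtype.

    Hypothetical helper for illustration only, not an MDAnalysis function.
    """
    coords = np.asarray(coords)
    if coords.dtype not in (np.float32, np.float64):
        raise TypeError("expected float32 or float64 coordinates")
    # mean() of a float32 array is float32, so the subtraction does not upcast
    return coords - coords.mean(axis=0)


xyz32 = np.random.default_rng(0).random((1000, 3), dtype=np.float32)
xyz64 = xyz32.astype(np.float64)

out32 = center(xyz32)
out64 = center(xyz64)

print(out32.dtype, out64.dtype)    # dtype follows the input
print(xyz32.nbytes, xyz64.nbytes)  # float32 uses half the bytes of float64
```

The memory saving is exact; any speed-up depends on the compiler and the kernel, so it would need to be measured per function.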

The memoryview change is just a best-practice and we should gradually modernise to this as it's documented as the recommended route for best performance.

IAlibay · 2023-09-02T11:39:47Z

package/CHANGELOG

@@ -13,7 +13,7 @@ The rules for this file:
  * release numbers follow "Semantic Versioning" http://semver.org

 ------------------------------------------------------------------------------
-??/??/?? IAlibay, ianmkenney, PicoCentauri
+??/??/?? IAlibay, ianmkenney, PicoCentauri, richardjgowers


Odd, I thought you had contributed in 2.6 for the GSD stuff, maybe that was 2.5... time flies by.

richardjgowers · 2023-09-02T17:11:45Z

package/MDAnalysis/lib/qcprot.pyx

@@ -494,7 +509,7 @@ def FastCalcRMSDAndRotation(np.ndarray[np.float64_t, ndim=1] rot,
                    rot[0] = rot[4] = rot[8] = 1.0
                    rot[1] = rot[2] = rot[3] = rot[5] = rot[6] = rot[7] = 0.0

-                    return


This branch looks like it was a bug. At this point the RMSD has been calculated and the code has failed to determine a suitable rotation matrix, so returning None doesn't seem to make sense.

e.g. compared to this implementation in BioPython: https://github.com/biopython/biopython/blob/master/Bio/PDB/qcprot.py#L190
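The control-flow problem can be sketched in plain Python. Both functions below are hypothetical toy stand-ins, not the qcprot code: they only show how a bare `return` in the fallback branch hands the caller None even though the RMSD is already known, and how dropping it lets the RMSD through while the rotation falls back to the identity.

```python
import numpy as np


def fill_rotation_buggy(rot, rmsd, rotation_ok):
    """Toy version of the old branch (hypothetical names)."""
    if not rotation_ok:
        rot[:] = np.eye(3).ravel()  # fall back to the identity rotation
        return                      # implicit None: the computed RMSD is lost
    return rmsd


def fill_rotation_fixed(rot, rmsd, rotation_ok):
    """Toy version without the early return (hypothetical names)."""
    if not rotation_ok:
        rot[:] = np.eye(3).ravel()  # fall back to the identity rotation
    # a real implementation would fill `rot` here when rotation_ok is True
    return rmsd


rot = np.zeros(9)
print(fill_rotation_buggy(rot, 0.5, rotation_ok=False))  # None
print(fill_rotation_fixed(rot, 0.5, rotation_ok=False))  # 0.5
```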

codecov · 2023-09-02T17:23:21Z

Codecov Report

Patch and project coverage have no change.

Comparison is base (f50a097) 93.40% compared to head (a6d45dd) 93.40%.

Additional details and impacted files

@@            Coverage Diff             @@
##           develop    #4273     +/-   ##
==========================================
  Coverage    93.40%   93.40%             
==========================================
  Files          170      184     +14     
  Lines        22250    23358   +1108     
  Branches      4071     4071             
==========================================
+ Hits         20783    21818   +1035     
- Misses         951     1024     +73     
  Partials       516      516

see 14 files with indirect coverage changes

tylerjereddy · 2023-09-02T21:18:23Z

package/MDAnalysis/lib/qcprot.pyx

+                                   cython.floating[:, :] coords1,
+                                   cython.floating[:, :] coords2,
+                                   int N,
+                                   cython.floating[:] weight):


I remember that a while back I tried to improve performance in SciPy by switching from the old NumPy buffer syntax to memoryviews, and it actually caused a performance regression: scipy/scipy#12379 (comment)

Anyway, you may want to measure that you remain at least performance neutral. My experience with following that best practice hasn't historically been great, but hopefully the situation is better now.

tylerjereddy · 2023-09-02T21:19:28Z

testsuite/MDAnalysisTests/lib/test_qcprot.py

+
+        err = 0.001 if dtype is np.float32 else 0.000001
+        assert r == pytest.approx(0.6057544485785074, abs=err)
+        assert_array_almost_equal(rot, rot_ref)


Maybe use assert_allclose for new test code per the Note at https://numpy.org/doc/stable/reference/generated/numpy.testing.assert_array_almost_equal.html ?
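A minimal sketch of what the suggested change could look like, keeping the dtype-dependent tolerance from the test above. The `check_rotation` helper and the tolerance values are assumptions for illustration, not the test that was merged.

```python
import numpy as np
from numpy.testing import assert_allclose


def check_rotation(rot, rot_ref, dtype):
    """Compare a rotation matrix to a reference with a dtype-dependent tolerance.

    Hypothetical helper; the tolerances are illustrative assumptions.
    """
    tol = 1e-3 if dtype == np.float32 else 1e-6
    # atol matters here because a rotation matrix contains exact zeros,
    # where a purely relative tolerance would never pass
    assert_allclose(rot, rot_ref, rtol=tol, atol=tol)


rot_ref = np.eye(3).ravel()
for dtype in (np.float32, np.float64):
    rot = (rot_ref + 1e-7).astype(dtype)  # small perturbation within tolerance
    check_rotation(rot, rot_ref, dtype)
```

Unlike `assert_array_almost_equal`, `assert_allclose` makes the relative and absolute tolerances explicit, which is convenient when the expected precision depends on the dtype.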

orbeckst

The RMSD functions are quite time-critical for many applications so if there's any chance for a performance impact then please provide some benchmark data before merge. Thanks!

orbeckst · 2023-09-03T01:47:25Z

About float vs double: I vaguely remember that during one of the REU projects we played around with single vs double precision and concluded that float32 is insufficient for the algorithm to converge properly. I don't remember any details and have no idea if there are notes anywhere. My main point is that we may not be able to simply run the algorithm at whatever dtype the rest of the code uses, and that this will require careful testing.
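One plausible reason for such convergence trouble (an illustration, not a diagnosis of qcprot) is that an iterative solver with a tight relative tolerance asks for more precision than float32 can represent. The sketch below assumes a tolerance of 1e-11 purely for the sake of the example and shows that a correction of relative size 1e-9 disappears entirely in float32.

```python
import numpy as np

EVALPREC = 1e-11  # assumed relative convergence tolerance, illustration only


def step_is_lost(dtype, rel_step=1e-9):
    """Return True if subtracting a tiny correction leaves the value unchanged."""
    x = dtype(1.0)
    return bool(x - dtype(rel_step) == x)


for dtype in (np.float32, np.float64):
    eps = np.finfo(dtype).eps
    print(dtype.__name__,
          "eps below tolerance:", bool(eps < EVALPREC),
          "1e-9 step lost:", step_is_lost(dtype))
```

In float32 the machine epsilon is several orders of magnitude larger than the assumed tolerance, so an iteration can stall on rounding long before it reaches the precision a double-precision run would reach.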

richardjgowers · 2023-09-03T19:50:35Z

@orbeckst benchmarks

from MDAnalysisTests.datafiles import PSF, DCD
import MDAnalysis as mda
from MDAnalysis.lib import qcprot
import numpy as np

u = mda.Universe(PSF, DCD)

prot = u.select_atoms('protein')

# mass weights, normalised by the mean mass
w = (prot.masses / np.mean(prot.masses)).astype(np.float64)

# reference coordinates from frame 0, mobile coordinates from frame 1
ref = prot.positions.astype(np.float64)
u.trajectory[1]
conf = prot.positions.astype(np.float64)

# flattened 3x3 rotation matrix, filled in place
rot = np.zeros(9, dtype=np.float64)

N = len(prot)

then:

%%timeit

qcprot.CalcRMSDRotationalMatrix(ref, conf, N, rot, w)

old code:
10.1 µs ± 46.4 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
new code:
9.97 µs ± 29.1 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

The numbers fluctuate a bit; I think there's no performance gain or loss.

I was able to reproduce the memoryview regression that @tylerjereddy flagged up, but I think it's an order of magnitude smaller than the numbers measured here, so it won't show up. It's something to keep in mind if we're writing code that has a lot of function calls (there are 3 here).

re: precision, yes, I had to make the qcp function use doubles throughout to keep it working. InnerProduct can now use either float or double. Strangely, these take the same amount of time, which means the compiler isn't doing anything clever here (where it ought to be able to), so this probably needs some future digging. This does some groundwork at least.
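To make the split described above concrete, here is a NumPy sketch of an inner-product step that accepts float32 or float64 coordinates but accumulates in float64, so that a downstream solver always sees doubles. The `inner_product` function is a hypothetical stand-in written for this illustration; it is not the Cython `InnerProduct` from qcprot, and the weighting convention is an assumption.

```python
import numpy as np


def inner_product(coords1, coords2, weights=None):
    """Return (A, E0) in float64 for float32 or float64 (N, 3) inputs.

    A is the 3x3 weighted cross-covariance matrix and E0 is half the sum of
    the weighted squared norms. Hypothetical NumPy stand-in, illustration only.
    """
    x = np.asarray(coords1).astype(np.float64)  # upcast once, accumulate in double
    y = np.asarray(coords2).astype(np.float64)
    if weights is None:
        w = np.ones(len(x), dtype=np.float64)
    else:
        w = np.asarray(weights, dtype=np.float64)
    A = (w[:, None] * x).T @ y
    G1 = np.sum(w * np.sum(x * x, axis=1))
    G2 = np.sum(w * np.sum(y * y, axis=1))
    return A, 0.5 * (G1 + G2)


rng = np.random.default_rng(1)
a = rng.random((50, 3))
b = rng.random((50, 3))

A64, E64 = inner_product(a, b)
A32, E32 = inner_product(a.astype(np.float32), b.astype(np.float32))
print(A64.dtype, A32.dtype)  # both float64
```

Upcasting copies the input, which trades some memory for simplicity; a compiled kernel could instead read float32 values and accumulate into double variables without the copy.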

update qcprot to cython.floating types

ec42308

use memoryviews throughout for potential performance allows either float32 or float64 inputs

richardjgowers added the CZI-performance (performance track of CZIEOSS4 grant) label Sep 2, 2023

github-actions bot added the Component-lib label Sep 2, 2023

IAlibay approved these changes Sep 2, 2023

clarify optional arguments to qcprot functions

41daaef

Merge branch 'develop' into qcprot_floating

ba645e3

IAlibay reviewed Sep 2, 2023

richardjgowers added 2 commits September 2, 2023 17:58

qcprot doc tweaks

6f84dea

tests: fixes to test_align

85afdb7

previously rmsd was being returned as None for some cases, this didn't make sense
removed test for int inputs, nonsensical

richardjgowers commented Sep 2, 2023

richardjgowers added 3 commits September 2, 2023 19:14

regression tests for qcprot

789016b

revert changing qcprot.FastCalcRMSD to use floating

8fd6235

using float precision seems to break filling the rotation matrix. Not sure why, this works though.

Update CHANGELOG

10d4320

richardjgowers changed the title ~~[WIP] update qcprot to cython.floating types~~ update qcprot to cython.floating types Sep 2, 2023

tylerjereddy reviewed Sep 2, 2023

orbeckst requested changes Sep 3, 2023

richardjgowers added 2 commits September 3, 2023 20:40

use assert_allclose not assert_array_almost_equal

e6ac33d

Merge remote-tracking branch 'origin/qcprot_floating' into qcprot_floating

d7b7f14

Merge branch 'develop' into qcprot_floating

a6d45dd

orbeckst approved these changes Sep 3, 2023

richardjgowers merged commit 2853201 into develop Sep 4, 2023

IAlibay added the enhancement label Sep 21, 2023

richardjgowers deleted the qcprot_floating branch October 14, 2023 20:15
