WIP: add two point distance function #1512

kain88-de · 2017-07-18T11:04:40Z

Fixes #1262

Changes made in this Pull Request:

add a new distance function for two atom positions.
I used the general algorithm from tuckerman, see triclinic PBC treatment is not always correct #1475

TODO

type checking of input a,b
type checking of input box including the format that is used.
correct distance for cubic boxes
correct distance for rhombic boxes

PR Checklist

Tests?
Docs?
CHANGELOG updated?
Issue raised/referenced?

richardjgowers · 2017-07-18T16:29:12Z

I'm not really a fan of this method. I know it's not meant to be high performance, but all those linalg calls are just slower.

I also think we'll need some sort of minimum image function anyway, (ie given a vector and a box, return a vector which is the minimum vector), so I'd rather this function use that.

kain88-de · 2017-07-18T16:33:04Z

Returning a vector isn't difficult. But for correctness tests this implementation might still be useful. I have tested it on a simulation with a cubic box and the algorithm worked perfect. But if someone has a triclinic cell example that would be great.

orbeckst

I don't know if you want to merge this, I would be in favor. Main issue is the docs and no tests.

orbeckst · 2017-08-01T22:16:00Z

package/MDAnalysis/lib/distances.py

+    sa = np.dot(ibox, a)
+    sb = np.dot(ibox, b)
+    s = sa - sb
+    s -= np.round(s)


Is np.round() different from np.rint()? – the latter would be the canonical function for this problem.

they are equivalent

orbeckst · 2017-08-02T00:45:33Z

package/MDAnalysis/lib/distances.py

+        return np.linalg.norm(a - b)
+
+    box = triclinic_vectors(box)
+    ibox = np.linalg.inv(box)


You can explicitly calculate the inverse as in #1475.

This is marginally faster in numpy but might be a lot faster in cython.

import numpy as np def make_box(): a = [np.random.random(), 0, 0] b = np.zeros(3) b[:2] = np.random.random(2) c = np.random.random(3) return 20 * np.array([a, b, c]) def ibox(box): a,b,c = box return np.array([[1/a[0], 0, 0], [-b[0]/(a[0]*b[1]), 1/b[1], 0], [(b[0]*c[1] - b[1]*c[0])/(a[0]*b[1]*c[2]), -c[1]/(b[1]*c[2]), 1/c[2]]])

Timing:

box = make_box() %timeit ibox(box) # 7.33 µs ± 108 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) %timeit np.linalg.inv(box) #9.98 µs ± 247 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

Correctness

In [42]: np.dot(box, ibox(box)) Out[42]: array([[ 1.00000000e+00, 0.00000000e+00, 0.00000000e+00], [ 1.90479818e-18, 1.00000000e+00, 0.00000000e+00], [ 3.40613054e-17, -7.40127872e-17, 1.00000000e+00]])

(Closed form from sympy...).

from sympy.interactive.printing import init_printing init_printing(use_unicode=False, wrap_line=False, no_global=True) a0, b0, b1, c0, c1, c2 = sym.symbols("a0, b0, b1, c0, c1, c2") h = sym.Matrix([[a0, b0, c0], [0, b1, c1], [0, 0, c2]]) h.inv() In [35]: h Out[35]: [a0 b0 c0] [ ] [0 b1 c1] [ ] [0 0 c2] In [37]: h.inv() Out[37]: [1 -b0 b0*c1 - b1*c0] [-- ----- -------------] [a0 a0*b1 a0*b1*c2 ] [ ] [ 1 -c1 ] [0 -- ----- ] [ b1 b1*c2 ] [ ] [ 1 ] [0 0 -- ] [ c2 ]

Yes with cython this can be speed up. But I have to rewrite to use less variables to get the full speedup and not keep calling python functions.

from numba import jit import numpy as np @jit def jit_ibox(box): inv = np.zeros(box.shape) inv[0, 0] = 1 / box[0, 0] inv[1, 0] = -box[1,0] / (box[0,0]*box[1,1]) inv[1, 1] = 1 / box[1,1] inv[2, 0] = (box[1,0]*box[2,1] - box[1,1]*box[2,0])/(box[0,0]*box[1,1]*box[2,2]) inv[2, 1] = -box[2,1]/(box[1,1]*box[2,2]) inv[2, 2] = 1 / box[2,2] return inv

note: I tested this with numba because it has similar performance but much shorter test cycles

%timeit jit_ibox(box) 656 ns ± 18.7 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

This looks promising; what is the performance of np.linalg.inv() on your machine?

Sour don't what you report. It might not help much though because of the dot products that are used.

Sorry autocorrect. My performance of Linalg.inv is comparable to yours.

Thanks. And yes, you're right, of course, the matrix-vector multiplications are bad (although I don't know if np.dot uses any optimized BLAS/LAPACK)

it does but the python function call as well as the input verification that numpy does is expensive. Maybe we can unpack that more using explicit formulas calculated with sympy.

orbeckst · 2017-08-02T00:47:02Z

package/MDAnalysis/lib/distances.py

@@ -208,6 +208,23 @@ def _check_lengths_match(*arrays):
        raise ValueError("Input arrays must all be same shape"
                         "Got {0}".format([a.shape for a in arrays]))

+
+def distance(a, b, box=None):
+    """calc distances between two points. Uses algorithm from Tuckerman


Doc string should state that this takes any PBC into account. I would also reference Tuckerman properly (and possibly even write out the math because we would want to use it as a test case for correctness and in this case, one cannot be explicit enough.)

orbeckst · 2017-08-02T00:51:22Z

But if someone has a triclinic cell example that would be great.

In [40]: tric = mda.Universe(MDAnalysis.tests.datafiles.PSF_TRICLINIC, MDAnalysis.tests.datafiles.DCD_TRICLINIC)

In [41]: tric.dimensions
Out[41]:
array([ 35.44603729,  35.06156158,  34.15850449,  91.32802582,
        61.7352066 ,  44.4070282 ], dtype=float32)

kain88-de · 2017-08-02T06:38:44Z

@orbeckst do you also have a testcase that doesn't require our test data? I would like to be able to place a bead as pure numpy arrays and calculate the minimal image by hand. These super basic methods shouldn't depend on our test files

jbarnoud · 2017-08-02T07:26:10Z

@kain88-de You can copy the box and the coordinates of two points from one frame of PSF_TRICLINIC.

kain88-de · 2017-08-04T06:14:47Z

@kain88-de You can copy the box and the coordinates of two points from one frame of PSF_TRICLINIC

I would like to have points for cell types that I can also check by hand. If you want to provide some I'm happy so use them. I think it will take me some time to generate them myself.

kain88-de · 2017-09-21T20:58:01Z

See #1475 (comment)

add two point distance function

35d798d

kain88-de added Component-lib enhancement help wanted Work in progress labels Jul 18, 2017

orbeckst requested changes Aug 2, 2017

View reviewed changes

kain88-de closed this Sep 21, 2017

kain88-de deleted the single-atom-distance branch September 21, 2017 20:43

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

WIP: add two point distance function #1512

WIP: add two point distance function #1512

kain88-de commented Jul 18, 2017

richardjgowers commented Jul 18, 2017

kain88-de commented Jul 18, 2017

orbeckst left a comment

orbeckst Aug 1, 2017

kain88-de Aug 2, 2017

orbeckst Aug 2, 2017

kain88-de Aug 2, 2017

orbeckst Aug 2, 2017

kain88-de Aug 2, 2017

kain88-de Aug 2, 2017

orbeckst Aug 2, 2017 •

edited

Loading

kain88-de Aug 2, 2017

orbeckst Aug 2, 2017

orbeckst commented Aug 2, 2017

kain88-de commented Aug 2, 2017

jbarnoud commented Aug 2, 2017

kain88-de commented Aug 4, 2017

kain88-de commented Sep 21, 2017

WIP: add two point distance function #1512

WIP: add two point distance function #1512

Conversation

kain88-de commented Jul 18, 2017

TODO

PR Checklist

richardjgowers commented Jul 18, 2017

kain88-de commented Jul 18, 2017

orbeckst left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

orbeckst Aug 2, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

orbeckst commented Aug 2, 2017

kain88-de commented Aug 2, 2017

jbarnoud commented Aug 2, 2017

kain88-de commented Aug 4, 2017

kain88-de commented Sep 21, 2017

orbeckst Aug 2, 2017 •

edited

Loading