Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: add two point distance function #1512

Closed
wants to merge 1 commit into from
Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
17 changes: 17 additions & 0 deletions package/MDAnalysis/lib/distances.py
Original file line number Diff line number Diff line change
Expand Up @@ -208,6 +208,23 @@ def _check_lengths_match(*arrays):
raise ValueError("Input arrays must all be same shape"
"Got {0}".format([a.shape for a in arrays]))


def distance(a, b, box=None):
"""calc distances between two points. Uses algorithm from Tuckerman
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doc string should state that this takes any PBC into account. I would also reference Tuckerman properly (and possibly even write out the math because we would want to use it as a test case for correctness and in this case, one cannot be explicit enough.)


"""
if box is None:
return np.linalg.norm(a - b)

box = triclinic_vectors(box)
ibox = np.linalg.inv(box)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can explicitly calculate the inverse as in #1475.

This is marginally faster in numpy but might be a lot faster in cython.

import numpy as np

def make_box():
    a = [np.random.random(), 0, 0]
    b = np.zeros(3)
    b[:2] = np.random.random(2)
    c = np.random.random(3)
    return 20 * np.array([a, b, c])

def ibox(box):
    a,b,c = box
    return np.array([[1/a[0], 0, 0], [-b[0]/(a[0]*b[1]), 1/b[1], 0], [(b[0]*c[1] - b[1]*c[0])/(a[0]*b[1]*c[2]), -c[1]/(b[1]*c[2]), 1/c[2]]])

Timing:

box = make_box()
%timeit ibox(box)
# 7.33 µs ± 108 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit np.linalg.inv(box)
#9.98 µs ± 247 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

Correctness

In [42]: np.dot(box, ibox(box))
Out[42]:
array([[  1.00000000e+00,   0.00000000e+00,   0.00000000e+00],
       [  1.90479818e-18,   1.00000000e+00,   0.00000000e+00],
       [  3.40613054e-17,  -7.40127872e-17,   1.00000000e+00]])

(Closed form from sympy...).

from sympy.interactive.printing import init_printing
init_printing(use_unicode=False, wrap_line=False, no_global=True)
a0, b0, b1, c0, c1, c2 = sym.symbols("a0, b0, b1, c0, c1, c2")
h = sym.Matrix([[a0, b0, c0], [0, b1, c1], [0, 0, c2]])
h.inv()

In [35]: h
Out[35]:
[a0  b0  c0]
[          ]
[0   b1  c1]
[          ]
[0   0   c2]

In [37]: h.inv()
Out[37]:
[1    -b0   b0*c1 - b1*c0]
[--  -----  -------------]
[a0  a0*b1     a0*b1*c2  ]
[                        ]
[     1          -c1     ]
[0    --        -----    ]
[     b1        b1*c2    ]
[                        ]
[                1       ]
[0     0         --      ]
[                c2      ]

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes with cython this can be speed up. But I have to rewrite to use less variables to get the full speedup and not keep calling python functions.

from numba import jit
import numpy as np

@jit
def jit_ibox(box):
    inv = np.zeros(box.shape)
    inv[0, 0] = 1 / box[0, 0]
    inv[1, 0] = -box[1,0] / (box[0,0]*box[1,1])
    inv[1, 1] = 1 / box[1,1]
    inv[2, 0] = (box[1,0]*box[2,1] - box[1,1]*box[2,0])/(box[0,0]*box[1,1]*box[2,2])
    inv[2, 1] = -box[2,1]/(box[1,1]*box[2,2])
    inv[2, 2] = 1 / box[2,2]
    return inv

note: I tested this with numba because it has similar performance but much shorter test cycles

%timeit jit_ibox(box)
656 ns ± 18.7 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks promising; what is the performance of np.linalg.inv() on your machine?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sour don't what you report. It might not help much though because of the dot products that are used.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry autocorrect. My performance of Linalg.inv is comparable to yours.

Copy link
Member

@orbeckst orbeckst Aug 2, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. And yes, you're right, of course, the matrix-vector multiplications are bad (although I don't know if np.dot uses any optimized BLAS/LAPACK)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it does but the python function call as well as the input verification that numpy does is expensive. Maybe we can unpack that more using explicit formulas calculated with sympy.

sa = np.dot(ibox, a)
sb = np.dot(ibox, b)
s = sa - sb
s -= np.round(s)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is np.round() different from np.rint()? – the latter would be the canonical function for this problem.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

they are equivalent

return np.linalg.norm(np.dot(box, s))


def distance_array(reference, configuration, box=None, result=None, backend="serial"):
"""Calculate all distances between a reference set and another configuration.

Expand Down