-
Notifications
You must be signed in to change notification settings - Fork 663
/
psa.py
2296 lines (1882 loc) · 87.2 KB
/
psa.py
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
# -*- Mode: python; tab-width: 4; indent-tabs-mode:nil; coding:utf-8 -*-
# vim: tabstop=4 expandtab shiftwidth=4 softtabstop=4
#
# MDAnalysis --- https://www.mdanalysis.org
# Copyright (c) 2006-2017 The MDAnalysis Development Team and contributors
# (see the file AUTHORS for the full list of names)
#
# Released under the GNU Public Licence, v2 or any higher version
#
# Please cite your use of MDAnalysis in published work:
#
# R. J. Gowers, M. Linke, J. Barnoud, T. J. E. Reddy, M. N. Melo, S. L. Seyler,
# D. L. Dotson, J. Domanski, S. Buchoux, I. M. Kenney, and O. Beckstein.
# MDAnalysis: A Python package for the rapid analysis of molecular dynamics
# simulations. In S. Benthall and S. Rostrup editors, Proceedings of the 15th
# Python in Science Conference, pages 102-109, Austin, TX, 2016. SciPy.
# doi: 10.25080/majora-629e541a-00e
#
# N. Michaud-Agrawal, E. J. Denning, T. B. Woolf, and O. Beckstein.
# MDAnalysis: A Toolkit for the Analysis of Molecular Dynamics Simulations.
# J. Comput. Chem. 32 (2011), 2319--2327, doi:10.1002/jcc.21787
#
r"""
Calculating path similarity --- :mod:`MDAnalysis.analysis.psa`
==========================================================================
:Author: Sean Seyler
:Year: 2015
:Copyright: GNU Public License v3
.. versionadded:: 0.10.0
The module contains code to calculate the geometric similarity of trajectories
using path metrics such as the Hausdorff or Fréchet distances
[Seyler2015]_. The path metrics are functions of two paths and return a
nonnegative number, i.e., a distance. Two paths are identical if their distance
is zero, and large distances indicate dissimilarity. Each path metric is a
function of the individual points (e.g., coordinate snapshots) that comprise
each path and, loosely speaking, identify the two points, one per path of a
pair of paths, where the paths deviate the most. The distance between these
points of maximal deviation is measured by the root mean square deviation
(RMSD), i.e., to compute structural similarity.
One typically computes the pairwise similarity for an ensemble of paths to
produce a symmetric distance matrix, which can be clustered to, at a glance,
identify patterns in the trajectory data. To properly analyze a path ensemble,
one must select a suitable reference structure to which all paths (each
conformer in each path) will be universally aligned using the rotations
determined by the best-fit rmsds. Distances between paths and their structures
are then computed directly with no further alignment. This pre-processing step
is necessary to preserve the metric properties of the Hausdorff and Fréchet
metrics; using the best-fit rmsd on a pairwise basis does not generally
preserve the triangle inequality.
Note
----
The `PSAnalysisTutorial`_ outlines a typical application of PSA to
a set of trajectories, including doing proper alignment,
performing distance comparisons, and generating heat
map-dendrogram plots from hierarchical clustering.
.. Rubric:: References
.. [Seyler2015] Seyler SL, Kumar A, Thorpe MF, Beckstein O (2015)
Path Similarity Analysis: A Method for Quantifying
Macromolecular Pathways. PLoS Comput Biol 11(10): e1004568.
doi: `10.1371/journal.pcbi.1004568`_
.. _`10.1371/journal.pcbi.1004568`: http://dx.doi.org/10.1371/journal.pcbi.1004568
.. _`PSAnalysisTutorial`: https://github.com/Becksteinlab/PSAnalysisTutorial
Helper functions and variables
------------------------------
The following convenience functions are used by other functions in this module.
.. autofunction:: sqnorm
.. autofunction:: get_msd_matrix
.. autofunction:: get_coord_axes
Classes, methods, and functions
-------------------------------
.. autofunction:: get_path_metric_func
.. autofunction:: hausdorff
.. autofunction:: hausdorff_wavg
.. autofunction:: hausdorff_avg
.. autofunction:: hausdorff_neighbors
.. autofunction:: discrete_frechet
.. autofunction:: dist_mat_to_vec
.. autoclass:: Path
:members:
.. attribute:: u_original
:class:`MDAnalysis.Universe` object with a trajectory
.. attribute:: u_reference
:class:`MDAnalysis.Universe` object containing a reference structure
.. attribute:: ref_select
string, selection for
:meth:`~MDAnalysis.core.groups.AtomGroup.select_atoms` to select frame
from :attr:`Path.u_reference`
.. attribute:: path_select
string, selection for
:meth:`~MDAnalysis.core.groups.AtomGroup.select_atoms` to select atoms
to compose :attr:`Path.path`
.. attribute:: ref_frame
int, frame index to select frame from :attr:`Path.u_reference`
.. attribute:: u_fitted
:class:`MDAnalysis.Universe` object with the fitted trajectory
.. attribute:: path
:class:`numpy.ndarray` object representation of the fitted trajectory
.. autoclass:: PSAPair
.. attribute:: npaths
int, total number of paths in the comparison in which *this*
:class:`PSAPair` was generated
.. attribute:: matrix_id
(int, int), (row, column) indices of the location of *this*
:class:`PSAPair` in the corresponding pairwise distance matrix
.. attribute:: pair_id
int, ID of *this* :class:`PSAPair` (the pair_id:math:`^\text{th}`
comparison) in the distance vector corresponding to the pairwise distance
matrix
.. attribute:: nearest_neighbors
dict, contains the nearest neighbors by frame index and the
nearest neighbor distances for each path in *this* :class:`PSAPair`
.. attribute:: hausdorff_pair
dict, contains the frame indices of the Hausdorff pair for each path in
*this* :class:`PSAPair` and the corresponding (Hausdorff) distance
.. autoclass:: PSAnalysis
:members:
.. attribute:: universes
list of :class:`MDAnalysis.Universe` objects containing trajectories
.. attribute:: u_reference
:class:`MDAnalysis.Universe` object containing a reference structure
.. attribute:: ref_select
string, selection for
:meth:`~MDAnalysis.core.groups.AtomGroup.select_atoms` to select frame
from :attr:`PSAnalysis.u_reference`
.. attribute:: path_select
string, selection for
:meth:`~MDAnalysis.core.groups.AtomGroup.select_atoms` to select atoms
to compose :attr:`Path.path`
.. attribute:: ref_frame
int, frame index to select frame from :attr:`Path.u_reference`
.. attribute:: filename
string, name of file to store calculated distance matrix
(:attr:`PSAnalysis.D`)
.. attribute:: paths
list of :class:`numpy.ndarray` objects representing the set/ensemble of
fitted trajectories
.. attribute:: D
string, name of file to store calculated distance matrix
(:attr:`PSAnalysis.D`)
.. Markup definitions
.. ------------------
..
.. |3Dp| replace:: :math:`N_p \times N \times 3`
.. |2Dp| replace:: :math:`N_p \times (3N)`
.. |3Dq| replace:: :math:`N_q \times N \times 3`
.. |2Dq| replace:: :math:`N_q \times (3N)`
.. |3D| replace:: :math:`N_p\times N\times 3`
.. |2D| replace:: :math:`N_p\times 3N`
.. |Np| replace:: :math:`N_p`
"""
from __future__ import division, absolute_import, print_function
import six
from six.moves import range, cPickle, zip
from six import raise_from, string_types
import os
import warnings
import numbers
import numpy as np
from scipy import spatial, cluster
from scipy.spatial.distance import directed_hausdorff
import matplotlib
import MDAnalysis
import MDAnalysis.analysis.align
from MDAnalysis import NoDataError
from MDAnalysis.lib.util import deprecate
import logging
logger = logging.getLogger('MDAnalysis.analysis.psa')
from ..due import due, Doi
due.cite(Doi("10.1371/journal.pcbi.1004568"),
description="Path Similarity Analysis algorithm and implementation",
path="MDAnalysis.analysis.psa",
cite_module=True)
del Doi
def get_path_metric_func(name):
"""Selects a path metric function by name.
Parameters
----------
name : str
name of path metric
Returns
-------
path_metric : function
The path metric function specified by *name* (if found).
"""
path_metrics = {
'hausdorff' : hausdorff,
'weighted_average_hausdorff' : hausdorff_wavg,
'average_hausdorff' : hausdorff_avg,
'hausdorff_neighbors' : hausdorff_neighbors,
'discrete_frechet' : discrete_frechet
}
try:
return path_metrics[name]
except KeyError as key:
raise_from(
KeyError(
'Path metric "{}" not found. Valid selections: {}'.format(
key,
" ".join('"{}"'.format(n) for n in path_metrics.keys())
)
),
None)
def sqnorm(v, axis=None):
"""Compute the sum of squares of elements along specified axes.
Parameters
----------
v : numpy.ndarray
coordinates
axes : None / int / tuple (optional)
Axes or axes along which a sum is performed. The default
(*axes* = ``None``) performs a sum over all the dimensions of
the input array. The value of *axes* may be negative, in
which case it counts from the last axis to the zeroth axis.
Returns
-------
float
the sum of the squares of the elements of `v` along `axes`
"""
return np.sum(v*v, axis=axis)
def get_msd_matrix(P, Q, axis=None):
r"""Generate the matrix of pairwise mean-squared deviations between paths.
The MSDs between all pairs of points in `P` and `Q` are
calculated, each pair having a point from `P` and a point from
`Q`.
`P` (`Q`) is a :class:`numpy.ndarray` of :math:`N_p` (:math:`N_q`) time
steps, :math:`N` atoms, and :math:`3N` coordinates (e.g.,
:attr:`MDAnalysis.core.groups.AtomGroup.positions`). The pairwise MSD
matrix has dimensions :math:`N_p` by :math:`N_q`.
Parameters
----------
P : numpy.ndarray
the points in the first path
Q : numpy.ndarray
the points in the second path
Returns
-------
msd_matrix : numpy.ndarray
matrix of pairwise MSDs between points in `P` and points
in `Q`
Notes
-----
We calculate the MSD matrix
.. math::
M_{ij} = ||p_i - q_j||^2
where :math:`p_i \in P` and :math:`q_j \in Q`.
"""
return np.asarray([sqnorm(p - Q, axis=axis) for p in P])
def reshaper(path, axis):
"""Flatten path when appropriate to facilitate calculations
requiring two dimensional input.
"""
if len(axis) > 1:
path = path.reshape(len(path), -1)
return path
def get_coord_axes(path):
"""Return the number of atoms and the axes corresponding to atoms
and coordinates for a given path.
The `path` is assumed to be a :class:`numpy.ndarray` where the 0th axis
corresponds to a frame (a snapshot of coordinates). The :math:`3N`
(Cartesian) coordinates are assumed to be either:
1. all in the 1st axis, starting with the x,y,z coordinates of the
first atom, followed by the *x*,*y*,*z* coordinates of the 2nd, etc.
2. in the 1st *and* 2nd axis, where the 1st axis indexes the atom
number and the 2nd axis contains the *x*,*y*,*z* coordinates of
each atom.
Parameters
----------
path : numpy.ndarray
representing a path
Returns
-------
(int, (int, ...))
the number of atoms and the axes containing coordinates
"""
path_dimensions = len(path.shape)
if path_dimensions == 3:
N = path.shape[1]
axis = (1,2) # 1st axis: atoms, 2nd axis: x,y,z coords
elif path_dimensions == 2:
# can use mod to check if total # coords divisible by 3
N = path.shape[1] / 3
axis = (1,) # 1st axis: 3N structural coords (x1,y1,z1,...,xN,xN,zN)
else:
raise ValueError("Path must have 2 or 3 dimensions; the first "
"dimensions (axis 0) must correspond to frames, "
"axis 1 (and axis 2, if present) must contain atomic "
"coordinates.")
return N, axis
def hausdorff(P, Q):
r"""Calculate the symmetric Hausdorff distance between two paths.
The metric used is RMSD, as opposed to the more conventional L2
(Euclidean) norm, because this is convenient for i.e., comparing
protein configurations.
*P* (*Q*) is a :class:`numpy.ndarray` of :math:`N_p` (:math:`N_q`) time
steps, :math:`N` atoms, and :math:`3N` coordinates (e.g.,
:attr:`MDAnalysis.core.groups.AtomGroup.positions`). *P* (*Q*) has
either shape |3Dp| (|3Dq|), or |2Dp| (|2Dq|) in flattened form.
Note that reversing the path does not change the Hausdorff distance.
Parameters
----------
P : numpy.ndarray
the points in the first path
Q : numpy.ndarray
the points in the second path
Returns
-------
float
the Hausdorff distance between paths `P` and `Q`
Example
-------
Calculate the Hausdorff distance between two halves of a trajectory:
>>> from MDAnalysis.tests.datafiles import PSF, DCD
>>> u = Universe(PSF,DCD)
>>> mid = len(u.trajectory)/2
>>> ca = u.select_atoms('name CA')
>>> P = numpy.array([
... ca.positions for _ in u.trajectory[:mid:]
... ]) # first half of trajectory
>>> Q = numpy.array([
... ca.positions for _ in u.trajectory[mid::]
... ]) # second half of trajectory
>>> hausdorff(P,Q)
4.7786639840135905
>>> hausdorff(P,Q[::-1]) # hausdorff distance w/ reversed 2nd trajectory
4.7786639840135905
Notes
-----
:func:`scipy.spatial.distance.directed_hausdorff` is an optimized
implementation of the early break algorithm of [Taha2015]_; the
latter code is used here to calculate the symmetric Hausdorff
distance with an RMSD metric
References
----------
.. [Taha2015] A. A. Taha and A. Hanbury. An efficient algorithm for
calculating the exact Hausdorff distance. IEEE Transactions On Pattern
Analysis And Machine Intelligence, 37:2153-63, 2015.
"""
N_p, axis_p = get_coord_axes(P)
N_q, axis_q = get_coord_axes(Q)
if N_p != N_q:
raise ValueError("P and Q must have matching sizes")
P = reshaper(P, axis_p)
Q = reshaper(Q, axis_q)
return max(directed_hausdorff(P, Q)[0],
directed_hausdorff(Q, P)[0]) / np.sqrt(N_p)
def hausdorff_wavg(P, Q):
r"""Calculate the weighted average Hausdorff distance between two paths.
*P* (*Q*) is a :class:`numpy.ndarray` of :math:`N_p` (:math:`N_q`) time
steps, :math:`N` atoms, and :math:`3N` coordinates (e.g.,
:attr:`MDAnalysis.core.groups.AtomGroup.positions`). *P* (*Q*) has
either shape |3Dp| (|3Dq|), or |2Dp| (|2Dq|) in flattened form. The nearest
neighbor distances for *P* (to *Q*) and those of *Q* (to *P*) are averaged
individually to get the average nearest neighbor distance for *P* and
likewise for *Q*. These averages are then summed and divided by 2 to get a
measure that gives equal weight to *P* and *Q*.
Parameters
----------
P : numpy.ndarray
the points in the first path
Q : numpy.ndarray
the points in the second path
Returns
-------
float
the weighted average Hausdorff distance between paths `P` and `Q`
Example
-------
>>> from MDAnalysis import Universe
>>> from MDAnalysis.tests.datafiles import PSF, DCD
>>> u = Universe(PSF,DCD)
>>> mid = len(u.trajectory)/2
>>> ca = u.select_atoms('name CA')
>>> P = numpy.array([
... ca.positions for _ in u.trajectory[:mid:]
... ]) # first half of trajectory
>>> Q = numpy.array([
... ca.positions for _ in u.trajectory[mid::]
... ]) # second half of trajectory
>>> hausdorff_wavg(P,Q)
2.5669644353703447
>>> hausdorff_wavg(P,Q[::-1]) # weighted avg hausdorff dist w/ Q reversed
2.5669644353703447
Notes
-----
The weighted average Hausdorff distance is not a true metric (it does not
obey the triangle inequality); see [Seyler2015]_ for further details.
"""
N, axis = get_coord_axes(P)
d = get_msd_matrix(P, Q, axis=axis)
out = 0.5*( np.mean(np.amin(d,axis=0)) + np.mean(np.amin(d,axis=1)) )
return ( out / N )**0.5
def hausdorff_avg(P, Q):
r"""Calculate the average Hausdorff distance between two paths.
*P* (*Q*) is a :class:`numpy.ndarray` of :math:`N_p` (:math:`N_q`) time
steps, :math:`N` atoms, and :math:`3N` coordinates (e.g.,
:attr:`MDAnalysis.core.groups.AtomGroup.positions`). *P* (*Q*) has
either shape |3Dp| (|3Dq|), or |2Dp| (|2Dq|) in flattened form. The nearest
neighbor distances for *P* (to *Q*) and those of *Q* (to *P*) are all
averaged together to get a mean nearest neighbor distance. This measure
biases the average toward the path that has more snapshots, whereas weighted
average Hausdorff gives equal weight to both paths.
Parameters
----------
P : numpy.ndarray
the points in the first path
Q : numpy.ndarray
the points in the second path
Returns
-------
float
the average Hausdorff distance between paths `P` and `Q`
Example
-------
>>> from MDAnalysis.tests.datafiles import PSF, DCD
>>> u = Universe(PSF,DCD)
>>> mid = len(u.trajectory)/2
>>> ca = u.select_atoms('name CA')
>>> P = numpy.array([
... ca.positions for _ in u.trajectory[:mid:]
... ]) # first half of trajectory
>>> Q = numpy.array([
... ca.positions for _ in u.trajectory[mid::]
... ]) # second half of trajectory
>>> hausdorff_avg(P,Q)
2.5669646575869005
>>> hausdorff_avg(P,Q[::-1]) # hausdorff distance w/ reversed 2nd trajectory
2.5669646575869005
Notes
-----
The average Hausdorff distance is not a true metric (it does not obey the
triangle inequality); see [Seyler2015]_ for further details.
"""
N, axis = get_coord_axes(P)
d = get_msd_matrix(P, Q, axis=axis)
out = np.mean( np.append( np.amin(d,axis=0), np.amin(d,axis=1) ) )
return ( out / N )**0.5
def hausdorff_neighbors(P, Q):
r"""Find the Hausdorff neighbors of two paths.
*P* (*Q*) is a :class:`numpy.ndarray` of :math:`N_p` (:math:`N_q`) time
steps, :math:`N` atoms, and :math:`3N` coordinates (e.g.,
:attr:`MDAnalysis.core.groups.AtomGroup.positions`). *P* (*Q*) has
either shape |3Dp| (|3Dq|), or |2Dp| (|2Dq|) in flattened form.
Parameters
----------
P : numpy.ndarray
the points in the first path
Q : numpy.ndarray
the points in the second path
Returns
-------
dict
dictionary of two pairs of numpy arrays, the first pair (key
"frames") containing the indices of (Hausdorff) nearest
neighbors for `P` and `Q`, respectively, the second (key
"distances") containing (corresponding) nearest neighbor
distances for `P` and `Q`, respectively
Notes
-----
- Hausdorff neighbors are those points on the two paths that are separated by
the Hausdorff distance. They are the farthest nearest neighbors and are
maximally different in the sense of the Hausdorff distance [Seyler2015]_.
- :func:`scipy.spatial.distance.directed_hausdorff` can also provide the
hausdorff neighbors.
"""
N, axis = get_coord_axes(P)
d = get_msd_matrix(P, Q, axis=axis)
nearest_neighbors = {
'frames' : (np.argmin(d, axis=1), np.argmin(d, axis=0)),
'distances' : ((np.amin(d,axis=1)/N)**0.5, (np.amin(d, axis=0)/N)**0.5)
}
return nearest_neighbors
def discrete_frechet(P, Q):
r"""Calculate the discrete Fréchet distance between two paths.
*P* (*Q*) is a :class:`numpy.ndarray` of :math:`N_p` (:math:`N_q`) time
steps, :math:`N` atoms, and :math:`3N` coordinates (e.g.,
:attr:`MDAnalysis.core.groups.AtomGroup.positions`). *P* (*Q*) has
either shape |3Dp| (|3Dq|), or :|2Dp| (|2Dq|) in flattened form.
Parameters
----------
P : numpy.ndarray
the points in the first path
Q : numpy.ndarray
the points in the second path
Returns
-------
float
the discrete Fréchet distance between paths *P* and *Q*
Example
-------
Calculate the discrete Fréchet distance between two halves of a
trajectory.
>>> u = Universe(PSF,DCD)
>>> mid = len(u.trajectory)/2
>>> ca = u.select_atoms('name CA')
>>> P = np.array([
... ca.positions for _ in u.trajectory[:mid:]
... ]) # first half of trajectory
>>> Q = np.array([
... ca.positions for _ in u.trajectory[mid::]
... ]) # second half of trajectory
>>> discrete_frechet(P,Q)
4.7786639840135905
>>> discrete_frechet(P,Q[::-1]) # frechet distance w/ 2nd trj reversed 2nd
6.8429011177113832
Note that reversing the direction increased the Fréchet distance:
it is sensitive to the direction of the path.
Notes
-----
The discrete Fréchet metric is an approximation to the continuous Fréchet
metric [Frechet1906]_ [Alt1995]_. The calculation of the continuous
Fréchet distance is implemented with the dynamic programming algorithm of
[EiterMannila1994]_ [EiterMannila1997]_.
References
----------
.. [Frechet1906] M. Fréchet. Sur quelques points du calcul
fonctionnel. Rend. Circ. Mat. Palermo, 22(1):1–72, Dec. 1906.
.. [Alt1995] H. Alt and M. Godau. Computing the Fréchet distance between
two polygonal curves. Int J Comput Geometry & Applications,
5(01n02):75–91, 1995. doi: `10.1142/S0218195995000064`_
.. _`10.1142/S0218195995000064`: http://doi.org/10.1142/S0218195995000064
.. [EiterMannila1994] T. Eiter and H. Mannila. Computing discrete Fréchet
distance. Technical Report CD-TR 94/64, Christian Doppler Laboratory for
Expert Systems, Technische Universität Wien, Wien, 1994.
.. [EiterMannila1997] T. Eiter and H. Mannila. Distance measures for point
sets and their computation. Acta Informatica, 34:109–133, 1997. doi: `10.1007/s002360050075`_.
.. _10.1007/s002360050075: http://doi.org/10.1007/s002360050075
"""
N, axis = get_coord_axes(P)
Np, Nq = len(P), len(Q)
d = get_msd_matrix(P, Q, axis=axis)
ca = -np.ones((Np, Nq))
def c(i, j):
"""Compute the coupling distance for two partial paths formed by *P* and
*Q*, where both begin at frame 0 and end (inclusive) at the respective
frame indices :math:`i-1` and :math:`j-1`. The partial path of *P* (*Q*)
up to frame *i* (*j*) is formed by the slicing ``P[0:i]`` (``Q[0:j]``).
:func:`c` is called recursively to compute the coupling distance
between the two full paths *P* and *Q* (i.e., the discrete Frechet
distance) in terms of coupling distances between their partial paths.
Parameters
----------
i : int
partial path of *P* through final frame *i-1*
j : int
partial path of *Q* through final frame *j-1*
Returns
-------
dist : float
the coupling distance between partial paths `P[0:i]` and `Q[0:j]`
"""
if ca[i,j] != -1 :
return ca[i,j]
if i > 0:
if j > 0:
ca[i,j] = max( min(c(i-1,j),c(i,j-1),c(i-1,j-1)), d[i,j] )
else:
ca[i,j] = max( c(i-1,0), d[i,0] )
elif j > 0:
ca[i,j] = max( c(0,j-1), d[0,j] )
else:
ca[i,j] = d[0,0]
return ca[i,j]
return (c(Np-1, Nq-1) / N)**0.5
def dist_mat_to_vec(N, i, j):
"""Convert distance matrix indices (in the upper triangle) to the index of
the corresponding distance vector.
This is a convenience function to locate distance matrix elements (and the
pair generating it) in the corresponding distance vector. The row index *j*
should be greater than *i+1*, corresponding to the upper triangle of the
distance matrix.
Parameters
----------
N : int
size of the distance matrix (of shape *N*-by-*N*)
i : int
row index (starting at 0) of the distance matrix
j : int
column index (starting at 0) of the distance matrix
Returns
-------
int
index (of the matrix element) in the corresponding distance vector
"""
if not (isinstance(N, numbers.Integral) and isinstance(i, numbers.Integral)
and isinstance(j, numbers.Integral)):
raise ValueError("N, i, j all must be of type int")
if i < 0 or j < 0 or N < 2:
raise ValueError("Matrix indices are invalid; i and j must be greater "
"than 0 and N must be greater the 2")
if (j > i and (i > N - 1 or j > N)) or (j < i and (i > N or j > N - 1)):
raise ValueError("Matrix indices are out of range; i and j must be "
"less than N = {0:d}".format(N))
if j > i:
return (N*i) + j - (i+2)*(i+1) // 2 # old-style division for int output
elif j < i:
warnings.warn("Column index entered (j = {:d} is smaller than row "
"index (i = {:d}). Using symmetric element in upper "
"triangle of distance matrix instead: i --> j, "
"j --> i".format(j, i))
return (N*j) + i - (j+2)*(j+1) // 2 # old-style division for int output
else:
raise ValueError("Error in processing matrix indices; i and j must "
"be integers less than integer N = {0:d} such that"
" j >= i+1.".format(N))
class Path(object):
"""Represent a path based on a :class:`~MDAnalysis.core.universe.Universe`.
Pre-process a :class:`Universe` object: (1) fit the trajectory to a
reference structure, (2) convert fitted time series to a
:class:`numpy.ndarray` representation of :attr:`Path.path`.
The analysis is performed with :meth:`PSAnalysis.run` and stores the result
in the :class:`numpy.ndarray` distance matrix :attr:`PSAnalysis.D`.
:meth:`PSAnalysis.run` also generates a fitted trajectory and path from
alignment of the original trajectories to a reference structure.
.. versionadded:: 0.9.1
"""
def __init__(self, universe, reference, ref_select='name CA',
path_select='all', ref_frame=0):
"""Setting up trajectory alignment and fitted path generation.
Parameters
----------
universe : Universe
:class:`MDAnalysis.Universe` object containing a trajectory
reference : Universe
reference structure (uses `ref_frame` from the trajectory)
ref_select : str or dict or tuple (optional)
The selection to operate on for rms fitting; can be one of:
1. any valid selection string for
:meth:`~MDAnalysis.core.groups.AtomGroup.select_atoms` that
produces identical selections in *mobile* and *reference*; or
2. a dictionary ``{'mobile':sel1, 'reference':sel2}`` (the
:func:`MDAnalysis.analysis.align.fasta2select` function returns
such a dictionary based on a ClustalW_ or STAMP_ sequence
alignment); or
3. a tuple ``(sel1, sel2)``
When using 2. or 3. with *sel1* and *sel2* then these selections
can also each be a list of selection strings (to generate an
AtomGroup with defined atom order as described under
:ref:`ordered-selections-label`).
ref_frame : int
frame index to select the coordinate frame from
`ref_select.trajectory`
path_select : selection_string
atom selection composing coordinates of (fitted) path; if ``None``
then `path_select` is set to `ref_select` [``None``]
"""
self.u_original = universe
self.u_reference = reference
self.ref_select = ref_select
self.ref_frame = ref_frame
self.path_select = path_select
self.top_name = self.u_original.filename
self.trj_name = self.u_original.trajectory.filename
self.newtrj_name = None
self.u_fitted = None
self.path = None
self.natoms = None
def fit_to_reference(self, filename=None, prefix='', postfix='_fit',
rmsdfile=None, targetdir=os.path.curdir,
weights=None, tol_mass=0.1):
"""Align each trajectory frame to the reference structure
Parameters
----------
filename : str (optional)
file name for the RMS-fitted trajectory or pdb; defaults to the
original trajectory filename (from :attr:`Path.u_original`) with
`prefix` prepended
prefix : str (optional)
prefix for auto-generating the new output filename
rmsdfile : str (optional)
file name for writing the RMSD time series [``None``]
weights : {"mass", ``None``} or array_like (optional)
choose weights. With ``"mass"`` uses masses as weights; with
``None`` weigh each atom equally. If a float array of the same
length as the selected AtomGroup is provided, use each element of
the `array_like` as a weight for the corresponding atom in the
AtomGroup.
tol_mass : float (optional)
Reject match if the atomic masses for matched atoms differ by more
than `tol_mass` [0.1]
Returns
-------
Universe
:class:`MDAnalysis.Universe` object containing a fitted trajectory
Notes
-----
Uses :class:`MDAnalysis.analysis.align.AlignTraj` for the fitting.
.. deprecated:: 0.16.1
Instead of ``mass_weighted=True`` use new ``weights='mass'``;
refactored to fit with AnalysisBase API
.. versionchanged:: 0.17.0
Deprecated keyword `mass_weighted` was removed.
"""
head, tail = os.path.split(self.trj_name)
oldname, ext = os.path.splitext(tail)
filename = filename or oldname
self.newtrj_name = os.path.join(targetdir, filename + postfix + ext)
self.u_reference.trajectory[self.ref_frame] # select frame from ref traj
aligntrj = MDAnalysis.analysis.align.AlignTraj(self.u_original,
self.u_reference,
select=self.ref_select,
filename=self.newtrj_name,
prefix=prefix,
weights=weights,
tol_mass=tol_mass).run()
if rmsdfile is not None:
aligntrj.save(rmsdfile)
return MDAnalysis.Universe(self.top_name, self.newtrj_name)
def to_path(self, fitted=False, select=None, flat=False):
r"""Generates a coordinate time series from the fitted universe
trajectory.
Given a selection of *N* atoms from *select*, the atomic positions for
each frame in the fitted universe (:attr:`Path.u_fitted`) trajectory
(with |Np| total frames) are appended sequentially to form a 3D or 2D
(if *flat* is ``True``) :class:`numpy.ndarray` representation of the
fitted trajectory (with dimensions |3D| or |2D|, respectively).
Parameters
----------
fitted : bool (optional)
construct a :attr:`Path.path` from the :attr:`Path.u_fitted`
trajectory; if ``False`` then :attr:`Path.path` is generated with
the trajectory from :attr:`Path.u_original` [``False``]
select : str (optional)
the selection for constructing the coordinates of each frame in
:attr:`Path.path`; if ``None`` then :attr:`Path.path_select`
is used, else it is overridden by *select* [``None``]
flat : bool (optional)
represent :attr:`Path.path` as a 2D (|2D|) :class:`numpy.ndarray`;
if ``False`` then :attr:`Path.path` is a 3D (|3D|)
:class:`numpy.ndarray` [``False``]
Returns
-------
numpy.ndarray
representing a time series of atomic positions of an
:class:`MDAnalysis.core.groups.AtomGroup` selection from
:attr:`Path.u_fitted.trajectory`
"""
select = select if select is not None else self.path_select
if fitted:
if not isinstance(self.u_fitted, MDAnalysis.Universe):
raise TypeError("Fitted universe not found. Generate a fitted " +
"universe with fit_to_reference() first, or explicitly "+
"set argument \"fitted\" to \"False\" to generate a " +
"path from the original universe.")
u = self.u_fitted
else:
u = self.u_original
frames = u.trajectory
atoms = u.select_atoms(select)
self.natoms = len(atoms)
frames.rewind()
if flat:
return np.array([atoms.positions.flatten() for _ in frames])
else:
return np.array([atoms.positions for _ in frames])
def run(self, align=False, filename=None, postfix='_fit', rmsdfile=None,
targetdir=os.path.curdir, weights=None, tol_mass=0.1,
flat=False):
r"""Generate a path from a trajectory and reference structure.
As part of the path generation, the trajectory can be superimposed
("aligned") to a reference structure if specified.
This is a convenience method to generate a fitted trajectory from an
inputted universe (:attr:`Path.u_original`) and reference structure
(:attr:`Path.u_reference`). :meth:`Path.fit_to_reference` and
:meth:`Path.to_path` are used consecutively to generate a new universe
(:attr:`Path.u_fitted`) containing the fitted trajectory along with the
corresponding :attr:`Path.path` represented as an
:class:`numpy.ndarray`. The method returns a tuple of the topology name
and new trajectory name, which can be fed directly into an
:class:`MDAnalysis.Universe` object after unpacking the tuple using the
``*`` operator, as in
``MDAnalysis.Universe(*(top_name, newtraj_name))``.
Parameters
----------
align : bool (optional)
Align trajectory to atom selection :attr:`Path.ref_select` of
:attr:`Path.u_reference`. If ``True``, a universe containing an
aligned trajectory is produced with :meth:`Path.fit_to_reference`
[``False``]
filename : str (optional)
filename for the RMS-fitted trajectory or pdb; defaults to the
original trajectory filename (from :attr:`Path.u_original`) with
*prefix* prepended
postfix : str (optional)
prefix for auto-generating the new output filename
rmsdfile : str (optional)
file name for writing the RMSD time series [``None``]
weights : {"mass", ``None``} or array_like (optional)
choose weights. With ``"mass"`` uses masses as weights; with
``None`` weigh each atom equally. If a float array of the same
length as the selected AtomGroup is provided, use each element of
the `array_like` as a weight for the corresponding atom in the
AtomGroup.
tol_mass : float (optional)
Reject match if the atomic masses for matched atoms differ by more
than *tol_mass* [0.1]