triclinic PBC treatment is not always correct #1475

orbeckst · 2017-07-11T05:31:05Z

I recently looked at lib/include/include/calc_distances.h and I wonder if we are treating triclinic boxes correctly: In calc_distances._calc_distance_array_triclinic() it looks as if the box inverse is calculated in the same way as for orthorhombic boxes:

box_inverse[0] = 1.0 / box[0][0];
box_inverse[1] = 1.0 / box[1][1];
box_inverse[2] = 1.0 / box[2][2];

even though a triclinic box contains three vectors where some have off-diagonal non-zero entries:

x0  x1   x2   
0   y1   y2 
0    0   z3

The inverse of this matrix is something else:

In [16]: from sympy.interactive.printing import init_printing
In [17]: init_printing(use_unicode=False, wrap_line=False, no_global=True)
In [18]: import sympy as sym

In [19]: x0, x1, y1, x2, y2, z2 = sym.symbols("x0, x1, y1, x2, y2, z2")
In [20]: h = sym.Matrix([[x0, x1, x2], [0, y1, y2], [0, 0, z2]])
In [21]: h.inv()
Out[21]:
[1    -x1   x1*y2 - x2*y1]
[--  -----  -------------]
[x0  x0*y1     x0*y1*z2  ]
[                        ]
[     1          -y2     ]
[0    --        -----    ]
[     y1        y1*z2    ]
[                        ]
[                1       ]
[0     0         --      ]
[                z2      ]

In [22]: h
Out[22]:
[x0  x1  x2]
[          ]
[0   y1  y2]
[          ]
[0   0   z2]

Admittedly, I might not quite understand the algorithm that is used in _triclinic_pbc() where only diagonal elements of the inverse matrix are used.

How does this algorithm compare to the general algorithm for remapping into a triclinic unitcell proposed by Waasenaar [1] and later by Tuckerman [2] (see #136)? Tuckerman [2] in Appendix B, Eq B.9 provides the general algorithm for the minimum image convention:

  h := [a, b, c], a=(a1,a2,a3), ... (the matrix of box vectors)
  r_ij := r_i - r_j                 (difference vector)

  s_i = h^{-1} r_i  
  s_ij = s_i - s_j
  s_ij <-- s_ij - NINT(s_ij)        (general minimum image convention)
  r_ij = h s_ij

NINT(x) is the nearest integer function (e.g. numpy.rint()), h^{-1} is the
matrix-inverse of h.

Or is the approach taken with minimum_image_triclinic() completely different from the Tuckerman approach?

@richardjgowers can you explain?

NOTE: see the discussion below on the limitations of the "general image convention" (aka Waassenaar algorithm) as discussed in [1] Ch 3, Appendix C.

[1] T. A. Wassenaar. Molecular dynamics of sense and sensibility in processing and analysis of data. PhD thesis, University of Groningen, Groningen, The Netherlands, 2006. http://hdl.handle.net/11370/b0c3a19b-9f60-4911-ab23-d9725a2d45a2

[2] M. E. Tuckerman. Statistical Mechanics: Theory and Molecular Simulation.
Oxford University Press, Oxford, UK, 2010.

The text was updated successfully, but these errors were encountered:

jbarnoud · 2017-07-11T08:16:03Z

I did not look at the code, but I am currently calculating minimum distances between proteins in triclinic boxes. I just compared my MDAnalysis output for one of my systems with an output from gmx mindist and the result is the same.

kain88-de · 2017-07-11T08:58:52Z

We noticed problems in my group as well. I'll see that I get an example we can post publicly

richardjgowers · 2017-07-11T09:19:47Z

So the box_inverse isn't an inverse (great naming @richardjgowers!), but is only used to calculate the multiple of box vectors to take away from a given separation.

The algorithm I'm using is copied from domain.cpp which used to assume that the maximum separation was one box length, but it seems they've changed it to cater for multiple box separations.

There's probably a fancy mathematical term for this, but the way minimum image convention work is, it takes away multiples of the x vector of the box (which include y and z components) from the separation vector until the x distance is minimised. The number of multiples is nint(vector[0], box_inverse[0]). Then the x distance in the separation vector is fixed. You can then do the same for y which won't affect the x components as the box matrix is 0 for that dimension.

kain88-de · 2017-08-03T12:14:31Z

Just a FYI I found a nice document explaining what boxes to choose in a simulation with graphics that show how the mapping to a triclinic unit cell works.

jbarnoud · 2017-08-03T12:18:16Z

@Tsjerk's thesis is indeed my go-to document for anything unitcell related.

kain88-de · 2017-08-03T12:32:07Z

I also now looked for test data. As far as I can see every MD code out there either doesn't test it or uses historical data as a comparison

kain88-de · 2017-09-21T20:57:52Z

After looking through the mic implementations in various MD codes and into different papers (mostly the thesis of @Tsjerk) I now have a good understanding how to calculate distances in triclinic unit cells.
The Tuckerman algorithm which is implemented in a slightly different form in MDAnalysis is also used in major MD engines (openMM, lammps, GROMACS). That algorithm though is only an approximation and yields correct distances up to half the shortest box length. Here is a notebook illustrating that.

This might sound scary but even in a uniform filled box only maximum 25% of all distances possible are affected. In real simulations comparing proteins the number is likely lower given the simulation box is large enough.

But we should implement the distance calculation using a algorithm that looks into all neighboring images. Unfortunately this algorithm is a lot slower and not easily optimized by the compiler. I'm currently working on a specialized SIMD implementation.

Tsjerk · 2017-09-21T21:24:39Z

Dear Max, It seems my thesis was printed 4 years before the book from Tuckerman. It should be called the Wassenaar algorithm :) So, yes, it's exactly as you found out, and I could have told you so beforehand, as it's an obvious consequence. The solution isn't necessarily checking all distances, though. First of all, there are two cases to consider. 1. where large distances are unlikely or not important, and 2. where they are likely and/or important. The first case comprises looking for contacts or checking minimal distances between selections. The error will only affect larger distances which are not giving contacts anyway, so why care. Another situation is analysis of displacement vectors, so distances between successive frames, where the frame separation should not allow particles to travel more than half the shortest box length or the analysis is meaningless whatsoever. The second case is more tricky, but as you mention you can first do the trick with the box matrix and then see which distances need to be corrected, by investigating other images. Now it will be possible to also avoid that to some extent, as you also know where the distance vector points in the box, from which you can deduce how to correct it. Not entirely sure how to do that, without spending some time with pen and paper, but I'm confident it's possible (and much quicker than checking all images). Cheers, Tsjerk

…

On Thu, Sep 21, 2017 at 10:57 PM, Max Linke ***@***.***> wrote: After looking through the mic implementations in various MD codes and into different papers (mostly the thesis of @Tsjerk <https://github.com/tsjerk>) I now have a good understanding how to calculate distances in triclinic unit cells. The Tuckerman algorithm which is implemented in a slightly different form in MDAnalysis is also used in major MD engines (openMM <https://github.com/pandegroup/openmm/blob/53f770f446d124fa47e12225dc1d55158629e405/platforms/cpu/src/CpuNonbondedForceVec4.cpp#L385>, lammps <https://github.com/lammps/lammps/blob/638b91bf74c1b1bb5db0337c0ed3cbd1d8918687/src/domain.cpp#L977>, GROMACS <https://github.com/gromacs/gromacs/blob/4f423711fabc081226d43beef1ab80c33c25edf8/src/gromacs/mdlib/ns.cpp#L1263>). That algorithm though is only an approximation and yields correct distances up to half the shortest box length. Here is a notebook <https://gist.github.com/kain88-de/d004620a57b08e45812b7f5108a375d7> illustrating that. This might sound scary but even in a uniform filled box only maximum 25% of all distances possible are affected. In real simulations comparing proteins the number is likely lower given the simulation box is large enough. But we should implement the distance calculation using a algorithm that looks into all neighboring images. Unfortunately this algorithm is a lot slower and not easily optimized by the compiler. I'm currently working on a specialized SIMD implementation. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#1475 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/ABIX98cxo1o2KOqjp3_v3KIqvTD0zXwuks5sks3SgaJpZM4OTvxi> .

-- Tsjerk A. Wassenaar, Ph.D.

kain88-de · 2017-09-21T21:43:05Z

It seems my thesis was printed 4 years before the book from Tuckerman. It should be called the Wassenaar algorithm :)

Maybe yeah. I went with olivers nomenclature. But I'll remember it for the future.

The error will only affect larger distances which are not giving contacts anyway,

Calculating distances is not only interesting for a contact analysis. For example to calculate SAXS profiles one needs radial distribution functions between particles in the whole simulation box.

The second case is more tricky, but as you mention you can first do the trick with the box matrix and then see which distances need to be corrected, by investigating other images
Now it will be possible to also avoid that to some extent, as you also know where the distance vector points in the box, from which you can deduce how to correct it. Not entirely sure how to do that, without spending some time with pen and paper, but I'm confident it's possible (and much quicker than checking all images).

I thought that too. My first implementation actually use that idea and had good performance. But I went a little bit obsessive with performance for two days and in my final implementation using manual loop unrolling with SIMD doing the image search was the best option and only about 2 times slower then the Wassenaar algorithm. Due to the loop unrolling a implementation that first uses Wassenaar and corrects the distance if found to be wrong you actually get worse performance. I'm not entirely sure what the reason for that is but modern CPU's are complicated and give unpredictable behavior when writing high performance code.

orbeckst · 2017-09-23T00:27:50Z

I'll be happy to call it the Waassenaar algorithm henceforth and refer to Ch 3, Appendix C of [1], especially with its discussion of its range of applicability.

(I edited the top of the thread.)

[1] T. A. Wassenaar. Molecular dynamics of sense and sensibility in processing and analysis of data. PhD thesis, University of Groningen, Groningen, The Netherlands, 2006. http://hdl.handle.net/11370/b0c3a19b-9f60-4911-ab23-d9725a2d45a2

richardjgowers · 2017-09-24T16:09:59Z

Is this blocking for 0.17? Ie could we put in the slow method for triclinic pbc so it's correct, then hopefully optimise later?

arm61 · 2024-09-20T11:29:15Z

Am I being daft, or can the positions of particles in a triclinic cell be transformed into positions in an orthorhombic cell and, therefore, get around many of these problems (see image)? It is very possible that I am overlooking something.

Tsjerk · 2024-09-20T11:53:15Z

No, because the problem is about distances between particles in adjacent cells. After putting everything in a brick, the neighbours still need to be determined through the lattice vectors. It's not the shape of the cell that matters, but the lattice.

arm61 · 2024-09-20T12:13:07Z

Ah, yes, for the calculation of distances, you are right. I was thinking about the positions of the particles alone.

richardjgowers · 2024-09-20T13:39:39Z

Top picture here is the thing to visualise: https://manual.gromacs.org/documentation/2019-rc1/reference-manual/algorithms/periodic-boundary-conditions.html

orbeckst added the question label Jul 11, 2017

kain88-de mentioned this issue Jul 18, 2017

WIP: add two point distance function #1512

Closed

8 tasks

kain88-de added Component-lib confirmed defect performance Priority-High and removed question labels Sep 21, 2017

kain88-de self-assigned this Sep 21, 2017

orbeckst changed the title ~~review triclinic PBC treatment~~ triclinic PBC treatment is not always correct Sep 23, 2017

kain88-de mentioned this issue Sep 24, 2017

Triclinic cells #1674

Merged

4 tasks

kain88-de closed this as completed in #1674 Sep 26, 2017

HaoZeke mentioned this issue Aug 7, 2024

ENH: Handle other periodic boundary conditions TheochemUI/eOn#176

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

triclinic PBC treatment is not always correct #1475

triclinic PBC treatment is not always correct #1475

orbeckst commented Jul 11, 2017 •

edited

Loading

jbarnoud commented Jul 11, 2017

kain88-de commented Jul 11, 2017

richardjgowers commented Jul 11, 2017

kain88-de commented Aug 3, 2017

jbarnoud commented Aug 3, 2017

kain88-de commented Aug 3, 2017

kain88-de commented Sep 21, 2017

Tsjerk commented Sep 21, 2017 via email

kain88-de commented Sep 21, 2017

orbeckst commented Sep 23, 2017 •

edited

Loading

richardjgowers commented Sep 24, 2017

arm61 commented Sep 20, 2024 •

edited

Loading

Tsjerk commented Sep 20, 2024

arm61 commented Sep 20, 2024

richardjgowers commented Sep 20, 2024

triclinic PBC treatment is not always correct #1475

triclinic PBC treatment is not always correct #1475

Comments

orbeckst commented Jul 11, 2017 • edited Loading

jbarnoud commented Jul 11, 2017

kain88-de commented Jul 11, 2017

richardjgowers commented Jul 11, 2017

kain88-de commented Aug 3, 2017

jbarnoud commented Aug 3, 2017

kain88-de commented Aug 3, 2017

kain88-de commented Sep 21, 2017

Tsjerk commented Sep 21, 2017 via email

kain88-de commented Sep 21, 2017

orbeckst commented Sep 23, 2017 • edited Loading

richardjgowers commented Sep 24, 2017

arm61 commented Sep 20, 2024 • edited Loading

Tsjerk commented Sep 20, 2024

arm61 commented Sep 20, 2024

richardjgowers commented Sep 20, 2024

orbeckst commented Jul 11, 2017 •

edited

Loading

orbeckst commented Sep 23, 2017 •

edited

Loading

arm61 commented Sep 20, 2024 •

edited

Loading