-
Notifications
You must be signed in to change notification settings - Fork 663
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
triclinic PBC treatment is not always correct #1475
Comments
I did not look at the code, but I am currently calculating minimum distances between proteins in triclinic boxes. I just compared my MDAnalysis output for one of my systems with an output from |
We noticed problems in my group as well. I'll see that I get an example we can post publicly |
So the The algorithm I'm using is copied from domain.cpp which used to assume that the maximum separation was one box length, but it seems they've changed it to cater for multiple box separations. There's probably a fancy mathematical term for this, but the way minimum image convention work is, it takes away multiples of the x vector of the box (which include y and z components) from the separation vector until the x distance is minimised. The number of multiples is |
Just a FYI I found a nice document explaining what boxes to choose in a simulation with graphics that show how the mapping to a triclinic unit cell works. |
@Tsjerk's thesis is indeed my go-to document for anything unitcell related. |
I also now looked for test data. As far as I can see every MD code out there either doesn't test it or uses historical data as a comparison |
After looking through the mic implementations in various MD codes and into different papers (mostly the thesis of @Tsjerk) I now have a good understanding how to calculate distances in triclinic unit cells. This might sound scary but even in a uniform filled box only maximum 25% of all distances possible are affected. In real simulations comparing proteins the number is likely lower given the simulation box is large enough. But we should implement the distance calculation using a algorithm that looks into all neighboring images. Unfortunately this algorithm is a lot slower and not easily optimized by the compiler. I'm currently working on a specialized SIMD implementation. |
Dear Max,
It seems my thesis was printed 4 years before the book from Tuckerman. It
should be called the Wassenaar algorithm :)
So, yes, it's exactly as you found out, and I could have told you so
beforehand, as it's an obvious consequence. The solution isn't necessarily
checking all distances, though. First of all, there are two cases to
consider. 1. where large distances are unlikely or not important, and 2.
where they are likely and/or important. The first case comprises looking
for contacts or checking minimal distances between selections. The error
will only affect larger distances which are not giving contacts anyway, so
why care. Another situation is analysis of displacement vectors, so
distances between successive frames, where the frame separation should not
allow particles to travel more than half the shortest box length or the
analysis is meaningless whatsoever. The second case is more tricky, but as
you mention you can first do the trick with the box matrix and then see
which distances need to be corrected, by investigating other images. Now it
will be possible to also avoid that to some extent, as you also know where
the distance vector points in the box, from which you can deduce how to
correct it. Not entirely sure how to do that, without spending some time
with pen and paper, but I'm confident it's possible (and much quicker than
checking all images).
Cheers,
Tsjerk
…On Thu, Sep 21, 2017 at 10:57 PM, Max Linke ***@***.***> wrote:
After looking through the mic implementations in various MD codes and into
different papers (mostly the thesis of @Tsjerk <https://github.com/tsjerk>)
I now have a good understanding how to calculate distances in triclinic
unit cells.
The Tuckerman algorithm which is implemented in a slightly different form
in MDAnalysis is also used in major MD engines (openMM
<https://github.com/pandegroup/openmm/blob/53f770f446d124fa47e12225dc1d55158629e405/platforms/cpu/src/CpuNonbondedForceVec4.cpp#L385>,
lammps
<https://github.com/lammps/lammps/blob/638b91bf74c1b1bb5db0337c0ed3cbd1d8918687/src/domain.cpp#L977>,
GROMACS
<https://github.com/gromacs/gromacs/blob/4f423711fabc081226d43beef1ab80c33c25edf8/src/gromacs/mdlib/ns.cpp#L1263>).
That algorithm though is only an approximation and yields correct distances
up to half the shortest box length. Here is a notebook
<https://gist.github.com/kain88-de/d004620a57b08e45812b7f5108a375d7>
illustrating that.
This might sound scary but even in a uniform filled box only maximum 25%
of all distances possible are affected. In real simulations comparing
proteins the number is likely lower given the simulation box is large
enough.
But we should implement the distance calculation using a algorithm that
looks into all neighboring images. Unfortunately this algorithm is a lot
slower and not easily optimized by the compiler. I'm currently working on a
specialized SIMD implementation.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#1475 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ABIX98cxo1o2KOqjp3_v3KIqvTD0zXwuks5sks3SgaJpZM4OTvxi>
.
--
Tsjerk A. Wassenaar, Ph.D.
|
Maybe yeah. I went with olivers nomenclature. But I'll remember it for the future.
Calculating distances is not only interesting for a contact analysis. For example to calculate SAXS profiles one needs radial distribution functions between particles in the whole simulation box.
I thought that too. My first implementation actually use that idea and had good performance. But I went a little bit obsessive with performance for two days and in my final implementation using manual loop unrolling with SIMD doing the image search was the best option and only about 2 times slower then the Wassenaar algorithm. Due to the loop unrolling a implementation that first uses Wassenaar and corrects the distance if found to be wrong you actually get worse performance. I'm not entirely sure what the reason for that is but modern CPU's are complicated and give unpredictable behavior when writing high performance code. |
I'll be happy to call it the Waassenaar algorithm henceforth and refer to Ch 3, Appendix C of [1], especially with its discussion of its range of applicability. (I edited the top of the thread.) [1] T. A. Wassenaar. Molecular dynamics of sense and sensibility in processing and analysis of data. PhD thesis, University of Groningen, Groningen, The Netherlands, 2006. http://hdl.handle.net/11370/b0c3a19b-9f60-4911-ab23-d9725a2d45a2 |
Is this blocking for 0.17? Ie could we put in the slow method for triclinic pbc so it's correct, then hopefully optimise later? |
No, because the problem is about distances between particles in adjacent cells. After putting everything in a brick, the neighbours still need to be determined through the lattice vectors. It's not the shape of the cell that matters, but the lattice. |
Ah, yes, for the calculation of distances, you are right. I was thinking about the positions of the particles alone. |
Top picture here is the thing to visualise: https://manual.gromacs.org/documentation/2019-rc1/reference-manual/algorithms/periodic-boundary-conditions.html |
I recently looked at lib/include/include/calc_distances.h and I wonder if we are treating triclinic boxes correctly: In calc_distances._calc_distance_array_triclinic() it looks as if the box inverse is calculated in the same way as for orthorhombic boxes:
even though a triclinic
box
contains three vectors where some have off-diagonal non-zero entries:The inverse of this matrix is something else:
Admittedly, I might not quite understand the algorithm that is used in _triclinic_pbc() where only diagonal elements of the inverse matrix are used.
How does this algorithm compare to the general algorithm for remapping into a triclinic unitcell proposed by Waasenaar [1] and later by Tuckerman [2] (see #136)? Tuckerman [2] in Appendix B, Eq B.9 provides the general algorithm for the minimum image convention:
NINT(x)
is the nearest integer function (e.g.numpy.rint()
),h^{-1}
is thematrix-inverse of
h
.Or is the approach taken with minimum_image_triclinic() completely different from the Tuckerman approach?
@richardjgowers can you explain?
NOTE: see the discussion below on the limitations of the "general image convention" (aka Waassenaar algorithm) as discussed in [1] Ch 3, Appendix C.
[1] T. A. Wassenaar. Molecular dynamics of sense and sensibility in processing and analysis of data. PhD thesis, University of Groningen, Groningen, The Netherlands, 2006. http://hdl.handle.net/11370/b0c3a19b-9f60-4911-ab23-d9725a2d45a2
[2] M. E. Tuckerman. Statistical Mechanics: Theory and Molecular Simulation.
Oxford University Press, Oxford, UK, 2010.
The text was updated successfully, but these errors were encountered: