Center of mass/gravity for all residues of a ResidueGroup #1053

mimischi · 2016-10-26T18:55:15Z

Expected behaviour

Calling the methods center_of_mass or center_of_gravity on a ResidueGroup I would expect to retrieve the CoM/CoG of each residue in an array:

In [1]: import MDAnalysis as mda

In [2]: from MDAnalysis.tests.datafiles import PSF, DCD

In [3]: u = mda.Universe(PSF, DCD)

In [4]: protein = u.select_atoms('protein')

In [5]: protein.residues.center_of_mass()
Out [5]: [ 10.52567673   9.49548312  -8.15335145]
[ 13.90618512   4.59222315  -7.35309049]
[ 8.12902093  6.03077052 -3.66130656]
....
[  7.76611535  16.02359343   0.44614907]
[  7.75536069  14.75541562  -4.00511608]
[  6.74699649  18.21034743  -6.73924321]

Actual behaviour

These methods return the CoM/CoG of the whole selection, just like calling protein.center_of_mass() or protein.atoms.center_of_mass():

In [1]: import MDAnalysis as mda

In [2]: from MDAnalysis.tests.datafiles import PSF, DCD

In [3]: u = mda.Universe(PSF, DCD)

In [4]: protein = u.select_atoms('protein')

In [5]: protein.residues.center_of_mass()
Out [5]: array([-0.01094035,  0.05727601, -0.12885778])

Current solution

This works, but seems somehow slow when called for each frame in a bigger system.

for res in protein.residues:
    print res.center_of_mass()

Currently version of MDAnalysis:

0.15.0

The text was updated successfully, but these errors were encountered:

orbeckst · 2016-10-27T22:53:13Z

@richardjgowers / @dotsdl how are we handling ResidueGroup.center_of_mass() in #363?

richardjgowers · 2016-10-28T09:42:42Z

So currently *Group.center_of_mass == *Group.atoms.center_of_mass() like in develop. It could be changed however..

orbeckst · 2016-10-28T18:07:11Z

There was some discussion in #411 on how "aggregate" functions should work at the SegmentGroup and ResidueGroup level. I think we concluded that singular method such as g.center_of_mass() should always be equivalent to g.atoms.center_of_mass().

I don't think we introduced "plural" methods in #363 such as g.centers_of_mass() that would do

np.array([member.center_of_mass() for member in g])

Admittedly, having the plural s inside the method name as I wrote here is pretty confusing although grammatically correct. There was also the idea #385 (comment) to use something like ResidueGroup.byres("center_of_mass") to force a calculation on a per-residue basis.

In all cases, we would need to think if there are additional performance enhancements that we could do (in addition to the syntactic sugar).

@mimischi , what's your primary concern: performance or ease of writing the operation?

mimischi · 2016-10-30T10:30:46Z

@orbeckst I was wondering about the performance.

Actually I want to calculate the distance between the center of mass of residues and a reference point at every time frame. Doing that loop, as mentioned above, and going over every residue to retrieve the value just seems like a slow task.

orbeckst · 2016-11-22T06:37:07Z

Actually I want to calculate the distance between the center of mass of residues and a reference point at every time frame. Doing that loop, as mentioned above, and going over every residue to retrieve the value just seems like a slow task.

At the moment I can't think of a nice way to calculate the C.O.M. for a bunch of residues without a loop.

residue_coms =  np.array([r.atoms.center_of_mass() for r in u.select_atoms("protein").residues])

(and this would probably be the under-the-hood implementation of the requested feature). Btw, you might be able to speed-up what you showed by using a list comprehension instead of an explicit for loop.

tylerjereddy · 2016-11-22T17:18:40Z

I think the problem is a fair bit more tractable when all residues have the same shape (number of atoms). If we simplify for the case of the center of geometry (centroid) of each residue for an atomgroup with 12 residues, each of which has 10 atoms, then I suspect we'd want a data structure with this shape: (12, 10, 3). Then, I believe it should be possible to use a standard numpy vectorized ufunc on the appropriate axis to determine the means (centroids). For the center of mass, I suspect we'd just add (one more?) vectorized operation that (multiplies by?) includes the weights for each atom in an array of the same shape.

Ok, so what about for real data with heterogenously-sized residues (amino acids, lipids, etc.)? I suspect that we could simply NaN-fill to produce a homogenous array. Generally, the fast vectorized numpy operations will not work on jagged arrays (which are typed as object). Once the NaN filling is done, we can repeat the above steps for the homogenous array, as numpy tends to gracefully handle / ignore NaN for many array operations & so combining the weights and masses with the appropriate axes / slicing shouldn't be too bad.

The only thing I don't like about this approach is that we need to know the maximum number of atoms for any residue in the given atomgroup so that we can NaN-fill appropriately. In a biomolecular context, I suspect we might get away with a user-adjustable default value that exceeds any reasonable atom count for a conventional amino acid, lipid, or nucleotide / glycan. I wonder if there's a fast way that MDAnalysis could parse this out during i.e., universe / topology initialization -- the other core devs would likely know more about this. That would probably allow automated NaN-filling in a less-awkward / more robust & topology-aware manner.

richardjgowers · 2016-11-22T18:05:06Z

Ragged arrays aren't too impossible with a few extra arrays

# The contents of each Group/Residue
g1 = [10, 12, 13, 14]
g2 = [51, 61, 71]
g3 = [8, 9, 10]

# The size of each group
sizes = [4, 3, 3]
# The contents of each group concatenated
identities = [10, 12, 13, 14, 51, 61, 71, 8, 9, 10] 


offset = 0
for s in sizes:  # loop over groups
    for i in range(s):  # loop over atoms in this group
        atom = identities[offset + i]
    offset += s

tylerjereddy · 2016-11-22T21:37:40Z

Heterogenous arrays aren't impossible at all--I'm just targeting raw performance. That's a nested for loop in pure python, so I'm not sure it would fare much better than the original list comprehension approach. Could be cythonized though, I suppose.

Even with the Nan-fill approach, I'm still thinking that a bit of creativity would be needed to avoid any looping. I guess you'd preallocate an empty array of the appropriate shape for n_residues and then somehow assign / add in the heterogenous coordinates in such a way that the residue atoms don't overlap. But the original and target arrays have a different shape, so that may be trickier than I thought.

orbeckst · 2016-11-30T07:21:02Z

The NaN filling sounds complicated and a bit brittle – aren't glycans single residues? Lipids are, and you can have big lipids.

I would try a hybrid approach for residues: You could group all residues with same number of atoms into arrays, remember the residue indices, work on these blocks, and then assemble results in the correct sequence.

For all waters this should give a good speed-up but even cutting down a protein with 500 residues into 20 blocks with 500/20 = 25 residues each you will probably see improvements.

For segments, where we typically only have O(1) - O(10) we can probably just do the list comprehension.

orbeckst · 2016-11-30T07:24:30Z

Admittedly, having the plural s inside the method name as I wrote here – ag.centers_of_mass() — is pretty confusing.

We can be fancy and use ag.barycenters() (W:barycenter) :-).

tylerjereddy · 2016-11-30T15:17:53Z

aren't glycans single residues?

Glycolipids are usually treated as a single residue in CG. I think that makes more sense than having the molecule split in two or more pieces topologically for various reasons (they are often parametrized as a custom-made unit, and we are often interested in their properties as a unit, etc.). I think for glycoproteins the glycans can be separate residues though, depending on the FF maybe.

The NaN filling sounds complicated and a bit brittle

Maybe. Hard to say without actually trying and comparing different approaches. I suspect there are tradeoffs in terms of performance gained and the assumptions you can introduce. A more brittle solution might be faster because it can make more assumptions or preallocate larger arrays for vectorization.

richardjgowers · 2016-11-30T22:54:53Z

https://gist.github.com/richardjgowers/0a63f12fa207f26de201e586ee22f4d7

@mimischi I put together this to see what the fastest we can do is. It takes ~3.5 ms, and just constructing the AtomGroup from each residue is 3 ms, so it looks like the bottleneck is there now.

orbeckst · 2017-06-14T01:42:30Z

How to do the barycenter when you have a block of identical residues, eg all TIP4P waters with 4 atoms each:

from MDAnalysisTests.datafiles import TPR, XTC
waters = u.select_atoms("resname SOL")

natoms = 4
barycenters = (waters.positions * waters.masses[:, np.newaxis]).reshape(-1, natoms, 3).mean(axis=1) 
   / waters.masses.reshape(-1, natoms).sum(axis=1)[:, np.newaxis]

orbeckst added Component-Core question API labels Nov 22, 2016

tylerjereddy mentioned this issue Jul 27, 2017

ENH: ResidueGroup closest approach #1562

Open

KyleStiers mentioned this issue Feb 15, 2018

Velocity of residue movement calculation question #1778

Closed

richardjgowers mentioned this issue Apr 10, 2018

Feature: AtomGroupGroup #1861

Open

zemanj mentioned this issue May 18, 2018

Added compound parameter to *Group.center_of_mass() #1903

Merged

4 tasks

richardjgowers closed this as completed in #1903 May 23, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Center of mass/gravity for all residues of a ResidueGroup #1053

Center of mass/gravity for all residues of a ResidueGroup #1053

mimischi commented Oct 26, 2016

orbeckst commented Oct 27, 2016

richardjgowers commented Oct 28, 2016

orbeckst commented Oct 28, 2016

mimischi commented Oct 30, 2016

orbeckst commented Nov 22, 2016

tylerjereddy commented Nov 22, 2016

richardjgowers commented Nov 22, 2016

tylerjereddy commented Nov 22, 2016

orbeckst commented Nov 30, 2016

orbeckst commented Nov 30, 2016

tylerjereddy commented Nov 30, 2016

richardjgowers commented Nov 30, 2016

orbeckst commented Jun 14, 2017 •

edited

Loading

Center of mass/gravity for all residues of a ResidueGroup #1053

Center of mass/gravity for all residues of a ResidueGroup #1053

Comments

mimischi commented Oct 26, 2016

Expected behaviour

Actual behaviour

Current solution

Currently version of MDAnalysis:

orbeckst commented Oct 27, 2016

richardjgowers commented Oct 28, 2016

orbeckst commented Oct 28, 2016

mimischi commented Oct 30, 2016

orbeckst commented Nov 22, 2016

tylerjereddy commented Nov 22, 2016

richardjgowers commented Nov 22, 2016

tylerjereddy commented Nov 22, 2016

orbeckst commented Nov 30, 2016

orbeckst commented Nov 30, 2016

tylerjereddy commented Nov 30, 2016

richardjgowers commented Nov 30, 2016

orbeckst commented Jun 14, 2017 • edited Loading

orbeckst commented Jun 14, 2017 •

edited

Loading