You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Handling residue objects in a ResidueGroup can be tricky because their (atom) positions can have different numpy array data shapes, precluding some of the most useful vectorization mechanics for efficient calculation, as discussed in the context of residue centroids in #1053.
On a related note, finding the closest approach between any two residues in a given (massive) ResidueGroup can be very tedious. Here's a bit of creativity with indicies and numpy arrays that seems to be quite helpful when you get into the hundreds of thousands of residues range.
Might be something worth considering for internal implementation, or could perhaps inspire an even more efficient approach eventually, since I think there's some interest in emphasizing our capabilities with huge systems. The approach directly interacts with our new topology system & scipy is an official dependency now.
Rough code yanked from a more specific implementation:
all_sel=universe.atomsall_residues=all_sel.residuesn_atoms=all_sel.n_atoms# determine the max number of atoms possible in a residue in this # system (for KDTree nearest neighbor k-value) # adding 1 to this number should provide a k-value for KDTree that # always returns at least one neighbor particle that is not in the # given residue 29max_atoms_in_a_residue=030forresidueinall_residues:
31current_atoms=residue.atoms.n_atoms32ifcurrent_atoms>max_atoms_in_a_residue:
33max_atoms_in_a_residue=current_atoms35k_nearest_value=max_atoms_in_a_residue+136# I think I need 'exclusion arrays' that provide the atom index 37# limits for the current residue that an atom is in, so that 38# I can distinguish inter vs. intra-residue particle interactions 39# below 40per_atom_residue_atom_index_limits= \
41np.array([atom.residue.atoms.ix_array[np.array([0,-1])] foratominall_sel])
42# per_atom_residue_atom_index_limits should have shape (n_atoms, 2) 43# with each column containing the min & max allowed atom index for 44# an atom in the same residue as the atom in a given row 47list_inter_residue_min_dists= []
48fortsinuniverse.trajectory:
50all_coords=all_sel.positions51# even in the absence of solvent, this system is massive 52# use a k-d tree approach to avoid an outrageously large 53# distance matrix 54tree=cKDTree(all_coords)
55closest_particle_approaches, index_match=tree.query(all_coords,
56k=k_nearest_value)
60# index_match and closest_particle_approaches both have shape 61# (n_atoms, k_nearest_value) 63# use above atom / residue-index related data structures 64# to filter k-d tree nearest neighbor data to INTER-residue 65# only, and then take the minimum dist in this frame 66min_dist=closest_particle_approaches[np.logical_or(index_match<67np.broadcast_to(np.expand_dims(per_atom_residue_atom_index_limits[...,0],
681),
69 (n_atoms, k_nearest_value)), index_match>70np.broadcast_to(np.expand_dims(per_atom_residue_atom_index_limits[...,1],
711), (n_atoms, k_nearest_value)))].min()
72list_inter_residue_min_dists.append(min_dist)
It might also be useful to identify the residue indices involved in the closest approach or allow determination of the N-closest residue pairs, etc.
Anyway, this is just a transplant of some potentially useful ideas that may or may not be useful internally in some form -- i.e., ResidueGroup.closest_approach()
The text was updated successfully, but these errors were encountered:
Handling residue objects in a
ResidueGroup
can be tricky because their (atom) positions can have different numpy array data shapes, precluding some of the most useful vectorization mechanics for efficient calculation, as discussed in the context of residue centroids in #1053.On a related note, finding the closest approach between any two residues in a given (massive)
ResidueGroup
can be very tedious. Here's a bit of creativity with indicies and numpy arrays that seems to be quite helpful when you get into the hundreds of thousands of residues range.Might be something worth considering for internal implementation, or could perhaps inspire an even more efficient approach eventually, since I think there's some interest in emphasizing our capabilities with huge systems. The approach directly interacts with our new topology system & scipy is an official dependency now.
Rough code yanked from a more specific implementation:
It might also be useful to identify the residue indices involved in the closest approach or allow determination of the N-closest residue pairs, etc.
Anyway, this is just a transplant of some potentially useful ideas that may or may not be useful internally in some form -- i.e.,
ResidueGroup.closest_approach()
The text was updated successfully, but these errors were encountered: