ENH: ResidueGroup closest approach #1562

tylerjereddy · 2017-07-27T18:28:03Z

Handling residue objects in a ResidueGroup can be tricky because their (atom) positions can have different numpy array data shapes, precluding some of the most useful vectorization mechanics for efficient calculation, as discussed in the context of residue centroids in #1053.

On a related note, finding the closest approach between any two residues in a given (massive) ResidueGroup can be very tedious. Here's a bit of creativity with indicies and numpy arrays that seems to be quite helpful when you get into the hundreds of thousands of residues range.

Might be something worth considering for internal implementation, or could perhaps inspire an even more efficient approach eventually, since I think there's some interest in emphasizing our capabilities with huge systems. The approach directly interacts with our new topology system & scipy is an official dependency now.

Rough code yanked from a more specific implementation:

all_sel = universe.atoms 
all_residues = all_sel.residues 
n_atoms = all_sel.n_atoms 

# determine the max number of atoms possible in a residue in this             
# system (for KDTree nearest neighbor k-value)                              
# adding 1 to this number should provide a k-value for KDTree that            
# always returns at least one neighbor particle that is not in the            
# given residue   
 29     max_atoms_in_a_residue = 0                                                    
 30     for residue in all_residues:                                                  
 31         current_atoms = residue.atoms.n_atoms                                     
 32         if current_atoms > max_atoms_in_a_residue:                                
 33             max_atoms_in_a_residue = current_atoms   

 35     k_nearest_value = max_atoms_in_a_residue + 1                                  
 36     # I think I need 'exclusion arrays' that provide the atom index               
 37     # limits for the current residue that an atom is in, so that                  
 38     # I can distinguish inter vs. intra-residue particle interactions             
 39     # below                                                                       
 40     per_atom_residue_atom_index_limits = \                                        
 41     np.array([atom.residue.atoms.ix_array[np.array([0,-1])] for atom in all_sel]) 
 42     # per_atom_residue_atom_index_limits should have shape (n_atoms, 2)         
 43     # with each column containing the min & max allowed atom index for          
 44     # an atom in the same residue as the atom in a given row 
 47     list_inter_residue_min_dists = [] 
 48     for ts in universe.trajectory:  
 50         all_coords = all_sel.positions   
 51         # even in the absence of solvent, this system is massive                
 52         # use a k-d tree approach to avoid an outrageously large                
 53         # distance matrix                                                       
 54         tree = cKDTree(all_coords)                                                                                                                    
 55         closest_particle_approaches, index_match = tree.query(all_coords,                                                                             
 56                                                             k=k_nearest_value)
 60         # index_match and closest_particle_approaches both have shape                                                                                 
 61         # (n_atoms, k_nearest_value)   

 63         # use above atom / residue-index related data structures                                                                                      
 64         # to filter k-d tree nearest neighbor data to INTER-residue                                                                                   
 65         # only, and then take the minimum dist in this frame                                                                                          
 66         min_dist = closest_particle_approaches[np.logical_or(index_match <                                                                            
 67         np.broadcast_to(np.expand_dims(per_atom_residue_atom_index_limits[...,0],                                                                     
 68         1),                                                                                                                                           
 69         (n_atoms, k_nearest_value)), index_match >                                                                                                    
 70         np.broadcast_to(np.expand_dims(per_atom_residue_atom_index_limits[...,1],                                                                     
 71         1), (n_atoms, k_nearest_value)))].min()                                                                                                       
 72         list_inter_residue_min_dists.append(min_dist)

It might also be useful to identify the residue indices involved in the closest approach or allow determination of the N-closest residue pairs, etc.

Anyway, this is just a transplant of some potentially useful ideas that may or may not be useful internally in some form -- i.e., ResidueGroup.closest_approach()

The text was updated successfully, but these errors were encountered:

tylerjereddy added the proposal label Jul 27, 2017

orbeckst added the usability label Jul 27, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ENH: ResidueGroup closest approach #1562

ENH: ResidueGroup closest approach #1562

tylerjereddy commented Jul 27, 2017 •

edited

Loading

ENH: ResidueGroup closest approach #1562

ENH: ResidueGroup closest approach #1562

Comments

tylerjereddy commented Jul 27, 2017 • edited Loading

tylerjereddy commented Jul 27, 2017 •

edited

Loading