`ts.dimensions` object with different trajectory formats #1575

tylerjereddy · 2017-08-03T00:33:00Z

Expected behaviour

The ts.dimensions returned object (the numpy array of values) is NOT linked back to the state of the trajectory -- the property should return a numpy array of dimensions that has no connection to the current state of trajectory iteration once the value invalue = ts.dimensions has been stored i.e., in a list.

Futhermore, this behavior must be consistent between different trajectory formats.

Actual behaviour

I'm so confused because look at this behavior when tracking dimensions over time with two different formats (and for even more confusion, look at what happens to the stored values for the XTC format after iteration ends!!!)

  1 import MDAnalysis                                                               
  2 from MDAnalysisTests.datafiles import GRO, XTC, PSF, DCD                        
  3                                                                                 
  4 u1 = MDAnalysis.Universe(GRO, XTC)                                              
  5 u2 = MDAnalysis.Universe(PSF, DCD)                                              
  6 list_dims_1 = []                                                                
  7 list_dims_2 = []                                                                
  8                                                                                 
  9 for universe, list_dims in zip([u1, u2],                                        
 10                                [list_dims_1, list_dims_2]):                     
 11     for ts in universe.trajectory[:2]:                                          
 12         frame = ts.frame                                                        
 13         dims = ts.dimensions                                                    
 14         dims[0] = frame                                                         
 15         list_dims.append((frame, dims))                                         
 16         print('universe:', universe)                                            
 17         print('frame:', frame)                                                  
 18         print('dims:', dims)                                                    
 19         print('list_dims:', list_dims)                                          
 20                                                                                 
 21 # if the above isn't crazy enough for you                                       
 22 # when the trajectory iteration ends, the                                       
 23 # apparent XTC rewind will reset all the dimension                              
 24 # values assigned to the list to those from                                     
 25 # the first frame !!!!                                                          
 26                                                                                 
 27 for element in list_dims_1:                                                     
 28     print(element)                                                              
 29                                                                                 
 30 for element in list_dims_2:                                                     
 31     print(element)

Output:

universe: <Universe with 47681 atoms>
frame: 0
dims: [  0.          80.01700592  80.01700592  60.          60.          90.        ]
list_dims: [(0, array([  0.        ,  80.01700592,  80.01700592,  60.        ,
        60.        ,  90.        ], dtype=float32))]
universe: <Universe with 47681 atoms>
frame: 1
dims: [  1.          80.13008118  80.13008118  60.          60.          90.        ]
list_dims: [(0, array([  1.        ,  80.13008118,  80.13008118,  60.        ,
        60.        ,  90.        ], dtype=float32)), (1, array([  1.        ,  80.13008118,  80.13008118,  60.        ,
        60.        ,  90.        ], dtype=float32))]
universe: <Universe with 3341 atoms>
frame: 0
dims: [  0.   0.   0.  90.  90.  90.]
list_dims: [(0, array([  0.,   0.,   0.,  90.,  90.,  90.], dtype=float32))]
universe: <Universe with 3341 atoms>
frame: 1
dims: [  1.   0.   0.  90.  90.  90.]
list_dims: [(0, array([  0.,   0.,   0.,  90.,  90.,  90.], dtype=float32)), (1, array([  1.,   0.,   0.,  90.,  90.,  90.], dtype=float32))]


(0, array([ 80.01700592,  80.01700592,  80.01700592,  60.        ,
        60.        ,  90.        ], dtype=float32))
(1, array([ 80.01700592,  80.01700592,  80.01700592,  60.        ,
        60.        ,  90.        ], dtype=float32))
(0, array([  0.,   0.,   0.,  90.,  90.,  90.], dtype=float32))
(1, array([  1.,   0.,   0.,  90.,  90.,  90.], dtype=float32))

This is incredibly confusing to me. Why on earth is the interaction behavior of the XTC returned dimension value so different from DCD returned dimension value. The latter looks sane, but XTC looks wild and has caused me hours of confusing / debugging in a much larger framework context today.

I'm using Python 3.6 and the dev branch.

The text was updated successfully, but these errors were encountered:

tylerjereddy · 2017-08-03T00:38:38Z

Look very carefully at what happens on those append operations with XTC. All numpy arrays stored in the list are reassigned to the current numpy array of dimensions for XTC but not DCD.

kain88-de · 2017-08-03T07:03:06Z

You have been running into a issue with references in python. Unfortunately we seem to handle references for the dimensions differently between formats. For the xtc reader ts.dimensions is retuning a reference that we keep updating when reading a frame. You only store that reference in your list. For the DCD reader we actually have a individual copy. Not sure why that is currently.

The question here is if ts.* should return a reference that gets updated by a reader or should it allocate new memory when a new frame is read.

The dcd has another curious thing. The file you are reading doesn't have any box information. Therefore the unitcell should always be 0, 0, 0, 90, 90, 90 (our default for none given). It's strange that we have a different unit-cell on the second frame.

richardjgowers · 2017-08-03T15:29:18Z

So AtomGroup.positions deliberately returns a copy of positions, and AtomGroup.dimensions should also for consistency (but it looks like it won't right now).

Timestep.positions I'm not completely sure on whether it's meant to return a view/reference/pointer or copy. Accessing the data within the Timestep object is a little more advanced, and I can see that maybe it would be better for this to default to views.

richardjgowers · 2017-08-03T15:30:24Z

Also related (I think @kain88-de mentions this above), some Readers create a new Timestep object each time they read a frame, others will fill the same object. Not sure what this should do either.

tylerjereddy · 2017-08-03T17:16:15Z

@kain88-de There's nothing curious about the second DCD frame. I intentionally reassigned the first element of the unit cell to the frame number so that I could assess if the values were references or not--it was more confusing to deal with reference vs. copy if the unit cell is identical in each frame because the reassignments in the list are confounded by the fact that the unit cell doesn't change in the test file during iteration.

You'll also notice that I intentionally appended the value of ts.frame as well -- and that's not a reference for either format. So, they aren't even consistent in and of themselves.

My take is that we should never return references like this for values a user would typically want to access. I've always used MDAnalysis in a manner that allows me to use pythonic paradigms like appending trajectory-retrieved values to a list.

Regardless of what decision is made, I think we can agree that it should be consistent across all formats and probably all reasonably-accessed property types--a real power of MDAnalysis is the ability to perform reproducible analyses in a format-agnostic manner, and breaking this is why I've labelled a high priority here.

I would suggest that we even try to enforce the policy with a (parametrized?) unit test that crawls through the relevant parameters for the derived readers & their properties & ensures that whatever type of object (reference / value / view) is supposed to be returned is actually returned.

orbeckst · 2017-08-03T22:13:33Z

Following up from #276 (comment)

We should decide clearly if Timestep._unitcell is going to behave like a view (like Timestep.positions) or like a copy. Forpositions we should keep a view for performance reasons; creating big coordinate arrays is really bad. For _unitcell the overhead of creating a copy is small so we could just do that if necessary.

Timestep.dimensions is a property and will either produce a copy on its own by converting _unitcell or hand over _unitcell directly. For consistency between the two cases I suggest to prescribe that dimensions always provides a copy, even if it does not change _unitcell. The basic implementation is then

class Timestep:
  @property
  def dimensions(self):
       return self._unitcell.copy()

EDIT: I would then leave _unitcell as is – basically, whatever works best for the implementation of the reader. Users are not encouraged to use it anyway.

orbeckst · 2017-08-03T22:29:31Z

@tylerjereddy in general I very much agree with your #1575 (comment):

My take is that we should never return references like this for values a user would typically want to access. I've always used MDAnalysis in a manner that allows me to use pythonic paradigms like appending trajectory-retrieved values to a list.

I consider Timestep.positions the one major exception to this approach; a common mistake is, as you know,

timeseries = [ts.positions for ts in u.trajectory]    # does NOT produce a timeseries of different positions

whereas [ag.positions for ts in u.trajectory] will work (as @richardjgowers already said). This could be considered a break in the user interface and is certainly a gotcha and needs to be documented better, especially when we improve the docs #1175 . The docs should show some "best practice" examples, both for the user and for the developers to refer back to.

However, I would consider working with Timestep directly reasonably advanced so I am not overly worried; power users should be able to figure this out. For performance reasons it is important that Timestep.positions remains a view because at least at the moment, AtomGroup.positions slices Timestep.positions – it would be awful if we made two copies at this stage.

If people had very strong feelings about Timestep.positions having to return a copy we could make it do it and then internally use something like Timestep._positions (and the same for velocities and forces).

orbeckst · 2017-08-21T23:33:55Z

Should this still be considered a bug?

What is the action that should be taken on this issue?

Is it still high priority?

kain88-de · 2017-08-22T08:14:04Z

This is a bug. The problem is a principal design deficiency in TimeStep that requires us to create a subclass for every format to support the current model to show the raw dimensions. I didn't have time to have a proper look into TimeStep and look into a solution that doesn't require sub-classing. It will likely be an easy fix but we have to check/change all readers for this.

I think we just need to redefine what is a raw cell data. I hope we can do this without breaking old user code.

richardjgowers · 2018-01-30T13:23:56Z

So as @kain88-de has said, it's impossible to make ts.dimensions return a view currently, as some Timesteps store _unitcell in a proprietary format.

I'd like to make ts.dimensions return a view though, so ts.positions and ts.dimensions (and ts.whatever) all behave in the same "raw" way. For this we'd have to remove _unitcell and make Readers & Writers do whatever conversions necessary to handle a particular format. Ie shift the burden of weird box formats from Timestep to Reader/Writer. This might actually simplify things overall.

Thoughts?

kain88-de · 2018-01-30T13:46:15Z

Ie shift the burden of weird box formats from Timestep to Reader/Writer. This might actually simplify things overall.

This is actually already done for the XDR formats of GROMACS for simplicity. Personally, I would also like to get rid of sub classing timestep. It means less moving parts to get right and potentially works towards simplifying the readers to implement a read/write function that does all necessary conversions.

richardjgowers · 2018-01-30T14:32:49Z

Yeah exactly, fewer classes is always better. We'd just need to make sure we never lose precision when converting etc (ie avoid 89.99999 angles)

…

On Tue, 30 Jan 2018, 13:46 Max Linke, ***@***.***> wrote: Ie shift the burden of weird box formats from Timestep to Reader/Writer. This might actually simplify things overall. This is actually already done for the XDR formats of GROMACS for simplicity. Personally, I would also like to get rid of sub classing timestep. It means less moving parts to get right and potentially works towards simplifying the readers to implement a read/write function that does all necessary conversions. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#1575 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AI0jB4vcY00F-GSQ-z8Sj8yMfd5qt51wks5tPx0ngaJpZM4Or0ft> .

IAlibay · 2021-08-17T20:44:09Z

@richardjgowers is this done now?

tylerjereddy mentioned this issue Aug 3, 2017

Add triclinic_vector information to timestep #276

Closed

tylerjereddy added defect Priority-High labels Aug 3, 2017

richardjgowers mentioned this issue Aug 4, 2017

AtomGroup.dimensions returns view sometimes #1582

Closed

richardjgowers self-assigned this Jan 30, 2018

richardjgowers mentioned this issue Jan 30, 2018

WIP: Removed format specific Timesteps #1765

Closed

orbeckst added this to the 1.0 milestone May 1, 2018

richardjgowers modified the milestones: 1.0, 2.0 Jun 6, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`ts.dimensions` object with different trajectory formats #1575

`ts.dimensions` object with different trajectory formats #1575

tylerjereddy commented Aug 3, 2017

tylerjereddy commented Aug 3, 2017

kain88-de commented Aug 3, 2017

richardjgowers commented Aug 3, 2017

richardjgowers commented Aug 3, 2017

tylerjereddy commented Aug 3, 2017

orbeckst commented Aug 3, 2017 •

edited

Loading

orbeckst commented Aug 3, 2017

orbeckst commented Aug 21, 2017

kain88-de commented Aug 22, 2017

richardjgowers commented Jan 30, 2018

kain88-de commented Jan 30, 2018

richardjgowers commented Jan 30, 2018 via email

IAlibay commented Aug 17, 2021

ts.dimensions object with different trajectory formats #1575

ts.dimensions object with different trajectory formats #1575

Comments

tylerjereddy commented Aug 3, 2017

Expected behaviour

Actual behaviour

tylerjereddy commented Aug 3, 2017

kain88-de commented Aug 3, 2017

richardjgowers commented Aug 3, 2017

richardjgowers commented Aug 3, 2017

tylerjereddy commented Aug 3, 2017

orbeckst commented Aug 3, 2017 • edited Loading

orbeckst commented Aug 3, 2017

orbeckst commented Aug 21, 2017

kain88-de commented Aug 22, 2017

richardjgowers commented Jan 30, 2018

kain88-de commented Jan 30, 2018

richardjgowers commented Jan 30, 2018 via email

IAlibay commented Aug 17, 2021

`ts.dimensions` object with different trajectory formats #1575

`ts.dimensions` object with different trajectory formats #1575

orbeckst commented Aug 3, 2017 •

edited

Loading