Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Behaviour of amber parser on file with duplicated time #315

Closed
xiki-tempula opened this issue May 22, 2023 · 5 comments · Fixed by #326
Closed

Behaviour of amber parser on file with duplicated time #315

xiki-tempula opened this issue May 22, 2023 · 5 comments · Fixed by #326
Milestone

Comments

@xiki-tempula
Copy link
Collaborator

xiki-tempula commented May 22, 2023

In many occasions, one would need to run simulation in chunks. For 10 ns simulation, one run the first ns, the seconds ns and so on. For Gromacs, this is easy and the output would be a continuous file. However, given that AMBER doesn't have a checkpoint continue mechanism. A convention to do continuation would be that one generate a number of amber output files and concatenate the text files together.
This would result in

 NSTEP =      200   TIME(PS) =      40.400  TEMP(K) =   300.75  PRESS =     0.0
 Etot   =    -17617.2822  EKtot   =      3869.1100  EPtot      =    -21486.3921
 BOND   =         7.5660  ANGLE   =       100.4182  DIHED      =         6.3933
 1-4 NB =         6.9516  1-4 EEL =       -34.1688  VDWAALS    =      3077.9556
 EELEC  =    -24653.8963  EHBOND  =         0.0000  RESTRAINT  =         2.3882
 EAMBER (non-restraint)  =    -21488.7803
 EKCMT  =         0.0000  VIRIAL  =         0.0000  VOLUME     =     65829.0102
                                                    Density    =         0.9878
 TEMP0  =       298.0000  REPNUM  =              1  EXCHANGE#  =              1
 ------------------------------------------------------------------------------

...

MBAR Energy analysis:
Energy at 0.0000 =    -21510.840230
Energy at 0.0667 =    -21507.963581
Energy at 0.1333 =    -21490.171333
Energy at 0.2000 =    -21448.494884
Energy at 0.2667 =    -21379.504007
Energy at 0.3333 =    -21284.274183
Energy at 0.4000 =    -21167.309140
Energy at 0.4667 =    -21035.636518
Energy at 0.5333 =    -20897.816290
Energy at 0.6000 =    -20762.961241
Energy at 0.6667 =    -20640.032587
Energy at 0.7333 =    -20537.016108
Energy at 0.8000 =    -20460.060524
Energy at 0.8667 =    -20412.262820
Energy at 0.9333 =    -20391.474269
Energy at 1.0000 =    -20388.088606
 ------------------------------------------------------------------------------
...
NSTEP =      200   TIME(PS) =      40.400  TEMP(K) =   300.75  PRESS =     0.0
 Etot   =    -17617.2822  EKtot   =      3869.1100  EPtot      =    -21486.3921
 BOND   =         7.5660  ANGLE   =       100.4182  DIHED      =         6.3933
 1-4 NB =         6.9516  1-4 EEL =       -34.1688  VDWAALS    =      3077.9556
 EELEC  =    -24653.8963  EHBOND  =         0.0000  RESTRAINT  =         2.3882
 EAMBER (non-restraint)  =    -21488.7803
 EKCMT  =         0.0000  VIRIAL  =         0.0000  VOLUME     =     65829.0102
                                                    Density    =         0.9878
 TEMP0  =       298.0000  REPNUM  =              1  EXCHANGE#  =              1
 ------------------------------------------------------------------------------

...

MBAR Energy analysis:
Energy at 0.0000 =    -21510.840230
Energy at 0.0667 =    -21507.963581
Energy at 0.1333 =    -21490.171333
Energy at 0.2000 =    -21448.494884
Energy at 0.2667 =    -21379.504007
Energy at 0.3333 =    -21284.274183
Energy at 0.4000 =    -21167.309140
Energy at 0.4667 =    -21035.636518
Energy at 0.5333 =    -20897.816290
Energy at 0.6000 =    -20762.961241
Energy at 0.6667 =    -20640.032587
Energy at 0.7333 =    -20537.016108
Energy at 0.8000 =    -20460.060524
Energy at 0.8667 =    -20412.262820
Energy at 0.9333 =    -20391.474269
Energy at 1.0000 =    -20388.088606
 ------------------------------------------------------------------------------

So you have duplicated time points in the same input file.
The current behaviour is that when we reach the end of the first file 5. TIMINGS, we close the file parser, so the rest of the information is gone, which is not ideal.

There are several ways of solving this problem.
The easiest way would be to just to remove this break https://github.com/alchemistry/alchemlyb/blob/master/src/alchemlyb/parsing/amber.py#L361
This would allow the parser to continue. On the other hand, instead of logging the data for 0~1 ns, 0~1 ns, 0~1 ns. This would log 0~1 ns, 1~2 ns, 2~3 ns. In some sense this would mean that we are not reading the data as it is. On the other hand, this would probably be the expected behaviour in this case so this might be fine.

The more involved way of solving this problem might be to log the data as it is, For example, 0~1 ns, 0~1 ns, 0~1 ns. Then the user might need to change the time point by themselves.

I'm in favour of the easy solution but is not sure of this would create some additional problem @DrDomenicoMarson
Here is an example input file , note that only 50 data points are loaded.
amber.out.zip

@DrDomenicoMarson
Copy link
Contributor

Hi @xiki-tempula ,
In dealing with the exact situation you are describing I usually just read each output file from the same simulation and concatenate them via concat from alchemlyb, without any issue!

The time for each file is not directly read from the output file, but is computed from the starting time, the dt, and the steps between output.

The only problem may arise if one tries to concatenate files from different simulations, but this is the case also for GROMACS I think!

@xiki-tempula
Copy link
Collaborator Author

@DrDomenicoMarson Thanks.
Do you think it would be fine if I just remove https://github.com/alchemistry/alchemlyb/blob/master/src/alchemlyb/parsing/amber.py#L361
Then issue a warning if there are more data after the finish?

@DrDomenicoMarson
Copy link
Contributor

I don't have the time to test this right now, but I think it shouldn't be an issue!

I think there should be an Error and not a Warning, if data is found after the finish because I don't know if the logic that is used now would deal with a single file that is a concatenation of several output AMBER files though.

@xiki-tempula
Copy link
Collaborator Author

Close this via the concatenation of the pandas dataframe #316

@xiki-tempula
Copy link
Collaborator Author

An error should be thrown if the input is a concatenated amber output file

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants