Behaviour of amber parser on file with duplicated time #315

xiki-tempula · 2023-05-22T13:27:33Z

In many occasions, one would need to run simulation in chunks. For 10 ns simulation, one run the first ns, the seconds ns and so on. For Gromacs, this is easy and the output would be a continuous file. However, given that AMBER doesn't have a checkpoint continue mechanism. A convention to do continuation would be that one generate a number of amber output files and concatenate the text files together.
This would result in

 NSTEP =      200   TIME(PS) =      40.400  TEMP(K) =   300.75  PRESS =     0.0
 Etot   =    -17617.2822  EKtot   =      3869.1100  EPtot      =    -21486.3921
 BOND   =         7.5660  ANGLE   =       100.4182  DIHED      =         6.3933
 1-4 NB =         6.9516  1-4 EEL =       -34.1688  VDWAALS    =      3077.9556
 EELEC  =    -24653.8963  EHBOND  =         0.0000  RESTRAINT  =         2.3882
 EAMBER (non-restraint)  =    -21488.7803
 EKCMT  =         0.0000  VIRIAL  =         0.0000  VOLUME     =     65829.0102
                                                    Density    =         0.9878
 TEMP0  =       298.0000  REPNUM  =              1  EXCHANGE#  =              1
 ------------------------------------------------------------------------------

...

MBAR Energy analysis:
Energy at 0.0000 =    -21510.840230
Energy at 0.0667 =    -21507.963581
Energy at 0.1333 =    -21490.171333
Energy at 0.2000 =    -21448.494884
Energy at 0.2667 =    -21379.504007
Energy at 0.3333 =    -21284.274183
Energy at 0.4000 =    -21167.309140
Energy at 0.4667 =    -21035.636518
Energy at 0.5333 =    -20897.816290
Energy at 0.6000 =    -20762.961241
Energy at 0.6667 =    -20640.032587
Energy at 0.7333 =    -20537.016108
Energy at 0.8000 =    -20460.060524
Energy at 0.8667 =    -20412.262820
Energy at 0.9333 =    -20391.474269
Energy at 1.0000 =    -20388.088606
 ------------------------------------------------------------------------------
...
NSTEP =      200   TIME(PS) =      40.400  TEMP(K) =   300.75  PRESS =     0.0
 Etot   =    -17617.2822  EKtot   =      3869.1100  EPtot      =    -21486.3921
 BOND   =         7.5660  ANGLE   =       100.4182  DIHED      =         6.3933
 1-4 NB =         6.9516  1-4 EEL =       -34.1688  VDWAALS    =      3077.9556
 EELEC  =    -24653.8963  EHBOND  =         0.0000  RESTRAINT  =         2.3882
 EAMBER (non-restraint)  =    -21488.7803
 EKCMT  =         0.0000  VIRIAL  =         0.0000  VOLUME     =     65829.0102
                                                    Density    =         0.9878
 TEMP0  =       298.0000  REPNUM  =              1  EXCHANGE#  =              1
 ------------------------------------------------------------------------------

...

MBAR Energy analysis:
Energy at 0.0000 =    -21510.840230
Energy at 0.0667 =    -21507.963581
Energy at 0.1333 =    -21490.171333
Energy at 0.2000 =    -21448.494884
Energy at 0.2667 =    -21379.504007
Energy at 0.3333 =    -21284.274183
Energy at 0.4000 =    -21167.309140
Energy at 0.4667 =    -21035.636518
Energy at 0.5333 =    -20897.816290
Energy at 0.6000 =    -20762.961241
Energy at 0.6667 =    -20640.032587
Energy at 0.7333 =    -20537.016108
Energy at 0.8000 =    -20460.060524
Energy at 0.8667 =    -20412.262820
Energy at 0.9333 =    -20391.474269
Energy at 1.0000 =    -20388.088606
 ------------------------------------------------------------------------------

So you have duplicated time points in the same input file.
The current behaviour is that when we reach the end of the first file 5. TIMINGS, we close the file parser, so the rest of the information is gone, which is not ideal.

There are several ways of solving this problem.
The easiest way would be to just to remove this break https://github.com/alchemistry/alchemlyb/blob/master/src/alchemlyb/parsing/amber.py#L361
This would allow the parser to continue. On the other hand, instead of logging the data for 0~1 ns, 0~1 ns, 0~1 ns. This would log 0~1 ns, 1~2 ns, 2~3 ns. In some sense this would mean that we are not reading the data as it is. On the other hand, this would probably be the expected behaviour in this case so this might be fine.

The more involved way of solving this problem might be to log the data as it is, For example, 0~1 ns, 0~1 ns, 0~1 ns. Then the user might need to change the time point by themselves.

I'm in favour of the easy solution but is not sure of this would create some additional problem @DrDomenicoMarson
Here is an example input file , note that only 50 data points are loaded.
amber.out.zip

The text was updated successfully, but these errors were encountered:

DrDomenicoMarson · 2023-05-22T14:59:59Z

Hi @xiki-tempula ,
In dealing with the exact situation you are describing I usually just read each output file from the same simulation and concatenate them via concat from alchemlyb, without any issue!

The time for each file is not directly read from the output file, but is computed from the starting time, the dt, and the steps between output.

The only problem may arise if one tries to concatenate files from different simulations, but this is the case also for GROMACS I think!

xiki-tempula · 2023-05-22T15:16:11Z

@DrDomenicoMarson Thanks.
Do you think it would be fine if I just remove https://github.com/alchemistry/alchemlyb/blob/master/src/alchemlyb/parsing/amber.py#L361
Then issue a warning if there are more data after the finish?

DrDomenicoMarson · 2023-05-22T15:30:37Z

I don't have the time to test this right now, but I think it shouldn't be an issue!

I think there should be an Error and not a Warning, if data is found after the finish because I don't know if the logic that is used now would deal with a single file that is a concatenation of several output AMBER files though.

xiki-tempula · 2023-06-01T19:19:18Z

Close this via the concatenation of the pandas dataframe #316

xiki-tempula · 2023-06-11T09:45:02Z

An error should be thrown if the input is a concatenated amber output file

xiki-tempula closed this as completed Jun 1, 2023

xiki-tempula reopened this Jun 11, 2023

xiki-tempula linked a pull request Jun 11, 2023 that will close this issue

ValueError will be raised if concatenated amber output file has been ingested #326

Merged

xiki-tempula mentioned this issue Jun 11, 2023

ValueError will be raised if concatenated amber output file has been ingested #326

Merged

xiki-tempula added this to the 2.1.0 milestone Jun 11, 2023

xiki-tempula closed this as completed in #326 Jun 22, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Behaviour of amber parser on file with duplicated time #315

Behaviour of amber parser on file with duplicated time #315

xiki-tempula commented May 22, 2023 •

edited

Loading

DrDomenicoMarson commented May 22, 2023

xiki-tempula commented May 22, 2023

DrDomenicoMarson commented May 22, 2023

xiki-tempula commented Jun 1, 2023

xiki-tempula commented Jun 11, 2023

Behaviour of amber parser on file with duplicated time #315

Behaviour of amber parser on file with duplicated time #315

Comments

xiki-tempula commented May 22, 2023 • edited Loading

DrDomenicoMarson commented May 22, 2023

xiki-tempula commented May 22, 2023

DrDomenicoMarson commented May 22, 2023

xiki-tempula commented Jun 1, 2023

xiki-tempula commented Jun 11, 2023

xiki-tempula commented May 22, 2023 •

edited

Loading