Fix amber stdout parsing #56

lohedges · 2023-04-26T19:49:04Z

This PR provides a possible solution for issue #52, i.e. fixing the parsing of AMBER stdout records for free-energy perturbation simulations. Previously (before #27) we parsed the mdinfo file using watchdog. This only contained a single set of records, so there wasn't any problem. In contrast, the standard output contains three sets of records, i.e. one for each degree of freedom in the system. These correspond to the two TI regions, as well as the soft core part. (In the previous case, I'm not sure if the data in the mdinfo file only corresponded to the first TI region, or some average of the two.)

The approach that I've taken is to store a record dictionary for each degree of freedom, then allow the user to select which one they want to use by specifying the dof keyword argument when extracting records, which defaults to dof=0.

For example, to get records for the first TI region:

records0 = process.getRecords(dof=0)

To get the total energy for the second TI region:

total_energy1 = process.getTotalEnergy(dof=1)

Here the degrees of freedom are indexed, with the meaning of the index specified in the docs. Regular, i.e. non-FreeEnergy protocols should just use the default, i.e. dof=0, and return nothing for other values. (This could be changed to an exception.)
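As a rough illustration of the per-dof storage described above, here is a minimal standalone sketch. It is not the code in this PR: `parse_records` and `get_records` are hypothetical names, only the two TI region headers shown in this thread are recognised (the soft-core set would need its own marker), and the sample text is cut down from the example output further down.

```python
import re

# Header that starts a new set of records, e.g. "| TI region  1".
_REGION = re.compile(r"\|\s*TI region\s+(\d+)")
# "KEY = value" pairs, e.g. "1-4 VDW =        0.0000".
_PAIR = re.compile(r"(\S[^=]*?)\s*=\s*(-?\d+(?:\.\d+)?(?:[Ee][+-]?\d+)?)")


def parse_records(lines, num_dof=2):
    """Collect "KEY = value" records into one dict of lists per dof."""
    records = [dict() for _ in range(num_dof)]
    dof = 0  # output without a region header is treated as dof 0
    for line in lines:
        region = _REGION.search(line)
        if region:
            dof = int(region.group(1)) - 1
            continue
        if not 0 <= dof < num_dof:
            continue  # region not handled by this sketch
        for key, value in _PAIR.findall(line):
            records[dof].setdefault(key, []).append(float(value))
    return records


def get_records(records, dof=0):
    """Return the records for one dof, or None if it does not exist."""
    if isinstance(dof, bool) or not isinstance(dof, int):
        raise TypeError("'dof' must be of type 'int'")
    return records[dof] if 0 <= dof < len(records) else None


SAMPLE = """\
| TI region  1

 BOND    =    21543.4552  ANGLE   =        3.3520  DIHED      =        0.0000
 1-4 VDW =        0.0000  1-4 EEL =        0.0000  RESTRAINT  =        0.0000
 DV/DL  =        38.5181

| TI region  2

 BOND    =    21543.4552  ANGLE   =        3.3520  DIHED      =        0.0000
 1-4 VDW =        0.0000  1-4 EEL =        0.0000  RESTRAINT  =        0.0000
 DV/DL  =        38.5181
"""

records = parse_records(SAMPLE.splitlines())
records0 = get_records(records, dof=0)
```

Returning None for an unknown dof mirrors the "return nothing" behaviour mentioned above; raising an exception instead would be a one-line change.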

In order to parse the records in a consistent way I've applied some formatting tweaks to the records keys so that the same key can be used for different degrees of freedom. This is because, due to the fixed-width nature of the output formatting, the keys can be abbreviated differently. This could cause issues if you are doing some internal analysis based on the existing keys.
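To give a feel for the kind of key tweak meant here, a minimal sketch follows. The function name `universal_key` and the alias table are assumptions for illustration only; the actual tweaks are whatever the diff in this PR does.

```python
import re

# Hypothetical alias table: maps a label as printed in one block of the
# output to the key used for every degree of freedom. Illustrative only.
_ALIASES = {"EELEC": "EEL"}


def universal_key(label):
    """Normalise a record label so that it is the same for every dof."""
    # Collapse the padding introduced by the fixed-width formatting.
    key = re.sub(r"\s+", " ", label.strip()).upper()
    return _ALIASES.get(key, key)
```

With this, `universal_key("EELEC")` and `universal_key("EEL")` both give `"EEL"`, and `universal_key(" 1-4  EEL ")` gives `"1-4 EEL"`.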

I've also provided some convenience functions to extract some new records, e.g. DV/DL. I've not exposed any of the soft-core records this way, since they aren't properly documented in the AMBER manual, so I'm not sure of their precise meaning.

I've added a unit test to check that I get the correct number of records for some example output. This test runs against both the FreeEnergy and FreeEnergyMinimisation protocols to ensure that the parsing works in both cases, since the formatting is different.

Things that I am still unsure of:

In the examples that you provided the records for the two TI regions are identical in all cases, e.g.:

| TI region  1



   NSTEP       ENERGY          RMS            GMAX         NAME    NUMBER
      1       1.1499E+04     1.3559E+02     3.3411E+03     O        4411

 BOND    =    21543.4552  ANGLE   =        3.3520  DIHED      =        0.0000
 VDWAALS =    14714.9925  EEL     =   -24763.0744  HBOND      =        0.0000
 1-4 VDW =        0.0000  1-4 EEL =        0.0000  RESTRAINT  =        0.0000
 DV/DL  =        38.5181

| TI region  2



   NSTEP       ENERGY          RMS            GMAX         NAME    NUMBER
      1       1.1499E+04     1.3559E+02     3.3411E+03     O        4411

 BOND    =    21543.4552  ANGLE   =        3.3520  DIHED      =        0.0000
 VDWAALS =    14714.9925  EEL     =   -24763.0744  HBOND      =        0.0000
 1-4 VDW =        0.0000  1-4 EEL =        0.0000  RESTRAINT  =        0.0000
 DV/DL  =        38.5181

Is this just because this is a contrived example, or due to it being a particular lambda value? Essentially I am wondering if you actually need the data for both regions?

Is the output formatting the same for absolute and relative free-energy simulations? If not, could you provide additional output examples?
How would the output differ for other types of AMBER simulation that we might want to run in future? For example, would the dof approach always be something that works, or are we going to get into a situation where there are other types of record that would be clunky to parse in this way?

Checklist

I confirm that I have merged the latest version of devel into this branch before issuing this pull request (e.g. by running git pull origin devel): [y]
I confirm that I have permission to release this code under the GPL3 license: [y]

Suggested reviewers:

@xiki-tempula

[closes #52]

lohedges · 2023-04-27T08:03:06Z

With regards to this bit...

In order to parse the records in a consistent way I've applied some formatting tweaks to the records keys so that the same key can be used for different degrees of freedom. This is because, due to the fixed-width nature of the output formatting, the keys can be abbreviated differently. This could cause issues if you are doing some internal analysis based on the existing keys.

It should be easy for me to create a mapping between the universal key and the actual record label from the output file. This would allow me to return the data with the standard AMBER format keys, which would be familiar to any AMBER user. I'll try this when I get a moment.
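A minimal sketch of what such a mapping could look like, assuming a plain dictionary from universal key to the label found in the output file. The names `stdout_keys`, `invert_mapping` and `with_original_keys` are hypothetical, and the single mapping entry is illustrative.

```python
def invert_mapping(mapping):
    """Return an original-label -> universal-key dictionary."""
    inverse = {}
    for universal, original in mapping.items():
        if original in inverse:
            raise ValueError(f"Label {original!r} maps to more than one key")
        inverse[original] = universal
    return inverse


def with_original_keys(records, mapping):
    """Relabel records using the labels from the output file.

    Keys that are missing from the mapping are passed through unchanged.
    """
    return {mapping.get(key, key): values for key, values in records.items()}


# Illustrative mapping and records; the values are taken from the example
# output in the PR description.
stdout_keys = {"EEL": "EELEC"}
records = {"EEL": [-24763.0744], "BOND": [21543.4552]}

amber_records = with_original_keys(records, stdout_keys)
round_trip = with_original_keys(amber_records, invert_mapping(stdout_keys))
```

Here `amber_records` is keyed by `"EELEC"` and `"BOND"`, and `round_trip` recovers the original universal keys.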

xiki-tempula · 2023-04-27T08:23:10Z

python/BioSimSpace/Process/_amber.py

@@ -895,7 +895,7 @@ def getElectrostaticEnergy(self, time_series=False, block="AUTO"):
        energy : :class:`Energy <BioSimSpace.Types.Energy>`
           The electrostatic energy.
        """
-        return self.getRecord("EELECT", time_series, _Units.Energy.kcal_per_mol, block)


Why was this changed?

Because the key is wrong. All output that I generate for any protocol has EELEC, not EELECT. The commit message explains the reason for the change.

xiki-tempula · 2023-04-27T08:26:12Z

python/BioSimSpace/Sandpit/Exscientia/Process/_amber.py

@@ -1093,9 +1229,15 @@ def getElectrostaticEnergy(self, time_series=False, block="AUTO"):
        energy : :class:`Energy <BioSimSpace.Types.Energy>`
           The electrostatic energy.
        """
-        return self.getRecord("EELECT", time_series, _Units.Energy.kcal_per_mol, block)
+        return self.getRecord(


Might add a doc to explain the different keys, when changing from EELECT to EEL.

Good idea. Originally this was only an internal function so the user didn't really need to know what the inputs meant. Once I have the mapping in place, then I'll just explain that they are AMBER output keys and point the user to the manual. (I'm not going to explain them all, since they are not even documented appropriately in the manual itself.)

xiki-tempula · 2023-04-27T08:41:32Z

python/BioSimSpace/Sandpit/Exscientia/Process/_amber.py

@@ -1410,16 +1749,31 @@ def getTotalEnergy(self, time_series=False, block="AUTO"):
        energy : :class:`Energy <BioSimSpace.Types.Energy>`
           The total energy.
        """
-        if isinstance(self._protocol, _Protocol.Minimisation):
+
+        if not isinstance(dof, int):


Might get this part into a decorator instead of checking it a million times
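One way the decorator suggestion could look, as a minimal sketch rather than anything in this PR. `validate_dof` and `DemoProcess` are hypothetical names, the default of three degrees of freedom follows the three sets of records mentioned in the description, and the stored values are toy numbers.

```python
import functools


def validate_dof(num_dof=3):
    """Check the 'dof' keyword argument once, before the method body runs."""

    def decorator(method):
        @functools.wraps(method)
        def wrapper(self, *args, dof=0, **kwargs):
            # Only a keyword 'dof' is checked; a positional one would bypass this.
            if isinstance(dof, bool) or not isinstance(dof, int):
                raise TypeError("'dof' must be of type 'int'")
            if not 0 <= dof < num_dof:
                raise ValueError(f"'dof' must be in range 0 to {num_dof - 1}")
            return method(self, *args, dof=dof, **kwargs)

        return wrapper

    return decorator


class DemoProcess:
    """Toy stand-in for a process class; not the BioSimSpace API."""

    def __init__(self):
        # One record dictionary per degree of freedom (toy values).
        self._records = [{"ENERGY": [1.0]}, {"ENERGY": [2.0]}, {}]

    @validate_dof()
    def get_total_energy(self, time_series=False, dof=0):
        values = self._records[dof].get("ENERGY")
        if values is None:
            return None
        return values if time_series else values[-1]
```

Each getter then only needs the decorator line, and the type check lives in one place.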

xiki-tempula · 2023-04-27T08:42:46Z

python/BioSimSpace/Sandpit/Exscientia/Process/_amber.py

@@ -1534,9 +1932,15 @@ def getTemperature(self, time_series=False, block="AUTO"):
        temperature : :class:`Temperature <BioSimSpace.Types.Temperature>`
           The temperature.
        """
-        return self.getRecord("TEMP(K)", time_series, _Units.Temperature.kelvin, block)
+        return self.getRecord(
+            "TEMP(K)",


Would the key be including unit or not including unit?

This is just exactly what is in the AMBER file. Not sure why they bother putting the unit, since they don't for anything else.

xiki-tempula

LGTM

lohedges · 2023-04-27T08:47:43Z

Thanks. Any comments on the points raised in the original post, in particular, is the output the same for absolute and relative free-energy simulations?

xiki-tempula · 2023-04-27T08:50:50Z

Sorry to disappoint you but I don't know the answer. To be honest, I have been using Gromacs throughout my life and have only been using AMBER since I joined this company.

lohedges · 2023-04-27T08:53:23Z

No problem. I'll tag in @msuruzhon in case he knows anything more. I'm not too fussed, since you are happy with these changes, I'd just rather not add something if it will likely need to be re-worked in the near future.

Cheers.

msuruzhon · 2023-04-27T09:03:08Z

Hi @lohedges I also don't know the answer for sure, but I think that the alchemical machinery is always the same regardless of whether RBFE or ABFE is run, so I think the output would be the same in this case. It is different for pure MD though.

lohedges · 2023-04-27T09:04:14Z

Thanks. I'll just sort out mapping between the original and universal keys, then merge.


lohedges added 3 commits April 26, 2023 15:43

Fix electrostatic energy record key.

1b571d9

Merge branch 'devel' into fix_amber_stdout_parsing

936b32f

Handle multiple degrees of freedom when parsing AMBER FEP output.

90958f6

[closes #52]

lohedges added the bug and exscientia labels Apr 26, 2023

lohedges temporarily deployed to biosimspace-build April 26, 2023 19:49 — with GitHub Actions Inactive

xiki-tempula reviewed Apr 27, 2023


xiki-tempula previously approved these changes Apr 27, 2023


lohedges added 2 commits April 27, 2023 11:35

Store stdout dictionaries in a list to simplify code.

a885339

Map universal key to original record key and add method to invert.

1e06b41

lohedges dismissed xiki-tempula’s stale review via 1e06b41 April 27, 2023 13:36

lohedges had a problem deploying to biosimspace-build April 27, 2023 13:36 — with GitHub Actions Failure

lohedges had a problem deploying to biosimspace-build April 27, 2023 13:36 — with GitHub Actions Error


Expose missing stdout_key dictionary.

f8eb6b8

lohedges temporarily deployed to biosimspace-build April 27, 2023 14:12 — with GitHub Actions Inactive

Guard against missing keys. [ci skip]

485d2c6

lohedges merged commit 200458f into devel Apr 27, 2023

lohedges deleted the fix_amber_stdout_parsing branch April 27, 2023 18:57

lohedges added a commit that referenced this pull request Apr 27, 2023

Backport fix from PR #56.

5744ee8

lohedges mentioned this pull request Apr 27, 2023

Backport fixes from PR #56 and #59 into main #60

Merged

lohedges added a commit that referenced this pull request Apr 27, 2023

Merge pull request #60 from OpenBioSim/fix_52_58_main

609989e

Backport fixes from PR #56 and #59 into main

lohedges mentioned this pull request May 4, 2023

[BUG] process.getDensity broken #63

Closed

lohedges mentioned this pull request May 23, 2023

[BUG] AMBER output parser gives wrong number of values in dof=2 #78

Closed

lohedges pushed a commit that referenced this pull request Oct 24, 2024

Fix the bug where AMBER Process will not use report interval (#56)

44ffd59
