Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix amber stdout parsing #56

Merged
merged 7 commits into from
Apr 27, 2023
Merged

Fix amber stdout parsing #56

merged 7 commits into from
Apr 27, 2023

Conversation

lohedges
Copy link
Contributor

This PR provides a possible solution for issue #52, i.e. fixing the parsing of AMBER stdout records for free-energy perturbation simulations. Previously (before #27) we parsed the mdinfo file using watchdog. This only contained a single set of records, so there wasn't any problem. In contrast, the standard output contains three sets of records, i.e. one for each degree of freedom in the system. These correspond to the two TI regions, as well as the soft core part. (In the previous case, I'm not sure if the data in the mdinfo file only corresponded to the first TI region, or some average of the two.)

The approach that I've take is to store a record dictionary for each degree of freedom, then allow the user to select which one they want to use by specifying the dof keyword argument when extracting records, which defaults to dof=0.

For example, to get records for the first TI region:

records0 = process.getRecords(dof=0)

To get the total energy or the second TI region:

total_energy1 = process.getTotalEnergy(dof=1)

Here the degrees of freedom are indexed, with the meaning of the index specified in the docs. Regular, i.e. non-FreeEnergy protocols should just use the default, i.e. dof=0, and return nothing for other values. (This could be changed to an exception.)

In order to parse the records in a consistent way I've applied some formatting tweaks to the records keys so that the same key can be used for different degrees of freedom. This is because, due to the fixed-width nature of the output formatting, the keys can be abbreviated differently. This could cause issues if you are doing some internal analysis based on the existing keys.

I've also provided some convenience functions to extract some new records, e.g. DV/DL. I've not exposed any of the soft-core records this way, since they aren't properly documented in the AMBER manual, so I'm not sure of their precise meaning.

I've added a unit test to check that I get the correct number of records for some example output. This test runs against FreeEnergy and FreeEnergyMinimisation protocol to ensure that the parsing works in both cases, since the formatting is different.

Things that I am still unsure of:

  • In the examples that you provided the records for the two TI regions are identical in all cases, e.g.:
| TI region  1



   NSTEP       ENERGY          RMS            GMAX         NAME    NUMBER
      1       1.1499E+04     1.3559E+02     3.3411E+03     O        4411

 BOND    =    21543.4552  ANGLE   =        3.3520  DIHED      =        0.0000
 VDWAALS =    14714.9925  EEL     =   -24763.0744  HBOND      =        0.0000
 1-4 VDW =        0.0000  1-4 EEL =        0.0000  RESTRAINT  =        0.0000
 DV/DL  =        38.5181

| TI region  2



   NSTEP       ENERGY          RMS            GMAX         NAME    NUMBER
      1       1.1499E+04     1.3559E+02     3.3411E+03     O        4411

 BOND    =    21543.4552  ANGLE   =        3.3520  DIHED      =        0.0000
 VDWAALS =    14714.9925  EEL     =   -24763.0744  HBOND      =        0.0000
 1-4 VDW =        0.0000  1-4 EEL =        0.0000  RESTRAINT  =        0.0000
 DV/DL  =        38.5181

Is this just because this a contrived example, or due to it being a particular lambda value? Essentially I am wondering if you actually need the data for both regions?

  • Is the output formatting the same for absolute and relative free-energy simulations? If not, could you provide additional output examples.

  • How would the output differ for other types of AMBER simulation that we might want to run in future? For example, would the dof approach always be something that works, or are we going to get into a situation where there are other types of record that would be clunky to parse in this way?

Checklist

  • I confirm that I have merged the latest version of devel into this branch before issuing this pull request (e.g. by running git pull origin devel): [y]
  • I confirm that I have permission to release this code under the GPL3 license: [y]

Suggested reviewers:

@xiki-tempula

@lohedges lohedges added bug Something isn't working exscientia Related to work with Exscientia labels Apr 26, 2023
@lohedges lohedges temporarily deployed to biosimspace-build April 26, 2023 19:49 — with GitHub Actions Inactive
@lohedges lohedges temporarily deployed to biosimspace-build April 26, 2023 19:49 — with GitHub Actions Inactive
@lohedges lohedges temporarily deployed to biosimspace-build April 26, 2023 19:49 — with GitHub Actions Inactive
@lohedges lohedges temporarily deployed to biosimspace-build April 26, 2023 19:49 — with GitHub Actions Inactive
@lohedges lohedges temporarily deployed to biosimspace-build April 26, 2023 19:49 — with GitHub Actions Inactive
@lohedges
Copy link
Contributor Author

With regards to this bit...

In order to parse the records in a consistent way I've applied some formatting tweaks to the records keys so that the same key can be used for different degrees of freedom. This is because, due to the fixed-width nature of the output formatting, the keys can be abbreviated differently. This could cause issues if you are doing some internal analysis based on the existing keys.

It should be easy for me to create a mapping between the universal key and the actual record label from the output file. This would allow me to return the data with the standard AMBER format keys, which would be familiar to any AMBER user. I'll try this when I get a moment.

@@ -895,7 +895,7 @@ def getElectrostaticEnergy(self, time_series=False, block="AUTO"):
energy : :class:`Energy <BioSimSpace.Types.Energy>`
The electrostatic energy.
"""
return self.getRecord("EELECT", time_series, _Units.Energy.kcal_per_mol, block)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the key is wrong. All output that I generate for any protocol has EELEC, not EELECT. The commit message explains the reason for the change.

@@ -1093,9 +1229,15 @@ def getElectrostaticEnergy(self, time_series=False, block="AUTO"):
energy : :class:`Energy <BioSimSpace.Types.Energy>`
The electrostatic energy.
"""
return self.getRecord("EELECT", time_series, _Units.Energy.kcal_per_mol, block)
return self.getRecord(
Copy link
Contributor

@xiki-tempula xiki-tempula Apr 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might add a doc to explain the different keys, when changing from EELECT to EEL.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. Originally this was only an internal function so the user didn't really need to know what the inputs meant. Once I have the mapping in place, then I'll just explain that they are AMBER output keys and point the user to the manual. (I'm not going explain them all, since they are not even documented appropriately in the manual itself.)

@@ -1410,16 +1749,31 @@ def getTotalEnergy(self, time_series=False, block="AUTO"):
energy : :class:`Energy <BioSimSpace.Types.Energy>`
The total energy.
"""
if isinstance(self._protocol, _Protocol.Minimisation):

if not isinstance(dof, int):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might get this part into a decorater instead of checking it a million times

@@ -1534,9 +1932,15 @@ def getTemperature(self, time_series=False, block="AUTO"):
temperature : :class:`Temperature <BioSimSpace.Types.Temperature>`
The temperature.
"""
return self.getRecord("TEMP(K)", time_series, _Units.Temperature.kelvin, block)
return self.getRecord(
"TEMP(K)",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would the key be including unit or not including unit?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just exactly what is in the AMBER file. Not sure why they bother putting the unit, since they don't for anything else.

xiki-tempula
xiki-tempula previously approved these changes Apr 27, 2023
Copy link
Contributor

@xiki-tempula xiki-tempula left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@lohedges
Copy link
Contributor Author

Thanks. Any comments on the points raised in the original post, in particular, is the output the same for absolute and relative free-energy simulations?

@xiki-tempula
Copy link
Contributor

Sorry to disappoint you but I don't know the answer. To be honest, I have been using Gromacs throughout my life and has only been using amber since I joined this company.

@lohedges
Copy link
Contributor Author

No problem. I'll tag in @msuruzhon in case he knows anything more. I'm not too fussed, since you are happy with these changes, I'd just rather not add something if it will likely need to be re-worked in the near future.

Cheers.

@msuruzhon
Copy link
Contributor

Hi @lohedges I also don't know the answer for sure, but I think that the alchemical machinery is always the same regardless of whether RBFE or ABFE is run, so I think the output would be the same in this case. It is different for pure MD though.

@lohedges
Copy link
Contributor Author

Thanks. I'll just sort out mapping between the original and universal keys, then merge.

@lohedges lohedges temporarily deployed to biosimspace-build April 27, 2023 14:12 — with GitHub Actions Inactive
@lohedges lohedges temporarily deployed to biosimspace-build April 27, 2023 14:12 — with GitHub Actions Inactive
@lohedges lohedges temporarily deployed to biosimspace-build April 27, 2023 14:12 — with GitHub Actions Inactive
@lohedges lohedges temporarily deployed to biosimspace-build April 27, 2023 14:12 — with GitHub Actions Inactive
@lohedges lohedges temporarily deployed to biosimspace-build April 27, 2023 14:12 — with GitHub Actions Inactive
@lohedges lohedges merged commit 200458f into devel Apr 27, 2023
@lohedges lohedges deleted the fix_amber_stdout_parsing branch April 27, 2023 18:57
lohedges added a commit that referenced this pull request Apr 27, 2023
lohedges added a commit that referenced this pull request Apr 27, 2023
Backport fixes from PR #56 and #59 into main
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working exscientia Related to work with Exscientia
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants