-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix amber stdout parsing #56
Conversation
With regards to this bit...
It should be easy for me to create a mapping between the universal key and the actual record label from the output file. This would allow me to return the data with the standard AMBER format keys, which would be familiar to any AMBER user. I'll try this when I get a moment. |
@@ -895,7 +895,7 @@ def getElectrostaticEnergy(self, time_series=False, block="AUTO"): | |||
energy : :class:`Energy <BioSimSpace.Types.Energy>` | |||
The electrostatic energy. | |||
""" | |||
return self.getRecord("EELECT", time_series, _Units.Energy.kcal_per_mol, block) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because the key is wrong. All output that I generate for any protocol has EELEC
, not EELECT
. The commit message explains the reason for the change.
@@ -1093,9 +1229,15 @@ def getElectrostaticEnergy(self, time_series=False, block="AUTO"): | |||
energy : :class:`Energy <BioSimSpace.Types.Energy>` | |||
The electrostatic energy. | |||
""" | |||
return self.getRecord("EELECT", time_series, _Units.Energy.kcal_per_mol, block) | |||
return self.getRecord( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might add a doc to explain the different keys, when changing from EELECT to EEL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea. Originally this was only an internal function so the user didn't really need to know what the inputs meant. Once I have the mapping in place, then I'll just explain that they are AMBER output keys and point the user to the manual. (I'm not going explain them all, since they are not even documented appropriately in the manual itself.)
@@ -1410,16 +1749,31 @@ def getTotalEnergy(self, time_series=False, block="AUTO"): | |||
energy : :class:`Energy <BioSimSpace.Types.Energy>` | |||
The total energy. | |||
""" | |||
if isinstance(self._protocol, _Protocol.Minimisation): | |||
|
|||
if not isinstance(dof, int): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might get this part into a decorater instead of checking it a million times
@@ -1534,9 +1932,15 @@ def getTemperature(self, time_series=False, block="AUTO"): | |||
temperature : :class:`Temperature <BioSimSpace.Types.Temperature>` | |||
The temperature. | |||
""" | |||
return self.getRecord("TEMP(K)", time_series, _Units.Temperature.kelvin, block) | |||
return self.getRecord( | |||
"TEMP(K)", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would the key be including unit or not including unit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just exactly what is in the AMBER file. Not sure why they bother putting the unit, since they don't for anything else.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Thanks. Any comments on the points raised in the original post, in particular, is the output the same for absolute and relative free-energy simulations? |
Sorry to disappoint you but I don't know the answer. To be honest, I have been using Gromacs throughout my life and has only been using amber since I joined this company. |
No problem. I'll tag in @msuruzhon in case he knows anything more. I'm not too fussed, since you are happy with these changes, I'd just rather not add something if it will likely need to be re-worked in the near future. Cheers. |
Hi @lohedges I also don't know the answer for sure, but I think that the alchemical machinery is always the same regardless of whether RBFE or ABFE is run, so I think the output would be the same in this case. It is different for pure MD though. |
Thanks. I'll just sort out mapping between the original and universal keys, then merge. |
This PR provides a possible solution for issue #52, i.e. fixing the parsing of AMBER stdout records for free-energy perturbation simulations. Previously (before #27) we parsed the
mdinfo
file usingwatchdog
. This only contained a single set of records, so there wasn't any problem. In contrast, the standard output contains three sets of records, i.e. one for each degree of freedom in the system. These correspond to the two TI regions, as well as the soft core part. (In the previous case, I'm not sure if the data in themdinfo
file only corresponded to the first TI region, or some average of the two.)The approach that I've take is to store a record dictionary for each degree of freedom, then allow the user to select which one they want to use by specifying the
dof
keyword argument when extracting records, which defaults todof=0
.For example, to get records for the first TI region:
To get the total energy or the second TI region:
Here the degrees of freedom are indexed, with the meaning of the index specified in the docs. Regular, i.e. non-
FreeEnergy
protocols should just use the default, i.e.dof=0
, and return nothing for other values. (This could be changed to an exception.)In order to parse the records in a consistent way I've applied some formatting tweaks to the records keys so that the same key can be used for different degrees of freedom. This is because, due to the fixed-width nature of the output formatting, the keys can be abbreviated differently. This could cause issues if you are doing some internal analysis based on the existing keys.
I've also provided some convenience functions to extract some new records, e.g.
DV/DL
. I've not exposed any of the soft-core records this way, since they aren't properly documented in the AMBER manual, so I'm not sure of their precise meaning.I've added a unit test to check that I get the correct number of records for some example output. This test runs against
FreeEnergy
andFreeEnergyMinimisation
protocol to ensure that the parsing works in both cases, since the formatting is different.Things that I am still unsure of:
Is this just because this a contrived example, or due to it being a particular lambda value? Essentially I am wondering if you actually need the data for both regions?
Is the output formatting the same for absolute and relative free-energy simulations? If not, could you provide additional output examples.
How would the output differ for other types of AMBER simulation that we might want to run in future? For example, would the
dof
approach always be something that works, or are we going to get into a situation where there are other types of record that would be clunky to parse in this way?Checklist
devel
into this branch before issuing this pull request (e.g. by runninggit pull origin devel
): [y]Suggested reviewers:
@xiki-tempula