-
Notifications
You must be signed in to change notification settings - Fork 51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
NAMD parsers #7
Comments
I'm going to ask Christophe Chipot about this. @Limn1 in my group also has some basic parsers which might be good enough/serve as a starting point/provide a skeleton. |
Chipot is unfortunately unable to help at present, he reports. @Limn1 may be able to help but is tied up with some stuff on deadline til early March, I expect. |
I talked to @vtlim about this today. She could use some of her existing data, but we also batted around the idea of generating some sample data for a simpler test system. |
Good, feel free to grab alchemistry/alchemtest#17 ! |
There's a data set from JC Gumbart for the Tyr->Ala example from the NAMD FEP tutorial that he gave me. The data files are at http://simbac.gatech.edu/forMichael/NAMD-FEP-tutorial-TYR2ALA.tgz If these look useful, then I can contact him about the requisite permissions. |
I moved the discussion of tests datasets to alchemistry/alchemtest#17 |
@davidlmobley please assign this issue (and alchemistry/alchemtest#17) to @vtlim |
@orbeckst - Apologies for the delay; I had a proposal application due in past hour. |
On Dec 12, 2017, at 1:19 AM, Victoria Lim ***@***.***> wrote:
@orbeckst <https://github.com/orbeckst> - Apologies for the delay; I had a proposal application due in past hour.
No problem, I just didn’t know if you got the github invite.
Also, I see mention of "reST" in [alchemistry/alchemtest#17 <alchemistry/alchemtest#17>]
restructured text, the doc markup format, as parsed by sphinx http://www.sphinx-doc.org/en/stable/rest.html
… but could not find mention of this term elsewhere in either repo. This may be a trivial question,
but can you clarify what it is? (alchemistry/alchemtest#17 <alchemistry/alchemtest#17>)
|
@orbeckst - we've not used Sphinx yet, so this is helpful. yet another argument for a general "best practices for code development" writeup. :) |
@vtlim the NAMD data set is now in alchemtest – thank you for making this work. I tentatively added this feature to the 0.2.0 release but if you are pretty busy at the moment then we can easily move it to 0.3.0. We can release as quickly as we like. |
The html-formatted docs are nicer: https://alchemlyb.readthedocs.io/en/latest/parsing.html I don't know what the NAMD data look like. But if it comes from any kind of free energy calculation then there should be either (for thermodynamic integration, TI) dH/dlambda derivatives (the dHdl standard form or for FEP/BAR/MBAR kind of calculations, potential energies evaluated at different lambdas (the u_nk standard form). Do you know what kind of calculations were done in the NAMD case? If it is dHdl then "fep-lambda" would be a generic name for the lambda value at which dHdl was sampled. For Gromacs, one has typically multiple lambda values (coul-lambda, vdw-lambda, maybe restr-lambda, ...) but for the Amber reader they are just called "lambdas" alchemlyb/src/alchemlyb/parsing/amber.py Line 36 in e5b2e58
|
The NAMD case I am working on is the FEP-based estimator (the Based on the link you sent:
My understanding of the above was that the different columns represent the lambda value of sampling, but that would mean the |
@vtlim I am looking at the test data. But while doing so, I found that we overlooked something in the test data accessor. I pushed PR alchemistry/alchemtest#29 – if you have a moment could you review the PR, please? Thanks. I'll say more about u_nk when I had a look at the data... hopefully in a few minutes. |
The DataFrame should contain all data needed to run BAR/MBAR or one of the FEP estimators (which we don't have implemented yet). The "fep-lambda" index states at which lambda this particular frame was sampled. If you have one output file at lambda=0, one at lambda=0.5, and one at lambda=1 then your parser needs to read all the data files and join them together in one big DataFrame. The timesteps (or trajectory "frames") are each tagged with the lambda at which they were sample, i.e., data from the lambda=0.5 file will all have fep-lambda 0.5. The columns are the evaluations of the Hamiltonian (or the potential energy U) at other lambdas (sometimes called "foreign lambdas"): take positions sampled at fep-lambda lambda0, I am looking at the data (the following needs PR alchemistry/alchemtest#29 to work): >>> import alchemtest.namd
>>> import bz2
>>> t = alchemtest.namd.load_tyr2ala()
>>> fwd = bz2.open(t.data['forward'][0], mode="rt").readlines()
>>> print("".join(fwd[0:5]))
# STEP Elec vdW dE dE_avg Temp dG
# l l+dl l l+dl E(l+dl)-E(l)
#NEW FEP WINDOW: LAMBDA SET TO 0 LAMBDA2 0.05
FepEnergy: 10 -5391.7740 -5390.5907 533.4814 532.2565 -0.0415 -0.1133 302.6836 -0.1161
FepEnergy: 20 -5281.7946 -5280.8118 452.4335 451.4536 0.0029 -0.0665 298.4974 -0.0702 It looks to me as if we have a single lambda (as you say), which has the value lambda0=0. I am not quite sure how to interpret LAMBDA2 and the l+dl; I am assuming that l+dl == LAMBDA2. In my notation then you would apparently have Elec and vdW evaluated at lambda0 (l) and lambda1 (l+dl). I don't know if the sum of Elec and vdW is the full change in energy. I think the question is how do you normally process the NAMD data to generate the free energies? |
NAMD joins the different lambda windows in its standard routine. Here's a web example of a NAMD output file. You can see that new windows are declared by "#NEW FEP WINDOW: LAMBDA SET TO ..." I don't have an additional set of lambda values other than these.
Yes, that should be the case for both statements.
Correct. Essentially, the
We have two fepout files from NAMD: one representing the forward work of transformation, and one representing the reverse work. For each lambda window, (e.g., 0 to 0.05), we:
I can share the code I am in the process of adapting and paste the example output from the data I put in alchemtest. |
The |
Applying the code I shared above on the example that is in alchemtest: |
From what I can gather from what you posted is that you should assemble the data into a DataFrame that roughly looks like the following
I think this should be for the total We could either have a function that also extracts the Elec and VDW components into the same format. Or one could add additional multi-indices to this DataFrame but it seems to me that this would break the overall API ( @dotsdl ?) |
That sounds right, I think. But @vtlim - you basically have this already working with BAR in your existing analysis, right? So you should be able to take a file you can analyze there, format it to analyze with alchemlyb, and check that you get the same answers... It should be using the same code (BAR/MBAR) for analysis, so the differences should just be in terms of how the data flow is organized. |
@orbeckst - BAR is not yet in In other words, I have 20,000 samples based on 20,000 time steps, and this is not equal to the number of windows (20). |
The missing BAR estimator #28 is one of the major open issues. I don't think that the MBAR estimator requires you to have as many foreign lambdas as samples. Does this error (is it an error message?) come from alchemlyb or from pymbar? |
@vtlim could you say a little more about how you reassembled your data frame? Also, the traceback of the error message would probably help. I think the error message is telling you that you have |
The error message above came from pymbar. The error is raised here. I think the error is actually not an issue with mbar but with my dataframe somehow, triggered by somewhere around here. Right now, my code passes the wrong value for
For the Gromacs data I'm testing, it returns N_k of Here's the head of the Gromacs dataframe:
Here's the head of the NAMD dataframe:
I'm not sure where the difference lies right now, but it must have something to do with how I created the pandas dataframe. I printed out the |
I found the issue. The lambda values of the column headers have an inconsistent number of decimal places as printed out by NAMD. That means the 0 of the column header doesn't match exactly the 0.0 of the fep-lambda column. Right now I have a workaround to list the column headers and the fep-lambda column to be of two decimal points and of the same data type. How I end up fixing that workaround depends on how I want to structure the final dataframe (which should depend on the BAR implementation, I think). The dataframe I printed out above is technically not correct since I populated by the column lambdas instead of the fep-lambda. The correct version (which I have also tested successfully with MBAR) looks more along the lines of what @orbeckst said earlier (but I have the
The reason that I think this should depend on how BAR is implemented is that it should take in a list of forward and reverse work values. This last dataframe I'm showing has 20020 rows compared to 1001 rows, and most of them are zero. It seems to me that the first format I posted would make more sense, but that would mean that the lambdas take on a different meaning in this application -- which I'm not sure would be consistent with the API. |
What's the current status on this, @vtlim ? Anything that needs looking at? |
@mrshirts -- I have code takes the NAMD output and formats it into a pandas dataframe consistent with the existing API. The dataframe works with MBAR but I'm not sure if the output values are correct (I'm not sure how valid it is to put this dataset into MBAR. Though I realize that MBAR reduces to BAR in the limit of sampling just two states.) As of now, having the BAR estimator (#28) would be ideal for me to wrap this up. |
Thanks for letting me know that BAR is a limiting issue now. Yes, one could examine it by simply running BAR on all pairwise states; that will give free energies that are equivalent to BAR. |
@orbeckst - I'm not sure I'll be able to work on the NAMD parser soon due to file system rearrangements + operating system updates on our local cluster and also having not worked on this project in a while. So I can work on it, but probably not anytime soon. |
In line with the overall API proposal, we want to have parsers for each of the major MD engines, and eventually have coverage for all of those in use. Since there are essentially two types of estimators (TI and FEP), each packages needs a parser for:
u_nk
from output files (for FEP).DHdl
from output files (for TI).This issue is the nexus for discussion for such parsers for the NAMD package. If you have existing parsing code for this package, comment below and we can begin adapting it into the parsers outlined above in a PR.
The text was updated successfully, but these errors were encountered: