NAMD - decorrelation and equilibration detection fails in v0.7 and 1.0 #274
Comments
@EzryStIago Hi Ezry, many thanks for reporting. This is a tricky issue. Part of the decorrelate_u_nk function is the sanitisation of the data, which drops all rows that contain NaN. Before 0.7 the subsampling code was inconsistent: some parts dropped rows with NaN while others did not. Version 0.7 standardised the subsampling module so that all functions behave the same way, which means they all now implicitly drop rows that contain NaN. In this case, the data frame is
So dropping the NaN rows empties the data frame and causes the error. Sorry, I'm not too familiar with NAMD; would you mind giving some context on why there are so many NaNs? @dotsdl I noticed that you added the line that drops the rows with NaN
Hi @xiki-tempula, thank you for your response. NAMD doesn’t calculate dE for all lambda pairs, just the adjacent lambdas; those NaNs correspond to the remaining pairs. Of course that restricts us to BAR estimation.
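To make the failure mode concrete, here is a minimal sketch using only pandas and NumPy. The frame layout (a `time` / `fep-lambda` index with one column per lambda) and all energy values are made up for illustration and are only assumed to resemble what the NAMD parser returns. Each row has values for its own and adjacent lambdas only, so every row contains at least one NaN and a blanket `dropna()` leaves nothing behind.

```python
import numpy as np
import pandas as pd

# Hypothetical lambda schedule; real NAMD runs typically use many more windows.
lambdas = [0.0, 0.25, 0.5, 1.0]

keys, rows = [], []
for i, lam in enumerate(lambdas):
    for time in (0.0, 1.0):
        # Only the sampled lambda and its neighbours get a value;
        # every other column stays NaN.
        row = [np.nan] * len(lambdas)
        for j in (i - 1, i, i + 1):
            if 0 <= j < len(lambdas):
                row[j] = 0.1 * abs(i - j)  # made-up reduced energies
        keys.append((time, lam))
        rows.append(row)

index = pd.MultiIndex.from_tuples(keys, names=["time", "fep-lambda"])
u_nk = pd.DataFrame(rows, index=index, columns=lambdas)

# Dropping every row that contains a NaN removes all rows of this frame.
cleaned = u_nk.dropna()
print(len(u_nk), "rows before dropna,", len(cleaned), "rows after")
```

With four or more lambda windows, no row can have values in every column, which is why dropping NaN rows empties the frame.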
@EzryStIago Many thanks for the explanation. I have made a fix so that the pre-processing no longer drops the NaN rows. Would you mind testing whether this fits your purpose? #275
Sorry, I don't quite understand this part. Do you mean that you have to make sure each dataframe only has one fixed lambda value, instead of multiple different lambdas in one dataframe?
Thank you @xiki-tempula!
Exactly. Perhaps this is a peculiarity of NAMD, but all data are output to the same file by default. To use decorrelation or equilibration detection, I first split the dataframe up by lambda, process each piece, and reassemble. Is that the intended usage? If so, this is a non-issue.
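The split, process, and reassemble workflow described here can be sketched with plain pandas. The helper name `process_per_lambda` is hypothetical, and the example uses a trivial stand-in (keep every second row) where a real script would pass a per-window function such as decorrelate_u_nk. The index layout is again an assumption made for illustration.

```python
import pandas as pd


def process_per_lambda(df, func, level="fep-lambda"):
    """Apply `func` to each single-lambda slice of `df` and concatenate the results.

    `process_per_lambda` is a hypothetical helper, not part of alchemlyb.
    `func` is any callable that takes and returns a DataFrame.
    """
    pieces = [func(group) for _, group in df.groupby(level=level, sort=False)]
    return pd.concat(pieces)


# Made-up data: two lambda windows with four time points each.
keys = [(t, lam) for lam in (0.0, 1.0) for t in (0.0, 1.0, 2.0, 3.0)]
index = pd.MultiIndex.from_tuples(keys, names=["time", "fep-lambda"])
u_nk = pd.DataFrame(
    {0.0: [float(i) for i in range(8)], 1.0: [float(i) + 0.5 for i in range(8)]},
    index=index,
)


def keep_every_second_row(group):
    # Stand-in for a real per-window subsampling function.
    return group.iloc[::2]


result = process_per_lambda(u_nk, keep_every_second_row)
print(result)
```

Grouping on the index level keeps each window's rows together, so whatever function is passed in only ever sees a single lambda at a time.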
I see. This is kind of a tricky thing. We currently only support a single lambda in … For NAMD, if all energies are dumped to the same file, then they need to be separated and decorrelated separately. I think the best solution for this would be for alchemlyb.parsing.namd.extract_u_nk to support an optional keyword which would return a list of dataframes instead of one dataframe. I don't really know anything about NAMD. I noticed that @jhenin and @ttjoseph have previously contributed to the NAMD parser. I wonder if there is any advice on the best way forward? Thanks.
Thanks for your suggestion @xiki-tempula! That seems doable, however it involves a change in the
- Fix #274
- removed dropping of rows with NaN in the pre-processing slicing() functions (functionality was not documented and led to incorrect behavior with NAMD data)
- update tests
- update CHANGES
Thank you @xiki-tempula!
Recent updates (to preprocessing.py, I believe) have broken our scripts' ability to use decorrelation and equilibration detection for NAMD output. I have attached an MWE that includes some sample data from alchemtest. The most recent version that works for these test cases is alchemlyb 0.6.0. I have included sample logs for runs using 0.6, 0.7, and 1.0.
Finally, it looks like decorrelation should work on the entire dataframe based on the documentation, but we have had to separate the dataframe by fep-lambda. Otherwise, decorrelate_u_nk returns just fep-lambda=0.
Any clarification or advice would be helpful!
MWE: alchemlyb_crash_MWE.zip