
NAMD - decorrelation and equilibration detection fails in v0.7 and 1.0 #274

Closed
EzryStIago opened this issue Nov 16, 2022 · 7 comments · Fixed by #275
EzryStIago commented Nov 16, 2022

Recent updates (to preprocessing.py, I believe) have broken our scripts' ability to use decorrelation and equilibration detection on NAMD output. I have attached an MWE that includes some sample data from alchemtest. The most recent version that works for these test cases is alchemlyb 0.6.0; I have included sample logs for runs using 0.6, 0.7, and 1.0.

Finally, it looks like decorrelation should work on the entire dataframe based on the documentation, but we have had to separate the dataframe by fep-lambda. Otherwise, decorrelate_u_nk returns just fep-lambda=0.

Any clarification or advice would be helpful!

MWE: alchemlyb_crash_MWE.zip

EzryStIago changed the title from "Decorrelation and Equilibrium Detection of Alchemtest NAMD Data" to "NAMD - decorrelation and equilibration detection fails in v0.7 and 1.0" on Nov 16, 2022
xiki-tempula (Collaborator) commented Nov 16, 2022

@EzryStIago Hi Ezry, many thanks for reporting.

This is a tricky issue. Part of the decorrelate_u_nk function is sanitisation of the data, which drops all rows that contain NaN. Before 0.7 the subsampling code base was inconsistent: some parts would drop rows with NaN while others wouldn't. Version 0.7 standardised the subsampling module so that all functions now behave the same way, which means they all implicitly drop rows containing NaN.

In this case, the data frame is

                    0.0       0.1  0.2  0.3  0.4  0.5  0.6  0.7  0.8  0.9  1.0
time    fep-lambda                                                            
5000.0  0.0         0.0 -3.696651  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
5010.0  0.0         0.0 -3.643142  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
5020.0  0.0         0.0 -3.214734  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
5030.0  0.0         0.0 -4.236102  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
5040.0  0.0         0.0 -4.873010  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
...                 ...       ...  ...  ...  ...  ...  ...  ...  ...  ...  ...
49960.0 0.0         0.0 -2.994324  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
49970.0 0.0         0.0 -3.477750  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
49980.0 0.0         0.0 -3.645658  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
49990.0 0.0         0.0 -3.753682  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
50000.0 0.0         0.0 -3.558601  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
[4501 rows x 11 columns]

Dropping the NaN rows therefore empties the data frame, which causes the error.
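A minimal pandas sketch (not the alchemlyb code itself) reproduces this failure mode: in a NAMD-style u_nk frame, every row has NaN in the non-adjacent lambda columns, so a bare dropna() removes every row.

```python
import numpy as np
import pandas as pd

# Stand-in for the NAMD u_nk frame shown above: each row only carries
# energies for its own lambda and the adjacent one; the rest are NaN.
idx = pd.MultiIndex.from_tuples(
    [(5000.0, 0.0), (5010.0, 0.0), (5020.0, 0.0)],
    names=["time", "fep-lambda"],
)
u_nk = pd.DataFrame(
    {0.0: [0.0, 0.0, 0.0],
     0.1: [-3.70, -3.64, -3.21],
     0.2: [np.nan] * 3,
     0.3: [np.nan] * 3},
    index=idx,
)

# Every row contains at least one NaN, so dropna() discards all of them.
print(u_nk.dropna().shape)  # -> (0, 4)
```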

Sorry, I'm not too familiar with NAMD; would you mind giving some context on why there are so many NaNs?

@dotsdl I noticed that you added the line dropping rows with NaN (df = df.dropna()) to slicing in a655f3a#diff-82d2833cef709c2bd46823ce8a352cb3baac5d0a77386953ab3f538d908dd7a0R29. I wonder why you did this? Thank you.

EzryStIago (Author) commented

Hi @xiki-tempula, thank you for your response.

NAMD doesn’t calculate dE for all lambda pairs, just the adjacent lambdas; those NaNs correspond to the remaining pairs. Of course that restricts us to BAR estimation.

xiki-tempula (Collaborator) commented Nov 20, 2022

@EzryStIago Many thanks for the explanation. I have made a fix so that the pre-processing no longer drops the NaN rows. Would you mind testing whether this fits your purpose? #275

> Finally, it looks like decorrelation should work on the entire dataframe based on the documentation, but we have had to separate the dataframe by fep-lambda. Otherwise, decorrelate_u_nk returns just fep-lambda=0.

Sorry, I don't quite understand this part. Do you mean that you have to make sure each dataframe only has one fixed lambda value, instead of multiple different lambdas in one dataframe?

EzryStIago (Author) commented

Thank you @xiki-tempula!
#275 does indeed behave as expected. I'm getting slightly different results for my free energy estimates (up to a tenth of a kcal/mol) and slightly larger equilibrated/decorrelated dataframes. They're still within error of the original estimates, though.

> Do you mean that you have to make sure each dataframe only has one fixed lambda value? Instead of multiple different lambdas in one dataframe.

Exactly; perhaps this is a peculiarity of NAMD, but all data are output to the same file by default. To use decorrelation or equilibrium detection, I first split the dataframe by lambda, process each piece, and reassemble. Is that the intended usage? If so, this is a non-issue.
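The split-process-reassemble workflow described above can be sketched with plain pandas. Here the per-lambda step is a stand-in callable (in practice it would be something like alchemlyb's decorrelate_u_nk, which expects a single-lambda frame); the toy "decorrelation" just keeps every other row to show the plumbing.

```python
import pandas as pd

def decorrelate_per_lambda(u_nk, decorrelate):
    """Split a multi-lambda u_nk frame by fep-lambda, apply a
    per-lambda function, and reassemble the results."""
    pieces = [
        decorrelate(group)
        for _, group in u_nk.groupby(level="fep-lambda", sort=False)
    ]
    return pd.concat(pieces)

# Toy frame with two lambda windows of two samples each.
idx = pd.MultiIndex.from_tuples(
    [(0.0, 0.0), (10.0, 0.0), (0.0, 0.1), (10.0, 0.1)],
    names=["time", "fep-lambda"],
)
u_nk = pd.DataFrame(
    {0.0: [0.0, 0.0, 1.2, 1.3], 0.1: [-3.7, -3.6, 0.0, 0.0]},
    index=idx,
)

# Fake "decorrelation": keep every other row of each lambda window.
result = decorrelate_per_lambda(u_nk, lambda df: df.iloc[::2])
print(result.shape)  # one row kept per lambda window -> (2, 2)
```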

xiki-tempula (Collaborator) commented

@EzryStIago

> Exactly, perhaps this a peculiarity of NAMD, but all data are output to the same file by default. To use decorrelation or equilibrium detection, I first split the dataframe up by lambda, process, and reassemble. Is that the intended usage? If so, this is a non-issue.

I see. This is kind of a tricky thing. We currently only support a single lambda in subsampling.decorrelate_u_nk. I should state this in the documentation.

For NAMD, if all energies are dumped to the same file, then they need to be separated and decorrelated individually. I think the best solution would be for alchemlyb.parsing.namd.extract_u_nk to support an optional keyword that returns a list of dataframes instead of a single dataframe.

I don't really know anything about NAMD. I noticed that @jhenin and @ttjoseph have previously contributed to the NAMD parser. I wonder if they have any advice on the best way forward? Thanks.

jhenin (Contributor) commented Nov 23, 2022
Thanks for your suggestion @xiki-tempula! That seems doable; however, it involves a change in the extract_u_nk API. A lighter approach would be for the code calling extract_u_nk to split the dataframe into single lambdas, pass them to decorrelate_u_nk, and then concatenate them back. We could add a wrapper function alchemlyb.parsing.namd.decorrelate that does this. Better still, this step could be integrated into decorrelate_u_nk to make the whole process transparent and robust for users with custom workflows that pass multi-lambda dataframes.
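The transparent-wrapper idea could look roughly like this. The name decorrelate_u_nk_any and the dispatch logic are purely hypothetical (this is a sketch of the proposal, not alchemlyb code); the per-lambda routine is again passed in as a stand-in.

```python
import pandas as pd

def decorrelate_u_nk_any(u_nk, decorrelate_single):
    """Hypothetical wrapper: accept either a single-lambda or a
    multi-lambda u_nk frame. Single-lambda input goes straight to the
    existing routine; multi-lambda input is split, processed per
    lambda window, and concatenated back."""
    lambdas = u_nk.index.get_level_values("fep-lambda").unique()
    if len(lambdas) == 1:
        return decorrelate_single(u_nk)
    return pd.concat(
        [decorrelate_single(g)
         for _, g in u_nk.groupby(level="fep-lambda", sort=False)]
    )

idx = pd.MultiIndex.from_tuples(
    [(0.0, 0.0), (10.0, 0.0), (0.0, 0.5), (10.0, 0.5)],
    names=["time", "fep-lambda"],
)
u_nk = pd.DataFrame(
    {0.0: [0.1, 0.2, 0.3, 0.4], 0.5: [0.5, 0.6, 0.7, 0.8]},
    index=idx,
)

# With an identity stand-in, the multi-lambda round trip is lossless.
out = decorrelate_u_nk_any(u_nk, lambda df: df)
```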

orbeckst pushed a commit that referenced this issue on Dec 5, 2022
- Fix #274
- Removed dropping of rows with NaN in the pre-processing slicing() functions (the functionality was not documented and led to incorrect behavior with NAMD data)
- Updated tests
- Updated CHANGES
EzryStIago (Author) commented

Thank you @xiki-tempula!
