Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AMBER dVdl convert to pandas error #272

Closed
fr-0zt opened this issue Nov 13, 2022 · 8 comments
Closed

AMBER dVdl convert to pandas error #272

fr-0zt opened this issue Nov 13, 2022 · 8 comments
Labels
Milestone

Comments

@fr-0zt
Copy link

fr-0zt commented Nov 13, 2022

Hello all,

I'm having trouble forming my pandas data frame using the Amber parser in alchemlyb. I'm still new to python programming and I'm not sure where I might have done wrong in the python script. I really appreciate guidance on handling this error. I have also attached the code I'm using below.

Thanks.

from pathlib import Path

import pandas as pd

from alchemtest import Bunch
import alchemlyb
from alchemlyb.parsing import amber
from alchemlyb.estimators import TI

from alchemlyb.parsing.amber import extract_dHdl

#extract data

def load_ligandsolvated():
    """Load the Amber solvated dataset.

    Returns
    -------
    data : Bunch
        Dictionary-like object, the interesting attributes are:

        - 'data' : the data files by alchemical leg

    """

    module_path = Path(__file__).parent
    data = {'charge': list(map(str, module_path.glob('decharge/*/md7.out'))),
            'vdw': list(map(str, module_path.glob('vdw_bonded/*/md7.out')))}

    return Bunch(data=data)


dataset = load_ligandsolvated()

#print(dataset)

dHdl_coul = alchemlyb.concat([amber.extract_dHdl(filename, T=300)
                      for filename in dataset['data']['charge']])

Traceback (most recent call last):
File "/home/srg/senal/Documents/Project-GQuad/MD/130mM/1-2-1/1-2-1Intercalated_Bottom/ReRun-2/TI/cuda/TI-3/ligands/ti_est.py", line 37, in
dHdl_coul = alchemlyb.concat([amber.extract_dHdl(filename, T=300)
File "/home/srg/senal/Documents/Project-GQuad/MD/130mM/1-2-1/1-2-1Intercalated_Bottom/ReRun-2/TI/cuda/TI-3/ligands/ti_est.py", line 37, in
dHdl_coul = alchemlyb.concat([amber.extract_dHdl(filename, T=300)
File "/home/srg/senal/.local/lib/python3.9/site-packages/alchemlyb/parsing/init.py", line 11, in wrapper
dataframe = func(outfile, T, *args, **kwargs)
File "/home/srg/senal/.local/lib/python3.9/site-packages/alchemlyb/parsing/amber.py", line 390, in extract_dHdl
df = convert_to_pandas(file_datum)
File "/home/srg/senal/.local/lib/python3.9/site-packages/alchemlyb/parsing/amber.py", line 36, in convert_to_pandas
frame_time = file_datum.t0 + (frame_index + 1) * file_datum.dt * 1000
TypeError: unsupported operand type(s) for +: 'NoneType' and 'float'

@fr-0zt fr-0zt changed the title AMBER dVdl parsing error AMBER dVdl convert to pandas error Nov 13, 2022
@xiki-tempula
Copy link
Collaborator

Hi, Thanks for reporting this issue.
Do you mind provide the input file so we could have a look? Thanks.

@fr-0zt
Copy link
Author

fr-0zt commented Nov 13, 2022

md7.zip

I have attached one of my Amber MD output files above. I further did some more debugging on the "amber.py" code and I feel like extract section for DVDL and appending to filedatum.gradients might have some inconsistency with my amber output file. I may be wrong because this my first time debugging a complete python code. So, please let me know what you think.

Thank you!

@xiki-tempula
Copy link
Collaborator

xiki-tempula commented Nov 13, 2022

@fr-0zt Thanks. I can reproduce the error. I wonder if you mind give a brief description of how you generate the file? Especially, how you set the frequency that amber reports the dhdl values, please. Thank you.
@DrDomenicoMarson Many thanks for the AMBER contributions, do you mind have a check to see if there is a fix for this? Sorry, I'm not very familiar with the AMBER parser so felt that you could give us some help on this. Thanks.

@fr-0zt
Copy link
Author

fr-0zt commented Nov 14, 2022

@xiki-tempula Thank you for looking in to this. I am performing a 10 ns simulation, saving the energies and dvdl values every 10 ps. However, according to Amber18 (and newer) there are two TI regions specified at the beginning of the simulation. So, dvdl values are saved for both the regions simultaneously at the said frequency. I have attached the first part of the energies output using this simulation.


  1. RESULTS

| TI region 1

NSTEP = 1000 TIME(PS) = 209920.999 TEMP(K) = 297.64 PRESS = 0.0
Etot = -20862.5455 EKtot = 7675.9463 EPtot = -28538.4918
BOND = 5407.7111 ANGLE = 90.6074 DIHED = 53.5559
1-4 NB = 18.9023 1-4 EEL = 0.0504 VDWAALS = 5812.5937
EELEC = -39921.9126 EHBOND = 0.0000 RESTRAINT = 0.0000
EKCMT = 0.0000 VIRIAL = 0.0000 VOLUME = 84214.2071
Density = 1.0347
DV/DL = -208.5158

| TI region 2

NSTEP = 1000 TIME(PS) = 209920.999 TEMP(K) = 297.66 PRESS = 0.0
Etot = -20861.9237 EKtot = 7676.5681 EPtot = -28538.4918
BOND = 5407.7111 ANGLE = 90.6074 DIHED = 53.5559
1-4 NB = 18.9023 1-4 EEL = 0.0504 VDWAALS = 5812.5937
EELEC = -39921.9126 EHBOND = 0.0000 RESTRAINT = 0.0000
EKCMT = 0.0000 VIRIAL = 0.0000 VOLUME = 84214.2071
Density = 1.0347
DV/DL = -208.5158

@DrDomenicoMarson
Copy link
Contributor

Hi, I'll have a look at it right now! Just a quick comment about the "two TI regions specified at the beginning of the simulation".

It's true indeed that amber saves the two TI regions (and it did so also before amber18, as far as I remember), but if you look at the file, especially at the end of the file, where the averages over the last N time-steps are reported, DV/DL values (the only value read by alchemlyb) and their RMS fluctuations are perfectly equal between TI region 1 and 2.

If you search in the AMBER mailing list you can find some better explanation on why values are reported like that, but I can't remember where precisely.

PS: from your output file it seems you are printing values in the output file every 1 ps, and not 10 ps as you previously stated, as you have dt=1 fs and print every ntpr=1000 time-steps (it's not a problem for us, I just wanted to point it out just so you can be sure of what you are doing).

@DrDomenicoMarson
Copy link
Contributor

@fr-0zt a minor issue: in the script you provided, you used T=300.0 when calling extract_dHdl, while your simulations were run at 298.0 K, and the parser gave a ValueError to me about it, didn't it happen to you as well?

@DrDomenicoMarson
Copy link
Contributor

Hi I found the culprit in a "too strict" regex pattern, I addressed the issue in PR #273. Testing the updated version with your input file seems to run fine, extracting the right time and DV/DL values.
Just wait for the PR to be merged in the master branch and everything should be fine!

xiki-tempula pushed a commit that referenced this issue Nov 14, 2022
This PR addresses issue #272.
This issue exposed a nasty bug inside the regex pattern we used to extract sections in the amber output file.

we are trying to match a sequence of type

filed = int/float

extracting the int/float corresponding to the field. Up till now, the pattern was

fr' {field}\s+=\s+(\*+|{_FP_RE}|\d+)'
were field was the string we want to extract the value for, and _FP_RE is defined to extract the float/int after the equal.

The problem arises when the field and/or the value are not separated from the = sign by a space (as was the case in the issue). It's strange this didn't appear before!

I just changed the + in \s+=\s+ to *.

Thinking about it, this should not break anything else, but I can't be 100% sure and the tests should catch any "macro" problems eventually introduced with this PR.
@xiki-tempula
Copy link
Collaborator

The PR is merged to the master.
Doing pip install git+ssh://[email protected]:alchemistry/alchemlyb.git should give you the alchemlyb with the fix.
Please feel free to reopen this issue if there are other problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants