AMBER dVdl convert to pandas error #272

fr-0zt · 2022-11-13T02:31:47Z

Hello all,

I'm having trouble forming my pandas data frame using the Amber parser in alchemlyb. I'm still new to python programming and I'm not sure where I might have done wrong in the python script. I really appreciate guidance on handling this error. I have also attached the code I'm using below.

Thanks.

from pathlib import Path

import pandas as pd

from alchemtest import Bunch
import alchemlyb
from alchemlyb.parsing import amber
from alchemlyb.estimators import TI

from alchemlyb.parsing.amber import extract_dHdl

#extract data

def load_ligandsolvated():
    """Load the Amber solvated dataset.

    Returns
    -------
    data : Bunch
        Dictionary-like object, the interesting attributes are:

        - 'data' : the data files by alchemical leg

    """

    module_path = Path(__file__).parent
    data = {'charge': list(map(str, module_path.glob('decharge/*/md7.out'))),
            'vdw': list(map(str, module_path.glob('vdw_bonded/*/md7.out')))}

    return Bunch(data=data)


dataset = load_ligandsolvated()

#print(dataset)

dHdl_coul = alchemlyb.concat([amber.extract_dHdl(filename, T=300)
                      for filename in dataset['data']['charge']])

Traceback (most recent call last):
File "/home/srg/senal/Documents/Project-GQuad/MD/130mM/1-2-1/1-2-1Intercalated_Bottom/ReRun-2/TI/cuda/TI-3/ligands/ti_est.py", line 37, in
dHdl_coul = alchemlyb.concat([amber.extract_dHdl(filename, T=300)
File "/home/srg/senal/Documents/Project-GQuad/MD/130mM/1-2-1/1-2-1Intercalated_Bottom/ReRun-2/TI/cuda/TI-3/ligands/ti_est.py", line 37, in
dHdl_coul = alchemlyb.concat([amber.extract_dHdl(filename, T=300)
File "/home/srg/senal/.local/lib/python3.9/site-packages/alchemlyb/parsing/init.py", line 11, in wrapper
dataframe = func(outfile, T, *args, **kwargs)
File "/home/srg/senal/.local/lib/python3.9/site-packages/alchemlyb/parsing/amber.py", line 390, in extract_dHdl
df = convert_to_pandas(file_datum)
File "/home/srg/senal/.local/lib/python3.9/site-packages/alchemlyb/parsing/amber.py", line 36, in convert_to_pandas
frame_time = file_datum.t0 + (frame_index + 1) * file_datum.dt * 1000
TypeError: unsupported operand type(s) for +: 'NoneType' and 'float'

The text was updated successfully, but these errors were encountered:

xiki-tempula · 2022-11-13T10:15:18Z

Hi, Thanks for reporting this issue.
Do you mind provide the input file so we could have a look? Thanks.

fr-0zt · 2022-11-13T14:18:47Z

md7.zip

I have attached one of my Amber MD output files above. I further did some more debugging on the "amber.py" code and I feel like extract section for DVDL and appending to filedatum.gradients might have some inconsistency with my amber output file. I may be wrong because this my first time debugging a complete python code. So, please let me know what you think.

Thank you!

xiki-tempula · 2022-11-13T20:18:01Z

@fr-0zt Thanks. I can reproduce the error. I wonder if you mind give a brief description of how you generate the file? Especially, how you set the frequency that amber reports the dhdl values, please. Thank you.
@DrDomenicoMarson Many thanks for the AMBER contributions, do you mind have a check to see if there is a fix for this? Sorry, I'm not very familiar with the AMBER parser so felt that you could give us some help on this. Thanks.

fr-0zt · 2022-11-14T04:19:40Z

@xiki-tempula Thank you for looking in to this. I am performing a 10 ns simulation, saving the energies and dvdl values every 10 ps. However, according to Amber18 (and newer) there are two TI regions specified at the beginning of the simulation. So, dvdl values are saved for both the regions simultaneously at the said frequency. I have attached the first part of the energies output using this simulation.

RESULTS

| TI region 1

NSTEP = 1000 TIME(PS) = 209920.999 TEMP(K) = 297.64 PRESS = 0.0
Etot = -20862.5455 EKtot = 7675.9463 EPtot = -28538.4918
BOND = 5407.7111 ANGLE = 90.6074 DIHED = 53.5559
1-4 NB = 18.9023 1-4 EEL = 0.0504 VDWAALS = 5812.5937
EELEC = -39921.9126 EHBOND = 0.0000 RESTRAINT = 0.0000
EKCMT = 0.0000 VIRIAL = 0.0000 VOLUME = 84214.2071
Density = 1.0347
DV/DL = -208.5158

| TI region 2

NSTEP = 1000 TIME(PS) = 209920.999 TEMP(K) = 297.66 PRESS = 0.0
Etot = -20861.9237 EKtot = 7676.5681 EPtot = -28538.4918
BOND = 5407.7111 ANGLE = 90.6074 DIHED = 53.5559
1-4 NB = 18.9023 1-4 EEL = 0.0504 VDWAALS = 5812.5937
EELEC = -39921.9126 EHBOND = 0.0000 RESTRAINT = 0.0000
EKCMT = 0.0000 VIRIAL = 0.0000 VOLUME = 84214.2071
Density = 1.0347
DV/DL = -208.5158

DrDomenicoMarson · 2022-11-14T08:58:42Z

Hi, I'll have a look at it right now! Just a quick comment about the "two TI regions specified at the beginning of the simulation".

It's true indeed that amber saves the two TI regions (and it did so also before amber18, as far as I remember), but if you look at the file, especially at the end of the file, where the averages over the last N time-steps are reported, DV/DL values (the only value read by alchemlyb) and their RMS fluctuations are perfectly equal between TI region 1 and 2.

If you search in the AMBER mailing list you can find some better explanation on why values are reported like that, but I can't remember where precisely.

PS: from your output file it seems you are printing values in the output file every 1 ps, and not 10 ps as you previously stated, as you have dt=1 fs and print every ntpr=1000 time-steps (it's not a problem for us, I just wanted to point it out just so you can be sure of what you are doing).

DrDomenicoMarson · 2022-11-14T09:12:12Z

@fr-0zt a minor issue: in the script you provided, you used T=300.0 when calling extract_dHdl, while your simulations were run at 298.0 K, and the parser gave a ValueError to me about it, didn't it happen to you as well?

DrDomenicoMarson · 2022-11-14T09:50:58Z

Hi I found the culprit in a "too strict" regex pattern, I addressed the issue in PR #273. Testing the updated version with your input file seems to run fine, extracting the right time and DV/DL values.
Just wait for the PR to be merged in the master branch and everything should be fine!

This PR addresses issue #272. This issue exposed a nasty bug inside the regex pattern we used to extract sections in the amber output file. we are trying to match a sequence of type filed = int/float extracting the int/float corresponding to the field. Up till now, the pattern was fr' {field}\s+=\s+(\*+|{_FP_RE}|\d+)' were field was the string we want to extract the value for, and _FP_RE is defined to extract the float/int after the equal. The problem arises when the field and/or the value are not separated from the = sign by a space (as was the case in the issue). It's strange this didn't appear before! I just changed the + in \s+=\s+ to *. Thinking about it, this should not break anything else, but I can't be 100% sure and the tests should catch any "macro" problems eventually introduced with this PR.

xiki-tempula · 2022-11-14T15:11:55Z

The PR is merged to the master.
Doing pip install git+ssh://[email protected]:alchemistry/alchemlyb.git should give you the alchemlyb with the fix.
Please feel free to reopen this issue if there are other problems.

fr-0zt changed the title ~~AMBER dVdl parsing error~~ AMBER dVdl convert to pandas error Nov 13, 2022

xiki-tempula added parsers AMBER MD engine labels Nov 13, 2022

DrDomenicoMarson mentioned this issue Nov 14, 2022

AMBER parser had a "too strict" regex to extract values #273

Merged

xiki-tempula closed this as completed Nov 14, 2022

orbeckst added this to the 1.0.1 milestone Nov 14, 2022

DrDomenicoMarson mentioned this issue Dec 10, 2022

Error in reading amber energy files #287

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

AMBER dVdl convert to pandas error #272

AMBER dVdl convert to pandas error #272

fr-0zt commented Nov 13, 2022

xiki-tempula commented Nov 13, 2022

fr-0zt commented Nov 13, 2022 •

edited

Loading

xiki-tempula commented Nov 13, 2022 •

edited

Loading

fr-0zt commented Nov 14, 2022

DrDomenicoMarson commented Nov 14, 2022

DrDomenicoMarson commented Nov 14, 2022

DrDomenicoMarson commented Nov 14, 2022

xiki-tempula commented Nov 14, 2022

AMBER dVdl convert to pandas error #272

AMBER dVdl convert to pandas error #272

Comments

fr-0zt commented Nov 13, 2022

xiki-tempula commented Nov 13, 2022

fr-0zt commented Nov 13, 2022 • edited Loading

xiki-tempula commented Nov 13, 2022 • edited Loading

fr-0zt commented Nov 14, 2022

DrDomenicoMarson commented Nov 14, 2022

DrDomenicoMarson commented Nov 14, 2022

DrDomenicoMarson commented Nov 14, 2022

xiki-tempula commented Nov 14, 2022

fr-0zt commented Nov 13, 2022 •

edited

Loading

xiki-tempula commented Nov 13, 2022 •

edited

Loading