Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WaterBridgeAnalysis can't seem to do (Acceptor <-> WAT <-> WAT <-> Acceptor) water bridges? #4040

Closed
pwnusmaximus opened this issue Feb 24, 2023 · 11 comments · Fixed by #4066

Comments

@pwnusmaximus
Copy link

pwnusmaximus commented Feb 24, 2023

Expected behavior

I want to quantify a second order water bridge (two waters between a protein residue (resid 43 and name OD2) and a DNA nucleotide (resid 230 and name P). I would like a time series of the presence of these bonds.

resid 43 = resname ASP
resid 230 = resname DA

Troubleshooting thus far

Notably, selection two is a phosphorus atom. (not a usual hydrogen bond acceptor atom) I know this 'P' isn't in any standard library so I added it with the "acceptors" keyword in addition to forcing selection2_type to be an acceptor. I tried troubleshooting by enlarging the acceptable ranges for angle and distance, and even manually specifying every donor and acceptor used in the interaction in-case any were not in the default atom mask libraries. I also played with using resid, resname or simply atom masks to see if the selection was the issue. I tried on a full trajectory and truncated trajectories where I've manually extracted the frames with water bridges present.

I've verified visually in pymol that the water bridge is indeed present and within spec for the identities, angles and distances I used.

Actual behavior

MDAnalysis.analysis.WaterBridgeAnalysis: INFO     WaterBridgeAnalysis: selection = 'resname DA and name P' (update: True)
MDAnalysis.analysis.WaterBridgeAnalysis: INFO     WaterBridgeAnalysis: water selection = 'resname WAT' (update: True)
MDAnalysis.analysis.WaterBridgeAnalysis: INFO     WaterBridgeAnalysis: criterion: donor heavy atom and acceptor atom distance <= 4.000 A
MDAnalysis.analysis.WaterBridgeAnalysis: INFO     WaterBridgeAnalysis: criterion: angle D-H-A >= 120.000 degrees
MDAnalysis.analysis.WaterBridgeAnalysis: INFO     WaterBridgeAnalysis: force field CHARMM27 to guess donor and         acceptor names
MDAnalysis.analysis.base: INFO     Choosing frames to analyze
MDAnalysis.analysis.base: INFO     Starting preparation
MDAnalysis.analysis.WaterBridgeAnalysis: INFO     WaterBridgeAnalysis: initial checks passed.
MDAnalysis.analysis.WaterBridgeAnalysis: INFO     WaterBridgeAnalysis: starting
MDAnalysis.analysis.WaterBridgeAnalysis: INFO     Starting analysis (frame index start=0 stop=10, step=1)
MDAnalysis.analysis.base: INFO     Starting analysis loop over 10 trajectory frames
100%|██████████| 10/10 [00:00<00:00, 21.44it/s]
MDAnalysis.analysis.base: INFO     Finishing up
[[], [], [], [], [], [], [], [], [], []]

you can see on the last line no hydrogen bonds are identified.
I've opened up the 'w' dataframe and confirmed that the universe was created successfully and that the correct acceptors and donors are listed.

Code to reproduce the behavior

import MDAnalysis as mda
import MDAnalysis.analysis.hydrogenbonds
import numpy.linalg
import numpy as np
np.set_printoptions(linewidth=100)

#%%
#Set working directory
import os
os.chdir('c:\\Users\\makay.murray\\Documents\\GitHub\\md_water_bridge\\analysis_dir')
[MDAnalysis.log](https://github.com/MDAnalysis/mdanalysis/files/10827603/MDAnalysis.log)

#Create Universe
parm = "../trajectory_data/2w35-DL-HID.prmtop"
traj = "../trajectory_data/2w35-DL-HID-rep2-500ns-10frames.mdcrd"

u = mda.Universe(parm,traj)  # always start with a Universe

w = MDAnalysis.analysis.hydrogenbonds.WaterBridgeAnalysis(u, 
                                                          'resname ASP and name OD2', 
                                                          'resname DA and name P', 
                                                          update_selection=True,
                                                          selection1_type='acceptor',
                                                          selection2_type='acceptor',
                                                          water_selection='resname WAT', 
                                                          order=2, 
                                                          update_water_selection=True,
                                                          distance=4.0, angle=120.0,
                                                          distance_type = 'heavy',
                                                          debug=True,
                                                          verbose=True,
                                                          acceptors=('P', 'OD1', 'OD2', 'O'),
                                                          donors=('H1', 'H2'),
                                                          filter_first=True
                                                          )
w.run()

print(w.timeseries)
....

Current version of MDAnalysis

  • MDAnalaysis Version 2.4.2
  • Python 3.9.16
  • OS, Windows 10 Education
  • IDE Spyder 5.4.2

Attached files:
2w35-DL-HID-rep2-500ns-10frames.zip
2w35-DL-HID.zip
Water Bridge Image black
MDAnalysis.log

@pwnusmaximus pwnusmaximus changed the title WaterBridgeAnalysis fails to identify waterbridges that are present by inspection WaterBridgeAnalysis can't identify waterbridges that are present by inspection Feb 24, 2023
@pwnusmaximus
Copy link
Author

pwnusmaximus commented Feb 24, 2023

While looking through the MDAnalysis.log file it appears that selection2_type is ignored when selection1_type is set. ie. It appears that selection 2 and selection 1 are inverse of each other. For most cases this makes sense. However, for second order water bridges an Acceptor <-> WAT <-> WAT <-> Acceptor is perfectly reasonable.

Perhaps there is a method to overrule this behaviour and I didn't do it properly.

@pwnusmaximus pwnusmaximus changed the title WaterBridgeAnalysis can't identify waterbridges that are present by inspection WaterBridgeAnalysis cant seem to do (Acceptor <-> WAT <-> WAT <-> Acceptor) water bridges? Feb 24, 2023
@pwnusmaximus pwnusmaximus changed the title WaterBridgeAnalysis cant seem to do (Acceptor <-> WAT <-> WAT <-> Acceptor) water bridges? WaterBridgeAnalysis can't seem to do (Acceptor <-> WAT <-> WAT <-> Acceptor) water bridges? Feb 24, 2023
@orbeckst
Copy link
Member

orbeckst commented Mar 1, 2023

@xiki-tempula as you're the orginal author of the wbridge code, can you have a look at this question/issue please? Thanks!

@xiki-tempula
Copy link
Contributor

@Pwnus Thanks for the issue. I will have a look in the weekends. I wonder why do you do distance_type = 'heavy'?

@pwnusmaximus
Copy link
Author

Hi @xiki-tempula, for me distance_type = 'heavy' is just a convention. Our usual analysis software of choice ,cpptraj from Ambertools2021, defaults to heavy atom distances and I've grown accustomed to it as a measure.

@xiki-tempula
Copy link
Contributor

I can reproduce the problem. Let me have a look.

@xiki-tempula
Copy link
Contributor

xiki-tempula commented Mar 12, 2023

Ok, I found out why this failed. selection1_type is a keyword argument, but selection2_type is not a keyword argument.

        selection1_type : {"donor", "acceptor", "both"} (optional)
            Selection 1 can be 'donor', 'acceptor' or 'both'. Note that the
            value for `selection1_type` automatically determines how
            `selection2` handles donors and acceptors: If `selection1` contains
            'both' then `selection2` will also contain 'both'. If `selection1`
            is set to 'donor' then `selection2` is 'acceptor' (and vice versa).
            ['both'].

So the code only checks for three cases,
When selection1_type='donor', it assumes selection1_type='donor', selection2_type='acceptor' .
When selection1_type='acceptor', it assumes selection1_type='acceptor', selection2_type='donor' .
When selection1_type='both', it assumes selection1_type='both', selection2_type='both' .

So when it sees selection1_type='acceptor', selection2_type='acceptor', it just reverts to selection1_type='acceptor', selection2_type='donor' .

@xiki-tempula
Copy link
Contributor

Ok, there is another thing. Sorry if the doc is not very clear but donors is actually the heavy atom associated with the donor.
Sorry the distance_type = 'heavy' is not very well maintained mainly because I originally don't understand how this works and kind of inherit the code from other part of the MDA.

However, the distance_type='hydrogen' does work. In this case, it would be

import MDAnalysis as mda
#Create Universe
parm = "2w35-DL-HID.prmtop"
traj = "2w35-DL-HID-rep2-500ns-10frames.mdcrd"

u = mda.Universe(parm,traj)  # always start with a Universe

w = MDAnalysis.analysis.hydrogenbonds.WaterBridgeAnalysis(u,
                                                          'resname ASP and name OD2',
                                                          'resname DA and name P',
                                                          update_selection=True,
                                                          water_selection='resname WAT',
                                                          order=2,
                                                          update_water_selection=True,
                                                          distance=4.0, angle=120.0,
                                                          debug=True,
                                                          verbose=True,
                                                          acceptors=('P', 'OD1', 'OD2', 'O'),
                                                          donors=('O'),
                                                          filter_first=True
                                                          )
w.run()
print(w.timeseries)

Gives
[[(1817, 4251, ('ASP', 110, 'OD2'), ('WAT', 351, 'H2'), 3.393591355721304, 133.6406052508985), (4251, 4252, ('WAT', 351, 'H2'), ('WAT', 352, 'O'), 3.1510935551442802, 129.3098825714121), (4253, 3774, ('WAT', 352, 'H1'), ('DA', 230, 'P'), 2.9032325126961362, 142.4088206311063), (720, 4254, ('ASP', 43, 'OD2'), ('WAT', 352, 'H2'), 1.8726619038368872, 132.56006363521206), (720, 4292, ('ASP', 43, 'OD2'), ('WAT', 365, 'H1'), 1.850974019502122, 150.58244069848672)], [(1817, 4253, ('ASP', 110, 'OD2'), ('WAT', 352, 'H1'), 1.849568207751335, 172.96529196011156), (4254, 9151, ('WAT', 352, 'H2'), ('WAT', 1985, 'O'), 1.8087280419588374, 148.28580215335526), (9153, 3774, ('WAT', 1985, 'H2'), ('DA', 230, 'P'), 2.246521446223992, 167.5724465796556), (720, 4254, ('ASP', 43, 'OD2'), ('WAT', 352, 'H2'), 3.429838909172761, 139.9579791603104), (720, 4293, ('ASP', 43, 'OD2'), ('WAT', 365, 'H2'), 1.8959299736466242, 171.89039279707592), (720, 9152, ('ASP', 43, 'OD2'), ('WAT', 1985, 'H1'), 1.8671204264943673, 153.31218009273726), (720, 14307, ('ASP', 43, 'OD2'), ('WAT', 3703, 'H2'), 3.8169042950036887, 144.42441139786533)], [(720, 4257, ('ASP', 43, 'OD2'), ('WAT', 353, 'H2'), 3.64963647552059, 139.45071906498148), (4256, 3774, ('WAT', 353, 'H1'), ('DA', 230, 'P'), 2.834134138868142, 129.92633270270065), (720, 4293, ('ASP', 43, 'OD2'), ('WAT', 365, 'H2'), 1.818111829419461, 161.60502328555168), (720, 9153, ('ASP', 43, 'OD2'), ('WAT', 1985, 'H2'), 2.015430888351284, 136.59739586186618)], [(720, 9153, ('ASP', 43, 'OD2'), ('WAT', 1985, 'H2'), 1.8203121752801656, 158.58513345751518), (9151, 4293, ('WAT', 1985, 'O'), ('WAT', 365, 'H2'), 3.479766986115513, 133.83175881878554), (4293, 3774, ('WAT', 365, 'H2'), ('DA', 230, 'P'), 3.060998864481397, 140.26838790231153), (9153, 4291, ('WAT', 1985, 'H2'), ('WAT', 365, 'O'), 3.608196269337274, 122.48698663619565)], [(720, 4293, ('ASP', 43, 'OD2'), ('WAT', 365, 'H2'), 1.843827286086128, 144.45964232357517), (4292, 3774, ('WAT', 365, 'H1'), ('DA', 230, 'P'), 2.8939357380099944, 134.56966096070747), (4292, 5590, ('WAT', 365, 'H1'), ('WAT', 798, 'O'), 3.1605338845771804, 138.05900583208216), (720, 5591, ('ASP', 43, 'OD2'), ('WAT', 798, 'H1'), 3.9242542859284426, 149.26346158504964), (720, 13614, ('ASP', 43, 'OD2'), ('WAT', 3472, 'H2'), 1.7618811985131957, 154.11506898283204)], [(720, 9152, ('ASP', 43, 'OD2'), ('WAT', 1985, 'H1'), 1.892213094315617, 158.488408366625), (9151, 16995, ('WAT', 1985, 'O'), ('WAT', 4599, 'H2'), 1.957615569970479, 165.2898111917652), (16994, 3774, ('WAT', 4599, 'H1'), ('DA', 230, 'P'), 2.7888123616948706, 134.82884905540615)], [(1817, 4106, ('ASP', 110, 'OD2'), ('WAT', 303, 'H1'), 1.833734411648239, 148.7257072909245), (4107, 9151, ('WAT', 303, 'H2'), ('WAT', 1985, 'O'), 2.4095311772259493, 122.19942481564095), (9152, 3774, ('WAT', 1985, 'H1'), ('DA', 230, 'P'), 3.5913188516541963, 146.17430132993326), (720, 4292, ('ASP', 43, 'OD2'), ('WAT', 365, 'H1'), 1.8416829164441975, 152.57002058611997), (720, 9153, ('ASP', 43, 'OD2'), ('WAT', 1985, 'H2'), 1.8229850078234018, 177.70100450495477)], [(1817, 4106, ('ASP', 110, 'OD2'), ('WAT', 303, 'H1'), 1.740858427970088, 151.83055523471987), (4107, 6952, ('WAT', 303, 'H2'), ('WAT', 1252, 'O'), 3.445888301940925, 140.5532946438981), (6953, 3774, ('WAT', 1252, 'H1'), ('DA', 230, 'P'), 3.2492178833565544, 141.50511711223422), (720, 4293, ('ASP', 43, 'OD2'), ('WAT', 365, 'H2'), 1.951102535539977, 158.61324639824332), (720, 6954, ('ASP', 43, 'OD2'), ('WAT', 1252, 'H2'), 1.9869096728458162, 155.66059556502225), (720, 13515, ('ASP', 43, 'OD2'), ('WAT', 3439, 'H2'), 3.9463436426809833, 132.2588781538232)], [(720, 4292, ('ASP', 43, 'OD2'), ('WAT', 365, 'H1'), 2.2405464382157216, 122.84506810889205), (4292, 3774, ('WAT', 365, 'H1'), ('DA', 230, 'P'), 3.1773631323951115, 126.97784464396653), (4291, 6954, ('WAT', 365, 'O'), ('WAT', 1252, 'H2'), 3.748528653581033, 123.84650122269805), (720, 6954, ('ASP', 43, 'OD2'), ('WAT', 1252, 'H2'), 3.5016609783952184, 167.98769036196092), (1817, 9152, ('ASP', 110, 'OD2'), ('WAT', 1985, 'H1'), 1.9499123859961573, 145.10239635368376), (720, 9153, ('ASP', 43, 'OD2'), ('WAT', 1985, 'H2'), 1.5926231286622126, 175.76655312759172)], []]

@xiki-tempula
Copy link
Contributor

This is also fixed in #4066

import MDAnalysis as mda

#Create Universe
parm = "2w35-DL-HID.prmtop"
traj = "2w35-DL-HID-rep2-500ns-10frames.mdcrd"

u = mda.Universe(parm,traj)  # always start with a Universe

w = MDAnalysis.analysis.hydrogenbonds.WaterBridgeAnalysis(u,
                                                          'resname ASP and name OD2',
                                                          'resname DA and name P',
                                                          update_selection=True,
                                                          water_selection='resname WAT',
                                                          order=2,
                                                          update_water_selection=True,
                                                          distance=4.0, angle=120.0,
                                                          distance_type = 'heavy',
                                                          debug=True,
                                                          verbose=True,
                                                          acceptors=('P', 'OD1', 'OD2', 'O'),
                                                          donors=('O'),
                                                          filter_first=True
                                                          )
w.run()

print(w.timeseries)

[[(720, 4254, ('ASP', 43, 'OD2'), ('WAT', 352, 'H2'), 2.6173148105009867, 132.56006363521206), (4253, 3774, ('WAT', 352, 'H1'), ('DA', 230, 'P'), 3.7078930697486436, 142.4088206311063), (720, 4292, ('ASP', 43, 'OD2'), ('WAT', 365, 'H1'), 2.7255888461800195, 150.58244069848672)], [(1817, 4253, ('ASP', 110, 'OD2'), ('WAT', 352, 'H1'), 2.8022313402893086, 172.96529196011156), (4254, 9151, ('WAT', 352, 'H2'), ('WAT', 1985, 'O'), 2.670479345137013, 148.28580215335526), (9153, 3774, ('WAT', 1985, 'H2'), ('DA', 230, 'P'), 3.1885889101394085, 167.5724465796556), (720, 4293, ('ASP', 43, 'OD2'), ('WAT', 365, 'H2'), 2.846776898715446, 171.89039279707592), (720, 9152, ('ASP', 43, 'OD2'), ('WAT', 1985, 'H1'), 2.756608767156218, 153.31218009273726)], [(720, 4293, ('ASP', 43, 'OD2'), ('WAT', 365, 'H2'), 2.742949075816077, 161.60502328555168), (4291, 4257, ('WAT', 365, 'O'), ('WAT', 353, 'H2'), 2.9116107495969175, 160.3826600196096), (4256, 3774, ('WAT', 353, 'H1'), ('DA', 230, 'P'), 3.525871616409098, 129.92633270270065), (720, 9153, ('ASP', 43, 'OD2'), ('WAT', 1985, 'H2'), 2.789022173597509, 136.59739586186618)], [], [(720, 4293, ('ASP', 43, 'OD2'), ('WAT', 365, 'H2'), 2.680570453923368, 144.45964232357517), (4292, 3774, ('WAT', 365, 'H1'), ('DA', 230, 'P'), 3.6304643721412186, 134.56966096070747), (720, 13614, ('ASP', 43, 'OD2'), ('WAT', 3472, 'H2'), 2.6558193872943874, 154.11506898283204)], [(720, 9152, ('ASP', 43, 'OD2'), ('WAT', 1985, 'H1'), 2.8048798451527235, 158.488408366625), (9151, 16995, ('WAT', 1985, 'O'), ('WAT', 4599, 'H2'), 2.8941625694269955, 165.2898111917652), (16994, 3774, ('WAT', 4599, 'H1'), ('DA', 230, 'P'), 3.5297488120647786, 134.82884905540615)], [], [], [(720, 4292, ('ASP', 43, 'OD2'), ('WAT', 365, 'H1'), 2.873803650917737, 122.84506810889205), (4292, 3774, ('WAT', 365, 'H1'), ('DA', 230, 'P'), 3.829556436424445, 126.97784464396653), (720, 9153, ('ASP', 43, 'OD2'), ('WAT', 1985, 'H2'), 2.549059271453923, 175.76655312759172), (1817, 9152, ('ASP', 110, 'OD2'), ('WAT', 1985, 'H1'), 2.7889963607587984, 145.10239635368376)], []]

@pwnusmaximus
Copy link
Author

Thank you xiki-tempula

I've made notes to myself for that little "gotcha" to look out for when running this type of analysis and the correct options to ask for. Thank you for the solution :)

It'll be highly valuable to me and my lab-mates.

@xiki-tempula
Copy link
Contributor

@pwnusmaximus Please don't close the issue yet as we still need to merge the fix for the distance_type = 'heavy'

@pwnusmaximus pwnusmaximus reopened this Mar 13, 2023
@pwnusmaximus
Copy link
Author

@pwnusmaximus Please don't close the issue yet as we still need to merge the fix for the distance_type = 'heavy'

My mistake, I've re-opened the issue,

@IAlibay IAlibay added this to the 2.5.0 milestone Mar 30, 2023
orbeckst added a commit that referenced this issue May 9, 2023
…4066)

* fix #4040 and #4136 
* ensure that for distance_type == 'heavy' all possible hydrogens are taken into account when analyzing H-bonds;
  previously, only one connected H was considered and that could lead to incorrect results and IndexErrors
* ensure that WaterBridgeAnalysis ONLY accepts distance_type 'heavy' or 'hydrogen' (anything else now
  raises ValueError)
* add tests
* update CHANGELOG
* Long explanation (see also PR #4066)

  In water bridge analysis, in the distance_type = 'hydrogen' mode, I get the pair index (hydrogen bond hydrogen atom   and acceptor) and the pair distance from the distance search.
  Then for each hydrogen atom bonded to the donor heavy atom, I check the angle between the hydrogen bond donor heavy atom, the hydrogen atom and the acceptor and map the index to the pair distance. This works because the  distance is computed based on the hydrogen atom and we map the hydrogen atom index back to the distance.

  In the distance_type = 'heavy' mode, In the original code, I get the pair index (hydrogen bond donor heavy atom and acceptor) and the pair distance from the distance search. Then for each hydrogen atom bonded to the donor heavy atom, I check the angle between the hydrogen bond donor heavy atom, the hydrogen atom and the acceptor. In this case, I cannot map back to the pair distance, as the distance search is done on the heavy atom and we are mapping the index of the hydrogen atom back to the distance.

 This PR fixes the issue.

Co-authored-by: Irfan Alibay <[email protected]>
Co-authored-by: Oliver Beckstein <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants