Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Problems of determination of ligand-protein interactions #68

Closed
GuzelMinibaeva opened this issue Jul 22, 2022 · 4 comments · Fixed by #84 or #97
Closed

Problems of determination of ligand-protein interactions #68

GuzelMinibaeva opened this issue Jul 22, 2022 · 4 comments · Fixed by #84 or #97
Labels
enhancement New feature or request

Comments

@GuzelMinibaeva
Copy link

Hi,

I tried to use ProLIF for my complexes and found some points about which I would like to clarify. PiStacking is found by ProLIF in the first complex (2414_protein_frame3.pdb and 2414_ligand_frame3_censor.sdf), but the geometry of the fragments of pyrimidine and TYR84 does not imply interaction during visual inspection. The second question on this complex is devoted to the fact that the ALA85.AHBAcceptor interaction is not found, while the GLU83.AHBDonor tool determines. By visual inspection, these two interactions are determined. The same result was found for another complex (2414_protein_frame2.pdb and 2414_ligand_frame2_censor.sdf).
To determine the protein-ligand interactions used the following code:

import prolif as plf
from rdkit import Chem

protein_fname = '~/2414_protein_frame2.pdb'
mols = Chem.SDMolSupplier('~/2414_ligand_frame2_censor.sdf', removeHs=False)
prot = plf.Molecule(Chem.MolFromPDBFile(protein_fname, removeHs=False))
fp = plf.Fingerprint()
fps = fp.run_from_iterable([plf.Molecule.from_rdkit(mols[0])], prot, n_jobs=1)
df = fps.to_dataframe()
df.transpose()

Also for the third complex (5es1_last_connect_wo_asn.pdb and 10222_pymol.sdf) is found ALA85.AHBDonor, although this interaction is not obvious during visual inspection.

Could you check please if the tool works correctly in this case? All files of complexes are attached.

Kind regards,
Guzel
files.zip

@DrrDom
Copy link
Contributor

DrrDom commented Jul 28, 2022

I analyzed the attached files and found issues in ProLIF rules. Hope this will simplify analysis and fixing.

case 1:

import prolif as plf
from rdkit import Chem

protein_fname = '5es1_last_connect_wo_asn.pdb'
mols = Chem.SDMolSupplier('10222_pymol.sdf', removeHs=False)
prot = plf.Molecule(Chem.MolFromPDBFile(protein_fname, removeHs=False))
fp = plf.Fingerprint()
m = plf.Molecule.from_rdkit(mols[0])
fps = fp.run_from_iterable([m], prot, n_jobs=1)
df = fps.to_dataframe()

print(df.transpose())

Output;

Frame                           0
ligand protein  interaction      
UNL1   ILE12.A  Hydrophobic  True
       VAL20.A  Hydrophobic  True
       ALA33.A  Hydrophobic  True
       TYR84.A  Hydrophobic  True
                PiStacking   True
       ALA85.A  HBDonor      True
       LEU135.A Hydrophobic  True

Here are two issues:

  1. HBD with Ala85 is detected, but there two N-H groups looking to each other and none of them can be H-acceptor. The issue is in the acceptor SMARTS which wrongly recognizes amide NH as acceptor.
  2. pi-stacking with Tyr84 is wrong. There is an issue in a chosen combination of rules in distances and angles. At such a large distance of the angle is close to 0 there is no pi-stacking possible.

case 2:

protein_fname = '2414_protein_frame3.pdb'
mols = Chem.SDMolSupplier('2414_ligand_frame3_censor.sdf', removeHs=False)
prot = plf.Molecule(Chem.MolFromPDBFile(protein_fname, removeHs=False))
fp = plf.Fingerprint()
m = plf.Molecule.from_rdkit(mols[0])
fps = fp.run_from_iterable([m], prot, n_jobs=1)
df = fps.to_dataframe()

print(df.transpose())

Output:

Frame                           0
ligand protein  interaction      
UNL1   ILE12.A  Hydrophobic  True
       VAL20.A  Hydrophobic  True
       ALA33.A  Hydrophobic  True
       VAL66.A  Hydrophobic  True
       MET82.A  Hydrophobic  True
       GLU83.A  HBDonor      True
       TYR84.A  Hydrophobic  True
                PiStacking   True
       LEU135.A Hydrophobic  True

Here is the same issue with pi-stacking in Tyr84 and a new issue with Ala85.
H-bond to Ala85 was not detected because the acceptor SMARTS does not recognize aromatic nitrogen (in a ligand) as acceptor.

I suggest:

  1. Change acceptor SMARTS ([N,O,F,-{1-};!+{1-}]) to a more comprehensive one - [#7&!$([nX3])&!$([NX3]-*=[!#6])&!$([NX3]-[a])&!$([NX4]),$([O])&!$([OX2](C)C=O)&!$(*(~a)~a),-{1-};!+{1-}] (the source of some patterns is here https://sourceforge.net/p/pharmit/code/ci/master/tree/src/). This will exclude some N and O atoms.
  2. pi-stacking can be fixed or users can disable it and enable face-to-face and face-to-edge instead, or this may be set as default behavior

I looked at other patterns in the paper and found further potential issues:

  1. Ligand [O,N,-{1-};!+{1-}] will match only aliphatic N and O, at least aromatic N can interact with a metal center. So, it might be better to fix it - [O,#7&!$([nX3])&!$([NX3]-*=[!#6])&!$([NX3]-[a])&!$([NX4]),-{1-};!+{1-}].
  2. Hydrophobic interaction pattern [#6,#16,F,Cl,Br,I,At;+0] is too broad, it includes tetracoordinated S and C atoms which are not accessible to any interaction and will result in false positive detection. It can be fixed like [$([#6])&!$([#6X4H0]),$([#16])&!$([#16X3])&!$([#16X4]),F,Cl,Br,I;+0]. Here I excluded S with 3 connections as well (sulfoxides). It might be also reasonable to exclude C in carboxylates.

It might be convenient if patterns will be provided as arguments for those pre-defined interactions and a user can changes them easily. Currently, as far as I understand, the only option is to create a custom class, but then a user should copy-paste all other code from default classes.

@cbouy
Copy link
Member

cbouy commented Jul 30, 2022

Hi both of you, and thanks a lot @DrrDom for contributing.

The SMARTS patterns used in ProLIF are indeed too simple in some cases and can incorrectly detect or miss some interactions. It's something I've meant to change for quite some time now but never found the time.
The changes proposed by @DrrDom should essentially fix these problems so I'll try to review them in the coming days, thanks a lot for opening a PR.

Regarding pi stacking, again, I have another more precise implementation in the works but haven't found the time to thoroughly test it.

As for customizing interactions, you should just have to do this:

class Hydrophobic(plf.interactions.Hydrophobic): 
    def __init__(self):
        super().__init__(
            hydrophobic="[$([#6])&!$([#6X4H0]),$([#16])&!$([#16X3])&!$([#16X4]),F,Cl,Br,I;+0]"
        )

and the code that actually runs the interaction detection will be inherited from the original hydrophobic class, but it will use the overwritten hydrophobic smarts pattern.
So there shouldn't be any need for copy pasting that part (unless you want to change the actual implementation like for pi stacking).

Best,
Cedric

@DrrDom
Copy link
Contributor

DrrDom commented Aug 5, 2022

Thanks for the example!

It might be reasonable to open a discussion on pharmacophore patterns or transfer this issue to discussions.

I'm still not completely sure how hydrophobic pattern should be changed. It may be further extended. For example, should we recognize close methyl and carbonyl groups as hydrophobic contacts or not?

I also found another issue. This time with the pattern encoding positive center. It neglects delocalization of charge in conjugated systems. For example, there are two oxygens in a carboxyl group and only one has an explicit charge. This group may interact with a charged NH3+ group. However, if a charged oxygen looks away from NH3+ only a H-bond will be detected;
(-)O-C=O..NH3(+)
Of course it is difficult to take into account all possible charge delocalization but at least for some simple cases it would be reasonable to encode: carboxylic, sulfonic, phosphoric, amidine, guanidine and other groups. These patterns should be applicable only if a group is charged and they should encode all relevant atoms as possible centers of a negative/positive charge. For example for carboxylic group it may be [$(O=C-[O-])]

This issue is not that critical as a previous one because charged contacts are rarely occurred and nevertheless there will be H-bond contacts, but it would be good to solve it as well.

@cbouy
Copy link
Member

cbouy commented Aug 9, 2022

@DrrDom yes good suggestion, I've created a discussion #77.

And yes, ProLIF doesn't really consider charge delocalization and resonance structures.
In the past, I've proposed the snippet in here to try to handle that, but RDKit actually has a way to perform delocalization-aware substructure search and I've meant to implement it as an alternative to the simple substructure search currently used, but haven't found the time yet.

@cbouy cbouy added the enhancement New feature or request label Aug 9, 2022
cbouy added a commit that referenced this issue Oct 4, 2022
- The SMARTS for the following groups have been updated to a more accurate definition:
  - HBond acceptor: exclude amides and some amines, exclude biaryl ethers and alkoxy
    oxygen from esters, include aromatic nitrogens,
  - Anion: include resonance forms of carboxylic, sulfonic and phosphorus acids,
  - Cation: include amidine and guanidine,
  - Metal ligand: exclude amides and some amines.

Co-authored-by: Cédric Bouysset <[email protected]>
@cbouy cbouy closed this as completed in #84 Oct 5, 2022
@cbouy cbouy mentioned this issue Nov 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants