Empty dataframe #15

TheKipiDragon · 2021-03-09T10:18:50Z

Greetings.
I made a program following the instructions in the "How to" page, to obtain the descriptors of the interaction between ligand and protein, but all I get as a result are empty dataframes.
I'll attach the code that I'm using.

Thank you for your attention.
Alberto Blanco, University Rovira i Virgili, Tarragona, Spain.

cbouy · 2021-03-09T16:39:25Z

Hi @TheKipiDragon ,

I'm assuming that in your script rootdir corresponds to the prolif/data folder.
The vina subdirectory contains 2 PDB files (lig.pdb and rec.pdb) and 1 SDF file, so good ends up being equal to 3.
If you modify if file.endswith(".pdb"): with if file.endswith("rec.pdb"): your script should work fine!

Don't hesitate if there's another problem
Cédric

TheKipiDragon · 2021-03-10T09:55:23Z

Sorry, I didnt specified the path.
rootdir corresponds to a directory in which I have several folders, that each include a pdb file for the protein and a sdf file for the ligand. What I try to do is go into each folder and get both files to execute prolif, thats why I used as a counter 2.
So the subdir variable is the path to the folder that contains both files.

cbouy · 2021-03-10T13:13:28Z

Ok, then the problem probably comes from using plf.datafiles.datapath / subdir / protfile and plf.datafiles.datapath / subdir / ligfile when you're loading files. plf.datafiles.datapath points to the ProLIF installation dir (something like ~/miniconda3/envs/prolif/lib/python3.8/site-packages/prolif/data), so plf.datafiles.datapath / subdir / protfile should point to something that doesn't exist. But I'm surprised it doesn't even throw an error in that case.

Anyway, I would suggest you modify your script like this:

import pathlib
import prolif as plf
import MDAnalysis as mda

# path to your rootdir, containing all the folders with PDB and SDF files
rootdir = pathlib.Path("/home/cedric/projects/ProLIF/prolif/data")

for subdir in rootdir.glob("*"):
    # skip if subdir is not a directory
    if not subdir.is_dir():
        continue
    try:
       # search for pdb and sdf files in subdir
        protfile = next(subdir.glob("*.pdb"))
        ligfile = next(subdir.glob("*.sdf"))
    except StopIteration:
        # skip if subdir is missing either pdb or sdf file
        continue
    print(subdir.name, ligfile.name, protfile.name)
    # load protein
    prot = mda.Universe(protfile)
    prot = plf.Molecule.from_mda(prot)
    # load ligands
    lig_suppl = list(plf.sdf_supplier(str(ligfile)))
    # generate fingerprint
    fp = plf.Fingerprint()
    fp.run_from_iterable(lig_suppl, prot)
    df = fp.to_dataframe()
    print(df)

Tell me if that works

PS: if you want to share some code, put it between triple backticks like so:

```python
# Put your code in here
```

TheKipiDragon · 2021-03-10T14:00:43Z

Alright, thank you!
I've tried it, but it still gives me empty dataframes. I've only modified

rootdir = pathlib.Path("/home/milax/Escriptori/test")

The terminal shows me the following:

Mpro-x10387_protein Mpro-x10387_ligand.sdf Mpro-x10387_protein.pdb 
/home/milax/anaconda3/envs/prolif/lib/python3.9/site-packages/MDAnalysis/coordinates/RDKit.py:492: DeprecationWarning: `np.float` is a deprecated alias for the builtin `float`. To silence this warning, use `float` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.float64` here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  if isinstance(value, (float, np.float)):
/home/milax/anaconda3/envs/prolif/lib/python3.9/site-packages/MDAnalysis/coordinates/RDKit.py:494: DeprecationWarning: `np.int` is a deprecated alias for the builtin `int`. To silence this warning, use `int` by itself. Doing this will not modify any behavior and is safe. When replacing `np.int`, you may wish to use e.g. `np.int64` or `np.int32` to specify the precision. If you wish to review your current use, check the release note link for additional information.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  elif isinstance(value, (int, np.int)):
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 38.67it/s]
Empty DataFrame
Columns: []
Index: [0]

Thank you, again, for all the help providen!

cbouy · 2021-03-10T14:47:20Z

Ok, can you try to run this as well:

import MDAnalysis as mda
import prolif as plf

protfile = "path/to/Mpro-x10387_protein.pdb"
ligfile = "/path/to/Mpro-x10387_ligand.sdf"

prot = mda.Universe(protfile)
prot = plf.Molecule.from_mda(prot)
lig_suppl = plf.sdf_supplier(ligfile)
lig = next(lig_suppl)
pocket_residues = plf.get_residues_near_ligand(lig, prot, cutoff=6.0)
print(pocket_residues)

It should print a list of residues that are close to your ligand.

TheKipiDragon · 2021-03-11T09:52:13Z

Alright, that works, but what I want to obtain are the descriptors for the ligand-protein interaction.
Is there any way I can obtain those, because the fingerprinting part didn't seem to work for me.

Thank you for your assistance!

cbouy · 2021-03-11T13:41:16Z

Okay that's reassuring! Do both PDB and SDF file contain explicit hydrogens for all atoms ?
For the SDF file it's not problematic if it's not the case there's a quick workaround, but for the PDB file it's mandatory.

TheKipiDragon · 2021-03-11T21:16:27Z

Yeah, they both contain the hydrogens.

Thanks for the help!

cbouy · 2021-03-11T22:12:05Z

Mmh that's strange then... Have you tried running the quickstart tutorial ? Just to make sure that there's no problem with your installation.

If you get empty dataframes even with the tutorial, can you tell me which python version are you using, and the output of the following commands:

import prolif, MDAnalysis, rdkit
print("prolif", prolif.__version__, 
      "MDAnalysis", MDAnalysis.__version__, 
      "rdkit", rdkit.__version__)

If the tutorial worked, can you try running the code snippet from 2 messages ago (the one with the pocket_residues), then run the following:

fp = plf.Fingerprint()
for resid in pocket_residues:
    bv = fp.bitvector(lig, prot[resid])
    print(resid, bv)

If it prints some True values, then there's a bug somewhere in the fp.run_from_iterable method or when converting the results to a dataframe that I will track down.

TheKipiDragon · 2021-03-12T10:21:06Z

I dont know if this prolif version is supposed to show like that, since i followed the tutorial to install it, if I installed it incorrectly, please do tell me.
This is from the first snippet:

prolif 0+unknown MDAnalysis 2.0.0-dev0 rdkit 2020.09.5

As for the second one, it didn't print any trues at all.

Thanks for all the help!

cbouy · 2021-03-12T10:43:55Z

And if you run the quickstart tutorial (with the topology and trajectory available in plf.datafiles) do you get empty dataframes or not? If yes, can you also check if fp.ifp is empty?
Regarding the weird version string for prolif, I'll look into it

TheKipiDragon · 2021-03-12T11:51:22Z

I was trying it, but when doing

fp.run(u.trajectory[::10], lig, prot)

I get the following error

Traceback (most recent call last):
  File "/home/milax/Escriptori/test/pfv2.py", line 25, in <module>
    fp.run(..., residues=["TYR38.A", "ASP129.A"])
TypeError: run() missing 2 required positional arguments: 'lig' and 'prot'

I copy pasted the code from the quickstart.

Thanks for the assistance!

cbouy · 2021-03-12T12:13:02Z

It's actually the line just below that is failing: fp.run(..., residues=["TYR38.A", "ASP129.A"]) because you're supposed to replace ... with u.trajectory[::10], lig, prot I was just being lazy and wanted to put the emphasis on the residues argument but no worries we don't need it here.
You can stop at fp.run(u.trajectory[::10], lig, prot), then just do

df = fp.to_dataframe()
print(df)

If it's empty, also do print(fp.ifp)

TheKipiDragon · 2021-03-12T16:39:56Z

Alright!
Tried it out, and dont get any empty dataframes, all seem to be working out fine!

cbouy · 2021-03-15T13:12:29Z

Ok so the installation is working correctly which is reassuring. I have a few more ideas as to where the problem could come from, could you run the following code snippet please:

import MDAnalysis as mda
from rdkit import Chem
from rdkit.Chem import Draw
import prolif as plf

print(plf.__file__)

protfile = "path/to/Mpro-x10387_protein.pdb"
ligfile = "/path/to/Mpro-x10387_ligand.sdf"

# draw protein residues
prot = mda.Universe(protfile)
prot = plf.Molecule.from_mda(prot)
frags = []
for res in prot:
    mol = Chem.RemoveHs(res)
    mol.RemoveAllConformers()
    frags.append(mol)
img = Draw.MolsToGridImage(frags, legends=[str(res.resid) for res in prot], subImgSize=(200, 140), molsPerRow=5, maxMols=prot.n_residues)
img.save("prot.png")

# draw ligand
lig_suppl = plf.sdf_supplier(ligfile)
lig = next(lig_suppl)
lig = Chem.RemoveHs(lig)
lig.RemoveAllConformers()
img = Draw.MolToImage(lig, size=(500, 500))
img.save("lig.png")

The first print should show where prolif is installed, and include which version of prolif is installed (that's the only way to obtain it for now, until I find a way to fix it).
The rest of the script should save two images (prot and lig), could you have a look at them and make sure that there are no unexpected formal charges (typically (-) charges on carbons) for both the protein and the ligand ?

Thanks,
Cédric

TheKipiDragon · 2021-03-16T11:49:48Z

Sorry for taking so long to answer!
Firstly, this is the path where prolif is installed

/home/milax/anaconda3/envs/prolif/lib/python3.9/site-packages/prolif/__init__.py

As for the images, I cant visualize them as I'm executing it in linux cmd. Also, it gave me a lot of lines as the following:

WARNING: not removing hydrogen atom without neighbors

And this error:

Traceback (most recent call last):
  File "/home/milax/Escriptori/test/pfv3.py", line 19, in <module>
    img = Draw.MolsToGridImage(frags, legends=[str(res.resid) for res in prot], subImgSize=(200, 140), molsPerRow=5, maxMols=prot.n_residues)
  File "/home/milax/anaconda3/envs/prolif/lib/python3.9/site-packages/rdkit/Chem/Draw/__init__.py", line 612, in MolsToGridImage
    return _MolsToGridImage(mols, molsPerRow=molsPerRow, subImgSize=subImgSize, legends=legends,
  File "/home/milax/anaconda3/envs/prolif/lib/python3.9/site-packages/rdkit/Chem/Draw/__init__.py", line 555, in _MolsToGridImage
    d2d.DrawMolecules(list(mols), legends=legends or None, highlightAtoms=highlightAtomLists,
Boost.Python.ArgumentError: Python argument types in
    MolDraw2D.DrawMolecules(MolDraw2DCairo, list)
did not match C++ signature:
    DrawMolecules(RDKit::MolDraw2D {lvalue} self, boost::python::api::object mols, boost::python::api::object highlightAtoms=None, boost::python::api::object highlightBonds=None, boost::python::api::object highlightAtomColors=None, boost::python::api::object highlightBondColors=None, boost::python::api::object highlightAtomRadii=None, boost::python::api::object confIds=None, boost::python::api::object legends=None)

Thanks for the help!

cbouy · 2021-03-16T12:41:59Z

No worries take your time!
Regarding the version, I confused the path of installation with something else so it doesn't show the version in there, but I just realized you can simply run

pip list | grep prolif

in a bash shell and it should give you the full version string (something like prolif 0.3.1+2.gfb07487).

For the error with the images code, can you add the following import at the beginning of the script:
from rdkit.Chem.Draw import IPythonConsole
and run it again ? Hopefully it will fix the image issue

And I think the warning might be a clue as to why the fingerprint is empty but we'll see

TheKipiDragon · 2021-03-19T11:15:54Z

Sorry for the wait again!
This is the version:

prolif          0.3.1+2.gfb07487

As for the image itself, i've gotten this error

Traceback (most recent call last):
  File "/home/milax/Escriptori/test/pfv3.py", line 21, in <module>
    img.save("prot.png")
AttributeError: 'Image' object has no attribute 'save'

Thanks for the help! I'll try not to take as long this time, sorry for any inconveniences again!

cbouy · 2021-03-19T12:32:07Z

Okay so you're on the latest version.

For the error, try adding returnPNG=False inside Draw.MolsToGridImage as mentioned on this thread

TheKipiDragon · 2021-03-22T09:02:46Z

I tried it, but still gives me the same error, 'Image' object has no attribute 'save'

cbouy · 2021-03-22T09:17:29Z

Mmh ok let's try it slightly differently then:

import MDAnalysis as mda
from rdkit import Chem
from rdkit.Chem import Draw
import prolif as plf

protfile = "path/to/Mpro-x10387_protein.pdb"
ligfile = "/path/to/Mpro-x10387_ligand.sdf"

# draw protein residues
prot = mda.Universe(protfile)
prot = plf.Molecule.from_mda(prot)
for i in range(3):
    res = prot[i]
    mol = Chem.RemoveHs(res)
    mol.RemoveAllConformers()
    Draw.MolToImageFile(mol, f"{res.resid}.png", size=(500, 400))

# draw ligand
lig_suppl = plf.sdf_supplier(ligfile)
lig = next(lig_suppl)
lig = Chem.RemoveHs(lig)
lig.RemoveAllConformers()
Draw.MolToImageFile(lig, "lig.png", size=(500, 400))

This should save PNG images for the 3 first residues in your protein and 1 file for your ligand. Could you have a look at them and make sure that there are no unexpected formal charges (typically (-) charges on carbons) ?

Thanks!

TheKipiDragon · 2021-03-22T10:13:19Z

Alright! It's working now, although it only prints one image file for the residues, not three (I've checked the protein, it's supposed to have 306 residues). The ligand structure is printed properly.

cbouy · 2021-03-22T14:40:02Z

Okay I'm even more confused now 😆
What's the output of

import MDAnalysis as mda
import prolif as plf

protfile = "path/to/Mpro-x10387_protein.pdb"

prot = mda.Universe(protfile)
prot = plf.Molecule.from_mda(prot)
for i in range(10):
    res = prot[i]
    print(i, res.resid)

Actually is it possible for you to send me the protein file (my email is on my profile) ? Or is it sensitive data ? It will make it easier for me to debug as I'm starting to run out of ideas

TheKipiDragon · 2021-03-22T16:19:49Z

This is the output of the code

0 ACE0.A
1 ACE0.A
2 ACE0.A
3 SER1.A
4 SER1.A
5 SER1.A
6 SER1.A
7 SER1.A
8 SER1.A
9 SER1.A

I dont think that the pdb is sensitive data, but I'll check with the director about it nonetheless.
Thanks for the help!

cbouy · 2021-04-23T11:46:48Z

Hi @TheKipiDragon,

Just checking back on this issue, is your director ok with sharing the pdb file ? You could also just share a part of it, something like the first 5 residues should be enough for me to pinpoint the source of the problem. Just make sure the "reduced" file also reproduces the same behavior as the previous code snippet, i.e:

import MDAnalysis as mda
import prolif as plf

protfile = "path/to/first_5_residues.pdb"

prot = mda.Universe(protfile)
prot = plf.Molecule.from_mda(prot)
for i in range(5):
    res = prot[i]
    print(i, res.resid)

should output

0 ACE0.A
1 ACE0.A
2 ACE0.A
3 SER1.A
4 SER1.A
5 SER1.A

soumyosen · 2021-06-03T14:25:27Z

Hi, I am recently trying to use prolif to get protein-protein interactions from a single pdb file. I also faced very similar problem, the dataframe is empty. All the test files are running properly, results as expected. I am going to share the 10 lines script. If you want, I can also give you the pdb file. I am not sure if the topology file is necessary for the analysis.

`import MDAnalysis as mda
import prolif as plf

u1 = mda.Universe("trial_part.pdb")
res1 = u1.select_atoms("protein and resid 63")
prot1 = u1.select_atoms("protein and not group res1", res1=res1)

fp = plf.Fingerprint()
fp.run(u1.trajectory[:], res1, prot1)
df = fp.to_dataframe()
df.head()
`

cbouy · 2021-06-03T14:50:30Z

Hi @soumyosen,

I don't see any problem in your script, could you send me the pdb file ([email protected]) so that I can reproduce the bug and track down the issue ?

Thanks in advance!

soumyosen · 2021-06-03T14:54:47Z

Thank you @cbouy
I just sent you.

cbouy · 2021-06-03T15:19:16Z

I managed to reproduce the issue, thanks!

Can you try this and tell me if it works ?:

u1 = mda.Universe("trial_part.pdb")

u1.atoms.guess_bonds()  # <-- fix bonds not being guessed properly from the pdb file

res1 = u1.select_atoms("protein and resid 63")
# ...

soumyosen · 2021-06-03T15:43:54Z

@cbouy Thanks it is working now. Just one more question. By default, I am supposed to get only the "True" results. Right?

cbouy · 2021-06-03T15:52:45Z

Yes, by default fp.to_dataframe drops the columns where all values are False, but if you want to keep them you can use fp.to_dataframe(drop_empty=False)

soumyosen · 2021-06-03T15:53:58Z

OK, Thanks a lot for your time.

akshay-sridhar · 2021-06-17T16:14:33Z

I managed to reproduce the issue, thanks!

Can you try this and tell me if it works ?:
u1 = mda.Universe("trial_part.pdb")

u1.atoms.guess_bonds()  # <-- fix bonds not being guessed properly from the pdb file

res1 = u1.select_atoms("protein and resid 63")
# ...

Hello @cbouy . Thanks for this.
I just wanted to provide feedback to say I faced the same issue with my PDB file and your suggestion fixed it.

### Added - Improved the documentation on how to properly restrict interactions to ignore the protein backbone (Issue #22), how to fix the empty dataframe issue when no bond information is present in the PDB file (Issue #15), how to save the LigNetwork diagram (Issue #21), and some clarifications on using `fp.generate` ### Fixed - Mixing residue type with interaction type in the interactive legend of the LigNetwork would incorrectly display/hide some residues on the canvas - MOL2 files starting with a comment (`#`) would lead to an error

cbouy added a commit that referenced this issue Jul 15, 2021

document solution for issue #15

c4baee7

cbouy mentioned this issue Jul 15, 2021

Feedback review 1 #23

Merged

cbouy closed this as completed Aug 2, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Empty dataframe #15

Empty dataframe #15

TheKipiDragon commented Mar 9, 2021

cbouy commented Mar 9, 2021

TheKipiDragon commented Mar 10, 2021

cbouy commented Mar 10, 2021

TheKipiDragon commented Mar 10, 2021

cbouy commented Mar 10, 2021

TheKipiDragon commented Mar 11, 2021

cbouy commented Mar 11, 2021

TheKipiDragon commented Mar 11, 2021

cbouy commented Mar 11, 2021

TheKipiDragon commented Mar 12, 2021

cbouy commented Mar 12, 2021

TheKipiDragon commented Mar 12, 2021

cbouy commented Mar 12, 2021 •

edited

Loading

TheKipiDragon commented Mar 12, 2021

cbouy commented Mar 15, 2021

TheKipiDragon commented Mar 16, 2021

cbouy commented Mar 16, 2021

TheKipiDragon commented Mar 19, 2021

cbouy commented Mar 19, 2021 •

edited

Loading

TheKipiDragon commented Mar 22, 2021

cbouy commented Mar 22, 2021

TheKipiDragon commented Mar 22, 2021

cbouy commented Mar 22, 2021

TheKipiDragon commented Mar 22, 2021

cbouy commented Apr 23, 2021

soumyosen commented Jun 3, 2021 •

edited

Loading

cbouy commented Jun 3, 2021

soumyosen commented Jun 3, 2021

cbouy commented Jun 3, 2021

soumyosen commented Jun 3, 2021

cbouy commented Jun 3, 2021

soumyosen commented Jun 3, 2021

akshay-sridhar commented Jun 17, 2021

Empty dataframe #15

Empty dataframe #15

Comments

TheKipiDragon commented Mar 9, 2021

cbouy commented Mar 9, 2021

TheKipiDragon commented Mar 10, 2021

cbouy commented Mar 10, 2021

TheKipiDragon commented Mar 10, 2021

cbouy commented Mar 10, 2021

TheKipiDragon commented Mar 11, 2021

cbouy commented Mar 11, 2021

TheKipiDragon commented Mar 11, 2021

cbouy commented Mar 11, 2021

TheKipiDragon commented Mar 12, 2021

cbouy commented Mar 12, 2021

TheKipiDragon commented Mar 12, 2021

cbouy commented Mar 12, 2021 • edited Loading

TheKipiDragon commented Mar 12, 2021

cbouy commented Mar 15, 2021

TheKipiDragon commented Mar 16, 2021

cbouy commented Mar 16, 2021

TheKipiDragon commented Mar 19, 2021

cbouy commented Mar 19, 2021 • edited Loading

TheKipiDragon commented Mar 22, 2021

cbouy commented Mar 22, 2021

TheKipiDragon commented Mar 22, 2021

cbouy commented Mar 22, 2021

TheKipiDragon commented Mar 22, 2021

cbouy commented Apr 23, 2021

soumyosen commented Jun 3, 2021 • edited Loading

cbouy commented Jun 3, 2021

soumyosen commented Jun 3, 2021

cbouy commented Jun 3, 2021

soumyosen commented Jun 3, 2021

cbouy commented Jun 3, 2021

soumyosen commented Jun 3, 2021

akshay-sridhar commented Jun 17, 2021

cbouy commented Mar 12, 2021 •

edited

Loading

cbouy commented Mar 19, 2021 •

edited

Loading

soumyosen commented Jun 3, 2021 •

edited

Loading