ValueError: No objects to concatenate when encoding PDB file #22

Open
yanz-24 opened this issue Aug 26, 2024 · 1 comment
yanz-24 commented Aug 26, 2024

Issue Description

An error occurs when loading a PDB file that does not contain charges using BindingSite.from_file().

Steps to Reproduce

  1. Download the PDB file from 4WSQ.
  2. Use the following code to reproduce the error:
from ratar.encoding import BindingSite

bs = BindingSite()
bs.from_file('../data/4wsq.pdb')
  3. The error returned is:
ValueError                                Traceback (most recent call last)
Cell In[1], line 4
      1 from ratar.encoding import BindingSite
      3 bs = BindingSite()
----> 4 bs.from_file('/Users/yanyz/Ratar/code/ratar/data/4wsq.pdb')

File ~/Ratar/code/ratar/ratar/encoding.py:131, in BindingSite.from_file(self, molecule_path, remove_solvent, molecule_index)
    125 else:
    126     raise IndexError(
    127         f"Molecule index {molecule_index} out of range. "
    128         f"Number of molecules{len(molecule_loader.molecules)}"
    129     )
--> 131 return self.from_molecule(molecule)

File ~/Ratar/code/ratar/ratar/encoding.py:95, in BindingSite.from_molecule(self, molecule)
     92 self.molecule = molecule
     94 # Get representatives
---> 95 self.representatives = self.get_representatives(molecule)
     97 # Get points
     98 coordinates = self.get_coordinates(self.representatives)

File ~/Ratar/code/ratar/ratar/encoding.py:145, in BindingSite.get_representatives(molecule)
    135 """
    136 Get representatives of a molecule.
...
File /opt/anaconda3/envs/ratar-dev/lib/python3.11/site-packages/pandas/core/reshape/concat.py:507, in _Concatenator._clean_keys_and_objs(self, objs, keys)
    504     objs_list = list(objs)
    506 if len(objs_list) == 0:
--> 507     raise ValueError("No objects to concatenate")
    509 if keys is None:
    510     objs_list = list(com.not_none(*objs_list))

ValueError: No objects to concatenate

Probable Cause

The issue appears to be related to the following line:

molecule_pca_df.dropna(how="any", inplace=True)

which leaves molecule_pca_df empty, because every row contains at least one NaN value (the PDB file provides no charges, so the charge column is entirely NaN).

Here is the content of molecule_pca_df before the dropna() operation:

      atom_id atom_name  res_id res_name subst_name       x       y       z  \
0          1         N      29      GLY      GLY29  14.147 -24.593 -65.068   
0       4909        ZN     401       ZN      ZN401  11.253   0.733 -47.095   
1          2        CA      29      GLY      GLY29  13.249 -24.578 -66.210   
1       4910        ZN     402       ZN      ZN402   8.294  -0.795 -52.209   
10        11       CD1      30      LEU      LEU30  10.536 -21.996 -60.915   
..       ...       ...     ...      ...        ...     ...     ...     ...   
995      996       OG1     158      THR     THR158   7.746  -9.586 -38.789   
996      997       CG2     158      THR     THR158   5.664 -10.472 -38.082   
997      998         N     159      CYS     CYS159   5.222  -7.439 -35.843   
998      999        CA     159      CYS     CYS159   4.161  -7.110 -34.882   
999     1000         C     159      CYS     CYS159   3.211  -6.002 -35.367   

     charge pc_type      pc_id   pc_atom_id  
0       NaN     HBD  PEP_HBD_1  PEP_HBD_1_N  
0       NaN     HBD  PEP_HBD_1  PEP_HBD_1_N  
1       NaN     NaN        NaN          NaN  
1       NaN     NaN        NaN          NaN  
10      NaN       H    LEU_H_1  LEU_H_1_CD1  
..      ...     ...        ...          ...  
995     NaN     NaN        NaN          NaN  
996     NaN     NaN        NaN          NaN  
997     NaN     HBD  PEP_HBD_1  PEP_HBD_1_N  
998     NaN     NaN        NaN          NaN  

The following check confirms that every row of the DataFrame molecule_pca_df (line 480) contains at least one missing value:

print(molecule_pca_df.isnull().any(axis=1).all())

which returns:

True
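
The failure mode can be reproduced without ratar. The sketch below is a minimal illustration, assuming a toy three-row frame that stands in for molecule_pca_df; only the column names are taken from the output above. Because the traceback is truncated, how ratar ends up passing an empty sequence to pandas is not shown here. The last step only demonstrates that pd.concat raises this exact message when given nothing to concatenate.

```python
import numpy as np
import pandas as pd

# Toy stand-in for molecule_pca_df (hypothetical data): charge is missing everywhere.
df = pd.DataFrame(
    {
        "atom_name": ["N", "CA", "CD1"],
        "charge": [np.nan, np.nan, np.nan],
        "pc_type": ["HBD", np.nan, "H"],
    }
)

# Same check as in the issue: does every row hold at least one NaN?
every_row_has_nan = bool(df.isnull().any(axis=1).all())

# dropna(how="any") removes each row with at least one NaN, so nothing survives.
remaining = df.dropna(how="any")

# pandas raises the reported error when concat receives an empty sequence.
try:
    pd.concat([])
    concat_error = None
except ValueError as error:
    concat_error = str(error)

print(every_row_has_nan, len(remaining), concat_error)
```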

Additional Context

The same behaviour is not observed with MOL2 files.

Possible solutions

  1. Calculate charges for PDB files (charges are used in the 4D method).
  2. Do not consider the charge column when calling dropna().
  3. Assign a charge of 0 when the PDB file provides none, by adding the following at line 277 of ratar.auxiliary._load_pdb:
# Add charges = 0 if not present in PDB file
if molecule.df["charge"].isnull().all():
    molecule.df["charge"] = 0.0
molecule.df.reset_index(drop=True, inplace=True)
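
Solutions 2 and 3 can be compared on a small example. This is only a sketch on a hypothetical toy frame, not a patch for ratar: the column names come from the output above, and treating every column except charge as required is an assumption about what the downstream code needs.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: no charges, and one atom without a pharmacophoric type.
df = pd.DataFrame(
    {
        "atom_name": ["N", "CA", "CD1"],
        "x": [14.147, 13.249, 10.536],
        "charge": [np.nan, np.nan, np.nan],
        "pc_type": ["HBD", np.nan, "H"],
    }
)

# Solution 2: ignore the charge column when dropping incomplete rows.
required = [column for column in df.columns if column != "charge"]
kept = df.dropna(subset=required, how="any")

# Solution 3: set charges to 0.0 when none are present, then drop as before.
filled = df.copy()
if filled["charge"].isnull().all():
    filled["charge"] = 0.0
kept_after_fill = filled.dropna(how="any")

print(kept)
print(kept_after_fill)
```

Both variants keep the same rows here. They differ in what reaches later steps: solution 2 leaves NaN in the charge column, while solution 3 replaces it with 0.0, which matters if charges are used numerically afterwards.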
4 participants
@dominiquesydow @yanz-24 and others