Stream support for exporting pdbs not working with OTHERS record #141

Closed
gate-tec opened this issue Mar 4, 2024 · 1 comment

gate-tec commented Mar 4, 2024

Describe the bug

When trying to export pdb data with ATOM and OTHERS entries using .to_pdb_stream I always get a pandas.errors.IntCastingNaNError (cf. Steps/Code to Reproduce).
As I need to maintain the TER markers in the resulting pdb data, the content of the OTHERS frame is necessary.

Writing directly to a pdb file with .to_pdb does not trigger the issue. A possible fix would be a shared base function used by both methods, or an option in to_pdb to select the desired output (i.e. file or stream), as mentioned in #108.
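
The shared-base-function idea could look roughly like the sketch below. It is not biopandas code: the names write_records, to_file and to_stream are hypothetical, and the record lines are assumed to be already formatted strings.

```python
import io


def write_records(lines, handle):
    """Write each pre-formatted record line to any text handle (hypothetical helper)."""
    for line in lines:
        handle.write(line + "\n")


def to_file(lines, path):
    """File variant: open the path and delegate to the shared writer."""
    with open(path, "w") as f:
        write_records(lines, f)


def to_stream(lines):
    """Stream variant: delegate to the same writer, then rewind the buffer."""
    buf = io.StringIO()
    write_records(lines, buf)
    buf.seek(0)
    return buf
```

With this layout, any change to how records are prepared applies to both outputs, so the file and stream paths cannot drift apart.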

Steps/Code to Reproduce

Example:

from biopandas.pdb import PandasPdb

pdb_df = PandasPdb().fetch_pdb('1ou5')
out_string = pdb_df.to_pdb_stream(records=('ATOM', 'OTHERS'))

Expected Results

Stream containing the specified records in pdb format.

Actual Results

A pandas.errors.IntCastingNaNError stemming from Line 909 in pandas_pdb.py

df.residue_number = df.residue_number.astype(int)

which is executed on the entire concatenated DataFrame.
As the OTHERS frame doesn't contain residue number entries, these cells are always NaN after concatenating.
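
The mechanism can be reproduced with plain pandas, independent of biopandas. The sketch below uses two small made-up frames standing in for the ATOM and OTHERS records (all column names and values other than residue_number are illustrative assumptions); the helper cast_fails is hypothetical as well.

```python
import pandas as pd

# Made-up stand-ins for the ATOM and OTHERS frames.
atom = pd.DataFrame(
    {"record_name": ["ATOM", "ATOM"], "residue_number": [1, 2], "line_idx": [0, 1]}
)
others = pd.DataFrame(
    {"record_name": ["TER"], "entry": ["placeholder"], "line_idx": [2]}
)

# OTHERS has no residue_number column, so its row is filled with NaN
# and the column is upcast to float.
combined = pd.concat([atom, others], ignore_index=True)


def cast_fails(frame):
    """Return True if casting residue_number to int raises IntCastingNaNError."""
    try:
        frame["residue_number"].astype(int)
    except pd.errors.IntCastingNaNError:
        return True
    return False
```

Here cast_fails(atom) is False while cast_fails(combined) is True, which matches the behaviour described above: the cast only breaks once the OTHERS rows are part of the frame.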

Versions

biopandas 0.5.0dev
Linux-5.4.0-91-generic-x86_64-with-glibc2.31
Python 3.10.12 | packaged by conda-forge | (main, Jun 23 2023, 22:40:32) [GCC 12.3.0]
Scikit-learn 1.3.0
NumPy 1.23.5
SciPy 1.11.1
@a-r-j
Contributor

a-r-j commented Aug 1, 2024

Hi @gate-tec thanks for raising.

I think we should switch this to: pd.to_numeric(df.residue_number, errors='coerce') and subsequently strip the NaNs. What do you think?
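
A minimal sketch of that suggestion, under one possible reading of "strip the NaNs" (replace them with empty strings before the lines are written). The helper name residue_text is hypothetical, and this is not necessarily what the referenced commit does.

```python
import pandas as pd


def residue_text(series):
    """Coerce residue numbers to numeric, then render NaN as an empty string."""
    numeric = pd.to_numeric(series, errors="coerce")  # unparseable values become NaN
    # Integers are rendered without a decimal point; missing values become "".
    return numeric.map(lambda v: "" if pd.isna(v) else str(int(v)))
```

Rows from the OTHERS frame would then carry an empty residue number field instead of triggering the integer cast error.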

a-r-j added a commit that referenced this issue Aug 1, 2024
* include testing on newer python versions

* bump version string

* linting: remove print statements from tests

* fix: improve robustness of  and add a test #141

* fix: add init to test data module

* fix: add init to remaining test data modules

* tests: add tests with github actions

* tests: add tests with github actions

* tests: rename build job

* update changelog

---------

Co-authored-by: Arian Jamasb <[email protected]>
@a-r-j a-r-j closed this as completed Aug 1, 2024