DataFrame (U)Int8/16 cols converted to Int32 after pickle #9347

Closed
2 tasks done
connor-elliott opened this issue Jun 12, 2023 · 0 comments · Fixed by #9393
Labels
bug Something isn't working python Related to Python Polars

Polars version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of Polars.

Issue description

pl.DataFrame columns of dtype pl.Int8, pl.UInt8, pl.Int16, and pl.UInt16 become pl.Int32 after a pickle round trip. The schema of a pl.LazyFrame is unchanged by pickling, and converting to and from a pandas pd.DataFrame also leaves the schema unchanged.

Reproducible example

import polars as pl
from polars import col
import pickle

for dtype in pl.NUMERIC_DTYPES:
    df = pl.DataFrame(data={"A":[1, 2, 3]}).with_columns(col("A").cast(dtype))
    before = df.schema
    
    after = pickle.loads(pickle.dumps(df)).schema
    if before != after:
        # Int8, UInt8, Int16, UInt16 change to Int32
        print(f"pl.Dataframe: dtype {before['A']} changes to {after['A']} after pickle")

    after = pickle.loads(pickle.dumps(df.lazy())).schema
    if before != after:
        # dtypes unchanged
        print(f"pl.LazyFrame: dtype {before['A']} changes to {after['A']} after pickle")

    after = pl.from_pandas(pickle.loads(pickle.dumps(df.to_pandas()))).schema
    if before != after:
        # dtypes unchanged
        print(f"pd.DataFrame: dtype {before['A']} changes to {after['A']} after pickle")


>>>pl.Dataframe: dtype Int16 changes to Int32 after pickle
>>>pl.Dataframe: dtype UInt8 changes to Int32 after pickle
>>>pl.Dataframe: dtype UInt16 changes to Int32 after pickle
>>>pl.Dataframe: dtype Int8 changes to Int32 after pickle

Expected behavior

The schema of a DataFrame should be unchanged after a pickle round trip.
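A minimal sketch of how this expectation could be checked in a regression test. The helper name `schema_diff_after_pickle` is hypothetical and is not part of Polars; it only assumes that the object is picklable and exposes a dict-like `.schema`, as pl.DataFrame does.

```python
import pickle


def schema_diff_after_pickle(frame):
    """Report columns whose dtype changes across a pickle round trip.

    `frame` is assumed to be any picklable object with a dict-like
    `.schema` attribute (for example a pl.DataFrame). Returns a dict
    mapping column name to (dtype_before, dtype_after); an empty dict
    means the schema survived the round trip unchanged.
    """
    before = dict(frame.schema)
    after = dict(pickle.loads(pickle.dumps(frame)).schema)
    return {
        name: (dtype, after.get(name))
        for name, dtype in before.items()
        if after.get(name) != dtype
    }
```

Called on the `df` from the reproducible example, this should return an empty dict once the bug is fixed. On an affected version it would presumably report an entry for column "A" for each of the four small integer dtypes.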

Installed versions

--------Version info---------
Polars:      0.18.2
Index type:  UInt32
Platform:    macOS-10.16-x86_64-i386-64bit
Python:      3.11.3 (main, Apr 19 2023, 18:51:09) [Clang 14.0.6 ]

----Optional dependencies----
numpy:       1.24.3
pandas:      2.0.1
pyarrow:     10.0.1
connectorx:  <not installed>
deltalake:   <not installed>
fsspec:      <not installed>
matplotlib:  3.7.1
xlsx2csv:    <not installed>
xlsxwriter:  <not installed>
