Panic: pl.from_pandas(pandas_series_with_categorical_dtype_having_None_value) #1686

Closed
mizuy opened this issue Nov 6, 2021 · 1 comment · Fixed by #1706

Comments

mizuy commented Nov 6, 2021

Are you using Python or Rust?

Python.

Which feature gates did you use?

This can be ignored by Python users.

What version of polars are you using?

'0.10.15'

What operating system are you using polars on?

OS : Darwin
OS-release : 19.6.0
Version : Darwin Kernel Version 19.6.0: Thu Sep 16 20:58:47 PDT 2021; root:xnu-6153.141.40.1~1/RELEASE_X86_64
machine : x86_64

python : 3.9.7.final.0
python-bits : 64
pandas : 1.3.3
numpy : 1.21.1

Describe your bug.

pl.from_pandas panics when given a pd.Series with categorical dtype that contains a missing value (None, np.nan, or pd.NA).

What are the steps to reproduce the behavior?

import pandas as pd
import polars as pl
s = pd.Series(['a','b','c',pd.NA],dtype='category')
print(s)
print(pl.from_pandas(s))

What is the actual behavior?

0      a
1      b
2      c
3    NaN
dtype: category
Categories (3, object): ['a', 'b', 'c']
thread '<unnamed>' panicked at 'called `Option::unwrap()` on a `None` value', /Users/runner/.cargo/registry/src/github.com-1ecc6299db9ec823/arrow2-0.7.0/src/array/growable/dictionary.rs:103:62
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

---------------------------------------------------------------------------
PanicException                            Traceback (most recent call last)
/var/folders/yv/9c_5vbb95sj0bv0j70jrxtt40000gn/T/ipykernel_4621/1310187828.py in <module>
      7 s = pd.Series(['a','b','c',pd.NA],dtype='category')
      8 print(s)
----> 9 print(pl.from_pandas(s))

~/projects/nccdb/.venv/lib/python3.9/site-packages/polars/convert.py in from_pandas(df, rechunk, nan_to_none)
    274 
    275     if isinstance(df, (pd.Series, pd.DatetimeIndex)):
--> 276         return pl.Series._from_pandas("", df, nan_to_none=nan_to_none)
    277     elif isinstance(df, pd.DataFrame):
    278         return pl.DataFrame._from_pandas(df, rechunk=rechunk, nan_to_none=nan_to_none)

~/projects/nccdb/.venv/lib/python3.9/site-packages/polars/eager/series.py in _from_pandas(cls, name, values, nan_to_none)
    259         """
    260         return cls._from_pyseries(
--> 261             pandas_to_pyseries(name, values, nan_to_none=nan_to_none)
    262         )
    263 

~/projects/nccdb/.venv/lib/python3.9/site-packages/polars/internals/construction.py in pandas_to_pyseries(name, values, nan_to_none)
    187     if not name and values.name is not None:
    188         name = str(values.name)
--> 189     return arrow_to_pyseries(
    190         name, _pandas_series_to_arrow(values, nan_to_none=nan_to_none)
    191     )

~/projects/nccdb/.venv/lib/python3.9/site-packages/polars/internals/construction.py in arrow_to_pyseries(name, values)
     60     """
     61     array = coerce_arrow(values)
---> 62     return PySeries.from_arrow(name, array)
     63 
     64 

PanicException: called `Option::unwrap()` on a `None` value

Replacing pd.NA with np.nan or None gives the same result.
If the dtype is not category, the missing value is handled properly.
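
The two observations above point at the categorical (dictionary-encoded) representation: each value is stored as an integer key into a list of categories, and a missing value has no key. The following pure-Python sketch only illustrates that failure mode. It is not the arrow2 or polars code, and every function name in it is hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def dictionary_encode(
    values: Sequence[Optional[str]],
) -> Tuple[List[str], List[Optional[int]]]:
    """Encode values as (categories, keys); a missing value gets the key None."""
    categories: List[str] = []
    index: Dict[str, int] = {}
    keys: List[Optional[int]] = []
    for value in values:
        if value is None:
            # A missing value is not a category, so it has no key.
            keys.append(None)
            continue
        if value not in index:
            index[value] = len(categories)
            categories.append(value)
        keys.append(index[value])
    return categories, keys


def decode_assuming_no_nulls(
    categories: List[str], keys: List[Optional[int]]
) -> List[str]:
    """Assumes every key is present; raises TypeError on a None key."""
    return [categories[key] for key in keys]


def decode_null_aware(
    categories: List[str], keys: List[Optional[int]]
) -> List[Optional[str]]:
    """Passes missing keys through as None instead of looking them up."""
    return [None if key is None else categories[key] for key in keys]


categories, keys = dictionary_encode(["a", "b", "c", None])
print(categories, keys)
print(decode_null_aware(categories, keys))
```

In this sketch, decode_assuming_no_nulls fails on the last element in roughly the way the reported `Option::unwrap()` fails on a `None` value, while decode_null_aware keeps the missing entry as a null.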

What is the expected behavior?

The missing value should be converted to a null in the resulting polars Series instead of causing a panic.

mizuy commented Nov 6, 2021

This is the output with RUST_BACKTRACE=full:

thread '<unnamed>' panicked at 'called `Option::unwrap()` on a `None` value', /Users/runner/.cargo/git/checkouts/arrow2-8a2ad61d97265680/0dda942/src/array/growable/dictionary.rs:103:62
stack backtrace:
   0:        0x1561e9b0a - _rust_eh_personality
   1:        0x15563713b - _BrotliDecoderVersion
   2:        0x1561e87ca - _rust_eh_personality
   3:        0x1561e8f05 - _rust_eh_personality
   4:        0x1561e840c - _rust_eh_personality
   5:        0x1562152ea - _rust_eh_personality
   6:        0x156215289 - _rust_eh_personality
   7:        0x156215245 - _rust_eh_personality
   8:        0x15637d27f - _rust_eh_personality
   9:        0x15637d3a7 - _rust_eh_personality
  10:        0x15539fd72 - _PyInit_polars
  11:        0x1554d0e0e - _PyInit_polars
  12:        0x15575019a - _rust_eh_personality
  13:        0x1551db0c8 - <unknown>
  14:        0x109ffaba5 - _cfunction_call
  15:        0x109fb8fc7 - __PyObject_MakeTpCall
  16:        0x10a0a2850 - _call_function
  17:        0x10a09f806 - __PyEval_EvalFrameDefault
  18:        0x109fb9835 - _function_code_fastcall
  19:        0x10a0a277b - _call_function
  20:        0x10a09f8b0 - __PyEval_EvalFrameDefault
  21:        0x10a0a3644 - __PyEval_EvalCode
  22:        0x109fb9740 - __PyFunction_Vectorcall
  23:        0x10a0a277b - _call_function
  24:        0x10a09f962 - __PyEval_EvalFrameDefault
  25:        0x10a0a3644 - __PyEval_EvalCode
  26:        0x109fb9740 - __PyFunction_Vectorcall
  27:        0x109fbbbfc - _method_vectorcall
  28:        0x10a0a277b - _call_function
  29:        0x10a09f962 - __PyEval_EvalFrameDefault
  30:        0x10a0a3644 - __PyEval_EvalCode
  31:        0x109fb9740 - __PyFunction_Vectorcall
  32:        0x10a0a277b - _call_function
  33:        0x10a09f806 - __PyEval_EvalFrameDefault
  34:        0x10a0a3644 - __PyEval_EvalCode
  35:        0x10a098b30 - _PyEval_EvalCode
  36:        0x10a0e79f5 - _PyRun_InteractiveOneObjectEx
  37:        0x10a0e6fb9 - _PyRun_InteractiveLoopFlags
  38:        0x10a0e6edc - _PyRun_AnyFileExFlags
  39:        0x10a106788 - _Py_RunMain
  40:        0x10a106b03 - _pymain_main
  41:        0x10a106b5b - _Py_BytesMain
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/mizuy/.pyenv/versions/3.9.7/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/polars/convert.py", line 276, in from_pandas
    return pl.Series._from_pandas("", df, nan_to_none=nan_to_none)
  File "/Users/mizuy/.pyenv/versions/3.9.7/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/polars/eager/series.py", line 261, in _from_pandas
    pandas_to_pyseries(name, values, nan_to_none=nan_to_none)
  File "/Users/mizuy/.pyenv/versions/3.9.7/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/polars/internals/construction.py", line 210, in pandas_to_pyseries
    return arrow_to_pyseries(
  File "/Users/mizuy/.pyenv/versions/3.9.7/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/polars/internals/construction.py", line 62, in arrow_to_pyseries
    return PySeries.from_arrow(name, array)
pyo3_runtime.PanicException: called `Option::unwrap()` on a `None` value
