BUG: Series.__setitem__ fails with non range index while upcasting dtype #45070

Closed
2 of 3 tasks
LucasG0 opened this issue Dec 25, 2021 · 6 comments · Fixed by #45232
Labels
Bug Indexing Related to indexing on series/frames, not to indexes themselves

Comments

@LucasG0
Contributor

LucasG0 commented Dec 25, 2021

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the master branch of pandas.

Reproducible Example

>>> s = pd.Series([1, 2, 3], index=["a", "b", "c"])
>>> s
a    1
b    2
c    3
dtype: int64
>>> s[0] = "X"
Traceback (most recent call last):
  File "/home/lucas/.local/lib/python3.8/site-packages/pandas/core/series.py", line 1000, in __setitem__
    self._set_with_engine(key, value)
  File "/home/lucas/.local/lib/python3.8/site-packages/pandas/core/series.py", line 1033, in _set_with_engine
    loc = self.index._engine.get_loc(key)
  File "pandas/_libs/index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 101, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1675, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1683, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 0

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/lucas/.local/lib/python3.8/site-packages/pandas/core/series.py", line 1005, in __setitem__
    values[key] = value
ValueError: invalid literal for int() with base 10: 'X'

Issue Description

This currently raises an error. I think it should be considered a bug as long as 39584 is undecided.

Expected Behavior

Either set the value correctly and upcast the dtype to object, or keep raising an error, depending on the outcome of 39584.

Installed Versions

INSTALLED VERSIONS

commit : db08276
python : 3.8.10.final.0
python-bits : 64
OS : Linux
OS-release : 5.11.0-43-generic
Version : #47~20.04.2-Ubuntu SMP Mon Dec 13 11:06:56 UTC 2021
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : fr_FR.UTF-8
LOCALE : fr_FR.UTF-8

pandas : 1.1.3
numpy : 1.19.5
pytz : 2019.3
dateutil : 2.8.2
pip : 20.0.2
setuptools : 45.2.0
Cython : None
pytest : 6.2.4
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 3.0.1
IPython : 7.25.0
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : 2021.10.1
fastparquet : 0.7.1
gcsfs : None
matplotlib : None
numexpr : None
odfpy : None
openpyxl : 3.0.9
pandas_gbq : None
pyarrow : 6.0.0
pytables : None
pyxlsb : None
s3fs : 2021.10.1
scipy : None
sqlalchemy : None
tables : None
tabulate : 0.8.9
xarray : None
xlrd : None
xlwt : None
numba : None

@LucasG0 LucasG0 added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Dec 25, 2021
@phofl
Member

phofl commented Dec 25, 2021

0 is not part of your index, so the KeyError is correct.

The traceback is a bit weird, but the behavior is OK.

@phofl phofl added Error Reporting Incorrect or improved errors from pandas Indexing Related to indexing on series/frames, not to indexes themselves and removed Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Dec 25, 2021
@LucasG0
Contributor Author

LucasG0 commented Dec 26, 2021

__setitem__ behaves like iloc in this case: it works when setting the first element with an integer.

>>> s[0] = 0
>>> s
a    0
b    2
c    3
dtype: int64

So I think the KeyError is not expected here

@phofl
Member

phofl commented Dec 26, 2021

Did not remember that.

We use numpy for this here, which is where the error comes from:

import numpy as np

na = np.array([1, 2, 3])

na[1] = "a"  # ValueError: invalid literal for int() with base 10: 'a'

Reading the code and the user guide, this is intended to work like numpy does, so I am not sure the error is unexpected here. The user guide says:

"With Series, the syntax works exactly as with an ndarray, returning a slice of the values and the corresponding labels."

In general I would recommend using iloc for this case.
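
A minimal sketch of the iloc recommendation above (not code from the thread). It assumes the caller casts to object explicitly before assigning, so the result does not rely on implicit upcasting, which may behave differently between pandas versions. The variable names are illustrative.

```python
import pandas as pd

s = pd.Series([1, 2, 3], index=["a", "b", "c"])

# Assumption: cast to object up front instead of relying on
# pandas to upcast int64 -> object during assignment.
s_obj = s.astype(object)

# Positional assignment: unambiguous regardless of the index labels.
s_obj.iloc[0] = "X"

# Label-based assignment: also unambiguous.
s_obj.loc["b"] = "Y"

print(s_obj)
```

Using iloc for positions and loc for labels sidesteps the question of whether s[0] means "label 0" or "position 0" on a string index.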

@jreback
Contributor

jreback commented Dec 26, 2021

This is hitting the fallback path.
This should work, I think.

@simonjayhawkins simonjayhawkins added Bug and removed Enhancement Error Reporting Incorrect or improved errors from pandas labels Dec 27, 2021
@jbrockmendel
Member

Best guess: in _set_with_engine we call validate_numeric_casting, which fails to raise here. Instead we could do if not can_hold_element(self._values, value): raise TypeError; then we'd go through a different path, and I think this would work as expected.

Replacing validate_numeric_casting with can_hold_element checks changes some other behavior (and I'm concerned about perf), but deduplicating these would be worthwhile.
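
The idea of checking whether the existing values can hold the new value before choosing a path can be sketched roughly as follows. Both can_hold and set_positional are hypothetical helpers written for illustration; they are not pandas' can_hold_element or validate_numeric_casting, and the round-trip check is only a simplification of what a real check would need.

```python
import numpy as np
import pandas as pd


def can_hold(arr, value):
    """Hypothetical check: can `arr`'s dtype store `value` losslessly?"""
    try:
        casted = np.asarray([value]).astype(arr.dtype)
    except (TypeError, ValueError):
        # e.g. "X" cannot be converted to int64
        return False
    # Reject lossy casts such as 1.5 -> 1
    return bool(casted[0] == value)


def set_positional(s, pos, value):
    """Hypothetical helper: return a new Series with s[pos] set to value,
    upcasting to object only when the current dtype cannot hold it."""
    if can_hold(s.to_numpy(), value):
        out = s.copy()
    else:
        out = s.astype(object)
    out.iloc[pos] = value
    return out


s = pd.Series([1, 2, 3], index=["a", "b", "c"])
kept = set_positional(s, 0, 0)      # fits in int64, dtype unchanged
upcast = set_positional(s, 0, "X")  # does not fit, dtype becomes object
```

Returning a new Series rather than mutating in place keeps the sketch simple; an in-place version would have to replace the underlying values when the dtype changes.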

@phofl
Member

phofl commented Dec 28, 2021

_set_with_engine raises a KeyError and this triggers the fallback path, which is expected I think. The fallback runs into self._mgr.setitem_inplace(key, value), which falls back onto numpy logic. And numpy raises here.
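
A simplified model of the control flow described above, for illustration only: a label lookup that raises KeyError, followed by a positional fallback that writes straight into the numpy array. setitem_sketch is a hypothetical function, not pandas internals; the real path goes through _set_with_engine and self._mgr.setitem_inplace.

```python
import numpy as np


def setitem_sketch(labels, values, key, value):
    """Hypothetical model: try the key as a label, then as a position."""
    positions = {label: i for i, label in enumerate(labels)}
    try:
        loc = positions[key]          # label-based lookup
    except KeyError:
        if not isinstance(key, int):
            raise                     # not a label and not a position
        loc = key                     # fallback: treat the key as a position
    # Plain numpy assignment: no upcasting, so an incompatible
    # value raises instead of changing the dtype.
    values[loc] = value


labels = ["a", "b", "c"]
values = np.array([1, 2, 3])

setitem_sketch(labels, values, "b", 20)  # label path
setitem_sketch(labels, values, 0, 10)    # positional fallback, value fits

try:
    setitem_sketch(labels, values, 0, "X")  # positional fallback, value does not fit
    error = None
except ValueError as exc:
    error = exc
```

In this model the integer assignment succeeds and the string assignment fails for the same reason as in the report: the fallback hands the value to numpy, which does not change the array's dtype.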

6 participants