BUG: Pandas 1.0.5 → 1.1.0 behavior change on DataFrame.apply() where func returns tuple #35518
Comments
The "tuple trick" to force Pandas to treat the return value of an apply func as a scalar stopped working between Pandas 1.0.5 and 1.1.0: pandas-dev/pandas#35518
If you have a viable way to avoid this in your code, I'd encourage you to use it. Regardless of how this issue is addressed, tuples-as-scalars is fragile.
Yep. Well, at least this issue forced me to clean up my code :) I'm now wrapping the value inside a fully opaque container object.
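The container used by the commenter is not shown in the thread. As a rough sketch of the idea, a wrapper class that defines neither `__iter__`, `__len__`, nor `__getitem__` cannot be iterated, so it should not be mistaken for a list-like. The names `Opaque` and `summarize` below are hypothetical and chosen only for illustration.

```python
class Opaque:
    """Hypothetical wrapper that hides a value behind a non-iterable object."""

    __slots__ = ("value",)

    def __init__(self, value):
        self.value = value

    def __repr__(self):
        return f"Opaque({self.value!r})"


def summarize(row):
    # Hypothetical apply func: returns a (min, max) pair, wrapped so that
    # the caller sees a single opaque object rather than a tuple.
    return Opaque((min(row), max(row)))


wrapped = summarize([3, 1, 2])
print(wrapped)        # the wrapper itself
print(wrapped.value)  # the tuple stored inside
```

With pandas, one would presumably pass such a function to `df.apply(..., axis=1)` and unwrap afterwards with something like `.map(lambda o: o.value)`; that usage is an assumption and has not been checked against any specific pandas version.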
moved off 1.1.2 milestone (scheduled for this week) as no PRs to fix in the pipeline
moved off 1.1.3 milestone (overdue) as no PRs to fix in the pipeline
moved off 1.1.4 milestone (scheduled for release tomorrow) as no PRs to fix in the pipeline
According to the docstring, I would say that the behaviour of 1.0.5 was correct, and this is a regression. @jbrockmendel would you have time to look into it?
just to be clear, in the
The return type for user function in the OP is a tuple (considered list-like) so we expect a Series of those. This issue also occurs with a very list-like list where we also expect the default result_type behaviour to be to reduce.
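For readers unfamiliar with `result_type`, here is a small sketch of the two explicit settings on a row-wise apply (`axis=1`). It assumes pandas is installed. The frame and the function `pair` are made up for illustration and are not the code sample from this report. Passing `result_type` explicitly sidesteps the default behaviour that this issue is about.

```python
import pandas as pd

# Illustrative data only; not the frame from the original report.
df = pd.DataFrame([[4, 9]] * 3, columns=["A", "B"])


def pair(row):
    # Returns a tuple, which pandas considers list-like.
    return (row["A"], row["B"])


# "reduce": keep each returned tuple as one object in a Series.
reduced = df.apply(pair, axis=1, result_type="reduce")

# "expand": spread each returned tuple across columns of a DataFrame.
expanded = df.apply(pair, axis=1, result_type="expand")

print(type(reduced).__name__, type(expanded).__name__)
```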
agreed.
ping
can confirm, first bad commit: [91802a9] PERF: avoid creating many Series in apply_standard (#34909)
Aside from reverting #34909, the solution that comes to mind is calling the function on the first row in wrap_results_for_axis and seeing if we get a tuple. That runs into other problems with non-univalent or mutating functions.
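To illustrate the concern, here is a toy model in plain Python, not pandas internals. `apply_with_probe` is a hypothetical stand-in for an apply loop that calls the function once on the first row just to inspect its return type. The sketch reads "non-univalent" as "the return type varies with the input", which is an interpretation.

```python
def apply_with_probe(func, rows):
    """Hypothetical apply loop that probes the first row for a tuple result."""
    probe = func(rows[0])  # extra call, made only to inspect the return type
    returns_tuple = isinstance(probe, tuple)
    results = [func(row) for row in rows]  # real pass: row 0 is evaluated again
    return returns_tuple, results


calls = []


def mutating(row):
    # Side effect: records every invocation, so the extra probe call is visible.
    calls.append(row)
    return (row, len(calls))


def non_univalent(row):
    # Return type depends on the input, so the first row is not representative.
    return (row,) if row == 0 else row


is_tuple, results = apply_with_probe(mutating, [10, 20])
print(calls)    # the first row was seen twice
print(results)  # the results carry the shifted call count

guess, mixed = apply_with_probe(non_univalent, [0, 1, 2])
print(guess, mixed)  # the probe says "tuple", but later rows return scalars
```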
removing milestone
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.
Code Sample, a copy-pastable example
Output of Pandas 1.0.5
Output of Pandas 1.1.0
It is not clear to me if this behaviour change is intended or not. I couldn't find anything obvious in the release notes.
Possibly related: #35517, #34909 @simonjayhawkins @jbrockmendel
This broke my code, which is actively relying on tuples being treated as scalars and stored as single objects (instead of being laid across the dataframe).
Output of pd.show_versions()
INSTALLED VERSIONS
commit : d9fff27
python : 3.6.9.final.0
python-bits : 64
OS : Linux
OS-release : 4.19.104+
Version : #1 SMP Wed Feb 19 05:26:34 PST 2020
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.1.0
numpy : 1.19.1
pytz : 2018.9
dateutil : 2.8.1
pip : 19.3.1
setuptools : 49.2.0
Cython : 0.29.21
pytest : 3.6.4
hypothesis : None
sphinx : 1.8.5
blosc : None
feather : 0.4.1
xlsxwriter : None
lxml.etree : 4.2.6
html5lib : 1.0.1
pymysql : None
psycopg2 : 2.7.6.1 (dt dec pq3 ext lo64)
jinja2 : 2.11.2
IPython : 5.5.0
pandas_datareader: None
bs4 : 4.6.3
bottleneck : 1.3.2
fsspec : 0.7.4
fastparquet : None
gcsfs : None
matplotlib : 3.2.2
numexpr : 2.7.1
odfpy : None
openpyxl : 2.5.9
pandas_gbq : 0.11.0
pyarrow : 0.14.1
pytables : None
pyxlsb : None
s3fs : 0.4.2
scipy : 1.4.1
sqlalchemy : 1.3.18
tables : 3.4.4
tabulate : 0.8.7
xarray : 0.15.1
xlrd : 1.1.0
xlwt : 1.3.0
numba : 0.48.0