
BUG: Pandas 1.0.5 → 1.1.0 behavior change on DataFrame.apply() where func returns tuple #35518

Open
dechamps opened this issue Aug 2, 2020 · 10 comments
Labels
Apply (Apply, Aggregate, Transform, Map) · Bug · Nested Data (data where the values are collections: lists, sets, dicts, objects, etc.) · Regression (functionality that used to work in a prior pandas version)

Comments

@dechamps

dechamps commented Aug 2, 2020

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample, a copy-pastable example

import pandas as pd

print(
    pd.DataFrame([['orig1', 'orig2']])
    .apply(func=lambda col: ('new1', 'new2')))

Output of Pandas 1.0.5

0    (new1, new2)
1    (new1, new2)
dtype: object

Output of Pandas 1.1.0

      0     1
0  new1  new1
1  new2  new2

It is not clear to me whether this behaviour change is intended. I couldn't find anything obvious in the release notes.

Possibly related: #35517, #34909 @simonjayhawkins @jbrockmendel

This broke my code, which is actively relying on tuples being treated as scalars and stored as single objects (rather than being spread across the DataFrame).
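
For reference, the old behaviour can still be requested explicitly through the result_type parameter; a minimal sketch (the result_type="reduce" option is demonstrated later in this thread):

import pandas as pd

df = pd.DataFrame([['orig1', 'orig2']])

# Asking for the reduce behaviour explicitly keeps the tuple as a single
# object per column instead of expanding it across the DataFrame.
print(df.apply(func=lambda col: ('new1', 'new2'), result_type='reduce'))
# 0    (new1, new2)
# 1    (new1, new2)
# dtype: object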

Output of pd.show_versions()

INSTALLED VERSIONS

commit : d9fff27
python : 3.6.9.final.0
python-bits : 64
OS : Linux
OS-release : 4.19.104+
Version : #1 SMP Wed Feb 19 05:26:34 PST 2020
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.1.0
numpy : 1.19.1
pytz : 2018.9
dateutil : 2.8.1
pip : 19.3.1
setuptools : 49.2.0
Cython : 0.29.21
pytest : 3.6.4
hypothesis : None
sphinx : 1.8.5
blosc : None
feather : 0.4.1
xlsxwriter : None
lxml.etree : 4.2.6
html5lib : 1.0.1
pymysql : None
psycopg2 : 2.7.6.1 (dt dec pq3 ext lo64)
jinja2 : 2.11.2
IPython : 5.5.0
pandas_datareader: None
bs4 : 4.6.3
bottleneck : 1.3.2
fsspec : 0.7.4
fastparquet : None
gcsfs : None
matplotlib : 3.2.2
numexpr : 2.7.1
odfpy : None
openpyxl : 2.5.9
pandas_gbq : 0.11.0
pyarrow : 0.14.1
pytables : None
pyxlsb : None
s3fs : 0.4.2
scipy : 1.4.1
sqlalchemy : 1.3.18
tables : 3.4.4
tabulate : 0.8.7
xarray : 0.15.1
xlrd : 1.1.0
xlwt : 1.3.0
numba : 0.48.0

@dechamps added the Bug and Needs Triage labels Aug 2, 2020
dechamps added a commit to dechamps/LoudspeakerExplorer that referenced this issue Aug 2, 2020
The "tuple trick" to force Pandas to treat the return value of an apply
func as a scalar stopped working between Pandas 1.0.5 and 1.1.0:
  pandas-dev/pandas#35518
@simonjayhawkins added the Apply and Regression labels and removed the Needs Triage label Aug 3, 2020
@simonjayhawkins added this to the 1.1.1 milestone Aug 3, 2020
@jbrockmendel
Member

which is actively relying on tuples being treated as scalars and stored as single objects

If you have a viable way to avoid this in your code, I'd encourage you to use it. Regardless of how this issue is addressed, tuples-as-scalars is fragile.

@dechamps
Author

dechamps commented Aug 4, 2020

If you have a viable way to avoid this in your code, I'd encourage you to use it. Regardless of how this issue is addressed, tuples-as-scalars is fragile

Yep. Well at least this issue forced me to clean up my code :) I'm now wrapping the value inside a fully opaque container object.
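
A minimal sketch of that wrapper approach; the Box class and its value attribute are illustrative names, not code from the repository referenced above:

import pandas as pd

class Box:
    """Opaque container, so pandas never treats the payload as list-like."""
    def __init__(self, value):
        self.value = value
    def __repr__(self):
        return f"Box({self.value!r})"

df = pd.DataFrame([["orig1", "orig2"]])

# Each cell of the result holds a single Box object, regardless of pandas version.
result = df.apply(func=lambda col: Box(("new1", "new2")))
print(result)

# The wrapped tuple is retrieved explicitly, e.g. result[0].value == ("new1", "new2")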

@simonjayhawkins
Member

moved off the 1.1.2 milestone (scheduled for this week) as there are no PRs to fix this in the pipeline

@simonjayhawkins modified the milestones: 1.1.3, 1.1.4 Oct 5, 2020
@simonjayhawkins
Member

moved off the 1.1.3 milestone (overdue) as there are no PRs to fix this in the pipeline

@simonjayhawkins modified the milestones: 1.1.4, 1.1.5 Oct 29, 2020
@simonjayhawkins
Member

moved off the 1.1.4 milestone (scheduled for release tomorrow) as there are no PRs to fix this in the pipeline

@jorisvandenbossche
Member

According to the docstring, I would say that the behaviour of 1.0.5 was correct, and this is a regression.

@jbrockmendel would you have time to look into it?

@jreback modified the milestones: 1.1.5, Contributions Welcome Nov 25, 2020
@simonjayhawkins added the Nested Data label Dec 20, 2020
@simonjayhawkins
Member

According to the docstring

Just to be clear: in the DataFrame.apply docstring (https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.apply.html), the description of the result_type parameter is...

The default behaviour (None) depends on the return value of the applied function: list-like results will be returned as a Series of those. However if the apply function returns a Series these are expanded to columns.

The return type of the user function in the OP is a tuple (considered list-like), so we expect a Series of those.

This issue also occurs with a list (which is unambiguously list-like), where we also expect the default result_type behaviour to be to reduce.

>>> pd.__version__
'1.3.0.dev0+100.g54682234e3'
>>>
>>> df = pd.DataFrame([["orig1", "orig2"]])
>>>
>>> df.apply(func=lambda col: ("new1", "new2"), result_type="reduce")
0    (new1, new2)
1    (new1, new2)
dtype: object
>>>
>>> df.apply(func=lambda col: ("new1", "new2"))
      0     1
0  new1  new1
1  new2  new2
>>>
>>> df.apply(func=lambda col: ["new1", "new2"], result_type="reduce")
0    [new1, new2]
1    [new1, new2]
dtype: object
>>>
>>> df.apply(func=lambda col: ["new1", "new2"])
      0     1
0  new1  new1
1  new2  new2
>>>

I would say that the behaviour of 1.0.5 was correct, and this is a regression.

agreed.

@jbrockmendel would you have time to look into it?

ping

simonjayhawkins added a commit to simonjayhawkins/pandas that referenced this issue Dec 20, 2020
@simonjayhawkins
Member

Possibly related: #35517, #34909 @simonjayhawkins @jbrockmendel

can confirm, first bad commit: [91802a9] PERF: avoid creating many Series in apply_standard (#34909)

@jbrockmendel
Member

Aside from reverting #34909, the solution that comes to mind is calling the function on the first row in wrap_results_for_axis and seeing if we get a tuple. That runs into other problems with non-univalent or mutating functions.
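
A standalone sketch of that probe-the-first-result idea and why it is fragile; probe_then_apply is a hypothetical helper for illustration, not the actual logic in wrap_results_for_axis:

import pandas as pd

def probe_then_apply(df, func):
    # Probe: call func on the first column (default axis=0) to decide how to
    # wrap the results. This calls func an extra time on that column, which is
    # exactly where mutating or non-deterministic functions cause trouble.
    probe = func(df.iloc[:, 0])
    if isinstance(probe, tuple):
        # Treat tuples as scalars: one object per column (the 1.0.5 behaviour).
        return df.apply(func, result_type="reduce")
    # Otherwise keep the default behaviour.
    return df.apply(func)

df = pd.DataFrame([["orig1", "orig2"]])
print(probe_then_apply(df, lambda col: ("new1", "new2")))
# 0    (new1, new2)
# 1    (new1, new2)
# dtype: object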

@simonjayhawkins
Member

removing milestone

@simonjayhawkins modified the milestones: 1.3, Contributions Welcome Jun 11, 2021
@mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022