-
-
Notifications
You must be signed in to change notification settings - Fork 18.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
COMPAT: Emit warning when groupby by a tuple #18731
Changes from 2 commits
209ffce
ad09ade
a489b20
ade2b2b
6050226
d8c20e8
d2a2372
38ef818
a27f449
4e5ae9f
a8b4383
3057aab
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -2850,7 +2850,15 @@ def _get_grouper(obj, key=None, axis=0, level=None, sort=True, | |
elif isinstance(key, BaseGrouper): | ||
return key, [], obj | ||
|
||
# Everything which is not a list is a key (including tuples): | ||
tuple_as_list = isinstance(key, tuple) and key not in obj | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The only disadvantage I see with this approach is that pd.DataFrame(1, index=range(3), columns=pd.MultiIndex.from_product([[1, 2], [3,4]])).groupby((7, 8)).mean() will raise isinstance(key, tuple) and key not in obj and set(key).issubset(obj) is too expensive? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. this change would be ok There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. (Except that, as @jorisvandenbossche reminded below, |
||
if tuple_as_list: | ||
msg = ("Interpreting tuple 'by' as a list of keys, rather than " | ||
"a single key. Use 'by={!r}' instead of 'by={!r}'. In the " | ||
"future, a tuple will always mean a single key.".format( | ||
list(key), key)) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. the key can contain a long array or column, so not sure it is a good idea to format it like this into the message. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I thought NumPy's short repr kicked in sooner that it does. I'll fix this |
||
warnings.warn(msg, FutureWarning, stacklevel=5) | ||
key = list(key) | ||
|
||
if not isinstance(key, list): | ||
keys = [key] | ||
match_axis_length = False | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -2727,6 +2727,23 @@ def test_empty_dataframe_groupby(self): | |
|
||
assert_frame_equal(result, expected) | ||
|
||
def test_tuple_warns(self): | ||
# https://github.com/pandas-dev/pandas/issues/18314 | ||
df = pd.DataFrame({('a', 'b'): [1, 1, 2, 2], 'a': [1, 1, 1, 2], | ||
'b': [1, 2, 2, 2], 'c': [1, 1, 1, 1]}) | ||
with tm.assert_produces_warning(FutureWarning) as w: | ||
df[['a', 'b', 'c']].groupby(('a', 'b')).c.mean() | ||
|
||
assert "Interpreting tuple 'by' as a list" in str(w[0].message) | ||
|
||
with tm.assert_produces_warning(FutureWarning) as w: | ||
df[['a', 'b', 'c']].groupby(('a', 'b')).c.mean() | ||
|
||
assert "Interpreting tuple 'by' as a list" in str(w[0].message) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Isn't this the same as above? |
||
|
||
with tm.assert_produces_warning(None): | ||
df.groupby(('a', 'b')).c.mean() | ||
|
||
|
||
def _check_groupby(df, result, keys, field, f=lambda x: x.sum()): | ||
tups = lmap(tuple, df[keys].values) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mention you can simply replace the tuple with a list