Fix GroupBy.head to recognize agg_columns. #1474

ueshin · 2020-05-09T02:05:39Z

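This PR fixes GroupBy.head so that it recognizes agg_columns. Currently, a column selection made after groupby is ignored and every column is returned: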

>>> kdf = ks.DataFrame({"a": [1, 1, 1, 1, 2, 2, 2, 3, 3, 3] * 3, "b": [2, 3, 1, 4, 6, 9, 8, 10, 7, 5] * 3, "c": [3, 5, 2, 5, 1, 2, 6, 4, 3, 6] * 3})
>>> kdf.groupby("a")[["b"]].head(2).sort_index()
   a   b  c
0  1   2  3
1  1   3  5
4  2   6  1
5  2   9  2
7  3  10  4
8  3   7  3

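The result should match pandas, which returns only the selected column (here pdf is the equivalent pandas DataFrame):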

>>> pdf.groupby("a")[["b"]].head(2).sort_index()
    b
0   2
1   3
4   6
5   9
7  10
8   7
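To make the expected semantics concrete, here is a minimal pure-Python sketch of "first n rows per group, projected onto the selected columns". It is an illustration only, not the Koalas implementation: the function name groupby_head and the list-of-dicts representation are assumptions made for this example.

```python
from collections import defaultdict


def groupby_head(rows, key, columns, n):
    """Return (index, projected_row) pairs for the first n rows of each group.

    Hypothetical helper for illustration; only `columns` survive in the output,
    mirroring the column selection in groupby("a")[["b"]].head(2).
    """
    seen = defaultdict(int)  # rows emitted so far for each group key
    result = []
    for index, row in enumerate(rows):
        group = row[key]
        if seen[group] < n:
            seen[group] += 1
            result.append((index, {c: row[c] for c in columns}))
    return result


# Same data as in the description above.
a = [1, 1, 1, 1, 2, 2, 2, 3, 3, 3] * 3
b = [2, 3, 1, 4, 6, 9, 8, 10, 7, 5] * 3
c = [3, 5, 2, 5, 1, 2, 6, 4, 3, 6] * 3
rows = [{"a": x, "b": y, "c": z} for x, y, z in zip(a, b, c)]

for index, row in groupby_head(rows, key="a", columns=["b"], n=2):
    print(index, row["b"])
```

The printed index/value pairs line up with the pandas output above: indexes 0, 1, 4, 5, 7, 8, with only column b present.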

codecov-io · 2020-05-09T02:47:11Z

Codecov Report

Merging #1474 into master will increase coverage by 0.03%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master    #1474      +/-   ##
==========================================
+ Coverage   93.74%   93.77%   +0.03%     
==========================================
  Files          36       36              
  Lines        8409     8420      +11     
==========================================
+ Hits         7883     7896      +13     
+ Misses        526      524       -2

Impacted Files                  Coverage Δ
databricks/koalas/groupby.py    88.85% <100.00%> (+0.21%) ⬆️
databricks/koalas/frame.py      95.58% <0.00%> (+0.04%) ⬆️
databricks/koalas/generic.py    97.02% <0.00%> (+0.37%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 2e68e03...205f99d.

HyukjinKwon · 2020-05-11T08:19:37Z

databricks/koalas/groupby.py

+        kdf = kdf[
+            [s.rename(label) for s, label in zip(self._groupkeys, groupkey_labels)] + agg_columns
+        ]
+        groupkey_scols = [kdf._internal.spark_column_for(label) for label in groupkey_labels]


@ueshin, it looks good, but can we have a private function for these cases? It looks like this code is duplicated in #1472, #1473 (https://github.com/databricks/koalas/pull/1473/files), and #1471.

ueshin

Sure, I'll address it in follow-up PRs.
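As a rough picture of what such a private function could factor out, the sketch below mirrors the pattern in the diff (rename the group keys to temporary labels, then keep only those plus the aggregation columns) on a plain dict of columns. Everything here is an assumption for illustration: the function name, the dict-of-lists frame, and the temporary label format are hypothetical and do not come from the Koalas code base.

```python
def narrow_to_group_and_agg_columns(frame, groupkeys, agg_columns):
    """Hypothetical helper: keep only group-key and aggregation columns.

    `frame` is a dict mapping column names to lists of values. Group keys are
    copied under temporary labels so they cannot collide with agg columns.
    """
    # Assumed label format, chosen only for this sketch.
    groupkey_labels = ["__groupkey_{}__".format(i) for i in range(len(groupkeys))]
    narrowed = {
        label: list(frame[key]) for label, key in zip(groupkey_labels, groupkeys)
    }
    for column in agg_columns:
        narrowed[column] = list(frame[column])
    return narrowed, groupkey_labels


frame = {"a": [1, 1, 2], "b": [2, 3, 6], "c": [3, 5, 1]}
narrowed, labels = narrow_to_group_and_agg_columns(frame, ["a"], ["b"])
print(labels)
print(sorted(narrowed))
```

Each GroupBy method that needs this narrowing step could then call one shared helper instead of repeating the rename-and-select logic.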

HyukjinKwon

LGTM except https://github.com/databricks/koalas/pull/1474/files#r422865501

ueshin · 2020-05-11T18:09:20Z

Thanks! I'll merge this now and address the comment in follow-up PRs.

Fix GroupBy.head to recognize agg_columns.

205f99d

ueshin requested a review from HyukjinKwon May 9, 2020 02:05

HyukjinKwon reviewed May 11, 2020

HyukjinKwon approved these changes May 11, 2020

ueshin merged commit 4811f58 into databricks:master May 11, 2020

ueshin deleted the groupby_head branch May 11, 2020 18:09
