
Implement Series.aggregate and agg #816

Merged · 8 commits · Sep 23, 2019

Conversation

@itholic (Contributor) commented Sep 22, 2019

Like pandas Series.aggregate (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.aggregate.html),
I implemented the aggregate function for Series.

Example:

>>> s = ks.Series([1, 2, 3, 4])
>>> s
0    1
1    2
2    3
3    4
Name: 0, dtype: int64

>>> s.agg('min')
1

>>> s.agg(['min', 'max'])
min    1
max    4
Name: 0, dtype: int64

(The example above shows the pandas output.)

[Screenshot attached: 2019-09-23]

@codecov-io commented Sep 22, 2019

Codecov Report

Merging #816 into master will increase coverage by 0.02%.
The diff coverage is 100%.


@@            Coverage Diff             @@
##           master     #816      +/-   ##
==========================================
+ Coverage   94.34%   94.36%   +0.02%     
==========================================
  Files          32       32              
  Lines        5849     5854       +5     
==========================================
+ Hits         5518     5524       +6     
+ Misses        331      330       -1
Impacted Files Coverage Δ
databricks/koalas/missing/series.py 100% <ø> (ø) ⬆️
databricks/koalas/frame.py 96.89% <ø> (+0.06%) ⬆️
databricks/koalas/series.py 95.22% <100%> (+0.05%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update f6f27b0...cbcb502.

raise ValueError("If the given function is a list, it "
"should only contain function names as strings.")
elif isinstance(func, str):
return eval("self.{}()".format(func))
@ueshin (Collaborator) commented:

We should avoid eval() as far as possible. getattr(self, func)() instead?
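The getattr-based dispatch the reviewer suggests can be sketched on a toy class (MiniSeries below is hypothetical, for illustration only, not the Koalas implementation):

```python
class MiniSeries:
    """Toy stand-in for a Series with a few aggregation methods."""
    def __init__(self, data):
        self.data = list(data)

    def min(self):
        return min(self.data)

    def max(self):
        return max(self.data)

    def agg(self, func):
        if isinstance(func, str):
            # getattr looks the method up by name -- no eval() needed
            return getattr(self, func)()
        raise ValueError("func should be a function name as a string")

s = MiniSeries([1, 2, 3, 4])
print(s.agg("min"))  # 1
print(s.agg("max"))  # 4
```

Unlike eval(), getattr only resolves attributes of the object itself, so arbitrary expressions in the string cannot be executed.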

@itholic (Contributor, author) replied:

@ueshin Thanks for the review, ueshin :) I totally agree. Fixed it!

if isinstance(func, list):
if all((isinstance(f, str) for f in func)):
rows = OrderedDict((f, eval("self.{}()".format(f), dict(self=self))) for f in func)
return Series(rows)
@ueshin (Collaborator) commented:

This runs Spark jobs many times. In this case, I think we can reuse DataFrame's aggregate.
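The suggestion to reuse the frame-level aggregate can be illustrated with a plain-Python sketch (ToyFrame, ToySeries, and AGGS are hypothetical stand-ins, not the Koalas code; the real implementation delegates to Spark). The point is that one frame-level agg serves every requested function in a single pass instead of launching one job per function:

```python
from collections import OrderedDict

# Hypothetical name-to-function registry for the sketch
AGGS = {"min": min, "max": max, "sum": sum}

class ToyFrame:
    """Toy stand-in for a frame whose agg() evaluates all requested
    functions in a single pass over the data (the analogue of one
    Spark job instead of one job per function)."""
    def __init__(self, data):
        self.data = list(data)
        self.scans = 0  # counts full passes over the data

    def agg(self, funcs):
        self.scans += 1  # one pass serves every aggregate at once
        return OrderedDict((f, AGGS[f](self.data)) for f in funcs)

class ToySeries:
    def __init__(self, data):
        self.frame = ToyFrame(data)

    def agg(self, func):
        if isinstance(func, list):
            if not all(isinstance(f, str) for f in func):
                raise ValueError("If the given function is a list, it "
                                 "should only contain function names as strings.")
            # Delegate the whole list to the frame-level aggregate:
            # one scan, not len(func) separate ones.
            return self.frame.agg(func)
        return self.frame.agg([func])[func]

s = ToySeries([1, 2, 3, 4])
result = s.agg(["min", "max"])
print(result["min"], result["max"])  # 1 4
print(s.frame.scans)  # 1: both aggregates came from a single pass
```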

@itholic (Contributor, author) replied on Sep 23, 2019:

@ueshin Thanks again!! Fixed it :)
Anyway, I have a question (it may be a very basic one 😿):
is it correct that every function call in the OrderedDict comprehension (when running eval) triggers a Spark job each time?

@ueshin (Collaborator) replied:

Yes, each aggregate function call for a Series triggers sdf.head(2) in

def _unpack_scalar(sdf):
"""
Takes a dataframe that is supposed to contain a single row with a single scalar value,
and returns this value.
"""
l = sdf.head(2)
assert len(l) == 1, (sdf, l)
row = l[0]
l2 = list(row.asDict().values())
assert len(l2) == 1, (row, l2)
return l2[0]
to return the result value.
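The head(2)-plus-assert pattern can be exercised with a plain-Python stand-in (FakeSDF below is hypothetical; real Spark rows additionally need row.asDict() before their values can be listed):

```python
class FakeSDF:
    """Hypothetical stand-in for a Spark DataFrame that simply holds the
    rows a query would return, e.g. [{"min(0)": 1}]."""
    def __init__(self, rows):
        self._rows = rows

    def head(self, n):
        return self._rows[:n]

def unpack_scalar(sdf):
    """Fetch at most 2 rows: exactly one is expected, and a second row
    would reveal that the frame is not a single-cell result."""
    rows = sdf.head(2)
    assert len(rows) == 1, (sdf, rows)
    values = list(rows[0].values())
    assert len(values) == 1, (rows[0], values)
    return values[0]

print(unpack_scalar(FakeSDF([{"min(0)": 1}])))  # 1
```

Each call costs one round trip to the backend, which is why a list of N aggregations on a Series would otherwise launch N jobs.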

@itholic (Contributor, author) replied:

@ueshin Oh, now I totally get it. Thanks ueshin!! 😃

@ueshin (Collaborator) left a review comment:

Otherwise, LGTM, pending tests.

@@ -19,7 +19,7 @@
"""
import re
import inspect
from collections import Iterable
from collections import Iterable, OrderedDict
@ueshin (Collaborator) commented Sep 23, 2019:

nit: no need for this change anymore.

@itholic (Contributor, author) replied:

@ueshin Right, I just removed it :)

@softagram-bot commented:

Softagram Impact Report for pull/816 (head commit: cbcb502)

@ueshin (Collaborator) commented Sep 23, 2019

Thanks! merging.

@ueshin ueshin merged commit 3e263df into databricks:master Sep 23, 2019
thoo added a commit to thoo/koalas that referenced this pull request Sep 24, 2019
* upstream/master:
  Updated the koalas logo in readme.md
  Adding koalas-logo without label
  Adding Koalas logo to readme
  Adding koalas logo
  Clean pandas usage in frame.agg (databricks#821)
  Implement Series.aggregate and agg (databricks#816)
  Raise a more helpful error for duplicated columns in Join (databricks#820)
@itholic itholic deleted the series_agg branch September 25, 2019 02:03