-
Notifications
You must be signed in to change notification settings - Fork 358
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement Series.aggregate and agg #816
Conversation
Codecov Report
@@ Coverage Diff @@
## master #816 +/- ##
==========================================
+ Coverage 94.34% 94.36% +0.02%
==========================================
Files 32 32
Lines 5849 5854 +5
==========================================
+ Hits 5518 5524 +6
+ Misses 331 330 -1
Continue to review full report at Codecov.
|
databricks/koalas/series.py
Outdated
raise ValueError("If the given function is a list, it " | ||
"should only contains function names as strings.") | ||
elif isinstance(func, str): | ||
return eval("self.{}()".format(func)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should avoid eval()
as far as possible. getattr(self, func)()
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ueshin Thanks for review ueshin :) I totally agree. fixed it !
databricks/koalas/series.py
Outdated
if isinstance(func, list): | ||
if all((isinstance(f, str) for f in func)): | ||
rows = OrderedDict((f, eval("self.{}()".format(f), dict(self=self))) for f in func) | ||
return Series(rows) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This runs Spark jobs many times. In this case, I think we can reuse DataFrame
's aggregate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ueshin Thanks again!! fixed it :)
Anyway I have a question, (It may be a very basic question though 😿 )
is it right the every function call in OrderedDict comprehension(when run eval) call spark job each time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, each aggregate function for Series call triggers sdf.head(2)
in
koalas/databricks/koalas/series.py
Lines 3139 to 3149 in cbcb502
def _unpack_scalar(sdf): | |
""" | |
Takes a dataframe that is supposed to contain a single row with a single scalar value, | |
and returns this value. | |
""" | |
l = sdf.head(2) | |
assert len(l) == 1, (sdf, l) | |
row = l[0] | |
l2 = list(row.asDict().values()) | |
assert len(l2) == 1, (row, l2) | |
return l2[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ueshin Oh now i totally got it. Thanks ueshin!! 😃
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Otherwise, LGTM, pending tests.
databricks/koalas/series.py
Outdated
@@ -19,7 +19,7 @@ | |||
""" | |||
import re | |||
import inspect | |||
from collections import Iterable | |||
from collections import Iterable, OrderedDict |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: no need this change anymore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ueshin right. i just removed it :)
Softagram Impact Report for pull/816 (head commit: cbcb502)⭐ Change Overview
📄 Full report
Impact Report explained. Give feedback on this report to [email protected] |
Thanks! merging. |
* upstream/master: Updated the koalas logo in readme.md Adding koalas-logo without label Adding Koalas logo to readme Adding koalas logo Clean pandas usage in frame.agg (databricks#821) Implement Series.aggregate and agg (databricks#816) Raise a more helpful error for duplicated columns in Join (databricks#820)
Like pandas Series.aggregate (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.aggregate.html)
I implemented aggregate function for series.
Example:
(above example is using pandas one)