Implement DataFrame.apply #1259
Great!
Is access to each element of the row available with axis=1?

df = ks.DataFrame([[4, 9]] * 10, columns=['param1', 'param2'])
def my_func(x) -> int:
    # x should be a row here
    return run_regression_and_log_to_mlflow(x['param1'], x['param2'])
# should return a series of ints
df.apply(my_func, axis=1)

Thanks!
@patryk-oleniuk, yup, that works:

df = ks.DataFrame([[4, 9]] * 10, columns=['param1', 'param2'])
def my_func(x) -> int:
    # x should be a row here
    print(x)
    return 1
# should return a series of ints
df.apply(my_func, axis=1)
13179d5 to 12e47c7
Codecov Report
@@ Coverage Diff @@
## master #1259 +/- ##
==========================================
- Coverage 95.16% 93.77% -1.39%
==========================================
Files 35 35
Lines 7151 7200 +49
==========================================
- Hits 6805 6752 -53
- Misses 346 448 +102
LGTM.
Thanks, @ueshin. Merged!
This PR proposes to implement DataFrame.apply with both axis 0 and 1. Note that DataFrame.apply(..., axis=0) with global aggregations is impossible.

It can be tested with the examples below:
The approach uses a group map Pandas UDF, with rows grouped by partition.
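A rough way to see why per-partition application rules out global aggregations with axis=0 is to mimic it in plain pandas. This is only a sketch under stated assumptions, not the Koalas implementation: apply_per_partition is a hypothetical helper, and the fixed-size chunks stand in for Spark partitions, whose real boundaries are decided by Spark.

```python
import pandas as pd


def apply_per_partition(pdf, func, axis, num_partitions):
    """Hypothetical helper: apply func to each chunk of pdf independently."""
    # Ceiling division so every row lands in exactly one chunk.
    size = max(1, -(-len(pdf) // num_partitions))
    # Each chunk plays the role of the pandas DataFrame one partition would receive.
    chunks = [pdf.iloc[i:i + size] for i in range(0, len(pdf), size)]
    return pd.concat([chunk.apply(func, axis=axis) for chunk in chunks])


df = pd.DataFrame([[4, 9]] * 10, columns=["param1", "param2"])

# axis=1: every row lives entirely inside one chunk, so the result matches plain pandas.
row_wise = apply_per_partition(
    df, lambda row: row["param1"] + row["param2"], axis=1, num_partitions=3
)

# axis=0 with an aggregation: each chunk is summed on its own,
# so the output is one partial sum per chunk rather than a single global sum.
col_wise = apply_per_partition(df, lambda col: col.sum(), axis=0, num_partitions=3)

print(row_wise.tolist())
print(col_wise.tolist())
```

In this sketch the row-wise result has one value per input row, while the column-wise aggregation returns partial sums whose number depends on how the data happened to be split.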
Resolves #1228
Resolves #65