[SPARK-20371][R] Add wrappers for collect_list and collect_set #17672
Test build #75906 has finished for PR 17672 at commit
Test build #75909 has finished for PR 17672 at commit
Test build #75911 has finished for PR 17672 at commit
cc @felixcheung
R/pkg/R/functions.R
Outdated
#' collect_list
#'
#' Aggregate function: returns a list of objects with duplicates.
Do other functions have `Aggregate function:` in the description, or is this carried over from the Scala doc? If the latter, we could go without it, since there's already a `@family` tag.
From Scala docs. Removed.
R/pkg/R/functions.R
Outdated
#'
#' @rdname collect_list
#' @name collect_list
#' @family aggregate_functions
I think there's an existing `agg_funcs` family in R.
Yes, corrected.
Re: my other comments, my preference would be full text without an underscore, like `@family aggregate functions`. This isn't a key or id, and it shows up in generated doc text. Anyway, that's a bigger change all around.
@@ -918,6 +918,14 @@ setGeneric("cbrt", function(x) { standardGeneric("cbrt") })
#' @export
setGeneric("ceil", function(x) { standardGeneric("ceil") })

#' @rdname collect_list
These are under `###################### Expression Function Methods ##########################`, which doesn't seem like the right group.
Let's continue here #17674 (comment)
ok, it's good.
agg(gd3, collect_set(df8$age), collect_list(df8$age))
)

testthat::expect_equal(
Why `testthat::`?
Just a habit. Corrected.
Test build #75940 has finished for PR 17672 at commit
R/pkg/R/functions.R
Outdated
#'
#' @rdname collect_list
#' @name collect_list
#' @family agg_func
It has an 's' though: `agg_funcs`. Not to nitpick, but it needs to match exactly.
My bad.
On a side note, do you think we should provide detailed examples for each function?
We really should. It's a long-standing concern that something like `@examples \dontrun{collect_list(df$x)}` is not useful at all. At the least, we should show how the function goes into a select statement and, in some cases, what the output looks like. However, with the number of SQL functions we have, improving this is ongoing work, and we could definitely use more help with it.
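To illustrate the kind of fuller example discussed above, here is a minimal sketch. It is not taken from this PR. It assumes a local Spark installation with SparkR available, and the data frame `df` with its columns `key` and `x` is made up for illustration.

```r
library(SparkR)

# Assumption: a local Spark installation that SparkR can start a session on.
sparkR.session()

# Hypothetical toy data: key "a" has a duplicated value in x.
df <- createDataFrame(data.frame(
  key = c("a", "a", "b"),
  x = c(1, 1, 2)
))

# Grouped aggregation: collect_list keeps duplicates, collect_set drops them.
result <- agg(
  groupBy(df, df$key),
  alias(collect_list(df$x), "x_list"),
  alias(collect_set(df$x), "x_set")
)

# Bring the aggregated rows back as a local R data.frame.
local_result <- collect(result)
print(local_result)

# The same function used in a select statement (aggregates over all rows).
head(select(df, collect_list(df$x)))
```

An example along these lines could be wrapped in `\dontrun{}` in the roxygen block, so that it documents usage in `select` and `agg` without running during package checks.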
Test build #75982 has finished for PR 17672 at commit
Test build #75988 has finished for PR 17672 at commit
BTW, FYI, you don't have to rebase or squash each time you push. It's actually easier to review if you don't, so the reviewer can track comments and see the diff from the last time. But I get that you might be rebasing because of a conflict.
LGTM
Noted :) But yeah, there was a conflict after SPARK-20375.
It would be great.
merged to master.
Thanks @felixcheung
BTW @felixcheung - is there any deeper reason behind the current state of
Not really, it's just inconsistent handling.
Some comment changes could be deliberate, though.
Yeah, I have this feeling that it could be deliberate, but I cannot figure out what the purpose is. Removing I thought about cleaning this up, but I wonder if it is better to wait for SPARK-16693.
Will probably be cleaner
@felixcheung Do you know by any chance what the policy is on adding new datasets to Spark? License restrictions, file size and such?
@zero323 I think that its license needs to be compatible with Apache 2.0 and it can't be big (since example data is in the release; no more than a few MB?) https://www.apache.org/licenses/
## What changes were proposed in this pull request?
Adds wrappers for `collect_list` and `collect_set`.
## How was this patch tested?
Unit tests, `check-cran.sh`
Author: zero323 <[email protected]>
Closes apache#17672 from zero323/SPARK-20371.
What changes were proposed in this pull request?
Adds wrappers for `collect_list` and `collect_set`.
How was this patch tested?
Unit tests, `check-cran.sh`