[SPARK-12526][SPARKR] `ifelse`, `when`, `otherwise` unable to take Column as value #10481
Test build #48336 has finished for PR 10481 at commit
Indeed. Like I said, I don't see a compelling reason to use these three functions in a vectorized way. Let me know if you have any other comments on the fix.
The fix is good, but there is one style nit:
@@ -225,7 +225,7 @@ setMethod("%in%",
 setMethod("otherwise",
           signature(x = "Column", value = "ANY"),
           function(x, value) {
-            value <- ifelse(class(value) == "Column", value@jc, value)
+            value <- if(class(value) == "Column") { value@jc } else { value }
Please add a space between `if` and `(`.
Thanks for the review @sun-rui. Hope that's better. Looks like …
Test build #48395 has finished for PR 10481 at commit
…umn as value

`ifelse`, `when`, `otherwise` is unable to take `Column` typed S4 object as values.

For example:
```r
ifelse(lit(1) == lit(1), lit(2), lit(3))
ifelse(df$mpg > 0, df$mpg, 0)
```
will both fail with
```r
attempt to replicate an object of type 'environment'
```

The PR replaces `ifelse` calls with `if ... else ...` inside the function implementations to avoid attempt to vectorize (i.e. `rep()`). It remains to be discussed whether we should instead support vectorization in these functions for consistency because `ifelse` in base R is vectorized but I cannot foresee any scenarios these functions will want to be vectorized in SparkR.

For reference, added test cases which trigger failures:
```r
. Error: when(), otherwise() and ifelse() with column on a DataFrame ----------
error in evaluating the argument 'x' in selecting a method for function 'collect':
  error in evaluating the argument 'col' in selecting a method for function 'select':
  attempt to replicate an object of type 'environment'
Calls: when -> when -> ifelse -> ifelse

1: withCallingHandlers(eval(code, new_test_environment), error = capture_calls, message = function(c) invokeRestart("muffleMessage"))
2: eval(code, new_test_environment)
3: eval(expr, envir, enclos)
4: expect_equal(collect(select(df, when(df$a > 1 & df$b > 2, lit(1))))[, 1], c(NA, 1)) at test_sparkSQL.R:1126
5: expect_that(object, equals(expected, label = expected.label, ...), info = info, label = label)
6: condition(object)
7: compare(actual, expected, ...)
8: collect(select(df, when(df$a > 1 & df$b > 2, lit(1))))

Error: Test failures
Execution halted
```

Author: Forest Fang <[email protected]>

Closes #10481 from saurfang/spark-12526.

(cherry picked from commit d80cc90)
Signed-off-by: Shivaram Venkataraman <[email protected]>
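The failure mode described above can be reproduced without a Spark session. The following is a minimal sketch in base R, not the SparkR implementation: `ColumnStub`, `jobj_stub`, `new_handle`, `unwrap_with_ifelse`, and `unwrap_with_if` are hypothetical names introduced for illustration. The stub assumes only that the `jc` slot holds a classed environment, which is what the reported error message suggests.

```r
library(methods)

# Hypothetical stand-in for the Java object handle: a classed environment.
new_handle <- function(id) {
  handle <- new.env()
  handle$id <- id
  class(handle) <- "jobj_stub"
  handle
}

# Hypothetical stand-in for Column: an S4 class with a `jc` slot.
setOldClass("jobj_stub")
setClass("ColumnStub", representation(jc = "jobj_stub"))

# Old approach: ifelse() is vectorized and calls rep() on the chosen branch,
# which cannot replicate an environment.
unwrap_with_ifelse <- function(value) {
  ifelse(class(value) == "ColumnStub", value@jc, value)
}

# Approach taken by the PR: a plain if/else returns the branch untouched.
unwrap_with_if <- function(value) {
  if (class(value) == "ColumnStub") value@jc else value
}

col <- new("ColumnStub", jc = new_handle("1"))

# Capture the error instead of stopping the script.
ifelse_result <- tryCatch(unwrap_with_ifelse(col), error = function(e) e)
print(inherits(ifelse_result, "error"))

# The if/else version hands back the handle itself, and leaves plain values alone.
if_result <- unwrap_with_if(col)
print(identical(if_result, col@jc))
print(unwrap_with_if(0))
```

Because the condition in these functions is always a single logical value, the scalar `if ... else ...` form loses nothing compared to `ifelse`, which is the argument made in the PR discussion.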