Error using summarize in ddply with .parallel=T ("'...' used in an incorrect context") #204

Closed
Ken-B opened this issue Mar 12, 2014 · 5 comments

@Ken-B

Ken-B commented Mar 12, 2014

First of all, thanks for the amazing plyr package! I have looked through the old issues and searched Google, but could not find a solution to my problem. I suspect it might be a bug.

Basically, ddply with .fun=summarize fails when combined with .parallel=T.
I run Windows 7 x64, R 3.0.1, plyr 1.8.1.

To reproduce with one of your plyr examples:

library(plyr)
dfx <- data.frame( group = c(rep('A', 8), rep('B', 15), rep('C', 6)),
  sex = sample(c("M", "F"), size = 29, replace = TRUE),
  age = runif(n = 29, min = 18, max = 54))
ddply(dfx, .(group, sex), .parallel=F, .fun=summarize, 
      mean = round(mean(age), 2), sd = round(sd(age), 2))
# Outputs correctly:
# group sex  mean    sd
#1     A   F 40.01  8.94
#2     A   M 35.52 13.99
#3     B   F 33.07 10.95
#4     B   M 40.94 11.44
#5     C   F 39.59 10.37
#6     C   M 27.21  8.19

Adding a .parallel=T option throws the error:

library(doSNOW)
registerDoSNOW(makeCluster(parallel::detectCores()))
ddply(dfx, .(group, sex), .parallel=T, .fun=summarize, 
      mean = round(mean(age), 2), sd = round(sd(age), 2))
# throws error (and two warnings):
# Error in do.ply(i) : task 1 failed - "'...' used in an incorrect context"
# In addition: Warning messages:
#  1: <anonymous>: ... may be used in an incorrect context: ‘.fun(piece, ...)’
# 
#  2: <anonymous>: ... may be used in an incorrect context: ‘.fun(piece, ...)’

Unlike the warnings mentioned in #203, this error stops execution, so I cannot proceed.

Any help is warmly appreciated; let me know if I can test anything further.

@krlmlr

krlmlr commented Apr 1, 2014

The following works for me:

ddply(dfx, .(group, sex), .parallel=T, .fun=summarize, mean = .(mean(age)))

The .() function (see ?quoted) seems to capture the expression so that it can be passed safely to the worker processes. I don't understand what happens on the remote side, or which part is responsible for "unquoting", given that the same call with .parallel=F produces wrong results. @hadley: Do you think summarize et al. should be altered to support quoted objects?

The following alternative is safer:

ddply(dfx, .(group, sex), .parallel=T,
       .fun=function(piece) summarize(piece, mean = mean(age)))

You can shorten this with a helper function:

f <- function(fn, ...) function(piece) fn(piece, ...)
ddply(dfx, .(group, sex), .parallel=T, .fun=f(summarize, mean = mean(age)))

I'm not sure whether it's possible at all to enhance plyr to support your syntax without breaking other things.

The f function is actually very similar to functional::Curry, but using the latter didn't work for me. @hadley: Do you think f should be added to plyr, perhaps under a more suitable name?
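A minimal base-R sketch of why the helper f can work where passing summarize's arguments through ddply's ... does not. f stores mean(age) as an unevaluated promise inside the returned closure, and that promise is only evaluated when the wrapped function receives a piece. So that it runs without plyr, base with() stands in for summarize. Both evaluate an expression inside the data frame they are given, but the substitution is an assumption for illustration. The names mean_age and piece are likewise hypothetical. The sketch only runs the closure locally and starts no cluster, so it does not show how such closures are serialized to SNOW workers.

```r
# Helper from the comment above: returns a one-argument closure that
# forwards the captured `...` to fn when the closure is finally called.
f <- function(fn, ...) function(piece) fn(piece, ...)

# with() is a stand-in for summarize() (assumption for illustration): both
# evaluate an unevaluated expression inside the data frame they receive.
mean_age <- f(with, mean(age))   # mean(age) is NOT evaluated here

piece <- data.frame(age = c(20, 40))
mean_age(piece)                  # mean(age) evaluated inside `piece`: 30
```

Note that `age` never needs to exist in the global environment. The expression is resolved only against each piece.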

@sbraddaughdrill

Is there an update to this? I am getting similar warnings on Windows 8.1 (Update 1) x64, R 3.1.0, plyr 1.8.1.

@hadley
Owner

hadley commented Mar 30, 2015

Seems to work for me with this code:

library(plyr)
dfx <- data.frame( group = c(rep('A', 8), rep('B', 15), rep('C', 6)),
  sex = sample(c("M", "F"), size = 29, replace = TRUE),
  age = runif(n = 29, min = 18, max = 54))

doParallel::registerDoParallel(cores = 3)
ddply(dfx, .(group, sex), .parallel=T, .fun=summarize, 
  mean = round(mean(age), 2), sd = round(sd(age), 2))

hadley closed this as completed Mar 30, 2015
@colemonnahan

@hadley Is there any update or workaround for this? When I run your example, I get:

Error in do.ply(i) : task 1 failed - "'...' used in an incorrect context"
In addition: Warning messages:
1: <anonymous>: ... may be used in an incorrect context: '.fun(piece, ...)'
2: <anonymous>: ... may be used in an incorrect context: '.fun(piece, ...)'

R 3.2.2
Windows 10 x64
plyr 1.8.3
doParallel 1.0.10

@HenrikBengtsson
Contributor

HenrikBengtsson commented May 5, 2016

For the record, I can confirm there's still a problem with R 3.3.0 on Windows (details below), or more precisely with SNOW clusters.

Reproducible example (also on non-Windows machines)

The reason is most likely that doParallel uses a multicore backend on Unix/OS X, but on Windows it falls back to using a SNOW cluster of local R sessions. For instance, on Linux you get:

> doParallel::registerDoParallel(cores = 3)
> foreach::getDoParName()
[1] "doParallelMC"

whereas on Windows you get:

> doParallel::registerDoParallel(cores = 3)
> foreach::getDoParName()
[1] "doParallelSNOW"

@hadley and other non-Windows users: to reproduce the SNOW-related issue reported here, use the following setup instead:

> cl <- parallel::makeCluster(3)
> doParallel::registerDoParallel(cl)
> foreach::getDoParName()
[1] "doParallelSNOW"

and you'll get the same error.

Troubleshooting guess

I can only guess, but I believe this has to do with how foreach/SNOW identifies "global" variables, and that this somehow breaks down when it encounters ... in some of the calling frames. It probably needs to be fixed outside of plyr.
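The codetools warning "... may be used in an incorrect context: '.fun(piece, ...)'" points at a closure whose body uses ... without declaring it in its own formals. Such a closure only works if ... can be found in its enclosing environment. Below is a base-R sketch, needing no plyr, foreach or cluster, of how that can succeed locally yet fail remotely. It assumes the SNOW backend rebuilds the worker function's environment without the ... binding. That assumption fits the guess above but is not a confirmed account of what foreach/doParallel do. make_worker, worker and shipped are hypothetical names, and make_worker only imitates plyr's internal do.ply in shape.

```r
# Hypothetical stand-in for plyr's internal worker: the inner closure uses
# `...` without declaring it, so `...` must come from make_worker's frame.
make_worker <- function(.fun, ...) {
  function(piece) .fun(piece, ...)
}

worker <- make_worker(paste, "x", sep = "-")
local_result <- worker("a")   # "a-x": `...` found in the enclosing frame

# Simulate (assumption) shipping the closure to a worker process whose
# environment holds `.fun` but no `...` binding.
shipped <- worker
environment(shipped) <- list2env(list(.fun = paste), parent = globalenv())

remote_error <- tryCatch(shipped("a"),
                         error = function(e) conditionMessage(e))
remote_error   # message says `...` is "used in an incorrect context"
```

This is also consistent with why wrapping the call in an explicit function(piece) summarize(piece, ...) avoids the error. That closure no longer depends on an outer ... being exported.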

Details

library(plyr)
dfx <- data.frame( group = c(rep('A', 8), rep('B', 15), rep('C', 6)),
  sex = sample(c("M", "F"), size = 29, replace = TRUE),
  age = runif(n = 29, min = 18, max = 54))

## On Windows, doParallel::registerDoParallel(cores = 3) sets up a SNOW cluster like this one:
cl <- parallel::makeCluster(3)
doParallel::registerDoParallel(cl)

ddply(dfx, .(group, sex), .parallel=T, .fun=summarize, 
  mean = round(mean(age), 2), sd = round(sd(age), 2))
Error in do.ply(i) : task 1 failed - "'...' used in an incorrect context"
In addition: Warning messages:
1: <anonymous>: ... may be used in an incorrect context: '.fun(piece, ...)'

2: <anonymous>: ... may be used in an incorrect context: '.fun(piece, ...)'

> sessionInfo()
R version 3.3.0 Patched (2016-05-03 r70575)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
[1] doParallel_1.0.10 iterators_1.0.8   foreach_1.4.3     plyr_1.8.3

loaded via a namespace (and not attached):
[1] compiler_3.3.0   Rcpp_0.12.4.5    codetools_0.2-14
