
Warning "... may be used in an incorrect context: ‘.fun(piece, ...)’" in plyr using parallel #203

Closed
renkun-ken opened this issue Mar 3, 2014 · 15 comments


@renkun-ken

This package is terrific! But whenever I call **ply functions with parallel computing, warnings of the form "... may be used in an incorrect context: ‘.fun(piece, ...)’" always occur.
Here's a simple reproducible example:

require(doParallel)
cl <- makeCluster(2)
registerDoParallel(cl)

l1 <- as.list(1:1000)
require(plyr)
d2 <- ldply(l1,function(i) {
  return(data.frame(x=rnorm(100)+i))
}, .parallel=T)

stopCluster(cl)

Every time I set .parallel=T to use the local two-node cluster, the computation finishes, but with two warnings:

Warning messages:
1: <anonymous>: ... may be used in an incorrect context: ‘.fun(piece, ...)’
2: <anonymous>: ... may be used in an incorrect context: ‘.fun(piece, ...)’

If I don't enable .parallel=T and use sequential computing, the warnings do not appear.

The same thing happens with ddply() too. Here's another example:

df <- data.frame(i=1:100,id=sample(LETTERS[1:5],100,replace=T),
  cls=sample(LETTERS[1:5],100,replace=T),x=rbinom(100,10,0.5),
  y=rbinom(100,10,0.5))
d4 <- ddply(df,.(i),function(row) {
  return(row)
}, .parallel=T)

The same two warnings occur.
I couldn't find much useful information online. What's the problem here?

@krlmlr

krlmlr commented Apr 1, 2014

I might have found a way to avoid the warnings. Will submit a pull request once it's good.

@jepusto

jepusto commented Apr 7, 2014

The issue is in llply, which creates a function called do.ply that takes ... from the parent environment. Parallel evaluation is done through %dopar%, which doesn't seem to like that setup. I tried capturing ... prior to creating do.ply:

llply_parallel_test <- function(.data, .fun = NULL, ...) {
  pieces <- .data
  n <- length(pieces)
  result <- vector("list", n)
  dots <- eval(substitute(alist(...)))
  do.ply <- function(i) {
    piece <- pieces[[i]]
    res <- do.call(.fun, c(piece, dots))
    res
  }
  i <- seq_len(n)
  fe_call <- as.call(c(list(as.name("foreach"), i = i)))
  fe <- eval(fe_call)
  result <- fe %dopar% do.ply(i)
  attributes(result)[c("split_type", "split_labels")] <- attributes(pieces)[c("split_type", "split_labels")]
  names(result) <- names(pieces)
  if (!is.null(dim(pieces))) {
    dim(result) <- dim(pieces)
  }
  result  
}

This seems to work if each element of .data is a list of function arguments:

require(doParallel)
cluster <- makeCluster(2, type = "SOCK")
registerDoParallel(cluster)
named_data <- plyr:::splitter_a(expand.grid(mean=1:5, sd=1:5))
llply_parallel_test(named_data, .fun = rnorm, n=2)

However, it fails if the elements of .data are vectors or matrices:

margin1_data <- plyr:::splitter_a(ozone, .margins = 1)
margin1_mean <- llply_parallel_test(margin1_data, .fun = mean, na.rm=TRUE)
all.equal(unlist(margin1_mean), apply(ozone, 1, mean))
all.equal(unlist(margin1_mean), ozone[,1,1])

Not sure where to go from here...
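A likely reason for the failure above is the `c(piece, dots)` call in `llply_parallel_test`: `c()` splices the elements of `piece` into separate arguments. That suits an mdply-style row of named arguments, but a vector or matrix piece gets spread over positional arguments. For `mean`, the second element lands in `trim`, and any `trim >= 0.5` returns the median of the first element alone, which would explain why the result matches `ozone[,1,1]`. Below is a minimal sequential sketch (the helper `apply_pieces` is hypothetical, not a plyr function) that captures the extra arguments with `list(...)` up front and wraps the piece in `list()` so it stays a single argument.

```r
# Hypothetical helper, not part of plyr: a sequential sketch of the
# "capture ... before building the inner function" idea.
apply_pieces <- function(.data, .fun, ...) {
  dots <- list(...)  # evaluate the extra arguments once, up front
  do_one <- function(piece) {
    # list(piece) passes the whole piece as ONE argument;
    # c(piece, dots) would splice a vector's elements into separate arguments
    do.call(.fun, c(list(piece), dots))
  }
  lapply(.data, do_one)
}

pieces <- list(a = c(1, NA, 3), b = c(4, 5, NA))
res <- apply_pieces(pieces, mean, na.rm = TRUE)
res$a  # 2
res$b  # 4.5

# The splicing version: mean(10, 20, 30, na.rm = TRUE) binds x = 10 and
# trim = 20, and a trim >= 0.5 returns the median of x, i.e. just 10
spliced <- do.call(mean, c(c(10, 20, 30), list(na.rm = TRUE)))
```

Using `list(...)` instead of `eval(substitute(alist(...)))` evaluates the extra arguments in the caller before any work is shipped elsewhere, which seems the safer choice if the inner function is later run on other processes.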

@krlmlr

krlmlr commented Apr 7, 2014

@jepusto: Does my pull request work for you?

@jepusto

jepusto commented Apr 7, 2014

@krlmlr Unfortunately not; I think it may fail in the first case mentioned above. Try:

mdply(expand.grid(mean=1:5, sd=1.5), .fun = rnorm, n = 2, .parallel = TRUE)

@krlmlr

krlmlr commented Apr 8, 2014

I'm getting

> doSNOW::registerDoSNOW(snow::makeSOCKcluster(4))
> mdply(expand.grid(mean=1:5, sd=1.5), .fun = rnorm, n = 2, .parallel = TRUE)
  mean  sd        V1         V2
1    1 1.5 -1.405161  2.7829695
2    2 1.5  2.346042  5.5085988
3    3 1.5  6.804467 -0.2187333
4    4 1.5  3.183831  4.8182184
5    5 1.5  5.924252  4.4588331

with my pull request #210. Is anything wrong with that?

@jepusto

jepusto commented Apr 8, 2014

Sorry, I had a typo in my original comment; it should read sd = 1:5:

mdply(expand.grid(mean=1:5, sd=1:5), .fun = rnorm, n = 2, .parallel = FALSE)
mdply(expand.grid(mean=1:5, sd=1:5), .fun = rnorm, n = 2, .parallel = TRUE)

I haven't been able to get your pull request to build (probably a problem on my end), otherwise I would test it myself.

@krlmlr

krlmlr commented Apr 8, 2014

Still works the same for me.

Have you tried devtools::install_github("krlmlr/plyr#210")? Make sure you have version 1.5 of devtools.

@jepusto

jepusto commented Apr 8, 2014

Ah thanks.

@hadley
Owner

hadley commented Mar 30, 2015

I think this is fixed in the main branch now

@hadley hadley closed this as completed Mar 30, 2015
@bornakke

The bug seems to be back in plyr version 1.8.3. @renkun-ken's code and similar requests yield the same two warnings.

require(doParallel)
cl <- makeCluster(2)
registerDoParallel(cl)

l1 <- as.list(1:1000)
require(plyr)
d2 <- ldply(l1,function(i) {
  return(data.frame(x=rnorm(100)+i))
}, .parallel=T)

stopCluster(cl)

@buggythepirate

It didn't work in plyr version 1.8.2 either. I tried both doSNOW and doParallel; both showed the same message as above.

@achetverikov

Still does not work in plyr 1.8.3

@Nosferican

Still does not work in plyr 1.8.4

@touala

touala commented Nov 22, 2016

I don't have much time to look more closely, but I saw two distinct behaviors:

library(doParallel)
library(doSNOW)
library(plyr)
df <- data.frame(val=1:10, ind=c(rep(2, 5), rep(3, 5)))
registerDoParallel(2)
system.time(print(ddply(df, .(ind), function(x) { Sys.sleep(2); sum(x) }, .parallel=TRUE)))
registerDoSEQ()

Works properly.

cl <- makeCluster(2, type="SOCK")
registerDoParallel(cl)
system.time(print(ddply(df, .(ind), function(x) { Sys.sleep(2); sum(x) }, .parallel=TRUE)))
Warning messages:
1: <anonymous>: ... may be used in an incorrect context: ‘.fun(piece, ...)’
2: <anonymous>: ... may be used in an incorrect context: ‘.fun(piece, ...)’
stopCluster(cl)

This returns the warnings. Hope it helps.
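The difference between the two setups is consistent with how the backends hand work to workers, though this is an inference, not something confirmed in this thread. With an explicit cluster, the snow-style backends analyse the loop body to decide which variables to export, and the `<anonymous>: ...` prefix appears to match the warning format of the codetools package. `registerDoParallel(2)` on a Unix-alike presumably uses forked workers that inherit the session, so no export analysis is needed. The sketch below reproduces the same kind of warning directly with `codetools::findGlobals()` on a closure shaped like plyr's `do.ply`; `make_do_ply` is a simplified, hypothetical stand-in, not plyr's code.

```r
library(codetools)  # recommended package shipped with R

# Simplified stand-in for plyr's internal do.ply closure (NOT plyr's code):
# the inner function uses the enclosing function's `...`, while its own
# formals are only `i`
make_do_ply <- function(pieces, .fun, ...) {
  function(i) {
    piece <- pieces[[i]]
    .fun(piece, ...)
  }
}
do_ply <- make_do_ply(list(1, 2), identity)

# Run codetools' static analysis and record any warnings it signals
msgs <- character(0)
globals <- withCallingHandlers(
  findGlobals(do_ply),
  warning = function(w) {
    msgs <<- c(msgs, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
print(msgs)     # something like "<anonymous>: ... may be used in an incorrect context: ..."
print(globals)  # free names of the closure, including .fun and pieces
```

Static analysis only sees `do_ply`'s own formals and body, so it cannot tell that `...` is bound in the enclosing function, which is why the call gets flagged even though it runs fine.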

@Deleetdk

@hadley @krlmlr The bug is still present here 5 years later (still the same version, 1.8.4). Are you hoping for a community pull request, or how do we get this fixed? IMO plyr is superior to dplyr for some tasks (e.g. ddply with 2 groups), so I rely on some plyr calls and get this warning.

bschmalbach added a commit to bschmalbach/ezCutoffs that referenced this issue Nov 20, 2019
l.263: removed ellipsis in estimation() function
l.284: suppressWarnings() in parallel estimation (seems to be a known bug with other packages too: cf. hadley/plyr#203) <- may not be the most elegant solution. Do you two have other ideas?
l. 362-4: simulation stats now outputs estimation method used in the simulations
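If a blanket `suppressWarnings()` feels too broad, a narrower option is to muffle only warnings whose message contains this specific text and let every other warning through. A minimal sketch using base R's `withCallingHandlers()`; `muffle_dots_context_warnings` is a hypothetical helper name, and the approach assumes the warnings are signaled in the main R session (as the deferred `Warning messages:` output in this thread suggests) rather than on the workers.

```r
# Hypothetical helper: evaluate `expr`, silencing only warnings whose message
# contains `pattern`; any other warning is passed on unchanged.
muffle_dots_context_warnings <- function(expr,
                                         pattern = "may be used in an incorrect context") {
  withCallingHandlers(
    expr,
    warning = function(w) {
      if (grepl(pattern, conditionMessage(w), fixed = TRUE)) {
        invokeRestart("muffleWarning")
      }
    }
  )
}

# Usage with plyr (requires a registered foreach backend):
# d2 <- muffle_dots_context_warnings(
#   ldply(l1, function(i) data.frame(x = rnorm(100) + i), .parallel = TRUE)
# )

# Self-contained check with a simulated warning
quiet <- muffle_dots_context_warnings({
  warning("<anonymous>: ... may be used in an incorrect context: '.fun(piece, ...)'")
  "done"
})
```

Because `expr` is a lazily evaluated argument, it is only forced inside `withCallingHandlers()`, so the handler is active for the whole computation.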