finer control over future_lapply() #60

MLopez-Ibanez · 2020-06-24T15:49:25Z

I'd like to implement the following using futures, but it doesn't seem possible yet?

Apply a function over a list of objects and get a list of futures. The tasks will start running immediately up to the number of workers and, if there are more tasks than workers, the remainder are queued for running using load-balancing.
Be able to iterate over the list of futures and check if the future is resolved, running, or queued.
Be able to cancel futures that are running or queued. When a future is cancelled, the next one in the queue starts executing.

HenrikBengtsson · 2020-06-25T00:36:16Z

Apply a function over a list of objects and get a list of futures. ...

This is by design. The future.apply API mimics the base R "apply" API as far as possible - neither more nor less than that. So from the "outside", the only difference the developer sees is that the function names start with a future_ prefix. This way there are no surprises about what the future.apply package is meant to do.
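As a small illustration of that one-to-one mapping, here is a minimal sketch. The data in X is made up for the example, and plan(sequential) is chosen only to keep it self-contained; any other backend should give the same result.

```r
library(future)
library(future.apply)

# Sequential backend keeps the sketch self-contained; swap in another plan() as needed
plan(sequential)

# Made-up input for illustration
X <- list(a = 1:3, b = 4:6, c = 7:9)

# Base R version
y_base <- lapply(X, FUN = sum)

# Futurized version: same arguments, same return structure
y_future <- future_lapply(X, FUN = sum)

stopifnot(identical(y_base, y_future))
```

Both calls return a plain list of values, not futures, which is the behaviour the issue asks to go beyond.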

Now, I do mention in the README under 'Roadmap' that:

Consider additional future_*apply() functions and features that fit in this package but don't necessarily have a corresponding function in base R. Examples of this may be "apply" functions that return futures rather than values, mechanisms for benchmarking, and richer control over load balancing.

This is also touched upon in Issue #32 and Issue #44, and possibly elsewhere too. However, it's far from obvious what such an API should look like and what it should or should not support. It might also be better suited for another package. There's a risk in opening up the current API with features that don't exist in base R, e.g. it might be confusing and the existing API might be used in the wrong way. I already see this with just future()/value() and %<-%, where people attempt to do y %<-% future(...) and end up in a trial-and-error mess.
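For reference, the two styles are meant to be used separately rather than combined. A minimal sketch, with sqrt(16) standing in for real work:

```r
library(future)
plan(sequential)

# Explicit API: future() returns a Future object, value() retrieves its result
f <- future({ sqrt(16) })
v1 <- value(f)

# Implicit API: %<-% takes the expression itself, not a future() call
v2 %<-% { sqrt(16) }

stopifnot(identical(v1, v2))
```

Wrapping the right-hand side of %<-% in future(...) mixes the two styles, which is the pattern described above as leading to confusion.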

You can always do:

fs <- lapply(X, FUN = function(x) future({
  ...
}))

to create your own futures. This wouldn't give you chunking ("load balancing") - you'd get one future per element in X. You could hack together some approach where you use chunks <- future_lapply(seq_along(X), FUN = function(idxs) { ... }) to figure out what the chunks are and what .Random.seed each element should have, but that's rather tedious.
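A rough sketch of this do-it-yourself approach, under a few assumptions: slow_square() is a hypothetical stand-in for real work, two multisession workers are used, and chunks are formed with parallel::splitIndices(), which is not necessarily how future_lapply() chunks internally. Note that future() blocks when all workers are busy, so the remaining futures are not queued in the sense the issue asks for.

```r
library(future)
plan(multisession, workers = 2)

X <- as.list(1:6)

# Hypothetical task used only for illustration
slow_square <- function(x) {
  Sys.sleep(0.1)
  x^2
}

# One future per element; future() blocks while all workers are busy
fs <- lapply(X, FUN = function(x) future(slow_square(x)))

# Non-blocking status check: TRUE for futures that have finished
done <- vapply(fs, FUN = resolved, FUN.VALUE = logical(1))

# Collect the results; value() blocks until each future is resolved
values <- lapply(fs, FUN = value)

# Manual chunking: one future per chunk of indices instead of per element
chunks <- parallel::splitIndices(length(X), ncl = nbrOfWorkers())
chunk_fs <- lapply(chunks, FUN = function(idxs) {
  future(lapply(X[idxs], FUN = slow_square))
})
chunk_values <- unlist(lapply(chunk_fs, FUN = value), recursive = FALSE)

# Shut down the background workers
plan(sequential)
```

This sketch ignores random number generation entirely. Giving each element the same .Random.seed regardless of how it is chunked is the tedious part mentioned above, and it is not attempted here.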

Building your own map-reduce functions for futures will be much easier when the future.chunks package is available. This is mentioned in Issue #59. But it'll be a while before I get some solid time to work on that.

Be able to cancel futures ...

Termination of futures is currently not supported by the Future API. This is something that needs to be implemented in the future package before anything can be done higher up. Getting a consistent API for terminating futures is not easy because it depends on the backend used. Such a feature would most likely have to be optional, i.e. it might or might not work depending on backend and context. This further complicates how it can be used in cases like the one you propose. See futureverse/future#93 for more details.

HenrikBengtsson added the feature request label Jun 25, 2020
