More Efficient Cleanup #65

shikokuchuo · 2023-04-07T08:36:29Z

Prework

Read and agree to the Contributor Code of Conduct and contributing guidelines.
If there is already a relevant issue, whether open or closed, comment on the existing thread instead of posting a new issue.
Post a minimal reproducible example like this one so the maintainer can troubleshoot the problems you identify. A reproducible example is:
- Runnable: post enough R code and data so any onlooker can create the error on their own computer.
- Minimal: reduce runtime wherever possible and remove complicated details that are irrelevant to the issue at hand.
- Readable: format your code according to the tidyverse style guide.

Description

The initial state is obtained and then reset after each task in your crew evaluation wrapper.

Getting the initial state each time is not required.

More efficient solution is to let mirai get this once per daemon and do the cleanup after each task.

Remove Global variables
Unload packages
Reset options
Call GC

I know I was not keen on being able to pick and choose, but I am prepared to offer flexibility for each of the above through the cleanup argument being an integer number (operating a bitmask). This will make it too obscure for most users rather than drawing attention to different cleanup options, which was my initial concern.

Are the above the only operations you need?

wlandau · 2023-04-07T13:51:15Z

Thanks for the suggestion. Your comments nudged me to profile crew_eval(), and recent commits make it faster. Now if I run the following small profiling study:

library(crew)
library(proffer)
command <- quote(a + b)
data <- list(a = 1)
globals <- list(b = 2)
pprof(
  replicate(
    1e3,
    crew_eval(command = command, data = data, globals = globals)
  )
)

the flame graph looks like this:

So it looks like the bottleneck now is indeed setting and restoring global options. I think relying on mirai for global options specifically would be a demonstrable performance gain, and if feasible, I will have crew always leverage this. After that, it would be nice to let the user separately choose whether to unload packages and whether to call GC, that would simplify the signatures of both crew_eval() and controller$push().

wlandau · 2023-04-07T13:53:25Z

If you go forward with the bitmask approach for cleanup, I will add logical arguments garbage_collection and unload_packages to the crew launcher abstract class and then create the value of the bitmask internally.

wlandau · 2023-04-07T13:59:22Z

I also wonder if you think there may be a faster/native way to restore the RNG state after each task. Currently I am using withr::local_seed(), which is robust, but it shows up on the profiling studies. If I could drop it, then it could be a bit of a performance boost, and I could drop the dependency on withr.

shikokuchuo · 2023-04-08T08:59:43Z

If you go forward with the bitmask approach for cleanup, I will add logical arguments garbage_collection and unload_packages to the crew launcher abstract class and then create the value of the bitmask internally.

Implemented in mirai 0.8.2.9019. It's a simple additive bitmask, you simply assign the options values 1,2,4,8 and sum them.

I am not entirely sure this works for you as resetting options without unloading packages could be an issue - mirai can only set the options to an initial state before any user packages have been loaded.

Nevertheless, it doesn't hurt to offer the flexibility - there is enough of a disclaimer in the docs.

shikokuchuo · 2023-04-08T18:11:19Z

I also wonder if you think there may be a faster/native way to restore the RNG state after each task. Currently I am using withr::local_seed(), which is robust, but it shows up on the profiling studies. If I could drop it, then it could be a bit of a performance boost, and I could drop the dependency on withr.

I've had a look at your use of withr::local_seed() and I am wondering why you need to restore the RNG state after each eval. Otherwise you would be able to just use set.seed().

The default behaviour of mirai is to actually change the RNG state after each eval as it removes .Random.seed as part of the globals cleanup.

HenrikBengtsson · 2023-05-02T03:27:51Z

FYI, be careful when resetting R options.

There are cases where removing an option wreaks havoc. For example, in R (< 4.3.0), there were several built-in R options that could not be reset (https://bugs.r-project.org/show_bug.cgi?id=18372). There are also packages out there that rely on their options not being removed, e.g. truecluster/ff#14.

This is why I rolled back automatic setting of R options in future a while back (https://github.com/HenrikBengtsson/future/blob/f7c11acb4d2260e882dcd3b392b905c82709d5ae/R/expressions.R#L32-L50) - the risk of breaking things is simply too great.

PS. It feels like I've already mentioned this at some point, but in case I just made that memory up, I'm posting this comment.

wlandau · 2023-05-02T20:30:50Z

Thanks for the note and the examples. Thankfully @shikokuchuo also warned me about this and so crew has a note in the docs.

crew/R/crew_launcher.R

Lines 48 to 51 in ace596d

    
           #' @param reset_options `TRUE` to reset global options to their original 
        
           #'   state between each task, `FALSE` otherwise. It is recommended to 
        
           #'   only set `reset_options = TRUE` if `reset_packages` is also `TRUE` 
        
           #'   because packages sometimes rely on options they set at loading time.

I am not sure built-in base options would be affected because the original state should already have the right values, but maybe I am missing something.

From a package development perspective, it seems like good practice to gracefully handle missing options (or not rely on them at all; targets::tar_option_set() modifies an internal mutable object in the package, and tar_option_get() handles nulls). But I do recognize that many packages do this anyway and it is out of my control.

HenrikBengtsson · 2023-05-02T20:39:46Z

I am not sure built-in base options would be affected because the original state should already have the right values, but maybe I am missing something.

Correct, but only since R 4.3.0.

shikokuchuo · 2023-05-03T09:57:29Z

Also just to note on this, options cleanup in mirai is light-touch. It only reverts the options that are present at the beginning of the R session. So if a package adds additional options that it relies on, those are actually not removed as part of the cleanup, reducing the chances of breakages in this respect even if the note is not heeded.

The implied assumption is that any new options set by packages would be specific to that package - which again isn't enforceable, but should be true in general - any conflict would be no different to if both namespaces were loaded at the same time, so I contend nothing to be done specifically here.

Also, it should not run into the R (< 4.3.0) bug as it would not be setting any options to NULL.

shikokuchuo added the topic: performance label Apr 7, 2023

shikokuchuo assigned wlandau Apr 7, 2023

wlandau-lilly added a commit that referenced this issue Apr 10, 2023

Most of the work for #65

020a3ec

wlandau-lilly added a commit that referenced this issue Apr 10, 2023

Fix #65

c79f952

wlandau-lilly closed this as completed in c954c32 Apr 10, 2023

shikokuchuo mentioned this issue Apr 27, 2023

Replace withr::local_seed() with set.seed() #67

Closed

6 tasks

shikokuchuo mentioned this issue Oct 2, 2023

Support a character vector in daemon(cleanup = ) shikokuchuo/mirai#79

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

More Efficient Cleanup #65

More Efficient Cleanup #65

shikokuchuo commented Apr 7, 2023

wlandau commented Apr 7, 2023 •

edited

Loading

wlandau commented Apr 7, 2023

wlandau commented Apr 7, 2023

shikokuchuo commented Apr 8, 2023 •

edited

Loading

shikokuchuo commented Apr 8, 2023 •

edited

Loading

HenrikBengtsson commented May 2, 2023

wlandau commented May 2, 2023

HenrikBengtsson commented May 2, 2023

shikokuchuo commented May 3, 2023 •

edited

Loading

More Efficient Cleanup #65

More Efficient Cleanup #65

Comments

shikokuchuo commented Apr 7, 2023

Prework

Description

wlandau commented Apr 7, 2023 • edited Loading

wlandau commented Apr 7, 2023

wlandau commented Apr 7, 2023

shikokuchuo commented Apr 8, 2023 • edited Loading

shikokuchuo commented Apr 8, 2023 • edited Loading

HenrikBengtsson commented May 2, 2023

wlandau commented May 2, 2023

HenrikBengtsson commented May 2, 2023

shikokuchuo commented May 3, 2023 • edited Loading

wlandau commented Apr 7, 2023 •

edited

Loading

shikokuchuo commented Apr 8, 2023 •

edited

Loading

shikokuchuo commented Apr 8, 2023 •

edited

Loading

shikokuchuo commented May 3, 2023 •

edited

Loading