Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More Efficient Cleanup #65

Closed
2 of 6 tasks
shikokuchuo opened this issue Apr 7, 2023 · 9 comments
Closed
2 of 6 tasks

More Efficient Cleanup #65

shikokuchuo opened this issue Apr 7, 2023 · 9 comments
Assignees

Comments

@shikokuchuo
Copy link
Contributor

Prework

Description

The initial state is obtained and then reset after each task in your crew evaluation wrapper.

Getting the initial state each time is not required.

More efficient solution is to let mirai get this once per daemon and do the cleanup after each task.

  1. Remove Global variables
  2. Unload packages
  3. Reset options
  4. Call GC

I know I was not keen on being able to pick and choose, but I am prepared to offer flexibility for each of the above through the cleanup argument being an integer number (operating a bitmask). This will make it too obscure for most users rather than drawing attention to different cleanup options, which was my initial concern.

Are the above the only operations you need?

@wlandau
Copy link
Owner

wlandau commented Apr 7, 2023

Thanks for the suggestion. Your comments nudged me to profile crew_eval(), and recent commits make it faster. Now if I run the following small profiling study:

library(crew)
library(proffer)
command <- quote(a + b)
data <- list(a = 1)
globals <- list(b = 2)
pprof(
  replicate(
    1e3,
    crew_eval(command = command, data = data, globals = globals)
  )
)

the flame graph looks like this:

Screenshot 2023-04-07 at 9 43 28 AM

So it looks like the bottleneck now is indeed setting and restoring global options. I think relying on mirai for global options specifically would be a demonstrable performance gain, and if feasible, I will have crew always leverage this. After that, it would be nice to let the user separately choose whether to unload packages and whether to call GC, that would simplify the signatures of both crew_eval() and controller$push().

@wlandau
Copy link
Owner

wlandau commented Apr 7, 2023

If you go forward with the bitmask approach for cleanup, I will add logical arguments garbage_collection and unload_packages to the crew launcher abstract class and then create the value of the bitmask internally.

@wlandau
Copy link
Owner

wlandau commented Apr 7, 2023

I also wonder if you think there may be a faster/native way to restore the RNG state after each task. Currently I am using withr::local_seed(), which is robust, but it shows up on the profiling studies. If I could drop it, then it could be a bit of a performance boost, and I could drop the dependency on withr.

@shikokuchuo
Copy link
Contributor Author

shikokuchuo commented Apr 8, 2023

If you go forward with the bitmask approach for cleanup, I will add logical arguments garbage_collection and unload_packages to the crew launcher abstract class and then create the value of the bitmask internally.

Implemented in mirai 0.8.2.9019. It's a simple additive bitmask, you simply assign the options values 1,2,4,8 and sum them.

I am not entirely sure this works for you as resetting options without unloading packages could be an issue - mirai can only set the options to an initial state before any user packages have been loaded.

Nevertheless, it doesn't hurt to offer the flexibility - there is enough of a disclaimer in the docs.

@shikokuchuo
Copy link
Contributor Author

shikokuchuo commented Apr 8, 2023

I also wonder if you think there may be a faster/native way to restore the RNG state after each task. Currently I am using withr::local_seed(), which is robust, but it shows up on the profiling studies. If I could drop it, then it could be a bit of a performance boost, and I could drop the dependency on withr.

I've had a look at your use of withr::local_seed() and I am wondering why you need to restore the RNG state after each eval. Otherwise you would be able to just use set.seed().

The default behaviour of mirai is to actually change the RNG state after each eval as it removes .Random.seed as part of the globals cleanup.

wlandau-lilly added a commit that referenced this issue Apr 10, 2023
wlandau-lilly added a commit that referenced this issue Apr 10, 2023
@HenrikBengtsson
Copy link

FYI, be careful when resetting R options.

There are cases where removing an option wreaks havoc. For example, in R (< 4.3.0), there were several built-in R options that could not be reset (https://bugs.r-project.org/show_bug.cgi?id=18372). There are also packages out there that rely on their options not being removed, e.g. truecluster/ff#14.

This is why I rolled back automatic setting of R options in future a while back (https://github.com/HenrikBengtsson/future/blob/f7c11acb4d2260e882dcd3b392b905c82709d5ae/R/expressions.R#L32-L50) - the risk of breaking things is simply too great.

PS. It feels like I've already mentioned this at some point, but in case I just made that memory up, I'm posting this comment.

@wlandau
Copy link
Owner

wlandau commented May 2, 2023

Thanks for the note and the examples. Thankfully @shikokuchuo also warned me about this and so crew has a note in the docs.

crew/R/crew_launcher.R

Lines 48 to 51 in ace596d

#' @param reset_options `TRUE` to reset global options to their original
#' state between each task, `FALSE` otherwise. It is recommended to
#' only set `reset_options = TRUE` if `reset_packages` is also `TRUE`
#' because packages sometimes rely on options they set at loading time.

I am not sure built-in base options would be affected because the original state should already have the right values, but maybe I am missing something.

From a package development perspective, it seems like good practice to gracefully handle missing options (or not rely on them at all; targets::tar_option_set() modifies an internal mutable object in the package, and tar_option_get() handles nulls). But I do recognize that many packages do this anyway and it is out of my control.

@HenrikBengtsson
Copy link

I am not sure built-in base options would be affected because the original state should already have the right values, but maybe I am missing something.

Correct, but only since R 4.3.0.

@shikokuchuo
Copy link
Contributor Author

shikokuchuo commented May 3, 2023

Also just to note on this, options cleanup in mirai is light-touch. It only reverts the options that are present at the beginning of the R session. So if a package adds additional options that it relies on, those are actually not removed as part of the cleanup, reducing the chances of breakages in this respect even if the note is not heeded.

The implied assumption is that any new options set by packages would be specific to that package - which again isn't enforceable, but should be true in general - any conflict would be no different to if both namespaces were loaded at the same time, so I contend nothing to be done specifically here.

Also, it should not run into the R (< 4.3.0) bug as it would not be setting any options to NULL.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants