Calling .traceback() within future_Map function call can slow things down terribly depending on data size #92

Open
arunsrinivasan opened this issue Aug 27, 2021 · 0 comments


I recently added a feature to my logger that uses .traceback() to fetch the name and line number of the function that issued the logging call. Some time later, I realised this slows things down drastically when the code is run through future_Map(), whether sequentially or in parallel. I am not sure whether other future_* functions (e.g., future_lapply()) are affected in the same way.

Took me a while to figure out and come up with a small example, but here it is:

require(future)
require(future.apply)

myDF <- function(...) {
    t <- system.time(x <- .traceback(x=1))[["elapsed"]]
    cat("time: ", t, ", length: ", length(x), ", size_mb: ", object.size(x)/1024/1024, "\n", sep="")
    # print(x) # prints the entire call stack where the data is completely populated when called from future_Map
    base::data.frame(...)
}
 
ll <- replicate(3, sample(10, 1e6, TRUE), simplify=FALSE)
system.time(Map(function(x, y) myDF(x, y), ll, ll))
# time: 0, length: 6, size_mb: 0.002937317
# time: 0, length: 6, size_mb: 0.002937317
# time: 0, length: 6, size_mb: 0.002937317
#    user  system elapsed
#    0.15    0.00    0.16

system.time(future_Map(function(x, y) myDF(x, y), ll, ll))
# time: 4.24, length: 27, size_mb: 53.27061
# time: 4.15, length: 27, size_mb: 53.27061
# time: 4.18, length: 27, size_mb: 53.27061
#    user  system elapsed
#   13.01    0.03   13.05

If you uncomment the print(x) statement and run both versions, you'll see the difference in how the calls are populated. This is probably due to the way do.call() is used to generate the mapply call: do.call(mapply, args=args). I suspect every element of args is evaluated before the call is built, so the call contains materialised objects rather than just symbols/expressions.
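
To illustrate the suspicion above outside of future.apply, here is a minimal base-R sketch. It is not the package's actual code path. The helper stack_size() and the variable names are hypothetical and exist only for this demonstration. It measures the size of the call stack via sys.calls() rather than timing .traceback(), on the assumption that the slowdown comes from deparsing these large calls.

```r
# Hypothetical helper: size in bytes of the call stack as seen from inside
# the function. The arguments are deliberately never evaluated.
stack_size <- function(...) {
  as.numeric(object.size(sys.calls()))
}

big <- sample(10, 1e6, TRUE)  # about 4 MB of integers

# Direct call: the recorded call holds the symbol `big`, not its value.
direct <- stack_size(big)

# do.call() with an already-evaluated argument list: the value itself is
# embedded in the constructed call, so the stack carries the full vector.
via_do_call <- do.call(stack_size, args = list(big))

# Passing a symbol instead keeps the constructed call small.
via_symbol <- do.call(stack_size, args = list(quote(big)))

cat("direct:      ", direct, " bytes\n", sep = "")
cat("via do.call: ", via_do_call, " bytes\n", sep = "")
cat("via symbol:  ", via_symbol, " bytes\n", sep = "")
```

If the do.call() variant reports a stack of several megabytes while the other two stay small, that matches the size_mb difference between Map and future_Map in the output above.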
