-
Notifications
You must be signed in to change notification settings - Fork 274
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Preserve classes in pmap() #554
Conversation
Does this cover |
@mungojam not as it is, as (I think) the problem with |
Okay I don't have any ideas on fixing @hadley do you think it's OK to change |
src/map.c
Outdated
@@ -174,7 +174,7 @@ SEXP pmap_impl(SEXP env, SEXP l_name_, SEXP f_name_, SEXP type_) { | |||
SEXP i = Rf_install("i"); | |||
SEXP one = PROTECT(Rf_ScalarInteger(1)); | |||
|
|||
// Construct call like f(.x[[c(1, i)]], .x[[c(2, i)]], ...) | |||
// Construct call like f(.x[[1]][[i]], .x[[2]][[i]], ...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a comment here saying that x[[c(1, i)]]
is buggy for S3 vectors
tests/testthat/test-map_n.R
Outdated
@@ -60,3 +60,17 @@ test_that("pmap on data frames performs rowwise operations", { | |||
expect_is(pmap_chr(mtcars2, paste), "character") | |||
expect_is(pmap_raw(mtcars2, function(mpg, cyl) as.raw(cyl)), "raw") | |||
}) | |||
|
|||
test_that("preserves S3 class", { | |||
fctr <- factor("a") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you only need to test a single S3 class here, since the other options don't cause the tests to exercise any more code.
The downside of using the same code is that it makes OTOH I think |
Oh yeah I see now, thanks for the explanation! I'll do some benchmarks to see what the hit would be like. |
@hadley here's one benchmark (with a contrived use-case). Looks like using library(purrr)
pwalk2 <- function(.l, .f, ...) {
pmap(.l, .f, ...)
invisible(.l)
}
res <- bench::press(
i = 0:6,
{
args <- list(n = rep(10^(6 - i), 10^i))
bench::mark(
pwalk(args, runif),
pwalk2(args, runif)
)
}
)
res
#> # A tibble: 14 x 15
#> expression i min mean median max `itr/sec` mem_alloc
#> * <chr> <int> <bch:tm> <bch:tm> <bch:tm> <bch:tm> <dbl> <bch:byt>
#> 1 pwalk(arg~ 0 29.11ms 29.77ms 29.39ms 32.86ms 33.6 7.63MB
#> 2 pwalk2(ar~ 0 29.14ms 31.42ms 29.52ms 57.91ms 31.8 7.63MB
#> 3 pwalk(arg~ 1 29.19ms 30.18ms 30.28ms 33.13ms 33.1 7.65MB
#> 4 pwalk2(ar~ 1 28.62ms 29.35ms 29.02ms 31.57ms 34.1 7.65MB
#> 5 pwalk(arg~ 2 31.6ms 32.55ms 32.02ms 36.14ms 30.7 7.88MB
#> 6 pwalk2(ar~ 2 28.64ms 29.25ms 28.94ms 31.24ms 34.2 7.88MB
#> 7 pwalk(arg~ 3 32.46ms 33.03ms 32.81ms 35.63ms 30.3 10.12MB
#> 8 pwalk2(ar~ 3 30.42ms 31.07ms 30.93ms 32.99ms 32.2 10.12MB
#> 9 pwalk(arg~ 4 61.77ms 62.77ms 62.77ms 63.77ms 15.9 32.54MB
#> 10 pwalk2(ar~ 4 47.58ms 49.49ms 48.24ms 56.41ms 20.2 32.5MB
#> 11 pwalk(arg~ 5 560.56ms 560.56ms 560.56ms 560.56ms 1.78 244.52MB
#> 12 pwalk2(ar~ 5 233.72ms 247.71ms 252.81ms 256.6ms 4.04 244.14MB
#> 13 pwalk(arg~ 6 4.74s 4.74s 4.74s 4.74s 0.211 2.39GB
#> 14 pwalk2(ar~ 6 2.36s 2.36s 2.36s 2.36s 0.423 2.38GB
#> # ... with 7 more variables: n_gc <dbl>, n_itr <int>, total_time <bch:tm>,
#> # result <list>, memory <list>, time <list>, gc <list> Real use-cases would probably include I/O operations, where I would imagine the time taken to interact with the output device would dominate the results anyway; so I didn't continue with testing as it looks like the (I have to say though, I find the memory allocation results quite strange.. Considering that each run has a fixed number (1e6) of doubles being created. The larger lists with larger |
Okay I also tried some I/O by saving plots. Very little difference here, too: library(purrr)
pwalk2 <- function(.l, .f, ...) {
pmap(.l, .f, ...)
invisible(.l)
}
f <- function(file, n) {
pdf(file)
plot(runif(n))
on.exit({dev.off(); unlink(file)})
}
bench::press(
i = 0:3,
{
n <- rep(10^(4 - i), 10^i)
tf <- map_chr(seq_along(n), ~ tempfile())
args <- list(file = tf, n = n)
bench::mark(
pwalk(args, f),
pwalk2(args, f)
)
}
)
#> Running with:
#> i
#> 1 0
#> 2 1
#> 3 2
#> 4 3
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.
#> # A tibble: 8 x 11
#> expression i min mean median max `itr/sec` mem_alloc
#> <chr> <int> <bch:tm> <bch:tm> <bch:tm> <bch:tm> <dbl> <bch:byt>
#> 1 pwalk(arg~ 0 35.89ms 40.62ms 38.12ms 57.5ms 24.6 1.56MB
#> 2 pwalk2(ar~ 0 34.39ms 36.94ms 35.98ms 41.56ms 27.1 807.95KB
#> 3 pwalk(arg~ 1 53.31ms 58.11ms 58ms 63.17ms 17.2 1.41MB
#> 4 pwalk2(ar~ 1 52.56ms 54.72ms 54.18ms 60.73ms 18.3 1.41MB
#> 5 pwalk(arg~ 2 299.54ms 299.54ms 299.54ms 299.54ms 3.34 7.88MB
#> 6 pwalk2(ar~ 2 280.8ms 280.8ms 280.8ms 280.8ms 3.56 7.88MB
#> 7 pwalk(arg~ 3 2.4s 2.4s 2.4s 2.4s 0.416 71.37MB
#> 8 pwalk2(ar~ 3 2.43s 2.43s 2.43s 2.43s 0.412 71.36MB
#> # ... with 3 more variables: n_gc <dbl>, n_itr <int>, total_time <bch:tm> Created on 2018-09-28 by the reprex package (v0.2.0.9000). |
Ok, that's compelling! Do you want to change the |
Absolutely. Should be good now. |
This will make |
Ah yeah the benchmark memory allocation numbers don't tell the whole story here as they don't show peak memory demand, just the total allocated. |
Perhaps the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the benefits of using the same code for pwalk()
and pmap()
outweight the small downside.
Constructing a call like .l[[c(j, i)]] drops attributes while .l[[j]][[i]] preserves them. Write pwalk() and walk() in terms of pmap() and map()
Thanks! |
Following this suggestion, changed the calls constructed by
pmap()
to use two double brackets for indexing rather than a single double bracket with a vector argument. This retains class for S3 atomic vector types (whose[[
method preserves the class) in the input. Fixes #358.However, list-based S3 classes, e.g.
POSIXlt
still fail.