Preserve classes in pmap() #554

mikmart · 2018-09-26T19:43:46Z

Following this suggestion, changed the calls constructed by pmap() to use two double brackets for indexing rather than a single double bracket with a vector argument. This retains class for S3 atomic vector types (whose [[ method preserves the class) in the input. Fixes #358.

However, list-based S3 classes, e.g. POSIXlt still fail.

mungojam · 2018-09-27T06:51:24Z

Does this cover pwalk too?

mikmart · 2018-09-27T06:54:36Z

@mungojam not as it is, as (I think) the problem with pwalk() comes from the implementation of transpose(). But maybe this should, I'll have a look.

mikmart · 2018-09-27T07:28:30Z

Okay I don't have any ideas on fixing transpose() and thus pwalk() directly. However, if we just rewrite pwalk() using pmap() (latest commit) it should also be fixed.

@hadley do you think it's OK to change pwalk() to call pmap()? Looks like they have been separate from the beginning but I don't see why.

hadley · 2018-09-27T12:03:12Z

src/map.c

@@ -174,7 +174,7 @@ SEXP pmap_impl(SEXP env, SEXP l_name_, SEXP f_name_, SEXP type_) {
  SEXP i = Rf_install("i");
  SEXP one = PROTECT(Rf_ScalarInteger(1));

-  // Construct call like f(.x[[c(1, i)]], .x[[c(2, i)]], ...)
+  // Construct call like f(.x[[1]][[i]], .x[[2]][[i]], ...)


Can you add a comment here saying that x[[c(1, i)]] is buggy for S3 vectors

hadley · 2018-09-27T12:04:02Z

tests/testthat/test-map_n.R

@@ -60,3 +60,17 @@ test_that("pmap on data frames performs rowwise operations", {
  expect_is(pmap_chr(mtcars2, paste), "character")
  expect_is(pmap_raw(mtcars2, function(mpg, cyl) as.raw(cyl)), "raw")
 })
+
+test_that("preserves S3 class", {
+  fctr <- factor("a")


I think you only need to test a single S3 class here, since the other options don't cause the tests to exercise any more code.

hadley · 2018-09-27T12:06:27Z

The downside of using the same code is that it makes pwalk() less efficient than it could be, because it has to initialise and store the results of each call, even when that isn't necessary. But I could probably be persuaded if you did a little benchmarking showing that had hardly any affect.

OTOH I think transpose() will need a complete rewrite, once I've fully internalised the lessons from vctrs, so it's also ok to leave as is for now, and we'll fix more holistically later.

mikmart · 2018-09-27T19:18:07Z

Oh yeah I see now, thanks for the explanation! I'll do some benchmarks to see what the hit would be like.

mikmart · 2018-09-28T07:06:14Z

@hadley here's one benchmark (with a contrived use-case). Looks like using pmap() is actually a little faster:

library(purrr)

pwalk2 <- function(.l, .f, ...) {
  pmap(.l, .f, ...)
  invisible(.l)
}

res <- bench::press(
  i = 0:6,
  {
    args <- list(n = rep(10^(6 - i), 10^i))
    bench::mark(
      pwalk(args, runif),
      pwalk2(args, runif)
    )
  }
)

res
#> # A tibble: 14 x 15
#>    expression     i      min     mean   median      max `itr/sec` mem_alloc
#>  * <chr>      <int> <bch:tm> <bch:tm> <bch:tm> <bch:tm>     <dbl> <bch:byt>
#>  1 pwalk(arg~     0  29.11ms  29.77ms  29.39ms  32.86ms    33.6      7.63MB
#>  2 pwalk2(ar~     0  29.14ms  31.42ms  29.52ms  57.91ms    31.8      7.63MB
#>  3 pwalk(arg~     1  29.19ms  30.18ms  30.28ms  33.13ms    33.1      7.65MB
#>  4 pwalk2(ar~     1  28.62ms  29.35ms  29.02ms  31.57ms    34.1      7.65MB
#>  5 pwalk(arg~     2   31.6ms  32.55ms  32.02ms  36.14ms    30.7      7.88MB
#>  6 pwalk2(ar~     2  28.64ms  29.25ms  28.94ms  31.24ms    34.2      7.88MB
#>  7 pwalk(arg~     3  32.46ms  33.03ms  32.81ms  35.63ms    30.3     10.12MB
#>  8 pwalk2(ar~     3  30.42ms  31.07ms  30.93ms  32.99ms    32.2     10.12MB
#>  9 pwalk(arg~     4  61.77ms  62.77ms  62.77ms  63.77ms    15.9     32.54MB
#> 10 pwalk2(ar~     4  47.58ms  49.49ms  48.24ms  56.41ms    20.2      32.5MB
#> 11 pwalk(arg~     5 560.56ms 560.56ms 560.56ms 560.56ms     1.78   244.52MB
#> 12 pwalk2(ar~     5 233.72ms 247.71ms 252.81ms  256.6ms     4.04   244.14MB
#> 13 pwalk(arg~     6    4.74s    4.74s    4.74s    4.74s     0.211    2.39GB
#> 14 pwalk2(ar~     6    2.36s    2.36s    2.36s    2.36s     0.423    2.38GB
#> # ... with 7 more variables: n_gc <dbl>, n_itr <int>, total_time <bch:tm>,
#> #   result <list>, memory <list>, time <list>, gc <list>

Real use-cases would probably include I/O operations, where I would imagine the time taken to interact with the output device would dominate the results anyway; so I didn't continue with testing as it looks like the pmap() implementation is noticeably faster in this simple case already.

(I have to say though, I find the memory allocation results quite strange.. Considering that each run has a fixed number (1e6) of doubles being created. The larger lists with larger i of course account for something, but a 300x increase..?)

mikmart · 2018-09-28T09:50:38Z

Okay I also tried some I/O by saving plots. Very little difference here, too:

library(purrr)

pwalk2 <- function(.l, .f, ...) {
  pmap(.l, .f, ...)
  invisible(.l)
}

f <- function(file, n) {
  pdf(file)
  plot(runif(n))
  on.exit({dev.off(); unlink(file)})
}

bench::press(
  i = 0:3,
  {
    n <- rep(10^(4 - i), 10^i)
    tf <- map_chr(seq_along(n), ~ tempfile())
    args <- list(file = tf, n = n)
    bench::mark(
      pwalk(args, f),
      pwalk2(args, f)
    )
  }
)
#> Running with:
#>       i
#> 1     0
#> 2     1
#> 3     2
#> 4     3
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.
#> # A tibble: 8 x 11
#>   expression     i      min     mean   median      max `itr/sec` mem_alloc
#>   <chr>      <int> <bch:tm> <bch:tm> <bch:tm> <bch:tm>     <dbl> <bch:byt>
#> 1 pwalk(arg~     0  35.89ms  40.62ms  38.12ms   57.5ms    24.6      1.56MB
#> 2 pwalk2(ar~     0  34.39ms  36.94ms  35.98ms  41.56ms    27.1    807.95KB
#> 3 pwalk(arg~     1  53.31ms  58.11ms     58ms  63.17ms    17.2      1.41MB
#> 4 pwalk2(ar~     1  52.56ms  54.72ms  54.18ms  60.73ms    18.3      1.41MB
#> 5 pwalk(arg~     2 299.54ms 299.54ms 299.54ms 299.54ms     3.34     7.88MB
#> 6 pwalk2(ar~     2  280.8ms  280.8ms  280.8ms  280.8ms     3.56     7.88MB
#> 7 pwalk(arg~     3     2.4s     2.4s     2.4s     2.4s     0.416   71.37MB
#> 8 pwalk2(ar~     3    2.43s    2.43s    2.43s    2.43s     0.412   71.36MB
#> # ... with 3 more variables: n_gc <dbl>, n_itr <int>, total_time <bch:tm>

Created on 2018-09-28 by the reprex package (v0.2.0.9000).

hadley · 2018-09-28T13:53:06Z

Ok, that's compelling!

Do you want to change the walk2() and walk() to wrappers around the map functions in order to be consistent?

mikmart · 2018-09-28T16:38:11Z

Absolutely. Should be good now.

lionel- · 2018-09-28T16:42:37Z

This will make walk() use more memory when the function produces large objects, but for () loops can be used in that case as well, or the function can be changed to avoid returning the large object.

mikmart · 2018-09-28T17:22:25Z

Ah yeah the benchmark memory allocation numbers don't tell the whole story here as they don't show peak memory demand, just the total allocated.

mikmart · 2018-11-04T20:51:14Z

Perhaps the walk() changes should be decoupled from this PR? I could revert this to just include the pmap() fix.

hadley

I think the benefits of using the same code for pwalk() and pmap() outweight the small downside.

Constructing a call like .l[[c(j, i)]] drops attributes while .l[[j]][[i]] preserves them. Write pwalk() and walk() in terms of pmap() and map()

lionel- · 2018-11-15T16:44:43Z

Thanks!

mikmart force-pushed the pmap branch from 342f4dc to 6888e65 Compare September 27, 2018 07:00

hadley reviewed Sep 27, 2018

View reviewed changes

cderv mentioned this pull request Oct 9, 2018

pmap over factors? #561

Closed

hadley approved these changes Nov 5, 2018

View reviewed changes

Use two double brackets in pmap constructed calls

1b5b534

Constructing a call like .l[[c(j, i)]] drops attributes while .l[[j]][[i]] preserves them. Write pwalk() and walk() in terms of pmap() and map()

lionel- force-pushed the pmap branch from 17c2e81 to 1b5b534 Compare November 15, 2018 16:24

lionel- merged commit fe5ee0c into tidyverse:master Nov 15, 2018

lionel- mentioned this pull request Nov 16, 2018

walk2 unclasses dates but other functions don't #524

Closed

lionel- mentioned this pull request Jan 4, 2019

Unexpected behavior when using pmap with factors #611

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Preserve classes in pmap() #554

Preserve classes in pmap() #554

mikmart commented Sep 26, 2018 •

edited

Loading

mungojam commented Sep 27, 2018

mikmart commented Sep 27, 2018 •

edited

Loading

mikmart commented Sep 27, 2018 •

edited

Loading

hadley Sep 27, 2018

hadley Sep 27, 2018

hadley commented Sep 27, 2018

mikmart commented Sep 27, 2018 •

edited

Loading

mikmart commented Sep 28, 2018 •

edited

Loading

mikmart commented Sep 28, 2018 •

edited

Loading

hadley commented Sep 28, 2018

mikmart commented Sep 28, 2018

lionel- commented Sep 28, 2018

mikmart commented Sep 28, 2018 •

edited

Loading

mikmart commented Nov 4, 2018

hadley left a comment

lionel- commented Nov 15, 2018

Preserve classes in pmap() #554

Preserve classes in pmap() #554

Conversation

mikmart commented Sep 26, 2018 • edited Loading

mungojam commented Sep 27, 2018

mikmart commented Sep 27, 2018 • edited Loading

mikmart commented Sep 27, 2018 • edited Loading

hadley Sep 27, 2018

Choose a reason for hiding this comment

hadley Sep 27, 2018

Choose a reason for hiding this comment

hadley commented Sep 27, 2018

mikmart commented Sep 27, 2018 • edited Loading

mikmart commented Sep 28, 2018 • edited Loading

mikmart commented Sep 28, 2018 • edited Loading

hadley commented Sep 28, 2018

mikmart commented Sep 28, 2018

lionel- commented Sep 28, 2018

mikmart commented Sep 28, 2018 • edited Loading

mikmart commented Nov 4, 2018

hadley left a comment

Choose a reason for hiding this comment

lionel- commented Nov 15, 2018

mikmart commented Sep 26, 2018 •

edited

Loading

mikmart commented Sep 27, 2018 •

edited

Loading

mikmart commented Sep 27, 2018 •

edited

Loading

mikmart commented Sep 27, 2018 •

edited

Loading

mikmart commented Sep 28, 2018 •

edited

Loading

mikmart commented Sep 28, 2018 •

edited

Loading

mikmart commented Sep 28, 2018 •

edited

Loading