Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

across(): Function evaluation seems to happen with no context / wrong context? #1373

Closed
lschneiderbauer opened this issue Oct 5, 2023 · 3 comments
Milestone

Comments

@lschneiderbauer
Copy link

lschneiderbauer commented Oct 5, 2023

Hello,

Using across() together with a function factory that produces functions with some context information seems to produce incorrect SQL code.

The reprex below also contains the dplyr version where everything works as expected: across() evaluates the functions within the correct context (in this case the variable exp is from the context). Using a lazy table exp is not evaluated at all, and so all functions produce the same (incorrect) output.

library(dplyr)
#> 
#> Attache Paket: 'dplyr'
#> Die folgenden Objekte sind maskiert von 'package:stats':
#> 
#>     filter, lag
#> Die folgenden Objekte sind maskiert von 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(purrr)
library(dbplyr)
#> 
#> Attache Paket: 'dbplyr'
#> Die folgenden Objekte sind maskiert von 'package:dplyr':
#> 
#>     ident, sql

power1 <- function(exp) {
  force(exp)
  function(x) {
    x^exp
  }
}

fn_lst <-
  map(
    1:4,
    function(exp) {
      force(exp)
      function(x) {
        x^exp
      }
    }
  )

###################################
# Data frame works:

df <- tibble(x = 2)
df |>
  summarize(
    across(x, .fns = fn_lst)
  )
#> # A tibble: 1 × 4
#>     x_1   x_2   x_3   x_4
#>   <dbl> <dbl> <dbl> <dbl>
#> 1     2     4     8    16

###################################
# Lazy table does not work:


ldf <- tbl_lazy(df, dbplyr::simulate_hana())

ldf |>
  summarize(
    across(x, .fns = fn_lst)
  )
#> <SQL>
#> SELECT
#>   POWER(`x`, `exp`) AS `x_1`,
#>   POWER(`x`, `exp`) AS `x_2`,
#>   POWER(`x`, `exp`) AS `x_3`,
#>   POWER(`x`, `exp`) AS `x_4`
#> FROM `df`


# I expect:
# <SQL>
# SELECT
#   POWER(`x`, 1) AS `x_1`,
#   POWER(`x`, 2) AS `x_2`,
#   POWER(`x`, 3) AS `x_3`,
#   POWER(`x`, 4) AS `x_4`
# FROM `df`

Created on 2023-10-05 with reprex v2.0.2

@mgirlich mgirlich added this to the 2.5.0 milestone Oct 27, 2023
@hadley
Copy link
Member

hadley commented Nov 2, 2023

Somewhat more minimal reprex:

library(dbplyr)
library(dplyr, warn.conflicts = FALSE)

power1 <- function(exp) {
  force(exp)
  function(x) x^exp
}
fns <- lapply(1:4, power1)

df <- lazy_frame(x = 1:10)
df |> summarize(across(x, fns))
#> <SQL>
#> SELECT
#>   POWER(`x`, `exp`) AS `x_1`,
#>   POWER(`x`, `exp`) AS `x_2`,
#>   POWER(`x`, `exp`) AS `x_3`,
#>   POWER(`x`, `exp`) AS `x_4`
#> FROM `df`

Created on 2023-11-02 with reprex v2.0.2

I'm not sure dbplyr can do much here since it would require evaluating the bodies of functions in a way that I suspect is going to be very complex.

Instead of relying on function scoping, you could instead generate functions that inline the value of exp, e.g.

library(rlang)
library(dbplyr)
library(dplyr, warn.conflicts = FALSE)

power1 <- function(exp) {
  inject(function(x) x ^ !!exp)
}
fns <- lapply(1:4, power1)

df <- lazy_frame(x = 1:10)
df |> summarize(across(x, fns))
#> <SQL>
#> SELECT
#>   POWER(`x`, 1) AS `x_1`,
#>   POWER(`x`, 2) AS `x_2`,
#>   POWER(`x`, 3) AS `x_3`,
#>   POWER(`x`, 4) AS `x_4`
#> FROM `df`

Created on 2023-11-02 with reprex v2.0.2

The downsidse of this technique is the generated functions don't have useful srcrefs so they don't look correct when you print them. I've filed r-lib/rlang#1665 to see if we can do better in rlang.

@hadley hadley closed this as completed Nov 2, 2023
@lschneiderbauer
Copy link
Author

lschneiderbauer commented Nov 6, 2023

Thank you @hadley for your workaround.

Just one other thing that bothers me: If I inline fns in your reprex I get an error, see my reprex. Shouldn't the outcome of those two versions be completely equivalent?

library(rlang)
library(dbplyr)
library(dplyr, warn.conflicts = FALSE)

df <- lazy_frame(x = 1:10)
df |>
  summarize(
    across(
      x,
      lapply(
        1:4,
        function(exp) {
          inject(function(x) x^!!exp)
        }
      )
    )
  )
#> Error in UseMethod("escape"): nicht anwendbare Methode für 'escape' auf Objekt der Klasse "function" angewendet

Created on 2023-11-06 with reprex v2.0.2

@hadley
Copy link
Member

hadley commented Nov 6, 2023

The problem is when !! gets evaluated; if you embed it inside summarise(), it attempts to do the unquoting before inject() sees it. We hope to one day fix this (r-lib/rlang#845).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants