Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tbl_sql() incorrectly fails via check_dots_used() with duckdb #1384

Closed
krlmlr opened this issue Oct 27, 2023 · 10 comments · Fixed by #1429
Closed

tbl_sql() incorrectly fails via check_dots_used() with duckdb #1384

krlmlr opened this issue Oct 27, 2023 · 10 comments · Fixed by #1429
Labels
bug an unexpected problem or unintended behavior

Comments

@krlmlr
Copy link
Member

krlmlr commented Oct 27, 2023

From https://github.com/tidyverse/dbplyr/pull/1222/files#r1374651419.

What are we missing?

options(conflicts.policy = list(warn = FALSE))
con <- DBI::dbConnect(duckdb::duckdb())

parquet <- tempfile(fileext = ".parquet")
arrow::write_parquet(data.frame(a = 1), parquet)

dplyr::tbl(con, parquet, check_from = FALSE, cache = TRUE)
#> Error in `tbl_sql()`:
#> ! Arguments in `...` must be used.
#> ✖ Problematic argument:
#> • cache = cache
#> ℹ Did you misspell an argument name?
#> Backtrace:
#>      ▆
#>   1. ├─dplyr::tbl(con, parquet, check_from = FALSE, cache = TRUE)
#>   2. ├─duckdb:::tbl.duckdb_connection(...)
#>   3. ├─base::NextMethod("tbl")
#>   4. └─dplyr:::tbl.DBIConnection(con, parquet, check_from = FALSE, cache = TRUE)
#>   5.   ├─dplyr::tbl(...)
#>   6.   └─dbplyr:::tbl.src_dbi(...)
#>   7.     └─dbplyr::tbl_sql(c(subclass, "dbi"), src = src, from = from, ...)
#>   8.       └─rlang (local) `<fn>`()
#>   9.         └─rlang:::check_dots(env, error, action, call)
#>  10.           └─rlang:::action_dots(...)
#>  11.             ├─base (local) try_dots(...)
#>  12.             └─rlang (local) action(...)

DBI::dbDisconnect(con, shutdown = TRUE)

Created on 2023-10-27 with reprex v2.0.2

Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.3.1 (2023-06-16)
#>  os       macOS Ventura 13.6
#>  system   aarch64, darwin20
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       Europe/Zurich
#>  date     2023-10-27
#>  pandoc   3.1.3 @ /opt/homebrew/bin/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version  date (UTC) lib source
#>  arrow         13.0.0.1 2023-09-22 [1] RSPM
#>  assertthat    0.2.1    2019-03-21 [1] CRAN (R 4.3.0)
#>  bit           4.0.5    2022-11-15 [1] CRAN (R 4.3.0)
#>  bit64         4.0.5    2020-08-30 [1] CRAN (R 4.3.0)
#>  cli           3.6.1    2023-03-23 [1] CRAN (R 4.3.0)
#>  DBI           1.1.3    2022-06-18 [1] CRAN (R 4.3.0)
#>  dbplyr        2.4.0    2023-10-26 [1] CRAN (R 4.3.1)
#>  digest        0.6.33   2023-07-07 [1] RSPM
#>  dplyr         1.1.3    2023-09-03 [1] CRAN (R 4.3.0)
#>  duckdb        0.9.1    2023-10-13 [1] CRAN (R 4.3.1)
#>  evaluate      0.22     2023-09-29 [1] CRAN (R 4.3.1)
#>  fansi         1.0.5    2023-10-08 [1] CRAN (R 4.3.1)
#>  fastmap       1.1.1    2023-02-24 [1] CRAN (R 4.3.0)
#>  fs            1.6.3    2023-07-20 [1] CRAN (R 4.3.0)
#>  generics      0.1.3    2022-07-05 [1] RSPM
#>  glue          1.6.2    2022-02-24 [1] CRAN (R 4.3.0)
#>  htmltools     0.5.6.1  2023-10-06 [1] CRAN (R 4.3.1)
#>  knitr         1.44     2023-09-11 [1] CRAN (R 4.3.0)
#>  lifecycle     1.0.3    2022-10-07 [1] RSPM
#>  magrittr      2.0.3    2022-03-30 [1] CRAN (R 4.3.0)
#>  pillar        1.9.0    2023-03-22 [1] CRAN (R 4.3.0)
#>  pkgconfig     2.0.3    2019-09-22 [1] CRAN (R 4.3.0)
#>  purrr         1.0.2    2023-08-10 [1] CRAN (R 4.3.0)
#>  R.cache       0.16.0   2022-07-21 [1] CRAN (R 4.3.0)
#>  R.methodsS3   1.8.2    2022-06-13 [1] RSPM
#>  R.oo          1.25.0   2022-06-12 [1] CRAN (R 4.3.0)
#>  R.utils       2.12.2   2022-11-11 [1] CRAN (R 4.3.0)
#>  R6            2.5.1    2021-08-19 [1] RSPM
#>  reprex        2.0.2    2022-08-17 [1] RSPM
#>  rlang         1.1.1    2023-04-28 [1] CRAN (R 4.3.0)
#>  rmarkdown     2.25     2023-09-18 [1] RSPM
#>  rstudioapi    0.15.0   2023-07-07 [1] CRAN (R 4.3.0)
#>  sessioninfo   1.2.2    2021-12-06 [1] CRAN (R 4.3.0)
#>  styler        1.10.2   2023-08-29 [1] CRAN (R 4.3.0)
#>  tibble        3.2.1    2023-03-20 [1] CRAN (R 4.3.0)
#>  tidyselect    1.2.0    2022-10-10 [1] RSPM
#>  utf8          1.2.4    2023-10-22 [1] CRAN (R 4.3.1)
#>  vctrs         0.6.4    2023-10-12 [1] CRAN (R 4.3.1)
#>  withr         2.5.1    2023-09-26 [1] CRAN (R 4.3.1)
#>  xfun          0.40     2023-08-09 [1] CRAN (R 4.3.0)
#>  yaml          2.3.7    2023-01-23 [1] CRAN (R 4.3.0)
#> 
#>  [1] /Users/kirill/Library/R/arm64/4.3/library
#>  [2] /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────
@mgirlich
Copy link
Collaborator

Hmm, maybe tbl.src_dbi() should actually use check_dots_empty() (see tidyverse/dplyr#6605). But this still wouldn't help duckdb here. I think it would make most sense to fix this in {duckdb-r}.
@krlmlr What do you think?

@krlmlr
Copy link
Member Author

krlmlr commented Oct 30, 2023

I believe check_dots_used() is mistaken, at least in our current example. We do use the argument in question.

@krlmlr
Copy link
Member Author

krlmlr commented Oct 30, 2023

I'll see if I can make a reprex, but it might take me a while.

@mgirlich
Copy link
Collaborator

I think this is an issue with check_dots_used() and NextMethod(). I remember there were some issues. Can you check if this works when using check_dots_empty() instead?

@krlmlr
Copy link
Member Author

krlmlr commented Oct 30, 2023

Why would that be better? Wouldn't it just fail?

We don't want to fail here, but support an extra argument to our method.

@mtmorgan
Copy link

mtmorgan commented Oct 30, 2023

This is also a problem for the Bioconductor package Organism.dplyr, where a method tbl.src_organism() uses an argument for computation and also calls NextMethod().

In the example below, NextMethod() passes all arguments to foo.default (not just, e.g., ...) and foo.default sees the arguments from foo.bas part of its ....

foo <- function(x, ...) UseMethod("foo")
foo.default <- function(x, ...) list(...)
foo.b <- function(x, ..., y = FALSE) NextMethod("foo")

foo(structure(1, class = "b"), y = "baz")
## $y
## [1] "baz"

@hadley
Copy link
Member

hadley commented Nov 2, 2023

Here's a very simple example illustrating that check_dots_used() seems to work fine with NextMethod():

foo <- function(x, ...) {
  rlang::check_dots_used()
  UseMethod("foo")
}

foo.bar <- function(x, ..., a = 1) {
  force(a)
  invisible()
}
bar <- structure(list(), class = "bar")
foo(bar)
foo(bar, a = 1)
foo(bar, b = 1)
#> Error in `foo()`:
#> ! Arguments in `...` must be used.
#> ✖ Problematic argument:
#> • b = 1
#> ℹ Did you misspell an argument name?
#> Backtrace:
#>     ▆
#>  1. └─global foo(bar, b = 1)
#>  2.   └─rlang (local) `<fn>`()
#>  3.     └─rlang:::check_dots(env, error, action, call)
#>  4.       └─rlang:::action_dots(...)
#>  5.         ├─base (local) try_dots(...)
#>  6.         └─rlang (local) action(...)

foo.baz <- function(x, ..., b = 1) {
  force(b)
  NextMethod()
}

baz <- structure(list(), class = c("baz", "bar"))
foo(baz)
foo(baz, a = 1)
foo(baz, b = 1)

Created on 2023-11-02 with reprex v2.0.2

@hadley
Copy link
Member

hadley commented Nov 2, 2023

But tbl_sql() doesn't actually do anything with ... so it's not surprising that we're seeing this problem.

Is the root cause that tbl.src_dbi() is passing ... to tbl_sql()? Maybe it needs explicit vars and check_from arguments?

@krlmlr in your example, who is supposed to use the cache argument?

@hadley hadley added the bug an unexpected problem or unintended behavior label Nov 2, 2023
@krlmlr
Copy link
Member Author

krlmlr commented Nov 2, 2023

I believe dplyr::tbl.DBIConnection() needs to use a version of tbl() without check_dots_used() . This is a simplified replica of the situation:

library(rlang)

tbl <- function(x, ...) {
  check_dots_used()
  UseMethod("tbl")
}

tbl.duckdb_connection <- function(x, ..., cache = FALSE) {
  force(cache)
  NextMethod()
}

tbl.DBIConnection <- function(x, ...) {
  tbl(structure(list(x), class = "src_dbi"), ...)
}

tbl.src_dbi <- function(x, ...) {
  TRUE
}

tbl(structure(list(), class = c("duckdb_connection", "DBIConnection")), cache = TRUE)
#> Error in `tbl()`:
#> ! Arguments in `...` must be used.
#> ✖ Problematic argument:
#> • cache = cache
#> ℹ Did you misspell an argument name?
#> Backtrace:
#>      ▆
#>   1. ├─global tbl(...)
#>   2. ├─global tbl.duckdb_connection(...)
#>   3. ├─base::NextMethod()
#>   4. └─global tbl.DBIConnection(...)
#>   5.   └─global tbl(structure(list(x), class = "src_dbi"), ...)
#>   6.     └─rlang (local) `<fn>`()
#>   7.       └─rlang:::check_dots(env, error, action, call)
#>   8.         └─rlang:::action_dots(...)
#>   9.           ├─base (local) try_dots(...)
#>  10.           └─rlang (local) action(...)

Created on 2023-11-02 with reprex v2.0.2

@hadley
Copy link
Member

hadley commented Dec 22, 2023

I think I see what you're saving: because tbl.DBIConnection() actually initiates a new call to tbl(), it can't tell that cache was used previously. I don't fully understand the sequence of calls across packages, but I think the simplest thing is just to remove that check in tbl_sql(); it really needs to be done higher up the stack somewhere, but I think it's probably not worth figuring out where due to the complexity of analysing the full sequence of function calls.

hadley added a commit that referenced this issue Dec 22, 2023
hadley added a commit that referenced this issue Jan 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug an unexpected problem or unintended behavior
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants