You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
In the following reprex I show a very simple SQL query that I can't figure out how to replicate using dbplyr translation.
The operation is to calculate a random number based on the value in an existing column. Because there is no vectorized function for this in R, using dplyr I need to either call map or use rowwise. However, neither of these translates properly to SQL.
This is a general issue in that SQL is naturally rowwise, while R tries to be vectorized as much as possible for efficiency.
library(dbplyr)
library(tidyverse)
### Create in-memory DBcon= dbConnect(RSQLite::SQLite(), location=":memory:")
#> Error in dbConnect(RSQLite::SQLite(), location = ":memory:"): could not find function "dbConnect"### Create local tabledf= tibble(A=1:10)
### Write table to database
dbWriteTable(con, "tmp", df, overwrite=T)
#> Error in dbWriteTable(con, "tmp", df, overwrite = T): could not find function "dbWriteTable"### Read lazy table from DBdbf= tbl(con, "tmp")
#> Error in eval(expr, envir, enclos): object 'con' not found### Perform operation in the database:### Create random integer based on value in column A
dbGetQuery(con, "select A, abs(random() % A) as B from tmp")
#> Error in dbGetQuery(con, "select A, abs(random() % A) as B from tmp"): could not find function "dbGetQuery"### Try doing the same with local tabledf %>% mutate(B= sample(1:A, 1))
#> Warning: There was 1 warning in `mutate()`.#> ℹ In argument: `B = sample(1:A, 1)`.#> Caused by warning in `1:A`:#> ! numerical expression has 10 elements: only the first used#> # A tibble: 10 × 2#> A B#> <int> <int>#> 1 1 1#> 2 2 1#> 3 3 1#> 4 4 1#> 5 5 1#> 6 6 1#> 7 7 1#> 8 8 1#> 9 9 1#> 10 10 1df %>% mutate(across(A, ~sample(1:.x, 1), .names="B"))
#> Warning: There was 1 warning in `mutate()`.#> ℹ In argument: `across(A, ~sample(1:.x, 1), .names = "B")`.#> Caused by warning in `1:A`:#> ! numerical expression has 10 elements: only the first used#> # A tibble: 10 × 2#> A B#> <int> <int>#> 1 1 1#> 2 2 1#> 3 3 1#> 4 4 1#> 5 5 1#> 6 6 1#> 7 7 1#> 8 8 1#> 9 9 1#> 10 10 1df %>% rowwise() %>% mutate(B= sample(1:A, 1))
#> # A tibble: 10 × 2#> # Rowwise: #> A B#> <int> <int>#> 1 1 1#> 2 2 1#> 3 3 3#> 4 4 4#> 5 5 5#> 6 6 1#> 7 7 4#> 8 8 6#> 9 9 2#> 10 10 5df %>% mutate(B= map_dbl(A, ~sample(1:.x, 1)))
#> # A tibble: 10 × 2#> A B#> <int> <dbl>#> 1 1 1#> 2 2 2#> 3 3 1#> 4 4 4#> 5 5 2#> 6 6 1#> 7 7 1#> 8 8 5#> 9 9 3#> 10 10 4### Try doing the same with lazy tabledbf %>% mutate(B= sample(1:A, 1)) %>% show_query()
#> Error in eval(expr, envir, enclos): object 'dbf' not founddbf %>% mutate(across(A, ~sample(1:.x, 1), .names="B")) %>% show_query()
#> Error in eval(expr, envir, enclos): object 'dbf' not founddbf %>% rowwise() %>% mutate(B= sample(1:A, 1)) %>% show_query()
#> Error in eval(expr, envir, enclos): object 'dbf' not founddbf %>% mutate(B= map_dbl(A, ~sample(1:., 1))) %>% show_query()
#> Error in eval(expr, envir, enclos): object 'dbf' not found
Quite honestly, my preference would be for dplyr to, by default, map non-vectorized operations. I'm having trouble thinking of a situation where the meaning isn't implicitly clear. That doesn't mean there aren't.
Alternatively, find a way to deal with the situation where rowwise is used so that code can be used for both local and lazy tables.
Or maybe in dplyr have a way to explicitly specify whether a given operation is to be done row-wise or column-wise. Consider this:
I'm having a hard time following your reprex as there seems to be a lot of content that's not germane to the problem. But lets start with a simple problem: what does abs(random() % A) do?
In the following reprex I show a very simple SQL query that I can't figure out how to replicate using
dbplyr
translation.The operation is to calculate a random number based on the value in an existing column. Because there is no vectorized function for this in R, using
dplyr
I need to either callmap
or userowwise
. However, neither of these translates properly to SQL.This is a general issue in that SQL is naturally rowwise, while R tries to be vectorized as much as possible for efficiency.
Created on 2023-10-25 with reprex v2.0.2
Quite honestly, my preference would be for dplyr to, by default,
map
non-vectorized operations. I'm having trouble thinking of a situation where the meaning isn't implicitly clear. That doesn't mean there aren't.Alternatively, find a way to deal with the situation where
rowwise
is used so that code can be used for both local and lazy tables.Or maybe in
dplyr
have a way to explicitly specify whether a given operation is to be done row-wise or column-wise. Consider this:Created on 2023-10-25 with reprex v2.0.2
I think the following should produce the outputs shown:
There would be no question how those are to be evaluated or translated to SQL
The text was updated successfully, but these errors were encountered: