feat: experimental $sql() method for LazyFrame and DataFrame (#1065)
eitsupi authored Apr 28, 2024
1 parent 7d8ee7f commit bb3b753
Showing 12 changed files with 323 additions and 22 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/docs.yaml
@@ -103,7 +103,7 @@ jobs:
run: task build-website

- name: upload docs
if: ${{ github.event_name == 'pull_request' }}
if: always()
uses: actions/upload-artifact@v4
with:
name: docs
1 change: 1 addition & 0 deletions NEWS.md
@@ -5,6 +5,7 @@
### New features

- New method `<SQLContext>$register_globals()` (#1064).
- New experimental method `$sql()` for DataFrame and LazyFrame (#1065).

## Polars R Package 0.16.2

51 changes: 51 additions & 0 deletions R/dataframe__frame.R
@@ -2435,3 +2435,54 @@ DataFrame_clear = function(n = 0) {

out
}


# TODO: we can't use % in the SQL query
# <https://github.com/r-lib/roxygen2/issues/1616>
#' Execute a SQL query against the DataFrame
#'
#' @inherit LazyFrame_sql description details params seealso
#' @inherit pl_DataFrame return
#' @examplesIf polars_info()$features$sql
#' df1 = pl$DataFrame(
#'   a = 1:3,
#'   b = c("zz", "yy", "xx"),
#'   c = as.Date(c("1999-12-31", "2010-10-10", "2077-08-08"))
#' )
#'
#' # Query the DataFrame using SQL:
#' df1$sql("SELECT c, b FROM self WHERE a > 1")
#'
#' # Join two DataFrames using SQL.
#' df2 = pl$DataFrame(a = 3:1, d = c(125, -654, 888))
#' df1$sql(
#'   "
#'   SELECT self.*, d
#'   FROM self
#'   INNER JOIN df2 USING (a)
#'   WHERE a > 1 AND EXTRACT(year FROM c) < 2050
#'   "
#' )
#'
#' # Apply transformations to a DataFrame using SQL, aliasing "self" to "frame".
#' df1$sql(
#'   query = r"(
#'     SELECT
#'       a,
#'       MOD(a, 2) == 0 AS a_is_even,
#'       CONCAT_WS(':', b, b) AS b_b,
#'       EXTRACT(year FROM c) AS year,
#'       0::float AS 'zero'
#'     FROM frame
#'   )",
#'   table_name = "frame"
#' )
DataFrame_sql = function(query, ..., table_name = NULL, envir = parent.frame()) {
  self$lazy()$sql(
    query,
    table_name = table_name,
    envir = envir
  )$collect() |>
    result() |>
    unwrap("in $sql():")
}
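
`DataFrame_sql()` above passes its own `envir` argument on to the LazyFrame method instead of letting the inner call fall back to its default. The base-R sketch below illustrates why that forwarding matters. It does not use polars at all: `find_frames()`, `wrapper_forwarding()`, `wrapper_not_forwarding()`, and `user_code()` are hypothetical helpers invented for this illustration, with plain `data.frame`s standing in for polars frames.

```r
# Hypothetical stand-in for an environment scan such as register_globals():
# return the names of all data.frames bound in `envir`.
find_frames = function(envir = parent.frame()) {
  nms = ls(envir)
  Filter(function(nm) is.data.frame(get(nm, envir = envir)), nms)
}

# Forwards the caller's environment, in the same way DataFrame_sql() does.
wrapper_forwarding = function(envir = parent.frame()) {
  find_frames(envir = envir)
}

# Does not forward: find_frames() then searches this wrapper's own (empty) frame.
wrapper_not_forwarding = function() {
  find_frames()
}

user_code = function() {
  my_df = data.frame(x = 1)
  list(
    forwarded = wrapper_forwarding(),
    not_forwarded = wrapper_not_forwarding()
  )
}

res = user_code()
print(res)
```

With forwarding, the scan sees `my_df` in the user's calling environment; without it, the scan only sees the wrapper's own frame and finds nothing.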
69 changes: 67 additions & 2 deletions R/lazyframe__lazy.R
@@ -169,8 +169,7 @@ LazyFrame_width = method_as_active_binding(\() length(self$schema))
#'
#' @param ... Anything that is accepted by `pl$DataFrame()`
#'
#' @return LazyFrame
#' @keywords LazyFrame_new
#' @return [LazyFrame][LazyFrame_class]
#'
#' @examples
#' pl$LazyFrame(
@@ -2078,3 +2077,69 @@ LazyFrame_to_dot = function(
LazyFrame_clear = function(n = 0) {
  pl$DataFrame(schema = self$schema)$clear(n)$lazy()
}


# TODO: we can't use % in the SQL query
# <https://github.com/r-lib/roxygen2/issues/1616>
#' Execute a SQL query against the LazyFrame
#'
#' The calling frame is automatically registered as a table in the SQL context
#' under the name `"self"`. All [DataFrames][DataFrame_class] and
#' [LazyFrames][LazyFrame_class] found in the `envir` are also registered,
#' using their variable name.
#' More control over registration and execution behaviour is available through
#' the [SQLContext][SQLContext_class] object.
#'
#' This functionality is considered **unstable**, although it is close to
#' being stabilized. It may change at any point without the change being
#' treated as breaking.
#' @inherit pl_LazyFrame return
#' @inheritParams SQLContext_execute
#' @inheritParams SQLContext_register_globals
#' @param table_name `NULL` (default) or a string giving an explicit name for the table
#' that represents the calling frame. The alias `"self"` is always registered as well.
#' @seealso
#' - [SQLContext][SQLContext_class]
#' @examplesIf polars_info()$features$sql
#' lf1 = pl$LazyFrame(a = 1:3, b = 6:8, c = c("z", "y", "x"))
#' lf2 = pl$LazyFrame(a = 3:1, d = c(125, -654, 888))
#'
#' # Query the LazyFrame using SQL:
#' lf1$sql("SELECT c, b FROM self WHERE a > 1")$collect()
#'
#' # Join two LazyFrames:
#' lf1$sql(
#'   "
#'   SELECT self.*, d
#'   FROM self
#'   INNER JOIN lf2 USING (a)
#'   WHERE a > 1 AND b < 8
#'   "
#' )$collect()
#'
#' # Apply SQL transforms (aliasing "self" to "frame") and subsequently
#' # filter natively (you can freely mix SQL and native operations):
#' lf1$sql(
#'   query = r"(
#'     SELECT
#'       a,
#'       MOD(a, 2) == 0 AS a_is_even,
#'       (b::float / 2) AS 'b/2',
#'       CONCAT_WS(':', c, c, c) AS c_c_c
#'     FROM frame
#'     ORDER BY a
#'   )",
#'   table_name = "frame"
#' )$filter(!pl$col("c_c_c")$str$starts_with("x"))$collect()
LazyFrame_sql = function(query, ..., table_name = NULL, envir = parent.frame()) {
  result({
    ctx = pl$SQLContext()$register_globals(envir = envir)$register("self", self)

    if (!is.null(table_name)) {
      ctx$register(table_name, self)
    }

    ctx$execute(query)
  }) |>
    unwrap("in $sql():")
}
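
The body of `LazyFrame_sql()` above is a thin wrapper around `SQLContext`. As a rough sketch, the same steps can be written out by hand. It uses only calls that appear in this diff (`pl$SQLContext()`, `$register_globals()`, `$register()`, `$execute()`, `$sql()`, `$collect()`) and assumes a build of the polars R package at this commit with the `sql` feature enabled. The frame `lf1` and the query text are made up for the example.

```r
library(polars)

lf1 = pl$LazyFrame(a = 1:3, b = 6:8)
query = "SELECT a, b FROM frame WHERE a > 1"

# Shorthand added in this commit.
via_sql = lf1$sql(query, table_name = "frame")$collect()

# Manual equivalent, mirroring the body of LazyFrame_sql():
# register frames found in the current environment, then "self",
# then the extra alias, and finally execute the query.
ctx = pl$SQLContext()$register_globals(envir = environment())$register("self", lf1)
ctx$register("frame", lf1)
via_ctx = ctx$execute(query)$collect()

print(via_sql)
print(via_ctx)
```

The manual form is the one to reach for when finer control over which frames are registered is needed; `$sql()` is the convenience path for one-off queries.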
5 changes: 3 additions & 2 deletions R/sql.R
@@ -60,7 +60,7 @@ pl_SQLContext = function(...) {
#' Execute SQL query against the registered data
#'
#' Parse the given SQL query and execute it against the registered frame data.
#' @param query A valid string SQL query.
#' @param query A string containing the SQL query to execute.
#' @return A [LazyFrame][LazyFrame_class]
#' @examplesIf polars_info()$features$sql
#' query = "SELECT * FROM mtcars WHERE cyl = 4"
@@ -174,7 +174,8 @@ SQLContext_tables = function() {
#' Automatically maps variable names to table names.
#' @inherit SQLContext_register details return
#' @param ... Ignored.
#' @param envir The environment to search for polars DataFrames/LazyFrames.
#' @param envir The environment to search for polars
#' [DataFrames][DataFrame_class]/[LazyFrames][LazyFrame_class].
#' @seealso
#' - [`<SQLContext>$register()`][SQLContext_register]
#' - [`<SQLContext>$register_many()`][SQLContext_register_many]
77 changes: 77 additions & 0 deletions man/DataFrame_sql.Rd
74 changes: 74 additions & 0 deletions man/LazyFrame_sql.Rd
2 changes: 1 addition & 1 deletion man/SQLContext_execute.Rd
3 changes: 2 additions & 1 deletion man/SQLContext_register_globals.Rd
3 changes: 1 addition & 2 deletions man/pl_LazyFrame.Rd

Some generated files are not rendered by default.

26 changes: 13 additions & 13 deletions tests/testthat/_snaps/after-wrappers.md
@@ -89,12 +89,12 @@
[41] "quantile" "rechunk" "rename" "reverse"
[45] "rolling" "sample" "schema" "select"
[49] "select_seq" "shape" "shift" "shift_and_fill"
[53] "slice" "sort" "std" "sum"
[57] "tail" "to_data_frame" "to_list" "to_series"
[61] "to_struct" "transpose" "unique" "unnest"
[65] "var" "width" "with_columns" "with_columns_seq"
[69] "with_row_index" "write_csv" "write_ipc" "write_json"
[73] "write_ndjson" "write_parquet"
[53] "slice" "sort" "sql" "std"
[57] "sum" "tail" "to_data_frame" "to_list"
[61] "to_series" "to_struct" "transpose" "unique"
[65] "unnest" "var" "width" "with_columns"
[69] "with_columns_seq" "with_row_index" "write_csv" "write_ipc"
[73] "write_json" "write_ndjson" "write_parquet"

---

Expand Down Expand Up @@ -164,13 +164,13 @@
[41] "shift_and_fill" "sink_csv"
[43] "sink_ipc" "sink_ndjson"
[45] "sink_parquet" "slice"
[47] "sort" "std"
[49] "sum" "tail"
[51] "to_dot" "unique"
[53] "unnest" "var"
[55] "width" "with_columns"
[57] "with_columns_seq" "with_context"
[59] "with_row_index"
[47] "sort" "sql"
[49] "std" "sum"
[51] "tail" "to_dot"
[53] "unique" "unnest"
[55] "var" "width"
[57] "with_columns" "with_columns_seq"
[59] "with_context" "with_row_index"

---
