-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Add support for Microsoft SQL Server (MSSQL) #77
Conversation
`any(is.na(.))` was causing issues with MSSQL. This is now dropped in favor of instead counting `NA` values and filtering `is_na` on whether or not it has a sum greater than 0.
There seems to be some missing imports in DESCRIPTION: See these messages from R-CMD-check:
|
The test-coverage workflows throws an error for
|
Also, there are some flagged possible spelling mistakes:
|
And with all this hard work, you can also add some more to the NEWS.md :)
|
(also sorted one existing entry)
R/db_manipulating_functions.R
Outdated
filter_keys <- function(.data, filters, by = NULL, na_by = NULL) { | ||
|
||
# Check arguments | ||
assert_data_like(.data) | ||
assert_data_like(filters, null.ok = TRUE) | ||
checkmate::assert_character(by, null.ok = TRUE) | ||
checkmate::assert_character(na_by, null.ok = TRUE) | ||
|
||
if (is.null(filters)) { | ||
return(.data) | ||
} else { | ||
if (is.null(by) && is.null(na_by)) { | ||
# Determine key types | ||
key_types <- filters |> | ||
dplyr::ungroup() |> | ||
dplyr::summarise(dplyr::across(.cols = tidyselect::everything(), .fns = ~ any(is.na(.), na.rm = TRUE))) |> | ||
tidyr::pivot_longer(tidyselect::everything(), names_to = "column_name", values_to = "is_na") | ||
|
||
by <- key_types |> dplyr::filter(!.data$is_na) |> dplyr::pull("column_name") | ||
na_by <- key_types |> dplyr::filter(.data$is_na) |> dplyr::pull("column_name") | ||
|
||
if (length(by) == 0) by <- NULL | ||
if (length(na_by) == 0) na_by <- NULL | ||
} | ||
return(inner_join(.data, filters, by = by, na_by = na_by)) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason all checkmate::
checks have ben removed?
I can see why the assert_data_like
are redundant now that it is S3 method.
In fact, I would suggest adding a check that all columns are defined in by
and na_by
.
(since we say so in the description of the function)
checkmate::check_set_equal(c(by, na_by), colnames(filters))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assert_character(by)
most likely disappeared when I removed assert_character(na_by)
due to na_by
no longer being defined in the function signature.
I reinstated the checks, but since it is possible to skip to the dplyr::inner_join
call, but have opted for checkmate::check_subset
instead of check_set_equal
(which would require the user to specify all columns in by
or na_by
if any were given).
R/db_manipulating_functions.R
Outdated
)) |> | ||
tidyr::pivot_longer(tidyselect::everything(), names_to = "column_name", values_to = "is_na") | ||
|
||
by <- key_types |> dplyr::filter(.data$is_na > 0) |> dplyr::pull("column_name") | ||
na_by <- key_types |> dplyr::filter(.data$is_na == 0) |> dplyr::pull("column_name") | ||
|
||
if (length(by) == 0) by <- NULL | ||
if (length(na_by) == 0) na_by <- NULL | ||
} | ||
return(dplyr::inner_join(.data, filters, by = by, na_by = na_by)) | ||
} | ||
|
||
#' @export | ||
filter_keys.data.frame <- function(.data, filters, by = NULL, ...) { | ||
.dots <- list(...) | ||
|
||
args <- list( | ||
x = .data, | ||
y = filters, | ||
by = by | ||
) |> | ||
append(.dots) | ||
|
||
if ("na_by" %in% names(args)) { | ||
args$na_matches <- "na" | ||
args$na_by <- NULL | ||
} | ||
|
||
if (is.null(by)) args$by <- colnames(filters) | ||
|
||
return(do.call(dplyr::inner_join, args = args)) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this can be simplified a lot.
In my view, SCDB joins_*
and by extension filter_keys
are designed to mimic base R joins as much as possible. Since R by default joins NA with NA, we can simplify this function:
#' @export
filter_keys.data.frame <- function(.data, filters, by = NULL, na_by = NULL) {
if (is.null(by) && is.null(na_by)) {
by <- colnames(filters)
}
return(dplyr::inner_join(.data, filters, by = c(by, na_by), na_matches = "na"))
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
na_matches
defaults to "na"
for dplyr:::inner_join.data.frame
, so I have now added an even more simplified version of your suggestion 😇
Intent
This PR adds support for Microsoft SQL Server (MSSQL).
Approach
I have been careful to add functionality to existing S3 methods. Simultaneously, some of these have been "promoted" from default methods to methods for
DBIConnection
objects as this is more specific while retaining use.Testing has been enabled by treating ODBC connections a little more specifically, as these cannot be made with no arguments and it is no secret that the test suite is due for an overhaul. If one wishes to test ODBC backends, they can be defined in an environment variable
SCDB_ODBC_JSON
, which is a text string parseable byjsonlite::fromJSON
. This can be set in the user's Renviron file1 and (in a good way) forgotten about 🙂MSSQL seems to have revealed a potential issue in how dbplyr handles temporary tables, as it fails to overwrite a temporary table even if
dplyr::copy_to(overwrite=TRUE)
is given.2Known issues
db_joins.R
currently does not pass tests, but as these functions are next for a more or less complete S3 makeover, I consider them outside the scope of the PR.Checklist
NEWS.md
Footnotes
Example line:
SCDB_ODBC_JSON='{"MSSQL": {"driver": "SQL Server", "server": "localhost", "database": "my_MSSQL_db", "trusted_connection": "true"}}'
↩See e.g tests/testthat/test-update_snapshot.R. Tables are temporary by default in
dplyr::copy_to
. ↩