Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

na.rm = TRUE not respected in pmin/pmax for Snowflake #1329

Closed
fh-mthomson opened this issue Jul 8, 2023 · 0 comments · Fixed by #1330
Closed

na.rm = TRUE not respected in pmin/pmax for Snowflake #1329

fh-mthomson opened this issue Jul 8, 2023 · 0 comments · Fixed by #1330

Comments

@fh-mthomson
Copy link
Contributor

fh-mthomson commented Jul 8, 2023

As discussed in #118, the default translations for pmin() and pmax() are LEAST and GREATEST, respectively.

However, for Snowflake (akin to Oracle), LEAST and GREATEST do the following (in contrast to many other backends, discussed with Snowflake here and a community-identified workaround here):

If any of the argument values is NULL, the result is NULL.

Accordingly:

I'd expect either:

  1. [preferred] na.rm = TRUE is supported OR
  2. [interim] na.rm = TRUE errors when called on Snowflake (and does not warn when na.rm = FALSE)

Illustrative Code (masking company Snowflake instance):

conn_snowflake <- DBI::dbConnect(odbc::odbc(), .connection_string = "<company>")
id_temp <- DBI::Id(schema = "TEST", table = "TEST")

test_tbl <- tibble::tibble(
  a = c(1, NA, 3),
  b = c(NA, 2, 4)
)

dbplyr::copy_to(
  conn_snowflake,
  test_tbl,
  name = id_temp,
  overwrite = TRUE
)

tbl_sf <- conn_snowflake %>%
  tbl(id_temp)

# expected behavior: correctly ignores NA for `min()` / `max()`
tbl_sf %>%
  dplyr::summarise(
    min_a = min(a, na.rm = TRUE),
    min_b = min(b, na.rm = TRUE),
    max_a = max(a, na.rm = TRUE),
    max_b = max(b, na.rm = TRUE)
  )

# Source:   SQL [1 x 2]
# Database: Snowflake 7.22.1[michael.thomson@Snowflake/TEST]
  min_a min_b max_a max_b
  <dbl> <dbl> <dbl> <dbl>
1     1     2     3     4

na_rm <- TRUE # toggle TRUE or FALSE

# unexpected behavior: both return NA for each na.rm = TRUE and FALSE
tbl_sf %>%
  dplyr::mutate(
    pmin_a_b = pmin(a, b, na.rm = na_rm),
    pmax_a_b = pmax(a, b, na.rm = na_rm)
  )

# Source:   SQL [3 x 4]
# Database: Snowflake 7.22.1[michael.thomson@Snowflake/TEST]
      a     b pmin_a_b pmax_a_b
  <dbl> <dbl>    <dbl>    <dbl>
1     1    NA       NA       NA
2     3     4        3        4
3    NA     2       NA       NA
@fh-mthomson fh-mthomson changed the title na.rm = TRUE not respected in pmin()/pmax for Snowflake na.rm = TRUE not respected in pmin/pmax for Snowflake Jul 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant