Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue with select and group_by _PARTITIONTIME #534

Closed
ncuriale opened this issue May 9, 2023 · 2 comments
Closed

Issue with select and group_by _PARTITIONTIME #534

ncuriale opened this issue May 9, 2023 · 2 comments
Labels
reprex needs a minimal reproducible example

Comments

@ncuriale
Copy link

ncuriale commented May 9, 2023

Having issues using _PARTITIONTIME with both dplyr::select and dplyr::group_by

It works with dplyr::distinct

> tbl %>% 
+     dplyr::filter(
+       dplyr::sql("_PARTITIONTIME >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)")
+     ) %>%
+     dplyr::distinct(`_PARTITIONTIME`)
Complete
Billed: 0 B
Downloading first chunk of data.
First chunk includes all requested rows.
# Source:   lazy query [?? x 1]
# Database: BigQueryConnection
   `_PARTITIONTIME`   
   <dttm>             
 1 2023-04-18 00:00:00
 2 2023-04-25 00:00:00
 3 2023-04-17 00:00:00
 4 2023-04-27 00:00:00
 5 2023-04-16 00:00:00
 6 2023-04-30 00:00:00
 7 2023-04-11 00:00:00
 8 2023-04-23 00:00:00
 9 2023-04-14 00:00:00
10 2023-05-07 00:00:00
# ℹ more rows
# ℹ Use `print(n = ...)` to see more rows

Trying with select

> tbl %>% 
+     dplyr::filter(
+       dplyr::sql("_PARTITIONTIME >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)")
+     ) %>%
+     dplyr::select(pt = `_PARTITIONTIME`) 
Error in `dplyr::select()`:
! Can't subset columns that don't exist.Column `_PARTITIONTIME` doesn't exist.
Run `rlang::last_trace()` to see where the error occurred.

Trying with group_by

> tbl %>% 
+     dplyr::filter(
+       dplyr::sql("_PARTITIONTIME >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)")
+     ) %>%
+     dplyr::group_by(`_PARTITIONTIME`) %>%
+     dplyr::summarize(
+       MinReceived = min(dplyr::sql("Received"), na.rm = TRUE),
+       MaxReceived = max(dplyr::sql("Received"), na.rm = TRUE)
+     )
Error in `dplyr::group_by()`:
! Must group by variables found in `.data`.Column `_PARTITIONTIME` is not found.
Run `rlang::last_trace()` to see where the error occurred.

Any idea on whats going on here? If this is meant to work, what would be the correct syntax?

@hadley
Copy link
Member

hadley commented Nov 2, 2023

Can you give me a reprex? Or at least find a public partitioned table that I could try this out on?

@hadley hadley added the reprex needs a minimal reproducible example label Nov 2, 2023
@hadley
Copy link
Member

hadley commented Nov 14, 2023

I've closed this issue due to lack of requested reprex. If you still care about this bug, please open a new issue with a reprex.

@hadley hadley closed this as completed Nov 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
reprex needs a minimal reproducible example
Projects
None yet
Development

No branches or pull requests

2 participants