Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

arrow 13.0.0 compatibility #761

Closed
thisisnic opened this issue Aug 3, 2023 · 4 comments · Fixed by #764
Closed

arrow 13.0.0 compatibility #761

thisisnic opened this issue Aug 3, 2023 · 4 comments · Fixed by #764

Comments

@thisisnic
Copy link

thisisnic commented Aug 3, 2023

We're planning on releasing arrow 13.0.0 to CRAN in the next couple of weeks, and our revdepchecks flagged up some test failures with pins.

-- R CMD check results ----------------------------------------- pins 1.2.0 ----
Duration: 43.1s

> checking tests ...
  See below...

-- Test failures ------------------------------------------------- testthat ----

> library(testthat)
> library(pins)
> 
> test_check("pins")
Guessing `type = 'rds'`
> <http://127.0.0.1:49432/data.txt> is not cacheable
> <http://127.0.0.1:49432/x.rds> is not cacheable
> <http://127.0.0.1:49436/x/20230724T142915Z-c3943/data.txt> is not cacheable
> <http://127.0.0.1:49436/x/20230724T142915Z-c3943/x.json> is not cacheable
> <http://127.0.0.1:49440/y/20230724T142915Z-cba09/data.txt> is not cacheable
> <http://127.0.0.1:49440/y/20230724T142915Z-5026d/data.txt> is not cacheable
[ FAIL 3 | WARN 0 | SKIP 66 | PASS 183 ]

══ Skipped tests (66) ══════════════════════════════════════════════════════════
• On CRAN (52): 'test-board_connect_bundle.R:36:3',
  'test-board_connect_bundle.R:41:3', 'test-board_connect_server.R:22:3',
  'test-board_connect_server.R:31:3', 'test-board_connect_server.R:51:3',
  'test-board_connect_server.R:60:3', ???, ???, 'test-board_folder.R:9:3',
  'test-board_folder.R:34:3', 'test-board_folder.R:42:3',
  'test-board_folder.R:88:3', 'test-board_url.R:54:3',
  'test-board_url.R:154:3', 'test-board_url.R:168:3', 'test-board_url.R:180:3',
  'test-board_url.R:199:3', 'test-board_url.R:218:3', 'test-board_url.R:237:3',
  'test-legacy_board.R:2:3', 'test-legacy_datatxt.R:4:3',
  'test-legacy_datatxt.R:16:3', 'test-legacy_datatxt.R:25:3',
  'test-legacy_datatxt.R:35:3', 'test-legacy_local.R:43:3',
  'test-legacy_packages.R:23:3', 'test-legacy_registry.R:30:3',
  'test-meta.R:13:3', 'test-meta.R:20:3', 'test-meta.R:24:3',
  'test-pin-delete.R:11:3', 'test-pin-meta.R:2:3',
  'test-pin-read-write.R:35:3', 'test-pin-read-write.R:41:3',
  'test-pin-read-write.R:58:3', 'test-pin-read-write.R:80:3',
  'test-pin-read-write.R:88:3', 'test-pin-upload-download.R:17:3',
  'test-pin-upload-download.R:46:3', 'test-pin-upload-download.R:61:3',
  'test-pin.R:76:3', 'test-pin.R:108:3', 'test-pin_info.R:5:3',
  'test-pin_info.R:23:3', 'test-pin_info.R:33:3', 'test-pin_search.R:27:3',
  'test-pin_versions.R:8:3', 'test-pin_versions.R:18:3',
  'test-pin_versions.R:26:3', 'test-pin_versions.R:41:3',
  'test-pin_versions.R:67:3', 'test-pin_versions.R:91:3'
• board_azure() tests require PINS_AZURE_KEY (3):
  'test-board_azure_adls2.R:1:1', 'test-board_azure_blob.R:1:1',
  'test-board_azure_file.R:1:1'
• board_connect() tests requires `creds.rds` (4): 'test-board_connect.R:2:1',
  'test-board_connect_url.R:4:3', 'test-board_connect_url.R:22:3',
  'test-board_connect_url.R:33:3'
• board_gcs() tests require PINS_GCS_PASSWORD (1): 'test-board_gcs.R:1:1'
• board_ms365() tests require PINS_MS365_TEST_DRIVE (1):
  'test-board_ms365.R:1:1'
• board_s3() tests require PINS_AWS_ACCESS_KEY, PINS_AWS_SECRET_ACCESS_KEY (1):
  'test-board_s3.R:1:1'
• legacy_azure() tests require TEST_AZURE_CONTAINER, TEST_AZURE_ACCOUNT,
  TEST_AZURE_KEY (1): 'test-legacy_azure.R:16:1'
• legacy_gcloud() tests require TEST_GOOGLE_BUCKET (1):
  'test-legacy_gcloud.R:12:1'
• legacy_github() tests require TEST_GITHUB_REPO, TEST_GITHUB_BRANCH (1):
  'test-legacy_github.R:1:1'
• legacy_s3() tests require TEST_AWS_BUCKET, TEST_AWS_KEY, TEST_AWS_SECRET,
  TEST_AWS_REGION (1): 'test-legacy_s3.R:14:1'

══ Failed tests ════════════════════════════════════════════════════════════════
── Failure ('test-pin-read-write.R:12:3'): can round trip all types ────────────
pin_read(board, "df-2") (`actual`) not equal to `df` (`expected`).

`class(actual)`:   "tbl_df" "tbl" "data.frame"
`class(expected)`:                "data.frame"
── Failure ('test-pin-read-write.R:15:3'): can round trip all types ────────────
pin_read(board, "df-2") (`actual`) not equal to `df` (`expected`).

`class(actual)`:   "tbl_df" "tbl" "data.frame"
`class(expected)`:                "data.frame"
── Failure ('test-pin-read-write.R:18:3'): can round trip all types ────────────
pin_read(board, "df-3") (`actual`) not equal to `df` (`expected`).

`class(actual)`:   "tbl_df" "tbl" "data.frame"
`class(expected)`:                "data.frame"

[ FAIL 3 | WARN 0 | SKIP 66 | PASS 183 ]
Error: Test failures
Execution halted

1 error x | 0 warnings v | 0 notes v

In short, a change in apache/arrow#35173 constitutes a breaking change - we now return tibble object instead of data.frame objects from our read_* functions. I was going to submit a PR, but I wasn't sure whether it'd be tests or code that would make sense to update.

@juliasilge
Copy link
Member

Thanks so much for the detail here! 🙌

I think that what we want to do here is strip off the tibble classes when we read, for consistency with how everything else works in pins. Notice, for example, the tests that will fail with the new arrow release:
https://github.com/rstudio/pins-r/blob/main/tests/testthat/test-pin-read-write.R
It seems weird in the pins context to get back out a tibble when you stored a data.frame. I'm open to discussion here, though.

@machow
Copy link
Collaborator

machow commented Aug 7, 2023

It seems weird in the pins context to get back out a tibble when you stored a data.frame. I'm open to discussion here, though.

I think from the other side too--if you write a tibble to csv, you'll currently get back a data.frame. I can't really say what R folks want to happen, but if a tibble feels like a "subclass" of a data.frame (e.g. liskov substitutable; w/e that means here, since I know the way they get printed is also important), I wonder if that might make it feel okay to return?

@juliasilge
Copy link
Member

juliasilge commented Aug 7, 2023

I am realizing that if we strip off the tibble subclass when reading, then if someone stores a tibble as parquet, it will come back as a data.frame, which would definitely feel real bad. 😬

(I don't think it's reasonable to keep track of the classes here in pins metadata, like "this was originally a tibble", given how arrow now behaves.)

@github-actions
Copy link

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

@github-actions github-actions bot locked and limited conversation to collaborators Aug 31, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants