arrow 13.0.0 compatibility #761

thisisnic · 2023-08-03T13:48:21Z

We're planning on releasing arrow 13.0.0 to CRAN in the next couple of weeks, and our revdepchecks flagged up some test failures with pins.

-- R CMD check results ----------------------------------------- pins 1.2.0 ----
Duration: 43.1s

> checking tests ...
  See below...

-- Test failures ------------------------------------------------- testthat ----

> library(testthat)
> library(pins)
> 
> test_check("pins")
Guessing `type = 'rds'`
> <http://127.0.0.1:49432/data.txt> is not cacheable
> <http://127.0.0.1:49432/x.rds> is not cacheable
> <http://127.0.0.1:49436/x/20230724T142915Z-c3943/data.txt> is not cacheable
> <http://127.0.0.1:49436/x/20230724T142915Z-c3943/x.json> is not cacheable
> <http://127.0.0.1:49440/y/20230724T142915Z-cba09/data.txt> is not cacheable
> <http://127.0.0.1:49440/y/20230724T142915Z-5026d/data.txt> is not cacheable
[ FAIL 3 | WARN 0 | SKIP 66 | PASS 183 ]

══ Skipped tests (66) ══════════════════════════════════════════════════════════
• On CRAN (52): 'test-board_connect_bundle.R:36:3',
  'test-board_connect_bundle.R:41:3', 'test-board_connect_server.R:22:3',
  'test-board_connect_server.R:31:3', 'test-board_connect_server.R:51:3',
  'test-board_connect_server.R:60:3', ???, ???, 'test-board_folder.R:9:3',
  'test-board_folder.R:34:3', 'test-board_folder.R:42:3',
  'test-board_folder.R:88:3', 'test-board_url.R:54:3',
  'test-board_url.R:154:3', 'test-board_url.R:168:3', 'test-board_url.R:180:3',
  'test-board_url.R:199:3', 'test-board_url.R:218:3', 'test-board_url.R:237:3',
  'test-legacy_board.R:2:3', 'test-legacy_datatxt.R:4:3',
  'test-legacy_datatxt.R:16:3', 'test-legacy_datatxt.R:25:3',
  'test-legacy_datatxt.R:35:3', 'test-legacy_local.R:43:3',
  'test-legacy_packages.R:23:3', 'test-legacy_registry.R:30:3',
  'test-meta.R:13:3', 'test-meta.R:20:3', 'test-meta.R:24:3',
  'test-pin-delete.R:11:3', 'test-pin-meta.R:2:3',
  'test-pin-read-write.R:35:3', 'test-pin-read-write.R:41:3',
  'test-pin-read-write.R:58:3', 'test-pin-read-write.R:80:3',
  'test-pin-read-write.R:88:3', 'test-pin-upload-download.R:17:3',
  'test-pin-upload-download.R:46:3', 'test-pin-upload-download.R:61:3',
  'test-pin.R:76:3', 'test-pin.R:108:3', 'test-pin_info.R:5:3',
  'test-pin_info.R:23:3', 'test-pin_info.R:33:3', 'test-pin_search.R:27:3',
  'test-pin_versions.R:8:3', 'test-pin_versions.R:18:3',
  'test-pin_versions.R:26:3', 'test-pin_versions.R:41:3',
  'test-pin_versions.R:67:3', 'test-pin_versions.R:91:3'
• board_azure() tests require PINS_AZURE_KEY (3):
  'test-board_azure_adls2.R:1:1', 'test-board_azure_blob.R:1:1',
  'test-board_azure_file.R:1:1'
• board_connect() tests requires `creds.rds` (4): 'test-board_connect.R:2:1',
  'test-board_connect_url.R:4:3', 'test-board_connect_url.R:22:3',
  'test-board_connect_url.R:33:3'
• board_gcs() tests require PINS_GCS_PASSWORD (1): 'test-board_gcs.R:1:1'
• board_ms365() tests require PINS_MS365_TEST_DRIVE (1):
  'test-board_ms365.R:1:1'
• board_s3() tests require PINS_AWS_ACCESS_KEY, PINS_AWS_SECRET_ACCESS_KEY (1):
  'test-board_s3.R:1:1'
• legacy_azure() tests require TEST_AZURE_CONTAINER, TEST_AZURE_ACCOUNT,
  TEST_AZURE_KEY (1): 'test-legacy_azure.R:16:1'
• legacy_gcloud() tests require TEST_GOOGLE_BUCKET (1):
  'test-legacy_gcloud.R:12:1'
• legacy_github() tests require TEST_GITHUB_REPO, TEST_GITHUB_BRANCH (1):
  'test-legacy_github.R:1:1'
• legacy_s3() tests require TEST_AWS_BUCKET, TEST_AWS_KEY, TEST_AWS_SECRET,
  TEST_AWS_REGION (1): 'test-legacy_s3.R:14:1'

══ Failed tests ════════════════════════════════════════════════════════════════
── Failure ('test-pin-read-write.R:12:3'): can round trip all types ────────────
pin_read(board, "df-2") (`actual`) not equal to `df` (`expected`).

`class(actual)`:   "tbl_df" "tbl" "data.frame"
`class(expected)`:                "data.frame"
── Failure ('test-pin-read-write.R:15:3'): can round trip all types ────────────
pin_read(board, "df-2") (`actual`) not equal to `df` (`expected`).

`class(actual)`:   "tbl_df" "tbl" "data.frame"
`class(expected)`:                "data.frame"
── Failure ('test-pin-read-write.R:18:3'): can round trip all types ────────────
pin_read(board, "df-3") (`actual`) not equal to `df` (`expected`).

`class(actual)`:   "tbl_df" "tbl" "data.frame"
`class(expected)`:                "data.frame"

[ FAIL 3 | WARN 0 | SKIP 66 | PASS 183 ]
Error: Test failures
Execution halted

1 error x | 0 warnings v | 0 notes v

In short, a change in apache/arrow#35173 constitutes a breaking change - we now return tibble object instead of data.frame objects from our read_* functions. I was going to submit a PR, but I wasn't sure whether it'd be tests or code that would make sense to update.

The text was updated successfully, but these errors were encountered:

juliasilge · 2023-08-07T15:44:30Z

Thanks so much for the detail here! 🙌

I think that what we want to do here is strip off the tibble classes when we read, for consistency with how everything else works in pins. Notice, for example, the tests that will fail with the new arrow release:
https://github.com/rstudio/pins-r/blob/main/tests/testthat/test-pin-read-write.R
It seems weird in the pins context to get back out a tibble when you stored a data.frame. I'm open to discussion here, though.

machow · 2023-08-07T20:52:01Z

It seems weird in the pins context to get back out a tibble when you stored a data.frame. I'm open to discussion here, though.

I think from the other side too--if you write a tibble to csv, you'll currently get back a data.frame. I can't really say what R folks want to happen, but if a tibble feels like a "subclass" of a data.frame (e.g. liskov substitutable; w/e that means here, since I know the way they get printed is also important), I wonder if that might make it feel okay to return?

juliasilge · 2023-08-07T21:02:24Z

I am realizing that if we strip off the tibble subclass when reading, then if someone stores a tibble as parquet, it will come back as a data.frame, which would definitely feel real bad. 😬

(I don't think it's reasonable to keep track of the classes here in pins metadata, like "this was originally a tibble", given how arrow now behaves.)

github-actions · 2023-08-31T00:06:48Z

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

thisisnic mentioned this issue Aug 3, 2023

[R] CRAN packaging checklist for version 13.0.0 apache/arrow#36802

Closed

44 tasks

juliasilge mentioned this issue Aug 9, 2023

Update test for new arrow release #764

Merged

juliasilge closed this as completed in #764 Aug 16, 2023

github-actions bot locked and limited conversation to collaborators Aug 31, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

arrow 13.0.0 compatibility #761

arrow 13.0.0 compatibility #761

thisisnic commented Aug 3, 2023 •

edited

Loading

juliasilge commented Aug 7, 2023

machow commented Aug 7, 2023 •

edited

Loading

juliasilge commented Aug 7, 2023 •

edited

Loading

github-actions bot commented Aug 31, 2023

arrow 13.0.0 compatibility #761

arrow 13.0.0 compatibility #761

Comments

thisisnic commented Aug 3, 2023 • edited Loading

juliasilge commented Aug 7, 2023

machow commented Aug 7, 2023 • edited Loading

juliasilge commented Aug 7, 2023 • edited Loading

github-actions bot commented Aug 31, 2023

thisisnic commented Aug 3, 2023 •

edited

Loading

machow commented Aug 7, 2023 •

edited

Loading

juliasilge commented Aug 7, 2023 •

edited

Loading