Unexpected Behavior: redcap_read() fails for older versions of REDCap (~v7.3.5) #465

Closed
the-mad-statter opened this issue Feb 3, 2023 · 5 comments

@the-mad-statter
Contributor

I am not sure whether you want to support this, given that it affects pretty old versions of REDCap, and I am not sure when Vanderbilt fixed the root cause, but redcap_read() currently fails when reading records from older versions of REDCap (circa v7.3.5).

Specifically, I am trying to read data from a v7.3.5 instance of REDCap, and redcap_read() fails with the following:

Warning: The following named parsers don't match the column names: has_repeating_instruments_or_events, missing_data_codes, external_modules, bypass_branching_erase_field_prompt
1 rows were read from REDCap in 0.3 seconds.  The http status code was 200.
Warning: Unknown or uninitialised column: `has_repeating_instruments_or_events`.
Error in if (d_proj$has_repeating_instruments_or_events[1]) { : 
  argument is of length zero

I have traced the issue to the combination of what the REDCap API returns and this commit on 2022-10-08; that is, redcap_read() stopped working with v7.3.5 as of this commit.

The underlying issue is that redcap_metadata_internal() expects a long list of columns from the project-info endpoint, but older versions of REDCap do not return all of them.

Here are the expected columns:

col_types <- readr::cols(
  project_id                              = readr::col_integer(),
  project_title                           = readr::col_character(),
  creation_time                           = readr::col_datetime(format = ""),
  production_time                         = readr::col_datetime(format = ""),
  in_production                           = readr::col_logical(),
  project_language                        = readr::col_character(),
  purpose                                 = readr::col_integer(),
  purpose_other                           = readr::col_character(),
  project_notes                           = readr::col_character(),
  custom_record_label                     = readr::col_character(),
  secondary_unique_field                  = readr::col_character(),
  is_longitudinal                         = readr::col_logical(),
  has_repeating_instruments_or_events     = readr::col_logical(),
  surveys_enabled                         = readr::col_logical(),
  scheduling_enabled                      = readr::col_logical(),
  record_autonumbering_enabled            = readr::col_logical(),
  randomization_enabled                   = readr::col_logical(),
  ddp_enabled                             = readr::col_logical(),
  project_irb_number                      = readr::col_character(),
  project_grant_number                    = readr::col_character(),
  project_pi_firstname                    = readr::col_character(),
  project_pi_lastname                     = readr::col_character(),
  display_today_now_button                = readr::col_logical(),
  missing_data_codes                      = readr::col_character(),
  external_modules                        = readr::col_character(),
  bypass_branching_erase_field_prompt     = readr::col_character(),
  .default                                = readr::col_character()
)

names(col_types$cols)
#>  [1] "project_id"                          "project_title"                      
#>  [3] "creation_time"                       "production_time"                    
#>  [5] "in_production"                       "project_language"                   
#>  [7] "purpose"                             "purpose_other"                      
#>  [9] "project_notes"                       "custom_record_label"                
#> [11] "secondary_unique_field"              "is_longitudinal"                    
#> [13] "has_repeating_instruments_or_events" "surveys_enabled"                    
#> [15] "scheduling_enabled"                  "record_autonumbering_enabled"       
#> [17] "randomization_enabled"               "ddp_enabled"                        
#> [19] "project_irb_number"                  "project_grant_number"               
#> [21] "project_pi_firstname"                "project_pi_lastname"                
#> [23] "display_today_now_button"            "missing_data_codes"                 
#> [25] "external_modules"                    "bypass_branching_erase_field_prompt"

Created on 2023-02-03 with reprex v2.0.2

And here are the actual columns returned:

library(RCurl)

result <- postForm(
  uri='https://redcap.wustl.edu/redcap/.../api/',
  token='XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX',
  content='project',
  format='csv',
  returnFormat='json'
)

names(
  readr::read_csv(
    I(result),
    show_col_types = FALSE
  )
)
#>  [1] "project_id"                   "project_title"               
#>  [3] "creation_time"                "production_time"             
#>  [5] "in_production"                "project_language"            
#>  [7] "purpose"                      "purpose_other"               
#>  [9] "project_notes"                "custom_record_label"         
#> [11] "secondary_unique_field"       "is_longitudinal"             
#> [13] "surveys_enabled"              "scheduling_enabled"          
#> [15] "record_autonumbering_enabled" "randomization_enabled"       
#> [17] "ddp_enabled"                  "project_irb_number"          
#> [19] "project_grant_number"         "project_pi_firstname"        
#> [21] "project_pi_lastname"          "display_today_now_button"

Created on 2023-02-03 with reprex v2.0.2

A setdiff() shows that the missing columns are exactly the ones named in the warning produced by read_csv():

  1. has_repeating_instruments_or_events
  2. missing_data_codes
  3. external_modules
  4. bypass_branching_erase_field_prompt
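For reference, a minimal base-R version of that setdiff() check, using the two name vectors printed in the reprex output above (copied in by hand here rather than fetched from a server):

```r
# Column names declared in col_types by redcap_metadata_internal()
expected <- c(
  "project_id", "project_title", "creation_time", "production_time",
  "in_production", "project_language", "purpose", "purpose_other",
  "project_notes", "custom_record_label", "secondary_unique_field",
  "is_longitudinal", "has_repeating_instruments_or_events",
  "surveys_enabled", "scheduling_enabled", "record_autonumbering_enabled",
  "randomization_enabled", "ddp_enabled", "project_irb_number",
  "project_grant_number", "project_pi_firstname", "project_pi_lastname",
  "display_today_now_button", "missing_data_codes", "external_modules",
  "bypass_branching_erase_field_prompt"
)

# Column names actually returned by the v7.3.5 project-info endpoint
actual <- c(
  "project_id", "project_title", "creation_time", "production_time",
  "in_production", "project_language", "purpose", "purpose_other",
  "project_notes", "custom_record_label", "secondary_unique_field",
  "is_longitudinal", "surveys_enabled", "scheduling_enabled",
  "record_autonumbering_enabled", "randomization_enabled", "ddp_enabled",
  "project_irb_number", "project_grant_number", "project_pi_firstname",
  "project_pi_lastname", "display_today_now_button"
)

# Columns the parser expects but the server never sent
missing_cols <- setdiff(expected, actual)
print(missing_cols)

# Nothing flows the other way: every returned column is expected
print(setdiff(actual, expected))
```

The first print lists the same four columns, in the same order, as the read_csv() warning; the second prints character(0).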

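The error that actually stops execution happens later in redcap_metadata_internal(), when the code evaluates d_proj$has_repeating_instruments_or_events[1]. Because the API never returned that column, the expression is NULL (hence the "Unknown or uninitialised column" warning), and if() fails because its condition has length zero.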
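The failure mechanism can be reproduced without a server. This is a base-R sketch that uses a plain data.frame as a stand-in for d_proj (the real object is a tibble, which additionally emits the warning); the placeholder values are illustrative only:

```r
# One-row stand-in for d_proj with a couple of the columns v7.3.5 does return
d_proj <- data.frame(
  project_id      = 1L,
  is_longitudinal = FALSE
)

# A column that was never returned comes back as NULL, and NULL[1] is NULL
value <- d_proj$has_repeating_instruments_or_events[1]
print(is.null(value))

# if() needs a length-one condition, so a NULL condition raises an error
msg <- tryCatch(
  if (value) "repeating" else "not repeating",
  error = function(e) conditionMessage(e)
)
print(msg)
```

The captured message is the same "argument is of length zero" seen in the redcap_read() output.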

I have a solution that produces the expected d_proj object with NA for the missing columns, but before opening a pull request, I wanted to check whether this is something you want to support.
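The actual patch is not shown in this thread. As an illustration only, here is one way to pad a project-info data frame with typed NA columns; fill_missing_columns() and the prototypes list are hypothetical names, and the NA types simply mirror the readr col_types declared above for these four columns:

```r
# Hypothetical helper: add every expected-but-absent column, filled with
# the typed NA given in `prototypes` (one value per row of `d`)
fill_missing_columns <- function(d, prototypes) {
  for (col in setdiff(names(prototypes), names(d))) {
    d[[col]] <- rep(prototypes[[col]], nrow(d))
  }
  d
}

# Types follow col_types: one logical column, three character columns
prototypes <- list(
  has_repeating_instruments_or_events = NA,             # logical NA
  missing_data_codes                  = NA_character_,
  external_modules                    = NA_character_,
  bypass_branching_erase_field_prompt = NA_character_
)

# Placeholder one-row project info, as an older server might return it
d_proj <- data.frame(project_id = 1L, is_longitudinal = FALSE)
d_proj <- fill_missing_columns(d_proj, prototypes)
str(d_proj)
```

Note that a logical NA still cannot drive if(): `if (NA)` fails with "missing value where TRUE/FALSE needed", so the padded has_repeating_instruments_or_events value needs its own check before it is used as a condition.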

@wibeasley
Member

@the-mad-statter, I like that idea and would love that PR. Even if I wasn't interested in supporting v7 (and honestly, I'm only lukewarm about it), I really like the idea of gracefully growing.

Please make sure that before has_repeating_instruments_or_events is referenced, the code checks whether it exists. If it doesn't exist, throw an error (with stop()) saying that their version of REDCap apparently doesn't support repeating instruments/events.
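A minimal sketch of the requested guard, assuming d_proj is the one-row project-info data frame; check_repeating_supported() is a hypothetical name, and the error wording is only an example:

```r
# Hypothetical guard: confirm the server declared the column before using it
check_repeating_supported <- function(d_proj) {
  col <- "has_repeating_instruments_or_events"
  if (!col %in% names(d_proj)) {
    stop(
      "The REDCap server did not report `", col, "` via the project-info ",
      "endpoint; this version of REDCap apparently does not support ",
      "repeating instruments/events.",
      call. = FALSE
    )
  }
  # Safe to reference now that the column is known to exist
  d_proj[[col]][1]
}

# Placeholder inputs: a newer server (column present) and an older one
d_new <- data.frame(project_id = 1L, has_repeating_instruments_or_events = TRUE)
d_old <- data.frame(project_id = 1L)

print(check_repeating_supported(d_new))
msg <- tryCatch(check_repeating_supported(d_old), error = conditionMessage)
print(msg)
```

Checking names() up front turns the cryptic "argument is of length zero" into an error that names the missing column.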

@the-mad-statter
Contributor Author

the-mad-statter commented Feb 3, 2023

I can add the stop(), but it's not that v7 REDCap doesn't support repeated instruments/events but that the API neglects to report on it via the project info endpoint.

As a workaround, consider reading a single record and checking the returned field names for either "redcap_repeat_instrument" or "redcap_repeat_instance" to infer what has_repeating_instruments_or_events would have been had the API reported it, and then setting it accordingly.
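The inference step of that workaround could look like the sketch below. infer_has_repeating() is a hypothetical helper, the field names are placeholders, and fetching the single record is not shown:

```r
# Hypothetical helper: infer repeating support from the column names of
# a one-record export, since v7.3.5 omits the flag from project info
infer_has_repeating <- function(record_names) {
  any(c("redcap_repeat_instrument", "redcap_repeat_instance") %in% record_names)
}

# Placeholder column names standing in for exported records
infer_has_repeating(c("record_id", "redcap_repeat_instrument",
                      "redcap_repeat_instance", "age"))  # returns TRUE
infer_has_repeating(c("record_id", "age"))               # returns FALSE
```

This depends on the shape of the export rather than on a declared server capability, which is the trade-off discussed in the replies.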

@wibeasley
Member

it's not that v7 REDCap doesn't support repeated instruments/events but that the API neglects to report on it via the project info endpoint.

I understand your distinction now. But I think I'm okay lumping those two cases together for a version that was released 4+ years ago.

As a work around consider the idea to read a single record and check the returned field names...

I'll try to be flexible, but in my experience, that strategy makes things less stable. I prefer to use explicitly declared values to detect the server's capabilities. If I take the indirect/inference approach, I'm worried about not accounting for all the possible corner cases.

@the-mad-statter
Contributor Author

Sure thing; I don't particularly like the workaround either. I'll add the stop(), open a PR, and call it a day. Should I target main or dev?

@wibeasley
Member

Slightly prefer pulling into dev, but I can work with either.

Hope things are good in St. Louis. Tell me if you're ever back in Oklahoma.
