downloading metadata as json solves download issues #335

januz · 2021-04-23T23:53:56Z

With a very large (over 35000 variables/fields) and complex project (bilingual setup with lots of HTML commands and Spanish characters in the field descriptions and answer alternatives), I ran into the

Error in inherits(ds, "data.frame") : object 'ds' not found

error. I assumed that it had to do with non-ascii characters like in other issues and tried to isolate the error. I was not successful as the root of the error already lies in the first step using httr::POST (the command executes successfully but results in NA instead of comma-separated text.

I experimented a bit and realized that I could successfully download the metadata when using format = json instead of format = csv. I have implemented an option to choose json as the download format in redcap_metadata_read() in a fork of your repository: januz@518925c

Just wanted to let you know in case you would like to include this option in your package as well. Thanks for all your great work with this package!

The text was updated successfully, but these errors were encountered:

wibeasley · 2021-05-05T19:39:23Z

@januz, thanks for catching & fixing this. Holy cow that's a wide project. I'd like to include it in the test suite so the coverage isn't restricted to just the vanilla projects.

Do you mind sharing the dictionary and generating a few rows of fake data? If so, created a dedicated folder in inst/test-data/

januz · 2021-05-08T14:21:11Z

Thanks, @wibeasley! I sent you an email in response to your request.

datalorax · 2021-07-02T19:22:08Z

Hi - just an FYI I ran into this today with REDCapR::redcap_read(). I forked @januz's fork and added the format arg to the metadata call in REDCapR::redcap_read() and that fixed the issue (see datalorax@32975a7). Happy to submit a PR if you'd like but it's probably just as easy for you to implement it. Whichever you prefer.

wibeasley · 2021-07-09T18:15:07Z

Thanks a lot, @datalorax! I'll need to include this condition to the test suite before I pull it into the master.

Do you mind sharing the dictionary and generating a few rows of fake data? If so, created a dedicated folder in inst/test-data/.

datalorax · 2021-07-09T18:29:04Z

Hi @wibeasley, I will have to check with a few people to see if we can do that, but I think it shouldn't be a problem. I'll get back to you soon.

januz · 2021-09-13T21:16:41Z

Hey @wibeasley, would you be willing to add my fix to REDCapR? I have a few colleagues that would like to use REDCapR but can't because of the csv vs. json issue. I would rather have them install the official version of the package than my fork. Thanks!

ref #335

( I pulled into the wrong branch) ref #335

in testing suite ref #335

wibeasley · 2021-10-30T17:25:00Z

Hi @januz & @datalorax, thanks again for raising this issue and working on it.

when I pulled the fork that you two have been working on, it failed some checks. But maybe I pulled the wrong commit. I didn't see a PR that you submitted, so I pulled the latest one.

Tell me if I chose incorrectly. There were a lot of changes beyond REDCapR::redcap_metadata_read(). There were a few things I didn't understand. I'm happy to meet about it.
I'd rather not support both csv & json. Is there any scenario where the CSV format does something the JSON cannot do? If not, I'll drop the CSV option. Tell me if you noticed differences. It shouldn't affect the caller, since they receive an R data.frame from REDCapR::redcap_metadata_read().

@datalorax, stepping back a second, can you check that you still receive the error with the latest GitHub version of REDCapR? I included the @datalorax's 5.7k-column dictionary in the test suite; it returns without an error. I'm on a slightly older version of REDCap (10.5.1), so maybe that's the difference?

Here's a one liner to pull from the new test project on my server:

redcap_metadata_read("https://bbmc.ouhsc.edu/redcap/api/", "1C31398F332FCACA4C0A7B93B18D5CD4")$data

This is what I see on my machine:

# A tibble: 5,751 × 18
   field_name  form_name  section_header  field_type field_label select_choices_…
   <chr>       <chr>      <chr>           <chr>      <chr>       <chr>           
 1 record_id   sev0_pare…  NA             text       "Record ID" NA              
 2 p_consent   sev0_pare…  NA             descripti… "<div clas… NA              
 3 p_consent_… sev0_pare…  NA             text       "<div clas… NA              
 4 p_consent_… sev0_pare…  NA             text       "<div clas… NA              
 5 p_consent_… sev0_pare…  NA             radio       NA         1, I consent to…
 6 pdem_1_s0   sev0_pare… "<div class=\"… radio      "What is y… 1, Mother | 2, …
 7 pdem_1a_s0  sev0_pare…  NA             text       "Please de… NA              
 8 pdem_2_s0   sev0_pare…  NA             radio      "What is y… 1, Married/livi…
 9 pdem_3_s0   sev0_pare…  NA             radio      "What is t… 1, Less than hi…
10 pdem_4_s0   sev0_pare…  NA             radio      "What is y… 1, $19,999 or l…
# … with 5,741 more rows, and 12 more variables: field_note <chr>,
#   text_validation_type_or_show_slider_number <chr>, text_validation_min <chr>,
#   text_validation_max <chr>, identifier <chr>, branching_logic <chr>,
#   required_field <chr>, custom_alignment <chr>, question_number <chr>,
#   matrix_group_name <chr>, matrix_ranking <chr>, field_annotation <chr>

Here's the actual test:

REDCapR/tests/testthat/test-metadata-read.R

Lines 34 to 49 in f784478

    
           test_that("Super-wide 2", { 
        
             testthat::skip_on_cran() 
        
             expected_outcome_message <- "The data dictionary describing 5,751 fields was read from REDCap in \\d+(\\.\\d+\\W|\\W)seconds\\.  The http status code was 200\\." 
        
             expected_row_count    <- 5751L 
        
             expected_column_count <- 18L 
        
             expected_na_cells     <- 63750L 
        
             expect_message( 
        
               regexp           = expected_outcome_message, 
        
               returned_object <- redcap_metadata_read(redcap_uri=credential_super_wide_2$redcap_uri, token=credential_super_wide_2$token) 
        
             ) 
        
             expect_equal(nrow(returned_object$data), expected=expected_row_count) # dput(returned_object$data) 
        
             expect_equal(ncol(returned_object$data), expected=expected_column_count) 
        
             expect_equal(sum(is.na(returned_object$data)), expected=expected_na_cells) 
        
           })

I noticed these warnings when I uploaded your dictionary. I'm guessing they aren't related any errors returned by the metadata API call.

@januz, if you're not allowed to share your dictionary of 35k variables, I understand. I'll programmatically create a dictionary with that many variables. But the test won't be as good. Although I can recreate the wide-ness of it, I won't be able to mimic the complex html commands and the Spanish labels.

cherry picked from 518925c ref #335

ref #335

wibeasley · 2021-10-31T02:43:10Z

@januz, will you try the dev branch on your 35k variables? (As you can see from the commit above, it uses json only --no csv anymore.) If it works, I'll pull it into the main branch.
3716237

I'll still like to have that dictionary to check for regression errors.

datalorax · 2021-11-01T23:46:56Z

Hi @wibeasley, sorry I didn't get to this today. I will try to get to testing it tomorrow.

januz · 2021-11-02T02:22:20Z

Hey @wibeasley, yes, your commit works with our project! As expected, I can download the data dictionary without a problem.

To your points above:

Yes, I added quite a few more small commits that include changes that I wouldn't propose to be included into your package. Once the fix is implemented in main, I will likely just re-implement those in a workgroup-specific package wrapping functions from your package.
Yes, I think that using only json shouldn't make any difference for users.
I think I could send you the data dictionary privately but it shouldn't be included into your test suite (ie., be publicly available). Maybe you could derive an "anonymized" test case from the original? Let me know what you think.

Thanks so much for implementing this change! I'm happy to be able to get back to using (and recommending to use) your package instead of my fork.

wibeasley · 2021-11-02T14:38:40Z

Cool. Tell me if you think any are general enough (and test-able) to include in REDCapR. I like how you made revisions to the metadata function.
--
--
If you email me that dictionary, I'll remove/disguise the parts you don't want public and send it back to you for approval before including it in the repo. See Mimic @januz's dictionary with 35k variables #360.

januz · 2021-11-02T15:19:53Z

Thanks so much, @wibeasley!

ref #335 zip file b/c it's too big for CRAN

datalorax · 2021-11-03T00:25:35Z

Looks like you've got this covered now, but just popping back in to say it is working on my end now as well! Thanks so much.

wibeasley self-assigned this Apr 24, 2021

wibeasley added a commit that referenced this issue Oct 30, 2021

update documents

b52250c

ref #335

wibeasley added a commit that referenced this issue Oct 30, 2021

returning to before merge

ac5e922

( I pulled into the wrong branch) ref #335

wibeasley added a commit that referenced this issue Oct 30, 2021

include new super wide dataset from @datalorax

f784478

in testing suite ref #335

wibeasley added the testing label Oct 30, 2021

wibeasley pushed a commit that referenced this issue Oct 31, 2021

merge most of @januz's additions

ba94df8

cherry picked from 518925c ref #335

wibeasley added a commit that referenced this issue Oct 31, 2021

use jsonlite::fromJSON() to convert to tibble

3716237

ref #335

wibeasley mentioned this issue Nov 2, 2021

Mimic @januz's dictionary with 35k variables #360

Closed

2 tasks

januz closed this as completed Nov 2, 2021

wibeasley added a commit that referenced this issue Nov 2, 2021

update news; add @januz

9dc3fb4

ref #335 zip file b/c it's too big for CRAN

wibeasley mentioned this issue Nov 4, 2021

use json for metadata #361

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

downloading metadata as json solves download issues #335

downloading metadata as json solves download issues #335

januz commented Apr 23, 2021

wibeasley commented May 5, 2021

januz commented May 8, 2021

datalorax commented Jul 2, 2021

wibeasley commented Jul 9, 2021

datalorax commented Jul 9, 2021

januz commented Sep 13, 2021

wibeasley commented Oct 30, 2021

wibeasley commented Oct 31, 2021 •

edited

Loading

datalorax commented Nov 1, 2021

januz commented Nov 2, 2021

wibeasley commented Nov 2, 2021

januz commented Nov 2, 2021

datalorax commented Nov 3, 2021

downloading metadata as json solves download issues #335

downloading metadata as json solves download issues #335

Comments

januz commented Apr 23, 2021

wibeasley commented May 5, 2021

januz commented May 8, 2021

datalorax commented Jul 2, 2021

wibeasley commented Jul 9, 2021

datalorax commented Jul 9, 2021

januz commented Sep 13, 2021

wibeasley commented Oct 30, 2021

wibeasley commented Oct 31, 2021 • edited Loading

datalorax commented Nov 1, 2021

januz commented Nov 2, 2021

wibeasley commented Nov 2, 2021

januz commented Nov 2, 2021

datalorax commented Nov 3, 2021

wibeasley commented Oct 31, 2021 •

edited

Loading