Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

downloading metadata as json solves download issues #335

Closed
januz opened this issue Apr 23, 2021 · 13 comments
Closed

downloading metadata as json solves download issues #335

januz opened this issue Apr 23, 2021 · 13 comments
Assignees
Labels

Comments

@januz
Copy link
Contributor

januz commented Apr 23, 2021

With a very large (over 35000 variables/fields) and complex project (bilingual setup with lots of HTML commands and Spanish characters in the field descriptions and answer alternatives), I ran into the

Error in inherits(ds, "data.frame") : object 'ds' not found

error. I assumed that it had to do with non-ascii characters like in other issues and tried to isolate the error. I was not successful as the root of the error already lies in the first step using httr::POST (the command executes successfully but results in NA instead of comma-separated text.

I experimented a bit and realized that I could successfully download the metadata when using format = json instead of format = csv. I have implemented an option to choose json as the download format in redcap_metadata_read() in a fork of your repository: januz@518925c

Just wanted to let you know in case you would like to include this option in your package as well. Thanks for all your great work with this package!

@wibeasley wibeasley self-assigned this Apr 24, 2021
@wibeasley
Copy link
Member

@januz, thanks for catching & fixing this. Holy cow that's a wide project. I'd like to include it in the test suite so the coverage isn't restricted to just the vanilla projects.

Do you mind sharing the dictionary and generating a few rows of fake data? If so, created a dedicated folder in inst/test-data/

@januz
Copy link
Contributor Author

januz commented May 8, 2021

Thanks, @wibeasley! I sent you an email in response to your request.

@datalorax
Copy link
Contributor

Hi - just an FYI I ran into this today with REDCapR::redcap_read(). I forked @januz's fork and added the format arg to the metadata call in REDCapR::redcap_read() and that fixed the issue (see datalorax@32975a7). Happy to submit a PR if you'd like but it's probably just as easy for you to implement it. Whichever you prefer.

@wibeasley
Copy link
Member

Thanks a lot, @datalorax! I'll need to include this condition to the test suite before I pull it into the master.

Do you mind sharing the dictionary and generating a few rows of fake data? If so, created a dedicated folder in inst/test-data/.

@datalorax
Copy link
Contributor

Hi @wibeasley, I will have to check with a few people to see if we can do that, but I think it shouldn't be a problem. I'll get back to you soon.

@januz
Copy link
Contributor Author

januz commented Sep 13, 2021

Hey @wibeasley, would you be willing to add my fix to REDCapR? I have a few colleagues that would like to use REDCapR but can't because of the csv vs. json issue. I would rather have them install the official version of the package than my fork. Thanks!

wibeasley added a commit that referenced this issue Oct 30, 2021
wibeasley added a commit that referenced this issue Oct 30, 2021
( I pulled into the wrong branch)

ref #335
wibeasley added a commit that referenced this issue Oct 30, 2021
@wibeasley
Copy link
Member

Hi @januz & @datalorax, thanks again for raising this issue and working on it.

  1. when I pulled the fork that you two have been working on, it failed some checks. But maybe I pulled the wrong commit. I didn't see a PR that you submitted, so I pulled the latest one.

    Tell me if I chose incorrectly. There were a lot of changes beyond REDCapR::redcap_metadata_read(). There were a few things I didn't understand. I'm happy to meet about it.

  2. I'd rather not support both csv & json. Is there any scenario where the CSV format does something the JSON cannot do? If not, I'll drop the CSV option. Tell me if you noticed differences. It shouldn't affect the caller, since they receive an R data.frame from REDCapR::redcap_metadata_read().

  3. @datalorax, stepping back a second, can you check that you still receive the error with the latest GitHub version of REDCapR? I included the @datalorax's 5.7k-column dictionary in the test suite; it returns without an error. I'm on a slightly older version of REDCap (10.5.1), so maybe that's the difference?

    Here's a one liner to pull from the new test project on my server:

    redcap_metadata_read("https://bbmc.ouhsc.edu/redcap/api/", "1C31398F332FCACA4C0A7B93B18D5CD4")$data

    This is what I see on my machine:

    # A tibble: 5,751 × 18
       field_name  form_name  section_header  field_type field_label select_choices_<chr>       <chr>      <chr>           <chr>      <chr>       <chr>           
     1 record_id   sev0_pareNA             text       "Record ID" NA              
     2 p_consent   sev0_pareNA             descripti"<div clas… NA              
     3 p_consent_… sev0_pare…  NA             text       "<div clasNA              
     4 p_consent_sev0_pareNA             text       "<div clas… NA              
     5 p_consent_… sev0_pare…  NA             radio       NA         1, I consent to…
     6 pdem_1_s0   sev0_pare… "<div class=\"… radio      "What is y1, Mother | 2, …
     7 pdem_1a_s0  sev0_pareNA             text       "Please de… NA              
     8 pdem_2_s0   sev0_pare…  NA             radio      "What is y1, Married/livi9 pdem_3_s0   sev0_pareNA             radio      "What is t… 1, Less than hi…
    10 pdem_4_s0   sev0_pare…  NA             radio      "What is y1, $19,999 or l# … with 5,741 more rows, and 12 more variables: field_note <chr>,
    #   text_validation_type_or_show_slider_number <chr>, text_validation_min <chr>,
    #   text_validation_max <chr>, identifier <chr>, branching_logic <chr>,
    #   required_field <chr>, custom_alignment <chr>, question_number <chr>,
    #   matrix_group_name <chr>, matrix_ranking <chr>, field_annotation <chr>

    Here's the actual test:

    test_that("Super-wide 2", {
    testthat::skip_on_cran()
    expected_outcome_message <- "The data dictionary describing 5,751 fields was read from REDCap in \\d+(\\.\\d+\\W|\\W)seconds\\. The http status code was 200\\."
    expected_row_count <- 5751L
    expected_column_count <- 18L
    expected_na_cells <- 63750L
    expect_message(
    regexp = expected_outcome_message,
    returned_object <- redcap_metadata_read(redcap_uri=credential_super_wide_2$redcap_uri, token=credential_super_wide_2$token)
    )
    expect_equal(nrow(returned_object$data), expected=expected_row_count) # dput(returned_object$data)
    expect_equal(ncol(returned_object$data), expected=expected_column_count)
    expect_equal(sum(is.na(returned_object$data)), expected=expected_na_cells)
    })

    I noticed these warnings when I uploaded your dictionary. I'm guessing they aren't related any errors returned by the metadata API call.
    image

  4. @januz, if you're not allowed to share your dictionary of 35k variables, I understand. I'll programmatically create a dictionary with that many variables. But the test won't be as good. Although I can recreate the wide-ness of it, I won't be able to mimic the complex html commands and the Spanish labels.

wibeasley pushed a commit that referenced this issue Oct 31, 2021
@wibeasley
Copy link
Member

wibeasley commented Oct 31, 2021

@januz, will you try the dev branch on your 35k variables? (As you can see from the commit above, it uses json only --no csv anymore.) If it works, I'll pull it into the main branch.
3716237

I'll still like to have that dictionary to check for regression errors.

@datalorax
Copy link
Contributor

Hi @wibeasley, sorry I didn't get to this today. I will try to get to testing it tomorrow.

@januz
Copy link
Contributor Author

januz commented Nov 2, 2021

Hey @wibeasley, yes, your commit works with our project! As expected, I can download the data dictionary without a problem.

To your points above:

  1. Yes, I added quite a few more small commits that include changes that I wouldn't propose to be included into your package. Once the fix is implemented in main, I will likely just re-implement those in a workgroup-specific package wrapping functions from your package.

  2. Yes, I think that using only json shouldn't make any difference for users.

  3. I think I could send you the data dictionary privately but it shouldn't be included into your test suite (ie., be publicly available). Maybe you could derive an "anonymized" test case from the original? Let me know what you think.

Thanks so much for implementing this change! I'm happy to be able to get back to using (and recommending to use) your package instead of my fork.

@wibeasley
Copy link
Member

  1. Cool. Tell me if you think any are general enough (and test-able) to include in REDCapR. I like how you made revisions to the metadata function.

  2. --

  3. --

  4. If you email me that dictionary, I'll remove/disguise the parts you don't want public and send it back to you for approval before including it in the repo. See Mimic @januz's dictionary with 35k variables #360.

@januz
Copy link
Contributor Author

januz commented Nov 2, 2021

Thanks so much, @wibeasley!

@januz januz closed this as completed Nov 2, 2021
wibeasley added a commit that referenced this issue Nov 2, 2021
ref #335

zip file b/c it's too big for CRAN
@datalorax
Copy link
Contributor

Looks like you've got this covered now, but just popping back in to say it is working on my end now as well! Thanks so much.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants