Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unexpected Behavior: redcap_metadata_read doesn't limit forms when forms parameter is used #431

Closed
ezraporter opened this issue Sep 28, 2022 · 8 comments · Fixed by #445
Closed
Assignees

Comments

@ezraporter
Copy link
Contributor

Describe the behavior:
When an argument is passed in the forms parameter to redcap_metadata_read(), the metadata for all forms in the project is returned.

Reprex using dummy REDCapR dummy project:

# uri and token from ?redcap_metadata_read examples
uri   <- "https://bbmc.ouhsc.edu/redcap/api/"
token <- "9A81268476645C4E5F03428B8AC3AA7B"
resp <- REDCapR::redcap_metadata_read(redcap_uri=uri,
                                      token=token,
                                      # name of a single form in the example data
                                      forms = "health")$data

all(resp$form_name == "health")
#> FALSE

Expected behavior:
Metadata for only the specified forms ("health" in this case) is returned

Additional Info
The problem may be related to how the REDCap API expects the form to be passed. For a different test REDCap in our REDCap instance, the API playground suggests:

token <- # [REDACTED]
url <- # [REDACTED]
formData <- list("token"=token,
    content='metadata',
    format='json',
    returnFormat='json',
    'forms[0]'='first_form',      # Look here :)
    'forms[1]'='second_form'
)
response <- httr::POST(url, body = formData, encode = "form")
result <- httr::content(response)
print(result)

Using the forms[index] style for the example in my reprex seems to work:

Works:
api-works

Doesn't work:
api-fails

Shareable Postman collection containing those 2 api calls for reproducibility: https://www.getpostman.com/collections/dc243e2bacef264725c1

Desktop (please complete the following information):

  • OS: macOS Monterey
  • REDCap version 12.5.8
  • REDCapR Version 1.1.0
@wibeasley wibeasley self-assigned this Sep 28, 2022
@wibeasley
Copy link
Member

@ezraporter, good catch, and thanks for doing the hard work to figure it out. Do you want to take a shot at finishing it?

If so, here's a suggestion how to dynamically name & append those to post_body.

post_body <- list(token="abcd", content="metadata", returnFormat="csv")
forms     <- NULL                               # The default
# forms     <- c("first", "second", "third")      # Well-behaved
# forms     <- character(0)                       # Still legal

if (0L < length(forms)) {
  names(forms) <- sprintf("forms[%i]", seq_along(forms))
  # form_list <-
  #   c(
  #     "forms[0]" = "first",
  #     "forms[1]" = "second",
  #     "forms[2]" = "third"
  #   )
  
  post_body <-
    append(
      post_body,
      forms
    )
}
print(post_body)

@ezraporter
Copy link
Contributor Author

@wibeasley yup I can take a shot at this. I'll make a PR

wibeasley added a commit that referenced this issue Oct 8, 2022
@wibeasley
Copy link
Member

@ezraporter, it looks like you haven't tried it yet? Good, b/c I've changed my mind. I think it needs to be filtered in R --not in REDCap. In other words, everything gets extracted from REDCap and then dplyr::filter() removes the rows. Since this is metadata with at most 5k rows (and not real data with sometimes 100k rows), there shouldn't be much of a performance difference.

I think the metadata should always return the plumbing variables when they exist (ie, record_id, event_name, redcap_repeating_instrument, and redcap_repeating_instance). see #439

@ezraporter
Copy link
Contributor Author

Yes, sorry! Last week got away from me. Glad I didn't hold you up. That approach makes sense to me. Still helpful for me to implement that in a PR using filter()? I have some time today so I can give it a shot

@wibeasley
Copy link
Member

I did something differently this weekend so it's not as important, but I'm still leaning toward dplyr::filter() for two reasons

  1. there might be an occasion where the funtion still needs to know about the other fields, even if they're not returned.
  2. i'm kinda wondering if the API might change, since the way it accepts forms is different than how the other API functions accept forms (as well as records, events, & fields).

I guess performance is the only reason I'd say to filter at the server. Can you do a quick proof of concept?

For a gerbil, use the massive 35k column project (https://github.com/OuhscBbmc/REDCapR/blob/main/inst/misc/example.credentials#L26). If there isn't a noticeable performance difference, use dplyr::filter()?

@ezraporter
Copy link
Contributor Author

Yes I'll give it a shot!

@ezraporter
Copy link
Contributor Author

I'm actually seeing a non-trivial performance difference. Takes less than a second to pull metadata for a single form from the 35k field redcap but almost 20 seconds to get all the metadata. It's about 4 seconds to get all the metadata from a 5k field redcap.

Expand Code
devtools::load_all()

# 35k field redcap
redcap_metadata_read(
  "https://bbmc.ouhsc.edu/redcap/api/",
  "5C1526186C4D04AE0A0630743E69B53C"
)$elapsed_seconds

#> [1] 18.2162

# 5k field redcap
redcap_metadata_read(
  "https://bbmc.ouhsc.edu/redcap/api/",
  "1C31398F332FCACA4C0A7B93B18D5CD4"
)$elapsed_seconds

#> [1] 3.845438

# Single form for 35k field redcap
# Defined for testing here: https://github.com/ezraporter/REDCapR/commit/54012b611d444fefbd6df2945c91ccae386a8c17
redcap_metadata_read_hacky_single_form(
  "https://bbmc.ouhsc.edu/redcap/api/",
  "5C1526186C4D04AE0A0630743E69B53C",
  forms = "form_0001"
)$elapsed_seconds

#> [1] 0.511425

Given that, make sense to mock up a full-fledged version that filters at the server?

@wibeasley
Copy link
Member

wow, I didn't expect that big of a difference. That crosses my threshold, and it sounds like it crosses your too. So yes, please filter at the server. Thanks for running those scenarios to inform our decision.

wibeasley added a commit that referenced this issue Oct 16, 2022
wibeasley added a commit that referenced this issue Oct 16, 2022
wibeasley added a commit that referenced this issue Oct 23, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants