check for missing records #400
@TimMonahan, here's a new version we can try with your dataset. The error message could be better; I can add more info once we discover why the records aren't being exported from the server.
For now, I'm throwing a warning instead of a full error:
unique_ids_actual <- sort(unique(ds_stacked[[id_position]]))
ids_missing_rows  <- setdiff(unique_ids, unique_ids_actual)
if (0L < length(ids_missing_rows)) {
  warning(sprintf(
    "There are %i subject(s) that are missing rows in the final dataset.\nCheck for funny values that could trip up REDCap's PHP code:\n%s.",
    length(ids_missing_rows),
    paste(ids_missing_rows, collapse = "; ")
  ))
}
To install the dev version:
Advice to people browsing this issue: if you suspect you're losing data through the API, request fewer rows and columns. Ditch any columns you don't need, and reduce

If anyone wants to try this themselves, the url is "https://bbmc.ouhsc.edu/redcap/api/" and the token (to this PHI-free dataset) is "5C1526186C4D04AE0A0630743E69B53C".

@TimMonahan, a little follow-up to yesterday's session: I believe I have isolated the originating problem to the PHP side. I was exploring similar scenarios on my own with large datasets (including the super wide 3 dataset). An incomplete dataset is returned in the API playground and in Postman, even though the server also returns a 200 HTTP status code. I've upgraded REDCapR's warning to an error (i.e., `stop()`). I'll also add advice to the message about decreasing the batch size or dropping some forms/events.

I got completely upstream of R/httr/readr/REDCapR and used Postman & bash. I made four identical calls, saved the responses to files "raw-text-postman-2.csv" ... "raw-text-postman-5.csv", and then looked at the first 100 characters of each file. They weren't consistent, and all of them were missing the first 20,000+ variables, which is bad. But there was a repeating pattern: the files alternated between two different starting variables.

When I use the DataExport feature in the browser, the file starts with the initial variables, which is good.

Back to Postman, when I ask for only

Before I post something to the REDCap Community site, is there any other diagnostic information that would be helpful for people to know? Any holes or flaws in this examination?

Snippet of the first 100 characters from four identical Postman calls:
$ head -c 100 raw-text-postman-2.csv
# record_id,variable_30253,variable_30254,variable_30255___1,variable_30255___2,variable_30255___3,var
$ head -c 100 raw-text-postman-3.csv
# record_id,variable_25495,variable_25496,variable_25497___1,variable_25497___2,variable_25497___3,var
$ head -c 100 raw-text-postman-4.csv
# record_id,variable_30253,variable_30254,variable_30255___1,variable_30255___2,variable_30255___3,var
$ head -c 100 raw-text-postman-5.csv
# record_id,variable_25495,variable_25496,variable_25497___1,variable_25497___2,variable_25497___3,var

Snippet of the first 100 characters of the REDCap browser export:
$ head -c 100 export-1.csv
# record_id,variable_00002,variable_00003___1,variable_00003___2,variable_00003___3,variable_00004,var

A similar alternating pattern that I saw earlier inside kernel-api.R (I think you reported seeing a similar alternation with your dataset yesterday):
Browse[2]> substr(raw_text, 1, 100)
[1] "record_id,variable_25495,variable_25496,variable_25497___1,variable_25497___2,variable_25497___3,var"
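The Postman comparison above relies on eyeballing the first 100 characters of each response. A more direct diagnostic is to compare the header row of a raw response against the fields that were requested. Below is a minimal base-R sketch; `find_missing_fields()` is a hypothetical helper (not part of REDCapR), and it assumes the response is plain CSV whose first line is an unquoted, comma-separated header.

```r
# Hypothetical helper (not part of REDCapR): report requested fields that are
# absent from the header row of a raw CSV response.
find_missing_fields <- function(raw_text, expected_fields) {
  # Read only the first line, which is assumed to be the CSV header.
  con <- textConnection(raw_text)
  on.exit(close(con))
  header <- readLines(con, n = 1L, warn = FALSE)
  # Drop a trailing carriage return in case the server uses CRLF line endings.
  header <- sub("\r$", "", header)
  returned_fields <- strsplit(header, ",", fixed = TRUE)[[1]]
  # setdiff() keeps the requested order, so the first missing field is easy to spot.
  setdiff(expected_fields, returned_fields)
}

# Toy response that skips variable_00002, mimicking the symptom described above.
raw_text <- "record_id,variable_00003\n1,a\n2,b\n"
find_missing_fields(raw_text, c("record_id", "variable_00002", "variable_00003"))
# [1] "variable_00002"
```

An empty result means every requested field appeared in the header; it says nothing about missing rows, which need a separate check.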
@TimMonahan, anything you'd change about this error message? (And I'm submitting the previous post to the Community site. EDIT: https://community.projectredcap.org/questions/131318/api-returns-200-ok-despite-returning-an-incomplete.html)
Looks great, Will! Ship it.
And thanks again!
________________________________
From: Will Beasley ***@***.***>
Sent: Friday, July 22, 2022 3:43:33 PM
To: OuhscBbmc/REDCapR ***@***.***>
Cc: Monahan, Tim M ***@***.***>; Mention ***@***.***>
Subject: Re: [OuhscBbmc/REDCapR] check for missing records (Issue #400)
Error: There are 32 subject(s) that are missing rows in the returned dataset. REDCap's PHP code is likely trying to process too much text in one bite.
Common solutions to this problem are:
- specifying only the records you need (w/ `records`)
- specifying only the fields you need (w/ `fields`)
- specifying only the forms you need (w/ `forms`)
- specifying a subset w/ `filter_logic`
- reducing `batch_size`
The missing ids are:
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32.
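The `records` and `batch_size` suggestions in that message both amount to asking the server for fewer records per call. As a rough illustration of splitting record ids into smaller batches, here is a base-R sketch; `split_into_batches()` is a hypothetical helper, not REDCapR's internal implementation, and each resulting batch would be passed to a separate request (for example, through the `records` argument).

```r
# Hypothetical helper (not REDCapR's internal code): split record ids into
# consecutive batches of at most `batch_size` ids each.
split_into_batches <- function(ids, batch_size) {
  stopifnot(batch_size >= 1L)
  # Batch index for each id: 1,1,...,1,2,2,... (ceiling of position / batch_size).
  batch_index <- ceiling(seq_along(ids) / batch_size)
  unname(split(ids, batch_index))
}

# The 32 missing ids from the message above, requested 10 at a time.
batches <- split_into_batches(1:32, batch_size = 10L)
lengths(batches)
# [1] 10 10 10  2
```

Smaller batches mean more HTTP calls, so this trades speed for a lower chance of the server truncating a large response.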
In a conversation w/ @TimMonahan, we found a scenario where something prevented records from being returned. It looks like it's happening in REDCap's API PHP code, as if something is choking on large, sparse datasets. There's a relatively simple way to at least detect it in REDCapR: make sure that everyone contained in `initial_call` has 1+ records in `ds_stacked`.

REDCapR/R/redcap-read.R, lines 266 to 281 in e5994dc
REDCapR/R/redcap-read.R, line 384 in e5994dc
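The proposed check can be illustrated outside REDCapR with a small, self-contained base-R sketch. `find_missing_ids()` and the toy data frame are hypothetical stand-ins for the ids from `initial_call` and the rows in `ds_stacked`; the real code lives in redcap-read.R and may differ.

```r
# Hypothetical stand-in for the check: ids known to exist (e.g., from an
# initial id-only call) that have no rows at all in the stacked dataset.
find_missing_ids <- function(expected_ids, ds_stacked, id_column) {
  ids_actual <- unique(ds_stacked[[id_column]])
  sort(setdiff(unique(expected_ids), ids_actual))
}

# Toy data: ids 3 and 5 exist on the server but came back with no rows.
ds_stacked  <- data.frame(record_id = c(1L, 1L, 2L, 4L), value = c("a", "b", "c", "d"))
ids_missing <- find_missing_ids(1:5, ds_stacked, "record_id")

if (length(ids_missing) > 0L) {
  warning(sprintf(
    "There are %i subject(s) that are missing rows in the returned dataset: %s.",
    length(ids_missing),
    paste(ids_missing, collapse = "; ")
  ))
}
```

Swapping `warning()` for `stop()` turns the detection into a hard failure, which is the direction this thread eventually took.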