leveraging metadata to determine data type in R. #294
Are you sure your concerns about the variability in the date types are warranted? I know the backend storage for date-time types is always YMD. I've attached an export of a test project that uses all 9 date-time stamps. The CSV export is always ymd. I don't see a problem here.
@pbchase has a point here. The Anything that has a field type starting with |
@pbchase & @nutterb, thanks for pointing out that I didn't need to be as worried about the date formats. One less thing to worry about. @thomasnwilson and I are still concerned about the scenario where (a) some values are entered initially, and (b) validation is added later. In that case, the metadata can easily be too restrictive when determining the data type.
library(magrittr)
uri <- "https://bbmc.ouhsc.edu/redcap/api/"
token <- "14A41597332864D74460CBBF52EE49A6"
ds_metadata <- REDCapR::redcap_metadata_read(redcap_uri=uri, token=token)$data
ds <- REDCapR::redcap_read(redcap_uri=uri, token=token)$data
ds_metadata %>%
  dplyr::select(
    field_name
    ,text_validation_type_or_show_slider_number
  )
ds
results Notice the
@pbchase, @thomasnwilson, @nutterb, and anyone else. Do you agree this is a problem, or am I inventing something? If so, what are your overall goals with this desired capability (i.e., the ability for |
You're probably more kind to your users than I am. My function has a There are too many ways things can go wrong, and I personally feel that trying to accommodate all of those problems is a drain on development time. But I'm a cranky old man, too. |
I think I think My idea is to make a tool that simplifies some problems. By no means can we fix all the problems that might arise. |
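The scenario raised earlier in the thread (values entered first, validation added later) can be sketched in a few lines of base R. The `age_raw` vector is hypothetical, and the "fall back to character" rule is only one possible policy, not something REDCapR is confirmed to do.

```r
# Hypothetical export of one field. The metadata now says "integer",
# but the third record was entered before validation was switched on.
age_raw <- c("34", "51", "about 40", "29")

# Trusting the metadata blindly: the legacy value silently becomes NA.
strict <- suppressWarnings(as.integer(age_raw))

# Positions where a nonmissing raw value was lost by the coercion.
lost <- which(is.na(strict) & !is.na(age_raw))

# One conservative policy (an assumption): keep the column as character
# whenever the stricter type would discard information.
age <- if (length(lost) == 0L) strict else age_raw
```

Here `lost` identifies the third record, so `age` stays character and the caller can decide how to handle the legacy value.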
VS Code regex & sub `(.*) \| (.*) +$` & `- validation_name : $1\n fx_export : "$2"` ref #294
issue replaced by #405 |
@pbchase re-energized the conversation about `redcap_read()` using metadata to determine the data type. This may be less important than it was before October's enhancements with types (#257 & #258), but I agree we should still think about it.

I pulled the stock validation options with `SELECT * FROM redcap_validation_types;` and displayed a few relevant columns. (I excluded a few huge and/or obvious ones that distort the table.) The 2nd column is my best guess how readr columns should be specified. The `validation_name` column is exposed through `REDCapR::redcap_metadata_read()`.

I don't know how the dates can work reliably. Most of them may have to be read as character. The JavaScript regex for the 2nd row below is
/^((29([-\/])02\3(\d{2}([13579][26]|[2468][048]|04|08)|(1600|2[048]00)))|((((0[1-9]|1\d|2[0-8])([-\/])(0[1-9]|1[012]))|((29|30)([-\/])(0[13-9]|1[012]))|(31([-\/])(0[13578]|1[02])))(\11|\15|\18)\d{4}))$/
Notice the character class `[-\/]` that's sprinkled in, which accepts either `-` or `/` between date parts. I don't think readr can be that flexible. You must specify either `-` or `/` in readr, but from the REDCap regex, there's no way to determine what it's going to be. In fact, the same column can contain cells that mix & match. (And I think within a cell, you can mix & match, such as "4/3-2001", because the regex isn't using a back reference.)

A related option is that REDCap suggests the col_types, by printing them to the console. The user then adjusts them and specifies it in the code right before `redcap_read()`. I have had success with a very similar strategy with `OuhscMunge::readr_spec_aligned()`.

@pbchase and others, what are your thoughts?
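The separator problem described above can be illustrated with a minimal sketch. It uses base R's `as.Date()` as a stand-in for readr, since both take a single format string per column. The values in `x` are hypothetical, and normalizing the separator before parsing is just one possible workaround.

```r
# Hypothetical cells from one day-month-year column that mixes separators.
x <- c("04-03-2001", "04/03/2001")

# A single format string only recognizes its own separator;
# the other cell comes back as NA.
dash  <- as.Date(x, format = "%d-%m-%Y")
slash <- as.Date(x, format = "%d/%m/%Y")

# One possible workaround (an assumption): read the column as character,
# normalize the separator, and only then parse it as a date.
normalized <- gsub("/", "-", x, fixed = TRUE)
both       <- as.Date(normalized, format = "%d-%m-%Y")
```

With this input, `dash` parses only the first cell, `slash` only the second, and `both` parses each cell to 4 March 2001.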
readr col_type                      regex
col_character()
col_date("%d-%m-%Y")
col_date("%m-%d-%Y")
col_date()
col_datetime("%d-%m-%Y %H:%M")
col_datetime("%m-%d-%Y %H:%M")
col_datetime("%d-%m-%Y %H:%M:%S")
col_datetime("%m-%d-%Y %H:%M:%S")
col_datetime("%Y-%m-%d %H:%M:%S")
col_datetime("%Y-%m-%d %H:%M")
col_character()
col_integer()                       /^[-+]?\b\d+\b$/
col_character()                     /^\d{10}$/
col_character()                     /^[a-z0-9-_]+$/i
col_double()                        /^[-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?$/
col_double()                        /^-?\d+\.\d$/
col_double()                        /^-?\d+,\d$/
col_double()                        /^-?\d+\.\d{2}$/
col_double()                        /^-?\d+,\d{2}$/
col_double()                        /^-?\d+\.\d{3}$/
col_double()                        /^-?\d+,\d{3}$/
col_double()                        /^-?\d+\.\d{4}$/
col_double()                        /^-?\d+,\d{4}$/
col_double()                        /^[-+]?[0-9]*,?[0-9]+([eE][-+]?[0-9]+)?$/
col_character()
col_character()
col_character()                     /^\d{4}$/
col_character()                     /^[ABCEGHJKLMNPRSTVXY]{1}\d{1}[A-Z]{1}\s*\d{1}[A-Z]{1}\d{1}$/i
col_character()
col_character()
col_character()                     /^\d{3}-\d\d-\d{4}$/
col_time("%H:%M")
col_time("%M:%H")                   /^[0-5]\d:[0-5]\d$/
col_character()                     /^[0-9]{4,9}$/
col_character()                     /^\d{5}(-\d{4})?$/
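The "suggest the col_types and let the user adjust them" strategy mentioned above might look roughly like the following base R sketch. The `lookup` vector, the sample `ds_metadata`, and the function name `suggest_col_types()` are all hypothetical. This is not how `OuhscMunge::readr_spec_aligned()` is implemented, and the three validation names cover only a small subset of the table.

```r
# Hypothetical lookup: REDCap validation name -> readr collector, kept as text
# so the suggestion can be pasted into a script and edited by hand.
lookup <- c(
  integer  = "readr::col_integer()",
  number   = "readr::col_double()",
  date_ymd = "readr::col_date()"
)

# Hypothetical slice of what REDCapR::redcap_metadata_read()$data returns.
ds_metadata <- data.frame(
  field_name = c("record_id", "age", "weight", "dob"),
  text_validation_type_or_show_slider_number = c(NA, "integer", "number", "date_ymd"),
  stringsAsFactors = FALSE
)

suggest_col_types <- function(d) {
  validation <- d$text_validation_type_or_show_slider_number
  # Missing or unrecognized validation names index to NA ...
  collector <- unname(lookup[validation])
  # ... and fall back to character, the least restrictive choice.
  collector[is.na(collector)] <- "readr::col_character()"
  lines <- sprintf("  %s = %s", d$field_name, collector)
  paste0("readr::cols(\n", paste(lines, collapse = ",\n"), "\n)")
}

spec_text <- suggest_col_types(ds_metadata)
cat(spec_text, "\n")
```

Emitting text rather than a live `cols()` object is deliberate: the user can loosen any column (for example, switching a date to `col_character()`) before passing the spec on.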