Unexpected Behavior: Field validation type number_comma_decimal breaks in export #366

Closed
joundso opened this issue Nov 11, 2021 · 8 comments

@joundso
Contributor

joundso commented Nov 11, 2021

It seems that number fields with the validation type number_comma_decimal are exported (or cast) as doubles, but with the decimal separator (the comma) removed.

How to reproduce

  1. Create a new REDCap project
  2. Create a new instrument
  3. Create two new fields:
     • one with Validation = "Number"
     • one with Validation = "Number (comma with decimal)" (this type may first need to be enabled in the Control Center if it is not in the list of validation types)
  4. Create a demo record with the values 1.5 (for the "normal" number field) and 1,5 (for the comma-decimal field)
  5. Create an API key
  6. Switch to R and export the whole dataset with REDCapR::redcap_read():
redcap_export <-
  REDCapR::redcap_read(
    redcap_uri = Sys.getenv("REDCAP_API_URL_TEST"),
    token = Sys.getenv("REDCAP_API_KEY_TEST")
  )
#> The data dictionary describing 3 fields was read from REDCap in 0.1 seconds.  The http status code was 200.
#> 1 records and 2 columns were read from REDCap in 0.1 seconds.  The http status code was 200.
#> Starting to read 1 records  at 2021-11-11 10:19:46.
#> Reading batch 1 of 1, with subjects 1 through 1 (ie, 1 unique subject records).
#> 1 records and 5 columns were read from REDCap in 0.1 seconds.  The http status code was 200.
#> 
#> ── Column specification ────────────────────────────────────────────────────────
#> cols(
#>   study_id = col_double(),
#>   redcap_event_name = col_character(),
#>   test_comma = col_number(),
#>   test_dot = col_double(),
#>   test_complete = col_double()
#> )
redcap_export$data
#>   study_id redcap_event_name test_comma test_dot test_complete
#> 1        1  enrollment_arm_1         15      1.5             0

In the last line I would expect 1.5 for both fields. But 15 as the result for the input 1,5 (with the comma as the decimal separator) is clearly not correct.
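The column specification above shows that readr guessed col_number() for test_comma. A minimal sketch, using only readr and no REDCap server, of why that produces 15: in readr's default locale the comma is the grouping (thousands) mark, which the number parser discards, whereas a locale with decimal_mark = "," reads it as the decimal separator.

```r
library(readr)

x <- "1,5"

# Default locale: decimal_mark = ".", grouping_mark = ",".
# The comma is treated as a thousands separator and dropped.
default_value <- parse_number(x)

# Comma-decimal locale: locale() then uses "." as the grouping mark.
comma_value <- parse_number(x, locale = locale(decimal_mark = ","))

print(default_value)  # 15
print(comma_value)    # 1.5
```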

System environment

redcap_data_dict <- REDCapR::redcap_metadata_read(
  redcap_uri = Sys.getenv("REDCAP_API_URL_TEST"),
  token = Sys.getenv("REDCAP_API_KEY_TEST")
)
#> The data dictionary describing 3 fields was read from REDCap in 0.1 seconds.  The http status code was 200.
redcap_data_dict$data
#> # A tibble: 3 × 18
#>   field_name form_name section_header field_type field_label   select_choices_o…
#>   <chr>      <chr>     <chr>          <chr>      <chr>         <chr>            
#> 1 study_id   test      <NA>           text       Study ID      <NA>             
#> 2 test_comma test      <NA>           text       Number (with… <NA>             
#> 3 test_dot   test      <NA>           text       Number (with… <NA>             
#> # … with 12 more variables: field_note <chr>,
#> #   text_validation_type_or_show_slider_number <chr>,
#> #   text_validation_min <chr>, text_validation_max <chr>, identifier <chr>,
#> #   branching_logic <chr>, required_field <chr>, custom_alignment <chr>,
#> #   question_number <chr>, matrix_group_name <chr>, matrix_ranking <chr>,
#> #   field_annotation <chr>

R.version
#>                _                           
#> platform       x86_64-pc-linux-gnu         
#> arch           x86_64                      
#> os             linux-gnu                   
#> system         x86_64, linux-gnu           
#> status                                     
#> major          4                           
#> minor          1.2                         
#> year           2021                        
#> month          11                          
#> day            01                          
#> svn rev        81115                       
#> language       R                           
#> version.string R version 4.1.2 (2021-11-01)
#> nickname       Bird Hippie

sessionInfo()
#> R version 4.1.2 (2021-11-01)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 20.04.3 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> loaded via a namespace (and not attached):
#>  [1] pillar_1.6.4      compiler_4.1.2    highr_0.9         R.methodsS3_1.8.1
#>  [5] R.utils_2.11.0    tools_4.1.2       bit_4.0.4         digest_0.6.28    
#>  [9] evaluate_0.14     lifecycle_1.0.1   tibble_3.1.5      checkmate_2.0.0  
#> [13] R.cache_0.15.0    pkgconfig_2.0.3   rlang_0.4.12      reprex_2.0.1     
#> [17] rstudioapi_0.13   cli_3.1.0         DBI_1.1.1         parallel_4.1.2   
#> [21] curl_4.3.2        yaml_2.2.1        xfun_0.28         fastmap_1.1.0    
#> [25] httr_1.4.2        withr_2.4.2       styler_1.6.2      stringr_1.4.0    
#> [29] dplyr_1.0.7       knitr_1.36        REDCapR_1.0.0     hms_1.1.1        
#> [33] generics_0.1.1    fs_1.5.0          vctrs_0.3.8       bit64_4.0.5      
#> [37] tidyselect_1.1.1  glue_1.4.2        R6_2.5.1          fansi_0.5.0      
#> [41] vroom_1.5.5       rmarkdown_2.11    tzdb_0.2.0        readr_2.0.2      
#> [45] purrr_0.3.4       magrittr_2.0.1    backports_1.3.0   ellipsis_0.3.2   
#> [49] htmltools_0.5.2   assertthat_0.2.1  utf8_1.2.2        stringi_1.7.5    
#> [53] crayon_1.4.2      R.oo_1.24.0

packageVersion("REDCapR")
#> [1] '1.0.0'

Created on 2021-11-11 by the reprex package (v2.0.1)

Thank you

Thanks for your great work so far! If I am missing anything, please let me know!

@wibeasley wibeasley self-assigned this Nov 11, 2021
@wibeasley
Member

wibeasley commented Nov 11, 2021

@joundso, to make sure I understand correctly: you're proposing that redcap_read() produce different results based on values from redcap_metadata_read()? @pbchase and others have advocated similar approaches, and I agree. I'm a little concerned about scenarios where the validation was added after illegal values had already been entered (part of this discussion is captured in #294).
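A minimal sketch of that metadata-driven idea, assuming the data were first read with every column as character. Both data frames below are hypothetical stand-ins for redcap_metadata_read()$data and redcap_read()$data; only the field_name and text_validation_type_or_show_slider_number columns of the dictionary are used. Values that violate the validation (the #294 concern) become NA with a readr parsing warning instead of silently turning into a wrong number.

```r
library(readr)

# Hypothetical stand-in for redcap_metadata_read()$data (only the two columns used here).
dictionary <- data.frame(
  field_name = c("study_id", "test_comma", "test_dot"),
  text_validation_type_or_show_slider_number = c(NA, "number_comma_decimal", "number"),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for redcap_read()$data, read with every column as character.
ds <- data.frame(
  study_id   = "1",
  test_comma = "1,5",
  test_dot   = "1.5",
  stringsAsFactors = FALSE
)

validation   <- dictionary$text_validation_type_or_show_slider_number
comma_fields <- dictionary$field_name[validation %in% "number_comma_decimal"]  # NA never matches
dot_fields   <- dictionary$field_name[validation %in% "number"]

# Parse each field with the decimal mark implied by its validation type;
# values that fail to parse become NA and readr warns about them.
for (field in comma_fields) {
  ds[[field]] <- parse_double(ds[[field]], locale = locale(decimal_mark = ","))
}
for (field in dot_fields) {
  ds[[field]] <- parse_double(ds[[field]], locale = locale(decimal_mark = "."))
}
```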

Can you confirm this works? It is a variation of https://ouhscbbmc.github.io/REDCapR/articles/workflow-read.html#specify-everything-is-a-character. Everything is kept as a string, and then you convert each column explicitly, after manually determining if it uses a comma or period as the separator.

col_types <- readr::cols(.default = readr::col_character())

ds1 <- 
  REDCapR::redcap_read(
    redcap_uri  = Sys.getenv("REDCAP_API_URL_TEST"),
    token       = Sys.getenv("REDCAP_API_KEY_TEST"), 
    col_types   = col_types
  )$data |>  # redcap_read() returns a list; the data frame is its data element
  dplyr::mutate(
    test_comma  = readr::parse_number(test_comma, locale = readr::locale(decimal_mark = ",")),
    test_dot    = readr::parse_number(test_dot  , locale = readr::locale(decimal_mark = "."))
  )
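The same read-as-character-then-convert pattern can be tried without a REDCap server. A sketch, assuming the CSV string below is a hypothetical stand-in for the API response (values copied from the reprex); readr::read_csv() is called directly instead of REDCapR::redcap_read().

```r
library(readr)

# Hypothetical API response; the comma-decimal value is quoted inside the CSV.
raw_text <- 'study_id,test_comma,test_dot\n1,"1,5",1.5\n'

# Step 1: read every column as character so nothing is guessed or mangled.
ds <- read_csv(I(raw_text), col_types = cols(.default = col_character()))

# Step 2: convert each column explicitly with the decimal mark it actually uses.
ds$test_comma <- parse_number(ds$test_comma, locale = locale(decimal_mark = ","))
ds$test_dot   <- parse_number(ds$test_dot,   locale = locale(decimal_mark = "."))

print(ds)
```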

For a REDCap project that doesn't mix & match separators, I'm thinking there's a way to pass/incorporate a dataset-wide value of readr::locale in this part of the code.

col_types <-
  if (!is.null(col_types)) col_types
  else if (guess_type) NULL
  else readr::cols(.default = readr::col_character())

try(
  # Convert the raw text to a dataset.
  ds <-
    readr::read_csv(
      file           = I(kernel$raw_text),
      col_types      = col_types,
      guess_max      = guess_max,
      show_col_types = FALSE
    ) %>%
  # ... (excerpt ends here; the rest of the pipeline is not shown)
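For the dataset-wide variant, readr::read_csv() already accepts a locale argument that applies to every column it parses. A sketch with plain readr, assuming a hypothetical project whose numeric fields all use the comma decimal mark (the field names weight and height are made up); how REDCapR itself would expose such a setting is not shown here.

```r
library(readr)

# Hypothetical API response: every numeric field uses "," as the decimal mark,
# so those values are quoted inside the comma-delimited CSV.
raw_text <- 'study_id,weight,height\n1,"72,5","1,25"\n2,"80","1,75"\n'

ds <- read_csv(
  file      = I(raw_text),
  col_types = cols(study_id = col_integer(), .default = col_double()),
  locale    = locale(decimal_mark = ",")  # one locale for the whole file
)

print(ds)
```

This only works when the project does not mix separators; a single locale cannot read "1,5" and "1.5" in different columns correctly.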

I don't think I've ever needed to use commas as separators. Do you use readr? How do you personally prefer to specify commas as the separator when you read a normal csv into R?
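For reference, files that use the comma as the decimal mark usually use a semicolon as the field delimiter, and readr provides read_csv2() for that convention (utils::read.csv2() is the base-R equivalent). A small sketch with made-up data:

```r
library(readr)

# Hypothetical semicolon-delimited file with comma decimal marks.
raw_text <- "test_comma;test_other\n1,5;2,25\n"

# read_csv2() uses ";" as the delimiter and "," as the decimal mark.
ds <- read_csv2(I(raw_text), col_types = cols(.default = col_double()))

print(ds)
```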

@joundso
Contributor Author

joundso commented Nov 19, 2021

@wibeasley Sorry for the delay - Yes exactly!

  • I think R applies the wrong default interpretation to fields defined in REDCap with the validation type Number (comma with decimal). Comma-decimal values are handled like period-decimal values, which removes the comma and thereby changes their meaning (1,5 becomes 15, but the semantically correct value would be 1.5).
  • Exporting all data as strings using your approach with
    col_types <- readr::cols(.default = readr::col_character())
    
    ds1 <- 
      REDCapR::redcap_read(
        redcap_uri  = Sys.getenv("REDCAP_API_URL_TEST"),
        token       = Sys.getenv("REDCAP_API_KEY_TEST"), 
        col_types   = col_types
      )
    keeps the comma as separator. ✔️
  • "For a REDCap project that doesn't mix & match separators, I'm thinking there's a way to pass/incorporate a dataset-wide value of readr::locale in this part of the code." ✔️
  • Of course, I would strongly recommend not mixing different separators in one project. In fact, I would prefer not to use commas as decimal separators at all ;-)
  • Thanks for your fast reply and constructive feedback!

@wibeasley
Member

Great. In inst/test-data/, can you please add a data dictionary and a data file that tightly operationalize your desired output, so that you're confident REDCapR is working as expected whenever it reads this project correctly? Something similar to inst/test-data/project-simple/simple-data.csv, but without the metadata file (I'll add that).

Maybe the directory is called "inst/test-data/decimal-comma/", and the two files are "decimal-comma.csv" and "decimal-comma-data-dictionary.csv".

Maybe 5 variables and 10 rows. But I'll let you decide whatever allows the tightest scenario to assure the package is producing the desired output.

@joundso
Copy link
Contributor Author

joundso commented Nov 24, 2021

Just added a very rudimentary demo dataset in #374 if it helps.

wibeasley added a commit that referenced this issue Nov 26, 2021
@wibeasley
Member

@joundso, for those comma-decimal fields, I see your value is "number_comma_decimal" for Text Validation Type OR Show Slider Number. I don't seem to have that option for any other fields. Is this a custom regex?

I like the idea and I've seen other people use them, but I never have. Will this differ across REDCap projects? What about other REDCap instances?


@joundso
Copy link
Contributor Author

joundso commented Nov 29, 2021

@wibeasley The field-validation type number_comma_decimal is not a custom regex but needs to be enabled in the control center via:
https://redcap.your-hospital.org/redcap_v10.6.19/ControlCenter/validation_type_setup.php
We strongly recommend that our users not use it, but some projects (e.g., where we receive a data dictionary from other sites and simply deploy it internally) have it enabled anyway.

@wibeasley
Member

@joundso, I need to submit a small update to CRAN and I'm tying up loose ends. Did #377 address your needs? Is there anything you want added to the main branch before I submit to CRAN?

@joundso
Contributor Author

joundso commented Jan 21, 2022

@wibeasley Absolutely! Many thanks for the kind enquiry and especially for the stringent and instructive implementation! In my view, there is nothing more that needs to be added. Especially nothing that is worth delaying the CRAN submission.

@joundso joundso closed this as completed Jan 21, 2022