Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

date_range_begin not filtering results #444

Closed
svenhalvorson opened this issue Oct 10, 2022 · 7 comments
Closed

date_range_begin not filtering results #444

svenhalvorson opened this issue Oct 10, 2022 · 7 comments
Assignees

Comments

@svenhalvorson
Copy link

svenhalvorson commented Oct 10, 2022

Thanks for this package ya'll. I'm having trouble with reading in data from a specific datetime range: redcap_read -- it doesn't seem to care about the datetime_range_begin parameter. I am working on a project (still in design phase) that I know no one else is manipulating. I edited two records today but it seems like redcap_read downloads the whole data set regardless of what start time is supplied. For comparison purposes, I tried just using RCurl and it does work but is off by a few hours (time zone differences I assume). Example:

test_hours = function(hrs){
  RCurl::postForm(
    uri=uri,
    token=token,
    content='record',
    format='csv',
    type='flat',
    rawOrLabel='raw',
    rawOrLabelHeaders='raw',
    exportCheckboxLabel='false',
    exportSurveyFields='false',
    exportDataAccessGroups='false',
    returnFormat='json',
    dateRangeBegin = strftime(Sys.time() + lubridate::hours(hrs), "%Y-%m-%d %H:%M:%S")
  ) |> 
    readr::read_csv() |> 
    nrow()
}

hour_records = tibble::tibble(
  hours = seq(from = -48, to = 48, by = 2),
  rows = purrr::map_dbl(.x = hours, .f = test_hours)
)


test_hours2 = function(hrs){
  
  REDCapR::redcap_read(
    redcap_uri = uri,
    token = token,
    datetime_range_begin  = Sys.time() + lubridate::hours(hrs)
  ) |> 
    purrr::pluck('data') |> 
    nrow()
}


hour_records2 = tibble::tibble(
  hours = seq(from = -48, to = 48, by = 2),
  rows = purrr::map_dbl(.x = hours, .f = test_hours2)
)

The version with RCurl gives the correct number of records and eventually zero records at +5 hours. The version with REDCapR gives all the records for each time offset.

I tried to run this through the debugger and suspect it's something going wrong in httr::POST but I'm not positive.

Thanks!

Desktop (please complete the following information):

  • OS: Windows 10 Enterprise
  • REDCap version 9.4.2
  • REDCapR Version 1.1.0
@svenhalvorson svenhalvorson changed the title Unexpected Behavior: {short description here} date_range_begin not filtering results Oct 10, 2022
@wibeasley
Copy link
Member

Thanks @svenhalvorson for thoroughly explaining the problem and including an example. Unfortunately I can't reproduce the difference between the approaches. I'm wondering what's different between our clients and servers.

The thing about +5 hours makes me wonder if UTC is involved somewhere? The only difference between the RCurl & REDCapR code is the strftime(), but that doesn't look suspicious to me.

I like your demonstration code. I've adjusted it to include more comparisons. What do you get when you run my exact code (that uses my server's example dataset)? What do you use your token (on your project & server)?

and eventually zero records at +5 hours

+5 hours for dateRangeBegin, right? Shouldn't zero records be returned when if dateRangeBegin equals Sys.time() (i.e., at 0 hours)? If not, I'm misunderstanding the scenario or problem.

I don't think it should matter, but use the most recent GitHub version with remotes::install_github("OuhscBbmc/REDCapR").

# Run this first line once to reset the values for the `dateRangeBegin` filter.
# REDCapR:::clean_start_simple(batch = FALSE, verbose = TRUE)

credential  <- REDCapR:::retrieve_credential_testing(213L)
col_types   <- readr::cols(.default = readr::col_character())

rcurl <- function(offset_hours) {
  RCurl::postForm(
    uri                     = credential$redcap_uri,
    token                   = credential$token,
    content                 = 'record',
    format                  = 'csv',
    type                    = 'flat',
    rawOrLabel              = 'raw',
    rawOrLabelHeaders       = 'raw',
    exportCheckboxLabel     = 'false',
    exportSurveyFields      = 'false',
    exportDataAccessGroups  = 'false',
    returnFormat            = 'json',
    dateRangeBegin = strftime(Sys.time() + lubridate::hours(offset_hours), "%Y-%m-%d %H:%M:%S")
  ) |> 
  readr::read_csv(col_types = col_types) |> 
  nrow()
}

httr <- function(offset_hours) {
  post_body <- list(
    token                   = credential$token,
    content                 = "record",
    format                  = "csv",
    type                    = "flat",
    dateRangeBegin          = strftime(Sys.time() + lubridate::hours(offset_hours), "%Y-%m-%d %H:%M:%S")
  )
  
  credential$redcap_uri |> 
    httr::POST(body   = post_body) |> 
    httr::content(as  = "text") |> 
    readr::read_csv(col_types = col_types) |> 
    nrow()
}

redcapr_batch <- function(offset_hours) {
  REDCapR::redcap_read(
    redcap_uri            = credential$redcap_uri,
    token                 = credential$token,
    col_types             = col_types,
    datetime_range_begin  = Sys.time() + lubridate::hours(offset_hours)
  )$data |>
  nrow()
}

redcapr_oneshot <- function(offset_hours) {
  REDCapR::redcap_read_oneshot(
    redcap_uri            = credential$redcap_uri,
    token                 = credential$token,
    col_types             = col_types,
    datetime_range_begin  = Sys.time() + lubridate::hours(offset_hours)
  )$data |>
  nrow()
}

hour_records <- tibble::tibble(
  hours = seq.int(from = -2L, to = 2L, by = 1L),
  rcurl   = purrr::map_int(.x = hours, .f = rcurl),
  httr    = purrr::map_int(.x = hours, .f = httr),
  batch   = purrr::map_int(.x = hours, .f = redcapr_batch),
  oneshot = purrr::map_int(.x = hours, .f = redcapr_oneshot)
)

hour_records
# A tibble: 5 × 5
  hours rcurl  httr batch oneshot
  <int> <int> <int> <int>   <int>
1    -2     5     5     5       5
2    -1     5     5     5       5
3     0     0     0     0       0
4     1     0     0     0       0
5     2     0     0     0       0

@svenhalvorson
Copy link
Author

svenhalvorson commented Oct 24, 2022

Thank you @wibeasley for the detailed response. I figured out that the server is set on UTC (action tags @NOW-SERVER and @NOW-UTC give the same value). The RCurl and httr results are consistent with this.

When I run it with your credentials I get the same output as you do. Using my own credentials and adjusting the hours, I get this:

image

I edited two records today (about 4 hours ago) and these are the two that RCurl/httr are picking up. Still not sure why I'm getting every record using REDCapr.

Might be some difference between my server and yours? Not really sure what to do from here.

@wibeasley
Copy link
Member

wibeasley commented Oct 24, 2022

Where are you located in the world? What is the locale of your client? Maybe there's a relevant REDCap setting too?

I'm stumped. I can imagine how our servers are set up differently. I don't understand how the httr and oneshot versions are different. The oneshot uses the same strftime() format...

datetime_range_begin<- dplyr::coalesce(strftime(datetime_range_begin, "%Y-%m-%d %H:%M:%S"), "")

...and passes it to the same element of post_body.

dateRangeBegin = datetime_range_begin,

What is the right answer? 2 records in the past 8 hours?

If you extended the timeline forwards & backwards, when does oneshot and batch start returning zero? When does rcurl and httr start returning 93?

@svenhalvorson
Copy link
Author

I'm on the west coast (PST). Updating a record and testing this gives me the 7 hours offset I would expect when using RCurl and httr. I think the example I gave in the original issue had modifications that were few hours old which is why they stopped being downloaded at +5.

The correct answer is 2 records in the past 8 hours. I tried running this from -2000 to +200 and the REDCapR functions always give all the rows (including 200 hours in the future). The RCurl and httr versions get to the same number of records at about -2000 hours which is about when I started working on the project.

@wibeasley
Copy link
Member

  1. Anything revealing from your client's locale (ie, readr::default_locale())?
  2. Are all the packages using the latest version on CRAN (besides REDCapR, which is using the latest version on GitHub)?
  3. If you're still interested in getting to bottom of it, I guess the next phase is to step-through each line of the inner function (eg, run debug(REDCapR:::kernel_api)) before & after contacting the server. Compare those internal REDCapR values against the httr code above.

@svenhalvorson
Copy link
Author

svenhalvorson commented Oct 25, 2022

Alright, I learned something!

I went through debug(REDCapR:::kernel_api)) and saved the last post_body to the global environment (where content = 'record'). Then tried adding each field in post_body to this httr call

post_httr = list(
  token                   = credential$token,
  content                 = "record",
  format                  = "csv",
  type                    = "flat",
  dateRangeBegin          = strftime(Sys.time() + lubridate::hours(-2), "%Y-%m-%d %H:%M:%S")
)

Then counted the rows returned and got this:

image

Which lead me to understand that the problem is omitting datetime_range_end! Adding dateRangeBegin = '' to the RCurl and httr versions will also give wrong number of rows (all of them). Very strange, and I'm not sure why this happening.

This solves my immediate problem -- I can specify datetime_range_end for my project -- but that doesn't help you so much. Is there any other information I can provide to help with the library?

Thank you for your suggestions and patience!

@wibeasley
Copy link
Member

wibeasley commented Oct 25, 2022

Yeah, I'm glad that helped. debug() is a useful tool in scenarios where you don't want to load the source code.

Thanks for using it to figure this out. That explains why the approaches were acting differently on the same server.

I think the remaining problem will be solved when your server upgraded. This is from the release notes for REDCap v9.7.4. Although it's not exactly the behavior you're describing, it's close enough that I'm guessing your problem was solved at the same time. My site has a newer version of REDCap, so I think that explains our differences.

  • Bug fix: When using the dateRangeBegin parameter for the "Export Records" API method, if the dateRangeEnd was left blank or not included as a parameter in the API request, then the API would not function correctly and would mistakenly return no data in the API response.

It sounds like you have a good workaround for your current server (ie, by passing an explicit value to both boundaries). If this workaround is still required after your site upgrades REDCap, please start a new REDCapR issue and refer to this one.


Note to future self: I considered adding a non-blank default value to the date range parameters, but decided against it because:

  • I'm guessing it could introduce a performance penalty, if the PHP/SQL code adds it as a new entry in a WHERE clause. (But I haven't looked at the source code to confirm this.)
  • I don't want to risk interfering with locales, time zones, or complications like this that would make maintenance harder.

@wibeasley wibeasley self-assigned this Oct 25, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants