
Unexpected Behavior: redcap_read_oneshot crashes due to different encoding #391

Closed
izabelafs opened this issue May 2, 2022 · 7 comments


@izabelafs

Hello, thanks for the great integration solution! I'm having an issue that seems to be related to how the functions handle encoding.

The import fails because some fields in the project contain non-UTF-8 characters. I'm including an excerpt of the problematic fields so the behavior can be reproduced.
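A quick way to locate the offending fields is to check which character values are not valid UTF-8. The sketch below assumes the data are already in an R data.frame (for instance, read from a manual export such as gge.csv); find_invalid_utf8() is a hypothetical helper, not part of REDCapR, built on base R's validUTF8().

```r
# Hypothetical helper (not part of REDCapR): list every character cell in a
# data.frame whose bytes are not valid UTF-8.
find_invalid_utf8 <- function(df) {
  hits <- lapply(names(df), function(col) {
    x <- df[[col]]
    if (!is.character(x)) return(NULL)
    # validUTF8() inspects the raw bytes, not the declared encoding
    bad <- which(!is.na(x) & !validUTF8(x))
    if (length(bad) == 0) return(NULL)
    data.frame(column = col, row = bad, stringsAsFactors = FALSE)
  })
  out <- do.call(rbind, hits)  # NULL entries are dropped by rbind
  if (is.null(out)) {
    out <- data.frame(column = character(0), row = integer(0),
                      stringsAsFactors = FALSE)
  }
  out
}

# Example: the second comment holds Latin-1 bytes ("caf" + 0xE9), not UTF-8
ds <- data.frame(
  record_id = 1:3,
  comment   = c("ok", rawToChar(as.raw(c(0x63, 0x61, 0x66, 0xe9))), NA),
  stringsAsFactors = FALSE
)
find_invalid_utf8(ds)  # one row: column "comment", row 2
```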

There are a few open issues that seem similar (#269, #270, #272, #136), but none of them provides a good solution.

Apparently there is an additional encoding option for the function, but it does not appear in the latest version I have installed.

Initially, I thought the issue could be solved by passing this option to the function:

```r
config_options = list(accept_encoding = "UTF-16")
```

which, as far as I can tell from the code, passes additional options to httr::content in the kernel_api function. However, when I debugged the function, the option was not applied.

Importing data from REDCap with redcap_read_oneshot should handle different encodings, which might be present because the data were entered from different operating systems, such as Unix-based systems or Windows.
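Once the source encoding of the bad values is known or guessed, base R's iconv() can convert them to UTF-8. This is a minimal sketch that assumes the stray bytes are Latin-1 (ISO-8859-1); text entered on Windows is often Windows-1252 instead, and the name iconv() accepts for that encoding varies by platform (see iconvlist()). repair_to_utf8() is a hypothetical helper, not part of REDCapR.

```r
# Hypothetical helper (not part of REDCapR): re-encode only the values that are
# not valid UTF-8, assuming those values were written in the `from` encoding.
repair_to_utf8 <- function(x, from = "latin1") {
  bad <- !is.na(x) & !validUTF8(x)
  # iconv() returns NA for any value it cannot convert from `from`
  x[bad] <- iconv(x[bad], from = from, to = "UTF-8")
  x
}

# A vector mixing valid UTF-8 with Latin-1 bytes for the same word
mixed <- c("caf\u00e9", rawToChar(as.raw(c(0x63, 0x61, 0x66, 0xe9))))
fixed <- repair_to_utf8(mixed)
all(validUTF8(fixed))  # TRUE
fixed[1] == fixed[2]   # TRUE
```

Converting only the invalid values leaves text that is already correct UTF-8 untouched, which matters when a single column mixes encodings.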

  • OS: macOS Monterey 12.3.1
  • REDCap version 12.2.6
  • REDCapR version 1.0.0

Attachment: gge.csv
@wibeasley
Member

Q1) Let me make sure I understand the "importing" terminology. Is your goal to

  1. read gge.csv into an R data.frame and then upload the data.frame to REDCap? (If so, can I see what it looks like in RStudio's preview window?)
  2. or to read from REDCap, and the gge.csv is the result?

I think it's the latter, but I want to be sure. I've addressed encoding in ~6 issues (most recently #357), but I would love more example datasets to add to REDCapR's test suite.

Q2) When you manually export the dataset to a csv, is it correct? What program & encoding do you use?

Q3) Also, can you try it with the GitHub version of REDCapR?

```r
remotes::install_github(repo="OuhscBbmc/REDCapR")
```

@izabelafs
Author

Q1) It's the second: I want to import from REDCap into R. The gge.csv file is just one of the instances that raise the issue.

Q2) Even when I export manually (which is how I got gge.csv), the characters from the different encoding still appear, but the export succeeds. I'm using Excel on macOS, so UTF-8.

Q3) I just reinstalled the GitHub version, but the encoding option still does not appear in the redcap_read_oneshot function.

@wibeasley
Member

> Q2) Even when I export manually (which is how I got gge.csv), the characters from the different encoding still appear, but the export succeeds. I'm using Excel on macOS, so UTF-8.

This suggests to me that REDCapR and the API aren't involved, and that this issue is better addressed at the REDCap level. In other words, it's probably best to resolve the encoding problem on the web-server side, before the plain text is returned and before it is processed by R. If you think I'm missing something, please tell me.

Go to the community site (or ask your site's REDCap admin if you don't have permission) and see if your question has been asked before. You might start with https://community.projectredcap.org/search.html?q=encoding.

> Q3) I just reinstalled the GitHub version, but the encoding option still does not appear in the redcap_read_oneshot function.

I think the parameter you're looking for is http_response_encoding.
https://ouhscbbmc.github.io/REDCapR/reference/redcap_read_oneshot.html

wibeasley reopened this on May 3, 2022
@wibeasley
Member

Sorry, I didn't mean to close the issue. Can you post screenshots of what the file looks like on your machine and what the text looks like in REDCap? I'd like to better understand where it renders correctly and where that stops. Here are three common encodings rendered on my machine (specifically in LibreOffice).

Also, I'm wondering whether gge.csv is a CSV straight from REDCap. The "Index/Document.iwa" snippet on line 1 and the "gregorianlatn" snippet on line 6 look like metadata, which a conventional CSV wouldn't contain.
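That suspicion can be checked by looking at the file's first bytes. One possible explanation (an assumption; the thread doesn't confirm it) is that the file is an Apple iWork document such as a Numbers spreadsheet: "Index/Document.iwa" is the kind of entry such documents contain, and when saved as a single file they are ZIP archives, which begin with the signature bytes 50 4B 03 04 ("PK\x03\x04"). looks_like_zip() is a hypothetical helper built on base R's readBin().

```r
# Hypothetical helper: TRUE if the file begins with the ZIP local-file-header
# signature 50 4B 03 04 ("PK\x03\x04") instead of plain text.
looks_like_zip <- function(path) {
  # readBin() accepts a file name and opens/closes the file in binary mode
  first_bytes <- readBin(path, what = "raw", n = 4)
  identical(first_bytes, as.raw(c(0x50, 0x4b, 0x03, 0x04)))
}

# Example with two throwaway files
csv_path <- tempfile(fileext = ".csv")
writeLines(c("record_id,comment", "1,ok"), csv_path)

zip_like_path <- tempfile(fileext = ".csv")
writeBin(as.raw(c(0x50, 0x4b, 0x03, 0x04, 0x14, 0x00)), zip_like_path)

looks_like_zip(csv_path)       # FALSE
looks_like_zip(zip_like_path)  # TRUE
```

The same check would also flag other ZIP-based formats that were renamed to .csv, such as an .xlsx workbook.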

[Screenshots: gge.csv rendered in three common encodings in LibreOffice]

@izabelafs
Author

izabelafs commented May 5, 2022

Q3) Thank you for pointing me to the right option; it worked perfectly in the function.

Q2) Something curious I noticed yesterday: my production project, which is the one with the issue, cannot be exported because of the differently encoded characters.

I ran the same check on the corresponding development project. It also contains differently encoded characters, but exporting it with REDCapR did not raise the same error.

Here is the screenshot:

The gge.csv file is a short export; I cannot share an export of the cases in the development project because they contain sensitive data.

@wibeasley
Member

Q3) Sweet, glad the REDCapR function worked.

Q2) Maybe try opening the CSV in LibreOffice? It makes it a little easier to find the right encoding. I used it for the screenshots in the post above.
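The R-side counterpart of picking an encoding in LibreOffice's import dialog is to declare the file's encoding when reading it. A minimal sketch with base R's read.csv(): its encoding argument marks the strings as Latin-1 without re-encoding them, and enc2utf8() then converts them to UTF-8 (fileEncoding, which re-encodes while reading, is the other common route). The file here is a made-up Latin-1 example, and "latin1" stands in for whatever encoding turns out to render correctly.

```r
# Write a tiny CSV whose single value "café" is stored as Latin-1 bytes
path <- tempfile(fileext = ".csv")
writeBin(
  c(charToRaw("comment\n"), as.raw(c(0x63, 0x61, 0x66, 0xe9)), charToRaw("\n")),
  path
)

# encoding = "latin1" only marks the strings as Latin-1 (no re-encoding here);
# enc2utf8() then converts the marked strings to UTF-8
ds <- read.csv(path, encoding = "latin1", stringsAsFactors = FALSE)
ds$comment <- enc2utf8(ds$comment)
validUTF8(ds$comment)      # TRUE
ds$comment == "caf\u00e9"  # TRUE
```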

(Your screenshot didn't come through in your previous post.)

@izabelafs
Author

Thank you for looking into this! My guess is that the main issue is in fact in REDCap's database management and not in the import/export steps using REDCapR. For that reason, I'm closing the issue.
