Special (German) characters #296

Closed
teigentler opened this issue Feb 10, 2020 · 7 comments
Labels
nonascii accommodate non-ascii character

Comments

@teigentler

Hi there,

thank you very much for this great package; really appreciated.
I just ran into an issue when writing special German characters into REDCap. I tried it with config_options like this:
REDCapR::redcap_write(ds=reference,redcap_uri=url_server,token = token, config_options = list(encoding="ISO-8859-15"))
but it does not seem to work. All special characters are stripped out. Setting the encoding to UTF-8 did not solve the problem either.

Any ideas?

Thank you in advance
Thomas

@wibeasley wibeasley self-assigned this Feb 13, 2020
@wibeasley
Member

I'm willing to try suggestions, but this isn't an area I've dealt with much. Have you checked the forums (community.projectredcap.org) for a relevant discussion?

Reading from REDCap

The raw text is converted to a data.frame almost immediately. So if there's a problem, I guess it occurs before this step:

ds <-
  kernel$raw_text %>%
  readr::read_csv(col_types = col_types, guess_max = guess_max) %>%
  as.data.frame()

And raw_text comes soon after the httr::POST() call:

REDCapR/R/kernel-api.R

Lines 58 to 63 in b25965c

raw_text <- httr::content(
  x = result,
  as = "text",
  encoding = encoding, # UTF-8 is the default parameter value
  type = content_type # text/csv is the default parameter value
)
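
As a rough illustration of why the encoding argument matters at this step, here is a base-R sketch (not the httr internals). It shows that the same bytes are either rejected or decoded correctly depending on which encoding is declared for them. The byte values and variable names are made up for the example.

```r
# "Müller" as ISO-8859-1 (latin1) bytes: the u-umlaut is the single byte 0xfc.
bytes_latin1 <- as.raw(c(0x4d, 0xfc, 0x6c, 0x6c, 0x65, 0x72))

# Taken at face value, those bytes are not a valid UTF-8 sequence.
looks_like_utf8 <- validUTF8(rawToChar(bytes_latin1))

# Declaring the real source encoding lets iconv() translate them to UTF-8.
decoded <- iconv(rawToChar(bytes_latin1), from = "latin1", to = "UTF-8")
```

If the server sends something other than UTF-8, the default encoding value would presumably need to be overridden to match.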

https://github.com/OuhscBbmc/REDCapR/blob/6c4e6b531481f44295a6a28408f34983befc18cc/tests/testthat/test-read-russian.R

Writing to REDCap

This is also a pretty straight shot. I'm not intentionally changing the encoding or flattening to ASCII for the data itself (I am doing this for the column names though).

con <- base::textConnection(
  object = "csv_elements",
  open = "w",
  local = TRUE
)
utils::write.csv(ds, con, row.names = FALSE, na = "")
close(con)
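
One way to check whether this step alters non-ASCII text on a given machine is to run the same pattern on a tiny data frame and inspect the lines it produces. This is only a diagnostic sketch: the wrapper function, the sample data, and the idea that a non-UTF-8 locale might substitute `<U+00FC>`-style escapes are assumptions, not a confirmed cause.

```r
# Hypothetical wrapper around the text-connection pattern shown above.
csv_via_text_connection <- function(ds) {
  con <- base::textConnection(object = "csv_elements", open = "w", local = TRUE)
  utils::write.csv(ds, con, row.names = FALSE, na = "")
  close(con)
  csv_elements  # character vector, one element per CSV line
}

ds_sample <- data.frame(
  id   = 1:2,
  name = c("M\u00fcller", "Stra\u00dfe"),
  stringsAsFactors = FALSE
)

csv_lines <- csv_via_text_connection(ds_sample)

# TRUE for any line where a character was replaced by an escape like <U+00FC>.
was_escaped <- grepl("<U\\+[0-9A-Fa-f]{4}>", csv_lines)

# Compare the result against the session locale, e.g. on Windows vs. Debian.
session_ctype <- Sys.getlocale("LC_CTYPE")
```

Running this on both machines and comparing `csv_lines`, `was_escaped`, and `session_ctype` should show whether the difference arises before the HTTP request is sent.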

Testing

I've tried a few ways to test the results, but I ran into problems with different machines/OSes producing slightly different results. This is the current incomplete state.

https://github.com/OuhscBbmc/REDCapR/blob/6c4e6b531481f44295a6a28408f34983befc18cc/tests/testthat/test-read-russian.R

@teigentler
Author

Interestingly, I have now read the string directly from the MySQL database used by REDCap. I noticed that the special characters are essentially transferred, but in quoted-printable format (https://en.wikipedia.org/wiki/Quoted-printable).
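
To confirm that a stored value really is quoted-printable, it can be decoded by hand. The function below is a minimal base-R sketch (the name `decode_quoted_printable` is hypothetical, and it assumes the underlying bytes are UTF-8): it drops soft line breaks and turns each `=XX` hex escape back into a byte.

```r
decode_quoted_printable <- function(x) {
  # A trailing "=" before a line break is a soft break and carries no data.
  x <- gsub("=\r?\n", "", x, perl = TRUE)
  bytes <- charToRaw(x)
  out <- raw(0)
  i <- 1L
  n <- length(bytes)
  while (i <= n) {
    # 0x3d is "="; a valid escape is "=" followed by two hex digits.
    if (bytes[i] == as.raw(0x3d) && i + 2L <= n) {
      hex <- rawToChar(bytes[(i + 1L):(i + 2L)])
      if (grepl("^[0-9A-Fa-f]{2}$", hex)) {
        out <- c(out, as.raw(strtoi(hex, base = 16L)))
        i <- i + 3L
        next
      }
    }
    out <- c(out, bytes[i])
    i <- i + 1L
  }
  decoded <- rawToChar(out)
  Encoding(decoded) <- "UTF-8"  # assumption: the escaped bytes are UTF-8
  decoded
}

decoded_name <- decode_quoted_printable("M=C3=BCller")
```

If the decoded value matches what was originally written, the characters survive the transfer and only the representation has changed.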

@wibeasley
Member

wibeasley commented Feb 15, 2020

That's interesting. Can you follow its path (with something like browser() or debug()) and see where the translation occurs?
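
While stepping through (for example after `debugonce(REDCapR::redcap_write)`), a small helper can make the bytes visible at each stage. This is a sketch only, and `describe_encoding` is a hypothetical name rather than anything in REDCapR.

```r
# Report the declared encoding, UTF-8 validity, and raw bytes of each string.
describe_encoding <- function(x) {
  data.frame(
    value      = x,
    declared   = Encoding(x),
    valid_utf8 = validUTF8(x),
    bytes      = vapply(
      x,
      function(s) paste(charToRaw(s), collapse = " "),
      character(1),
      USE.NAMES = FALSE
    ),
    stringsAsFactors = FALSE
  )
}

report <- describe_encoding(c("M\u00fcller", "abc"))
```

Calling it on the same column before and after each transformation should show the step at which the bytes first change.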

@teigentler
Author

Things are getting even stranger.

I tried it on my Windows system with
REDCapR::redcap_write(ds=reference,redcap_uri=url_server,token = token)
and it works perfectly. On the Debian machine, I get this encoding problem.

Tried it with
REDCapR::redcap_write(ds=reference,redcap_uri=url_server,token = token, config_options = list(c("httpheader"= c("Content-Type"= "application/x-www-form-urlencoded"), c("charset"="utf-8"))))
but without success.

Could you please provide an example of how to deal with the config_options? Thank you!

@wibeasley
Member

  1. Yeah, I remember the Windows & Linux machines giving slightly different results with those Russian characters.

  2. I think the only time I've used config_options has been for SSL certificates. Here's an example:
    https://ouhscbbmc.github.io/REDCapR/articles/advanced-redcapr-operations.html#ssl-options

  3. If you attach a csv with some of these problematic values, I'll give it a shot. If so, give me a simplified version, with just one or two columns and ~5 rows, each with a word containing a different non-ASCII character.

  4. I'm curious what happens with Postman on the two different OSes.
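
On the shape of config_options (item 2), here is a hedged base-R sketch comparing the nested construction tried earlier in this thread with a flat named list, which is the shape the linked SSL example appears to use. The path and timeout value are hypothetical, and whether any particular option affects encoding is not established here.

```r
# Shape attempted earlier: c() flattens the nested vectors into one character
# vector, and list() wraps that vector in a single unnamed element.
nested <- list(c(
  "httpheader" = c("Content-Type" = "application/x-www-form-urlencoded"),
  c("charset" = "utf-8")
))
names(nested)       # NULL: the list itself carries no option names
names(nested[[1]])  # "httpheader.Content-Type" "charset"

# Flat shape: one named element per option.
flat <- list(
  cainfo  = "/path/to/cacert.pem",  # hypothetical certificate path
  timeout = 30                      # hypothetical value, in seconds
)
names(flat)         # "cainfo" "timeout"
```

Because the nested version has no top-level names, none of its entries would be recognizable as an option name by whatever consumes the list.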

@wibeasley
Member

@teigentler, do you have any more information?

Issue #307 identified a limitation with config options, so it might improve your scenario if you're using 2+ options. I'm about to pull those changes into the master branch (update your machine with remotes::install_github(repo="OuhscBbmc/REDCapR")).

@wibeasley
Member

@teigentler, I hope the fix for #307 a year ago resolved this. I'm closing this issue. Reopen it if you're still encountering the problem and are able to provide the info (described in the Feb 15, 2020 post).
