
CJK in footer text renders incorrectly on posts #417

Closed
shikokuchuo opened this issue Oct 22, 2021 · 6 comments · Fixed by #446
Labels
bug 🐛 Something isn't working

Comments

@shikokuchuo
Member

I have CJK text in _footer.html

Knitting individual posts does not render this text correctly but produces gobbledygook such as:
���中央�役�����������

Using rmarkdown::render_site() also causes this footer text to be mangled on all posts (as I guess it re-renders the footer text on each post). However the text renders correctly for the site pages such as index.html in the base directory.

Everything involved should be UTF-8, so I would not expect to encounter such issues.

sessionInfo() output:

R version 4.1.1 (2021-08-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.3 LTS

Matrix products: default
BLAS/LAPACK: /opt/intel/oneapi/mkl/2021.4.0/lib/intel64/libmkl_rt.so.1

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C               LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8    
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8    LC_PAPER=en_GB.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] magrittr_2.0.1  leaflet_2.0.4.1

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.7        imager_0.42.10    compiler_4.1.1    bslib_0.3.1       jquerylib_0.1.4   tools_4.1.1      
 [7] digest_0.6.28     downlit_0.2.1     jsonlite_1.7.2    lubridate_1.8.0   evaluate_0.14     pkgconfig_2.0.3  
[13] png_0.1-7         rlang_0.4.12      igraph_1.2.7      rstudioapi_0.13   crosstalk_1.1.1   distill_1.3      
[19] yaml_2.2.1        xfun_0.27         readbitmap_0.1.5  bmp_0.3           fastmap_1.1.0     stringr_1.4.0    
[25] knitr_1.36        xml2_1.3.2        htmlwidgets_1.5.4 generics_0.1.0    sass_0.4.0        askpass_1.1      
[31] rprojroot_2.0.2   R6_2.5.1          jpeg_0.1-9        rmarkdown_2.11    bookdown_0.24     purrr_0.3.4      
[37] htmltools_0.5.2   rsconnect_0.8.24  mime_0.12         tiff_0.1-8        stringi_1.7.5     openssl_1.4.5    
@shikokuchuo
Member Author

Can one of the UTF / text encoding experts have a look into this please?

@cderv
Collaborator

cderv commented Feb 28, 2022

Is _footer.html correctly encoded in UTF-8? Can you provide a reproducible example?

A similar issue was presumably fixed in the past (#98), so we need something to work on and test ourselves. Thank you.
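
One way to answer the encoding question locally is to test the raw bytes of the file with base R. The following is only a minimal sketch: `is_utf8_file()` is a hypothetical helper, not part of distill or rmarkdown, and the temporary files merely stand in for a real `_footer.html`.

```r
# Hypothetical helper (not part of distill): TRUE if the file's bytes form valid UTF-8.
is_utf8_file <- function(path) {
  bytes <- readBin(path, what = "raw", n = file.size(path))
  validUTF8(rawToChar(bytes))
}

# A footer written as UTF-8 bytes (the \u escapes spell two CJK characters).
good <- tempfile(fileext = ".html")
writeBin(charToRaw(enc2utf8("<p>\u4e2d\u592e</p>")), good)

# A footer containing bytes that are not valid UTF-8,
# as might come from a legacy encoding.
bad <- tempfile(fileext = ".html")
writeBin(as.raw(c(0x3c, 0x70, 0x3e, 0x92, 0x86, 0x89, 0x9b)), bad)

is_utf8_file(good)
is_utf8_file(bad)
```

If the check passes for the real footer file, the file itself is probably fine and the problem is more likely in how it is read or rewritten during rendering.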

@cderv cderv added the reprex needs a minimal reproducible example label Feb 28, 2022
@shikokuchuo
Member Author

I have created a minimal reprex using a clean website blog created from RStudio. All I've done is add _footer.html with some Japanese, knit the post and render the site.

Repo at: https://github.com/shikokuchuo/demo

As you can see, the text renders correctly in index.html here:
https://shikokuchuo.net/demo/
but is garbled in the actual post here:
https://shikokuchuo.net/demo/posts/welcome/

Thanks!

> sessionInfo()
R version 4.1.3 (2022-03-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.4 LTS

Matrix products: default
BLAS/LAPACK: /opt/intel/oneapi/mkl/2022.0.1/lib/intel64/libmkl_rt.so.2

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C               LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8    
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8    LC_PAPER=en_GB.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] distill_1.3.2

loaded via a namespace (and not attached):
 [1] compiler_4.1.3  fastmap_1.1.0   cli_3.2.0       htmltools_0.5.2 tools_4.1.3     yaml_2.3.5     
 [7] memoise_2.0.1   rmarkdown_2.13  downlit_0.4.0   cachem_1.0.6    knitr_1.37      xfun_0.30      
[13] digest_0.6.29   rlang_1.0.2     evaluate_0.15  

@cderv
Collaborator

cderv commented Mar 17, 2022

Thanks a lot for your example repo. This helped me find where the error is. In some cases the footer file is processed here:

distill/R/navigation.R

Lines 278 to 285 in dcfab66

if (!is.null(offset)) {
  html <- xml2::read_html(file)
  fixup_element_paths(html, "a", "href")
  fixup_element_paths(html, "img", "src")
  tmp <- tempfile(fileext = ".html")
  xml2::write_html(html, tmp, options = c("format", "no_declaration"))
  file <- tmp
}

and the encoding is not handled correctly by xml2 here. I'll see how this can be fixed. Thanks a lot!
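
For illustration, one possible way to avoid relying on the parser's encoding guess is to pass the encoding explicitly when reading and writing with xml2. This is a sketch under assumptions, not necessarily the change made in the eventual fix: the temporary file stands in for `_footer.html`, and the `fixup_element_paths()` steps from the snippet above are omitted.

```r
library(xml2)

# Temporary file standing in for _footer.html, written as UTF-8 bytes.
footer <- tempfile(fileext = ".html")
writeBin(
  charToRaw(enc2utf8("<p>\u4e2d\u592e <a href=\"about.html\">About</a></p>")),
  footer
)

# Declare the input encoding explicitly instead of letting the parser guess.
html <- read_html(footer, encoding = "UTF-8")

# Write the document back out, naming the output encoding as well.
tmp <- tempfile(fileext = ".html")
write_html(html, tmp, options = c("format", "no_declaration"), encoding = "UTF-8")

# The parsed paragraph text should still contain the original characters.
footer_text <- xml_text(xml_find_first(html, "//p"))
```

Comparing `footer_text` with and without the `encoding` argument on a real footer file may help confirm whether the read step is where the text gets mangled.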

@cderv cderv added the bug 🐛 Something isn't working label and removed the reprex needs a minimal reproducible example label Mar 17, 2022
@cderv
Collaborator

cderv commented Mar 17, 2022

Can you please try rendering your website after installing the version from this pull request:

remotes::install_github("rstudio/distill#446")

This works for me, but I would like you to confirm. Thanks!

@shikokuchuo
Member Author

@cderv thanks yes, I can confirm it works (on the real site as well as the reprex). Thanks for the quick turnaround!


2 participants