UTF-8 throws an error when used in the first figure #436

Closed
egodrive opened this issue Jan 18, 2022 · 9 comments · Fixed by #437
Labels
bug 🐛 Something isn't working

Comments


egodrive commented Jan 18, 2022

After playing around a bit, it seems that distill/markdown throws an error when the first figure's caption includes characters such as Æ, Ø, Å. In the code sample below, the error occurs every time the first chunk that produces figure output also has æ, ø, or å in its fig.cap argument.

The error message I get is:
Error in substring(u, so, so + ml - 1L) :
invalid multibyte string, element 1
Calls: ... regmatches -> Map -> mapply -> <Anonymous> -> substring
In addition: Warning messages:
1: In grepl("data-distill-preview=", line, fixed = TRUE) :
input string 1 is invalid UTF-8
2: In grepl("data-distill-preview=", line, fixed = TRUE) :
input string 1 is invalid UTF-8
Execution halted

However, if the first figure has a fig.cap without these characters, you can use æ, ø, å in subsequent figure captions without errors.

---
title: "Test"
description: > 
  Other test
draft: true
author:
  - name: Me and I

output:
  distill::distill_article:
    self_contained: false
    code_folding: true
---


Some texty text text.
 
```{r chunk 1}
# Generate some sample data, then compute mean and standard deviation
# in each group
library(ggplot2)
df <- data.frame(
  gp = factor(rep(letters[1:3], each = 10)),
  y = rnorm(30)
)
ds <- do.call(rbind, lapply(split(df, df$gp), function(d) {
  data.frame(mean = mean(d$y), sd = sd(d$y), gp = d$gp)
}))

# The summary data frame ds is used to plot larger red points on top
# of the raw data. Note that we don't need to supply `data` or `mapping`
# in each layer because the defaults from ggplot() are used.

aa = ggplot(df, aes(gp, y)) +
  geom_point() +
  geom_point(data = ds, aes(y = mean), colour = 'red', size = 3)

```

```{r 1, fig.show='hide', results='hide', echo = F, fig.cap = "khsd"}
aa
```

```{r 2, fig.cap = "khsdæ"}
aa
```
```{r 3, fig.cap = "ÆØÅ"}
aa
```
Collaborator

cderv commented Jan 19, 2022

I can reproduce the error with the example code above, but only if self_contained = FALSE is set. It seems to be related to the preview feature, which is activated when self_contained is FALSE. In that case, if no preview is provided, one is discovered automatically, using the first image found in the article. The issue is in the discover_preview() function, in how it deals with encoding. I'll push a fix.
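To illustrate the kind of encoding problem described above, here is a minimal sketch in base R. It is not the code of discover_preview() and not the fix in #437; the file contents, the `alt` text, and the variable names are made up for the example. It only shows that reading an HTML line with an explicit UTF-8 encoding lets grepl() and regmatches() handle non-ASCII captions.

```r
# Hypothetical illustration only; not distill's actual implementation.
# Write one HTML line containing Æ, Ø, Å as UTF-8 bytes to a temp file.
tmp <- tempfile(fileext = ".html")
line <- "<img src=\"fig.png\" data-distill-preview=1 alt=\"\u00c6\u00d8\u00c5\" />"
writeLines(enc2utf8(line), tmp, useBytes = TRUE)

# Declaring the encoding marks the strings as UTF-8, so the bytes are not
# reinterpreted in the native (possibly non-UTF-8) locale.
lines <- readLines(tmp, encoding = "UTF-8", warn = FALSE)

# The same kind of search that appears in the warning messages above.
has_preview <- grepl("data-distill-preview=", lines, fixed = TRUE)

# Extract the src attribute of the image with regmatches().
src <- regmatches(lines, regexpr("src=\"[^\"]+\"", lines))
```

If the file were read without declaring its encoding in a non-UTF-8 locale, the same calls could hit strings that R cannot interpret, which matches the "invalid multibyte string" error in the report.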

Are you on Windows, perhaps? If so, this is the kind of issue that should also be resolved in the next R version, when R on Windows will use UTF-8 encoding by default.
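A quick way to check this on your side, as a small sketch using only base R functions (`R.version.string`, `Sys.getlocale()`, `l10n_info()`); the variable names are just for illustration:

```r
# Report the R version and whether the current session uses a UTF-8 locale.
r_version <- R.version.string
ctype <- Sys.getlocale("LC_CTYPE")
is_utf8 <- isTRUE(l10n_info()[["UTF-8"]])
message(r_version, " | LC_CTYPE: ", ctype, " | UTF-8 locale: ", is_utf8)
```

If the last value is FALSE, the session is not running in a UTF-8 locale, which is the situation where this kind of error is most likely.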

BTW, I edited your post. For next time, see this guide on how to format an issue correctly: https://yihui.org/issue/#please-format-your-issue-correctly

Collaborator

cderv commented Jan 19, 2022

Can you try the PR to see if it solves the issue on your end?

remotes::install_github("rstudio/distill#437")

Thanks!

cderv added the bug 🐛 label Jan 19, 2022
@egodrive
Author

Great, that did the trick!

Collaborator

cderv commented Jan 24, 2022

Thanks for the confirmation. The fix is now merged in the dev version of distill.

cderv moved this from Backlog to Done in R Markdown Team Projects Jan 24, 2022
cderv moved this to Backlog in R Markdown Team Projects Jan 24, 2022
@Mihiretukebede

@cderv Thanks for your help. I am also having this problem, and installing the dev version of distill did not fix it. Do you suggest removing the self_contained: false part of the code? Will that solve it? Rendering also fails for the figures that were produced when knitting my code into a distill article. It shows a long list of "UTF-8 ... invalid byte sequence in UTF-8" errors.

Collaborator

cderv commented Mar 27, 2023

@Mihiretukebede can you open a new issue with an example?

I fixed this issue based on the example provided here. Maybe some further fixes are still needed and were missed. I need a reproducible example to investigate. Can you help me fix this for you?

Thanks


Mihiretukebede commented Mar 27, 2023

Hmm, basically the failure is caused by the figure files created after knitting into a distill article. Knitting creates a new folder for the files associated with the distill HTML file. Within that folder it creates 12 folders, one of which is named "figure-html5", and it saves all the plots there under file names such as "unnamed-chunk-1", "unnamed-chunk-2", etc. These unnamed-chunk PNG files are the main cause. They are plots created using ggplot2.

Collaborator

cderv commented Mar 27, 2023

Is the error message you get exactly the same as in this issue? Can you share an Rmd file that throws the error, so that I can render it on my side?


Mihiretukebede commented Mar 28, 2023

One example is below.
https://github.com/Mihiretukebede/mihiretukebede.gitlub.io/blob/master/_posts/2021-06-07-scraping-individual-participant-data-from-scatter-plots/scraping-individual-participant-data-from-scatter-plots.Rmd

After knitting that Rmd file and pushing the changes, the page build fails with the following errors (these are only for this one post; I get a huge number of such errors for other files).

Error: could not read file /github/workspace/_posts/2021-06-07-scraping-individual-participant-data-from-scatter-plots/scraping-individual-participant-data-from-scatter-plots_files/figure-html5/unnamed-chunk-4-1.png: invalid byte sequence in UTF-8
Error: could not read file /github/workspace/_posts/2021-06-07-scraping-individual-participant-data-from-scatter-plots/df.gif: invalid byte sequence in UTF-8
Error: could not read file /github/workspace/_posts/2021-06-07-scraping-individual-participant-data-from-scatter-plots/paste-586399E8.png: invalid byte sequence in UTF-8
Error: could not read file /github/workspace/_posts/2021-06-07-scraping-individual-participant-data-from-scatter-plots/scatterPlotdigitize.JPG: invalid byte sequence in UTF-8
Error: could not read file /github/workspace/_posts/2021-06-07-scraping-individual-participant-data-from-scatter-plots/digitize.jpg: invalid byte sequence in UTF-8
Error: could not read file /github/workspace/_posts/2021-06-07-scraping-individual-participant-data-from-scatter-plots/Animationnewlastr.gif: invalid byte sequence in UTF-8
Error: could not read file /github/workspace/_posts/2021-06-07-scraping-individual-participant-data-from-scatter-plots/lm.png: invalid byte sequence in UTF-8
Error: could not read file /github/workspace/_posts/2021-06-07-scraping-individual-participant-data-from-scatter-plots/publication.png: invalid byte sequence in UTF-8
Error: could not read file /github/workspace/_posts/2021-06-07-scraping-individual-participant-data-from-scatter-plots/corr.png: invalid byte sequence in UTF-8
Error: could not read file /github/workspace/_posts/2021-06-07-scraping-individual-participant-data-from-scatter-plots/points.gif: invalid byte sequence in UTF-8
