Simpler parsing algorithm
Fixes #189
hadley committed May 28, 2024
1 parent c0a8f64 commit 0c3455c
Showing 4 changed files with 9 additions and 8 deletions.
4 changes: 2 additions & 2 deletions DESCRIPTION
@@ -14,7 +14,7 @@ License: MIT + file LICENSE
 URL: https://downlit.r-lib.org/, https://github.com/r-lib/downlit
 BugReports: https://github.com/r-lib/downlit/issues
 Depends:
-R (>= 3.6)
+R (>= 4.0.0)
 Imports:
 brio,
 desc,
@@ -40,4 +40,4 @@ Config/Needs/website: tidyverse/tidytemplate
 Config/testthat/edition: 3
 Encoding: UTF-8
 Roxygen: list(markdown = TRUE)
-RoxygenNote: 7.2.3
+RoxygenNote: 7.3.1
2 changes: 2 additions & 0 deletions NEWS.md
@@ -1,5 +1,7 @@
 # downlit (development version)
 
+* Use simpler parsing algorithm for R 4.0, which avoids crash with certain UTF-8 characters (#189).
+
 # downlit 0.4.3
 
 * Fix for upcoming R-devel (#169).
7 changes: 1 addition & 6 deletions R/utils.R
@@ -36,13 +36,8 @@ safe_parse <- function(text, standardise = TRUE) {
   lines <- strsplit(text, "\n", fixed = TRUE, useBytes = TRUE)[[1]]
   srcfile <- srcfilecopy("test.r", lines)
 
-  # https://github.com/gaborcsardi/rencfaq#how-to-parse-utf-8-text-into-utf-8-code
-  Encoding(text) <- "unknown"
-  con <- textConnection(text)
-  on.exit(close(con), add = TRUE)
-
   tryCatch(
-    parse(con, keep.source = TRUE, encoding = "UTF-8", srcfile = srcfile),
+    parse(text = text, keep.source = TRUE, encoding = "UTF-8", srcfile = srcfile),
     error = function(e) NULL
   )
 }
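To try the post-commit parsing path in isolation, here is a minimal standalone sketch of it. The name `safe_parse_sketch` is hypothetical. It drops the `standardise` argument of the real `safe_parse()` because this hunk does not show how that argument is handled. Otherwise it mirrors the added lines: the string goes straight to `parse(text = ...)` with no `textConnection()`, and any parse error becomes `NULL`.

```r
# Hypothetical standalone sketch of the post-commit path in downlit's
# safe_parse(); `standardise` handling is omitted (not shown in the diff).
safe_parse_sketch <- function(text) {
  # Store the source lines in a srcfile so source references can
  # recover the original text
  lines <- strsplit(text, "\n", fixed = TRUE, useBytes = TRUE)[[1]]
  srcfile <- srcfilecopy("test.r", lines)

  # Parse the string directly (no textConnection); a syntax error
  # yields NULL instead of propagating to the caller
  tryCatch(
    parse(text = text, keep.source = TRUE, encoding = "UTF-8", srcfile = srcfile),
    error = function(e) NULL
  )
}

exprs <- safe_parse_sketch("x <- 1\ny <- x + 1")
length(exprs)                         # 2 top-level expressions
is.null(safe_parse_sketch("x <- "))   # TRUE: incomplete input fails to parse
```

Dropping the connection also removes the need for the `on.exit(close(con))` cleanup. According to the NEWS entry, this simpler route is what avoids the crash reported in #189 on R 4.0 and later.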
4 changes: 4 additions & 0 deletions tests/testthat/test-utils.R
@@ -6,3 +6,7 @@ test_that("converts Latin1 encoded text to utf8", {
   expect_equal(Encoding(y), "UTF-8")
   expect_equal(y, "\u00fc")
 })
+
+test_that("doesn't crash on utf-8 characters", {
+  expect_equal(safe_parse("×"), NULL)
+})
