Consider supporting character encodings other than utf-8 #112

Open
drevell opened this issue Jul 27, 2023 · 1 comment
Labels
enhancement New feature or request

Comments

@drevell
Contributor

drevell commented Jul 27, 2023

TL;DR

Currently actions like string_replace and regex_replace assume that the files they are operating on are utf-8 encoded. We could consider making this more flexible.
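As a rough illustration of what "more flexible" could mean, here is a minimal sketch that transcodes BOM-marked UTF-16 input to UTF-8 before a replace action runs, using only the standard library. The helper name `decodeToUTF8` is hypothetical and not part of this project. The sketch assumes a byte order mark is present. It does not attempt to detect encodings without a BOM, and it would misread a UTF-32 little-endian BOM (`FF FE 00 00`) as UTF-16.

```go
package main

import (
	"encoding/binary"
	"errors"
	"unicode/utf16"
	"unicode/utf8"
)

// decodeToUTF8 is a hypothetical helper, not an existing project API.
// It returns the input as a UTF-8 string, transcoding from UTF-16 when
// the input starts with a UTF-16 byte order mark (BOM).
func decodeToUTF8(b []byte) (string, error) {
	var order binary.ByteOrder
	switch {
	case len(b) >= 2 && b[0] == 0xFF && b[1] == 0xFE:
		order = binary.LittleEndian
	case len(b) >= 2 && b[0] == 0xFE && b[1] == 0xFF:
		order = binary.BigEndian
	default:
		// No UTF-16 BOM: treat the input as UTF-8 and validate it.
		if !utf8.Valid(b) {
			return "", errors.New("input is neither BOM-marked UTF-16 nor valid UTF-8")
		}
		return string(b), nil
	}

	b = b[2:] // drop the BOM
	if len(b)%2 != 0 {
		return "", errors.New("UTF-16 input has an odd number of bytes")
	}

	// Reassemble the 16-bit code units in the detected byte order.
	units := make([]uint16, len(b)/2)
	for i := range units {
		units[i] = order.Uint16(b[2*i:])
	}

	// utf16.Decode replaces unpaired surrogates with U+FFFD.
	return string(utf16.Decode(units)), nil
}
```

With something like this in front of them, string_replace and regex_replace could keep operating on UTF-8 strings. Writing the result back in the original encoding would need a matching encode step, which this sketch leaves out.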

Detailed design

No response

Alternatives considered

No response

Additional information

No response

@drevell drevell added the enhancement New feature or request label Jul 27, 2023
@sethvargo
Contributor

Devil's advocate: we could document that we only support utf-8 and explicitly check inputs with utf8.ValidString. I don't see a use case where we'd want to support binary encodings, and I'm not sure how prevalent utf-16 and utf-32 strings will be for our use cases.
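A minimal sketch of that explicit check, assuming a hypothetical guard named `requireUTF8` that an action would call before touching a file. `utf8.ValidString` on its own only answers yes or no, so this version walks the bytes with `utf8.DecodeRune` to report where the first invalid byte sits. The function name and error wording are illustrative only.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// requireUTF8 is a hypothetical guard, not an existing project function.
// It returns nil when contents is valid UTF-8, and otherwise an error
// naming the byte offset of the first invalid byte.
func requireUTF8(name string, contents []byte) error {
	for i := 0; i < len(contents); {
		r, size := utf8.DecodeRune(contents[i:])
		// RuneError with a width of 1 signals an invalid byte; a real
		// U+FFFD character in the input decodes with a width of 3.
		if r == utf8.RuneError && size == 1 {
			return fmt.Errorf("%s: invalid UTF-8 at byte offset %d", name, i)
		}
		i += size
	}
	return nil
}
```

Failing early like this keeps the "utf-8 only" contract visible to users instead of silently producing garbled replacements.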

It's also possible to "force" a string to be utf-8:

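// Collect only the runes that decode cleanly.
v := make([]rune, 0, len(s))

for i, r := range s {
  if r == utf8.RuneError {
    // A width of 1 means an invalid byte; a genuine U+FFFD in the input is 3 bytes wide and is kept.
    if _, size := utf8.DecodeRuneInString(s[i:]); size == 1 {
      continue
    }
  }
  v = append(v, r)
}

s = string(v)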
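For comparison, the standard library's `strings.ToValidUTF8` (available since Go 1.13) should give the same result when passed an empty replacement. The sketch below wraps the loop above in a function next to the one-line version. Both function names are hypothetical, chosen only for this illustration.

```go
package main

import (
	"strings"
	"unicode/utf8"
)

// forceUTF8Loop wraps the loop from the comment above: it drops every
// byte that is not part of a valid UTF-8 sequence.
func forceUTF8Loop(s string) string {
	v := make([]rune, 0, len(s))
	for i, r := range s {
		if r == utf8.RuneError {
			// Width 1 marks an invalid byte; a literal U+FFFD is kept.
			if _, size := utf8.DecodeRuneInString(s[i:]); size == 1 {
				continue
			}
		}
		v = append(v, r)
	}
	return string(v)
}

// forceUTF8 does the same with strings.ToValidUTF8, replacing each run
// of invalid bytes with the empty string.
func forceUTF8(s string) string {
	return strings.ToValidUTF8(s, "")
}
```

Either form silently discards data, so it fits best as an opt-in behavior rather than a default.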
