
regex-lite with a &[u8] haystack #1196

Open
SimonSapin opened this issue May 18, 2024 · 2 comments

Comments

@SimonSapin
Contributor

Describe your feature request

regex::bytes::Regex can search &[u8] haystacks that are not necessarily well-formed UTF-8. This is great for text-based file formats that predate Unicode and hand-wave encodings as a platform detail.

Could the regex-lite crate have a similar feature? As far as I can tell, it doesn’t in 0.1.5.

@BurntSushi
Member

Yeah, that is definitely the intent. The main internal APIs are specifically defined on &[u8]. I just didn't do this initially because I wasn't 100% certain folks would want it.

The main hitch here is that I think it needs to be a disabled-by-default, opt-in feature. The reason is that it will add a fair bit of code (basically a copy of string.rs, but for bytes), and the primary purpose of regex-lite is to keep binary size small and compilation times short.

I'm not sure when I'll have a chance to work on this, but I could review a PR adding this.

@SimonSapin
Contributor Author

Thanks for the feedback. For what it’s worth, for now I went with regex::bytes and disabled Unicode-related cargo features.
