compile: make Regex::new(r"(?-u:\B)") fail again
This regex failed to compile in `regex <1.8`, but the migration to
regex-automata tweaked the rules in a subtle way that permitted it
to compile despite the fact that the old/status-quo matching engines
can't handle it correctly. By that, I mean that they may permit the \B
to match between code units. That in turn results in panicking when
slicing a &str.
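The failure mode described above can be reproduced with the standard library alone. The following is a minimal sketch, not the regex crate's code: the helpers `is_ascii_word_byte` and `ascii_not_word_boundary` are hypothetical names that model a byte-oriented ASCII `\B` check. It shows that such a check can succeed at an offset that sits inside a multi-byte codepoint, which is exactly an offset at which a `&str` cannot be sliced.

```rust
/// Hypothetical helper: true if `b` is an ASCII word byte, i.e. [0-9A-Za-z_].
fn is_ascii_word_byte(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b == b'_'
}

/// Hypothetical model of a byte-oriented ASCII `\B`: it holds at `at` when
/// the bytes on both sides agree on "wordness". Positions beyond either end
/// of the haystack count as non-word.
fn ascii_not_word_boundary(haystack: &[u8], at: usize) -> bool {
    let before = at > 0 && is_ascii_word_byte(haystack[at - 1]);
    let after = at < haystack.len() && is_ascii_word_byte(haystack[at]);
    before == after
}

fn main() {
    // U+03B4 is encoded in UTF-8 as the two bytes 0xCE 0xB4.
    let hay = "δ";
    assert_eq!(hay.len(), 2);

    // Neither byte is an ASCII word byte, so ASCII `\B` holds at offset 1,
    // which lies between the two code units of a single codepoint.
    assert!(ascii_not_word_boundary(hay.as_bytes(), 1));

    // Offset 1 is not a char boundary. `&hay[0..1]` would panic here;
    // the checked `get` returns `None` instead.
    assert!(!hay.is_char_boundary(1));
    assert!(hay.get(0..1).is_none());
}
```

An empty match reported at offset 1 would therefore make any caller that slices the haystack at the match bounds panic.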

In `regex 1.9`, this regex will actually be able to be compiled, but
the matching engines will correctly and robustly never report matches
that split a UTF-8 encoded codepoint. For now, we just add code that
causes `regex 1.8` to have the same behavior as previous releases.
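To illustrate what "never report matches that split a codepoint" could look like, here is a small sketch using only the standard library. It is an assumption-laden illustration, not how `regex 1.9` is implemented: the function names are hypothetical, and the approach (filtering candidate offsets with `str::is_char_boundary`) is just one simple way to express the idea.

```rust
/// Hypothetical helper: true if `b` is an ASCII word byte, i.e. [0-9A-Za-z_].
fn is_ascii_word_byte(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b == b'_'
}

/// Hypothetical model of a byte-oriented ASCII `\B` check at offset `at`.
fn ascii_not_word_boundary(haystack: &[u8], at: usize) -> bool {
    let before = at > 0 && is_ascii_word_byte(haystack[at - 1]);
    let after = at < haystack.len() && is_ascii_word_byte(haystack[at]);
    before == after
}

/// All offsets where ASCII `\B` holds, with no regard for UTF-8.
fn raw_not_word_boundaries(hay: &str) -> Vec<usize> {
    (0..=hay.len())
        .filter(|&at| ascii_not_word_boundary(hay.as_bytes(), at))
        .collect()
}

/// Same as above, but offsets that fall inside a codepoint are dropped,
/// so every reported offset is safe to use when slicing `hay`.
fn safe_not_word_boundaries(hay: &str) -> Vec<usize> {
    raw_not_word_boundaries(hay)
        .into_iter()
        .filter(|&at| hay.is_char_boundary(at))
        .collect()
}

fn main() {
    // Bytes: b'a', 0xCE, 0xB4.
    let hay = "aδ";
    // Offset 2 splits the two code units of U+03B4; offset 3 is the end.
    assert_eq!(raw_not_word_boundaries(hay), vec![2, 3]);
    assert_eq!(safe_not_word_boundaries(hay), vec![3]);
}
```

Until the engines can make that guarantee, rejecting the pattern at compile time, as this commit does, is the conservative choice.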

Fixes #1006
BurntSushi committed Jun 5, 2023
1 parent a1a9ebe commit b2ca9c1
Showing 1 changed file with 9 additions and 0 deletions.
9 changes: 9 additions & 0 deletions src/compile.rs
@@ -137,6 +137,15 @@ impl Compiler {
     }

     fn compile_one(mut self, expr: &Hir) -> result::Result<Program, Error> {
+        if self.compiled.only_utf8
+            && expr.properties().look_set().contains(Look::WordAsciiNegate)
+        {
+            return Err(Error::Syntax(
+                "ASCII-only \\B is not allowed in Unicode regexes \
+                 because it may result in invalid UTF-8 matches"
+                    .to_string(),
+            ));
+        }
         // If we're compiling a forward DFA and we aren't anchored, then
         // add a `.*?` before the first capture group.
         // Other matching engines handle this by baking the logic into the