This fixes a bug introduced by a bug fix for #557. In particular, the
termination condition wasn't exactly right, and this appears to have
slipped through the test suite. This probably reveals a hole in our test
suite, which is specifically the testing of Unicode regexes with
bytes::Regex on invalid UTF-8.
This bug was originally reported against ripgrep:
BurntSushi/ripgrep#1247
Thanks for reporting this bug! This was actually a regression introduced in the underlying regex engine (as a result of fixing an unrelated bug). I've published a fix for the regex engine and brought in the updated version on ripgrep master. I'll put out a new point release of ripgrep with this fix soon.
What version of ripgrep are you using?
And I'm comparing it to:
How did you install ripgrep?
From the binary releases for x86_64-unknown-linux-musl.
What operating system are you using ripgrep on?
Arch Linux
Describe your question, feature request, or bug.
I've run into a crippling performance regression on certain types of queries and non-UTF-8 files between 0.10.0 and 11.0.0, which looks like it might even be an infinite loop.
If this is a bug, what are the steps to reproduce the behavior?
A very simple way is to create a file containing only two bytes, "sä" encoded with ISO 8859-1, and search for a pattern with a short prefix that matches the "s" but not the rest, like '\bs(?:thiswillnotmatch|norwillthis)'.
The \b does seem to be required, at least in this case.
Another example file that reproduces this is sherlock.br in ripgrep's own source code, using the exact same pattern.
If this is a bug, what is the actual behavior?
11.0.0 seems to spin forever:
If this is a bug, what is the expected behavior?
0.10.0 has no problems and gives a result in a few milliseconds:
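The reproduction steps described in this report can be scripted. What follows is a minimal sketch in Python, not the reporter's original commands: the file name latin1.txt, the helper names, and the 5-second timeout are all assumptions made for illustration. It writes the two-byte file ("sä" in ISO 8859-1, i.e. the bytes 0x73 0xE4), confirms that the content is not valid UTF-8, and, if an rg binary is on PATH, runs the pattern against the file under a timeout so that a hanging version cannot block the script.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

# Pattern quoted in the report: a short prefix that matches "s" but not the rest.
PATTERN = r"\bs(?:thiswillnotmatch|norwillthis)"


def write_repro_file(path: Path) -> bytes:
    """Write "sä" encoded as ISO 8859-1 (bytes 0x73 0xE4) and return the bytes."""
    data = "s\u00e4".encode("iso-8859-1")
    path.write_bytes(data)
    return data


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def run_rg(path: Path, timeout_s: float = 5.0) -> str:
    """Run `rg PATTERN FILE` under a timeout (assumed value) and describe the outcome."""
    rg = shutil.which("rg")
    if rg is None:
        return "rg not found on PATH"
    try:
        proc = subprocess.run(
            [rg, PATTERN, str(path)],
            capture_output=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return f"timed out after {timeout_s}s (possible hang)"
    return f"exit code {proc.returncode}"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "latin1.txt"  # hypothetical file name
        data = write_repro_file(target)
        print("bytes:", data.hex(), "valid UTF-8:", is_valid_utf8(data))
        print("rg:", run_rg(target))
```

Running the script once with each ripgrep version on PATH should make the difference described above visible: an affected build would be expected to hit the timeout, while an unaffected one should return almost immediately.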