Regex has a variety of vectorized approaches for quickly skipping ahead to the first character that could start a match. For example, given "\shi", it will likely IndexOf('h') and then back off one character. However, given an expression like "(\w+)://", where the literals in the expression are a variable distance from the beginning of the match, it doesn't currently employ any vectorized searching and instead walks each character, checking whether it is a word char (\w).

We could, however, optimize a subset of these cases, in particular where we have a setloopatomic (potentially in a capture) followed by a literal that's not part of the set. There, we can IndexOf for the literal and then walk backwards from the literal as long as the prior characters are in the set. A prototype of this shows huge gains on benchmarks like https://github.com/mariomka/regex-benchmark/blob/17d073ec864931546e2694783f6231e4696a9ed4/csharp/Benchmark.cs#L23.

The main downside is that we'll end up paying double the cost for the set lookups, once backwards in FindFirstChar and once forwards in Go, unless we can successfully pass more information from FindFirstChar to Go to signal where that setloopatomic ends.
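To make the idea concrete, here is a minimal, language-neutral sketch in Python of the "find the literal, then walk backwards through the set" search. It is not the .NET implementation: `find_candidate`, `is_word_char`, and the `min_count` parameter are hypothetical names invented for illustration, `str.find` stands in for a vectorized IndexOf, and `is_word_char` only approximates \w. The sketch assumes the first character of the literal is not in the set, as described above.

```python
from typing import Callable, Optional, Tuple


def is_word_char(c: str) -> bool:
    # Rough stand-in for \w (assumption: alphanumerics plus underscore).
    return c.isalnum() or c == "_"


def find_candidate(
    text: str,
    start: int,
    literal: str,
    in_set: Callable[[str], bool],
    min_count: int = 1,
) -> Optional[Tuple[int, int]]:
    """Return (match_start, loop_end) for a set loop followed by `literal`.

    match_start is where the set loop would begin; loop_end is the index of
    the literal, i.e. where the set loop ends. Returns None if no candidate
    exists at or after `start`.
    """
    pos = start
    while True:
        # Fast search for the literal (stand-in for a vectorized IndexOf).
        lit = text.find(literal, pos)
        if lit < 0:
            return None

        # Walk backwards while the preceding characters are in the set,
        # never moving before the position the search started from.
        begin = lit
        while begin > start and in_set(text[begin - 1]):
            begin -= 1

        # The loop needs at least `min_count` characters (1 for `+`).
        if lit - begin >= min_count:
            return begin, lit

        # Too few set characters before this literal; try the next one.
        pos = lit + 1


text = "see http://a.com and ftp://b"
print(find_candidate(text, 0, "://", is_word_char))
print(find_candidate(text, 11, "://", is_word_char))
```

Returning the literal's index alongside the match start is one possible way to hand the end of the set loop to the matcher so it would not need to re-scan the set forwards; whether that fits the actual FindFirstChar/Go split is an open question here.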