You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
As part of parsing a regex, we perform various transformations on the tree, many of which are focused on creating a tree that will yield more efficient matching with a backtracking engine, e.g. we'll transform "this|that" into "th(?:is|at)". But many of these transformations don't affect a finite automata built from this tree, e.g. "this|that" and "th(?:is|at)" yield identical DFAs. Thus, in NonBacktracking mode, we should avoid spending any cycles doing these transformations that won't make a difference.
The text was updated successfully, but these errors were encountered:
Tagging subscribers to this area: @dotnet/area-system-text-regularexpressions
See info in area-owners.md if you want to be subscribed.
Issue Details
As part of parsing a regex, we perform various transformations on the tree, many of which are focused on creating a tree that will yield more efficient matching with a backtracking engine, e.g. we'll transform "this|that" into "th(?:is|at)". But many of these transformations don't affect a finite automata built from this tree, e.g. "this|that" and "th(?:is|at)" yield identical DFAs. Thus, in NonBacktracking mode, we should avoid spending any cycles doing these transformations that won't make a difference.
Does this show up in a profile such that it's worth the special casing?
Yes. And we also already special-case out of necessity, e.g. the analysis we do to automatically apply atomicity are disabled because atomic groups aren't supported in the DFA-based implementation today.
After discussions with @veanes and @olsaarik, it seems like in general these optimizations for backtracking are still valuable for non-backtracking, just in order to reduce the lazy construction time. So, we'll leave it all and can just disable ones on a case-by-case basis as need proves.
As part of parsing a regex, we perform various transformations on the tree, many of which are focused on creating a tree that will yield more efficient matching with a backtracking engine, e.g. we'll transform "this|that" into "th(?:is|at)". But many of these transformations don't affect a finite automata built from this tree, e.g. "this|that" and "th(?:is|at)" yield identical DFAs. Thus, in NonBacktracking mode, we should avoid spending any cycles doing these transformations that won't make a difference.
The text was updated successfully, but these errors were encountered: