Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Skip unnecessary RegexNode reductions in NonBacktracking mode #62681

Closed
stephentoub opened this issue Dec 12, 2021 · 4 comments
Closed

Skip unnecessary RegexNode reductions in NonBacktracking mode #62681

stephentoub opened this issue Dec 12, 2021 · 4 comments
Labels
area-System.Text.RegularExpressions untriaged New issue has not been triaged by the area owner
Milestone

Comments

@stephentoub
Copy link
Member

As part of parsing a regex, we perform various transformations on the tree, many of which are focused on creating a tree that will yield more efficient matching with a backtracking engine, e.g. we'll transform "this|that" into "th(?:is|at)". But many of these transformations don't affect a finite automata built from this tree, e.g. "this|that" and "th(?:is|at)" yield identical DFAs. Thus, in NonBacktracking mode, we should avoid spending any cycles doing these transformations that won't make a difference.

@ghost
Copy link

ghost commented Dec 12, 2021

Tagging subscribers to this area: @dotnet/area-system-text-regularexpressions
See info in area-owners.md if you want to be subscribed.

Issue Details

As part of parsing a regex, we perform various transformations on the tree, many of which are focused on creating a tree that will yield more efficient matching with a backtracking engine, e.g. we'll transform "this|that" into "th(?:is|at)". But many of these transformations don't affect a finite automata built from this tree, e.g. "this|that" and "th(?:is|at)" yield identical DFAs. Thus, in NonBacktracking mode, we should avoid spending any cycles doing these transformations that won't make a difference.

Author: stephentoub
Assignees: -
Labels:

area-System.Text.RegularExpressions

Milestone: 7.0.0

@dotnet-issue-labeler dotnet-issue-labeler bot added the untriaged New issue has not been triaged by the area owner label Dec 12, 2021
@danmoseley
Copy link
Member

Does this show up in a profile such that it's worth the special casing?

@stephentoub
Copy link
Member Author

Does this show up in a profile such that it's worth the special casing?

Yes. And we also already special-case out of necessity, e.g. the analysis we do to automatically apply atomicity are disabled because atomic groups aren't supported in the DFA-based implementation today.

@stephentoub
Copy link
Member Author

After discussions with @veanes and @olsaarik, it seems like in general these optimizations for backtracking are still valuable for non-backtracking, just in order to reduce the lazy construction time. So, we'll leave it all and can just disable ones on a case-by-case basis as need proves.

@ghost ghost locked as resolved and limited conversation to collaborators Feb 10, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-System.Text.RegularExpressions untriaged New issue has not been triaged by the area owner
Projects
None yet
Development

No branches or pull requests

2 participants