Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use a non-backtracking NFA/DFA regex engine where possible #18614

Closed
DemiMarie opened this issue Sep 17, 2016 · 6 comments · Fixed by #60607
Closed

Use a non-backtracking NFA/DFA regex engine where possible #18614

DemiMarie opened this issue Sep 17, 2016 · 6 comments · Fixed by #60607
Labels
api-needs-work API needs work before it is approved, it is NOT ready for implementation area-System.Text.RegularExpressions
Milestone

Comments

@DemiMarie
Copy link

Stack Exchange suffered an outage due to a regular expression that used quadratic time.

This is a feature request for using a DFA/NFA engine whenever possible – that is, on all regexps that don't have backreferences or any other non-regular features. Note that backreferences almost certainly cannot be handled by any efficient algorithm in general – the matching problem for regular expressions plus backreferences is NP hard.

@Priya91
Copy link
Contributor

Priya91 commented Sep 24, 2016

@DemiMarie .NET supports the non-backtracking atomic groups (?>). We can have a DFA engine in .NET Regex (been in backlog for sometime now) and allow options to chose this over the NFA, but again that doesn't prevent the regex stack overflow from happening, if the regex author is not careful with the quantifiers or unexpected input patterns.

@Priya91
Copy link
Contributor

Priya91 commented Nov 29, 2016

Putting it in backlog of features for Regex space.

@ViktorHofer
Copy link
Member

Stack Exchange suffered an outage due to a regular expression that used quadratic time.

Why isn't Stack Exchange using timeouts on their regex expressions to avoid unwanted computations?

@danmoseley
Copy link
Member

@NickCraver you may have some thoughts to share?

@NickCraver
Copy link
Member

Why isn't Stack Exchange using timeouts on their regex expressions to avoid unwanted computations?

We have for many years in our web tier, but this isn't the default .NET behavior and these happening were in a section of code that doesn't respect the web.config setting (e.g. defaultRegexMatchTimeout="00:00:20"). I gave feedback on making this easier for developers, but ultimately others didn't agree: https://github.com/dotnet/corefx/issues/15173

In my opinion the lack of a intuitive way to set this (or even default it to something reasonable) is bad. The API was ported back from desktop "because it was there" (as far as I can tell), and I don't agree with this at all. The messaging has been that app domains are dead in .NET Core (kinda true, kinda not) yet the only way to set this is there. It's just a really weird message to developers and approach overall. The old API was bad, porting it makes sense in some ways but it's still severely lacking from a usability perspective. Environmental variables (that's effectively what this is) you have to look up will never be intuitive solutions. For something like this which is security critical, lack of discoverability is doubly bad.

I still believe there should be a static method on Regex to set this, regardless of the implementation underneath.

@ViktorHofer
Copy link
Member

but this isn't the default .NET behavior

We had a discussion about this some weeks ago and unfortunately we can't change the default behavior which uses an infinite timeout without breaking someone.

I still believe there should be a static method on Regex to set this, regardless of the implementation underneath.

We will consider this if we make substantial changes to Regex i.e. introducing a new engine.

@msftgits msftgits transferred this issue from dotnet/corefx Jan 31, 2020
@msftgits msftgits added this to the 5.0 milestone Jan 31, 2020
@maryamariyan maryamariyan added the untriaged New issue has not been triaged by the area owner label Feb 23, 2020
@stephentoub stephentoub removed the untriaged New issue has not been triaged by the area owner label Jun 4, 2020
@stephentoub stephentoub modified the milestones: 5.0, Future Jun 4, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Nov 18, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-needs-work API needs work before it is approved, it is NOT ready for implementation area-System.Text.RegularExpressions
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants