Optimize newline handling for RegexOptions.Multiline #34566
Looks good to me. Nice improvement!
Nice!!
Another case to handle if @jzabroski still plans to revive dotnet/corefx#41195 (edit: I meant #1449)
@danmosemsft I do plan to. Just got trapped under a mountain of COVID-19 DR planning work. I expect to give it the old college try in about 2 weeks. Thanks for thinking of me and tagging me.
Sure @jzabroski, we will get there eventually!
We previously didn't do any special handling of beginning-of-line anchors (^ when RegexOptions.Multiline is specified). This PR adds special handling for the anchor so that FindFirstChar will jump to the next newline as part of its processing. As part of this, I also cleaned up some of the anchor handling code. The RegexPrefixAnalyzer only ever returns a single anchor, but the rest of the code was written such that it was expecting multiple anchors.
Also factor out a few lines of duplication.
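To illustrate the idea of jumping to the next newline, here is a minimal sketch of how a scan position can be advanced to the next place a multiline ^ could match. The class and method names (LineAnchorSketch, NextLineStart) are hypothetical and chosen for illustration; this is not the code in RegexCompiler.cs or the actual FindFirstChar implementation, only the general technique under the assumption that lines are terminated by '\n'.

```csharp
using System;

public static class LineAnchorSketch
{
    // Hypothetical helper, not the runtime's implementation.
    // Returns the first index >= pos at which a multiline '^' could match
    // (start of input, or immediately after a '\n'), or -1 if there is none.
    // Assumes 0 <= pos <= text.Length.
    public static int NextLineStart(ReadOnlySpan<char> text, int pos)
    {
        // Already at the start of the input or just after a newline.
        if (pos == 0 || text[pos - 1] == '\n')
        {
            return pos;
        }

        // Otherwise skip directly to the character after the next '\n'
        // instead of testing the anchor at every intermediate position.
        int newline = text.Slice(pos).IndexOf('\n');
        return newline < 0 ? -1 : pos + newline + 1;
    }
}
```

For the input "ab\ncd", a scan starting at index 1 would move straight to index 3, the start of the second line, rather than attempting a match at index 2 as well.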
Example:
Finding all lines in Romeo and Juliet that contain the word "love":
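The example code from the PR is not included in this text. As a rough sketch of what such a search could look like, the following uses a pattern anchored with ^ and $ under RegexOptions.Multiline. The pattern, the chosen options, and the names LoveLines and Find are assumptions made for illustration, not the code from the PR.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LoveLines
{
    // Assumed pattern: an entire line that contains the whole word "love".
    // With RegexOptions.Multiline, ^ and $ match at line boundaries.
    private static readonly Regex s_loveLine = new Regex(
        @"^.*\blove\b.*$",
        RegexOptions.Multiline | RegexOptions.IgnoreCase | RegexOptions.Compiled);

    // Returns every matching line of the input, in order of appearance.
    // Assumes '\n' line endings; with "\r\n" each match would keep a trailing '\r'.
    public static List<string> Find(string text)
    {
        var lines = new List<string>();
        foreach (Match m in s_loveLine.Matches(text))
        {
            lines.Add(m.Value);
        }
        return lines;
    }
}
```

Because the pattern begins with a beginning-of-line anchor, this is the kind of search where skipping directly to the next newline avoids attempting a match at every character of a line that has already failed.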
Contributes to #1349
cc: @pgovind, @eerhardt, @ViktorHofer, @danmosemsft