Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable RegexOptions.RightToLeft and lookbehinds in compiler / source generator #66280

Merged
merged 2 commits into from
Mar 7, 2022

Conversation

stephentoub
Copy link
Member

Replaces #66127
Fixes #62345

For .NET 7 we rewrote RegexCompiler as we were writing the source generator, and in doing so we left out support for RegexOptions.RightToLeft as well as lookbehinds (which are implemented via RightToLeft). This adds support for both. I initially started incrementally adding in support for various constructs in lookbehinds, but from a testing perspective it made more sense to just add it all, as then all of the RightToLeft tests are used to validate the constructs that are also in lookbehinds.

@ghost
Copy link

ghost commented Mar 7, 2022

Tagging subscribers to this area: @dotnet/area-system-text-regularexpressions
See info in area-owners.md if you want to be subscribed.

Issue Details

Replaces #66127
Fixes #62345

For .NET 7 we rewrote RegexCompiler as we were writing the source generator, and in doing so we left out support for RegexOptions.RightToLeft as well as lookbehinds (which are implemented via RightToLeft). This adds support for both. I initially started incrementally adding in support for various constructs in lookbehinds, but from a testing perspective it made more sense to just add it all, as then all of the RightToLeft tests are used to validate the constructs that are also in lookbehinds.

Author: stephentoub
Assignees: -
Labels:

area-System.Text.RegularExpressions

Milestone: 7.0.0

@stephentoub
Copy link
Member Author

Just to make sure it's clear, with this our entire regex corpus is supported with both RegexCompiler and the source generator:

Total:     18,964
Supported: 18,964

The only cases now that aren't supported are those that use the new RegexOptions.NonBacktracking or those with super duper deep nesting, which none of the expressions we see in real-life get anywhere close to.

…generator

For .NET 7 we rewrote RegexCompiler as we were writing the source generator, and in doing so we left out support for RegexOptions.RightToLeft as well as lookbehinds (which are implemented via RightToLeft).  This adds support for both.  I initially started incrementally adding in support for various constructs in lookbehinds, but from a testing perspective it made more sense to just add it all, as then all of the RightToLeft tests are used to validate the constructs that are also in lookbehinds.

string expr = !rtl ?
$"{sliceSpan}[{sliceStaticPos}]" :
"inputSpan[pos - 1]";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

inputSpan

NIT: Should we consider to have this saved as a label instead, in order to avoid issues if for some reason we rename the parameter in the future?

Copy link
Member Author

@stephentoub stephentoub Mar 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We hard code "inputSpan" and "pos" everywhere already. You mean changing it everywhere? Or something specific to this use?

I'm not against creating a local for it, but not in this PR :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to do it now, I only brought it up here since you modified this particular line but I understand this is not the only instance of it, so we can do this later instead if we feel it would be good.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to extract it or see it extracted separately.

yield return (@"abc(?!XXX)\w+", "abcXXXdef", RegexOptions.None, 0, 9, false, string.Empty);

// Zero-width positive lookbehind assertion: Actual - "(\\w){6}(?<=XXX)def"
// Zero-width positive lookbehind assertion
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: Comment says positive lookbehind but there seems to be also negative ones. Of course don't reset CI just for this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I'll fix the comment separately. Thanks.

@stephentoub stephentoub merged commit 457b1ff into dotnet:main Mar 7, 2022
@stephentoub stephentoub deleted the regexrighttoleft branch March 7, 2022 21:52
@EgorBo
Copy link
Member

EgorBo commented Mar 17, 2022

Improvement on win-x64 dotnet/perf-autofiling-issues#3994

@ghost ghost locked as resolved and limited conversation to collaborators Apr 16, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Consider adding RegexOptions.RightToLeft support back to compilation
4 participants