Use IndexOfAnyValues in Regex.Escape #78667

Merged
merged 2 commits into from
Nov 22, 2022
@@ -1,6 +1,7 @@
 // Licensed to the .NET Foundation under one or more agreements.
 // The .NET Foundation licenses this file to you under the MIT license.

+using System.Buffers;
 using System.Collections;
 using System.Collections.Generic;
 using System.Diagnostics;
@@ -149,18 +150,13 @@ public static RegexReplacement ParseReplacement(string pattern, RegexOptions opt
         /// </summary>
         public static string Escape(string input)
         {
-            for (int i = 0; i < input.Length; i++)
-            {
-                if (IsMetachar(input[i]))
-                {
-                    return EscapeImpl(input, i);
-                }
-            }
-
-            return input;
+            int indexOfMetachar = IndexOfMetachar(input.AsSpan());
+            return indexOfMetachar < 0
+                ? input
+                : EscapeImpl(input.AsSpan(), indexOfMetachar);
         }

-        private static string EscapeImpl(string input, int i)
+        private static string EscapeImpl(ReadOnlySpan<char> input, int indexOfMetachar)
         {
             // For small inputs we allocate on the stack. In most cases a buffer three
             // times larger the original string should be sufficient as usually not all
@@ -171,12 +167,18 @@ private static string EscapeImpl(string input, int i)
                 new ValueStringBuilder(stackalloc char[EscapeMaxBufferSize]) :
                 new ValueStringBuilder(input.Length + 200);

-            char ch = input[i];
-            vsb.Append(input.AsSpan(0, i));
-
-            do
+            while (true)
             {
-                vsb.Append('\\');
+                vsb.Append(input.Slice(0, indexOfMetachar));
+                input = input.Slice(indexOfMetachar);
+
+                if (input.IsEmpty)
+                {
+                    break;
+                }
+
+                char ch = input[0];
+
                 switch (ch)
                 {
                     case '\n':
@@ -193,23 +195,16 @@ private static string EscapeImpl(string input, int i)
                         break;
                 }

+                vsb.Append('\\');
                 vsb.Append(ch);
-                i++;
-                int lastpos = i;
+                input = input.Slice(1);

-                while (i < input.Length)
+                indexOfMetachar = IndexOfMetachar(input);
+                if (indexOfMetachar < 0)
                 {
-                    ch = input[i];
-                    if (IsMetachar(ch))
-                    {
-                        break;
-                    }
-
-                    i++;
+                    indexOfMetachar = input.Length;
                 }
-
-                vsb.Append(input.AsSpan(lastpos, i - lastpos));
-            } while (i < input.Length);
+            }

             return vsb.ToString();
         }
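
For readers who want to try the chunked loop outside the runtime repo, here is a minimal standalone sketch of the same idea: find the next metacharacter, copy the ordinary run before it in one call, emit the escaped character, repeat. It is not the runtime's implementation. `EscapeSketch` is a hypothetical name, `StringBuilder` stands in for the internal `ValueStringBuilder`, and the whitespace mapping in the `switch` is an assumption based on the documented behavior of `Regex.Escape`, since the diff elides those cases. The metacharacter set is the string this PR passes to `IndexOfAnyValues.Create`.

```csharp
using System;
using System.Text;

public static class EscapeSketch
{
    // Same set of characters the PR passes to IndexOfAnyValues.Create.
    private const string Metachars = "\t\n\f\r #$()*+.?[\\^{|";

    public static string Escape(string input)
    {
        ReadOnlySpan<char> rest = input.AsSpan();
        int index = rest.IndexOfAny(Metachars.AsSpan());
        if (index < 0)
        {
            // Nothing to escape: return the original instance, no allocation.
            return input;
        }

        var sb = new StringBuilder(input.Length + 16);
        while (true)
        {
            // Copy the run of ordinary characters in a single call.
            sb.Append(rest.Slice(0, index));
            rest = rest.Slice(index);

            if (rest.IsEmpty)
            {
                break;
            }

            // rest[0] is a metacharacter. Assumed mapping for whitespace:
            // emit the escape letter rather than the raw control character.
            char ch = rest[0];
            switch (ch)
            {
                case '\n': ch = 'n'; break;
                case '\r': ch = 'r'; break;
                case '\t': ch = 't'; break;
                case '\f': ch = 'f'; break;
            }

            sb.Append('\\').Append(ch);
            rest = rest.Slice(1);

            // Find the next metacharacter; if there is none, the next
            // iteration copies the whole tail and then exits.
            index = rest.IndexOfAny(Metachars.AsSpan());
            if (index < 0)
            {
                index = rest.Length;
            }
        }

        return sb.ToString();
    }
}
```

With this sketch, `EscapeSketch.Escape("a.b*c")` yields `a\.b\*c`, and an input with no metacharacters comes back as the same string instance, which mirrors the early return in `Escape` above.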
@@ -2081,6 +2076,27 @@ internal static int MapCaptureNumber(int capnum, Hashtable? caps) =>
             // ' a b c d e f g h i j k l m n o p q r s t u v w x y z { | } ~
             0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, Q, S, 0, 0, 0};

+#if NET7_0_OR_GREATER
Member

I merged the PR, but... this should be NET8_0_OR_GREATER. It seems to be an artifact of our build system that this works right now. Once we fix that, these are presumably going to start failing.
cc: @ViktorHofer

Member Author

Right, the NET8 directives don't seem to exist yet.

Once they are added, would we ever build this with an actual 7.0 target? That is, would this continue to just "work", while being misleading to the reader since the API doesn't actually exist on 7?

Member

It might. We have several libraries that build for older .NET Core versions, and while System.Text.RegularExpressions.dll itself isn't one of them, it's feasible the source generator could.

Plus, it's confusing.

@ViktorHofer (Member), Nov 22, 2022

Personally, I would have used #if NET here instead as presumably at the time that we would ship a .NETCoreApp version of the Regex source generator (which isn't planned), we wouldn't build older tfms like net6.0 and net7.0 anymore.

Can you please either submit a PR to change this to #if NET or create an issue that tracks updating this to NET8_0_OR_GREATER?

Updating the tfm to net8.0 is being taken care of via #78354.

Member

> Personally, I would have used #if NET here

FWIW I think that's just as confusing, if not more so. All of this is .NET.

Member

Same objection.

@ViktorHofer (Member), Nov 22, 2022

Interesting that you object to that. Can you please elaborate? We use #if NETCOREAPP all over the place here in the repo to differentiate between "modern .NET -> NETCOREAPP" and older frameworks like ".NET Standard" and ".NET Framework".

Member

Both .NET 7 and .NET 8 are netcoreapp.

Member

That's why I suggested using #if NET: if we ever made the source generator target NETCOREAPP, we likely wouldn't build net7.0 anymore. We don't version preprocessor directives in the BCL if there isn't a current need. I.e., there aren't any NETCOREAPP_3_1_OR_GREATER (or earlier) symbols because we don't build such frameworks anymore.

@stephentoub (Member), Nov 22, 2022

.NET Core 3.1 is old and unsupported. We just shipped .NET 7 last week. I'm glad it's not confusing to you; it's highly confusing to me. Many projects in the repo build for .NET 7, so seeing code that doesn't actually work on .NET 7 ifdef'd in a way that suggests it should is wrong, IMHO.

+        private static readonly IndexOfAnyValues<char> s_metachars =
+            IndexOfAnyValues.Create("\t\n\f\r #$()*+.?[\\^{|");
+
+        private static int IndexOfMetachar(ReadOnlySpan<char> input) =>
+            input.IndexOfAny(s_metachars);
+#else
+        private static int IndexOfMetachar(ReadOnlySpan<char> input)
+        {
+            for (int i = 0; i < input.Length; i++)
+            {
+                if (IsMetachar(input[i]))
+                {
+                    return i;
+                }
+            }
+
+            return -1;
+        }
+#endif
+
         /// <summary>Returns true for those characters that terminate a string of ordinary chars.</summary>
         private static bool IsSpecial(char ch) => ch <= '|' && Category[ch] >= S;
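
To make the review thread's point concrete, here is a hedged sketch of the same fast-path/fallback split gated on `NET8_0_OR_GREATER`, the symbol the first review comment asks for. Two assumptions to flag: `MetacharSearch` is a hypothetical class name, and the sketch uses `SearchValues<char>`, which to my knowledge is the name under which the `IndexOfAnyValues` type from this PR eventually shipped in .NET 8. The fallback checks membership against the same literal rather than the internal `IsMetachar` table, so the sketch stays self-contained.

```csharp
using System;
using System.Buffers;

public static class MetacharSearch
{
    // Same set of characters the PR passes to IndexOfAnyValues.Create.
    private const string Metachars = "\t\n\f\r #$()*+.?[\\^{|";

#if NET8_0_OR_GREATER
    // Only compiled for targets where the API actually exists.
    private static readonly SearchValues<char> s_metachars =
        SearchValues.Create(Metachars);

    public static int IndexOfMetachar(ReadOnlySpan<char> input) =>
        input.IndexOfAny(s_metachars);
#else
    // Fallback for older targets: plain scalar scan.
    public static int IndexOfMetachar(ReadOnlySpan<char> input)
    {
        for (int i = 0; i < input.Length; i++)
        {
            if (Metachars.IndexOf(input[i]) >= 0)
            {
                return i;
            }
        }

        return -1;
    }
#endif
}
```

Both branches return the index of the first metacharacter or -1, so callers such as the `Escape` loop do not need to know which one was compiled. Gating on the version that really contains the API is what keeps the `#if` honest for a reader building a net7.0 target.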
