Expand an optimization for short values to IgnoreCase in single-value SearchValues<string> #108368

Open · wants to merge 1 commit into `main`

MihaZupan (Member)

For values of length 2 or 3, there is a special case that skips match verification entirely when we know that false positives are impossible once all the anchors match:

```csharp
// If the value is short (!TValueLength.AtLeast4Chars => 2 or 3 characters), the anchors already represent the whole value.
// With case-sensitive comparisons, we've therefore already confirmed the match, so we can skip doing so here.
// With case-insensitive comparisons, we applied a mask to the input, so while the anchors likely matched, we can't be sure.
if ((typeof(TCaseSensitivity) == typeof(CaseSensitive) && !TValueLength.AtLeast4Chars) ||
    TCaseSensitivity.Equals<TValueLength>(ref matchRef, _value))
```
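A minimal sketch of why the anchors "already represent the whole value" for 2- or 3-character values. It assumes three anchors placed at the first, second, and last character; that placement, `HypotheticalAnchors`, and `AnchorsCoverValue` are illustrative assumptions, not runtime APIs, and the real implementation chooses its own anchor offsets.

```csharp
using System;
using System.Linq;

// Hypothetical helper: true when the anchor offsets touch every index of the value.
// In that case, an exact (case-sensitive) match on all anchors is a match on the whole value.
static bool AnchorsCoverValue(int valueLength, int[] anchorOffsets) =>
    Enumerable.Range(0, valueLength).All(i => anchorOffsets.Contains(i));

// Assumed anchor placement for illustration only: first, second, and last character.
static int[] HypotheticalAnchors(int valueLength) => new[] { 0, 1, valueLength - 1 };

foreach (int length in new[] { 2, 3, 4, 5 })
{
    bool covered = AnchorsCoverValue(length, HypotheticalAnchors(length));
    Console.WriteLine($"length {length}: anchors cover the whole value = {covered}");
}
```

With this placement, lengths 2 and 3 are fully covered while lengths 4 and 5 are not, which mirrors why values where `TValueLength.AtLeast4Chars` holds still go through `TCaseSensitivity.Equals`.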

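Previously this trick only applied to case-sensitive comparisons, but it is also valid for case-insensitive ones when every character in the value is an ASCII letter: for ASCII letters, the input mask only folds a letter's uppercase and lowercase forms together, so an anchor match is already an exact case-insensitive match. This PR extends the optimization to those values.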
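The sketch below illustrates why ASCII letters are special. It assumes the case-insensitive anchor check folds case by clearing bit `0x20` (an assumption for illustration, not a quote of the runtime code) and compares that masked check against `StringComparison.OrdinalIgnoreCase` over ASCII inputs only. `Fold`, `MaskedEquals`, `IgnoreCaseEquals`, `CountMismatches`, and `CanSkipVerification` are hypothetical names. For a letter, the masked check never disagrees with the exact comparison; for a non-letter such as `'@'`, it also accepts `` '`' ``, which is the kind of false positive that still requires verification.

```csharp
using System;
using System.Linq;

// Assumed case fold: clear bit 0x20 so 'a'..'z' map onto 'A'..'Z'.
static char Fold(char c) => (char)(c & ~0x20);

// Anchor-style comparison: mask both the input and the value character, then compare.
static bool MaskedEquals(char input, char valueChar) => Fold(input) == Fold(valueChar);

// Exact ordinal case-insensitive comparison, used as the reference.
static bool IgnoreCaseEquals(char a, char b) =>
    string.Equals(a.ToString(), b.ToString(), StringComparison.OrdinalIgnoreCase);

// Counts ASCII inputs for which the masked check disagrees with OrdinalIgnoreCase.
static int CountMismatches(char valueChar)
{
    int mismatches = 0;
    for (char c = '\0'; c < 128; c++)
    {
        if (MaskedEquals(c, valueChar) != IgnoreCaseEquals(c, valueChar)) mismatches++;
    }
    return mismatches;
}

// Hypothetical condition mirroring the PR description: skip verification for 2- or 3-char
// values when the comparison is case-sensitive or every character is an ASCII letter.
static bool CanSkipVerification(string value, bool ignoreCase) =>
    (value.Length is 2 or 3) && (!ignoreCase || value.All(char.IsAsciiLetter));

Console.WriteLine(CountMismatches('a'));                           // 0
Console.WriteLine(CountMismatches('@'));                           // 1 ('`' is a false positive)
Console.WriteLine(CanSkipVerification("the", ignoreCase: true));   // True
Console.WriteLine(CanSkipVerification("a@", ignoreCase: true));    // False
```

`char.IsAsciiLetter` requires .NET 7 or later. Under these assumptions, the benchmark's needle `"the"` is three ASCII letters and so falls into the extended fast path.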


```csharp
using System;
using System.Buffers;
using System.Linq;
using BenchmarkDotNet.Attributes;

public class EarlyMatches
{
    private const string Needle = "the";
    // "thethethe..." so every search finds a match at position 0.
    private static readonly string s_haystack = string.Concat(Enumerable.Repeat(Needle, 10_000));
    private static readonly SearchValues<string> s_values = SearchValues.Create([Needle], StringComparison.OrdinalIgnoreCase);

    [Benchmark]
    public int Count()
    {
        int count = 0;
        ReadOnlySpan<char> haystack = s_haystack;
        while (true)
        {
            int pos = haystack.IndexOfAny(s_values);
            if (pos < 0) break;
            count++;
            // Resume right after the match (pos is always 0 for this haystack).
            haystack = haystack.Slice(pos + Needle.Length);
        }
        return count;
    }
}
```
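| Method | Toolchain | Mean | Error | Ratio | Code Size |
|------- |---------- |---------:|---------:|------:|----------:|
| Count | main | 29.41 us | 0.517 us | 1.00 | 946 B |
| Count | pr | 23.30 us | 0.269 us | 0.79 | 726 B |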

MihaZupan added this to the 10.0.0 milestone Sep 29, 2024
MihaZupan self-assigned this Sep 29, 2024
Contributor

Tagging subscribers to this area: @dotnet/area-system-memory
See info in area-owners.md if you want to be subscribed.

MihaZupan force-pushed the `searchvalues-singleStringSkipVerificationIC` branch from 1f386bf to f94a3fb October 1, 2024 07:35