-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve vectorization of IndexOf(chars, StringComparison.OrdinalIgnoreCase) #85437
Conversation
Tagging subscribers to this area: @dotnet/area-system-runtime Issue DetailsUse the same general "Algorithm 1: Generic SIMD" that we do for StringComparison.Ordinal, adapter for OrdinalIgnoreCase. [Params("watson", "elementary", "holmes", "the")]
public string Needle { get; set; }
[Benchmark]
public int Count()
{
int count = 0;
ReadOnlySpan<char> haystack = s_haystack;
while (true)
{
int pos = haystack.IndexOf(Needle, StringComparison.OrdinalIgnoreCase);
if (pos < 0) break;
count++;
haystack = haystack.Slice(pos + Needle.Length);
}
return count;
}
|
…eCase) Use the same general "Algorithm 1: Generic SIMD" that we do for StringComparison.Ordinal, adapter for OrdinalIgnoreCase.
3fb61ee
to
fe53637
Compare
// Load a vector from the current search space offset and another from the offset plus the distance between the two characters. | ||
// For each, | with 0x20 so that letters are lowercased, then & those together to get a mask. If the mask is all zeros, there | ||
// was no match. If it wasn't, we have to do more work to check for a match. | ||
Vector128<ushort> cmpCh2 = Vector128.Equals(ch2, Vector128.BitwiseOr(Vector128.LoadUnsafe(ref searchSpace, (nuint)(offset + ch1ch2Distance)), Vector128.Create((ushort)0x20))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nit: Vector128.BitwiseOr
-> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just style, right? Happy to change it, just questioning whether it's worth rerunning ci.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
definitely not worth it 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right its "just style". There is also the general considerations of "methods" vs "operators" (such as precedence and readability) but we're not super consistent today just due to the operators being relatively new.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! Assuming we're fine with the overhead for the worst case - it's slightly bigger in case of OrdinalIgnoreCase due to more work + the path to find unique chars is more expensive, but the benefits should outweight that 👍
I think it's worth it. It's more expensive to set up than ordinal, but the match validation that happens on every potential match is also more expensive, and this generally lessens the latter. It will regress in cases similar to ordinal regressed, eg where the starting character never matches, but on the balance I expect it'll be a meaningful win. Let's try and see what falls out. :-) |
Use the same general "Algorithm 1: Generic SIMD" that we do for StringComparison.Ordinal, adapted for OrdinalIgnoreCase.