-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add IndexOfAnyValues #78093
Add IndexOfAnyValues #78093
Conversation
Note regarding the This serves as a reminder for when your PR is modifying a ref *.cs file and adding/modifying public APIs, to please make sure the API implementation in the src *.cs file is documented with triple slash comments, so the PR reviewers can sign off that change. |
ab527b0
to
79ac614
Compare
cc: @gfoidl |
src/libraries/System.Private.CoreLib/src/System.Private.CoreLib.Shared.projitems
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/IndexOfAnyValues/BitVector256.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/IndexOfAnyValues/BitVector256.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/IndexOfAnyValues/BitVector256.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/IndexOfAnyValues/IndexOfAnyValues.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/IndexOfAnyValues/IndexOfAnyValues.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/IndexOfAnyValues/IndexOfAnyValues.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/IndexOfAnyValues/IndexOfAnyValues.cs
Outdated
Show resolved
Hide resolved
Tagging subscribers to this area: @dotnet/area-system-memory Issue DetailsImplements #68328 I also added vectorized paths for The currently employed categories of specializations for
ToDo: benchmarks
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I've seen so far looks good 👍🏻 (need to have another look in the next days)
src/libraries/System.Private.CoreLib/src/System/IndexOfAnyValues/IndexOfAnyAsciiByteValues.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/IndexOfAnyValues/IndexOfAnyAsciiSearcher.cs
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/IndexOfAnyValues/IndexOfAnyValues.cs
Outdated
Show resolved
Hide resolved
Thanks for the reviews, they're much appreciated. I'll get back to this PR in about a week. Hopefully, #78015 is resolved by then. |
571a279
to
55817fe
Compare
I've updated the PR and moved it out of draft. I removed the |
Had another look, no comments added, so LGTM. |
src/libraries/System.Private.CoreLib/src/System/IndexOfAnyValues/BitVector256.cs
Outdated
Show resolved
Hide resolved
} | ||
|
||
return new IndexOfAnyCharValuesProbabilistic(values); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still a little concerned about the size impact this method is going to have. Any use of IndexOfAnyValues.Create is going to root all possible implementations (IndexOfEmptyValues, IndexOfAny1Value, IndexOfAny2Values, IndexOfAny3Values, IndexOfAny4Values, IndexOfAny5Values, IndexOfAnyLatin1CharValues, IndexOfAnyCharValuesProbabilistic, and IndexOfAnyValuesInRange), regardless of which is actually used. Basically we're adding a choke point.
I don't have a better answer, unless we want to limit what this API produces, such that we don't special-case APIs that already have a direct public entrypoint (IndexOf0/1/2/3, IndexOfRange).
Let's go with it for now, but keep an eye on it. It'd also be helpful in this regard for us to either in this PR or immediately after follow it up with using these APIs everywhere they're applicable and see what kind of impact it actually has on our size benchmarks, and then what we can do about it. For example, maybe there are ways to share most of the code associated with the vectorization of the individual algorithms, such that each doesn't bring in nearly as much as it does today (Adam and I had spoken about an approach where we'd parameterize the algorithms with a generic struct that provided the setup and comparisons as methods that the driver could then call appropriately as part of loops, unrolled call sites, etc.) For use within corelib, we might also want to make some of these helpers internal and allow those uses to bypass the public Create in order to directly instantiate the needed type. If things still end up being bad, we could consider replacing the general Create with more specialized ones focused on known characteristics of the data. Worst case, we could also employ some additional linker switches to remove various code paths if we want to trade off optimal speed for size.
All that said, I still really like the simplicity of the design we currently have, and that it affords us the ability to pick the best implementation given the supplied data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still a little concerned about the size impact this method is going to have
This PR adds 50kB to System.Private.CoreLib.dll with R2R. It is not end of the world given how much we add to the product in each release.
we'd parameterize the algorithms with a generic struct
Note that this only helps with IL size. It does not help with native code size. Also, the generic structs cost some startup time and memory at runtime.
We may want to look into whether the factory can be interpreted at compile time by the native aot compiler. cc @MichalStrehovsky
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's go with it for now, but keep an eye on it. It'd also be helpful in this regard for us to either in this PR or immediately after follow it up with using these APIs everywhere they're applicable and see what kind of impact it actually has on our size benchmarks
👍 already working on that.
FWIW, in quite a few places so far I've been able to delete substantial chunks of code with this API, so it'd be pretty cool if this was actually a net size improvement in the long run.
Over a bunch of places across runtime where I've used this API locally, I've never used it for the "(IndexOf0/1/2/3, IndexOfRange)" set you called out. Likewise, for cases with 4/5 values, I've never used it if I knew that it would just defer to the existing 4/5 value implementation that could be reached via IndexOfAny(const)
.
Ignoring cases where the underlying hardware doesn't support these algorithms, we're really just using Ascii in the vast majority of cases from within runtime. Regex can also make similar decisions and avoid using this API for cases that wouldn't benefit.
That said, it's still nice that this API gives you an optimal implementation even if you aren't as concerned about startup/size costs and just want to use the same API for everything without requiring you to be aware of internal implementation details.
Adam and I had spoken about an approach where we'd parameterize the algorithms with a generic struct that provided the setup and comparisons as methods that the driver could then call appropriately as part of loops, unrolled call sites, etc.
I'm curious about what this would look like.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR adds 50kB to System.Private.CoreLib.dll with R2R. It is not end of the world given how much we add to the product in each release.
Yeah, my concern isn't the all-up size of corelib, rather the impact on a small trimmed app. But your lack of concern lowers my concern :)
We may want to look into whether the factory can be interpreted at compile time by the native aot compiler
👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so it'd be pretty cool if this was actually a net size improvement.
It would be.
I've never used it for the "(IndexOf0/1/2/3, IndexOfRange)" set you called out
Right. Which begs the question of whether it's actually worth including those, since whether we use them or not, at present all that code is going to be kept with the current trimming.
Regex can also make similar decisions and avoid using this API for cases that wouldn't benefit.
Yes, my intention is that the source generator and compiler only use this for cases where there isn't currently a better direct API for it. In large part that's to help with readability, but it'll also help reduce startup overheads.
I'm curious about what this would look like.
The rough strawman, which we never actually tried, was something along the lines of (very rough pseudo code)
internal interface IIndexOfComparer<T>
{
bool IsMatch(T item);
bool IsMatch(Vector128<T> items);
bool IsMatch(Vector256<T> items);
}
then with a shared driver like:
internal int IndexOfCore<T, TCore>(ReadOnlySpan<T> span, TComparer comparer) where TComparer : IIndexOfComparer<T>
{
if (!Vector128.IsHardwareAccelerated || span.Length < Vector128<T>.Count)
{
// TODO: Unroll for short spans
for (int i = 0; i < span.Length; i++)
if (comparer.IsMatch(span[i]))
return i;
}
else if (!Vector256.IsHardwareAccelerated || span.Length < Vector256<T>.Count)
{
...
if (comparer.IsMatch(currentVector))
return FindIndex(currentVector);
...
}
else
{
... // same for Vector256
}
return -1;
}
and then use like:
public static bool IndexOfAny(ReadOnlySpan<char> span, char value0, char value1) =>
IndexOfCore(span, new IndexOfAny2(value0, value1));
private readonly struct IndexOfAny2<T> : IIndexOfComparer<T>
{
private T _value1, value2;
private readonly Vector128<T> _vector128_1, _vector128_2;
private readonly Vector256<T> _vector256_1, _vector256_2;
public IndexOfAny2(T value1, T value2)
{
_value1 = value1;
_value2 = value2;
_vector128_1 = Vector128.Create(value1);
_vector128_2 = Vector128.Create(value2);
...
}
public bool IsMatch(T value) => value == _value1 || value == _value2;
public bool IsMatch(Vector128 values) => values == _vector128_1 || values == _vector128_2;
...
}
etc. It'd end up looking similar in structure to what you currently have in this PR, actually, just at a lower level.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may want to look into whether the factory can be interpreted at compile time by the native aot compiler. cc @MichalStrehovsky
You mean to interpret the Create
method this comment is on ahead of time depending on the callsite?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IndexOfAnyValues.Create
is expected to be often used in static constructors with constant input. For example, like this: https://github.com/dotnet/runtime/pull/78666/files#diff-2599fdb4dd17bc235b019eb03aed3a26260765050d1e48419bc3e44319ecb147R31-R32
Awesome. |
Closes #68328
Depends on all
IndexOfAny
implementations for byte/char of 1-5 values being available, therefore depends on #78015 to make that a reality for Mono.I also added vectorized paths for
IndexOfAnyExcept
,LastIndexOfAny
, andLastIndexOfAnyExcept
for 5 values toSpanHelpers
and started using them from existingIndexOfAny(Span, Span)
APIs for byte and char.The changes in
SpanHelpers
are effectively a copy-paste from existing overloads.The currently employed categories of specializations for
IndexOfAnyValues
are:[0, 127]
(ASCII) range[0, 255]
(Latin1) range for chars (not vectorized, but using efficient checks)ProbabilisticMap
Efficiency assumptions for different search primitives are based on numbers collected at 0f3a88b on an Intel i9 10900X CPU:
IndexOf, IndexOfAny, IndexOfAnyInRange and IndexOfAnyAscii numbers
Benchmarks
Benchmark source
ASCII
This case is already vectorized for chars (#76740), but now the init cost is removed as well.
For bytes, it's the difference between an
O(n * m)
loop and a vectorizedO(n)
.1-5 values
1-5 values are pretty much the same as calling
IndexOfAny("aeiou")
if the value is constant at the call site.If the value is not constant,
IndexOfAnyValues
is about 1 ns cheaper as we don't have to go through the length check switch.Single range
The single range case is pretty much identical to calling
IndexOfAnyInRange
directly.[0, 255] values
This test is using
"abcÀÄÈÌÐÔØÜàäèì"
as the value (Latin1 for chars).For chars, this is the difference between
ProbabilisticMap
andBitVector256
.For bytes, this is the difference between a simple
O(n * m)
loop and a vectorizedO(n)
.ProbabilisticMap for chars
Both will use a simple loop for short values and a
ProbabilisticMap
for longer ones.IndexOfAnyValues
can avoid the init overhead for constructing theProbabilisticMap
.Short values list (
"ažćčš" + "\u1000"
).Long values list (
"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ" + "\u1000"
).HttpRuleParser.IsToken
This is an example of an implementation we will replace with
IndexOfAnyValues
.