Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[API] Add MemoryExtensions.Contains #25228

Closed
GrabYourPitchforks opened this issue Feb 27, 2018 · 38 comments
Closed

[API] Add MemoryExtensions.Contains #25228

GrabYourPitchforks opened this issue Feb 27, 2018 · 38 comments
Assignees
Labels
api-approved API was approved in API review, it can be implemented area-System.Memory
Milestone

Comments

@GrabYourPitchforks
Copy link
Member

Proposal:

public static class MemoryExtensions {
    bool Contains<T>(this Span<T> span, T value);
    bool Contains<T>(this ReadOnlySpan<T> span, T value);
    bool Contains<T>(this Span<T> span, T value, IEqualityComparer<T> comparer);
    bool Contains<T>(this ReadOnlySpan<T> span, T value, IEqualityComparer<T> comparer);
}

This is different from the IndexOf method in that we can perform certain optimizations if we know you don't need the exact index of where the value was found. There are a few places in the framework that would benefit from this.

https://github.com/dotnet/corefx/blob/f0a032e4a1c8a848574a5db3e69e8a6fa0aae91e/src/System.Net.Primitives/src/System/Net/IPAddressParser.cs#L19

https://github.com/dotnet/coreclr/blob/8ec9bdfd53220b555f0b19eebd6a2c4ab5dab8bf/src/mscorlib/shared/System/Guid.cs#L430

https://github.com/dotnet/corefxlab/blob/c3624ba2c5f4fc4926ef64e5e35f2f6e895df6eb/src/System.Buffers.Experimental/System/Buffers/BufferExtensions.cs#L101

https://github.com/dotnet/coreclr/blob/22f1bc00d018a49f9550ee3b564f5f7737960b0d/src/mscorlib/shared/System/Version.cs#L342

@ahsonkhan
Copy link
Member

We already have Contains for T = char (for ReadOnlySpan). Do we need a generic API? The examples you provided showcase only T = byte and char (which is already covered).

@GrabYourPitchforks
Copy link
Member Author

Yes, the intent was for this to be generic, with specializations for T = byte / char / other "integral" type when no comparer is specified.

@ygc369
Copy link

ygc369 commented Feb 28, 2018

@ahsonkhan
It would be better to have a generic API.

@GrabYourPitchforks
Copy link
Member Author

Looked through the code a bit more. The current API is Contains(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparison), i.e., substring searching. The API under proposal here is for locating individual matching elements, not substrings.

@AlexRadch
Copy link
Contributor

I do not understand the advantages of Contains over IndexOf? I do not think that Contains will be significant faster then IndexOf.

@GrabYourPitchforks
Copy link
Member Author

@AlexRadch In vectorized code paths, IndexOf needs to back up once it has found a match so that it can figure out where the match occurred. If the caller cares only that a match was found but not where the match occurred then the Contains method can skip this extra step. We'll confirm before we commit, but in other similar APIs this has offered measurable savings.

@AlexRadch
Copy link
Contributor

We have start offset and current offset how simple subtraction can draw back performance?

@vcsjones
Copy link
Member

We have start offset and current offset how simple subtraction can draw back performance?

IndexOf works by examining multiple items at the same time for certain types like byte using SIMD hardware acceleration, like a windowing function.

https://github.com/dotnet/corefx/blob/f391820c4808bcb02c7b3dd88b684f36a0a98b6a/src/Common/src/CoreLib/System/SpanHelpers.Byte.cs#L172

It examins n-number of bytes at a time, depending on the size of the Vector. Once it has found a place where vector.equals is non-zero, it has to look at each individual byte of the resulting vector to determine where in that specific chunk of data it actually is.

The last part is what should be removed with this API. If all we care about is a yes or no answer instead of where specifically, it can save examining individual elements.

@grant-d
Copy link
Contributor

grant-d commented Sep 7, 2018

I am considering working on this, but trying to understand the code. There's one section that confuses me:
https://github.com/dotnet/corefx/blob/f391820c4808bcb02c7b3dd88b684f36a0a98b6a/src/Common/src/CoreLib/System/SpanHelpers.Byte.cs#L163
I get that the JIT will shake free this unreachable code on a non-vectorized platform. But if the platform does support it, then the preceding sequential loops have already found the search term anyway. Why run the vector, is it not redundant? In other words, both the sequential and vectorized code will be reachable and (unless I misunderstand) run in series.

@benaadams
Copy link
Member

The block above the SequentialScan: label shortens nLength for the first pass; and is used to align the input for the vector loop (processing the unaligned data in the sequenial loop):
https://github.com/dotnet/corefx/blob/f391820c4808bcb02c7b3dd88b684f36a0a98b6a/src/Common/src/CoreLib/System/SpanHelpers.Byte.cs#L107-L111
Once it passes the vector loop; it then jumps back to the SequentialScan: label to finish anything that didn't fit in a Vector size

@grant-d
Copy link
Contributor

grant-d commented Sep 7, 2018

Ah. That makes sense now - thanks

@grant-d
Copy link
Contributor

grant-d commented Sep 7, 2018

@jkotas has advised that the PR (referenced above) should be submitted to the CoreCLR repo, not CoreFX.

@grant-d
Copy link
Contributor

grant-d commented Sep 7, 2018

Replacement PR 19863 created in CoreCLR.
Should this issue be moved there too?

@vcsjones
Copy link
Member

vcsjones commented Sep 7, 2018

Should this issue be moved there too?

API proposals that go through the API review process are always issues in the CoreFx repository.

@vcsjones
Copy link
Member

vcsjones commented Sep 7, 2018

I think you will still need to make a PR to CoreFx to expose the new members to the ref assembly.

https://github.com/dotnet/corefx/blob/e34fa6ac5fcc49be5cb22f46119c6d99219483b6/src/System.Memory/ref/System.Memory.cs#L10

@jkotas
Copy link
Member

jkotas commented Sep 7, 2018

Right, the implementation should be in CoreCLR. The reference assemblies and tests should be in CoreFX. More details on how this works is in https://github.com/dotnet/coreclr/blob/master/Documentation/project-docs/adding_new_public_apis.md

@grant-d
Copy link
Contributor

grant-d commented Sep 7, 2018

In term of the proposed API, I am planning on doing the following:

public static class MemoryExtensions {
    // Following 2 members in PR-1
    bool Contains<T>(this Span<T> span, T value);
    bool Contains<T>(this ReadOnlySpan<T> span, T value);

    // Following 2 members in PR-2
    bool Contains<T>(this Span<T> span, T value, IEqualityComparer<T> comparer);
    bool Contains<T>(this ReadOnlySpan<T> span, T value, IEqualityComparer<T> comparer);
}

In terms of PR-2, there do not appear to be any IndexOf overloads using IEqualityComparer<T> (though there might be for specific <T>s)
Which would make PR-2 inconsistent with the existing api surface.
Should we not create a separate issue to track overloading all said apis as such?

@benaadams
Copy link
Member

The IEqualityComparer<T> overloads can't be vectorized unfortunately

@grant-d
Copy link
Contributor

grant-d commented Sep 7, 2018

I can write the sequential code, but I don't know that it's the right course of action since IndexOf lacks a corresponding overload.

@grant-d
Copy link
Contributor

grant-d commented Sep 7, 2018

Please assign this to me. The PR is passing all checks.

@grant-d
Copy link
Contributor

grant-d commented Sep 7, 2018

Status per this recipe for adding new public apis:

@ahsonkhan
Copy link
Member

Please assign this to me. The PR is passing all checks.

@karelz, can you please help assign this issue to @grant-d? Thanks.

@karelz
Copy link
Member

karelz commented Sep 7, 2018

Collaborator invitation sent to @grant-d. Let us know once you accept.
It will auto-subscribe you to all repo notifications (500+ per day). we recommend to switch it to "Not Watching" which will send you notification just for issues/PRs which are assigned, where you are mentioned and which you subscribe to explicitly.

@grant-d
Copy link
Contributor

grant-d commented Sep 8, 2018

Thanks @karelz, I have accepted the invitation. Please assign this issue to me

@grant-d
Copy link
Contributor

grant-d commented Sep 8, 2018

Considering that the aim of this change is to improve perf of such calls, do we also need to update corresponding callsites? A quick manual scan found about 30 such calls in CoreFx, eg

https://github.com/dotnet/corefx/blob/8971f384e0c9bb967386a3a9e64b97fb886ab3d2/src/System.Net.Requests/src/System/Net/HttpWebRequest.cs#L314

https://github.com/dotnet/corefx/blob/7e0cbf1f27d8420a3f89f7a293de145f1cd26cd2/src/System.ComponentModel.TypeConverter/src/System/ComponentModel/EnumConverter.cs#L81

Is it within scope to update these to use str.Contains too?
cc @ahsonkhan

@ahsonkhan
Copy link
Member

do we also need to update corresponding callsites

Yes, I think we should.

Is it within scope to update these to use str.Contains too?

If you want to tackle that change as well, go for it :)

I wonder how much this change would impact perf on benchmarks. Do you have a rough estimate?

@grant-d
Copy link
Contributor

grant-d commented Sep 8, 2018

Ok, I will track that in a separate PR. I will also benchmark the core change, to make sure there's no regressions and get a feel for perf impact. PR linked below

@grant-d
Copy link
Contributor

grant-d commented Sep 14, 2018

I wonder how much this change would impact perf on benchmarks. Do you have a rough estimate?

New PR for units and benchmarks linked above.
I will also run a comparative benchmark to get a feel for perf delta

@grant-d
Copy link
Contributor

grant-d commented Sep 15, 2018

Perf analysis here: dotnet/corefx#32293 (comment)

@grant-d
Copy link
Contributor

grant-d commented Sep 18, 2018

All task done (see checklist above).
Please close this issue.

@ahsonkhan
Copy link
Member

Please close this issue.

We still have the following APIs left, so we should leave the issue open until they have been implemented:

    bool Contains<T>(this Span<T> span, T value, IEqualityComparer<T> comparer);
    bool Contains<T>(this ReadOnlySpan<T> span, T value, IEqualityComparer<T> comparer);

@karelz
Copy link
Member

karelz commented Sep 18, 2018

@ahsonkhan just curious: Why did we take PR which implements only 2 APIs out of 4? Was there a good reason to stage it?

@grant-d
Copy link
Contributor

grant-d commented Sep 18, 2018

I couldn't get any feedback on the last 2 api's - see https://github.com/dotnet/corefx/issues/27526#issuecomment-419511258.
I am happy to do a separate PR to close that once it's decided.

@jkotas
Copy link
Member

jkotas commented Sep 18, 2018

Why did we take PR which implements only 2 APIs out of 4?

The other 2 APIs are inconsistent with the rest of System.Memory public surface (none of the several hundreds other System.Memory APIs take IEqualityComparer). I assume that this was oversight during API review.

We should have a new issue opened to discuss the IEqualityComparer<T> overloads. We should either add the overloads that take IEqualityComparer<T> everywhere in System.Memory (ie add ~30 new APIs), or nowhere. It does not make sense to add them to a random subset.

@karelz
Copy link
Member

karelz commented Sep 18, 2018

Thanks @jkotas that makes sense :) ... oversights do happen. Let's separate them in that case. @ahsonkhan can you please bring the IEqualityComparer question up in API review for entire System.Memory? (separate issue would be IMO best)

@ahsonkhan
Copy link
Member

@GrabYourPitchforks, did you have a particular scenario or need that motivated adding the Contains overloads with IEqualityComparer<T>?

Let's separate them in that case. @ahsonkhan can you please bring the IEqualityComparer question up in API review for entire System.Memory? (separate issue would be IMO best)

Sure, I will create a new issue.

@stephentoub
Copy link
Member

@ahsonkhan, did you open an issue? I didn't see one so created https://github.com/dotnet/corefx/issues/35962. Closing this one as completed.

@msftgits msftgits transferred this issue from dotnet/corefx Jan 31, 2020
@msftgits msftgits added this to the 3.0 milestone Jan 31, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 18, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-approved API was approved in API review, it can be implemented area-System.Memory
Projects
None yet
Development

No branches or pull requests