Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Type test List<T> in Enumerable.SequenceEquals and forward to MemoryExtensions.SequenceEqual(span...) #97004

Merged
merged 2 commits into from
Jan 18, 2024

Conversation

neon-sunset
Copy link
Contributor

@neon-sunset neon-sunset commented Jan 15, 2024

Type test List<T> in Enumerable.SequenceEqual and forward to MemoryExtensions.SequenceEqual(span...), move the loops to local functions to allow the method to get inlined and type tests get optimized away.

Seems like a simple performance win that was accidentally missed when CollectionsMarshal.AsSpan was added.

Method Length Mean Error StdDev Ratio RatioSD
CompareLists 0 8.911 ns 0.0204 ns 0.0181 ns 1.00 0.00
CompareListsNew 0 9.860 ns 0.0375 ns 0.0351 ns 1.11 0.00
CompareSequences 0 13.220 ns 0.2870 ns 0.5862 ns 1.46 0.05
CompareSequencesNew 0 14.935 ns 0.0784 ns 0.0733 ns 1.68 0.01
CompareLists 10 17.175 ns 0.0675 ns 0.0632 ns 1.00 0.00
CompareListsNew 10 10.120 ns 0.0416 ns 0.0389 ns 0.59 0.00
CompareSequences 10 22.020 ns 0.4596 ns 0.7679 ns 1.27 0.05
CompareSequencesNew 10 23.233 ns 0.0630 ns 0.0589 ns 1.35 0.01
CompareLists 100 97.854 ns 0.2919 ns 0.2588 ns 1.00 0.00
CompareListsNew 100 13.812 ns 0.0460 ns 0.0430 ns 0.14 0.00
CompareSequences 100 104.157 ns 1.2829 ns 1.2000 ns 1.07 0.01
CompareSequencesNew 100 104.935 ns 0.2738 ns 0.2561 ns 1.07 0.00
CompareLists 1000 1,068.779 ns 4.2436 ns 3.7618 ns 1.00 0.00
CompareListsNew 1000 61.721 ns 0.1062 ns 0.0993 ns 0.06 0.00
CompareSequences 1000 860.058 ns 3.0423 ns 2.8458 ns 0.80 0.00
CompareSequencesNew 1000 869.841 ns 2.3611 ns 2.0930 ns 0.81 0.00

(list here is List<int>, sequence here is Enumerable.Range)

Related: #97000 once/if this is fixed, list1.SequenceEqual(list2) will be able to get optimized down to direct call on their spans.

@ghost ghost added the community-contribution Indicates that the PR has been added by a community member label Jan 15, 2024
@ghost
Copy link

ghost commented Jan 15, 2024

Tagging subscribers to this area: @dotnet/area-system-linq
See info in area-owners.md if you want to be subscribed.

Issue Details

Type test List<T> in Enumerable.SequenceEqual and forward to MemoryExtensions.SequenceEqual(span...), move the loops to local functions to allow the method to get inlined and type tests get optimized away.

Seems like a simple performance win that was accidentally missed when CollectionsMarshal.AsSpan was added.

Method Length Mean Error StdDev Ratio RatioSD
CompareLists 0 8.911 ns 0.0204 ns 0.0181 ns 1.00 0.00
CompareListsNew 0 9.860 ns 0.0375 ns 0.0351 ns 1.11 0.00
CompareSequences 0 13.220 ns 0.2870 ns 0.5862 ns 1.46 0.05
CompareSequencesNew 0 14.935 ns 0.0784 ns 0.0733 ns 1.68 0.01
CompareLists 10 17.175 ns 0.0675 ns 0.0632 ns 1.00 0.00
CompareListsNew 10 10.120 ns 0.0416 ns 0.0389 ns 0.59 0.00
CompareSequences 10 22.020 ns 0.4596 ns 0.7679 ns 1.27 0.05
CompareSequencesNew 10 23.233 ns 0.0630 ns 0.0589 ns 1.35 0.01
CompareLists 100 97.854 ns 0.2919 ns 0.2588 ns 1.00 0.00
CompareListsNew 100 13.812 ns 0.0460 ns 0.0430 ns 0.14 0.00
CompareSequences 100 104.157 ns 1.2829 ns 1.2000 ns 1.07 0.01
CompareSequencesNew 100 104.935 ns 0.2738 ns 0.2561 ns 1.07 0.00
CompareLists 1000 1,068.779 ns 4.2436 ns 3.7618 ns 1.00 0.00
CompareListsNew 1000 61.721 ns 0.1062 ns 0.0993 ns 0.06 0.00
CompareSequences 1000 860.058 ns 3.0423 ns 2.8458 ns 0.80 0.00
CompareSequencesNew 1000 869.841 ns 2.3611 ns 2.0930 ns 0.81 0.00

(sequence here is Enumerable.Range)

Related: #97000 once/if this is fixed, list1.SequenceEqual(list2) will be able to get optimized down to direct call on their spans.

Author: neon-sunset
Assignees: -
Labels:

area-System.Linq, community-contribution

Milestone: -

@stephentoub
Copy link
Member

Seems like a simple performance win that was accidentally missed when CollectionsMarshal.AsSpan was added.

It's a win when the test succeeds. It's pure overhead when it doesn't. If your scenario involves many calls to short non-lists, it's a net negative, which is why it wasn't previously done. We've been slowly measuring and opting in more cases using TryGetSpan.

@EgorBo
Copy link
Member

EgorBo commented Jan 15, 2024

Seems like a simple performance win that was accidentally missed when CollectionsMarshal.AsSpan was added.

It's a win when the test succeeds. It's pure overhead when it doesn't. If your scenario involves many calls to short non-lists, it's a net negative, which is why it wasn't previously done. We've been slowly measuring and opting in more cases using TryGetSpan.

My 5 cents: there were multiple improvements for casts over last few years so maybe we can reconsider some of the decisions against adding more fast-paths in LINQ (including ReadOnlyCollection) E.g. recent "profiled casts" improved quite a few LINQ benchmarks.

@stephentoub
Copy link
Member

there were multiple improvements for casts over last few years so maybe we can reconsider some of the decisions against adding more fast-paths in LINQ

That's why we've been more lenient over the last few releases.

recent "profiled casts" improved quite a few LINQ benchmarks.

I'm still a little skeptical of this one. It's great for microbenchmarks in LINQ, but these APIs ends up being used from many different call sites with many different inputs. I suspect the wins that show up in microbenchmarks won't have nearly as positive impact on many uses. I'm glad it's there, and it'll certainly help some cases, but we can't use it as a crutch, and could easily be mislead if we're not careful, I think.

@EgorBo
Copy link
Member

EgorBo commented Jan 16, 2024

I'm still a little skeptical of this one. It's great for microbenchmarks in LINQ, but these APIs ends up being used from many different call sites with many different inputs. I suspect the wins that show up in microbenchmarks won't have nearly as positive impact on many uses.

You can say the same about other things optimized with PGO in general. Polymorphic casts (where different callers use different data) are not expected to be optimized with PGO as we try to optimize only monomorphic cases. Sure, it doesn't work well if an app starts to behave differently after certain point, but we don't have a better option besides recommending turning PGO off completely for such apps. Eventually, we'll improve this by enabling context-sensitive instrumentation, partial inlining, instrumentation for inlinees, etc.

@stephentoub
Copy link
Member

You can say the same about other things optimized with PGO in general.

I do :-) It's just this particular optimization applies particularly well to LINQ microbenchmarks, as you've called out.

Copy link
Member

@eiriktsarpalis eiriktsarpalis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't object to this being included, however I don't think it's particularly impactful given that it only additionally targets List<T>. Are there any other collections we could extract the span from? I can only think of ImmutableArray<T>.

@neon-sunset
Copy link
Contributor Author

Since #97005 deemed to be unprofitable, I have reverted the rest of the change and only kept seq is T[] -> seq.TryGetSpan.

Numbers

BenchmarkDotNet v0.13.12, Windows 11 (10.0.22631.3007/23H2/2023Update/SunValley3)
AMD Ryzen 7 5800X, 1 CPU, 16 logical and 8 physical cores
.NET SDK 8.0.101
  [Host]     : .NET 8.0.1 (8.0.123.58001), X64 RyuJIT AVX2
  DefaultJob : .NET 8.0.1 (8.0.123.58001), X64 RyuJIT AVX2
Method Length Mean Error StdDev Ratio RatioSD Code Size
CompareLists 0 8.447 ns 0.0409 ns 0.0342 ns 1.00 0.00 1,272 B
CompareListsNew 0 5.535 ns 0.0194 ns 0.0172 ns 0.66 0.00 1,676 B
CompareSequences 0 13.437 ns 0.2927 ns 0.6548 ns 1.62 0.07 1,272 B
CompareSequencesNew 0 11.315 ns 0.0310 ns 0.0290 ns 1.34 0.01 1,355 B
CompareLists 10 16.974 ns 0.0190 ns 0.0178 ns 1.00 0.00 1,545 B
CompareListsNew 10 5.979 ns 0.0101 ns 0.0090 ns 0.35 0.00 1,682 B
CompareSequences 10 21.504 ns 0.4550 ns 0.8546 ns 1.26 0.06 1,507 B
CompareSequencesNew 10 17.947 ns 0.1111 ns 0.0985 ns 1.06 0.01 1,720 B
CompareLists 100 96.927 ns 0.1403 ns 0.1312 ns 1.00 0.00 1,545 B
CompareListsNew 100 9.580 ns 0.0187 ns 0.0175 ns 0.10 0.00 1,682 B
CompareSequences 100 100.535 ns 1.4505 ns 1.3568 ns 1.04 0.01 1,507 B
CompareSequencesNew 100 96.647 ns 0.0974 ns 0.0911 ns 1.00 0.00 1,720 B
CompareLists 1000 1,060.270 ns 2.2337 ns 2.0894 ns 1.00 0.00 1,556 B
CompareListsNew 1000 59.631 ns 0.0964 ns 0.0855 ns 0.06 0.00 1,682 B
CompareSequences 1000 851.900 ns 1.0912 ns 0.9673 ns 0.80 0.00 1,519 B
CompareSequencesNew 1000 851.612 ns 0.7819 ns 0.6931 ns 0.80 0.00 1,732 B

(the non-ICollection<T> path is now consistently faster as well which does not make much sense as nothing really stands out in the codegen aside from branch ordering and larger stack frame for spans but I'll take it)

As for extending TryGetSpan with other types - I like ReadOnlyCollections and ImmutableArray<T> a lot but that's for @stephentoub to decide :)

Thanks.

Copy link
Member

@eiriktsarpalis eiriktsarpalis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks

@stephentoub stephentoub merged commit 957ab2f into dotnet:main Jan 18, 2024
106 of 111 checks passed
@neon-sunset neon-sunset deleted the sequenceequal-list branch January 18, 2024 03:40
tmds pushed a commit to tmds/runtime that referenced this pull request Jan 23, 2024
@github-actions github-actions bot locked and limited conversation to collaborators Feb 17, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-System.Linq community-contribution Indicates that the PR has been added by a community member
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants