Faster equality in generic contexts #16615
Conversation
❗ Release notes required
|
Benchmark results are weird, numbers before and after are within the statistical error. What about allocations? We shouldn't be boxing as much now? |
@vzarytovskii yeah those are not the right benchmarks for this PR. We had a session with Don today to figure out the right ones and I am already getting some 25-30% improvements there, will post soon - stay tuned. |
/azp run |
Azure Pipelines successfully started running 2 pipeline(s). |
Well... It'll be calling IEquatable.Equals<>, not object.Equals, which is hence a breaking change - albeit I feel in a fashion that should be part of an evolution... ...and really should be paired with the optimizer change (@TIHan I believe worked on one at some stage?) Otherwise you have the somewhat bizarre situation where using a comparison of an external struct type such as a NodaTime.Instant in a generic context doesn't box and is fast, vs in a non-generic context where it does box and is slower... Unless that has been resolved already? Anyway, glad to see this moving forward. My beard is somewhat grayer now than when I first started trying to improve comparison/equality in F#!! |
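A minimal sketch of the contrast described above, using a hypothetical external struct in place of NodaTime.Instant (all names here are illustrative, not from the thread):

```fsharp
open System

// Hypothetical external struct standing in for something like NodaTime.Instant.
[<Struct; CustomEquality; NoComparison>]
type ExternalInstant =
    val Ticks: int64
    new (t) = { Ticks = t }
    override this.Equals(o: obj) =
        match o with
        | :? ExternalInstant as other -> this.Ticks = other.Ticks
        | _ -> false
    override this.GetHashCode() = int this.Ticks
    interface IEquatable<ExternalInstant> with
        member this.Equals other = this.Ticks = other.Ticks

// Generic (non-inlined) context: with this change, (=) can go through
// IEquatable<_>.Equals, avoiding the box.
let eqGeneric (a: 'T) (b: 'T) = a = b

// Non-generic context: without the paired optimizer change, (=) may still go
// through the obj-taking Equals override and box.
let eqDirect (a: ExternalInstant) (b: ExternalInstant) = a = b
```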
/azp run |
Azure Pipelines successfully started running 2 pipeline(s). |
Marking as ready to review since the CI is green finally :D |
And glad to see you here, @manofstick :) BTW thanks for the clear commits and the comments on them in your original PRs, that really helped me a lot. As for this one, well, it's not breaking in terms of NaN behavior and in terms of IL. So far. That's probably what we care about the most. |
Since it touches an important part of the language, I would like to have three things:
|
Thanks all for the reviews and the feedback. This needs a thorough rereview with @dsyme, meanwhile I will be improving the description, clarifying the code and adding tests. |
Agreed that in principle these should go together, though it can be done in a separate PR.
I think this specific change is OK - use IEquatable.Equals if it exists. |
@manofstick Could you remind me of the specific thing that causes the call to …? I think that is OK as a change. It is vanishingly rare to explicitly implement … |
Also, does this need an addition to the spec (RFC), since it changes how we do equality? Or rather, should the doc from the other PR live in the design repo? |
@vzarytovskii I think it's a good idea to have that doc (from the other PR) in the design repo. This PR is meant to be an optimization and hence focuses on the implementation. I am looking at the spec while working on this, and my impression so far is that the spec doesn't go deep enough into implementation details for this change to misalign with it in any way. If we discover anything of that nature, we can make the RFC or discuss it further. |
I added benchmarks for the value tuples, options and that stuff - the results are very convincing so far. |
Yep, simple as that.
Oh, completely agree. It was just that this was (as far as I understand it) the main reason why this change never went through in the past. And just confirming there, given your comment, this change, as implemented, would be all types (that pass the check) using |
@manofstick just to be sure we're on the same page - which input cast do you mean? :) |
...forgive my GitHub comment coding - I think my memory serves, but it's more than enough for the gist of things even if it's incorrect...

```fsharp
[<Sealed>] // only for sealed
type Blah(...) =
    ...
    override lhs.Equals (rhsObj: obj) = // (1)
        if obj.ReferenceEquals (lhs, rhsObj) then
            true
        else
            match rhsObj with
            | :? Blah as rhs -> // (2)
                (lhs :> IEquatable<Blah>).Equals rhs // (4)
            | _ -> false
    interface IEquatable<Blah> with
        member lhs.Equals rhs = // (3)
            // actually do the check
            ...
```

Well, most likely the actual equality check in these cases will be the significant component of the cost, so the savings are minimal, but as mentioned, this is just for a complete understanding of what's going on here...
|
@psfinaki I believe it should be possible to write a test case that detects the change, e.g.

```fsharp
open System

[<Sealed, Struct>]
type Blah() =
    override lhs.Equals (rhsObj: obj) = true
    interface IEquatable<Blah> with
        member lhs.Equals rhs = failwith "bang"

let eq x y =
    printfn "hello" // do this enough times to make sure no inlining
    printfn "hello"
    printfn "hello"
    printfn "hello"
    (x = y) // generic equality context

eq (Blah()) (Blah()) // true prior to this, exception now
```

I don't really think this needs a runtime library switch to disable, and we don't currently have such a mechanism to do those kinds of flags anyway. The equality spec should probably be updated to mention this case |
Yes, correct. For reference, I guess this is the minimal code to demo the behavior difference:

```fsharp
open System

[<Struct; CustomEquality; NoComparison>]
type Blah =
    interface IEquatable<Blah> with
        member lhs.Equals rhs = failwith "bang"

let eq x y =
    printfn "hello" // do this enough times to make sure no inlining
    ...
    x = y // generic equality context

let result = eq (Blah()) (Blah()) // false prior to this, exception now
```

Alright, so we've identified what breaks here and blessed it at the same time. Indeed, this is quite an esoteric scenario. Good stuff, I will update the spec and the description to reflect this. |
Non-inlined stack trace examples before and after could be helpful for reviewers as well - could you add them please, if it's not too much work? |
Yep I am going to add it to the PR description. |
Okay, so hopefully getting to the finish line here - I think I have added enough micro benchmarks and grouped them in the PR description. Next week I will add AOT testing, refresh the equality design doc, finalize the PR description and give this a final review with Don :) |
Looking good!
/azp run |
Azure Pipelines successfully started running 2 pipeline(s). |
Nice |
I'm just relying on your words here, as I have not built this, but they should be significantly faster, and non-boxing (I'm not talking about my concern re tail calls here). I'm guessing that because they implement 'IStructuralEquality' they are using that path, and hence not being sped up. I thought in my original PR I handled these too - maybe I didn't, or maybe you didn't carry that code across, I don't know. Anyway, what I would suggest is that the code for 'canUseDefault...' is used at compile time, and if default can be used then the 'IStructuralEquals' interface isn't implemented. This, once again, is a breaking change, but I think it's even smaller than the one introduced by this PR. Another follow-up, if it hasn't been started yet, would be to use Comparer<>.Default for inequalities.... |
Yes, I think we will get to those as well. Next bigger one will be devirt equality, then we'll probably look into comparison. There is a lot of useful stuff to pull out of your contributions :) |
What is this?
This is the first part of this effort to resurrect the awesome work on compiler performance by @manofstick.
How is this different?
Many things have changed, among other things we now have a bigger team & community to review things and a different release cadence allowing us to dogfood things a bit.
More importantly, the original PRs often contained multiple ideas (fixes, breaking, optimizations, experiments, ...), whereas I want to narrow PRs as much as possible to make things easier to review and control.
Is this PR breaking?
A tiny bit. This is mostly an optimization.
Here is an example of code where the behavior does change:
We agreed that it's OK here.
What's the source of inspiration here?
This is essentially part of #5112, with the following changes:
So where does this improve things?
More theory and motivation is in this document - the link will be updated once the PR is merged.
TL;DR: this improves things when HashIdentity.Structural<'T> comparison is used in non-inlined code.
Example optimization
Let's look at the following code:
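The snippet itself isn't reproduced above; as a hypothetical stand-in consistent with the walkthrough below (6 char-tuple elements, 3 of them unique, the first being ('D', 'G')), something like:

```fsharp
// Hypothetical example data: 6 elements, only 3 distinct values.
let items = [ ('D', 'G'); ('A', 'B'); ('D', 'G'); ('C', 'E'); ('A', 'B'); ('D', 'G') ]

let unique = items |> List.distinct   // [('D', 'G'); ('A', 'B'); ('C', 'E')]
```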
How does List.distinct work?
There are 3 important parts of the algorithm to talk about.
The comparer is something implementing IEqualityComparer<'T>, which means 2 methods: GetHashCode(x) and Equals(x, y). List.distinct list calls List.distinctWithComparer HashIdentity.Structural<'T>, and the logic for picking HashIdentity.Structural<'T> (the comparer) is written in prim-types.fs.
This means calling let hashSet = HashSet<'T>(comparer) with the comparer picked above. So in our case,
Now, the Add(element) operation works in the following way:
This means, in our case there will be:
6 GetHashCode calls (since there are 6 elements altogether)
3 Equals calls (since there are only 3 unique elements)
Now, what changes in this PR is that we become smarter at picking the faster comparer (step 1). This brings an enormous benefit when doing all the things in step 3.
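A rough sketch of how these parts fit together - not the actual FSharp.Core source, just an assumed shape of a distinctWithComparer-style function:

```fsharp
open System.Collections.Generic

// Part 1: a comparer implementing IEqualityComparer<'T> is picked
//         (HashIdentity.Structural<'T>, per prim-types.fs).
// Part 2: a HashSet<'T> is built over that comparer.
// Part 3: Add(element) calls comparer.GetHashCode, and comparer.Equals on hash matches.
let distinctSketch (comparer: IEqualityComparer<'T>) (list: 'T list) =
    let hashSet = HashSet<'T>(comparer)
    list |> List.filter (fun element -> hashSet.Add element)
```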
Before, this is how the comparer picking would be executed:
Now, this is what is going on:
Hence, this is the difference for each of the 6 GetHashCode calls in question (taking the first element ('D', 'G') as an example).
Before, using the generic equality comparer:
Now, using the "native" comparer:
Now, this is the difference for each of the 3 Equals calls in question (taking the elements ('D', 'G') as an example).
Before, using the generic equality comparer:
Now, using the "native" comparer:
We can see that here we remove all the boxing and use the optimal call chain. This brings huge benefits for the tiny cost of executing the "smart" comparer-picking function once.
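The exact call chains aren't reproduced above; as an illustration of the general mechanism only (an assumption about the shape of the change, not the actual prim-types.fs code), a "native" comparer in the EqualityComparer<'T>.Default style dispatches to IEquatable<'T>.Equals without boxing:

```fsharp
open System.Collections.Generic

// "Native" comparer idea: for a 'T implementing IEquatable<'T>,
// EqualityComparer<'T>.Default calls Equals/GetHashCode directly on 'T,
// with no boxing to obj (illustrative of the idea, not the FSharp.Core code).
let nativeComparer<'T> () : IEqualityComparer<'T> =
    EqualityComparer<'T>.Default :> IEqualityComparer<'T>
```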
The benefits of this approach vary based on the concrete algorithm and the elements in question. See the details about improved call chains in the spec mentioned above.
If we modify the example to have 1000 elements with 10 unique ones, we get the following results:
Before:
After:
Which means 6x faster and 213x less memory.
Benchmarks
Main targets: structs, enums, floats, and special generic types:
Structs and enums
Before:
After:
Huge improvements in both execution time and allocs, with especially remarkable results for native F# constructs.
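The benchmark tables aren't reproduced here; for context, a hypothetical shape of such a micro-benchmark (type names and sizes are assumptions, not taken from the PR):

```fsharp
open BenchmarkDotNet.Attributes

// A small struct whose equality goes through the generic (non-inlined) path
// inside Array.distinct / Array.countBy.
[<Struct>]
type SmallStruct = { A: int; B: int }

type StructEqualityBenchmarks() =
    let data = Array.init 1000 (fun i -> { A = i % 10; B = i % 10 })

    [<Benchmark>]
    member _.DistinctStructs() = data |> Array.distinct

    [<Benchmark>]
    member _.CountByStructs() = data |> Array.countBy id
```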
Value tuples
Before:
After:
~80% speed and ~50% memory reduction, with a much steeper increase in the ratios for both.
Options and co
Before:
After:
50-75% speed and 25-75% memory improvements.
Nullable<'T>
Before:
After:
About 7x speed and 3x memory improvements.
Floats
Before:
After:
PER comparison still takes more time and memory, but the improvements are 2-3x in all cases.
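For readers unfamiliar with PER semantics, the float special-casing boils down to NaN handling, which this change preserves (see the NaN note in the conversation above):

```fsharp
// PER semantics: under F#'s (=), NaN is never equal to itself,
// while System.Double.Equals (ER semantics) reports NaN as equal to NaN.
printfn "%b" (nan = nan)       // false (PER)
printfn "%b" (nan.Equals nan)  // true  (ER, Double.Equals overload)
```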
Also (positively) affected: basic types, arrays, reference types - due to shorter call chains and less casting:
Arrays
Before:
After:
About 3x faster.
F# basic types
Before (countBy):
After (countBy):
Before (distinct):
After:
Note that decimals also show improvement in memory allocations:
Before (countBy)
After (countBy)
Before (distinct)
After (distinct)
These mostly stay the same (as expected), apart from decimal, which shows ~50% speed and ~25% alloc improvements.
Records
Before:
After:
Which is about 10% improvement.
Generic unions
Before:
After:
About 60% and 30% improvements in speed and allocs.
Reference tuples
Before:
After:
~5-15% faster execution.
Some (positive) implications for FSharp.Core.
F# core functions in question
Before:
After:
The improvement varies between 20% and 40% in speed here.
Other considerations.
>64 bit value types
Before:
After:
This became marginally faster - but more importantly, it's here to address earlier concerns about the 64-bit JIT. The underlying JIT problem likely got fixed in the meantime.
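As an illustration of the kind of type meant by ">64 bit value types" (an assumed example, not from the PR):

```fsharp
// A value type wider than 64 bits (larger than a machine word), illustrative only.
[<Struct>]
type Big = { X: int64; Y: int64; Z: int64 }
```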
TODO
Followups