
Faster equality in generic contexts #16615

Merged · 2 commits · Mar 4, 2024

Conversation

@psfinaki (Member) commented Jan 30, 2024

What is this?

This is the first part of this effort to resurrect the awesome work on compiler performance by @manofstick.

How is this different?

Many things have changed; among other things, we now have a bigger team and community to review changes, and a different release cadence that allows us to dogfood things a bit.

More importantly, the original PRs often contained multiple ideas (fixes, breaking changes, optimizations, experiments, ...), whereas I want to keep PRs as narrow as possible to make things easier to review and control.

Is this PR breaking?

A tiny bit. This is mostly an optimization.

Here is an example of code where the behavior does change:

[<Struct; CustomEquality; NoComparison>]
type Blah =
    interface IEquatable<Blah> with
        member lhs.Equals rhs = failwith "bang"

let eq x y = 
    printfn "hello" // long code ensuring no inlining
    ...
    x = y // generic equality context

let result = eq (Blah()) (Blah())  // currently false, will be exception

We agreed that it's OK here.

What's the source of inspiration here?

This is essentially part of #5112, with the following changes:

  • The optimizer is not touched (there wasn't consensus on that)
  • Tail calls are not touched (same reason; that part also partially worked around JIT issues that were later resolved)
  • => hence IL is not changed
  • some comparison optimizations are removed (I wasn't convinced by them)
  • refactoring is removed (might be done in a followup)
  • => hence diff is minimized
  • benchmarks are added

So where does this improve things?

More theory and motivation is in this document - the link will be updated once the PR is merged.

TL;DR: this improves things when HashIdentity.Structural<'T> comparison is used in non-inlined code.

Example optimization

Let's look at the following code:

type Musician = {
    Name: string
    Surname: string
}

let musicians = [ 
    { Name = "Dave"; Surname = "Gahan" }
    { Name = "Jim"; Surname = "Morrisson" }
    { Name = "Robert"; Surname = "Smith" }
    { Name = "Dave"; Surname = "Grohl" }
    { Name = "Johnny"; Surname = "Marr" }
    { Name = "David"; Surname = "Gilmour" }
]

let getInitials musician =
    struct
        (musician.Name     |> Seq.head, 
         musician.Surname |> Seq.head)    

let result = 
    musicians 
    |> List.map getInitials
    |> List.distinct

How does List.distinct work?

There are 3 important parts of the algorithm to talk about.

  1. Pick the comparer

The comparer is something implementing IEqualityComparer<'T> which means 2 methods: GetHashCode(x) and Equals(x, y).
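
To make the shape concrete, here is a hypothetical hand-written comparer for char pairs - an illustration of the two members only, not code from this PR:

```fsharp
open System
open System.Collections.Generic

// A hand-written comparer for char pairs: a hypothetical sketch of the
// two members a comparer must provide, not code from this PR.
let pairComparer =
    { new IEqualityComparer<struct (char * char)> with
        member _.GetHashCode x =
            let struct (a, b) = x
            HashCode.Combine(a, b)
        member _.Equals(x, y) =
            let struct (a1, b1) = x
            let struct (a2, b2) = y
            a1 = a2 && b1 = b2 }

// Any hash-based collection can then be driven by it:
let hs = HashSet<struct (char * char)>(pairComparer)
```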

List.distinct list calls List.distinctWithComparer HashIdentity.Structural<'T> and the logic for picking HashIdentity.Structural<'T> (the comparer) is written in prim-types.fs.

  2. Initialize the hash set for the distinct elements

This means calling let hashSet = HashSet<'T>(comparer) with the comparer picked above.

  3. Add the elements to the hash set

So in our case,

hashSet.Add('D', 'G')
hashSet.Add('J', 'M')
hashSet.Add('R', 'S')
hashSet.Add('D', 'G')
hashSet.Add('J', 'M')
hashSet.Add('D', 'G')

Now, the Add(element) operation works in the following way:

  1. Do all the bucket initialization they teach at university
  2. Execute GetHashCode to get the hash of the element
  3. Check if this hash is already present
  4. If yes, execute Equals to see if this is the same element and decide whether to add it to the hash set
  5. Otherwise just add the element to the hash set

This means, in our case there will be:

  • 6 GetHashCode calls (since there are 6 elements altogether)
  • 3 Equals calls (since there are only 3 unique elements)
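
These counts can be observed with an instrumented comparer - a hypothetical sketch for illustration, not code from this PR:

```fsharp
open System.Collections.Generic

// A hypothetical instrumented comparer (illustration only) that counts the
// GetHashCode/Equals calls HashSet.Add makes.
let mutable hashCalls = 0
let mutable equalsCalls = 0

let countingComparer =
    { new IEqualityComparer<struct (char * char)> with
        member _.GetHashCode x =
            hashCalls <- hashCalls + 1
            x.GetHashCode()
        member _.Equals(x, y) =
            equalsCalls <- equalsCalls + 1
            x.Equals y }

let initials =
    [ struct ('D', 'G'); struct ('J', 'M'); struct ('R', 'S')
      struct ('D', 'G'); struct ('J', 'M'); struct ('D', 'G') ]

let hashSet = HashSet<_>(countingComparer)
for i in initials do hashSet.Add i |> ignore

// Expecting 6 GetHashCode calls, 3 Equals calls (one per duplicate) and
// hashSet.Count = 3, assuming the three distinct pairs don't collide on hash.
```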

Now, what changes in this PR is that we become smarter at picking the faster comparer (step 1). This brings an enormous benefit to everything done in step 3.

Before, this is how the comparer picking would be executed:

List.distinct list
  List.distinctWithComparer HashIdentity.Structural<'T> list
    // check if this is a basic F# type - we optimize things for them
    FastGenericEqualityComparerTable<'T>.Function
      // no, this is not a basic type, hence go down the worst-case path - create a generic equality comparer
      MakeGenericEqualityComparer<'T>
        // this is what it creates - we'll get to the consequences later
        { new IEqualityComparer<'T> with 
              member _.GetHashCode(x) = GenericHash x 
              member _.Equals(x,y) = GenericEquality x y }

Now, this is what is going on:

List.distinct list
  List.distinctWithComparer HashIdentity.Structural<'T> list
    // call the "smart" `canUseDefaultEqualityComparer` to see if this type can use the default equality comparer
    FastGenericEqualityComparerTable<'T>.Function
      // yes it is
      EqualityComparer<'T>.Default
        // this is a property, but it basically creates a kind of "native" comparer for this type, like
        { new IEqualityComparer<struct (char * char)> with 
              member _.GetHashCode(x) = x.GetHashCode() 
              member _.Equals(x,y) = x.Equals(y) }
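
For the types that pass the check, the two comparers agree on results; only the call path differs. A minimal sketch (assumed behavior, not PR code) of using the default comparer directly:

```fsharp
open System.Collections.Generic

// EqualityComparer<'T>.Default dispatches straight to the type's own
// GetHashCode / IEquatable<'T>.Equals - no per-call boxing of 'T.
let defaultCmp = EqualityComparer<struct (char * char)>.Default

let a = struct ('D', 'G')
let b = struct ('D', 'G')

// Same observable results as structural equality, via the shorter call chain:
let equal = defaultCmp.Equals(a, b)
let sameHash = defaultCmp.GetHashCode a = defaultCmp.GetHashCode b
```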

Hence, this is the difference for each of the 6 GetHashCode calls in question (taking the first element ('D', 'G') as an example).

Before, using the generic equality comparer:

GetHashCode ('D', 'G')
  GenericHashIntrinsic ('D', 'G')
    GenericHashParamObj (object ('D', 'G'))                      // boxing!
      (IStructuralEquatable (ValueTuple ('D', 'G'))).GetHashCode()
        GetHashCodeCore ('D', 'G')
          comparer.GetHashCode('D')
            GenericHashParamObj (object ('D'))                    // boxing!
              (char 'D').GetHashCode()
          comparer.GetHashCode('G')
            GenericHashParamObj (object ('G'))                    // boxing!
              (char 'G').GetHashCode()
          HashCode.Combine

Now, using the "native" comparer:

GetHashCode ('D', 'G')
  (ValueTuple ('D', 'G')).GetHashCode()
    GetHashCodeCore ('D', 'G')
      comparer.GetHashCode('D')
        (char 'D').GetHashCode()
      comparer.GetHashCode('G')
        (char 'G').GetHashCode()
      HashCode.Combine

Now, this is the difference for each of the 3 Equals calls in question (taking the elements ('D', 'G') as an example).

Before, using the generic equality comparer:

Equals ('D', 'G') ('D', 'G')
  GenericEqualityIntrinsic ('D', 'G') ('D', 'G')
    GenericEqualityObj (object ('D', 'G')) (object ('D', 'G'))                // boxing!
      (IStructuralEquatable (ValueTuple ('D', 'G'))).Equals(ValueTuple ('D', 'G'))
        comparer.Equals('D', 'D')
          GenericEqualityObj (object ('D')) (object ('D'))                    // boxing!
            (char 'D').Equals(object 'D')
              'D' == (char)'D'
        comparer.Equals('G', 'G')
          GenericEqualityObj (object ('G')) (object ('G'))                    // boxing!
            (char 'G').Equals(object 'G')
              'G' == (char)'G'

Now, using the "native" comparer:

Equals ('D', 'G') ('D', 'G')
  (ValueTuple ('D', 'G')).Equals(ValueTuple ('D', 'G'))
    (char 'D').Equals(char 'D')
      'D' == 'D'
    (char 'G').Equals(char 'G')
      'G' == 'G'

We can see that here we remove all the boxing and use the optimal call chain. This brings huge benefits for the tiny one-time cost of executing the "smart" comparer-picking function.

The benefits of this approach vary based on the concrete algorithm and the elements in question. See the details about improved call chains in the spec mentioned above.

If we modify the example to have 1000 elements with 10 unique ones, we get the following results:

Before:

Method Mean Error StdDev Gen 0 Gen 1 Gen 2 Allocated
TheBenchmark 142.4 us 3.75 us 10.82 us 34.1797 - - 210.49 KB

After:

Method Mean Error StdDev Gen 0 Gen 1 Gen 2 Allocated
TheBenchmark 24.90 us 0.871 us 2.569 us 0.1526 - - 984 B

Which means 6x faster and 213x less memory.

Benchmarks

Main targets: structs, enums, floats, and special generic types:

Structs and enums

Before:

Method Mean Error StdDev Gen 0 Gen 1 Gen 2 Allocated
FSharpStruct 503.7 us 36.91 us 107.67 us 68.3594 - - 420.61 KB
FSharpEnum 142.7 us 4.76 us 13.73 us 22.9492 - - 140.65 KB
CSharpStruct 148.2 us 3.81 us 10.94 us 38.4521 12.8174 - 237.58 KB
CSharpEnum 134.0 us 8.43 us 24.44 us 22.8271 - - 140.59 KB

After:

Method Mean Error StdDev Median Gen 0 Gen 1 Gen 2 Allocated
FSharpStruct 73.44 us 5.158 us 15.046 us 68.59 us 0.1221 - - 792 B
FSharpEnum 16.10 us 0.396 us 1.163 us 16.35 us 0.0458 - - 336 B
CSharpStruct 77.63 us 2.285 us 6.482 us 79.15 us 28.5645 9.4604 - 179312 B
CSharpEnum 15.92 us 0.638 us 1.839 us 16.21 us 0.0916 - - 656 B

Huge improvements in both execution time and allocs, with especially remarkable results for native F# constructs.

Value tuples

Before:

Method Mean Ratio Gen 0 Gen 1 Gen 2 Allocated
ValueTuple3 673.4 us 1.00 61.5234 15.8691 - 378.13 KB
ValueTuple4 812.2 us 1.22 69.0918 19.7754 - 424.98 KB
ValueTuple5 1,004.2 us 1.50 84.9609 24.4141 - 523.63 KB
ValueTuple6 1,100.7 us 1.65 92.7734 23.4375 - 570.48 KB
ValueTuple7 1,324.9 us 1.97 117.1875 57.6172 29.2969 669.14 KB
ValueTuple8 1,461.9 us 2.20 117.1875 58.1055 29.2969 762.85 KB

After:

Method Mean Ratio Gen 0 Gen 1 Gen 2 Allocated
ValueTuple3 173.0 us 1.00 28.5645 9.3994 - 175.11 KB
ValueTuple4 174.9 us 1.03 28.5645 9.4604 - 175.11 KB
ValueTuple5 208.9 us 1.22 34.4238 11.3525 - 211.29 KB
ValueTuple6 217.0 us 1.26 34.4238 11.3525 - 211.29 KB
ValueTuple7 293.7 us 1.73 29.2969 29.2969 29.2969 247.48 KB
ValueTuple8 293.8 us 1.73 29.2969 29.2969 29.2969 247.48 KB

~80% speed and ~50% memory reduction, and a shallower increase in the ratios across tuple sizes for both.

Options and co

Before:

Method Mean Gen 0 Gen 1 Gen 2 Allocated
Option 165.0 us 16.3574 3.1738 - 101.74 KB
ValueOption 157.1 us 28.8086 4.3945 - 177.02 KB
Result 186.3 us 40.4053 10.0098 - 248.25 KB

After:

Method Mean Gen 0 Gen 1 Gen 2 Allocated
Option 82.13 us 12.6953 3.0518 - 78.33 KB
ValueOption 55.09 us 9.7656 1.5869 - 59.98 KB
Result 75.92 us 22.5830 5.6152 - 138.93 KB

50-75% speed and 25-75% memory improvements.

Nullable<'T>

Before:

Method Mean Gen 0 Gen 1 Gen 2 Allocated
Nullable 443.7 us 24.9023 3.9063 - 153.66 KB

After:

Method Mean Gen 0 Gen 1 Gen 2 Allocated
Nullable 60.16 us 9.7656 1.5869 - 59.94 KB

About 7x speed and 3x memory improvements.

Floats

Before:

Method Mean Error StdDev Gen 0 Gen 1 Gen 2 Allocated
FloatER 51.47 us 2.228 us 6.570 us 7.0801 0.3662 - 43.68 KB
Float32ER 56.39 us 2.566 us 7.525 us 7.1106 0.3662 - 43.68 KB
FloatPER 149.49 us 2.952 us 7.513 us 38.3301 2.6855 - 231.34 KB
Float32PER 136.09 us 5.584 us 16.201 us 37.5977 2.4414 - 227.01 KB

After:

Method Mean Error StdDev Median Gen 0 Gen 1 Gen 2 Allocated
FloatER 15.26 us 0.487 us 1.397 us 15.38 us 3.2806 0.1678 - 20.1 KB
Float32ER 15.80 us 0.316 us 0.666 us 15.82 us 3.2806 0.1678 - 20.1 KB
FloatPER 82.88 us 5.580 us 16.452 us 93.36 us 14.9536 1.2817 - 90.46 KB
Float32PER 94.22 us 1.851 us 3.904 us 93.97 us 14.1602 0.9766 - 86.43 KB

PER comparison still costs more time and memory, but the improvements are 2-3x in all cases.


Also (positively) affected: basic types, arrays, reference types - due to shorter call chains and less casting:

Arrays

Before:

Method Mean Error StdDev
Int32 974.3 us 73.48 us 214.3 us
Int64 1,090.7 us 58.65 us 172.9 us
Byte 1,075.3 us 41.56 us 121.9 us
Obj 1,451.8 us 43.91 us 128.8 us

After:

Method Mean Error StdDev
Int32 253.3 us 18.09 us 52.78 us
Int64 312.3 us 14.06 us 41.45 us
Byte 246.5 us 6.11 us 17.82 us
Obj 489.3 us 17.69 us 51.61 us

About 3x faster.

F# basic types

Before (countBy):

Method Mean Error StdDev
Bool 39.06 us 1.936 us 5.709 us
SByte 55.23 us 2.032 us 5.992 us
Byte 50.62 us 1.617 us 4.766 us
Int16 85.46 us 4.668 us 13.764 us
UInt16 86.66 us 3.189 us 9.351 us
Int32 86.21 us 4.690 us 13.827 us
UInt32 87.69 us 4.911 us 14.480 us
Int64 112.81 us 3.962 us 11.681 us
UInt64 112.66 us 4.003 us 11.550 us
IntPtr 114.61 us 3.430 us 10.114 us
UIntPtr 109.40 us 3.322 us 9.796 us
Char 98.99 us 2.825 us 8.330 us
String 214.52 us 5.968 us 17.503 us
Decimal 315.84 us 12.545 us 36.988 us

After (countBy):

Method Mean Error StdDev
Bool 29.19 us 1.608 us 4.740 us
SByte 50.72 us 2.167 us 6.390 us
Byte 47.40 us 1.719 us 5.069 us
Int16 83.50 us 3.508 us 10.342 us
UInt16 84.48 us 2.949 us 8.649 us
Int32 87.24 us 2.670 us 7.832 us
UInt32 86.17 us 3.630 us 10.703 us
Int64 95.78 us 4.763 us 14.044 us
UInt64 113.86 us 4.278 us 12.479 us
IntPtr 110.92 us 3.192 us 9.412 us
UIntPtr 105.11 us 3.219 us 9.440 us
Char 91.03 us 4.164 us 12.211 us
String 214.23 us 5.859 us 17.276 us
Decimal 150.39 us 3.703 us 10.683 us

Before (distinct):

Method Mean Error StdDev
Bool 9.489 us 0.7345 us 2.142 us
SByte 12.568 us 0.4049 us 1.168 us
Byte 12.393 us 0.4176 us 1.231 us
Int16 23.539 us 0.8906 us 2.626 us
UInt16 22.311 us 0.8351 us 2.449 us
Int32 22.302 us 0.4448 us 1.180 us
UInt32 22.092 us 0.4760 us 1.319 us
Int64 25.567 us 0.8679 us 2.518 us
UInt64 26.200 us 1.4150 us 4.172 us
IntPtr 25.528 us 1.4250 us 4.202 us
UIntPtr 25.000 us 0.8785 us 2.590 us
Char 22.883 us 0.8124 us 2.383 us
String 47.659 us 1.6451 us 4.799 us
Decimal 58.086 us 4.7662 us 13.828 us

After (distinct):

Method Mean Error StdDev
Bool 9.408 us 0.9121 us 2.689 us
SByte 11.795 us 0.3516 us 1.020 us
Byte 11.225 us 0.4078 us 1.170 us
Int16 23.211 us 0.7982 us 2.354 us
UInt16 21.682 us 1.0798 us 3.184 us
Int32 20.035 us 0.6128 us 1.787 us
UInt32 20.123 us 0.8883 us 2.591 us
Int64 22.967 us 0.7908 us 2.319 us
UInt64 24.757 us 1.3208 us 3.832 us
IntPtr 26.448 us 1.3528 us 3.989 us
UIntPtr 24.596 us 1.1292 us 3.330 us
Char 22.670 us 0.8961 us 2.642 us
String 40.480 us 1.3767 us 3.994 us
Decimal 35.401 us 1.4948 us 4.289 us

Note that decimals also show improvement in memory allocations:

Before (countBy)

Method Mean Error StdDev Median Gen 0 Gen 1 Gen 2 Allocated
Decimal 149.9 us 13.45 us 39.01 us 136.1 us 38.4521 12.8174 - 237.55 KB

After (countBy)

Method Mean Error StdDev Gen 0 Gen 1 Gen 2 Allocated
Decimal 65.17 us 5.013 us 13.807 us 28.5645 9.4604 - 175.09 KB

Before (distinct)

Method Mean Error StdDev Gen 0 Gen 1 Gen 2 Allocated
Decimal 57.83 us 6.097 us 17.977 us 26.3062 6.5308 - 162.35 KB

After (distinct)

Method Mean Error StdDev Gen 0 Gen 1 Gen 2 Allocated
Decimal 34.79 us 1.572 us 4.585 us 21.2708 5.3101 - 131.1 KB

These mostly stay the same (as expected), apart from decimal, which shows ~50% speed and ~25% alloc improvements.

Records

Before:

Method Mean Error StdDev
Record 157.8 us 8.98 us 25.61 us
RecordStruct 163.5 us 5.87 us 17.31 us

After:

Method Mean Error StdDev
Record 143.1 us 7.13 us 20.00 us
RecordStruct 149.5 us 3.68 us 10.80 us

Which is about 10% improvement.

Generic unions

Before:

Method Mean Error StdDev Gen 0 Gen 1 Gen 2 Allocated
GenericUnion 439.8 us 17.85 us 51.78 us 41.9922 12.2070 - 260 KB

After:

Method Mean Error StdDev Gen 0 Gen 1 Gen 2 Allocated
GenericUnion 167.1 us 3.32 us 7.22 us 26.9775 8.9722 - 166.3 KB

About 60% and 30% improvements in speed and allocs.

Reference tuples

Before:

Method Mean Error StdDev
SmallNonGenericTuple 306.0 us 14.40 us 42.45 us
SmallGenericTuple 360.9 us 10.43 us 30.74 us
BigNonGenericTuple 393.2 us 13.33 us 39.30 us
BigGenericTuple 480.1 us 27.41 us 80.81 us
SmallNonGenericTupleStruct 167.4 us 5.07 us 14.62 us
SmallGenericTupleStruct 192.2 us 3.55 us 5.93 us
BigNonGenericTupleStruct 409.2 us 16.19 us 44.33 us
BigGenericTupleStruct 559.1 us 52.00 us 153.33 us

After:

Method Mean Error StdDev
SmallNonGenericTuple 268.7 us 8.63 us 25.45 us
SmallGenericTuple 349.7 us 9.56 us 28.20 us
BigNonGenericTuple 356.2 us 12.56 us 37.03 us
BigGenericTuple 451.0 us 26.06 us 76.83 us
SmallNonGenericTupleStruct 132.3 us 4.12 us 11.96 us
SmallGenericTupleStruct 185.4 us 6.08 us 17.45 us
BigNonGenericTupleStruct 350.0 us 15.47 us 44.63 us
BigGenericTupleStruct 405.4 us 20.45 us 54.58 us

~5-15% faster execution.


Some (positive) implications for FSharp.Core.

F# core functions in question

Before:

Method Mean Error StdDev Median
ArrayCountBy 230.20 us 7.098 us 20.929 us 230.07 us
ArrayGroupBy 125.80 us 5.344 us 15.674 us 127.24 us
ArrayDistinct 121.08 us 8.648 us 25.500 us 127.72 us
ArrayDistinctBy 113.45 us 2.636 us 7.732 us 114.15 us
ArrayExcept 92.28 us 1.835 us 5.354 us 92.66 us
ListCountBy 225.45 us 11.730 us 34.585 us 230.79 us
ListGroupBy 167.40 us 12.814 us 37.783 us 151.49 us
ListDistinct 125.20 us 4.192 us 12.229 us 127.01 us
ListDistinctBy 109.52 us 3.480 us 10.097 us 110.94 us
ListExcept 194.14 us 20.467 us 60.347 us 164.68 us
SeqCountBy 460.13 us 13.224 us 38.575 us 459.04 us
SeqGroupBy 354.81 us 15.000 us 43.992 us 348.92 us
SeqDistinct 359.59 us 12.339 us 36.381 us 361.93 us
SeqDistinctBy 128.8 us 5.70 us 15.97 us 127.2 us
SeqExcept 127.8 us 5.03 us 14.74 us 128.1 us

After:

Method Mean Error StdDev Median
ArrayCountBy 137.30 us 5.008 us 14.767 us 137.01 us
ArrayGroupBy 78.80 us 4.673 us 13.778 us 80.87 us
ArrayDistinct 71.63 us 3.698 us 10.904 us 73.49 us
ArrayDistinctBy 69.93 us 2.148 us 6.299 us 70.02 us
ArrayExcept 68.39 us 3.071 us 9.008 us 69.59 us
ListCountBy 138.17 us 6.708 us 19.566 us 142.05 us
ListGroupBy 129.57 us 12.765 us 37.639 us 110.78 us
ListDistinct 85.99 us 1.662 us 3.683 us 85.73 us
ListDistinctBy 67.36 us 3.001 us 8.707 us 67.95 us
ListExcept 173.01 us 21.536 us 63.499 us 138.14 us
SeqCountBy 289.70 us 12.109 us 35.514 us 295.43 us
SeqGroupBy 245.97 us 6.969 us 20.329 us 246.79 us
SeqDistinct 254.17 us 14.776 us 43.336 us 257.62 us
SeqDistinctBy 101.7 us 5.35 us 14.91 us 102.4 us
SeqExcept 105.9 us 5.20 us 15.35 us 102.1 us

The improvement varies 20-40% in speed here.


Other considerations.

>64 bit value types

Before:

Method Mean Error StdDev
BigStruct 47.43 ms 3.136 ms 9.098 ms

After:

Method Mean Error StdDev
BigStruct 44.57 ms 3.243 ms 9.562 ms

This became marginally faster - but more importantly, it's here to address concerns about the 64-bit JIT. Likely the underlying JIT problem got fixed in the meantime.

TODO

  • AOT tests to check that startup performance is not adversely affected
  • Finish the design doc on equality
  • Describe the changed code flow

Followups

github-actions bot (Contributor) commented Jan 30, 2024

❗ Release notes required


✅ Found changes and release notes in the following paths:

Change path: src/FSharp.Core
Release notes path: docs/release-notes/.FSharp.Core/8.0.300.md

@vzarytovskii (Member):

Benchmark results are weird, numbers before and after are within the statistical error. What about allocations? We shouldn't be boxing as much now?

@psfinaki (Member, Author) commented Feb 8, 2024

@vzarytovskii yeah those are not the right benchmarks for this PR. We had a session with Don today to figure out the right ones and I am already getting some 25-30% improvements there, will post soon - stay tuned.

@psfinaki (Member, Author):

/azp run


Azure Pipelines successfully started running 2 pipeline(s).

@manofstick (Contributor):

Is this PR breaking?
No. This is (supposed to be just) an optimization.

Well... It'll be calling IEquatable.Equals<>, not object.Equals, which is hence a breaking change - albeit I feel in a fashion that should be part of an evolution...

...and it really should be paired with the optimizer change (@TIHan I believe worked on one at some stage?). Otherwise you have the somewhat bizarre situation where a comparison of an external struct type such as a NodaTime.Instant in a generic context doesn't box and is fast, vs in a non-generic context where it does box and is slower... Unless that has been resolved already?

Anyway, glad to see this moving forward. My beard is somewhat grayer now than when I first started trying to improve comparison/equality in F#!!

@psfinaki (Member, Author):

/azp run


Azure Pipelines successfully started running 2 pipeline(s).

@psfinaki changed the title from "WIP: Faster equality in generic contexts" to "Faster equality in generic contexts" on Feb 13, 2024
@psfinaki (Member, Author):

Marking as ready to review since the CI is green finally :D

@psfinaki marked this pull request as ready for review on February 13, 2024 15:08
@psfinaki requested a review from a team as a code owner on February 13, 2024 15:08
@psfinaki (Member, Author):

And glad to see you here, @manofstick :) BTW thanks for the clear commits and comments to the commits in your original PRs, that really helped me a lot.

As for this one, well, it's not breaking in terms of NaN behavior and in terms of IL. So far. That's probably what we care about the most.

@vzarytovskii (Member) commented Feb 14, 2024

Since it touches an important part of the language, I would like to have three things:

  1. What changed, in the form of
    Given (generic) code: ...
    How it worked before: ...
    How does it work now: ...

    Unless it exists somewhere already. It should help everyone understand this change in the future in case of any issues.

  2. Some AOT compilation tests involving the changed equality. It can be a new project, like the trimming one we have. We should start having more AOT testing, and this is a good place to start: we know AOT was working with equality before, so it's crucial that it still works after the change.

  3. Can we have some compiler perf comparison/profiling of the compiler+fslib before and after the changes on something like FCS? It would be nice to have as a reference.

@psfinaki (Member, Author):

Thanks all for the reviews and the feedback.

This needs a thorough rereview with @dsyme, meanwhile I will be improving the description, clarifying the code and adding tests.

@dsyme (Contributor) commented Feb 14, 2024

...and it really should be paired with the optimizer change (@TIHan I believe worked on one at some stage?). Otherwise you have the somewhat bizarre situation where a comparison of an external struct type such as a NodaTime.Instant in a generic context doesn't box and is fast, vs in a non-generic context where it does box and is slower... Unless that has been resolved already?

Agreed that in principle these should go together, though can be done in a separate PR.

Well... It'll be calling IEquatable.Equals<>, not object.Equals, which is hence a breaking change - albeit I feel in a fashion that should be part of an evolution...

I think this specific change is OK - use IEquatable.Equals if it exists.

@dsyme (Contributor) commented Feb 14, 2024

Well... It'll be calling IEquatable.Equals<>, not object.Equals, which is hence a breaking change - albeit I feel in a fashion that should be part of an evolution...

@manofstick Could you remind me of the specific thing that causes the call to IEquatable.Equals<>, and which types it applies to. I assume it comes from EqualityComparer<'T>.Default and thus only applies to types that pass canUseDefaultEqualityComparer?

I think that is OK as a change. It is vanishingly rare to explicitly implement IEquatable.Equals<> on struct types in F# code (except for the automatic implementations provided by the compiler) - and even then it would always be the desired behaviour to have that implementation invoked.

@vzarytovskii (Member):

Also, does this need an addition to spec (rfc), since it changes how we do equality? Or rather shall the doc from the other PR live in the design repo?

@psfinaki (Member, Author):

@vzarytovskii I think it's a good idea to have that doc (from the other PR) in the design repo. This PR is meant to be an optimization and hence focuses on the implementation. I am looking at the spec while working on this, and my impression so far is that the spec doesn't go deep enough into the implementation for this change to misalign with it in any way. If we discover anything of that nature, we can make an RFC or discuss it further.

@psfinaki (Member, Author):

I added benchmarks for the value tuples, options and that stuff - the results are very convincing so far.
More to come soon!

@manofstick (Contributor):

@manofstick Could you remind me of the specific thing that causes the call to IEquatable.Equals<>, and which types it applies to. I assume it comes from EqualityComparer<'T>.Default and thus only applies to types that pass canUseDefaultEqualityComparer?

Yep, simple as that.

I think that is OK as a change. It is vanishingly rare to explicitly implement IEquatable.Equals<> on struct types in F# code (except for the automatic implementations provided by the compiler) - and even then it would always be the desired behaviour to have that implementation invoked.

Oh, completely agree. It was just that this was (as far as I understand it) the main reason why this change never went through in the past.

And just confirming, given your comment: this change, as implemented, would mean all types (that pass the check) use IEquatable<>.Equals in preference to object.Equals, not just value types (obviously struct types get the most benefit, as they avoid the boxing step, but ref types can also get a slight improvement, as they don't need to cast the input). But the check includes IsSealed, hence negating any potential inheritance issues. (Once again, from memory - I don't have a compiler here to test - but F# records are always created as Sealed.)

@psfinaki (Member, Author):

@manofstick just to be sure we're on the same page - which input cast do you mean? :)

@manofstick (Contributor) commented Feb 15, 2024

@psfinaki

...forgive my github comment coding; if memory serves this is right, but it's more than enough for the gist of things even if it's incorrect...

[<Sealed>] // only for sealed
type Blah(...) =
    ...

    override lhs.Equals (rhsObj:obj) = // (1)
        if obj.ReferenceEquals (lhs, rhsObj) then
            true
        else 
            match rhsObj with
            | :? Blah as rhs -> // (2)
                (lhs:>IEquatable<Blah>).Equals rhs // (4)
            | _ -> false

    interface IEquatable<Blah> with
        member lhs.Equals rhs = // (3)
            // actually do the check

Well, (1) would have been the entry point, and (2) would have been the cast (well, type check), but EqualityComparer<Blah>.Default will call (3) directly.

Most likely the actual equality check in these cases will be the significant component of the cost, so the saving is minimal, but as mentioned, this is just for complete understanding of what's going on here...

(4) ... just because I'm showing off, the IL that F# used to generate actually caused this operation to have a cost, but way back at the dawn of history I fixed that... ;-)

@dsyme (Contributor) commented Feb 15, 2024

@psfinaki I believe it should be possible to write a test case that detects the change, e.g.

[<Sealed, Struct>]
type Blah() =
    override lhs.Equals (rhsObj:obj) = true
    interface IEquatable<Blah> with
        member lhs.Equals rhs = failwith "bang"

let eq x y = 
    printfn "hello" // do this enough times to make sure no inlining
    printfn "hello" // do this enough times to make sure no inlining
    printfn "hello" // do this enough times to make sure no inlining
    printfn "hello" // do this enough times to make sure no inlining
    (x = y) // generic equality context

eq (Blah()) (Blah())  // true prior to this, exception now

I don't really think this needs a runtime library switch to disable, and we don't currently have such a mechanism to do those kinds of flags anyway

The equality spec should probably be updated to mention this case

@psfinaki (Member, Author) commented Feb 16, 2024

Yes, correct. For reference, I guess this is the minimal code to demo the behavior difference:

[<Struct; CustomEquality; NoComparison>]
type Blah =
    interface IEquatable<Blah> with
        member lhs.Equals rhs = failwith "bang"

let eq x y = 
    printfn "hello" // do this enough times to make sure no inlining
    ...
    x = y // generic equality context

let result = eq (Blah()) (Blah())  // false prior to this, exception now

Alright, so we've identified what breaks here and blessed it at the same time. Indeed, this is quite an esoteric scenario.

Good stuff, I will update the spec and the description to reflect this.

@vzarytovskii (Member):

Non-inlined stack trace examples before and after could be helpful for reviewers as well - could you add them please, if it's not too much work.

@psfinaki (Member, Author):

Yep I am going to add it to the PR description.

@psfinaki (Member, Author):

Okay so hopefully getting to the finish line here, I think I have added enough micro benchmarks and grouped them in the PR description.

Next week will add AOT testing, refresh the equality design doc, finalize the PR description and give this a final review with Don :)

@dsyme (Contributor) left a review:
Looking good!

@psfinaki (Member, Author) commented Mar 4, 2024

/azp run


Azure Pipelines successfully started running 2 pipeline(s).

@KevinRansom (Member):

Nice

@psfinaki merged commit 9877cfe into dotnet:main on Mar 4, 2024
31 checks passed
@psfinaki deleted the equality-3 branch on March 4, 2024 19:01
@manofstick (Contributor):

64 bit value types
This became marginally faster

I'm just relying on your words here, as I have not built this, but they should be significantly faster, and non-boxing (I'm not talking about my concern re tail calls here). I'm guessing that because they implement 'IStructuralEquality' they are using that path, and hence not being sped up. I thought my original PR handled these too; maybe it didn't, or maybe you didn't carry that code across - I don't know.

Anyway, what I would suggest is that the code for 'canUseDefault...' is used at compile time, and if default can be used then the 'IStructuralEquals' interface isn't implemented. This, once again, is a breaking change, but I think it's even lesser than the one introduced by this PR.

Another follow-up, if it hasn't been started yet, would be to use Comparer<>.Default for inequalities...

@psfinaki (Member, Author) commented Mar 5, 2024

Yes, I think we will get to those as well. Next bigger one will be devirt equality, then we'll probably look into comparison. There is a lot of useful stuff to pull out of your contributions :)
