
Faster equality in generic contexts #16615

Merged · 2 commits · Mar 4, 2024

Conversation

@psfinaki (Member) commented Jan 30, 2024

What is this?

This is the first part of this effort to resurrect the awesome work on compiler performance by @manofstick.

How is this different?

Many things have changed; among other things, we now have a bigger team and community to review changes, and a different release cadence that allows us to dogfood things a bit.

More importantly, the original PRs often contained multiple ideas (fixes, breaking changes, optimizations, experiments, ...), whereas I want to keep PRs as narrow as possible to make things easier to review and control.

Is this PR breaking?

A tiny bit. This is mostly an optimization.

Here is an example of code where the behavior does change:

[<Struct; CustomEquality; NoComparison>]
type Blah =
    interface IEquatable<Blah> with
        member lhs.Equals rhs = failwith "bang"

let eq x y = 
    printfn "hello" // long code ensuring no inlining
    ...
    x = y // generic equality context

let result = eq (Blah()) (Blah())  // currently false, will be exception

We agreed that it's OK here.

What's the source of inspiration here?

This is essentially part of #5112, with the following changes:

  • The optimizer is not touched (there wasn't consensus on that)
  • Tail calls are not touched (same reason; that part also partially worked around JIT issues that were later resolved)
  • => hence IL is not changed
  • some comparison optimizations are removed (I wasn't convinced by them)
  • refactoring is removed (might be done in a followup)
  • => hence diff is minimized
  • benchmarks are added

So where does this improve things?

More theory and motivation is in this document - the link will be updated once the PR is merged.

TL;DR: this improves things when HashIdentity.Structural<'T> comparison is used in non-inlined code.

Example optimization

Let's look at the following code:

type Musician = {
    Name: string
    Surname: string
}

let musicians = [ 
    { Name = "Dave"; Surname = "Gahan" }
    { Name = "Jim"; Surname = "Morrisson" }
    { Name = "Robert"; Surname = "Smith" }
    { Name = "Dave"; Surname = "Grohl" }
    { Name = "Johnny"; Surname = "Marr" }
    { Name = "David"; Surname = "Gilmour" }
]

let getInitials musician =
    struct
        (musician.Name     |> Seq.head, 
         musician.Surname |> Seq.head)    

let result = 
    musicians 
    |> List.map getInitials
    |> List.distinct

How does List.distinct work?

There are 3 important parts of the algorithm to talk about.

  1. Pick the comparer

The comparer is something implementing IEqualityComparer<'T> which means 2 methods: GetHashCode(x) and Equals(x, y).
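
To make the shape concrete, here is a hypothetical hand-written comparer for char pairs - an illustration of the two members only, not code from this PR:

```fsharp
open System
open System.Collections.Generic

// A hand-written comparer for char pairs: a hypothetical sketch of the
// two members a comparer must provide, not code from this PR.
let pairComparer =
    { new IEqualityComparer<struct (char * char)> with
        member _.GetHashCode x =
            let struct (a, b) = x
            HashCode.Combine(a, b)
        member _.Equals(x, y) =
            let struct (a1, b1) = x
            let struct (a2, b2) = y
            a1 = a2 && b1 = b2 }

// Any hash-based collection can then be driven by it:
let hs = HashSet<struct (char * char)>(pairComparer)
```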

List.distinct list calls List.distinctWithComparer HashIdentity.Structural<'T> and the logic for picking HashIdentity.Structural<'T> (the comparer) is written in prim-types.fs.

  2. Initialize the hash set for the distinct elements

This means calling let hashSet = HashSet<'T>(comparer) with the comparer picked above.

  3. Add the elements to the hash set

So in our case,

hashSet.Add('D', 'G')
hashSet.Add('J', 'M')
hashSet.Add('R', 'S')
hashSet.Add('D', 'G')
hashSet.Add('J', 'M')
hashSet.Add('D', 'G')

Now, the Add(element) operation works in the following way:

  1. Do all the bucket initialization they teach at university
  2. Execute GetHashCode to get the hash of the element
  3. Check if this hash is already present
  4. If yes, execute Equals to see if this is the same element and decide whether to add it to the hash set
  5. Otherwise just add the element to the hash set

This means, in our case there will be:

  • 6 GetHashCode calls (since there are 6 elements altogether)
  • 3 Equals calls (since there are only 3 unique elements)
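
These counts can be observed with an instrumented comparer - a hypothetical sketch for illustration, not code from this PR:

```fsharp
open System.Collections.Generic

// A hypothetical instrumented comparer (illustration only) that counts the
// GetHashCode/Equals calls HashSet.Add makes.
let mutable hashCalls = 0
let mutable equalsCalls = 0

let countingComparer =
    { new IEqualityComparer<struct (char * char)> with
        member _.GetHashCode x =
            hashCalls <- hashCalls + 1
            x.GetHashCode()
        member _.Equals(x, y) =
            equalsCalls <- equalsCalls + 1
            x.Equals y }

let initials =
    [ struct ('D', 'G'); struct ('J', 'M'); struct ('R', 'S')
      struct ('D', 'G'); struct ('J', 'M'); struct ('D', 'G') ]

let hashSet = HashSet<_>(countingComparer)
for i in initials do hashSet.Add i |> ignore

// Expecting 6 GetHashCode calls, 3 Equals calls (one per duplicate) and
// hashSet.Count = 3, assuming the three distinct pairs don't collide on hash.
```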

Now, what changes in this PR is that we become smarter at picking the faster comparer (step 1). This brings an enormous benefit to everything done in step 3.

Before, this is how the comparer picking would be executed:

List.distinct list
  List.distinctWithComparer HashIdentity.Structural<'T> list
    // check if this is a basic F# type - we optimize things for them
    FastGenericEqualityComparerTable<'T>.Function
      // no, this is not a basic type, hence go down the worst-case path - create a generic equality comparer
      MakeGenericEqualityComparer<'T>
        // this is what it creates - we'll get to the consequences later
        { new IEqualityComparer<'T> with 
              member _.GetHashCode(x) = GenericHash x 
              member _.Equals(x,y) = GenericEquality x y }

Now, this is what is going on:

List.distinct list
  List.distinctWithComparer HashIdentity.Structural<'T> list
    // call the "smart" `canUseDefaultEqualityComparer` to see if this type can use the default equality comparer
    FastGenericEqualityComparerTable<'T>.Function
      // yes it is
      EqualityComparer<'T>.Default
        // this is a property, but it basically creates a kind of "native" comparer for this type, like
        { new IEqualityComparer<struct (char * char)> with 
              member _.GetHashCode(x) = x.GetHashCode() 
              member _.Equals(x,y) = x.Equals(y) }
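
For the types that pass the check, the two comparers agree on results; only the call path differs. A minimal sketch (assumed behavior, not PR code) of using the default comparer directly:

```fsharp
open System.Collections.Generic

// EqualityComparer<'T>.Default dispatches straight to the type's own
// GetHashCode / IEquatable<'T>.Equals - no per-call boxing of 'T.
let defaultCmp = EqualityComparer<struct (char * char)>.Default

let a = struct ('D', 'G')
let b = struct ('D', 'G')

// Same observable results as structural equality, via the shorter call chain:
let equal = defaultCmp.Equals(a, b)
let sameHash = defaultCmp.GetHashCode a = defaultCmp.GetHashCode b
```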

Hence, this is the difference for each of the 6 GetHashCode calls in question (taking the first element ('D', 'G') as an example).

Before, using the generic equality comparer:

GetHashCode ('D', 'G')
  GenericHashIntrinsic ('D', 'G')
    GenericHashParamObj (object ('D', 'G'))                      // boxing!
      (IStructuralEquatable (ValueTuple ('D', 'G'))).GetHashCode()
        GetHashCodeCore ('D', 'G')
          comparer.GetHashCode('D')
            GenericHashParamObj (object ('D'))                    // boxing!
              (char 'D').GetHashCode()
          comparer.GetHashCode('G')
            GenericHashParamObj (object ('G'))                    // boxing!
              (char 'G').GetHashCode()
          HashCode.Combine

Now, using the "native" comparer:

GetHashCode ('D', 'G')
  (ValueTuple ('D', 'G')).GetHashCode()
    GetHashCodeCore ('D', 'G')
      comparer.GetHashCode('D')
        (char 'D').GetHashCode()
      comparer.GetHashCode('G')
        (char 'G').GetHashCode()
      HashCode.Combine

Now, this is the difference for each of the 3 Equals calls in question (taking the elements ('D', 'G') as an example).

Before, using the generic equality comparer:

Equals ('D', 'G') ('D', 'G')
  GenericEqualityIntrinsic ('D', 'G') ('D', 'G')
    GenericEqualityObj (object ('D', 'G')) (object ('D', 'G'))                // boxing!
      (IStructuralEquatable (ValueTuple ('D', 'G'))).Equals(ValueTuple ('D', 'G'))
        comparer.Equals('D', 'D')
          GenericEqualityObj (object ('D')) (object ('D'))                    // boxing!
            (char 'D').Equals(object 'D')
              'D' == (char)'D'
        comparer.Equals('G', 'G')
          GenericEqualityObj (object ('G')) (object ('G'))                    // boxing!
            (char 'G').Equals(object 'G')
              'G' == (char)'G'

Now, using the "native" comparer:

Equals ('D', 'G') ('D', 'G')
  (ValueTuple ('D', 'G')).Equals(ValueTuple ('D', 'G'))
    (char 'D').Equals(char 'D')
      'D' == 'D'
    (char 'G').Equals(char 'G')
      'G' == 'G'

We can see that here we remove all the boxing and use the optimal call chain. This brings huge benefits for the tiny one-time cost of executing the "smart" comparer-picking function.

The benefits of this approach vary based on the concrete algorithm and the elements in question. See the details about improved call chains in the spec mentioned above.

If we modify the example to have 1000 elements with 10 unique ones, we get the following results:

Before:

Method Mean Error StdDev Gen 0 Gen 1 Gen 2 Allocated
TheBenchmark 142.4 us 3.75 us 10.82 us 34.1797 - - 210.49 KB

After:

Method Mean Error StdDev Gen 0 Gen 1 Gen 2 Allocated
TheBenchmark 24.90 us 0.871 us 2.569 us 0.1526 - - 984 B

Which means 6x faster and 213x less memory.

Benchmarks

Main targets: structs, enums, floats, and special generic types:

Structs and enums

Before:

Method Mean Error StdDev Gen 0 Gen 1 Gen 2 Allocated
FSharpStruct 503.7 us 36.91 us 107.67 us 68.3594 - - 420.61 KB
FSharpEnum 142.7 us 4.76 us 13.73 us 22.9492 - - 140.65 KB
CSharpStruct 148.2 us 3.81 us 10.94 us 38.4521 12.8174 - 237.58 KB
CSharpEnum 134.0 us 8.43 us 24.44 us 22.8271 - - 140.59 KB

After:

Method Mean Error StdDev Median Gen 0 Gen 1 Gen 2 Allocated
FSharpStruct 73.44 us 5.158 us 15.046 us 68.59 us 0.1221 - - 792 B
FSharpEnum 16.10 us 0.396 us 1.163 us 16.35 us 0.0458 - - 336 B
CSharpStruct 77.63 us 2.285 us 6.482 us 79.15 us 28.5645 9.4604 - 179312 B
CSharpEnum 15.92 us 0.638 us 1.839 us 16.21 us 0.0916 - - 656 B

Huge improvements in both execution time and allocs, with especially remarkable results for native F# constructs.

Value tuples

Before:

Method Mean Ratio Gen 0 Gen 1 Gen 2 Allocated
ValueTuple3 673.4 us 1.00 61.5234 15.8691 - 378.13 KB
ValueTuple4 812.2 us 1.22 69.0918 19.7754 - 424.98 KB
ValueTuple5 1,004.2 us 1.50 84.9609 24.4141 - 523.63 KB
ValueTuple6 1,100.7 us 1.65 92.7734 23.4375 - 570.48 KB
ValueTuple7 1,324.9 us 1.97 117.1875 57.6172 29.2969 669.14 KB
ValueTuple8 1,461.9 us 2.20 117.1875 58.1055 29.2969 762.85 KB

After:

Method Mean Ratio Gen 0 Gen 1 Gen 2 Allocated
ValueTuple3 173.0 us 1.00 28.5645 9.3994 - 175.11 KB
ValueTuple4 174.9 us 1.03 28.5645 9.4604 - 175.11 KB
ValueTuple5 208.9 us 1.22 34.4238 11.3525 - 211.29 KB
ValueTuple6 217.0 us 1.26 34.4238 11.3525 - 211.29 KB
ValueTuple7 293.7 us 1.73 29.2969 29.2969 29.2969 247.48 KB
ValueTuple8 293.8 us 1.73 29.2969 29.2969 29.2969 247.48 KB

~80% speed and ~50% memory reduction, and a shallower increase in the ratios across tuple sizes for both.

Options and co

Before:

Method Mean Gen 0 Gen 1 Gen 2 Allocated
Option 165.0 us 16.3574 3.1738 - 101.74 KB
ValueOption 157.1 us 28.8086 4.3945 - 177.02 KB
Result 186.3 us 40.4053 10.0098 - 248.25 KB

After:

Method Mean Gen 0 Gen 1 Gen 2 Allocated
Option 82.13 us 12.6953 3.0518 - 78.33 KB
ValueOption 55.09 us 9.7656 1.5869 - 59.98 KB
Result 75.92 us 22.5830 5.6152 - 138.93 KB

50-75% speed and 25-75% memory improvements.

Nullable<'T>

Before:

Method Mean Gen 0 Gen 1 Gen 2 Allocated
Nullable 443.7 us 24.9023 3.9063 - 153.66 KB

After:

Method Mean Gen 0 Gen 1 Gen 2 Allocated
Nullable 60.16 us 9.7656 1.5869 - 59.94 KB

About 7x speed and 3x memory improvements.

Floats

Before:

Method Mean Error StdDev Gen 0 Gen 1 Gen 2 Allocated
FloatER 51.47 us 2.228 us 6.570 us 7.0801 0.3662 - 43.68 KB
Float32ER 56.39 us 2.566 us 7.525 us 7.1106 0.3662 - 43.68 KB
FloatPER 149.49 us 2.952 us 7.513 us 38.3301 2.6855 - 231.34 KB
Float32PER 136.09 us 5.584 us 16.201 us 37.5977 2.4414 - 227.01 KB

After:

Method Mean Error StdDev Median Gen 0 Gen 1 Gen 2 Allocated
FloatER 15.26 us 0.487 us 1.397 us 15.38 us 3.2806 0.1678 - 20.1 KB
Float32ER 15.80 us 0.316 us 0.666 us 15.82 us 3.2806 0.1678 - 20.1 KB
FloatPER 82.88 us 5.580 us 16.452 us 93.36 us 14.9536 1.2817 - 90.46 KB
Float32PER 94.22 us 1.851 us 3.904 us 93.97 us 14.1602 0.9766 - 86.43 KB

PER comparison still costs more time and memory, but the improvements are 2-3x in all cases.


Also (positively) affected: basic types, arrays, reference types - due to shorter call chains and less casting:

Arrays

Before:

Method Mean Error StdDev
Int32 974.3 us 73.48 us 214.3 us
Int64 1,090.7 us 58.65 us 172.9 us
Byte 1,075.3 us 41.56 us 121.9 us
Obj 1,451.8 us 43.91 us 128.8 us

After:

Method Mean Error StdDev
Int32 253.3 us 18.09 us 52.78 us
Int64 312.3 us 14.06 us 41.45 us
Byte 246.5 us 6.11 us 17.82 us
Obj 489.3 us 17.69 us 51.61 us

About 3x faster.

F# basic types

Before (countBy):

Method Mean Error StdDev
Bool 39.06 us 1.936 us 5.709 us
SByte 55.23 us 2.032 us 5.992 us
Byte 50.62 us 1.617 us 4.766 us
Int16 85.46 us 4.668 us 13.764 us
UInt16 86.66 us 3.189 us 9.351 us
Int32 86.21 us 4.690 us 13.827 us
UInt32 87.69 us 4.911 us 14.480 us
Int64 112.81 us 3.962 us 11.681 us
UInt64 112.66 us 4.003 us 11.550 us
IntPtr 114.61 us 3.430 us 10.114 us
UIntPtr 109.40 us 3.322 us 9.796 us
Char 98.99 us 2.825 us 8.330 us
String 214.52 us 5.968 us 17.503 us
Decimal 315.84 us 12.545 us 36.988 us

After (countBy):

Method Mean Error StdDev
Bool 29.19 us 1.608 us 4.740 us
SByte 50.72 us 2.167 us 6.390 us
Byte 47.40 us 1.719 us 5.069 us
Int16 83.50 us 3.508 us 10.342 us
UInt16 84.48 us 2.949 us 8.649 us
Int32 87.24 us 2.670 us 7.832 us
UInt32 86.17 us 3.630 us 10.703 us
Int64 95.78 us 4.763 us 14.044 us
UInt64 113.86 us 4.278 us 12.479 us
IntPtr 110.92 us 3.192 us 9.412 us
UIntPtr 105.11 us 3.219 us 9.440 us
Char 91.03 us 4.164 us 12.211 us
String 214.23 us 5.859 us 17.276 us
Decimal 150.39 us 3.703 us 10.683 us

Before (distinct):

Method Mean Error StdDev
Bool 9.489 us 0.7345 us 2.142 us
SByte 12.568 us 0.4049 us 1.168 us
Byte 12.393 us 0.4176 us 1.231 us
Int16 23.539 us 0.8906 us 2.626 us
UInt16 22.311 us 0.8351 us 2.449 us
Int32 22.302 us 0.4448 us 1.180 us
UInt32 22.092 us 0.4760 us 1.319 us
Int64 25.567 us 0.8679 us 2.518 us
UInt64 26.200 us 1.4150 us 4.172 us
IntPtr 25.528 us 1.4250 us 4.202 us
UIntPtr 25.000 us 0.8785 us 2.590 us
Char 22.883 us 0.8124 us 2.383 us
String 47.659 us 1.6451 us 4.799 us
Decimal 58.086 us 4.7662 us 13.828 us

After (distinct):

Method Mean Error StdDev
Bool 9.408 us 0.9121 us 2.689 us
SByte 11.795 us 0.3516 us 1.020 us
Byte 11.225 us 0.4078 us 1.170 us
Int16 23.211 us 0.7982 us 2.354 us
UInt16 21.682 us 1.0798 us 3.184 us
Int32 20.035 us 0.6128 us 1.787 us
UInt32 20.123 us 0.8883 us 2.591 us
Int64 22.967 us 0.7908 us 2.319 us
UInt64 24.757 us 1.3208 us 3.832 us
IntPtr 26.448 us 1.3528 us 3.989 us
UIntPtr 24.596 us 1.1292 us 3.330 us
Char 22.670 us 0.8961 us 2.642 us
String 40.480 us 1.3767 us 3.994 us
Decimal 35.401 us 1.4948 us 4.289 us

Note that decimals also show improvement in memory allocations:

Before (countBy)

Method Mean Error StdDev Median Gen 0 Gen 1 Gen 2 Allocated
Decimal 149.9 us 13.45 us 39.01 us 136.1 us 38.4521 12.8174 - 237.55 KB

After (countBy)

Method Mean Error StdDev Gen 0 Gen 1 Gen 2 Allocated
Decimal 65.17 us 5.013 us 13.807 us 28.5645 9.4604 - 175.09 KB

Before (distinct)

Method Mean Error StdDev Gen 0 Gen 1 Gen 2 Allocated
Decimal 57.83 us 6.097 us 17.977 us 26.3062 6.5308 - 162.35 KB

After (distinct)

Method Mean Error StdDev Gen 0 Gen 1 Gen 2 Allocated
Decimal 34.79 us 1.572 us 4.585 us 21.2708 5.3101 - 131.1 KB

These mostly stay the same (as expected), apart from decimal, which shows ~50% speed and ~25% alloc improvements.

Records

Before:

Method Mean Error StdDev
Record 157.8 us 8.98 us 25.61 us
RecordStruct 163.5 us 5.87 us 17.31 us

After:

Method Mean Error StdDev
Record 143.1 us 7.13 us 20.00 us
RecordStruct 149.5 us 3.68 us 10.80 us

Which is about 10% improvement.

Generic unions

Before:

Method Mean Error StdDev Gen 0 Gen 1 Gen 2 Allocated
GenericUnion 439.8 us 17.85 us 51.78 us 41.9922 12.2070 - 260 KB

After:

Method Mean Error StdDev Gen 0 Gen 1 Gen 2 Allocated
GenericUnion 167.1 us 3.32 us 7.22 us 26.9775 8.9722 - 166.3 KB

About 60% and 30% improvements in speed and allocs.

Reference tuples

Before:

Method Mean Error StdDev
SmallNonGenericTuple 306.0 us 14.40 us 42.45 us
SmallGenericTuple 360.9 us 10.43 us 30.74 us
BigNonGenericTuple 393.2 us 13.33 us 39.30 us
BigGenericTuple 480.1 us 27.41 us 80.81 us
SmallNonGenericTupleStruct 167.4 us 5.07 us 14.62 us
SmallGenericTupleStruct 192.2 us 3.55 us 5.93 us
BigNonGenericTupleStruct 409.2 us 16.19 us 44.33 us
BigGenericTupleStruct 559.1 us 52.00 us 153.33 us

After:

Method Mean Error StdDev
SmallNonGenericTuple 268.7 us 8.63 us 25.45 us
SmallGenericTuple 349.7 us 9.56 us 28.20 us
BigNonGenericTuple 356.2 us 12.56 us 37.03 us
BigGenericTuple 451.0 us 26.06 us 76.83 us
SmallNonGenericTupleStruct 132.3 us 4.12 us 11.96 us
SmallGenericTupleStruct 185.4 us 6.08 us 17.45 us
BigNonGenericTupleStruct 350.0 us 15.47 us 44.63 us
BigGenericTupleStruct 405.4 us 20.45 us 54.58 us

~5-15% faster execution.


Some (positive) implications for FSharp.Core.

F# core functions in question

Before:

Method Mean Error StdDev Median
ArrayCountBy 230.20 us 7.098 us 20.929 us 230.07 us
ArrayGroupBy 125.80 us 5.344 us 15.674 us 127.24 us
ArrayDistinct 121.08 us 8.648 us 25.500 us 127.72 us
ArrayDistinctBy 113.45 us 2.636 us 7.732 us 114.15 us
ArrayExcept 92.28 us 1.835 us 5.354 us 92.66 us
ListCountBy 225.45 us 11.730 us 34.585 us 230.79 us
ListGroupBy 167.40 us 12.814 us 37.783 us 151.49 us
ListDistinct 125.20 us 4.192 us 12.229 us 127.01 us
ListDistinctBy 109.52 us 3.480 us 10.097 us 110.94 us
ListExcept 194.14 us 20.467 us 60.347 us 164.68 us
SeqCountBy 460.13 us 13.224 us 38.575 us 459.04 us
SeqGroupBy 354.81 us 15.000 us 43.992 us 348.92 us
SeqDistinct 359.59 us 12.339 us 36.381 us 361.93 us
SeqDistinctBy 128.8 us 5.70 us 15.97 us 127.2 us
SeqExcept 127.8 us 5.03 us 14.74 us 128.1 us

After:

Method Mean Error StdDev Median
ArrayCountBy 137.30 us 5.008 us 14.767 us 137.01 us
ArrayGroupBy 78.80 us 4.673 us 13.778 us 80.87 us
ArrayDistinct 71.63 us 3.698 us 10.904 us 73.49 us
ArrayDistinctBy 69.93 us 2.148 us 6.299 us 70.02 us
ArrayExcept 68.39 us 3.071 us 9.008 us 69.59 us
ListCountBy 138.17 us 6.708 us 19.566 us 142.05 us
ListGroupBy 129.57 us 12.765 us 37.639 us 110.78 us
ListDistinct 85.99 us 1.662 us 3.683 us 85.73 us
ListDistinctBy 67.36 us 3.001 us 8.707 us 67.95 us
ListExcept 173.01 us 21.536 us 63.499 us 138.14 us
SeqCountBy 289.70 us 12.109 us 35.514 us 295.43 us
SeqGroupBy 245.97 us 6.969 us 20.329 us 246.79 us
SeqDistinct 254.17 us 14.776 us 43.336 us 257.62 us
SeqDistinctBy 101.7 us 5.35 us 14.91 us 102.4 us
SeqExcept 105.9 us 5.20 us 15.35 us 102.1 us

The improvement varies 20-40% in speed here.


Other considerations.

>64 bit value types

Before:

Method Mean Error StdDev
BigStruct 47.43 ms 3.136 ms 9.098 ms

After:

Method Mean Error StdDev
BigStruct 44.57 ms 3.243 ms 9.562 ms

This became marginally faster - but more importantly, it's here to address concerns about the 64-bit JIT. Likely the underlying JIT problem got fixed in the meantime.

TODO

  • AOT tests to check that startup performance is not adversely affected
  • Finish the design doc on equality
  • Describe the changed code flow

Followups

github-actions bot (Contributor) commented Jan 30, 2024

❗ Release notes required


✅ Found changes and release notes in the following paths:

Change path: src/FSharp.Core
Release notes path: docs/release-notes/.FSharp.Core/8.0.300.md

@vzarytovskii (Member):

Benchmark results are weird, numbers before and after are within the statistical error. What about allocations? We shouldn't be boxing as much now?

@psfinaki (Member, Author) commented Feb 8, 2024

@vzarytovskii yeah those are not the right benchmarks for this PR. We had a session with Don today to figure out the right ones and I am already getting some 25-30% improvements there, will post soon - stay tuned.

@psfinaki (Member, Author):

/azp run


Azure Pipelines successfully started running 2 pipeline(s).

@manofstick (Contributor):

Is this PR breaking?
No. This is (supposed to be just) an optimization.

Well... It'll be calling IEquatable.Equals<>, not object.Equals, which is hence a breaking change - albeit I feel in a fashion that should be part of an evolution...

...and it really should be paired with the optimizer change (@TIHan I believe worked on one at some stage?). Otherwise you have the somewhat bizarre situation where a comparison of an external struct type such as a NodaTime.Instant in a generic context doesn't box and is fast, vs in a non-generic context where it does box and is slower... Unless that has been resolved already?

Anyway, glad to see this moving forward. My beard is somewhat grayer now than when I first started trying to improve comparison/equality in F#!!

@psfinaki (Member, Author):

/azp run


Azure Pipelines successfully started running 2 pipeline(s).

@psfinaki changed the title from "WIP: Faster equality in generic contexts" to "Faster equality in generic contexts" on Feb 13, 2024
@psfinaki (Member, Author):

Marking as ready to review since the CI is green finally :D

@psfinaki marked this pull request as ready for review on February 13, 2024 15:08
@psfinaki requested a review from a team as a code owner on February 13, 2024 15:08
@psfinaki (Member, Author):

And glad to see you here, @manofstick :) BTW thanks for the clear commits and comments to the commits in your original PRs, that really helped me a lot.

As for this one, well, it's not breaking in terms of NaN behavior and in terms of IL. So far. That's probably what we care about the most.

@vzarytovskii (Member) commented Feb 14, 2024

Since it touches an important part of the language, I would like to have three things:

  1. What changed, in the form of
    Given (generic) code: ...
    How it worked before: ...
    How does it work now: ...

    Unless it exists somewhere already. It should help everyone understand this change in the future in case of any issues.

  2. Some AOT compilation tests involving the changed equality. It can be a new project, like the trimming one we have. We should start having more AOT testing, and this is a good place to start: we know AOT was working with equality before, so it's crucial that it still works after the change.

  3. Can we have some compiler perf comparison/profiling of the compiler+fslib before and after the changes on something like FCS? It would be nice to have as a reference.

@psfinaki (Member, Author):

Thanks all for the reviews and the feedback.

This needs a thorough rereview with @dsyme, meanwhile I will be improving the description, clarifying the code and adding tests.

@dsyme (Contributor) commented Feb 14, 2024

...and it really should be paired with the optimizer change (@TIHan I believe worked on one at some stage?). Otherwise you have the somewhat bizarre situation where a comparison of an external struct type such as a NodaTime.Instant in a generic context doesn't box and is fast, vs in a non-generic context where it does box and is slower... Unless that has been resolved already?

Agreed that in principle these should go together, though can be done in a separate PR.

Well... It'll be calling IEquatable.Equals<>, not object.Equals, which is hence a breaking change - albeit I feel in a fashion that should be part of an evolution...

I think this specific change is OK - use IEquatable.Equals if it exists.

@dsyme (Contributor) commented Feb 14, 2024

Well... It'll be calling IEquatable.Equals<>, not object.Equals, which is hence a breaking change - albeit I feel in a fashion that should be part of an evolution...

@manofstick Could you remind me of the specific thing that causes the call to IEquatable.Equals<>, and which types it applies to. I assume it comes from EqualityComparer<'T>.Default and thus only applies to types that pass canUseDefaultEqualityComparer?

I think that is OK as a change. It is vanishingly rare to explicitly implement IEquatable.Equals<> on struct types in F# code (except for the automatic implementations provided by the compiler) - and even then it would always be the desired behaviour to have that implementation invoked.

@vzarytovskii (Member):

Also, does this need an addition to spec (rfc), since it changes how we do equality? Or rather shall the doc from the other PR live in the design repo?

@psfinaki (Member, Author):

@vzarytovskii I think it's a good idea to have that doc (from the other PR) in the design repo. This PR is meant to be an optimization and hence focuses on the implementation. I am looking at the spec while working on this, and my impression so far is that the spec doesn't go deep enough into the implementation for this change to misalign with it in any way. If we discover anything of that nature, we can make an RFC or discuss it further.

@psfinaki (Member, Author):

I added benchmarks for the value tuples, options and that stuff - the results are very convincing so far.
More to come soon!

@manofstick (Contributor):

@manofstick Could you remind me of the specific thing that causes the call to IEquatable.Equals<>, and which types it applies to. I assume it comes from EqualityComparer<'T>.Default and thus only applies to types that pass canUseDefaultEqualityComparer?

Yep, simple as that.

I think that is OK as a change. It is vanishingly rare to explicitly implement IEquatable.Equals<> on struct types in F# code (except for the automatic implementations provided by the compiler) - and even then it would always be the desired behaviour to have that implementation invoked.

Oh, completely agree. It was just that this was (as far as I understand it) the main reason why this change never went through in the past.

And just confirming, given your comment: this change, as implemented, would mean all types (that pass the check) use IEquatable<>.Equals in preference to object.Equals, not just value types (obviously struct types get the most benefit, as they avoid the boxing step, but ref types can also get a slight improvement, as they don't need to cast the input). But the check includes IsSealed, hence negating any potential inheritance issues. (Once again, from memory - I don't have a compiler here to test - but F# records are always created as Sealed.)

@psfinaki (Member, Author):

@manofstick just to be sure we're on the same page - which input cast do you mean? :)

@manofstick (Contributor) commented Feb 15, 2024

@psfinaki

...forgive my github comment coding; if memory serves this is right, but it's more than enough for the gist of things even if it's incorrect...

[<Sealed>] // only for sealed
type Blah(...) =
    ...

    override lhs.Equals (rhsObj:obj) = // (1)
        if obj.ReferenceEquals (lhs, rhsObj) then
            true
        else 
            match rhsObj with
            | :? Blah as rhs -> // (2)
                (lhs:>IEquatable<Blah>).Equals rhs // (4)
            | _ -> false

    interface IEquatable<Blah> with
        member lhs.Equals rhs = // (3)
            // actually do the check

Well, (1) would have been the entry point, and (2) would have been the cast (well, type check), but EqualityComparer<Blah>.Default will call (3) directly.

Most likely the actual equality check in these cases will be the significant component of the cost, so the saving is minimal, but as mentioned, this is just for complete understanding of what's going on here...

(4) ... just because I'm showing off, the IL that F# used to generate actually caused this operation to have a cost, but way back at the dawn of history I fixed that... ;-)

@dsyme (Contributor) commented Feb 15, 2024

@psfinaki I believe it should be possible to write a test case that detects the change, e.g.

[<Sealed, Struct>]
type Blah() =
    override lhs.Equals (rhsObj:obj) = true
    interface IEquatable<Blah> with
        member lhs.Equals rhs = failwith "bang"

let eq x y = 
    printfn "hello" // do this enough times to make sure no inlining
    printfn "hello" // do this enough times to make sure no inlining
    printfn "hello" // do this enough times to make sure no inlining
    printfn "hello" // do this enough times to make sure no inlining
    (x = y) // generic equality context

eq (Blah()) (Blah())  // true prior to this, exception now

I don't really think this needs a runtime library switch to disable, and we don't currently have such a mechanism to do those kinds of flags anyway

The equality spec should probably be updated to mention this case

@psfinaki (Member, Author) commented Feb 16, 2024

Yes, correct. For reference, I guess this is the minimal code to demo the behavior difference:

[<Struct; CustomEquality; NoComparison>]
type Blah =
    interface IEquatable<Blah> with
        member lhs.Equals rhs = failwith "bang"

let eq x y = 
    printfn "hello" // do this enough times to make sure no inlining
    ...
    x = y // generic equality context

let result = eq (Blah()) (Blah())  // false prior to this, exception now

Alright, so we've identified what breaks here and blessed it at the same time. Indeed, this is quite an esoteric scenario.

Good stuff, I will update the spec and the description to reflect this.

@vzarytovskii (Member):

Non-inlined stack trace examples before and after could be helpful for reviewers as well - could you add them please, if it's not too much work.

@psfinaki (Member, Author):

Yep I am going to add it to the PR description.

@psfinaki (Member, Author):

Okay so hopefully getting to the finish line here, I think I have added enough micro benchmarks and grouped them in the PR description.

Next week will add AOT testing, refresh the equality design doc, finalize the PR description and give this a final review with Don :)

@dsyme (Contributor) left a review:
Looking good!

@psfinaki (Member, Author) commented Mar 4, 2024

/azp run


Azure Pipelines successfully started running 2 pipeline(s).

@KevinRansom (Member):

Nice

@psfinaki merged commit 9877cfe into dotnet:main on Mar 4, 2024
31 checks passed
@psfinaki deleted the equality-3 branch on March 4, 2024 19:01
@manofstick (Contributor):

64 bit value types
This became marginally faster

I'm just relying on your words here, as I have not built this, but they should be significantly faster, and non-boxing (I'm not talking about my concern re tail calls here). I'm guessing that because they implement 'IStructuralEquality' they are using that path, and hence not being sped up. I thought my original PR handled these too; maybe it didn't, or maybe you didn't carry that code across - I don't know.

Anyway, what I would suggest is that the code for 'canUseDefault...' is used at compile time, and if default can be used then the 'IStructuralEquals' interface isn't implemented. This, once again, is a breaking change, but I think it's even lesser than the one introduced by this PR.

Another follow-up, if it hasn't been started yet, would be to use Comparer<>.Default for inequalities...

@psfinaki (Member, Author) commented Mar 5, 2024

Yes, I think we will get to those as well. Next bigger one will be devirt equality, then we'll probably look into comparison. There is a lot of useful stuff to pull out of your contributions :)
