-
Notifications
You must be signed in to change notification settings - Fork 789
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FSharp.Core: Map: optimize tree layout #10188
Conversation
Following discussion and POC code from dotnet#5360 (comment) Changes are very straightforward and do not touch public API: * Performance improves by a huge margin * Code size is smaller or same * Memory is same * No low level tricks, just simple code (see `asNode` comments for potential micro-optimizations, which are not visible after all; these comments are to be deleted) Benchmarks code is here: https://github.com/buybackoff/fsharp-benchmarks | Method | Job | BuildConfiguration | Size | Mean | Error | StdDev | Rank | Gen 0 | Gen 1 | Gen 2 | Allocated | Code Size | |------------ |------- |------------------- |--------- |------------------:|-----------------:|-----------------:|-----:|-----------:|---------:|--------:|------------:|----------:| | getItem | After | LocalBuild | 100 | 36.21 ns | 0.199 ns | 0.167 ns | 1 | - | - | - | - | 126 B | | getItem | Before | Default | 100 | 62.51 ns | 0.143 ns | 0.127 ns | 2 | - | - | - | - | 126 B | | getItem | After | LocalBuild | 10000 | 76.57 ns | 0.140 ns | 0.124 ns | 3 | - | - | - | - | 126 B | | getItem | Before | Default | 10000 | 120.02 ns | 0.182 ns | 0.170 ns | 4 | - | - | - | - | 126 B | | getItem | After | LocalBuild | 10000000 | 129.45 ns | 0.126 ns | 0.118 ns | 5 | - | - | - | - | 126 B | | getItem | Before | Default | 10000000 | 209.35 ns | 0.496 ns | 0.464 ns | 6 | - | - | - | - | 126 B | | | | | | | | | | | | | | | | containsKey | After | LocalBuild | 100 | 35.63 ns | 0.201 ns | 0.188 ns | 1 | - | - | - | - | 177 B | | containsKey | Before | Default | 100 | 64.01 ns | 0.351 ns | 0.328 ns | 2 | - | - | - | - | 276 B | | containsKey | After | LocalBuild | 10000 | 65.63 ns | 0.150 ns | 0.125 ns | 3 | - | - | - | - | 177 B | | containsKey | Before | Default | 10000 | 123.82 ns | 0.149 ns | 0.139 ns | 5 | - | - | - | - | 276 B | | containsKey | After | LocalBuild | 10000000 | 95.05 ns | 0.082 ns | 0.072 ns | 4 | - | - | - | - | 177 B | | containsKey | Before | Default | 10000000 | 204.39 ns | 0.338 ns | 0.282 ns | 6 | - | - | - | - | 276 B | | | | | | | | | | | | | | | | itemCount | After | LocalBuild | 100 | 231.39 ns | 0.406 ns | 0.360 ns | 1 | - | - | - | - | 96 B | | itemCount | Before | Default | 100 | 539.74 ns | 1.923 ns | 1.798 ns | 2 | - | - | - | - | 151 B | | itemCount | After | LocalBuild | 10000 | 33,160.50 ns | 194.709 ns | 182.131 ns | 3 | - | - | - | - | 96 B | | itemCount | Before | Default | 10000 | 63,074.34 ns | 138.682 ns | 129.724 ns | 4 | - | - | - | - | 151 B | | itemCount | After | LocalBuild | 10000000 | 62,332,911.90 ns | 252,973.481 ns | 224,254.402 ns | 5 | - | - | - | 148 B | 96 B | | itemCount | Before | Default | 10000000 | 94,745,625.56 ns | 205,640.690 ns | 192,356.429 ns | 6 | - | - | - | - | 151 B | | | | | | | | | | | | | | | | iterForeach | After | LocalBuild | 100 | 3,355.75 ns | 9.540 ns | 7.448 ns | 1 | 0.9727 | - | - | 6120 B | 291 B | | iterForeach | Before | Default | 100 | 3,866.56 ns | 10.148 ns | 8.996 ns | 2 | 0.9689 | - | - | 6120 B | 291 B | | iterForeach | After | LocalBuild | 10000 | 348,359.43 ns | 1,148.753 ns | 959.260 ns | 3 | 95.2148 | - | - | 600120 B | 291 B | | iterForeach | Before | Default | 10000 | 398,419.61 ns | 513.959 ns | 480.758 ns | 4 | 95.2148 | - | - | 600120 B | 291 B | | iterForeach | After | LocalBuild | 10000000 | 391,889,200.00 ns | 1,604,306.946 ns | 1,500,669.712 ns | 5 | 95000.0000 | - | - | 600000120 B | 321 B | | iterForeach | Before | Default | 10000000 | 445,099,028.57 ns | 1,380,498.715 ns | 1,223,776.153 ns | 6 | 95000.0000 | - | - | 600000120 B | 321 B | | | | | | | | | | | | | | | | addItem | After | LocalBuild | 100 | 181.25 ns | 0.961 ns | 0.899 ns | 1 | 0.0586 | 0.0003 | - | 369 B | 621 B | | addItem | Before | Default | 100 | 311.85 ns | 0.601 ns | 0.562 ns | 2 | 0.0586 | - | - | 369 B | 697 B | | addItem | After | LocalBuild | 10000 | 40,893.49 ns | 174.683 ns | 163.398 ns | 3 | 11.0156 | 3.2813 | - | 69324 B | 621 B | | addItem | Before | Default | 10000 | 71,746.33 ns | 130.309 ns | 121.891 ns | 4 | 11.0156 | 3.3594 | - | 69324 B | 697 B | | addItem | After | LocalBuild | 10000000 | 87,178,251.47 ns | 250,148.324 ns | 233,988.898 ns | 5 | 18680.0000 | 960.0000 | 10.0000 | 117146915 B | 621 B | | addItem | Before | Default | 10000000 | 146,799,424.80 ns | 286,531.195 ns | 268,021.458 ns | 6 | 18680.0000 | 960.0000 | 10.0000 | 117146915 B | 697 B | | | | | | | | | | | | | | | | removeItem | After | LocalBuild | 100 | 13.64 ns | 0.112 ns | 0.105 ns | 1 | 0.0064 | - | - | 40 B | 469 B | | removeItem | Before | Default | 100 | 16.38 ns | 0.071 ns | 0.067 ns | 2 | 0.0064 | - | - | 40 B | 519 B | | removeItem | After | LocalBuild | 10000 | 1,329.24 ns | 9.087 ns | 8.056 ns | 3 | 0.6372 | - | - | 4000 B | 469 B | | removeItem | Before | Default | 10000 | 1,607.21 ns | 5.566 ns | 5.206 ns | 4 | 0.6372 | - | - | 4000 B | 519 B | | removeItem | After | LocalBuild | 10000000 | 1,232,230.00 ns | 6,303.414 ns | 5,896.218 ns | 5 | 630.0000 | - | - | 4000000 B | 469 B | | removeItem | Before | Default | 10000000 | 1,801,088.33 ns | 8,945.674 ns | 8,367.789 ns | 6 | 630.0000 | - | - | 4000000 B | 519 B |
Not sure why CI tests fail with:
I tested locally the |
Wow, this is great! I've been wanting to take a look at this for some time. The numbers speak for themselves! About the test: running it locally in VS typically runs it against one framework only. You can run them from the commandline to find out if you can get it to fail locally. See |
@abelbraaksma I ran it from Rider from both Really could not afford running all the tests locally after every change, it's taking too long. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a quick style pass for now - the code looks good, strange that an internal error is getting thrown. Sadly, FSharp.Core isn't the easiest thing in the world to update :(
@cartermp ctors are done I've started the entire test suite locally after the first failure in CI.. and it's still running (35+ minutes), with some failures in random places, seemingly unrelated to my changes. At least directly. Maybe there are some deeper indirect checks that depend on the layout. Without help I can't understand what's going on with the tests. Yet all the test failures point to |
Yeah, looking at the code this does feel like it is related to the changes. Subtle though. Thanks for looking into it, and thanks for the proposed changes so far! This could be really great. We also use the Map data structure very heavily internally, so this would improve the F# compiler itself, especially for tooling situations. |
@cartermp if this goes well, I will follow up with 2-3 more things:
Hope to make all tests passing over the weekend. |
there was a typo indeed, most CI tasks are OK. The one that failed "early" (27 minutes) shows timeouts in logs. The one remaining is running for more than an hour already. |
IIRC there are two different map implementations in the source. If this one is green we may want to optimize the other as well. |
@forki where is the second one? why it's not possible to use a single implementation? |
Who could trigger a re-run of the failed task? (Or how could I do that without re-running the entire suite?) |
the other one is at: https://github.com/dotnet/fsharp/blob/main/src/utils/TaggedCollections.fs#L702 I don't know all the historical details, but one is used inside the compiler and the other one is in FSharp.Core for user code. |
So this means that what @cartermp mentioned:
requires updates to the internal one and |
well IIRC it's not 100% consistent. AFAIK later compiler phases use the one from FSharp.Core as well. But yes, if you want to make the compiler itself faster, then this internal Map needs to be touched as well |
I rarely do that. If you have failing tests, you probably didn't pass the
Quote from the test guide (which I've recently updated as I wanted to make it easier to run tests):
I've worked a lot with the FSharp.Core tests, I can have a look if they continue to fail. It's also possible to add a filter to locally running tests, but that's not yet in the test guide and non trivial to get right. I agree that it's annoying to wait so long for tests, that's why many people only run them after sending in a PR. Maybe with @vzarytovskii test improvements we can mark long running tests so that they can be skipped unless specified on the commandline. |
Could we extract MapTree module to a separate file, add all needed stuff to the common implementation from the two places, and include that module to two |
@abelbraaksma I was running local tests from Rider GUI, it works normally, the only thing is build and test times are very slow (not related to Rider). |
yes that's probably possible, but I remember @dsyme saying that he was actually preferring to keep it as two copies. Mostly because both things may be optimized differently in the future. |
@buybackoff in order to get the changes in I recommend to put any possible changes to that internal map thing into a separate PR. Also: to make things more interesting we have similar situation for sets ;-) |
Yes, I've mentioned Set above. Should be as simple as Map. But before that I really want to see if this self-contained PR will be merged. And for the next thing I want to play with comparer optimization when a comparer is default. It could then be used in all 4 cases 2xMap/2xSet. |
Yes. Small steps are important. Otherwise we never get anything merged
Victor Baybekov <[email protected]> schrieb am Sa., 26. Sept. 2020,
12:17:
… in order to get the changes in I recommend to put any possible changes to
that internal map thing into a separate PR. Also: to make things more
interesting we have similar situation for sets ;-)
Yes, I've mentioned Set above. Should be as simple as Map.
But before that I really want to see if this self-contained PR will be
merged.
And for the next thing I want to play with comparer optimization when a
comparer is default. It could then be used in all 4 cases 2xMap/2xSet.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#10188 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAAOANEQQLEXML7ZAF3NEXDSHW5VDANCNFSM4R2O5NEQ>
.
|
It's green! 🎉 What is needed to merge it? |
Just a review by two folks :) |
The code looks nearly identical and it is likely a relic of the past:
This is no longer a problem since we stopped really caring that the latest and greatest FCS can be built on very old FSharp.Core binaries. It was never really that useful of a scenario. So the internal one can probably just be swapped out with this one once it is merged. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this looks good. Thanks a lot! Will need another signoff on this though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, but probably @TIHan or @KevinRansom should take a glance as well.
FYI, I had three more micro changes, but their impact is not very visible buybackoff@b167a47. Won't do anything with them. Just wanted to use inlined IL. Must admit, for this PR I was finally nerd-sniped by this feature (as a replacement for |
This is fantastic, thank you. I presume both the Set in FSharp.Core and the compiler Map implementations could be given a similar treatment in separate PRs? The compiler has separate implementations because it tracks the comparison function in the type logic, also for flexibility in changing representations for the particular quirks of the compiler loads (e.g. in case we found we couldn't change the FSHarp.Core ones for compat reasons, but could change the internal compiler ones). It's just part of the cost of maintaining at least some degree of agility. |
Yes. If such PRs are merged easily I could do that. I was afraid there could be a long discussion on style, usage of null, inheritance, and requests for more and more tests, etc. Thank you for approving it quickly without all of that :) 3 more PRs will be with exactly same changes. |
#if-protected tracing was not updated properly in dotnet#10188 Also synchronize with Set: tolerance and mk max height calculation
Same changes as in dotnet#10188 | Method | Job | BuildConfiguration | Size | Mean | Error | StdDev | Rank | Gen 0 | Gen 1 | Gen 2 | Allocated | Code Size | |------------ |------- |------------------- |------ |--------------:|-------------:|--------------:|-----:|--------:|-------:|------:|----------:|----------:| | containsKey | After | LocalBuild | 100 | 30.48 ns | 0.319 ns | 0.367 ns | 1 | - | - | - | - | 177 B | | containsKey | Before | Default | 100 | 46.77 ns | 0.172 ns | 0.199 ns | 2 | - | - | - | - | 261 B | | containsKey | After | LocalBuild | 10000 | 56.06 ns | 0.358 ns | 0.412 ns | 3 | - | - | - | - | 177 B | | containsKey | Before | Default | 10000 | 92.14 ns | 0.314 ns | 0.362 ns | 4 | - | - | - | - | 261 B | | | | | | | | | | | | | | | | isSubsetOf | After | LocalBuild | 100 | 1,003.26 ns | 7.438 ns | 8.565 ns | 1 | 0.0040 | - | - | 32 B | 80 B | | isSubsetOf | Before | Default | 100 | 1,464.29 ns | 10.892 ns | 12.106 ns | 2 | - | - | - | 32 B | 80 B | | isSubsetOf | After | LocalBuild | 10000 | 301,895.28 ns | 3,344.640 ns | 3,851.692 ns | 3 | - | - | - | 34 B | 80 B | | isSubsetOf | Before | Default | 10000 | 418,711.41 ns | 9,323.778 ns | 10,737.277 ns | 4 | - | - | - | 34 B | 80 B | | | | | | | | | | | | | | | | maxItem | After | LocalBuild | 100 | 14.99 ns | 0.050 ns | 0.058 ns | 1 | 0.0038 | - | - | 24 B | 218 B | | maxItem | Before | Default | 100 | 31.11 ns | 0.067 ns | 0.072 ns | 3 | 0.0037 | - | - | 24 B | 218 B | | maxItem | After | LocalBuild | 10000 | 23.80 ns | 0.159 ns | 0.183 ns | 2 | 0.0037 | - | - | 24 B | 218 B | | maxItem | Before | Default | 10000 | 52.40 ns | 0.135 ns | 0.156 ns | 4 | 0.0037 | - | - | 24 B | 218 B | | | | | | | | | | | | | | | | itemCount | After | LocalBuild | 100 | 200.70 ns | 0.847 ns | 0.870 ns | 1 | - | - | - | - | 96 B | | itemCount | Before | Default | 100 | 419.93 ns | 0.937 ns | 1.079 ns | 2 | - | - | - | - | 138 B | | itemCount | After | LocalBuild | 10000 | 25,581.85 ns | 188.342 ns | 216.895 ns | 3 | - | - | - | - | 96 B | | itemCount | Before | Default | 10000 | 46,754.89 ns | 202.769 ns | 225.377 ns | 4 | - | - | - | - | 138 B | | | | | | | | | | | | | | | | iterForeach | After | LocalBuild | 100 | 2,881.03 ns | 7.049 ns | 7.835 ns | 1 | 0.9660 | - | - | 6120 B | 280 B | | iterForeach | Before | Default | 100 | 3,608.70 ns | 12.932 ns | 14.892 ns | 2 | 0.9644 | - | - | 6120 B | 280 B | | iterForeach | After | LocalBuild | 10000 | 314,901.59 ns | 1,778.653 ns | 1,976.968 ns | 3 | 95.0000 | - | - | 600128 B | 280 B | | iterForeach | Before | Default | 10000 | 381,488.32 ns | 914.308 ns | 1,016.251 ns | 4 | 94.5122 | - | - | 600127 B | 280 B | | | | | | | | | | | | | | | | addItem | After | LocalBuild | 100 | 193.98 ns | 0.587 ns | 0.652 ns | 1 | 0.0500 | - | - | 317 B | 499 B | | addItem | Before | Default | 100 | 291.34 ns | 1.804 ns | 1.931 ns | 2 | 0.0501 | - | - | 317 B | 542 B | | addItem | After | LocalBuild | 10000 | 43,575.99 ns | 216.915 ns | 249.800 ns | 3 | 9.2188 | 2.8125 | - | 58637 B | 499 B | | addItem | Before | Default | 10000 | 62,096.44 ns | 205.697 ns | 228.632 ns | 4 | 9.1667 | 2.7083 | - | 58637 B | 542 B | | | | | | | | | | | | | | | | removeItem | After | LocalBuild | 100 | 12.05 ns | 0.055 ns | 0.061 ns | 1 | 0.0064 | - | - | 40 B | 447 B | | removeItem | Before | Default | 100 | 14.71 ns | 0.248 ns | 0.285 ns | 2 | 0.0064 | - | - | 40 B | 500 B | | removeItem | After | LocalBuild | 10000 | 1,183.65 ns | 9.752 ns | 11.231 ns | 3 | 0.6343 | - | - | 4000 B | 447 B | | removeItem | Before | Default | 10000 | 1,484.44 ns | 13.358 ns | 14.848 ns | 4 | 0.6368 | - | - | 4000 B | 500 B |
#if-protected tracing was not updated properly in dotnet#10188 Also synchronize with Set: tolerance and mk max height calculation
Same changes as in dotnet#10188 | Method | Job | BuildConfiguration | Size | Mean | Error | StdDev | Rank | Gen 0 | Gen 1 | Gen 2 | Allocated | Code Size | |------------ |------- |------------------- |------ |--------------:|-------------:|--------------:|-----:|--------:|-------:|------:|----------:|----------:| | containsKey | After | LocalBuild | 100 | 30.48 ns | 0.319 ns | 0.367 ns | 1 | - | - | - | - | 177 B | | containsKey | Before | Default | 100 | 46.77 ns | 0.172 ns | 0.199 ns | 2 | - | - | - | - | 261 B | | containsKey | After | LocalBuild | 10000 | 56.06 ns | 0.358 ns | 0.412 ns | 3 | - | - | - | - | 177 B | | containsKey | Before | Default | 10000 | 92.14 ns | 0.314 ns | 0.362 ns | 4 | - | - | - | - | 261 B | | | | | | | | | | | | | | | | isSubsetOf | After | LocalBuild | 100 | 1,003.26 ns | 7.438 ns | 8.565 ns | 1 | 0.0040 | - | - | 32 B | 80 B | | isSubsetOf | Before | Default | 100 | 1,464.29 ns | 10.892 ns | 12.106 ns | 2 | - | - | - | 32 B | 80 B | | isSubsetOf | After | LocalBuild | 10000 | 301,895.28 ns | 3,344.640 ns | 3,851.692 ns | 3 | - | - | - | 34 B | 80 B | | isSubsetOf | Before | Default | 10000 | 418,711.41 ns | 9,323.778 ns | 10,737.277 ns | 4 | - | - | - | 34 B | 80 B | | | | | | | | | | | | | | | | maxItem | After | LocalBuild | 100 | 14.99 ns | 0.050 ns | 0.058 ns | 1 | 0.0038 | - | - | 24 B | 218 B | | maxItem | Before | Default | 100 | 31.11 ns | 0.067 ns | 0.072 ns | 3 | 0.0037 | - | - | 24 B | 218 B | | maxItem | After | LocalBuild | 10000 | 23.80 ns | 0.159 ns | 0.183 ns | 2 | 0.0037 | - | - | 24 B | 218 B | | maxItem | Before | Default | 10000 | 52.40 ns | 0.135 ns | 0.156 ns | 4 | 0.0037 | - | - | 24 B | 218 B | | | | | | | | | | | | | | | | itemCount | After | LocalBuild | 100 | 200.70 ns | 0.847 ns | 0.870 ns | 1 | - | - | - | - | 96 B | | itemCount | Before | Default | 100 | 419.93 ns | 0.937 ns | 1.079 ns | 2 | - | - | - | - | 138 B | | itemCount | After | LocalBuild | 10000 | 25,581.85 ns | 188.342 ns | 216.895 ns | 3 | - | - | - | - | 96 B | | itemCount | Before | Default | 10000 | 46,754.89 ns | 202.769 ns | 225.377 ns | 4 | - | - | - | - | 138 B | | | | | | | | | | | | | | | | iterForeach | After | LocalBuild | 100 | 2,881.03 ns | 7.049 ns | 7.835 ns | 1 | 0.9660 | - | - | 6120 B | 280 B | | iterForeach | Before | Default | 100 | 3,608.70 ns | 12.932 ns | 14.892 ns | 2 | 0.9644 | - | - | 6120 B | 280 B | | iterForeach | After | LocalBuild | 10000 | 314,901.59 ns | 1,778.653 ns | 1,976.968 ns | 3 | 95.0000 | - | - | 600128 B | 280 B | | iterForeach | Before | Default | 10000 | 381,488.32 ns | 914.308 ns | 1,016.251 ns | 4 | 94.5122 | - | - | 600127 B | 280 B | | | | | | | | | | | | | | | | addItem | After | LocalBuild | 100 | 193.98 ns | 0.587 ns | 0.652 ns | 1 | 0.0500 | - | - | 317 B | 499 B | | addItem | Before | Default | 100 | 291.34 ns | 1.804 ns | 1.931 ns | 2 | 0.0501 | - | - | 317 B | 542 B | | addItem | After | LocalBuild | 10000 | 43,575.99 ns | 216.915 ns | 249.800 ns | 3 | 9.2188 | 2.8125 | - | 58637 B | 499 B | | addItem | Before | Default | 10000 | 62,096.44 ns | 205.697 ns | 228.632 ns | 4 | 9.1667 | 2.7083 | - | 58637 B | 542 B | | | | | | | | | | | | | | | | removeItem | After | LocalBuild | 100 | 12.05 ns | 0.055 ns | 0.061 ns | 1 | 0.0064 | - | - | 40 B | 447 B | | removeItem | Before | Default | 100 | 14.71 ns | 0.248 ns | 0.285 ns | 2 | 0.0064 | - | - | 40 B | 500 B | | removeItem | After | LocalBuild | 10000 | 1,183.65 ns | 9.752 ns | 11.231 ns | 3 | 0.6343 | - | - | 4000 B | 447 B | | removeItem | Before | Default | 10000 | 1,484.44 ns | 13.358 ns | 14.848 ns | 4 | 0.6368 | - | - | 4000 B | 500 B |
#if-protected tracing was not updated properly in #10188 Also synchronize with Set: tolerance and mk max height calculation
* FSharp.Core: Set: optimize tree layout Same changes as in #10188 | Method | Job | BuildConfiguration | Size | Mean | Error | StdDev | Rank | Gen 0 | Gen 1 | Gen 2 | Allocated | Code Size | |------------ |------- |------------------- |------ |--------------:|-------------:|--------------:|-----:|--------:|-------:|------:|----------:|----------:| | containsKey | After | LocalBuild | 100 | 30.48 ns | 0.319 ns | 0.367 ns | 1 | - | - | - | - | 177 B | | containsKey | Before | Default | 100 | 46.77 ns | 0.172 ns | 0.199 ns | 2 | - | - | - | - | 261 B | | containsKey | After | LocalBuild | 10000 | 56.06 ns | 0.358 ns | 0.412 ns | 3 | - | - | - | - | 177 B | | containsKey | Before | Default | 10000 | 92.14 ns | 0.314 ns | 0.362 ns | 4 | - | - | - | - | 261 B | | | | | | | | | | | | | | | | isSubsetOf | After | LocalBuild | 100 | 1,003.26 ns | 7.438 ns | 8.565 ns | 1 | 0.0040 | - | - | 32 B | 80 B | | isSubsetOf | Before | Default | 100 | 1,464.29 ns | 10.892 ns | 12.106 ns | 2 | - | - | - | 32 B | 80 B | | isSubsetOf | After | LocalBuild | 10000 | 301,895.28 ns | 3,344.640 ns | 3,851.692 ns | 3 | - | - | - | 34 B | 80 B | | isSubsetOf | Before | Default | 10000 | 418,711.41 ns | 9,323.778 ns | 10,737.277 ns | 4 | - | - | - | 34 B | 80 B | | | | | | | | | | | | | | | | maxItem | After | LocalBuild | 100 | 14.99 ns | 0.050 ns | 0.058 ns | 1 | 0.0038 | - | - | 24 B | 218 B | | maxItem | Before | Default | 100 | 31.11 ns | 0.067 ns | 0.072 ns | 3 | 0.0037 | - | - | 24 B | 218 B | | maxItem | After | LocalBuild | 10000 | 23.80 ns | 0.159 ns | 0.183 ns | 2 | 0.0037 | - | - | 24 B | 218 B | | maxItem | Before | Default | 10000 | 52.40 ns | 0.135 ns | 0.156 ns | 4 | 0.0037 | - | - | 24 B | 218 B | | | | | | | | | | | | | | | | itemCount | After | LocalBuild | 100 | 200.70 ns | 0.847 ns | 0.870 ns | 1 | - | - | - | - | 96 B | | itemCount | Before | Default | 100 | 419.93 ns | 0.937 ns | 1.079 ns | 2 | - | - | - | - | 138 B | | itemCount | After | LocalBuild | 10000 | 25,581.85 ns | 188.342 ns | 216.895 ns | 3 | - | - | - | - | 96 B | | itemCount | Before | Default | 10000 | 46,754.89 ns | 202.769 ns | 225.377 ns | 4 | - | - | - | - | 138 B | | | | | | | | | | | | | | | | iterForeach | After | LocalBuild | 100 | 2,881.03 ns | 7.049 ns | 7.835 ns | 1 | 0.9660 | - | - | 6120 B | 280 B | | iterForeach | Before | Default | 100 | 3,608.70 ns | 12.932 ns | 14.892 ns | 2 | 0.9644 | - | - | 6120 B | 280 B | | iterForeach | After | LocalBuild | 10000 | 314,901.59 ns | 1,778.653 ns | 1,976.968 ns | 3 | 95.0000 | - | - | 600128 B | 280 B | | iterForeach | Before | Default | 10000 | 381,488.32 ns | 914.308 ns | 1,016.251 ns | 4 | 94.5122 | - | - | 600127 B | 280 B | | | | | | | | | | | | | | | | addItem | After | LocalBuild | 100 | 193.98 ns | 0.587 ns | 0.652 ns | 1 | 0.0500 | - | - | 317 B | 499 B | | addItem | Before | Default | 100 | 291.34 ns | 1.804 ns | 1.931 ns | 2 | 0.0501 | - | - | 317 B | 542 B | | addItem | After | LocalBuild | 10000 | 43,575.99 ns | 216.915 ns | 249.800 ns | 3 | 9.2188 | 2.8125 | - | 58637 B | 499 B | | addItem | Before | Default | 10000 | 62,096.44 ns | 205.697 ns | 228.632 ns | 4 | 9.1667 | 2.7083 | - | 58637 B | 542 B | | | | | | | | | | | | | | | | removeItem | After | LocalBuild | 100 | 12.05 ns | 0.055 ns | 0.061 ns | 1 | 0.0064 | - | - | 40 B | 447 B | | removeItem | Before | Default | 100 | 14.71 ns | 0.248 ns | 0.285 ns | 2 | 0.0064 | - | - | 40 B | 500 B | | removeItem | After | LocalBuild | 10000 | 1,183.65 ns | 9.752 ns | 11.231 ns | 3 | 0.6343 | - | - | 4000 B | 447 B | | removeItem | Before | Default | 10000 | 1,484.44 ns | 13.358 ns | 14.848 ns | 4 | 0.6368 | - | - | 4000 B | 500 B | * FSharp.Core: Set: fix possible NRE in compareStacks isEmpty check should be at the top, both x1 and x2 must be not empty to then safely match one them * FSharp.Core: Set: formatting
* FSharp.Core: Map: optimize tree layout Following discussion and POC code from dotnet#5360 (comment) Changes are very straightforward and do not touch public API: * Performance improves by a huge margin * Code size is smaller or same * Memory is same * No low level tricks, just simple code (see `asNode` comments for potential micro-optimizations, which are not visible after all; these comments are to be deleted) Benchmarks code is here: https://github.com/buybackoff/fsharp-benchmarks | Method | Job | BuildConfiguration | Size | Mean | Error | StdDev | Rank | Gen 0 | Gen 1 | Gen 2 | Allocated | Code Size | |------------ |------- |------------------- |--------- |------------------:|-----------------:|-----------------:|-----:|-----------:|---------:|--------:|------------:|----------:| | getItem | After | LocalBuild | 100 | 36.21 ns | 0.199 ns | 0.167 ns | 1 | - | - | - | - | 126 B | | getItem | Before | Default | 100 | 62.51 ns | 0.143 ns | 0.127 ns | 2 | - | - | - | - | 126 B | | getItem | After | LocalBuild | 10000 | 76.57 ns | 0.140 ns | 0.124 ns | 3 | - | - | - | - | 126 B | | getItem | Before | Default | 10000 | 120.02 ns | 0.182 ns | 0.170 ns | 4 | - | - | - | - | 126 B | | getItem | After | LocalBuild | 10000000 | 129.45 ns | 0.126 ns | 0.118 ns | 5 | - | - | - | - | 126 B | | getItem | Before | Default | 10000000 | 209.35 ns | 0.496 ns | 0.464 ns | 6 | - | - | - | - | 126 B | | | | | | | | | | | | | | | | containsKey | After | LocalBuild | 100 | 35.63 ns | 0.201 ns | 0.188 ns | 1 | - | - | - | - | 177 B | | containsKey | Before | Default | 100 | 64.01 ns | 0.351 ns | 0.328 ns | 2 | - | - | - | - | 276 B | | containsKey | After | LocalBuild | 10000 | 65.63 ns | 0.150 ns | 0.125 ns | 3 | - | - | - | - | 177 B | | containsKey | Before | Default | 10000 | 123.82 ns | 0.149 ns | 0.139 ns | 5 | - | - | - | - | 276 B | | containsKey | After | LocalBuild | 10000000 | 95.05 ns | 0.082 ns | 0.072 ns | 4 | - | - | - | - | 177 B | | containsKey | Before | Default | 10000000 | 204.39 ns | 0.338 ns | 0.282 ns | 6 | - | - | - | - | 276 B | | | | | | | | | | | | | | | | itemCount | After | LocalBuild | 100 | 231.39 ns | 0.406 ns | 0.360 ns | 1 | - | - | - | - | 96 B | | itemCount | Before | Default | 100 | 539.74 ns | 1.923 ns | 1.798 ns | 2 | - | - | - | - | 151 B | | itemCount | After | LocalBuild | 10000 | 33,160.50 ns | 194.709 ns | 182.131 ns | 3 | - | - | - | - | 96 B | | itemCount | Before | Default | 10000 | 63,074.34 ns | 138.682 ns | 129.724 ns | 4 | - | - | - | - | 151 B | | itemCount | After | LocalBuild | 10000000 | 62,332,911.90 ns | 252,973.481 ns | 224,254.402 ns | 5 | - | - | - | 148 B | 96 B | | itemCount | Before | Default | 10000000 | 94,745,625.56 ns | 205,640.690 ns | 192,356.429 ns | 6 | - | - | - | - | 151 B | | | | | | | | | | | | | | | | iterForeach | After | LocalBuild | 100 | 3,355.75 ns | 9.540 ns | 7.448 ns | 1 | 0.9727 | - | - | 6120 B | 291 B | | iterForeach | Before | Default | 100 | 3,866.56 ns | 10.148 ns | 8.996 ns | 2 | 0.9689 | - | - | 6120 B | 291 B | | iterForeach | After | LocalBuild | 10000 | 348,359.43 ns | 1,148.753 ns | 959.260 ns | 3 | 95.2148 | - | - | 600120 B | 291 B | | iterForeach | Before | Default | 10000 | 398,419.61 ns | 513.959 ns | 480.758 ns | 4 | 95.2148 | - | - | 600120 B | 291 B | | iterForeach | After | LocalBuild | 10000000 | 391,889,200.00 ns | 1,604,306.946 ns | 1,500,669.712 ns | 5 | 95000.0000 | - | - | 600000120 B | 321 B | | iterForeach | Before | Default | 10000000 | 445,099,028.57 ns | 1,380,498.715 ns | 1,223,776.153 ns | 6 | 95000.0000 | - | - | 600000120 B | 321 B | | | | | | | | | | | | | | | | addItem | After | LocalBuild | 100 | 181.25 ns | 0.961 ns | 0.899 ns | 1 | 0.0586 | 0.0003 | - | 369 B | 621 B | | addItem | Before | Default | 100 | 311.85 ns | 0.601 ns | 0.562 ns | 2 | 0.0586 | - | - | 369 B | 697 B | | addItem | After | LocalBuild | 10000 | 40,893.49 ns | 174.683 ns | 163.398 ns | 3 | 11.0156 | 3.2813 | - | 69324 B | 621 B | | addItem | Before | Default | 10000 | 71,746.33 ns | 130.309 ns | 121.891 ns | 4 | 11.0156 | 3.3594 | - | 69324 B | 697 B | | addItem | After | LocalBuild | 10000000 | 87,178,251.47 ns | 250,148.324 ns | 233,988.898 ns | 5 | 18680.0000 | 960.0000 | 10.0000 | 117146915 B | 621 B | | addItem | Before | Default | 10000000 | 146,799,424.80 ns | 286,531.195 ns | 268,021.458 ns | 6 | 18680.0000 | 960.0000 | 10.0000 | 117146915 B | 697 B | | | | | | | | | | | | | | | | removeItem | After | LocalBuild | 100 | 13.64 ns | 0.112 ns | 0.105 ns | 1 | 0.0064 | - | - | 40 B | 469 B | | removeItem | Before | Default | 100 | 16.38 ns | 0.071 ns | 0.067 ns | 2 | 0.0064 | - | - | 40 B | 519 B | | removeItem | After | LocalBuild | 10000 | 1,329.24 ns | 9.087 ns | 8.056 ns | 3 | 0.6372 | - | - | 4000 B | 469 B | | removeItem | Before | Default | 10000 | 1,607.21 ns | 5.566 ns | 5.206 ns | 4 | 0.6372 | - | - | 4000 B | 519 B | | removeItem | After | LocalBuild | 10000000 | 1,232,230.00 ns | 6,303.414 ns | 5,896.218 ns | 5 | 630.0000 | - | - | 4000000 B | 469 B | | removeItem | Before | Default | 10000000 | 1,801,088.33 ns | 8,945.674 ns | 8,367.789 ns | 6 | 630.0000 | - | - | 4000000 B | 519 B | * Simplify node ctors * FSharp.Core: Map: delete notes in asNode * FSharp.Core: Map: fix typo in spliceOutSuccessor * FSharp.Core: Map: remove unused open
#if-protected tracing was not updated properly in dotnet#10188 Also synchronize with Set: tolerance and mk max height calculation
* FSharp.Core: Set: optimize tree layout Same changes as in dotnet#10188 | Method | Job | BuildConfiguration | Size | Mean | Error | StdDev | Rank | Gen 0 | Gen 1 | Gen 2 | Allocated | Code Size | |------------ |------- |------------------- |------ |--------------:|-------------:|--------------:|-----:|--------:|-------:|------:|----------:|----------:| | containsKey | After | LocalBuild | 100 | 30.48 ns | 0.319 ns | 0.367 ns | 1 | - | - | - | - | 177 B | | containsKey | Before | Default | 100 | 46.77 ns | 0.172 ns | 0.199 ns | 2 | - | - | - | - | 261 B | | containsKey | After | LocalBuild | 10000 | 56.06 ns | 0.358 ns | 0.412 ns | 3 | - | - | - | - | 177 B | | containsKey | Before | Default | 10000 | 92.14 ns | 0.314 ns | 0.362 ns | 4 | - | - | - | - | 261 B | | | | | | | | | | | | | | | | isSubsetOf | After | LocalBuild | 100 | 1,003.26 ns | 7.438 ns | 8.565 ns | 1 | 0.0040 | - | - | 32 B | 80 B | | isSubsetOf | Before | Default | 100 | 1,464.29 ns | 10.892 ns | 12.106 ns | 2 | - | - | - | 32 B | 80 B | | isSubsetOf | After | LocalBuild | 10000 | 301,895.28 ns | 3,344.640 ns | 3,851.692 ns | 3 | - | - | - | 34 B | 80 B | | isSubsetOf | Before | Default | 10000 | 418,711.41 ns | 9,323.778 ns | 10,737.277 ns | 4 | - | - | - | 34 B | 80 B | | | | | | | | | | | | | | | | maxItem | After | LocalBuild | 100 | 14.99 ns | 0.050 ns | 0.058 ns | 1 | 0.0038 | - | - | 24 B | 218 B | | maxItem | Before | Default | 100 | 31.11 ns | 0.067 ns | 0.072 ns | 3 | 0.0037 | - | - | 24 B | 218 B | | maxItem | After | LocalBuild | 10000 | 23.80 ns | 0.159 ns | 0.183 ns | 2 | 0.0037 | - | - | 24 B | 218 B | | maxItem | Before | Default | 10000 | 52.40 ns | 0.135 ns | 0.156 ns | 4 | 0.0037 | - | - | 24 B | 218 B | | | | | | | | | | | | | | | | itemCount | After | LocalBuild | 100 | 200.70 ns | 0.847 ns | 0.870 ns | 1 | - | - | - | - | 96 B | | itemCount | Before | Default | 100 | 419.93 ns | 0.937 ns | 1.079 ns | 2 | - | - | - | - | 138 B | | itemCount | After | LocalBuild | 10000 | 25,581.85 ns | 188.342 ns | 216.895 ns | 3 | - | - | - | - | 96 B | | itemCount | Before | Default | 10000 | 46,754.89 ns | 202.769 ns | 225.377 ns | 4 | - | - | - | - | 138 B | | | | | | | | | | | | | | | | iterForeach | After | LocalBuild | 100 | 2,881.03 ns | 7.049 ns | 7.835 ns | 1 | 0.9660 | - | - | 6120 B | 280 B | | iterForeach | Before | Default | 100 | 3,608.70 ns | 12.932 ns | 14.892 ns | 2 | 0.9644 | - | - | 6120 B | 280 B | | iterForeach | After | LocalBuild | 10000 | 314,901.59 ns | 1,778.653 ns | 1,976.968 ns | 3 | 95.0000 | - | - | 600128 B | 280 B | | iterForeach | Before | Default | 10000 | 381,488.32 ns | 914.308 ns | 1,016.251 ns | 4 | 94.5122 | - | - | 600127 B | 280 B | | | | | | | | | | | | | | | | addItem | After | LocalBuild | 100 | 193.98 ns | 0.587 ns | 0.652 ns | 1 | 0.0500 | - | - | 317 B | 499 B | | addItem | Before | Default | 100 | 291.34 ns | 1.804 ns | 1.931 ns | 2 | 0.0501 | - | - | 317 B | 542 B | | addItem | After | LocalBuild | 10000 | 43,575.99 ns | 216.915 ns | 249.800 ns | 3 | 9.2188 | 2.8125 | - | 58637 B | 499 B | | addItem | Before | Default | 10000 | 62,096.44 ns | 205.697 ns | 228.632 ns | 4 | 9.1667 | 2.7083 | - | 58637 B | 542 B | | | | | | | | | | | | | | | | removeItem | After | LocalBuild | 100 | 12.05 ns | 0.055 ns | 0.061 ns | 1 | 0.0064 | - | - | 40 B | 447 B | | removeItem | Before | Default | 100 | 14.71 ns | 0.248 ns | 0.285 ns | 2 | 0.0064 | - | - | 40 B | 500 B | | removeItem | After | LocalBuild | 10000 | 1,183.65 ns | 9.752 ns | 11.231 ns | 3 | 0.6343 | - | - | 4000 B | 447 B | | removeItem | Before | Default | 10000 | 1,484.44 ns | 13.358 ns | 14.848 ns | 4 | 0.6368 | - | - | 4000 B | 500 B | * FSharp.Core: Set: fix possible NRE in compareStacks isEmpty check should be at the top, both x1 and x2 must be not empty to then safely match one them * FSharp.Core: Set: formatting
Following discussion and POC code from #5360 (comment)
Changes are very straightforward and do not touch public API:
asNode
comments for potential micro-optimizations, which are not visible after all; these comments are to be deleted)Benchmarks code is here: https://github.com/buybackoff/fsharp-benchmarks