-
Notifications
You must be signed in to change notification settings - Fork 789
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improved Map performance #10768
Improved Map performance #10768
Conversation
Oh and I had to reference |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a huge amount here, it's going to take a while to work through everything. I wonder is there a way that you can think of where we can have the old and the new implementations in FSharp.Core at the same time and and switch between them, whether using a preview switch or #defines or some other mechanism? It feels like a big risk taking it in one gulp without a preview.
@@ -222,6 +222,7 @@ | |||
</ItemGroup> | |||
|
|||
<ItemGroup> | |||
<PackageReference Include="System.ValueTuple" Version="4.4.0" /> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This shouldn't be necessary. Netstandard2.0 contains System.ValueTuple.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It still makes me a bit nervous to take a dependence on it, we have been avoiding it for so long in FSharp.Core but it is absolutely right to do so.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I got compiler errors when not referencing it, so I simply added it, we could certainly use KeyValuePair or some other struct for the implementation but I saw several other suggestions that would require struct-tuples so I didn't go through n removing it but I can certainly do so if you think it's the way to go.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The compile errors is surprising ... reference it, if you need to. I will take a look at why we don't get it vie netstandard2.0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Within the iDE (VS) I don't get compile-time errors when using a struct tuple, but maybe something is wrong in the build process?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will take a look, sometime.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey, this may have been a false alarm, i accidentally pulled master
first instead of main
, then switched and cherry-picked my map changes, so it might work without the reference after all.
I can look into that tomorrow (european tomorrow that is 😆)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay it really was useless, so I removed it again (would have been necessary for the net45 build)
@KevinRansom There's a lot of discussion to wade through here: fsharp/fslang-suggestions#940 But in general we should consider taking this and carefully review with @dsyme. The CPU and memory gains are amazing, and it would quite benefit our own perf scenarios since the compiler makes such heavy use of maps and sets. I profiled our compiler last Friday and it bubbled in the profiling session (as opposed to before when other things simply dominated traces). |
@cartermp , yeah it looks great, I was wondering if there was a way to reduce the risk, it may well be that there is not. |
@KevinRansom please note that there's quite a lot of unused code in this PR (mostly due to new combinators) and I'll definetly clean that up when you consider merging that. I can also do that sooner if you think it makes reviewing the code easier. Nonetheless I think combinators like |
@krauthaufen --- It would certainly make the code review easier, if there was just the code we want to merge. Keep the extra code somewhere safe though, I'm sure it's important :-) |
* removed System.ValueTuple reference
@KevinRansom i reduced the file by ~1000 lines. |
@krauthaufen , thank you my friend. |
@krauthaufen , there is a small conflict. |
# Conflicts: # src/fsharp/FSharp.Core/map.fs
@KevinRansom working on it, btw. I noticed that I didn't include proper error messages in my exceptions, will do so after the whole christmas-thing... |
Hey, I fixed the remaining problems and the implementation is now ready for reviews. I'm of course available for questions/suggestions/etc. |
maybe remove the WIP then. |
Good point 😆 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A lot has changed, but this is implementation details and I'm confident the existing tests cover everything. From my view, this looks really good. The perf benchmarks look great too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great!
Once this is in we'll also want to update the TaggedCollections module similar to #10192 - yes, unfortunately, the compiler itself actually uses a different set of types and APIs as per #10188 (comment)
Would you be interested in making that update in a separate PR? If not it's fine, we can take care of it.
Regarding this:
A separate PR would be fantastic, yes |
@cartermp I can certainly update the TaggedCollections too, but maybe we should wait until Set/Map are merged?
I'll start to put a Set-PR together then, but first I'd like to validate that also the Set implementation is really faster. |
Hey @buybackoff since you were the one who optimized this last time: could you please take a look? |
Hi @forki
I could only repeat one of the top comments: @KevinRansom
The huge single file changes make me scary. My changes were one-evening (map only) free lunch with keeping even code layout with pipes. There were some optimization left, e.g. devirtualizing comparer calls (comparer is always the default one for the main public Map/Set). But initial thoughts:
These changes will take a while just to understand what's going on. I have a question: In the benchmarks here on a 4Hz machine So do we really measure the performance of the version 5.0 here? Why not to use the exact same benchmark setup and extend it with the additional items? Update: I linked to server GC results, the math for workstation GC is even closer. |
Well... I have just downloaded https://www.nuget.org/packages/FSharp.Core/5.0.0 and opened it with dotPeek 🤔🤦♂️ Actually my changes didn't make it to 5.0. |
Oh my, so we actually know nothing about that, do we? 🤦♂️ I think most of my code will perform more or less equal to yours but I think Cheers |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Putting up a block for now until we figure out FSharp.Core changes. Unfortunately, yes, it's a complicated process and we slip up sometimes.
Numbers vs current BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19042
Intel Core i7-8700 CPU 3.20GHz (Coffee Lake), 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=5.0.200-preview.20601.7
[Host] : .NET Core 5.0.1 (CoreCLR 5.0.120.57516, CoreFX 5.0.120.57516), X64 RyuJIT DEBUG
Main : .NET Core 5.0.1 (CoreCLR 5.0.120.57516, CoreFX 5.0.120.57516), X64 RyuJIT DEBUG
Job=Main MaxRelativeError=0.01 BuildConfiguration=LocalBuild
IterationCount=10 WarmupCount=1
|
Hey, cool that you did some benchmarks, i will add mine as soon as i have time... So if that turns out to be true I'll just submit that as a PR. |
@buybackoff your benchmark job is running in DEBUG:
I'm not sure why that happens -- the job should've be run in a child process in release mode (no DEBUG at the end of the line) |
It's this dotnet/BenchmarkDotNet#1493. Using the workaround doesn't change results more than a noise. |
Hey, I finally got to run the benchmarks and the results (for count=100) seem consistent with what we've seen so far. After all I see two options here:
What do you think?
|
Huge improvement here are only for bulk creation, at the cost of array creation/copying and pre-sorting data to keep a tree balanced during addition. That could be done using So, with the arguments above, I'm for option 2 not only because it's mine and have never had a daylight 😄 |
Nonetheless i totally get your point and I'm absolutely fine with keeping the current implementation. I actually implemented it for having additional functions and then stumbled upon the performance gains. Cheers |
The virtual calls in many cases are after
From the very old discussion the benchmarks were faster: #5360 (comment)
I mean methods like |
Here's my (also very old) HashMap comparison https://github.com/krauthaufen/ImmutableHashCollections which also included the Map (which of course is a different datastructure) for reference which showed that add wasn't too good in the ImmutableDictionary, however the lookup was terribly fast.. |
Unfortunately there is no simple way to describe what can be devirtualized; it depends a lot on the generated IL, what other optimizations the jit can do, etc. I can look if you like but it may be a day or two before I have time. If you know how to use a checked jit you can enable jit dumps and see what the jit itself has to say. Also, we are always looking for good F# benchmarks, please consider contributing some to https://github.com/dotnet/performance. |
@AndyAyersMS
I haven't even profiled the new implementation, so cannot even say if it was devirtualized or not. It's very easy to see just in dotTrace. For this discussion a very relevant to your work is |
Improved the performance of the current version at small memory cost here: #10845 |
@krauthaufen in light of the discussion here, how possible is it to extract the improvements you've made (e.g. to |
Hey, after doing yet another map implementation (
When maintaining a count per inner-node (allowing for
However note that the If you're still interested in the (now smaller) improvements I can create a new PR (or adapt this one). Note that I have something in mind for keeping the overall count (not per inner node but globally per map) that should be quite efficient. |
Interesting. I think it'd be interesting to look at a fresh PR with your findings, with measurements against latest |
The improvements to Map have only made it to 5.0.1 nuget package, not 5.0.0 see #10768 (comment) cc @buybackoff @krauthaufen
Big changes like this scare us to death, there are a lot of conflicts, and not much traffic for 6 months. Could you possibly find a way to parcel up the changes into smaller PRs that are easier to digest and less scary. Closing for now Thanks Kevin |
PR for fslang-suggestion/940 with a faster
Map<'Key, 'Value>
implementation. The original code can be found here.Creating this PR raised two questions from my side:
Cheers