-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace LINQ's custom Set with HashSet #49543
Conversation
Tagging subscribers to this area: @eiriktsarpalis Issue DetailsThis was discussed in #42760 (comment), but I'm seeing very different numbers. There were significant improvements to Using @adamsitnik's benchmarks from the above comment, but also adding 16 and 1100 sizes:
As can be seen, almost every test gets measurably faster, in some cases 2x, and memory usage can either be better or worse depending on the use case. And we get to get rid of a custom hash set implementations. Yay. Closes #47173
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ec29094
to
ffc5696
Compare
ffc5696
to
30e0e5f
Compare
Closing in favor of #49591. It seems like something is wrong with this PR. |
This was discussed in #42760 (comment), but I'm seeing very different numbers. There were significant improvements to
HashSet<T>
in .NET 5, so maybe that's contributing, and there have also been a few tweaks since, or maybe my changes in LINQ are subtly different from what was tested there. In my tests, throughput is generally always better. Allocation was also called out as an issue there, but the main allocation difference between the internalSet<T>
and the publicHashSet<T>
is in their resizing algorithm. The internal set starts at a capacity of 7, and when it needs to grow it does(size*2)+1
. In contrast,HashSet<T>
doesRoundUpToNearestPrime(size*2)
. This means that depending on the capacity selected, one or the other may end up allocating more memory. For example, if you add 1000 items, the custom Set will end up with a capacity of 1023 andHashSet<T>
will end up with a capacity of 1597. But if you add 1100 items, the custom Set will end up with a capacity of 2047 whileHashSet<T>
will still end up with a capacity of 1597. We can't predict what sizes will be used, and since which allocates more depends on the exact sizes used, I don't think it's something we need to make a decision here based on... better to eliminate the proliferation of custom implementations and just use the public one we encourage everyone to use.Using @adamsitnik's benchmarks from the above comment, but also adding 16 and 1100 sizes:
As can be seen, almost every test gets measurably faster, in some cases 2x, and memory usage can either be better or worse depending on the use case. And we get to get rid of a custom hash set implementations. Yay.
Closes #47173