-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rewrite HashSet<T>'s implementation based on Dictionary<T>'s #37180
Conversation
Tagging subscribers to this area: @eiriktsarpalis |
src/libraries/System.Private.CoreLib/src/System/Globalization/DateTimeFormatInfoScanner.cs
Outdated
Show resolved
Hide resolved
src/coreclr/src/System.Private.CoreLib/src/System/RuntimeType.CoreCLR.cs
Outdated
Show resolved
Hide resolved
I was adjusting it based on a the asm output from the JIT; however that has improved considerably in 5.0 so trade off may have changed. Also the Dictionary in the core find is a ref return and its |
src/libraries/System.Private.CoreLib/src/Resources/Strings.resx
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Globalization/DateTimeFormatInfoScanner.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Collections/Generic/HashSetEqualityComparer.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Collections/Generic/HashSet.cs
Show resolved
Hide resolved
{ | ||
Debug.Assert(!comparer.Equals(_slots[i].value, value)); | ||
// If we hit the collision threshold we'll need to switch to the comparer which is using randomized string hashing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have dictionary tests that go down the "secure string comparer" paths? I thought we did but can't immediately find them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't find any either. It's probably worth adding some, but it'd also take a little work to find collisions. Maybe @GrabYourPitchforks has some in his back pocket :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it should read "comparer that uses..." instead of "comparer which is using..." (I know this is a copy-paste).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We (@maryamariyan?) tested it and I recall it didn't take looping terribly long to find 100 collisions but I guess we didn't check them in...
src/libraries/System.Private.CoreLib/src/System/Collections/Generic/HashSet.cs
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Collections/Generic/HashSet.cs
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Read it all and seems good to me.
I just noticed |
Thanks for reviewing. |
A few legitimate failures in the immutable collection tests; taking a look... |
src/libraries/System.Private.CoreLib/src/System/Collections/Generic/HashSet.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Globalization/DateTimeFormatInfoScanner.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Collections/Generic/HashSetEqualityComparer.cs
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Collections/Generic/HashSetEqualityComparer.cs
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Collections/Generic/HashSet.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Collections/Generic/HashSet.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Collections/Generic/HashSet.cs
Outdated
Show resolved
Hide resolved
And factor out InsertionBehavior into its own file
This effectively deletes HashSet's data structure and replaces it with the one used by Dictionary, then updated for the differences (e.g. just a value rather than a key and a value). HashSet used to have the same implementation, but Dictionary has evolved significantly and HashSet hasn't; this brings them to basic parity on implementation. Based on perf tests, I veered away from Dictionary's implementation in a few places (e.g. a goto-based implementation in the core find method led to a significant regression for Int32-based Contains operations), and we should follow-up to understand whether Dictionary should be changed as well, or why there's a difference between the two. Functionally, bringing over Dictionary's implementation yields a few notable changes, namely that Remove and Clear no longer invalidate enumerations. The tests have been updated accordingly.
Fixes #37111
Contributes to #1989
This moves HashSet into corelib, and then effectively deletes HashSet's data structure and replaces it with the one used by Dictionary, then updated for the differences (e.g. just a value rather than a key and a value). HashSet used to have basically the same implementation, but Dictionary has evolved significantly and HashSet hasn't; this brings them to basic parity on implementation.
Based on perf tests, I veered away from Dictionary's implementation in a few places (e.g. a goto-based implementation in the core find method led to a significant regression for Int32-based Contains operations), and we should follow-up to understand whether Dictionary should be changed as well, or why there's a difference between the two. @benaadams, if you have some time, it'd probably worth looking at this again; maybe you'll get different numbers than I did.
Functionally, bringing over Dictionary's implementation yields a few notable changes, namely that Remove and Clear no longer invalidate enumerations. The tests have been updated accordingly.
With HashSet now in corelib, I also updated two Dictionary uses I found in corelib that were using Dictionary as a set and switched them to use HashSet.
Running the dotnet/performance perf tests:
cc: @benaadams, @danmosemsft, @GrabYourPitchforks, @eiriktsarpalis, @layomia