Prevent StackoverflowException when hashing lists above ~30k elements #14516
Conversation
Open questions:
I'd probably stick with doing it the same as we do with array.
Can there be a SO with any other type? Maybe Set?
Probably just bump FSharp.Core version.
Just checked:
The problem that can theoretically arise is that if someone persists their hashes somewhere, it would be a breaking change for them.
The .NET guarantee is that a HashCode is stable only within the same process run.
Update: after chatting with @dsyme, I will change the code to hash based on the full contents, not just the first 18 elements. Reason (devil's advocate):
d796a1c
Plausible example: "I'm hashing the files in source repositories, which are represented as lists of lines. Now they all have the same hash code, because each has the same license at the top!" :)
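The collision scenario above is easy to reproduce. Below is an illustrative sketch (in Python, since the actual implementation lives in F#'s FSharp.Core; `prefix_hash`, `full_hash`, and the 31-based combiner are hypothetical stand-ins, not the real hashing code) contrasting prefix-limited hashing with full-content hashing:

```python
# Two "files" represented as lists of lines. They share a long common
# prefix (a license header) and differ only in the final line.
license_header = [f"license line {i}" for i in range(30)]
file_a = license_header + ["def main(): pass"]
file_b = license_header + ["def other(): pass"]

def prefix_hash(xs, limit=18):
    """Hash based only on the first `limit` elements (the pre-update behaviour)."""
    h = 0
    for x in xs[:limit]:
        h = (h * 31 + hash(x)) & 0xFFFFFFFF
    return h

def full_hash(xs):
    """Hash based on the full contents (the post-update behaviour)."""
    h = 0
    for x in xs:
        h = (h * 31 + hash(x)) & 0xFFFFFFFF
    return h

# The shared 30-line prefix covers the whole 18-element window, so the
# prefix-limited hashes are guaranteed to collide; the full hashes
# differ (barring an astronomically unlikely 32-bit collision).
print(prefix_hash(file_a) == prefix_hash(file_b))  # True: guaranteed collision
print(full_hash(file_a) == full_hash(file_b))
```

Any file population sharing a sufficiently long common prefix degenerates to a single hash bucket under the limited scheme, which is exactly the pathological case described above.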
Need to look at the IStructuralEquatable GetHashCode implementation too?
OK, just needed to check things a little more.
It calls into the same method in the end.
Change requested; see comment.
@T-Gro Separately, regarding our earlier discussion on Teams: if people want to limit their hashing, they should use this little-known piece of magic in FSharp.Core: https://github.com/dotnet/fsharp/blob/main/src/FSharp.Core/collections.fsi#L104C8-L124. If we were to make things more consistent, then really we should hash all of arrays all of the time for unlimited hashing (though that's also a risky change to make, and I don't recommend it). Likewise, my understanding is that .NET hashes all of a string, for example (as do both F# limited and unlimited hashing at the moment).
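The idea behind that limited-hashing hook can be sketched generically. This is a Python sketch of the concept only; `limited_hash` and its `limit` parameter are illustrative names, not the FSharp.Core API linked above:

```python
def limited_hash(xs, limit):
    """Combine the hashes of at most `limit` leading elements.

    This is the general shape of opt-in "limited" hashing: callers who
    hash very long collections can bound the cost, accepting a higher
    collision rate in exchange for O(limit) work instead of O(n)."""
    h = 0
    for i, x in enumerate(xs):
        if i >= limit:
            break
        h = (h * 31 + hash(x)) & 0xFFFFFFFF
    return h

# Bounding the work: hashing a million-element list touches only 18 items,
# so it costs the same as hashing its 18-element prefix.
big = list(range(1_000_000))
print(limited_hash(big, 18) == limited_hash(big[:18], 18))  # True
```

The design trade-off is the one debated in this thread: a limit makes hashing cheap and stack-safe, but any two collections sharing a `limit`-long prefix collide.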
Yes, agreed, I looked through and checked that, thanks. This still needs changing I think: #14516 (review)
…et/fsharp into list-hashcode-stackoverflow
Done, ready for re-review by everyone.
Wow, I’m so glad this bug got resolved!! Amazing work! I’d love to see this backported into 6.x, but as you’ve said before, that’s not standard policy. It was such a big bug, and an SOE cannot be caught. Funny, in all these years I never knew about the
This addresses #1838 and fixes .GetHashCode() for the built-in List<T> type.
List is internally implemented as a recursive DU, and the existing codegen emits code which is not tail recursive. If the list gets bigger (~30K elements), a StackOverflowException appears when hashing it.
While I understand the desire to find a general solution for all recursive DUs, incl. custom ones, I consider a StackOverflowException on the main language's collection type more urgent.
This solution is therefore special-cased only for the list type.
@cartermp tried to address this via standard F# attributes in https://github.com/dotnet/fsharp/pull/9070/files , but it affected the API shape of List, which is too dangerous a change.
This PR solves this by:
- adding a .CustomHashCode(comparer) member to List<T>
- having .GetHashCode(comparer) simply call into .CustomHashCode(comparer)
As of now, I would tend against exploring how to offer this mechanism for other types (e.g. a new CustomHashCodeOnlyAttribute), and for sure not as part of this PR, because the SO for list is much more visible.
The reason is that the existing mechanisms for custom types offer a solution and enforce treating GetHashCode and Equals together => for custom types, which have looser backwards-compatibility limits, I believe that is sufficient.
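The constraint of treating GetHashCode and Equals together can be illustrated with a sketch. Python is used here purely for illustration (the `Pair` class is hypothetical); the invariant it demonstrates is the language-agnostic one the description relies on, namely that equal values must produce equal hashes:

```python
class Pair:
    """A toy value type with structural equality."""
    def __init__(self, a, b):
        self.a, self.b = a, b

    def __eq__(self, other):
        return isinstance(other, Pair) and (self.a, self.b) == (other.a, other.b)

    def __hash__(self):
        # Must agree with __eq__: structurally equal pairs hash equally.
        # Overriding one without the other silently breaks hash-based
        # containers, which is why the two are enforced together.
        return hash((self.a, self.b))

# Structurally equal values behave as one key in hash-based containers.
d = {Pair(1, 2): "found"}
print(d[Pair(1, 2)])  # prints "found"
```

Overriding only the hash (as a hypothetical hash-only attribute would permit) makes it easy to violate this invariant, which is the stated reason for not generalising the mechanism here.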
The implementation follows the one for Array, which hashes based on the first 18 elements.
Note that this is not necessary; I only did it for consistency with array.
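The failure mode this PR fixes, and the shape of the fix, can be sketched outside F#. The following Python sketch is illustrative only (`Cons`, `recursive_hash`, and `iterative_hash` are hypothetical names, not the FSharp.Core code); it shows why per-element recursion blows the stack on long lists while a loop does not:

```python
class Cons:
    """A minimal cons-cell list, mirroring how F#'s list is a recursive DU."""
    __slots__ = ("head", "tail")
    def __init__(self, head, tail=None):
        self.head = head
        self.tail = tail

def recursive_hash(node):
    """Naive structural hash: one stack frame per element, so stack
    depth grows with list length (the pattern behind the overflow)."""
    if node is None:
        return 0
    return (hash(node.head) * 31 + recursive_hash(node.tail)) & 0xFFFFFFFF

def iterative_hash(node):
    """The fix in spirit: walk the list with a loop, constant stack depth."""
    h = 0
    while node is not None:
        h = (h * 31 + hash(node.head)) & 0xFFFFFFFF
        node = node.tail
    return h

# Build a list long enough to exhaust the call stack.
xs = None
for i in range(50_000):
    xs = Cons(i, xs)

try:
    recursive_hash(xs)
    overflowed = False
except RecursionError:  # Python's catchable analogue of StackOverflowException
    overflowed = True

print(overflowed)  # True: the recursive version cannot cope with 50k elements
print(isinstance(iterative_hash(xs), int))  # True: the loop handles it fine
```

One difference worth noting: Python surfaces a catchable RecursionError, whereas a .NET StackOverflowException tears the process down and cannot be caught, which is why fixing this at the source rather than handling it was the only option.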