Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stop mutating strings in StringBuffer. #93

Open
teo-tsirpanis opened this issue Nov 3, 2022 · 3 comments
Open

Stop mutating strings in StringBuffer. #93

teo-tsirpanis opened this issue Nov 3, 2022 · 3 comments

Comments

@teo-tsirpanis
Copy link
Collaborator

The StringBuffer class is built around allocating big strings, pinning them and mutating them. Mutating strings is undefined behavior and prone to cause crashes or data corruption now or in the future.

What I recommend instead is to use GC.AllocateUninitializedArray to allocate character arrays in the Pinned Object Heap. With that, the chunking logic will go away (you don't have to allocate a big array and slice it, one allocation per slice is OK, and you don't have to bother with GC handles.

I did not include it with #92; this will need to touch code in CharStream, something I am not comfortable with. But you can ping me once you have a PR and I can review it.

@stephan-tolksdorf
Copy link
Owner

stephan-tolksdorf commented Nov 5, 2022

The strings that get mutated in the non-LowTrust version of FParsec are newly allocated and either not externally shared or not mutated afterwards. While this isn't officially supported, this doesn't cause undefined behaviour, data corruption or crashes. If you look at the implementation of the dotnet String and StringBuilder classes, you'll see that they do the same. Is there any change int .NET runtime that makes you concerned about the future safety of this?

The only reason that I've been using strings for StringBuffer is that this makes it possible to apply a Regex without copying the string. Since .NET 7 introduces ReadOnlySpan<char> overloads, we can finally get rid of this hack.

Your suggestion to use GC.AllocationUninitializedArray to directly allocate pinned buffers of the chunk size and then replace the current chunking and pool logic with a simple chunk pool sounds great to me.

@teo-tsirpanis
Copy link
Collaborator Author

Is there any change int .NET runtime that makes you concerned about the future safety of this?

Not yet implemented but once https://github.com/dotnet/runtime/blob/main/docs/design/features/StringDeduplication.md lands we will be in trouble. See also dotnet/runtime#27515 and dotnet/runtime#36989 (comment)

If you look at the implementation of the dotnet String and StringBuilder classes, you'll see that they do the same.

Because of its tight coupling with the runtime, System.Private.CoreLib can do stuff other libraries are not allowed to do.

@stephan-tolksdorf
Copy link
Owner

Thanks for the pointers!

I'm curious whether the deduplication will ever land in this form. But you're right, we should get rid of the string mutation in FParsec.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants