Spanify some `Utf8String` and `Utf8StringBuilder` use #102101
Now that you are working on this, if you ever have an appetite for public-facing API changes (either here or in a subsequent PR), it is worth changing
I have all sorts of ideas like that: UTF-8 interpolated string handlers and whatnot. My first knee-jerk reaction was to look into making either of these a ref struct that just wraps an underlying span. I was going to hoist the Encoding.UTF8 accesses when they happen multiple times in a method. I was going to use MaxByteCount instead of GetByteCount. That Append(char) is actually dead code, in a way, too. But I'm still paranoid about what is indirectly consuming this API. This won't be the last PR, so I'm starting small and trying to get a feel for what kind of optimizations the code owners here are comfortable with.
Feel free to steal the code from U8String then if you do go for that: its "default" interpolated string handler pretty much reaches the performance ceiling, save for special-casing conversion for integers, as the path there is a bit heavy but overall fast enough; it beats

In the case of ILC, it might be worth just getting a flamegraph first, however. It's a good question whether it's worth the effort given the limited utilization (you can unroll char conversion by hand in a way that lets the JIT fold it, including the copy, into the correct code point length branch, unroll 1-2-3 byte span copies manually too, etc.; there is a lot of bikeshedding potential).
Co-authored-by: Jan Kotas
This reverts commit ace1494. It was used by DwarfEhFrame.cs
* Only ASCII constant chars were passed to this method
Co-authored-by: Paulus Pärssinen
src/coreclr/tools/aot/ILCompiler.Compiler/Compiler/ObjectWriter/Dwarf/DwarfEhFrame.cs
I think it would look better to keep
/azp run runtime-nativeaot-outerloop
Azure Pipelines successfully started running 1 pipeline(s).
Thanks
* Address TODOs
* Make Utf8String readonly
* Make Utf8StringBuilder.Append do two Encoding.UTF8 calls instead of many
* The quick is-valid-ascii check in UTF8 encoding _hopefully_ makes this worth the simplification, even though 99% of the inputs are just ASCII. I'm ready to revert this.
* Use UTF8 literals for the appended constant strings in ILC
* Use UTF8 literals for the appended constant strings in R2R
* Use pattern match in Utf8String.Equals(object)
* Use SequenceEqual in Utf8String.Equals(Utf8String)
* Use CommonPrefixLength in Utf8String.Compare(Utf8String, Utf8String)
* Very much inspired (copied) from dotnet#75851
* Remove unused Utf8StringBuilder.LastCharBoundary
* Use SequenceCompareTo
* UnderlyingArray -> AsSpan()
* Only return filled-portion of the buffer in Utf8StringBuilder.AsSpan

Co-authored-by: Jan Kotas

* Write null-terminator for the augmentationString.
* Utf8StringBuilder.Append(char) -> Utf8StringBuilder.Append(byte)
* Only ASCII constant chars were passed to this method
* Add Ascii.IsValid assert to Utf8StringBuilder.Append

---------

Co-authored-by: Jan Kotas
Co-authored-by: Michal Strehovský
Address TODO in the `Utf8String` and `Utf8StringBuilder` used by ILC and R2R. Saw some potential use for newer, more efficient APIs. These seem to be primarily used by name mangling, which can definitely be improved further.

Let's see if anything breaks in CI first. Tried to avoid any "public" surface changes because these shared files seem to be used sneakily through the labyrinth of .csproj files.

For example, did not rename `UnderlyingArray` -> `UnderlyingSpan` and did not try to remove the implicit `Utf8String` -> `string` casting, even though that was a bit of a headache while doing this (changing that is too spooky due to overload resolution).

update: CI seems to be good. Any extra stuff I should be concerned about and try building locally when touching these parts of the code?
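
To make the "newer, more efficient APIs" a bit more concrete, here is a minimal sketch of the kind of span-based helpers the change list refers to. It is not the actual ILC/R2R code: `Utf8Text` and `Utf8TextBuilder` are hypothetical stand-ins invented for illustration, and the sketch assumes .NET 8 (for `Ascii.IsValid`) and C# 11 (for `u8` literals).

```csharp
using System;
using System.Diagnostics;
using System.Text;

// Hypothetical stand-in for Utf8String; names and layout are assumptions.
public readonly struct Utf8Text : IEquatable<Utf8Text>, IComparable<Utf8Text>
{
    private readonly byte[] _bytes;

    public Utf8Text(string s) => _bytes = Encoding.UTF8.GetBytes(s);

    // A null array (default struct) converts to an empty span.
    public ReadOnlySpan<byte> AsSpan() => _bytes;

    // Span helpers replace hand-written byte-by-byte loops.
    public bool Equals(Utf8Text other) => AsSpan().SequenceEqual(other.AsSpan());

    // Pattern match instead of an "is" check followed by a cast.
    public override bool Equals(object obj) => obj is Utf8Text other && Equals(other);

    public override int GetHashCode()
    {
        var hash = new HashCode();
        hash.AddBytes(AsSpan());
        return hash.ToHashCode();
    }

    // Ordinal byte-wise ordering.
    public int CompareTo(Utf8Text other) => AsSpan().SequenceCompareTo(other.AsSpan());
}

// Hypothetical stand-in for Utf8StringBuilder.
public sealed class Utf8TextBuilder
{
    private byte[] _buffer = new byte[16];
    private int _length;

    // Only the filled portion of the buffer is exposed.
    public ReadOnlySpan<byte> AsSpan() => _buffer.AsSpan(0, _length);

    // Constant text can be passed as a UTF-8 literal, e.g. "__"u8.
    public Utf8TextBuilder Append(ReadOnlySpan<byte> utf8)
    {
        EnsureCapacity(utf8.Length);
        utf8.CopyTo(_buffer.AsSpan(_length));
        _length += utf8.Length;
        return this;
    }

    // Two Encoding.UTF8 calls: one to size the buffer, one to encode in place.
    public Utf8TextBuilder Append(string text)
    {
        EnsureCapacity(Encoding.UTF8.GetByteCount(text));
        _length += Encoding.UTF8.GetBytes(text.AsSpan(), _buffer.AsSpan(_length));
        return this;
    }

    // Single-byte append; callers are expected to pass ASCII only.
    public Utf8TextBuilder Append(byte value)
    {
        Debug.Assert(Ascii.IsValid(value));
        EnsureCapacity(1);
        _buffer[_length++] = value;
        return this;
    }

    private void EnsureCapacity(int extra)
    {
        int needed = _length + extra;
        if (needed > _buffer.Length)
            Array.Resize(ref _buffer, Math.Max(needed, _buffer.Length * 2));
    }
}
```

With these stand-ins, `new Utf8TextBuilder().Append("__"u8).Append("Méthode")` encodes straight into the builder's buffer without allocating an intermediate `byte[]` for the string, which is roughly the pattern the "two Encoding.UTF8 calls" and "UTF8 literals" items describe.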