Skip to content
This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

[WIP] Moving the UTF8 Parser/Formatter code into System.Private.CoreLib #20873

Closed
wants to merge 9 commits into from
Closed

[WIP] Moving the UTF8 Parser/Formatter code into System.Private.CoreLib #20873

wants to merge 9 commits into from

Conversation

tannergooding
Copy link
Member

Today, the UTF8 Parser/Formatter code lives in System.Memory. However, this presents a potential problem with sharing bits of code (namely the Decimal/Double/Single formatters/parsers).

I talked with @GrabYourPitchforks a few times about this and suggested we move the code into System.Private.CoreLib. This allows us to readily share the more complicated code and keep the code in sync between the two.

  • We now have a single NumberBuffer instance that carries a Span<byte>
    • The Span<char> buffer previously used by the UTF16 Parser never contained digits outside 0-9 and null anyways.
  • I tried to do the minimal number of changes, outside moving to share the NumberToDecimal, NumberToDouble, and NumberToSingle code where possible
    • There are several other places (some tracked by comments) of where we could share more code between the UTF8/UTF16 parser and formatter

@tannergooding
Copy link
Member Author

CC. @danmosemsft, @GrabYourPitchforks, @jkotas, @stephentoub

I'd appreciate some input on this, and if it is a direction we would like to persue. If not, it might be good if we could come up with an alternative for sharing the code (where possible).

@jkotas
Copy link
Member

jkotas commented Nov 8, 2018

Sounds good to me.

@tannergooding
Copy link
Member Author

Hmmm. Utf8Formatter.TryFormat(bool, Span<byte>, out int, StandardFormat) is currently failing in crossgen (in EvalFuncForConstantArgs) when dealing with FalsValueLowercase.

  • I am going to speculate that this is for the ReverseEndianness call in WriteUInt32BigEndian, but I am still trying to confirm. CC. @GrabYourPitchforks

@stephentoub
Copy link
Member

@tannergooding, does this help/hurt the throughput of operations like uint.TryParse and uint.TryFormat?

@tannergooding
Copy link
Member Author

does this help/hurt the throughput of operations like uint.TryParse and uint.TryFormat

Still need to check, but I wouldn't expect it to. We were already only dealing with ASCII characters (once we got it into a NumberBuffer) and most of the parsing code was already casting back and forth from int to char (for the various arithmetic operations).

@@ -0,0 +1,174 @@
// Licensed to the .NET Foundation under one or more agreements.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Purely a logistical comment, but should we instead do this by moving the code in corefx to shared, letting it mirror over, and then updating the rest of coreclr with the necessary changes? I think that would make the history easier to follow.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds like a good idea.

@tannergooding
Copy link
Member Author

Closing this and breaking it into multiple PRs:

  • First will be on CoreCLR and will move NumberBuffer to carry Span<byte>.
  • Second will be on CoreFX moving the Utf8Parser and Utf8Formatter to shared
  • Third will be on CoreCLR pulling the Utf8Parser and Utf8Formatter into S.P.Corelib
    • This includes minimally fixing it to share the newer floating-point parsing code
  • Final will be on CoreFX removing the Utf8Parser and Utf8Formatter from System.Memory and type-forwarding it to Corelib

@GrabYourPitchforks
Copy link
Member

I'm also looking into the JIT assert.

@AndyAyersMS
Copy link
Member

We generally won't prejit methods that rely on HW intrinsics.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants