-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Improve performance of Utf8Parser.TryParseInt32D #32843
[WIP] Improve performance of Utf8Parser.TryParseInt32D #32843
Conversation
BTW I expect this to be a little contentious. And that's ok. Part of this PR is to start a discussion of just how far we should go in managed code to try to eke every last bit of perf out of these code paths. I'm still myself trying to learn what the right balance is. :) |
...ies/System.Private.CoreLib/src/System/Buffers/Text/Utf8Parser/Utf8Parser.Integer.Signed.D.cs
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Getting +20% for int parsing is a really great improvement!
Overall looks good to me, the only thing I don't like is replacing clear switch
with complex if
in TryParse
method
@@ -52,6 +52,27 @@ public static bool IsDigit(int i) | |||
return (uint)(i - '0') <= ('9' - '0'); | |||
} | |||
|
|||
[MethodImpl(MethodImplOptions.AggressiveInlining)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we have to force the inlining of this method? I would assume that JIT is going to do this even without it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW I've used https://godbolt.org/z/imxPHx to see if we could use some magic bit operation to get the same effect, but it looks like no ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The codegen turns this into three instructions: add
, cmp
, jcc
. I'd say that's fairly efficient. :)
} | ||
|
||
value = default; | ||
return TryParseUInt32X(source, out Unsafe.As<int, uint>(ref value), out bytesConsumed); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
after moving from the previous switch
to this if
statement it became much harder to read. Did we get any noticeable performance gain by doing it?
num = source[index]; | ||
} | ||
int sign = 0; | ||
IntPtr index = IntPtr.Zero; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IntPtr index = IntPtr.Zero; | |
nint index = 0; |
case 'n': | ||
case 'N': | ||
return TryParseInt32N(source, out value, out bytesConsumed); | ||
if (standardFormat == default |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This helps, but it would be even better to have overload of the API that does not have the standardFormat argument at all.
@dotnet/jit-contrib What would it take for the JIT to optimize this out? |
The jit would need to retype Widening has two aspects, safety and profitability. For safety the jit would have to prove the code was not relying on any For profitability, the jit would have to look at every expression involving |
@adamsitnik I extracted rearranging the switch statement out into its own commit. I also further optimized it + improved its readability compared to this PR. See GrabYourPitchforks@b5ea028. Is that a bit cleaner from a maintainability perspective? From the switch improvements alone:
This also further validates @jkotas's theory that introducing a separate overload that doesn't have standardFormat may be worthwhile. Unless we get some kind of mechanism where we can say "hey, JIT, always inline this method into the caller if the standardFormat parameter is a constant." :) FWIW, in the ; SWITCH STATEMENT - BEFORE
push rdi
push rsi
push rbx
sub rsp,30h
xor eax,eax
mov qword ptr [rsp+20h],rax
mov rsi,qword ptr [rcx]
mov edi,dword ptr [rcx+8]
movzx ebx,r9w
cmp ebx,4Eh
ja System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Char)+0xffffffff`a0c65435 (00007ff9`bbb4a695)
cmp ebx,44h
ja System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Char)+0xffffffff`a0c6540d (00007ff9`bbb4a66d)
test ebx,ebx
je System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Char)+0xffffffff`a0c653ef (00007ff9`bbb4a64f)
cmp ebx,44h
jne System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Char)+0xffffffff`a0c65477 (00007ff9`bbb4a6d7)
lea rcx,[rsp+20h]
mov qword ptr [rcx],rsi
mov dword ptr [rcx+8],edi
lea rcx,[rsp+20h]
call CLRStub[MethodDescPrestub]@7ff9bbb47fe8 (00007ff9`bbb47fe8)
; SWITCH STATEMENT - AFTER
push rdi
push rsi
sub rsp,38h
xor eax,eax
mov qword ptr [rsp+28h],rax
mov rsi,qword ptr [rcx]
mov edi,dword ptr [rcx+8]
movzx ecx,r9w
test ecx,ecx ; jne statement immediately after this predicted not taken (common case)
jne System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Char)+0xffffffff`a0c98958 (00007ff9`bbb5a9b8)
lea rcx,[rsp+28h] ; <-- instruction at 00007ff9`bbb5a99b
mov qword ptr [rcx],rsi
mov dword ptr [rcx+8],edi
lea rcx,[rsp+28h]
call CLRStub[MethodDescPrestub]@7ff9bbb58388 (00007ff9`bbb58388)
nop
add rsp,38h
pop rsi
pop rdi
ret
movzx ecx,r9w
or ecx,20h
cmp ecx,67h
jg System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Char)+0xffffffff`a0c98970 (00007ff9`bbb5a9d0)
cmp ecx,64h
je System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Char)+0xffffffff`a0c9893b (00007ff9`bbb5a99b)
cmp ecx,67h
jne System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Char)+0xffffffff`a0c989b4 (00007ff9`bbb5aa14)
jmp System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Char)+0xffffffff`a0c9893b (00007ff9`bbb5a99b) |
@@ -192,146 +201,132 @@ private static bool TryParseInt16D(ReadOnlySpan<byte> source, out short value, o | |||
return true; | |||
} | |||
|
|||
private static bool TryParseInt32D(ReadOnlySpan<byte> source, out int value, out int bytesConsumed) | |||
private static bool TryParseInt32D(in ReadOnlySpan<byte> refToSource, out int value, out int bytesConsumed) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect that the in
is deoptimization everywhere else, except Windows x64.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In other words, passing ReadOnlySpan as in
does not sound like a good idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree this is not a good optimization, and we should back it out if we take this PR. But it's hard to argue with the data that shows minimizing parameter shuffling (see assembly in #32843 (comment)) has a measurable impact on performance.
And if you reeeeeally wanted to optimize the method with the switch statement, you could use the in keyword to avoid stack spilling the incoming
00007ff9`bc68a980 4883ec28 sub rsp,28h
00007ff9`bc68a984 410fb7c1 movzx eax,r9w
00007ff9`bc68a988 85c0 test eax,eax
00007ff9`bc68a98a 750b jne System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Char)+0xffffffff`a12d8737 (00007ff9`bc68a997)
00007ff9`bc68a98c e8f7d9ffff call CLRStub[MethodDescPrestub]@7ff9bc688388 (00007ff9`bc688388)
00007ff9`bc68a991 90 nop
00007ff9`bc68a992 4883c428 add rsp,28h
00007ff9`bc68a996 c3 ret
00007ff9`bc68a997 410fb7c1 movzx eax,r9w
00007ff9`bc68a99b 83c820 or eax,20h
00007ff9`bc68a99e 83f867 cmp eax,67h
00007ff9`bc68a9a1 7f0c jg System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Char)+0xffffffff`a12d874f (00007ff9`bc68a9af)
00007ff9`bc68a9a3 83f864 cmp eax,64h
00007ff9`bc68a9a6 74e4 je System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Char)+0xffffffff`a12d872c (00007ff9`bc68a98c)
00007ff9`bc68a9a8 83f867 cmp eax,67h
00007ff9`bc68a9ab 7522 jne System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Char)+0xffffffff`a12d876f (00007ff9`bc68a9cf)
00007ff9`bc68a9ad ebdd jmp System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Char)+0xffffffff`a12d872c (00007ff9`bc68a98c)
00007ff9`bc68a9af 83f86e cmp eax,6Eh
00007ff9`bc68a9b2 7410 je System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Char)+0xffffffff`a12d8764 (00007ff9`bc68a9c4)
00007ff9`bc68a9b4 83f878 cmp eax,78h
00007ff9`bc68a9b7 7516 jne System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Char)+0xffffffff`a12d876f (00007ff9`bc68a9cf)
00007ff9`bc68a9b9 e86adaffff call CLRStub[MethodDescPrestub]@7ff9bc688428 (00007ff9`bc688428)
00007ff9`bc68a9be 90 nop
00007ff9`bc68a9bf 4883c428 add rsp,28h
00007ff9`bc68a9c3 c3 ret
00007ff9`bc68a9c4 e8dfd9ffff call CLRStub[MethodDescPrestub]@7ff9bc6883a8 (00007ff9`bc6883a8)
00007ff9`bc68a9c9 90 nop
00007ff9`bc68a9ca 4883c428 add rsp,28h
00007ff9`bc68a9ce c3 ret
00007ff9`bc68a9cf e834f5d8ff call CLRStub[MethodDescPrestub]@7ff9bc419f08 (00007ff9`bc419f08)
00007ff9`bc68a9d4 cc int 3 Of note in the assembly above: |
@dotnet/jit-contrib I don't feel comfortable using in in this manner because I think it's taking advantage of a Windows-specific calling convention. Is there any reason why the existing call sites in this method aren't candidates for tail-calling? |
The fail-to-tailcall reason is output in the JitDump, in CLR logging, and in JIT ETW events. Maybe someone else can eyeball why it's not happening in your case. |
By IL semantics the jit has to copy the struct into a local struct before the call. This blocks tail calling as we're then passing a bit of the local frame to the callee. There is a special optimization we can do for implicit byrefs when there's just one use of the struct -- we realize this must be the last use so nobody can tell if we pass the original struct or make a copy, so we suppress the copy But here you have 3 uses and so this simple optimization is defeated. We could probably fix it so we separate out uses from uses at tail call sites, the latter are always last use and so always safe to pass without copying. This is probably something we could fix relatively quickly; I actually noticed this recently (see this comment) and was prototyping how to fix it. There's an additional wrinkle that makes it a bit trickier than just counting -- we also like to promote multiply used implicit byref params, and we make the promotion decisions before we think about what they might do to tail calls, and if we promote we're back passing local state in the call and so can't tail call. |
Yes, thank you @GrabYourPitchforks ! |
I was experimenting a bit with @AndyAyersMS's change at #33004. Preliminary investigations show that his change has two positive effects: (a) it provides some good optimizations in its own right, and (b) it provides a solid foundation on which we can build additional improvements. The numbers and codegen below are from three runs. The master run is the current master branch with no changes. The utf8parser_1 run is from Andy's PR only, with no changes to any of the
; TryParse(..., out int value, ...) in current master branch
00007ffe`4c6cb1e0 57 push rdi
00007ffe`4c6cb1e1 56 push rsi
00007ffe`4c6cb1e2 53 push rbx
00007ffe`4c6cb1e3 4883ec30 sub rsp,30h
00007ffe`4c6cb1e7 33c0 xor eax,eax
00007ffe`4c6cb1e9 4889442420 mov qword ptr [rsp+20h],rax
00007ffe`4c6cb1ee 488b31 mov rsi,qword ptr [rcx]
00007ffe`4c6cb1f1 8b7908 mov edi,dword ptr [rcx+8]
00007ffe`4c6cb1f4 410fb7d9 movzx ebx,r9w
00007ffe`4c6cb1f8 83fb4e cmp ebx,4Eh
00007ffe`4c6cb1fb 7758 ja System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Char)+0xffffffff`a4db9355 (00007ffe`4c6cb255)
00007ffe`4c6cb1fd 83fb44 cmp ebx,44h
00007ffe`4c6cb200 772b ja System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Char)+0xffffffff`a4db932d (00007ffe`4c6cb22d)
00007ffe`4c6cb202 85db test ebx,ebx
00007ffe`4c6cb204 7409 je System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Char)+0xffffffff`a4db930f (00007ffe`4c6cb20f)
00007ffe`4c6cb206 83fb44 cmp ebx,44h
00007ffe`4c6cb209 0f8588000000 jne System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Char)+0xffffffff`a4db9397 (00007ffe`4c6cb297)
00007ffe`4c6cb20f 488d4c2420 lea rcx,[rsp+20h]
00007ffe`4c6cb214 488931 mov qword ptr [rcx],rsi
00007ffe`4c6cb217 897908 mov dword ptr [rcx+8],edi
00007ffe`4c6cb21a 488d4c2420 lea rcx,[rsp+20h]
00007ffe`4c6cb21f e89ce3ffff call CLRStub[MethodDescPrestub]@7ffe4c6c95c0 (00007ffe`4c6c95c0)
00007ffe`4c6cb224 90 nop
00007ffe`4c6cb225 4883c430 add rsp,30h
00007ffe`4c6cb229 5b pop rbx
00007ffe`4c6cb22a 5e pop rsi
00007ffe`4c6cb22b 5f pop rdi
00007ffe`4c6cb22c c3 ret
00007ffe`4c6cb22d 83fb47 cmp ebx,47h
00007ffe`4c6cb230 74dd je System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Char)+0xffffffff`a4db930f (00007ffe`4c6cb20f)
00007ffe`4c6cb232 83fb4e cmp ebx,4Eh
00007ffe`4c6cb235 7560 jne System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Char)+0xffffffff`a4db9397 (00007ffe`4c6cb297)
00007ffe`4c6cb237 488d4c2420 lea rcx,[rsp+20h]
00007ffe`4c6cb23c 488931 mov qword ptr [rcx],rsi
00007ffe`4c6cb23f 897908 mov dword ptr [rcx+8],edi
00007ffe`4c6cb242 488d4c2420 lea rcx,[rsp+20h]
00007ffe`4c6cb247 e894e3ffff call CLRStub[MethodDescPrestub]@7ffe4c6c95e0 (00007ffe`4c6c95e0)
00007ffe`4c6cb24c 90 nop
00007ffe`4c6cb24d 4883c430 add rsp,30h
00007ffe`4c6cb251 5b pop rbx
00007ffe`4c6cb252 5e pop rsi
00007ffe`4c6cb253 5f pop rdi
00007ffe`4c6cb254 c3 ret
00007ffe`4c6cb255 83fb64 cmp ebx,64h
00007ffe`4c6cb258 770c ja System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Char)+0xffffffff`a4db9366 (00007ffe`4c6cb266)
00007ffe`4c6cb25a 83fb58 cmp ebx,58h
00007ffe`4c6cb25d 7416 je System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Char)+0xffffffff`a4db9375 (00007ffe`4c6cb275)
00007ffe`4c6cb25f 83fb64 cmp ebx,64h
00007ffe`4c6cb262 7533 jne System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Char)+0xffffffff`a4db9397 (00007ffe`4c6cb297)
00007ffe`4c6cb264 eba9 jmp System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Char)+0xffffffff`a4db930f (00007ffe`4c6cb20f)
00007ffe`4c6cb266 83fb67 cmp ebx,67h
00007ffe`4c6cb269 74a4 je System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Char)+0xffffffff`a4db930f (00007ffe`4c6cb20f)
00007ffe`4c6cb26b 83fb6e cmp ebx,6Eh
00007ffe`4c6cb26e 74c7 je System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Char)+0xffffffff`a4db9337 (00007ffe`4c6cb237)
00007ffe`4c6cb270 83fb78 cmp ebx,78h
00007ffe`4c6cb273 7522 jne System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Char)+0xffffffff`a4db9397 (00007ffe`4c6cb297)
00007ffe`4c6cb275 33c9 xor ecx,ecx
00007ffe`4c6cb277 890a mov dword ptr [rdx],ecx
00007ffe`4c6cb279 488d4c2420 lea rcx,[rsp+20h]
00007ffe`4c6cb27e 488931 mov qword ptr [rcx],rsi
00007ffe`4c6cb281 897908 mov dword ptr [rcx+8],edi
00007ffe`4c6cb284 488d4c2420 lea rcx,[rsp+20h]
00007ffe`4c6cb289 e8d2e3ffff call CLRStub[MethodDescPrestub]@7ffe4c6c9660 (00007ffe`4c6c9660)
00007ffe`4c6cb28e 90 nop
00007ffe`4c6cb28f 4883c430 add rsp,30h
00007ffe`4c6cb293 5b pop rbx
00007ffe`4c6cb294 5e pop rsi
00007ffe`4c6cb295 5f pop rdi
00007ffe`4c6cb296 c3 ret
00007ffe`4c6cb297 33c0 xor eax,eax
00007ffe`4c6cb299 8902 mov dword ptr [rdx],eax
00007ffe`4c6cb29b 418900 mov dword ptr [r8],eax
00007ffe`4c6cb29e e88deed8ff call CLRStub[MethodDescPrestub]@7ffe4c45a130 (00007ffe`4c45a130)
00007ffe`4c6cb2a3 cc int 3
; TryParse(..., out int value, ...) with Andy's pending PR
00007ffe`4c69b320 56 push rsi
00007ffe`4c69b321 4883ec20 sub rsp,20h
00007ffe`4c69b325 410fb7f1 movzx esi,r9w
00007ffe`4c69b329 83fe4e cmp esi,4Eh
00007ffe`4c69b32c 772c ja System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Char)+0xffffffff`a4d6948a (00007ffe`4c69b35a)
00007ffe`4c69b32e 83fe44 cmp esi,44h
00007ffe`4c69b331 7713 ja System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Char)+0xffffffff`a4d69476 (00007ffe`4c69b346)
00007ffe`4c69b333 85f6 test esi,esi
00007ffe`4c69b335 7405 je System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Char)+0xffffffff`a4d6946c (00007ffe`4c69b33c)
00007ffe`4c69b337 83fe44 cmp esi,44h
00007ffe`4c69b33a 754c jne System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Char)+0xffffffff`a4d694b8 (00007ffe`4c69b388)
00007ffe`4c69b33c 4883c420 add rsp,20h
00007ffe`4c69b340 5e pop rsi
00007ffe`4c69b341 e9badaffff jmp CLRStub[MethodDescPrestub]@7ffe4c698e00 (00007ffe`4c698e00)
00007ffe`4c69b346 83fe47 cmp esi,47h
00007ffe`4c69b349 74f1 je System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Char)+0xffffffff`a4d6946c (00007ffe`4c69b33c)
00007ffe`4c69b34b 83fe4e cmp esi,4Eh
00007ffe`4c69b34e 7538 jne System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Char)+0xffffffff`a4d694b8 (00007ffe`4c69b388)
00007ffe`4c69b350 4883c420 add rsp,20h
00007ffe`4c69b354 5e pop rsi
00007ffe`4c69b355 e9c6daffff jmp CLRStub[MethodDescPrestub]@7ffe4c698e20 (00007ffe`4c698e20)
00007ffe`4c69b35a 83fe64 cmp esi,64h
00007ffe`4c69b35d 770c ja System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Char)+0xffffffff`a4d6949b (00007ffe`4c69b36b)
00007ffe`4c69b35f 83fe58 cmp esi,58h
00007ffe`4c69b362 7416 je System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Char)+0xffffffff`a4d694aa (00007ffe`4c69b37a)
00007ffe`4c69b364 83fe64 cmp esi,64h
00007ffe`4c69b367 751f jne System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Char)+0xffffffff`a4d694b8 (00007ffe`4c69b388)
00007ffe`4c69b369 ebd1 jmp System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Char)+0xffffffff`a4d6946c (00007ffe`4c69b33c)
00007ffe`4c69b36b 83fe67 cmp esi,67h
00007ffe`4c69b36e 74cc je System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Char)+0xffffffff`a4d6946c (00007ffe`4c69b33c)
00007ffe`4c69b370 83fe6e cmp esi,6Eh
00007ffe`4c69b373 74db je System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Char)+0xffffffff`a4d69480 (00007ffe`4c69b350)
00007ffe`4c69b375 83fe78 cmp esi,78h
00007ffe`4c69b378 750e jne System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Char)+0xffffffff`a4d694b8 (00007ffe`4c69b388)
00007ffe`4c69b37a 33c0 xor eax,eax
00007ffe`4c69b37c 8902 mov dword ptr [rdx],eax
00007ffe`4c69b37e 4883c420 add rsp,20h
00007ffe`4c69b382 5e pop rsi
00007ffe`4c69b383 e918dbffff jmp CLRStub[MethodDescPrestub]@7ffe4c698ea0 (00007ffe`4c698ea0)
00007ffe`4c69b388 33c0 xor eax,eax
00007ffe`4c69b38a 8902 mov dword ptr [rdx],eax
00007ffe`4c69b38c 418900 mov dword ptr [r8],eax
00007ffe`4c69b38f e89cedd8ff call CLRStub[MethodDescPrestub]@7ffe4c42a130 (00007ffe`4c42a130)
00007ffe`4c69b394 cc int 3
; TryParse(..., out int value, ...) with Andy's PR and some refactoring of the switch statement
00007ffe`4c6db320 410fb7c1 movzx eax,r9w
00007ffe`4c6db324 85c0 test eax,eax
00007ffe`4c6db326 7505 jne System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Char)+0xffffffff`a4da93cd (00007ffe`4c6db32d)
00007ffe`4c6db328 e9d3daffff jmp CLRStub[MethodDescPrestub]@7ffe4c6d8e00 (00007ffe`4c6d8e00)
00007ffe`4c6db32d 410fb7c1 movzx eax,r9w
00007ffe`4c6db331 83c820 or eax,20h
00007ffe`4c6db334 83f867 cmp eax,67h
00007ffe`4c6db337 7f0c jg System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Char)+0xffffffff`a4da93e5 (00007ffe`4c6db345)
00007ffe`4c6db339 83f864 cmp eax,64h
00007ffe`4c6db33c 74ea je System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Char)+0xffffffff`a4da93c8 (00007ffe`4c6db328)
00007ffe`4c6db33e 83f867 cmp eax,67h
00007ffe`4c6db341 74e5 je System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Char)+0xffffffff`a4da93c8 (00007ffe`4c6db328)
00007ffe`4c6db343 eb16 jmp System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Char)+0xffffffff`a4da93fb (00007ffe`4c6db35b)
00007ffe`4c6db345 83f86e cmp eax,6Eh
00007ffe`4c6db348 7407 je System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Char)+0xffffffff`a4da93f1 (00007ffe`4c6db351)
00007ffe`4c6db34a 83f878 cmp eax,78h
00007ffe`4c6db34d 7407 je System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Char)+0xffffffff`a4da93f6 (00007ffe`4c6db356)
00007ffe`4c6db34f eb0a jmp System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Char)+0xffffffff`a4da93fb (00007ffe`4c6db35b)
00007ffe`4c6db351 e9cadaffff jmp CLRStub[MethodDescPrestub]@7ffe4c6d8e20 (00007ffe`4c6d8e20)
00007ffe`4c6db356 e945dbffff jmp CLRStub[MethodDescPrestub]@7ffe4c6d8ea0 (00007ffe`4c6d8ea0)
00007ffe`4c6db35b e980f3ffff jmp CLRStub[MethodDescPrestub]@7ffe4c6da6e0 (00007ffe`4c6da6e0) |
With latest changes, I'm still investigating whether we can eke more performance out of |
/// Otherwise returns <see langword="false"/>. | ||
/// </summary> | ||
[MethodImpl(MethodImplOptions.AggressiveInlining)] | ||
internal unsafe bool TryGetElementAt(nuint index, [MaybeNullWhen(false)] out T value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not fan of improving performance by internal helpers like this one that are not available in public surface. This should be either turned into public API, or we should live without it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jkotas I understand the concern. How about I experiment in a separate branch with plumbing it through other call sites and see what the perf impact is? That should give us evidence that this API is (or isn't) generally worthwhile.
Closing this PR since it's no longer actively being worked on. Opened GrabYourPitchforks#9 so that I don't lose track of it. |
WIP because it's not plumbed through to other
Utf8Parser
APIs and because these same improvements are likely viable for other methods likeint.Parse
. I also need to clean up / comment the code a little bit.There are a handful of patterns being used here that result in improved codegen. Some of these are things that really should be in the JIT's wheelhouse rather than worked around in this code base. Given that
Utf8Parser.TryParse
(and related APIsint.Parse
) are such hot code paths, though, it may be worthwhile to take these code changes anyway while waiting for the JIT's changes to come through.Overall perf results
Fast-track 'default' formats
Since we expect standardFormat = default to be the common case, it's fast-tracked at the start of the method rather than entering the switch statement. Additionally, entries in the switch statement are normalized to lowercase to minimize the number of if conditions present.
Repurpose some locals for more efficient register allocation
The answer local is repurposed to hold the running value from the very start of the method, no longer bouncing through num. The check for
'+'
and'-'
is moved out of the common path because those characters should be uncommon.Avoid expensive instructions
The span is indexed by
native int
rather thanint
. This avoids an expensivemovsxd
instruction in the hot path. Additionally, at the end of the method theimul
instruction is removed in favor of cheaperxor
andsub
instructions.Finally, the intermediate value resulting from the
num - '0'
operation is reused as the addend when building up the expressionanswer = answer * 10 + ...
. This avoids an expensive three-componentlea
instruction in the hot path.Benchmark application
Related issues: #28070, #12218, #12402
/cc @layomia