Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Improve performance of Utf8Parser.TryParseInt32D #32843

Closed

Conversation

GrabYourPitchforks
Copy link
Member

WIP because it's not plumbed through to other Utf8Parser APIs and because these same improvements are likely viable for other methods like int.Parse. I also need to clean up / comment the code a little bit.

There are a handful of patterns being used here that result in improved codegen. Some of these are things that really should be in the JIT's wheelhouse rather than worked around in this code base. Given that Utf8Parser.TryParse (and related APIs int.Parse) are such hot code paths, though, it may be worthwhile to take these code changes anyway while waiting for the JIT's changes to come through.

Overall perf results

Method Toolchain Mean Error StdDev Ratio
TryParseInt32 master 1,217.5 ns 12.02 ns 10.03 ns 1.00
TryParseInt32 thisbranch 954.9 ns 8.77 ns 6.85 ns 0.78

Fast-track 'default' formats

Since we expect standardFormat = default to be the common case, it's fast-tracked at the start of the method rather than entering the switch statement. Additionally, entries in the switch statement are normalized to lowercase to minimize the number of if conditions present.

; BEFORE
movzx   ebx,r9w
cmp     ebx,4Eh
ja      System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Char)+0xffffffff`a3c352f5 (00007ff9`b841a555)
cmp     ebx,44h
ja      System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Char)+0xffffffff`a3c352cd (00007ff9`b841a52d)
test    ebx,ebx
je      System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Char)+0xffffffff`a3c352af (00007ff9`b841a50f)
cmp     ebx,44h
jne     System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Char)+0xffffffff`a3c35337 (00007ff9`b841a597)
lea     rcx,[rsp+20h]
mov     qword ptr [rcx],rsi
mov     dword ptr [rcx+8],edi
lea     rcx,[rsp+20h]
call    CLRStub[MethodDescPrestub]@7ff9b84188c0 (00007ff9`b84188c0)

; AFTER
movzx   eax,r9w
test    eax,eax
je      System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Char)+0xffffffff`a3c78a18 (00007ff9`b843abb8)
or      eax,20h
cmp     eax,67h
je      System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Char)+0xffffffff`a3c78a18 (00007ff9`b843abb8)
cmp     eax,64h
jne     System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Char)+0xffffffff`a3c78a35 (00007ff9`b843abd5)
lea     rcx,[rsp+28h]
mov     qword ptr [rcx],rsi
mov     dword ptr [rcx+8],edi
lea     rcx,[rsp+28h]
call    CLRStub[MethodDescPrestub]@7ff9b8438388 (00007ff9`b8438388)

Repurpose some locals for more efficient register allocation

The answer local is repurposed to hold the running value from the very start of the method, no longer bouncing through num. The check for '+' and '-' is moved out of the common path because those characters should be uncommon.

Avoid expensive instructions

The span is indexed by native int rather than int. This avoids an expensive movsxd instruction in the hot path. Additionally, at the end of the method the imul instruction is removed in favor of cheaper xor and sub instructions.

Finally, the intermediate value resulting from the num - '0' operation is reused as the addend when building up the expression answer = answer * 10 + .... This avoids an expensive three-component lea instruction in the hot path.

; HOT PATH - BEFORE
cmp     r10d,ecx
jae     System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParseInt32D(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef)+0xffffffff`beef5172 (00007ff9`e303a882)
movsxd  r11,r10d
movzx   r11d,byte ptr [rax+r11]
lea     edi,[r11-30h]
cmp     edi,9
ja      System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParseInt32D(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef)+0xffffffff`beef5172 (00007ff9`e303a882)
inc     r10d
lea     esi,[rsi+rsi*4]
lea     esi,[r11+rsi*2-30h]

; HOT PATH - AFTER
cmp     r11,rdi
jae     System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParseInt32D(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef)+0xffffffff`befc8605 (00007ff9`e30eac25)
movzx   eax,byte ptr [r9+r11]
add     eax,0FFFFFFD0h
cmp     eax,9
ja      System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParseInt32D(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef)+0xffffffff`befc8605 (00007ff9`e30eac25)
inc     r11
lea     ecx,[rsi+rsi*4]
lea     esi,[rax+rcx*2]

Benchmark application

public class Utf8ParserRunner
{
    private byte[][] _intStrings;

    [GlobalSetup]
    public void Setup()
    {
        // Randomly create 200 strings "19472903" and similar.
        // Vary the lengths to throw off the branch predictor.
        Random rnd = new Random(0x12345);
        byte[][] intStrings = new byte[200][];
        for (int i = 0; i < intStrings.Length; i++)
        {
            byte[] thisIntString = new byte[rnd.Next(1, 10)];
            for (int j = 0; j < thisIntString.Length; j++)
            {
                thisIntString[j] = (byte)rnd.Next(0, 10);
            }
            intStrings[i] = thisIntString;
        }
        _intStrings = intStrings;
    }

    [Benchmark]
    public bool TryParseInt32()
    {
        bool retVal = false;

        byte[][] strs = _intStrings;
        for (int i = 0; i < strs.Length; i++)
        {
            byte[] str = strs[i];
            _ = str.Length; // deref now to avoid implicit span null check

            retVal = Utf8Parser.TryParse(str, out int _, out int _);
        }

        return retVal;
    }
}

Related issues: #28070, #12218, #12402

/cc @layomia

@GrabYourPitchforks
Copy link
Member Author

BTW I expect this to be a little contentious. And that's ok. Part of this PR is to start a discussion of just how far we should go in managed code to try to eke every last bit of perf out of these code paths. I'm still myself trying to learn what the right balance is. :)

Copy link
Member

@adamsitnik adamsitnik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Getting +20% for int parsing is a really great improvement!

Overall looks good to me, the only thing I don't like is replacing clear switch with complex if in TryParse method

@@ -52,6 +52,27 @@ public static bool IsDigit(int i)
return (uint)(i - '0') <= ('9' - '0');
}

[MethodImpl(MethodImplOptions.AggressiveInlining)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we have to force the inlining of this method? I would assume that JIT is going to do this even without it

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW I've used https://godbolt.org/z/imxPHx to see if we could use some magic bit operation to get the same effect, but it looks like no ;)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The codegen turns this into three instructions: add, cmp, jcc. I'd say that's fairly efficient. :)

}

value = default;
return TryParseUInt32X(source, out Unsafe.As<int, uint>(ref value), out bytesConsumed);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

after moving from the previous switch to this if statement it became much harder to read. Did we get any noticeable performance gain by doing it?

num = source[index];
}
int sign = 0;
IntPtr index = IntPtr.Zero;
Copy link
Member

@jkotas jkotas Feb 26, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
IntPtr index = IntPtr.Zero;
nint index = 0;

case 'n':
case 'N':
return TryParseInt32N(source, out value, out bytesConsumed);
if (standardFormat == default
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This helps, but it would be even better to have overload of the API that does not have the standardFormat argument at all.

@jkotas
Copy link
Member

jkotas commented Feb 26, 2020

This avoids an expensive movsxd instruction in the hot path

@dotnet/jit-contrib What would it take for the JIT to optimize this out?

@AndyAyersMS
Copy link
Member

What would it take for the JIT to optimize this out?

The jit would need to retype index as nint. This is essentially induction variable widening (#7312).

Widening has two aspects, safety and profitability.

For safety the jit would have to prove the code was not relying on any int overflow. The loop handling leading zeros is the tricky part here, initially it appears that index might overflow; the analysis would have to show this is not possible because the span length must be non-negative.

For profitability, the jit would have to look at every expression involving index and cost the computation using both nint and int, then sum over these and see which representation looks better. Ideally this summation would use profile-weighted costs. Here it is probably simple to see nint is cheaper.

@GrabYourPitchforks
Copy link
Member Author

GrabYourPitchforks commented Feb 26, 2020

@adamsitnik I extracted rearranging the switch statement out into its own commit. I also further optimized it + improved its readability compared to this PR. See GrabYourPitchforks@b5ea028. Is that a bit cleaner from a maintainability perspective?

From the switch improvements alone:

Method Toolchain Mean Error StdDev Ratio RatioSD
TryParseInt32 master 1.233 us 0.0220 us 0.0206 us 1.00 0.00
TryParseInt32 utf8parser 1.081 us 0.0215 us 0.0272 us 0.89 0.03

This also further validates @jkotas's theory that introducing a separate overload that doesn't have standardFormat may be worthwhile. Unless we get some kind of mechanism where we can say "hey, JIT, always inline this method into the caller if the standardFormat parameter is a constant." :)

FWIW, in the 'g' and 'd' cases the switch statement jumps straight to the address which sets up parameters for the call site. It doesn't bother evaluating if (standardFormat == default) again because the JIT has already proven this statement to evaluate to true. See codegen below.

; SWITCH STATEMENT - BEFORE
push    rdi
push    rsi
push    rbx
sub     rsp,30h
xor     eax,eax
mov     qword ptr [rsp+20h],rax
mov     rsi,qword ptr [rcx]
mov     edi,dword ptr [rcx+8]
movzx   ebx,r9w
cmp     ebx,4Eh
ja      System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Char)+0xffffffff`a0c65435 (00007ff9`bbb4a695)
cmp     ebx,44h
ja      System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Char)+0xffffffff`a0c6540d (00007ff9`bbb4a66d)
test    ebx,ebx
je      System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Char)+0xffffffff`a0c653ef (00007ff9`bbb4a64f)
cmp     ebx,44h
jne     System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Char)+0xffffffff`a0c65477 (00007ff9`bbb4a6d7)
lea     rcx,[rsp+20h]
mov     qword ptr [rcx],rsi
mov     dword ptr [rcx+8],edi
lea     rcx,[rsp+20h]
call    CLRStub[MethodDescPrestub]@7ff9bbb47fe8 (00007ff9`bbb47fe8)

; SWITCH STATEMENT - AFTER
push    rdi
push    rsi
sub     rsp,38h
xor     eax,eax
mov     qword ptr [rsp+28h],rax
mov     rsi,qword ptr [rcx]
mov     edi,dword ptr [rcx+8]
movzx   ecx,r9w
test    ecx,ecx   ; jne statement immediately after this predicted not taken (common case)
jne     System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Char)+0xffffffff`a0c98958 (00007ff9`bbb5a9b8)
lea     rcx,[rsp+28h]  ; <-- instruction at 00007ff9`bbb5a99b
mov     qword ptr [rcx],rsi
mov     dword ptr [rcx+8],edi
lea     rcx,[rsp+28h]
call    CLRStub[MethodDescPrestub]@7ff9bbb58388 (00007ff9`bbb58388)
nop
add     rsp,38h
pop     rsi
pop     rdi
ret
movzx   ecx,r9w
or      ecx,20h
cmp     ecx,67h
jg      System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Char)+0xffffffff`a0c98970 (00007ff9`bbb5a9d0)
cmp     ecx,64h
je      System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Char)+0xffffffff`a0c9893b (00007ff9`bbb5a99b)
cmp     ecx,67h
jne     System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Char)+0xffffffff`a0c989b4 (00007ff9`bbb5aa14)
jmp     System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Char)+0xffffffff`a0c9893b (00007ff9`bbb5a99b)

@@ -192,146 +201,132 @@ private static bool TryParseInt16D(ReadOnlySpan<byte> source, out short value, o
return true;
}

private static bool TryParseInt32D(ReadOnlySpan<byte> source, out int value, out int bytesConsumed)
private static bool TryParseInt32D(in ReadOnlySpan<byte> refToSource, out int value, out int bytesConsumed)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect that the in is deoptimization everywhere else, except Windows x64.

Copy link
Member

@jkotas jkotas Feb 26, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In other words, passing ReadOnlySpan as in does not sound like a good idea.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree this is not a good optimization, and we should back it out if we take this PR. But it's hard to argue with the data that shows minimizing parameter shuffling (see assembly in #32843 (comment)) has a measurable impact on performance.

@GrabYourPitchforks
Copy link
Member Author

And if you reeeeeally wanted to optimize the method with the switch statement, you could use the in keyword to avoid stack spilling the incoming ReadOnlySpan<byte> entirely. This nets another few percentage gain.

Method Toolchain Mean Error StdDev Ratio RatioSD
TryParseInt32 master 1,260.8 ns 25.00 ns 32.50 ns 1.00 0.00
TryParseInt32 utf8parser 849.4 ns 17.01 ns 33.18 ns 0.69 0.03
00007ff9`bc68a980 4883ec28        sub     rsp,28h
00007ff9`bc68a984 410fb7c1        movzx   eax,r9w
00007ff9`bc68a988 85c0            test    eax,eax
00007ff9`bc68a98a 750b            jne     System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Char)+0xffffffff`a12d8737 (00007ff9`bc68a997)
00007ff9`bc68a98c e8f7d9ffff      call    CLRStub[MethodDescPrestub]@7ff9bc688388 (00007ff9`bc688388)
00007ff9`bc68a991 90              nop
00007ff9`bc68a992 4883c428        add     rsp,28h
00007ff9`bc68a996 c3              ret
00007ff9`bc68a997 410fb7c1        movzx   eax,r9w
00007ff9`bc68a99b 83c820          or      eax,20h
00007ff9`bc68a99e 83f867          cmp     eax,67h
00007ff9`bc68a9a1 7f0c            jg      System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Char)+0xffffffff`a12d874f (00007ff9`bc68a9af)
00007ff9`bc68a9a3 83f864          cmp     eax,64h
00007ff9`bc68a9a6 74e4            je      System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Char)+0xffffffff`a12d872c (00007ff9`bc68a98c)
00007ff9`bc68a9a8 83f867          cmp     eax,67h
00007ff9`bc68a9ab 7522            jne     System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Char)+0xffffffff`a12d876f (00007ff9`bc68a9cf)
00007ff9`bc68a9ad ebdd            jmp     System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Char)+0xffffffff`a12d872c (00007ff9`bc68a98c)
00007ff9`bc68a9af 83f86e          cmp     eax,6Eh
00007ff9`bc68a9b2 7410            je      System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Char)+0xffffffff`a12d8764 (00007ff9`bc68a9c4)
00007ff9`bc68a9b4 83f878          cmp     eax,78h
00007ff9`bc68a9b7 7516            jne     System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Char)+0xffffffff`a12d876f (00007ff9`bc68a9cf)
00007ff9`bc68a9b9 e86adaffff      call    CLRStub[MethodDescPrestub]@7ff9bc688428 (00007ff9`bc688428)
00007ff9`bc68a9be 90              nop
00007ff9`bc68a9bf 4883c428        add     rsp,28h
00007ff9`bc68a9c3 c3              ret
00007ff9`bc68a9c4 e8dfd9ffff      call    CLRStub[MethodDescPrestub]@7ff9bc6883a8 (00007ff9`bc6883a8)
00007ff9`bc68a9c9 90              nop
00007ff9`bc68a9ca 4883c428        add     rsp,28h
00007ff9`bc68a9ce c3              ret
00007ff9`bc68a9cf e834f5d8ff      call    CLRStub[MethodDescPrestub]@7ff9bc419f08 (00007ff9`bc419f08)
00007ff9`bc68a9d4 cc              int     3

Of note in the assembly above: rsp is bumped but there's no stack-spillage at all. I don't know why the calls to helper functions aren't tail-calls. Seems like they'd be ideal candidates. The JIT also emits the instruction movzx eax,r9w twice instead of reusing the eax register. That's a minor opportunity for optimization.

@GrabYourPitchforks
Copy link
Member Author

@dotnet/jit-contrib I don't feel comfortable using in in this manner because I think it's taking advantage of a Windows-specific calling convention. Is there any reason why the existing call sites in this method aren't candidates for tail-calling?

@BruceForstall
Copy link
Member

Is there any reason why the existing call sites in this method aren't candidates for tail-calling?

The fail-to-tailcall reason is output in the JitDump, in CLR logging, and in JIT ETW events.

Maybe someone else can eyeball why it's not happening in your case.

@AndyAyersMS
Copy link
Member

AndyAyersMS commented Feb 26, 2020

By IL semantics the jit has to copy the struct into a local struct before the call. This blocks tail calling as we're then passing a bit of the local frame to the callee.

There is a special optimization we can do for implicit byrefs when there's just one use of the struct -- we realize this must be the last use so nobody can tell if we pass the original struct or make a copy, so we suppress the copy and then can tail call (edit: we just suppress the copy, we don't do this for tail calls yet...).

But here you have 3 uses and so this simple optimization is defeated. We could probably fix it so we separate out uses from uses at tail call sites, the latter are always last use and so always safe to pass without copying.

This is probably something we could fix relatively quickly; I actually noticed this recently (see this comment) and was prototyping how to fix it. There's an additional wrinkle that makes it a bit trickier than just counting -- we also like to promote multiply used implicit byref params, and we make the promotion decisions before we think about what they might do to tail calls, and if we promote we're back passing local state in the call and so can't tail call.

@adamsitnik
Copy link
Member

Is that a bit cleaner from a maintainability perspective?

Yes, thank you @GrabYourPitchforks !

@GrabYourPitchforks
Copy link
Member Author

GrabYourPitchforks commented Mar 2, 2020

I was experimenting a bit with @AndyAyersMS's change at #33004. Preliminary investigations show that his change has two positive effects: (a) it provides some good optimizations in its own right, and (b) it provides a solid foundation on which we can build additional improvements.

The numbers and codegen below are from three runs. The master run is the current master branch with no changes. The utf8parser_1 run is from Andy's PR only, with no changes to any of the Utf8Parser code paths. The utf8parser_2 run is from Andy's PR, plus changes to the Utf8Parser.TryParse(..., out int value, ...) method which contains the switch statement. Other methods like TryParseInt32D were left unchanged from their master implementation. Any changes we make to TryParseInt32D and related methods would just be goodness on top of this.

Method Toolchain Mean Error StdDev Ratio RatioSD
TryParseInt32 master 1,224.1 ns 24.13 ns 45.33 ns 1.00 0.00
TryParseInt32 utf8parser_1 1,022.3 ns 20.25 ns 17.95 ns 0.84 0.03
TryParseInt32 utf8parser_2 733.7 ns 13.82 ns 11.54 ns 0.60 0.02
; TryParse(..., out int value, ...) in current master branch

00007ffe`4c6cb1e0 57              push    rdi
00007ffe`4c6cb1e1 56              push    rsi
00007ffe`4c6cb1e2 53              push    rbx
00007ffe`4c6cb1e3 4883ec30        sub     rsp,30h
00007ffe`4c6cb1e7 33c0            xor     eax,eax
00007ffe`4c6cb1e9 4889442420      mov     qword ptr [rsp+20h],rax
00007ffe`4c6cb1ee 488b31          mov     rsi,qword ptr [rcx]
00007ffe`4c6cb1f1 8b7908          mov     edi,dword ptr [rcx+8]
00007ffe`4c6cb1f4 410fb7d9        movzx   ebx,r9w
00007ffe`4c6cb1f8 83fb4e          cmp     ebx,4Eh
00007ffe`4c6cb1fb 7758            ja      System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Char)+0xffffffff`a4db9355 (00007ffe`4c6cb255)
00007ffe`4c6cb1fd 83fb44          cmp     ebx,44h
00007ffe`4c6cb200 772b            ja      System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Char)+0xffffffff`a4db932d (00007ffe`4c6cb22d)
00007ffe`4c6cb202 85db            test    ebx,ebx
00007ffe`4c6cb204 7409            je      System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Char)+0xffffffff`a4db930f (00007ffe`4c6cb20f)
00007ffe`4c6cb206 83fb44          cmp     ebx,44h
00007ffe`4c6cb209 0f8588000000    jne     System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Char)+0xffffffff`a4db9397 (00007ffe`4c6cb297)
00007ffe`4c6cb20f 488d4c2420      lea     rcx,[rsp+20h]
00007ffe`4c6cb214 488931          mov     qword ptr [rcx],rsi
00007ffe`4c6cb217 897908          mov     dword ptr [rcx+8],edi
00007ffe`4c6cb21a 488d4c2420      lea     rcx,[rsp+20h]
00007ffe`4c6cb21f e89ce3ffff      call    CLRStub[MethodDescPrestub]@7ffe4c6c95c0 (00007ffe`4c6c95c0)
00007ffe`4c6cb224 90              nop
00007ffe`4c6cb225 4883c430        add     rsp,30h
00007ffe`4c6cb229 5b              pop     rbx
00007ffe`4c6cb22a 5e              pop     rsi
00007ffe`4c6cb22b 5f              pop     rdi
00007ffe`4c6cb22c c3              ret
00007ffe`4c6cb22d 83fb47          cmp     ebx,47h
00007ffe`4c6cb230 74dd            je      System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Char)+0xffffffff`a4db930f (00007ffe`4c6cb20f)
00007ffe`4c6cb232 83fb4e          cmp     ebx,4Eh
00007ffe`4c6cb235 7560            jne     System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Char)+0xffffffff`a4db9397 (00007ffe`4c6cb297)
00007ffe`4c6cb237 488d4c2420      lea     rcx,[rsp+20h]
00007ffe`4c6cb23c 488931          mov     qword ptr [rcx],rsi
00007ffe`4c6cb23f 897908          mov     dword ptr [rcx+8],edi
00007ffe`4c6cb242 488d4c2420      lea     rcx,[rsp+20h]
00007ffe`4c6cb247 e894e3ffff      call    CLRStub[MethodDescPrestub]@7ffe4c6c95e0 (00007ffe`4c6c95e0)
00007ffe`4c6cb24c 90              nop
00007ffe`4c6cb24d 4883c430        add     rsp,30h
00007ffe`4c6cb251 5b              pop     rbx
00007ffe`4c6cb252 5e              pop     rsi
00007ffe`4c6cb253 5f              pop     rdi
00007ffe`4c6cb254 c3              ret
00007ffe`4c6cb255 83fb64          cmp     ebx,64h
00007ffe`4c6cb258 770c            ja      System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Char)+0xffffffff`a4db9366 (00007ffe`4c6cb266)
00007ffe`4c6cb25a 83fb58          cmp     ebx,58h
00007ffe`4c6cb25d 7416            je      System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Char)+0xffffffff`a4db9375 (00007ffe`4c6cb275)
00007ffe`4c6cb25f 83fb64          cmp     ebx,64h
00007ffe`4c6cb262 7533            jne     System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Char)+0xffffffff`a4db9397 (00007ffe`4c6cb297)
00007ffe`4c6cb264 eba9            jmp     System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Char)+0xffffffff`a4db930f (00007ffe`4c6cb20f)
00007ffe`4c6cb266 83fb67          cmp     ebx,67h
00007ffe`4c6cb269 74a4            je      System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Char)+0xffffffff`a4db930f (00007ffe`4c6cb20f)
00007ffe`4c6cb26b 83fb6e          cmp     ebx,6Eh
00007ffe`4c6cb26e 74c7            je      System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Char)+0xffffffff`a4db9337 (00007ffe`4c6cb237)
00007ffe`4c6cb270 83fb78          cmp     ebx,78h
00007ffe`4c6cb273 7522            jne     System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Char)+0xffffffff`a4db9397 (00007ffe`4c6cb297)
00007ffe`4c6cb275 33c9            xor     ecx,ecx
00007ffe`4c6cb277 890a            mov     dword ptr [rdx],ecx
00007ffe`4c6cb279 488d4c2420      lea     rcx,[rsp+20h]
00007ffe`4c6cb27e 488931          mov     qword ptr [rcx],rsi
00007ffe`4c6cb281 897908          mov     dword ptr [rcx+8],edi
00007ffe`4c6cb284 488d4c2420      lea     rcx,[rsp+20h]
00007ffe`4c6cb289 e8d2e3ffff      call    CLRStub[MethodDescPrestub]@7ffe4c6c9660 (00007ffe`4c6c9660)
00007ffe`4c6cb28e 90              nop
00007ffe`4c6cb28f 4883c430        add     rsp,30h
00007ffe`4c6cb293 5b              pop     rbx
00007ffe`4c6cb294 5e              pop     rsi
00007ffe`4c6cb295 5f              pop     rdi
00007ffe`4c6cb296 c3              ret
00007ffe`4c6cb297 33c0            xor     eax,eax
00007ffe`4c6cb299 8902            mov     dword ptr [rdx],eax
00007ffe`4c6cb29b 418900          mov     dword ptr [r8],eax
00007ffe`4c6cb29e e88deed8ff      call    CLRStub[MethodDescPrestub]@7ffe4c45a130 (00007ffe`4c45a130)
00007ffe`4c6cb2a3 cc              int     3

; TryParse(..., out int value, ...) with Andy's pending PR

00007ffe`4c69b320 56              push    rsi
00007ffe`4c69b321 4883ec20        sub     rsp,20h
00007ffe`4c69b325 410fb7f1        movzx   esi,r9w
00007ffe`4c69b329 83fe4e          cmp     esi,4Eh
00007ffe`4c69b32c 772c            ja      System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Char)+0xffffffff`a4d6948a (00007ffe`4c69b35a)
00007ffe`4c69b32e 83fe44          cmp     esi,44h
00007ffe`4c69b331 7713            ja      System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Char)+0xffffffff`a4d69476 (00007ffe`4c69b346)
00007ffe`4c69b333 85f6            test    esi,esi
00007ffe`4c69b335 7405            je      System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Char)+0xffffffff`a4d6946c (00007ffe`4c69b33c)
00007ffe`4c69b337 83fe44          cmp     esi,44h
00007ffe`4c69b33a 754c            jne     System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Char)+0xffffffff`a4d694b8 (00007ffe`4c69b388)
00007ffe`4c69b33c 4883c420        add     rsp,20h
00007ffe`4c69b340 5e              pop     rsi
00007ffe`4c69b341 e9badaffff      jmp     CLRStub[MethodDescPrestub]@7ffe4c698e00 (00007ffe`4c698e00)
00007ffe`4c69b346 83fe47          cmp     esi,47h
00007ffe`4c69b349 74f1            je      System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Char)+0xffffffff`a4d6946c (00007ffe`4c69b33c)
00007ffe`4c69b34b 83fe4e          cmp     esi,4Eh
00007ffe`4c69b34e 7538            jne     System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Char)+0xffffffff`a4d694b8 (00007ffe`4c69b388)
00007ffe`4c69b350 4883c420        add     rsp,20h
00007ffe`4c69b354 5e              pop     rsi
00007ffe`4c69b355 e9c6daffff      jmp     CLRStub[MethodDescPrestub]@7ffe4c698e20 (00007ffe`4c698e20)
00007ffe`4c69b35a 83fe64          cmp     esi,64h
00007ffe`4c69b35d 770c            ja      System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Char)+0xffffffff`a4d6949b (00007ffe`4c69b36b)
00007ffe`4c69b35f 83fe58          cmp     esi,58h
00007ffe`4c69b362 7416            je      System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Char)+0xffffffff`a4d694aa (00007ffe`4c69b37a)
00007ffe`4c69b364 83fe64          cmp     esi,64h
00007ffe`4c69b367 751f            jne     System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Char)+0xffffffff`a4d694b8 (00007ffe`4c69b388)
00007ffe`4c69b369 ebd1            jmp     System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Char)+0xffffffff`a4d6946c (00007ffe`4c69b33c)
00007ffe`4c69b36b 83fe67          cmp     esi,67h
00007ffe`4c69b36e 74cc            je      System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Char)+0xffffffff`a4d6946c (00007ffe`4c69b33c)
00007ffe`4c69b370 83fe6e          cmp     esi,6Eh
00007ffe`4c69b373 74db            je      System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Char)+0xffffffff`a4d69480 (00007ffe`4c69b350)
00007ffe`4c69b375 83fe78          cmp     esi,78h
00007ffe`4c69b378 750e            jne     System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Char)+0xffffffff`a4d694b8 (00007ffe`4c69b388)
00007ffe`4c69b37a 33c0            xor     eax,eax
00007ffe`4c69b37c 8902            mov     dword ptr [rdx],eax
00007ffe`4c69b37e 4883c420        add     rsp,20h
00007ffe`4c69b382 5e              pop     rsi
00007ffe`4c69b383 e918dbffff      jmp     CLRStub[MethodDescPrestub]@7ffe4c698ea0 (00007ffe`4c698ea0)
00007ffe`4c69b388 33c0            xor     eax,eax
00007ffe`4c69b38a 8902            mov     dword ptr [rdx],eax
00007ffe`4c69b38c 418900          mov     dword ptr [r8],eax
00007ffe`4c69b38f e89cedd8ff      call    CLRStub[MethodDescPrestub]@7ffe4c42a130 (00007ffe`4c42a130)
00007ffe`4c69b394 cc              int     3

; TryParse(..., out int value, ...) with Andy's PR and some refactoring of the switch statement

00007ffe`4c6db320 410fb7c1        movzx   eax,r9w
00007ffe`4c6db324 85c0            test    eax,eax
00007ffe`4c6db326 7505            jne     System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Char)+0xffffffff`a4da93cd (00007ffe`4c6db32d)
00007ffe`4c6db328 e9d3daffff      jmp     CLRStub[MethodDescPrestub]@7ffe4c6d8e00 (00007ffe`4c6d8e00)
00007ffe`4c6db32d 410fb7c1        movzx   eax,r9w
00007ffe`4c6db331 83c820          or      eax,20h
00007ffe`4c6db334 83f867          cmp     eax,67h
00007ffe`4c6db337 7f0c            jg      System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Char)+0xffffffff`a4da93e5 (00007ffe`4c6db345)
00007ffe`4c6db339 83f864          cmp     eax,64h
00007ffe`4c6db33c 74ea            je      System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Char)+0xffffffff`a4da93c8 (00007ffe`4c6db328)
00007ffe`4c6db33e 83f867          cmp     eax,67h
00007ffe`4c6db341 74e5            je      System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Char)+0xffffffff`a4da93c8 (00007ffe`4c6db328)
00007ffe`4c6db343 eb16            jmp     System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Char)+0xffffffff`a4da93fb (00007ffe`4c6db35b)
00007ffe`4c6db345 83f86e          cmp     eax,6Eh
00007ffe`4c6db348 7407            je      System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Char)+0xffffffff`a4da93f1 (00007ffe`4c6db351)
00007ffe`4c6db34a 83f878          cmp     eax,78h
00007ffe`4c6db34d 7407            je      System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Char)+0xffffffff`a4da93f6 (00007ffe`4c6db356)
00007ffe`4c6db34f eb0a            jmp     System_Private_CoreLib!System.Buffers.Text.Utf8Parser.TryParse(System.ReadOnlySpan`1<Byte>, Int32 ByRef, Int32 ByRef, Char)+0xffffffff`a4da93fb (00007ffe`4c6db35b)
00007ffe`4c6db351 e9cadaffff      jmp     CLRStub[MethodDescPrestub]@7ffe4c6d8e20 (00007ffe`4c6d8e20)
00007ffe`4c6db356 e945dbffff      jmp     CLRStub[MethodDescPrestub]@7ffe4c6d8ea0 (00007ffe`4c6d8ea0)
00007ffe`4c6db35b e980f3ffff      jmp     CLRStub[MethodDescPrestub]@7ffe4c6da6e0 (00007ffe`4c6da6e0)

@GrabYourPitchforks
Copy link
Member Author

With latest changes, Utf8Parser.TryParse(..., out int) runs at around 10% faster than baseline (master branch). int.Parse runs at around 5% faster than baseline.

I'm still investigating whether we can eke more performance out of int.Parse to try to get it into double-digits percentage gain. At only 5% I question as to whether the added complexity of the current PR is worth it.

/// Otherwise returns <see langword="false"/>.
/// </summary>
[MethodImpl(MethodImplOptions.AggressiveInlining)]
internal unsafe bool TryGetElementAt(nuint index, [MaybeNullWhen(false)] out T value)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not fan of improving performance by internal helpers like this one that are not available in public surface. This should be either turned into public API, or we should live without it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jkotas I understand the concern. How about I experiment in a separate branch with plumbing it through other call sites and see what the perf impact is? That should give us evidence that this API is (or isn't) generally worthwhile.

@GrabYourPitchforks
Copy link
Member Author

Closing this PR since it's no longer actively being worked on. Opened GrabYourPitchforks#9 so that I don't lose track of it.

@ghost ghost locked as resolved and limited conversation to collaborators Dec 10, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants