Improve codegen for Vector128.Shift* operations where a direct intrinsic is not available #82564
Comments
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch, @kunalspathak

Issue Details

Consider

    Vector128<byte> v0 = Vector128.LoadUnsafe(ref source);
    Vector128<byte> v1 = Vector128.ShiftRightLogical(v0, 4);

Which currently emits a scalar fallback

    TestClass.Foo(Byte ByRef)
        L0000: push rsi
        L0001: sub rsp, 0x40
        L0005: vzeroupper
        L0008: vmovdqu xmm0, [rcx]
        L000c: vmovapd [rsp+0x20], xmm0
        L0012: xor esi, esi
        L0014: lea rcx, [rsp+0x20]
        L0019: movsxd rdx, esi
        L001c: movzx ecx, byte ptr [rcx+rdx]
        L0020: mov edx, 4
        L0025: mov rax, 0x7ffa0845bc60
        L002f: call qword ptr [rax]
        L0031: lea rdx, [rsp+0x30]
        L0036: movsxd rcx, esi
        L0039: mov [rdx+rcx], al
        L003c: inc esi
        L003e: cmp esi, 0x10
        L0041: jl short L0014
        L0043: vmovapd xmm0, [rsp+0x30]
        L0049: vpmovmskb eax, xmm0
        L004d: add rsp, 0x40
        L0051: pop rsi
        L0052: ret

where it could instead emit a 32-bit shift and an AND to clear the overlapping bits

    Vector128<byte> v0 = Vector128.LoadUnsafe(ref source);
    Vector128<byte> v1 = Vector128.ShiftRightLogical(v0.AsInt32(), 4).AsByte() & Vector128.Create((byte)0xF);

    TestClass.Bar(Byte ByRef)
        L0000: vzeroupper
        L0003: vmovdqu xmm0, [rcx]
        L0007: vpsrld xmm0, xmm0, 4
        L000c: vpand xmm0, xmm0, [0x7ffa087600d0]
        L0014: vpmovmskb eax, xmm0
        L0018: ret

We have a few places in runtime that are aware of this issue and employ workarounds, e.g.:
|
For runtime/src/libraries/System.Private.CoreLib/src/System/Buffers/Text/Base64Decoder.cs (Line 594 in dc6ad37), Vector128.Create((byte)0xF) would be needed. That was intentionally avoided by reusing the already present 0x2F, which has effectively the same masking effect.
I don't think any compiler today is smart enough to have the knowledge needed to do such an optimization. |
Assigning to @tannergooding to respond to the request. |
We will not have time to implement this code optimization in .NET 8. |
(applies to Vector256 as well)
Consider Vector128.ShiftRightLogical(ref byte), where X86 does not have a ShiftRightLogical instruction that operates on bytes. This currently emits a scalar fallback, where it could instead emit a 32-bit shift and an AND to clear the overlapping bits.
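For illustration, here is a minimal sketch of how the "32-bit shift plus AND" idea might generalize to any shift count, for both right and left shifts. The class `ByteShiftSketch` and its helper names are hypothetical and are not runtime APIs. The sketch also assumes shift counts are taken modulo the 8-bit element width, and it only shows the transformation at the C# level, not what the JIT would or should emit.

```csharp
using System;
using System.Runtime.Intrinsics;

// Hypothetical helpers for illustration only; none of these names exist in the runtime.
static class ByteShiftSketch
{
    // Logical right shift of every byte lane, emulated with a 32-bit shift and a mask.
    public static Vector128<byte> ShiftRightLogicalBytes(Vector128<byte> value, int count)
    {
        count &= 7; // assumption: counts wrap at the 8-bit element width

        // Shifting 32-bit lanes moves the low bits of each byte into the
        // high bits of the neighboring lower byte.
        Vector128<byte> shifted = Vector128.ShiftRightLogical(value.AsUInt32(), count).AsByte();

        // Keep only the low (8 - count) bits of each byte, clearing the bits that leaked in.
        return shifted & Vector128.Create((byte)(0xFF >> count));
    }

    // Left shift of every byte lane, using the mirrored mask.
    public static Vector128<byte> ShiftLeftBytes(Vector128<byte> value, int count)
    {
        count &= 7; // same assumption as above

        Vector128<byte> shifted = Vector128.ShiftLeft(value.AsUInt32(), count).AsByte();

        // Keep only the high (8 - count) bits of each byte.
        return shifted & Vector128.Create(unchecked((byte)(0xFF << count)));
    }

    // Compares both helpers against plain scalar shifts for every count from 0 to 7.
    public static bool MatchesScalar(Vector128<byte> input)
    {
        for (int count = 0; count < 8; count++)
        {
            Vector128<byte> right = ShiftRightLogicalBytes(input, count);
            Vector128<byte> left = ShiftLeftBytes(input, count);

            for (int i = 0; i < Vector128<byte>.Count; i++)
            {
                byte b = input.GetElement(i);
                if (right.GetElement(i) != (byte)(b >> count)) return false;
                if (left.GetElement(i) != unchecked((byte)(b << count))) return false;
            }
        }
        return true;
    }

    public static void Main()
    {
        Vector128<byte> input = Vector128.Create(
            (byte)0xF0, 0x0F, 0xAB, 0xCD, 0x12, 0x34, 0x56, 0x78,
            0x9A, 0xBC, 0xDE, 0xFF, 0x00, 0x01, 0x80, 0x7F);

        Console.WriteLine(MatchesScalar(input));
    }
}
```

With a count of 4 the right-shift mask works out to 0x0F, which matches the Vector128.Create((byte)0xF) constant used in the workaround. The same shape should carry over to Vector256 by swapping the vector type.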
We have a few places in runtime that are aware of this issue and employ workarounds, e.g.:
runtime/src/libraries/System.Private.CoreLib/src/System/IndexOfAnyValues/IndexOfAnyAsciiSearcher.cs
Line 875 in c1abf87
runtime/src/libraries/System.Private.CoreLib/src/System/Buffers/Text/Base64Decoder.cs
Line 594 in dc6ad37