-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Updating the x64 casting behavior to be IEEE 754 compliant and to use saturation for overflow #61761
Conversation
Tagging subscribers to this area: @JulieLeeMSFT Issue DetailsCC. @jkotas, @davidwrighton, @VSadov, @jeffhandley This came up as part of #61649 and so I figured it might be time to finally fix this and making it IEEE 754 compliant, also making it inline with ARM64 (at least for inputs that have a well-defined result). Leaving it in draft as I believe the
|
Codegen generally looks good/improved: - mov rax, qword ptr [rdx]
- vxorps xmm0, xmm0
- vcvtsi2sd xmm0, rax
- test rax, rax
- jge SHORT G_M51675_IG03
- vaddsd xmm0, qword ptr [reloc @RWD00]
+ vmovd xmm0, qword ptr [rdx]
+ vpunpckldq xmm0, xmm0, xmmword ptr [reloc @RWD00]
+ vsubpd xmm0, xmm0, xmmword ptr [reloc @RWD16]
+ vpshufd xmm1, xmm0, 78
+ vaddsd xmm0, xmm0, xmm1 and - vmovsd xmm0, qword ptr [rsi]
- call CORINFO_HELP_DBL2ULNG
+ vmovsd xmm0, qword ptr [rdx]
+ vcvttsd2si rax, xmm0
+ mov r8, rax
+ sar r8, 63
+ vsubsd xmm0, xmm0, qword ptr [reloc @RWD00]
+ vcvttsd2si r9, xmm0
+ and r8, r9
+ or rax, r8 It looks like I need to adjust handling for |
This will need a breaking change notification that describes what exactly changed. I am supportive of the idea of getting the behavior standardized between platforms. |
Added When you commit this breaking change:
Tagging @dotnet/compat for awareness of the breaking change. |
30a5699
to
ad0e186
Compare
ad0e186
to
39971b5
Compare
@jkotas, I've updated this to include the relevant helper APIs and tests to validate the saturating behavior. Notably, the unchecked versions of
Some diffs are as follows. For - vcvttss2si eax, xmm0
+ vmovss xmm1, dword ptr [reloc @RWD00] ; for byte, ushort; this is vxorps xmm1, xmm1, xmm1
+ vmaxss xmm1, xmm1, xmm0
+ vmovss xmm2, dword ptr [reloc @RWD04]
+ vminss xmm1, xmm2, xmm1
+ vcvttss2si esi, xmm1
+ cmp esi, 0xD1FFAB1E
+ jne SHORT G_M31120_IG04
+ G_M31120_IG03:
+ vcvtss2sd xmm0, xmm0 ; Only for input type float
+ call CORINFO_HELP_DoubleToInt8
+ G_M31120_IG04:
+ mov eax, esi
+ G_M31120_IG05:
movsx rax, al For vcvttss2si eax, xmm0
+ cmp eax, 0xD1FFAB1E
+ jne SHORT G_M61232_IG04
+ G_M61232_IG03:
+ vcvtss2sd xmm0, xmm0 ; Only for input type float
+ call CORINFO_HELP_DoubleToInt32
+ jmp SHORT G_M61232_IG04 ; This seems to be a side-effect of the QMARK
+ G_M61232_IG04: |
There are also some positive diffs around the overflow helpers for vcvtss2sd xmm0, xmm0
+ call CORINFO_HELP_DoubleToInt8_OVF
- call CORINFO_HELP_DBL2INT_OVF
- cmp eax, 127
- jg SHORT G_M39883_IG04
- cmp eax, -128
- jl SHORT G_M39883_IG04
; ...
- G_M39883_IG04:
- call CORINFO_HELP_OVERFLOW
- int3 |
aab5092
to
45d68be
Compare
27170be
to
7f94409
Compare
x64 diffs are below. Improvements are places where small casts w/ overflow were converted into helper calls rather than being inline conversions, checks, and throwing the overflow exception. x64 --frameworks --pmi
x64 --benchmarks --pmi
x64 --tests --pmi
|
ARM64 diffs are similar (can't use
- 1E780000 fcvtzs w0, d0
- 13001C00 sxtb w0, w0
+ 5C000150 ldr d16, [@RWD00]
+ 1E604210 fmov d16, d16
+ 1E704810 fmax d16, d0, d16
+ 5C000131 ldr d17, [@RWD08]
+ 1E604231 fmov d17, d17
+ 1E715A10 fmin d16, d16, d17
+ 1E780200 fcvtzs w0, d16 No Changes for |
a0ee1dd
to
04ec9dd
Compare
src/tests/JIT/Directed/Convert/out_of_range_fp_to_int_conversions.cs
Outdated
Show resolved
Hide resolved
src/tests/JIT/Directed/Convert/out_of_range_fp_to_int_conversions.cs
Outdated
Show resolved
Hide resolved
Draft Pull Request was automatically closed for inactivity. Please let us know if you'd like to reopen it. |
66d827c
to
a4a3f0f
Compare
Rebased onto dotnet/main |
a4a3f0f
to
7f0f1c5
Compare
cdbe5da
to
70331ef
Compare
6cb2186
to
2782173
Compare
Going to need help from someone on the Mono team to figure out the Was initially trying to get everything working and then would look at only enabling it where necessary and optimizing (helpers aren't needed on Arm64, for example). |
@vargaz, is this something you or someone from the mono team could help me investigate? I'm not finding what isn't being handled correctly here and its preventing the last of the prototype work to try and finalize the behavior to be deterministic. After mono is all passing, then I'll be able to finish getting perf numbers and writing up the breaking-change doc. |
Will look into it. |
2782173
to
8d6661c
Compare
The mono changes look ok to me. |
8d6661c
to
f20d0f3
Compare
…cs for specialized code paths
@vargaz, Thanks for the help! I'm still seeing
and then for OSX
For the first one, I'd expect there is some handling or other consideration I'm missing for AOT around https://github.com/dotnet/runtime/pull/61761/files#diff-ca87147826c7221d1c277d7a22d042444f21defad5b7a901d771c158243d6e14R4767 It's not clear why this is being "jitted". I thought that this was supposed to be treated as a call to the native helper and so it should just work for AOT. Notably, the new entries match up to For the second issue, it's not clear why this is only showing up on OSX. Perhaps there is some improper handling around the new I'm notably not able to reproduce this on Windows and debugging, even for interp/aot mode, looks to be doing all the right things here. |
Will look into it. |
Draft Pull Request was automatically closed for inactivity. Please let us know if you'd like to reopen it. |
CC. @jkotas, @davidwrighton, @VSadov, @jeffhandley
This came up as part of #61649 and so I figured it might be time to finally fix this and making it IEEE 754 compliant, also making it inline with ARM64 (at least for inputs that have a well-defined result).
The integer to floating-point impl here is based on the Clang/LLVM codegen which is Apache 2.0 w/ LLVM Exception.
The floating-point to integer impl is based on the Rust codegen which is Apache 2.0 and MIT.
The floating-point to integer impl is matching the Rust behavior of saturating too large/too smal values and treating
0
asNaN
: rust-lang/rust#10184