Determine the strategy that should be used by the Jit when folding implementation-defined casts #47478
Comments
I think this would only be required if the folding was done at R2R/AOT time. For JIT time scenarios, where you are targeting the current machine, you can just execute the underlying C/C++ conversion and don't need to maintain any table outside of what cases are […]
One benefit of using the table anyway (of course, it'll only be used for cases that overflow) is that we'll get guaranteed consistent behavior across compilers. It also seems like using the table if we already have one (note that we say "the table", but really it is a couple of […]
CC @AndyAyersMS
@SingleAccretion thanks for the writeup. We should try to get this sorted during .NET 6.
I'm working on a writeup for what we should be doing here. It's irritatingly complicated. My current thinking on the matter is that we should provide a CoreCLR definition for the behavior of all casting that is unified across architectures and platforms. It will cause a slight performance impact, but even microbenchmarks show that a helper function could be used without a significant impact on real applications. (They show a 25-50% slowdown for an application which does nothing but casting, and as we all know, casting may be common, but it's not THAT common.) @tannergooding Unfortunately, relying on the C++ compiler for casting behavior in undefined scenarios is a complete mess. For instance, the MSVC compiler has a variety of different conversion behaviors, none of which exactly match our codegen in various edge cases, even on x64 Windows.
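A minimal sketch of the helper-based direction described above, assuming the unified definition is a saturating one; the `CastHelpers.ConvertToInt32Saturating` name and the exact semantics (NaN to zero, out-of-range values clamped) are illustrative assumptions, not the runtime's actual helper:

```csharp
using System;

static class CastHelpers
{
    // Hypothetical unified double -> int conversion: clamp out-of-range values
    // and map NaN to zero, so every architecture computes the same result.
    public static int ConvertToInt32Saturating(double value)
    {
        if (double.IsNaN(value)) return 0;
        if (value <= int.MinValue) return int.MinValue;
        if (value >= int.MaxValue) return int.MaxValue;
        return (int)value; // in range: truncation toward zero is well-defined
    }
}
```

A branchy helper like this is consistent with the measurements quoted above: noticeable in a cast-only microbenchmark, negligible in real applications.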
Could you clarify what types of applications were tested? I'd expect a heavier hit in things like ML.NET (as well as the PyTorch and TensorFlow ports), since they tend to be more heavily dependent on doing the same floating-point operation over and over in a loop.
The JIT has always had undefined behavior for things like this and for over/under-shifting of a value (with C# handling the latter by explicitly masking the shift count). We also have much larger inconsistencies in behavior elsewhere that users already have to deal with, like in certain IO/file handling between Unix and Windows, or in the results returned by various APIs. So while I think we should first and foremost aim to be IEEE compliant, I also feel that where those differences are undefined behavior, it's fine to have them be inconsistent between different machines.
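As a small illustration of the shift-count handling mentioned above (a sketch, not code from this thread): C# defines over-shifting by masking the count to the operand width, so the result is well-defined at the language level even though the underlying IL `shl` is not.

```csharp
using System;

class ShiftMaskDemo
{
    static void Main()
    {
        int x = 1;
        int count = 33;
        // For a 32-bit operand, C# masks the shift count to its low 5 bits,
        // so shifting by 33 behaves like shifting by 1: this prints 2.
        Console.WriteLine(x << count);
    }
}
```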
For what it's worth, at least one of these cases is a clear bug and causes us to not be IEEE compliant: #43895. The only real edge cases that exist are values outside the range of the target type. For […]
Ping @davidwrighton, do you plan to work on this in .NET 6 or will you move this to .NET 7?
@tannergooding can you take this over? Or have we already done this?
We now have a unified behavior for casting of […]
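Assuming the unified behavior referred to here is the saturating one (out-of-range values clamp to the destination type's bounds, NaN maps to zero), a quick illustration of the now platform-independent answers; the expected values in the comments follow from that assumption:

```csharp
using System;

class UnifiedCastDemo
{
    static void Main()
    {
        float[] inputs = { float.MaxValue, float.NegativeInfinity, float.NaN };
        foreach (float f in inputs)
        {
            // Under saturating semantics: float.MaxValue -> uint.MaxValue,
            // -Infinity -> 0, NaN -> 0, on every architecture.
            Console.WriteLine((uint)f);
        }
    }
}
```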
Sounds like this is done, then. Constant folding is an optimization detail, not a detail of how casting should work.
Today, the Jit supports folding casts like `(uint)float.MaxValue`. The value returned by this cast when folded by the Jit is defined by the C++ compiler the Jit was compiled with, which may or may not match the value returned by the instruction that will be emitted for an equivalent dynamic cast. It also may or may not match the value Roslyn will produce (`0`) when constant-folding these casts at compile time.

One of the manifestations of this issue was observed in #47374, where the MSVC x86 compiler gave a different answer to a question that would "normally" return `0`.
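To make the divergence concrete, a small illustrative example (the `Max()` wrapper and the `AggressiveInlining` attribute are just a hypothetical way of routing the constant through inlining; whether the Jit actually folds the second cast is not guaranteed):

```csharp
using System;
using System.Runtime.CompilerServices;

class CastFoldingDemo
{
    // Hidden behind a call so Roslyn cannot fold the cast below; after
    // inlining, the constant becomes visible to the Jit instead.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    static float Max() => float.MaxValue;

    static void Main()
    {
        // Folded by Roslyn at compile time; per this issue it produces 0.
        Console.WriteLine(unchecked((uint)float.MaxValue));

        // A candidate for Jit-time folding after inlining; the result is
        // implementation-defined and may or may not match Roslyn's answer.
        Console.WriteLine(unchecked((uint)Max()));
    }
}
```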
However, a scenario like the above is not actually the main concern that I would like to raise for consideration in this issue. A much bigger problem is cross-compilation, which will presumably become fairly common once `Crossgen2` is fully productized, where the AOT Jit can potentially give a different answer to the same question than the JIT Jit.

As I see it, there are a couple of options that can be pursued here:
Do nothing. As per the ECMA specification, overflowing casts from floating-point types to integers are undefined behavior. This could lead to some problems (described below), but hopefully the impact will be minimal as such casts are not exactly widespread, and only an incredibly small fraction of them will involve constants (I conjecture this can only occur in real-world code when inlining).
Do not fold when the result is undefined (as per ECMA), neither in AOT nor in JIT mode. This option has a couple of benefits associated with it: […]
However, this would mean that some folding opportunities will be missed and some UB not taken advantage of when optimizing, which can have cascading effects on, for example, inlining.
Only fold when not in AOT mode. This will eliminate the cross-compilation concerns, but can still potentially result in inconsistencies, depending on how the folding will be implemented (see below).
Fold always. This is a stronger option than the above, with the same problems/benefits, just magnified by the amount of code that will be exposed.
Now, if it is decided that folding should be performed for these edge cases, one critical decision is how it should be done. Again, there are a couple of options:

Fold as is done today: defer to whatever the C++ compiler the Jit was built with does, keeping the existing guard for the cases it does not handle:
runtime/src/coreclr/jit/gentree.cpp
Lines 14310 to 14313 in fb001e6
This is the reason why, e.g., `(uint)float.PositiveInfinity` is not folded today, nor are conversions from `NaN`. Opting for this behavior is also simple in terms of the implementation, but it comes with the usual baggage that UB has: Debug vs Release inconsistencies, inlining vs no-inlining inconsistencies (which may become dynamic with the advent of PGO and such), etc.

Fold according to the behavior of the target platform. On x64, for example, the Jit would follow `vcvttss2si`'s behavior, for ARM64 - `fcvtzu`'s, and so on. This is a relatively expensive option implementation-wise, as the existing codegen would need to be surveyed for the exact instructions used, their behavior determined and stored in a dedicated table, which would need to be kept in sync with any future changes in the code generator. Hopefully, this should be a fixed-size cost however, as it is not really expected that the emitted instructions will be changing frequently, if at all.
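For concreteness, a sketch of the table-driven, target-aware approach; the `TargetArch` and `FoldFloatToInt32` names are hypothetical, and the modeled overflow results follow the ISA manuals for the signed conversions (x64 `cvttss2si` returning the "integer indefinite" value, ARM64 `fcvtzs` saturating), not the Jit's actual implementation:

```csharp
using System;

enum TargetArch { X64, Arm64 }

static class CastFoldingTable
{
    // Fold a float -> int32 cast using the *target's* semantics rather than
    // whatever the host C++ compiler happens to do.
    public static int FoldFloatToInt32(float value, TargetArch target)
    {
        // In-range, non-NaN values truncate toward zero identically on every
        // target, so no per-target entry is needed. (NaN fails both checks.)
        if (value >= -2147483648.0f && value < 2147483648.0f)
        {
            return (int)value;
        }

        switch (target)
        {
            case TargetArch.X64:
                // cvttss2si returns the "integer indefinite" value
                // (0x80000000) for NaN and out-of-range inputs.
                return int.MinValue;

            case TargetArch.Arm64:
                // fcvtzs saturates: NaN -> 0, too large -> int.MaxValue,
                // too small -> int.MinValue.
                if (float.IsNaN(value)) return 0;
                return value > 0 ? int.MaxValue : int.MinValue;

            default:
                throw new NotSupportedException("unmodeled target");
        }
    }
}
```

For example, `CastFoldingTable.FoldFloatToInt32(float.MaxValue, TargetArch.Arm64)` yields `int.MaxValue`, while the X64 entry yields `int.MinValue`; this is exactly the kind of divergence the dedicated table would have to capture and keep in sync with codegen.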
As I am no expert on the matter, this is only an overview of the problem, with the detailed cost/benefit analysis of the options missing; the intent is to start the discussion and determine those concretely.

It may turn out that the conclusion here is that this is simply a non-issue, extremely unlikely to be hit by anyone in the real world, and thus doesn't merit any effort being dedicated to fixing it.
category:design
theme:optimization