Determine the strategy that should be used by the Jit when folding implementation-defined casts #47478
Comments
I think this would only be required if the folding was done at R2R/AOT time. For JIT time scenarios, where you are targeting the current machine, you can just execute the underlying C/C++ conversion and don't need to maintain any table outside of what cases are […]
One benefit of using the table anyway (of course, it'll only be used for cases that overflow) is that we'll get guaranteed consistent behavior across compilers. It also seems like using the table if we already have one (note that we say "the table", but really it is a couple of […]
CC @AndyAyersMS
@SingleAccretion thanks for the writeup. We should try to get this sorted during .NET 6.
I'm working on a writeup for what we should be doing here. It's irritatingly complicated. My current thinking on the matter is that we should provide a CoreCLR definition for the behavior of all casting that is unified across architectures and platforms. It will cause a slight performance impact, but even microbenchmarks show that a helper function could be used without a significant impact on real applications. (They show a 25-50% slowdown for an application which does nothing but casting, and as we all know, casting may be common, but it's not THAT common.) @tannergooding Unfortunately, relying on the C++ compiler for casting behavior in undefined scenarios is a complete mess. For instance, the MSVC compiler has a variety of different conversion behaviors, none of which exactly match our codegen in various edge cases, even on x64 Windows.
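A minimal sketch of the helper-based direction described above, assuming the unified definition is a saturating one; the `CastHelpers.ConvertToInt32Saturating` name and the exact semantics (NaN to zero, out-of-range values clamped) are illustrative assumptions, not the runtime's actual helper:

```csharp
using System;

static class CastHelpers
{
    // Hypothetical unified double -> int conversion: clamp out-of-range values
    // and map NaN to zero, so every architecture computes the same result.
    public static int ConvertToInt32Saturating(double value)
    {
        if (double.IsNaN(value)) return 0;
        if (value <= int.MinValue) return int.MinValue;
        if (value >= int.MaxValue) return int.MaxValue;
        return (int)value; // in range: truncation toward zero is well-defined
    }
}
```

A branchy helper like this is consistent with the measurements quoted above: noticeable in a cast-only microbenchmark, negligible in real applications.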
Could you clarify what types of applications were tested? I'd expect a heavier hit in things like ML.NET (as well as the PyTorch and TensorFlow ports), since they tend to be more heavily dependent on doing the same floating-point operation over and over in a loop.
The JIT has always had undefined behavior for things like this and for over/under-shifting of a value (with C# handling the latter by explicitly masking the shift count). We also have much larger inconsistencies in behavior elsewhere that users already have to deal with, like in certain IO/file handling between Unix and Windows, or in the results returned by various APIs. So while I think we should first and foremost aim to be IEEE compliant, I also feel that where those differences are undefined behavior, it's fine to have them be inconsistent between different machines.
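As a small illustration of the shift-count handling mentioned above (a sketch, not code from this thread): C# defines over-shifting by masking the count to the operand width, so the result is well-defined at the language level even though the underlying IL `shl` is not.

```csharp
using System;

class ShiftMaskDemo
{
    static void Main()
    {
        int x = 1;
        int count = 33;
        // For a 32-bit operand, C# masks the shift count to its low 5 bits,
        // so shifting by 33 behaves like shifting by 1: this prints 2.
        Console.WriteLine(x << count);
    }
}
```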
For what it's worth, at least one of these cases is a clear bug and causes us to not be IEEE compliant: #43895. The only real edge cases that exist are values outside the range of the target type. For […]
Ping @davidwrighton, do you plan to work on this in .NET 6 or will you move this to .NET 7?
@tannergooding can you take this over? Or have we already done this?
We now have a unified behavior for casting of […]
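Assuming the unified behavior referred to here is the saturating one (out-of-range values clamp to the destination type's bounds, NaN maps to zero), a quick illustration of the now platform-independent answers; the expected values in the comments follow from that assumption:

```csharp
using System;

class UnifiedCastDemo
{
    static void Main()
    {
        float[] inputs = { float.MaxValue, float.NegativeInfinity, float.NaN };
        foreach (float f in inputs)
        {
            // Under saturating semantics: float.MaxValue -> uint.MaxValue,
            // -Infinity -> 0, NaN -> 0, on every architecture.
            Console.WriteLine((uint)f);
        }
    }
}
```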
Sounds like this is done, then. Constant folding is an optimization detail, not a detail of how casting should work.
Today, the Jit supports folding casts like `(uint)float.MaxValue`. The value returned by this cast when folded by the Jit is defined by the C++ compiler the Jit was compiled with, which may or may not match the value returned by the instruction that will be emitted for an equivalent dynamic cast. It also may or may not match the value Roslyn will produce (`0`) when constant-folding these casts at compile time.

One of the manifestations of this issue was observed in #47374, where the MSVC x86 compiler gave a different answer to a question that would "normally" return `0`.
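To make the divergence concrete, a small illustrative example (the `Max()` wrapper and the `AggressiveInlining` attribute are just a hypothetical way of routing the constant through inlining; whether the Jit actually folds the second cast is not guaranteed):

```csharp
using System;
using System.Runtime.CompilerServices;

class CastFoldingDemo
{
    // Hidden behind a call so Roslyn cannot fold the cast below; after
    // inlining, the constant becomes visible to the Jit instead.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    static float Max() => float.MaxValue;

    static void Main()
    {
        // Folded by Roslyn at compile time; per this issue it produces 0.
        Console.WriteLine(unchecked((uint)float.MaxValue));

        // A candidate for Jit-time folding after inlining; the result is
        // implementation-defined and may or may not match Roslyn's answer.
        Console.WriteLine(unchecked((uint)Max()));
    }
}
```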
However, a scenario like the above is not actually the main concern that I would like to raise for consideration in this issue. A much bigger problem is cross-compilation, which will presumably become fairly common once `Crossgen2` is fully productized, where the AOT Jit can potentially give a different answer to the same question than the JIT Jit.

As I see it, there are a couple of options that can be pursued here:
Do nothing. As per the ECMA specification, overflowing casts from floating-point types to integers are undefined behavior. This could lead to some problems (described below), but hopefully the impact will be minimal as such casts are not exactly widespread, and only an incredibly small fraction of them will involve constants (I conjecture this can only occur in real-world code when inlining).
Do not fold when the result is undefined (as per ECMA), neither in AOT nor in JIT mode. This option has a couple of benefits associated with it: […]
However, this would mean that some folding opportunities will be missed and some UB not taken advantage of when optimizing, which can have cascading effects on, for example, inlining.
Only fold when not in AOT mode. This will eliminate the cross-compilation concerns, but can still potentially result in inconsistencies, depending on how the folding will be implemented (see below).
Fold always. This is a stronger option than the above, with the same problems/benefits, just magnified by the amount of code that will be exposed.
Now, if it is decided that folding should be performed for these edge cases, one critical decision is how it should be done. Again, there are a couple of options:

Fold as is done today: defer to whatever the C++ compiler the Jit was built with does, keeping the existing guard for the cases it does not handle:
runtime/src/coreclr/jit/gentree.cpp
Lines 14310 to 14313 in fb001e6
This is the reason why, e.g., `(uint)float.PositiveInfinity` is not folded today, nor are conversions from `NaN`. Opting for this behavior is also simple in terms of the implementation, but it comes with the usual baggage that UB has: Debug vs Release inconsistencies, inlining vs no-inlining inconsistencies (which may become dynamic with the advent of PGO and such), etc.

Fold according to the behavior of the target platform. On x64, for example, the Jit would follow `vcvttss2si`'s behavior, for ARM64 - `fcvtzu`'s, and so on. This is a relatively expensive option implementation-wise, as the existing codegen would need to be surveyed for the exact instructions used, their behavior determined and stored in a dedicated table, which would need to be kept in sync with any future changes in the code generator. Hopefully, this should be a fixed-size cost however, as it is not really expected that the emitted instructions will be changing frequently, if at all.
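For concreteness, a sketch of the table-driven, target-aware approach; the `TargetArch` and `FoldFloatToInt32` names are hypothetical, and the modeled overflow results follow the ISA manuals for the signed conversions (x64 `cvttss2si` returning the "integer indefinite" value, ARM64 `fcvtzs` saturating), not the Jit's actual implementation:

```csharp
using System;

enum TargetArch { X64, Arm64 }

static class CastFoldingTable
{
    // Fold a float -> int32 cast using the *target's* semantics rather than
    // whatever the host C++ compiler happens to do.
    public static int FoldFloatToInt32(float value, TargetArch target)
    {
        // In-range, non-NaN values truncate toward zero identically on every
        // target, so no per-target entry is needed. (NaN fails both checks.)
        if (value >= -2147483648.0f && value < 2147483648.0f)
        {
            return (int)value;
        }

        switch (target)
        {
            case TargetArch.X64:
                // cvttss2si returns the "integer indefinite" value
                // (0x80000000) for NaN and out-of-range inputs.
                return int.MinValue;

            case TargetArch.Arm64:
                // fcvtzs saturates: NaN -> 0, too large -> int.MaxValue,
                // too small -> int.MinValue.
                if (float.IsNaN(value)) return 0;
                return value > 0 ? int.MaxValue : int.MinValue;

            default:
                throw new NotSupportedException("unmodeled target");
        }
    }
}
```

For example, `CastFoldingTable.FoldFloatToInt32(float.MaxValue, TargetArch.Arm64)` yields `int.MaxValue`, while the X64 entry yields `int.MinValue`; this is exactly the kind of divergence the dedicated table would have to capture and keep in sync with codegen.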
As I am no expert on the matter, this is only an overview of the problem, with the detailed cost/benefit analysis of the options missing; the intent is to start the discussion and determine those concretely.

It may turn out that the conclusion here is that this is simply a non-issue, extremely unlikely to be hit by anyone in the real world, and thus doesn't merit any effort being dedicated to fixing it.
category:design
theme:optimization