-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Proposal] An API for providing compiler hints to the JIT/AOT #24593
Comments
🤔 More seriously, I've occasionally been thinking about something like this, but I have no idea if the scenarios I have in mind are even that much of a perf impact. |
I think its the type of optimization where a big impact depends on your API. For example, a method which is But, a thousand tiny improvements can still add up to a lot as well, so it can also be useful for small APIs. |
FYI. @fiigii, @4creators, @mikedn, @CarolEidt, @jkotas I've proposed this feature as a general way for developers to provide hints to the JIT/AOT about their code. The only API for the feature I've proposed at this time would give users a way to tell the JIT to emit Otherwise, even in the case where the user has pinned a managed array, and offset their ptr appropriately, it is still expensive for the JIT to determine that the logic will result in the code being always aligned and that it is safe to emit the better code. |
This feature is IMO long overdue. It can be helpful in a lot of scenarios not only for HW intrinsics. I would even expand this proposal with
For discussion see https://github.com/dotnet/coreclr/issues/15427#issuecomment-351794198 and https://github.com/dotnet/coreclr/issues/15522 |
I would not consider these two cases to fall into the category of compiler "hints". I would expect a hint to be something that the compiler could use to guide its choices (e.g. whether to inline), but which would have no correctness requirement. For the alignment case, at least the worst that could happen is that the program would fault if the alignment were not correct (and the developer would then know they'd got it wrong). For the "Pure" case, it could get completely wrong results, and be much more difficult to debug. That said, I think these are worth considering - we should just call them something other than "hints". |
Fair enough. Now to come up with a better name 😄 |
My proposals 😄 would be than:
|
There also needs to be a name for the static class which is used to provide information inline with code (such as for a local, where attributes aren't allowed). |
Similar proposal: https://github.com/dotnet/coreclr/issues/2725 |
👍. I would opt for named assumptions over a general I do think putting them in |
I've updated the OP to suggest using |
Yet, I would still allow general
|
The point would be to provide named assumptions (e.g. A general
This requires the JIT to support and itself assume that the above IL sequence (and any other similar IL sequences that may be emitted by other compiles) meant "I am not null". It also requires the compiler to not optimize such a check away (C# is adding non-nullable types, whose to say they don't optimize the check to However, an explicit
A high level compiler (such as C#) also has no way of optimizing this call away itself so there is no worry about the JIT not being able to determine what the user was wanting. |
Got it. It is OK than as simpler solution from perspective of Jit support. |
The current purpose of the Adding assumptions there doesn't look like a very good fit to me. It rather seems to be a violation of the single responsibility principle. I'd dedicate a new class for this, here are some naming ideas:
|
Thanks for the suggestions! I'll update the OP to include a list of suggested names sometime later tonight. Based on the discussions here, and a couple I've had offline, the general idea of the proposal looks to be sound (we want a way to suggest optimizations to the compiler and we want those to be named/verifiable). The actual name of the class that holds these methods will likely undergo some discussion during the API review session (which will happen sometime after we mark this as |
If they are to be trusted rather than guidance it should be |
@benaadams when you say |
Although I think that |
@CarolEidt, is your recommendation then that we do not perform any validation of the assumption when optimizations are disabled then? Based on your statement on the CoreCLR issue (https://github.com/dotnet/coreclr/issues/2725#issuecomment-173028054), I had taken it that we should validate the condition under debug (however, that was 2 years ago and things may have changed). |
@tannergooding - thanks for reminding me of that. I still think it would be useful and probably desirable to validate the condition under debug. However, even in that context, I don't feel that the verb "assert" is quite strong enough; it says "I claim this is true". To me, "assume" says "I claim this is true and you (the compiler/JIT/runtime) can take actions based on that claim". |
IMO
|
I'm going to tag @jaredpar and @MadsTorgersen explicitly here as well. This will primarily be intended for the lower compilers (JIT/AOT, etc), but it would be nice to see if the higher level compilers (C#, VB, F#, etc) have any requirements or scenarios that this might prevent/harm/etc, or if there are any scenarios for which this would be advantageous (i.e. |
Updated the OP with:
|
Tagging @joperezr, @AlexGhiondea as area owners. Is there any other feedback here before I mark as |
You should update the proposed API at the top post to what you want folks to review. |
👍, Updated it to be Rest of the OP looks to be as intended. |
Looks good to me. |
If this is just about FWIW, what I have been hearing is that the performance penalty of the instructions that handle misaligned access is very small on modern platforms and that it is not something to really worry about. |
There is basically no difference between the However, on non-AVX hardware (anything prior to Sandy Bridge, which came out Jan 2011) you cannot use the Any machine running on post Sandy Bridge (post Bulldozer for AMD, which was released Oct 2011) micro-architecture can freely use the VEX encoding ( However, even on modern hardware, there is a non-trivial performance hit if you read unaligned data that crosses a cache line or page boundary, so it is important to ensure reads/writes (outside the first) are aligned anyways. Additionally, the Benchmark Game performance tests (CC. @danmosemsft, @eerhardt) are run on legacy hardware (specifically a Q6600 from the Kentsfield micro-architecture) for which the CC. @fiigii as he may be able to provide more details or correct any mistakes I made with the timelines. |
How can you "assume" 16 byte alignment for an array? The GC doesn't guarantee alignment larger than 8 bytes. |
|
If you do all that then you can surely simply use LoadAligned as well. We really should be careful with this "assume" stuff. We need to ensure that:
|
Which still doesn't solve the scenario for anyone using
I agree, any assumptions we add need to have a clear goal/scenario and some level of data supporting why it should exist. I know on memory alignment, I have seen several things of benchmark data (such as the benchmark games) that indicates we don't perform great on machines without AVX support.
I'm not convinced turning off range checks is a high risk assumption. The IL already supports the Users can already bypass it using unsafe code, and can likely bypass it by writing their own IL. Having a compiler assumption for it just provides an easy way for users to express the same thing without having to deal with pointers and without having to do a IL rewriting pass. |
If
Support for those prefixes is optional. The .NET JIT ignores them.
Yes, and that makes the addition of another way to shoot yourself in the foot more questionable. Especially if the new way is easier to use and easier to get wrong. |
ARM alignment performance would be vendor specific. If alignment faults were enabled (they are not) an exception would be very expensive. Otherwise cache line and page crossing penalties are real, but using aligned code is rarely a win. Large block copies are the typical exception. I did not read the full thread. Some observations.
|
Assumptions will assert the condition and fail the program if optimizations are disabled.
Other compilers consider assumptions to be correct until the input is modified. I would think the same logic would be applied here. |
Providing a general purpose mechanism that allows the compiler to detect and handle alignment is likely better than adding customized support to several types. A general, cross-architecture, mechanism also allows 3rd party authors to achieve the same thing without taking a dependency on one of those types or without requiring them to use hardware intrinsics (which might not be a good option since they are architecture specific).
These will, arguably, be less "shoot yourself in the foot". Validation when optimizations are disable already make this better than any of the alternatives. Having the assumptions additionally be named and having them work with "safe" code will also make it easier to understand the intent and to debug the issue. We are a JIT compiler, so we have a limited time to do the analysis, and even with an AOT compiler, some analysis is complex and takes a long time to do. As such, we have to accept the fact that there are some optimizations the JIT cannot do without outside input (i.e. the developer). If I do checks on my inputs at the public entry point and pass those inputs, unmodified, to some other private method which operates on the data, I should be able to tell the compiler "I already checked this, please don't check it again" and the compiler should (preferably) respect that. |
Other examples?
That's basically a lot of hand-waving. You have no idea what a JIT can or can't do in a given amount of time. You have no idea if the current JIT is efficient or not. You have no idea if the fact that the current JIT fails to perform some useful optimizations because it doesn't have time or because it's basically a pile of crap. You have no idea if the JIT could actually make effective use of the proposed assumptions. You have no idea if the performance improvements you may get from implementing such assumptions are significant enough to be worth the risk. |
@mikedn - although there's a lot we don't know about what the JIT could or couldn't do, especially if it were engineered as well as it could possibly be, I think there are a couple of things we can say:
That said, you make a very good point that we should be doing proof-of-concept work to validate the benefit of any such hints that we might propose for .NET. |
Yes, of course. But one of the problems with these assumptions is that the JIT may not be able to handle them very well for the same reasons it does not handle well other optimizations - lack of time, problematic design and implementation. I've ran into such issues already. For example, I added the And when people try to use "assume" in their code and find that the JIT doesn't perform the desired optimizations what do we do? Go back to the drawing board and introduce new ways to fill their needs? While not necessarily bad I'm not sure this is a good idea. |
I'm sympathetic to this kind of proposal but also have seen this kind of thing go wrong in many ways. So I'd prefer it if we had an actual end to end implementation that we could evaluate. |
@joperezr There is no point in sending this to API review until there is a proof of concept that demonstrates that this API is actually useful. |
I have a good use case for this. See dotnet/coreclr#16095 (comment) for better details. Basically, we cannot fold
|
@AndyAyersMS @CarolEidt This is an actual assumption that is proven in practice to help even on highly optimized C++ compilers: https://github.com/dotnet/coreclr/issues/6024 |
The jit's ability to represent branch probabilities is quite limited. Without actual profile data all we can describe is that a particular branch is almost never or almost always taken. The not taken side of the branch then gets treated as very cold, akin to how we handle throws. If that seems interesting, then it is probably not too difficult to prototype, especially with the advent of new style intrinsics. Something like of two static identity methods in some namespace yet to be determined, eg: class CompilerHint
{
[Intrinsic]
static bool Unlikely(bool x) { return x;} // hint: x is almost always true
[Intrinsic]
static bool Likely(bool x) { return x; } // hint: x is almost always false
} These would be legal anywhere but would initially impact jit codegen only when used in a branching context, and then maybe only sensibly when at top level of an expression: if (CompilerHint.Likely(a > b)) { A; } else { B; } // A is very hot, B is very cold |
I'm currently focusing my available time on helping ensure the HWIntrinsics are generally ready for 2.1. After that is done, I want to try prototyping one of these for the next release (assuming someone else doesn't first). |
@tannergooding is this required in order to ship 5.0? It seems milestone should be Future. |
Yes, updated. Not sure who the relevant area owner is. |
Rationale
There are a number of optimizations which require complex or in-depth analysis. In many cases, these optimizations are not possible to make. This can be due to time constraints (JIT), due to requiring an assumption to be made (JIT and AOT), or due to some other reason.
However, the developer often knows what assumptions are safe to make. This can be due to the way they wrote their algorithm, some check they performed higher in the code, etc.
As such, I propose we provide a set attributes and APIs which can help the compiler (JIT or AOT) produce more optimal code.
Proposed APIs
The proposed API here is very simplistic and represents a core use-case for the new Hardware Intrinsics APIs.
There are multiple other additions that may be equally useful, but they should likely be reviewed on a case-by-case basis with their own supporting arguments.
Alternative Class Names
CompilerHints
was the initial suggested name of the type which would hold these APIs. However, there was some concern over whether some of the suggested APIs actually qualified asHints
, since they may be required to be correct and may result in failure or incorrect data at runtime.The general feedback (so far) is that these are assumptions not asserts. @CarolEidt gave a good explanation of the difference here: https://github.com/dotnet/corefx/issues/26188#issuecomment-356147405
Other names suggested include:
Additional Details
All assumptions should be named. That is, there should be no generic
Assume(bool cond)
method exposed now or in the future. These assumptions are meant to convey an explicit optimization to the compiler and should not be left to interpretation.Potentially as an implementation detail, preferably as part of the contract: Any assumption made by these APIs should be verified when compiling with optimizations disabled.
Example Case 1
In many high-performance algorithms the user will be reading/writing memory in 16 or 32-byte chunks by utilizing the SIMD registers available to the hardware.
It is also generally beneficial to such algorithms to ensure that their reads/writes are aligned as it ensures a read/write never crosses a cache or page boundary (which can incur a heavy performance penalty). Additionally, on older hardware (and some architectures) the unaligned read/write instruction (
movups
on x86) is not as fast as the aligned read/write instruction (movaps
on x86) or thereg, reg/mem
encoding of the various instruction may require thatmem
be aligned.However, the GC does not currently support arbitrary alignments for objects and it is impossible to guarantee alignment without pinning and then checking the alignment of the memory. As such, users generally write their algorithm such that it does a single unaligned read/write, adjusts the offset so that the new address is aligned, and then operates on the rest of the data using aligned reads/writes.
In some scenarios, it may be difficult or impossible to determine that the pinned address is actually aligned (such as if an address is pinned and then passed to some other method to be processed), so the compiler may not emit the most optimal code. As such, providing an
AssumeAlignment(address, alignment)
that tells the compiler to emit code as if the address was aligned would be beneficial.Example Case 2
NOTE: This is not currently proposed, but represents another annotation that having compiler support for may be useful.
In many cases, a user may write a method that is considered Pure (the method does not modify state and always returns the same input for a given output).
In simplistic cases, the compiler may perform constant folding or inlining and the actual computation may not be done. However, in more complex cases the compiler may need to do more in-depth analysis which may prevent it from caching the return value or inlining the method.
As such, providing a
PureAttribute
(similarly to theContract.PureAttribute
, but actually consumed by the compiler) would be useful as it would allow the compiler to cache the return value and optimize away subsequent calls for a given input.Existing IL Support for skipping fault checks
There are currently three concrete fault checks which the CLI specifies can be skipped using the
no.
prefx.Although these are supported by IL, high level compilers may not emit them. It is potentially useful to expose
Assume.IsType
,Assume.InRange
, andAssume.NotNull
checks so that all languages (which compile to IL) can benefit from skipping these fault checks (without having to undergo an IL rewriting step).Additional Notes
There are many other potential hints or attributes which can be useful and for which the class could be extended.
Some of these represent assumptions that a higher level compiler (i.e. C#) cannot make (due to not being able to do things like cross-assembly optimizations, or being required to emit certain IL).
The text was updated successfully, but these errors were encountered: