[RyuJIT] Avoid method call to fallback intrinsic method if immediate arg becomes constant #9989
Comments
IMO the part of the function that extracts elements with index > 15 should not be implemented. Developers using HW intrinsics should be able to write However, this functionality should be a part of |
@4creators, I don't agree. This is a helper function for extracting the proper element of a 256-bit vector. This includes the upper 16 indices. I believe this is one of the "core" helper intrinsics that should be provided by the framework. |
@4creators This is just an example that shows the optimization opportunity; it is the kind of code users may write. I have moved the implementation of |
@dotnet/jit-contrib |
Overall it sounds like a good idea. The main potential drawback is that the end result (whether a fallback is used or not) depends on the JIT's optimization abilities. Someone writes some code where the fallback happens not to be used and gets good performance. The code then ends up running on a different version of the JIT that perhaps fails to do some constant folding, so the fallback version is used and the code gets significantly slower. But such mishaps are always possible when a JIT is used; it's not something that affects only intrinsics.
In terms of implementation there may be some problems. You'll want to delay the use of fallbacks until lowering. I don't think you want to generate the fallback code (that uses large switches) inline, so you'll need to turn the intrinsic node into a call node during lowering. This is not something that the JIT does often today; calls are usually expected to exist before morphing so they pass through |
We have made it a design parameter that we expect developers using these intrinsics to be savvy enough to use analysis tools to ensure that they are getting what they expect. This is especially true if they are planning to rely on the JIT to optimize and inline. That said, it is an interesting thought to delay the immediate fallback until Even still I'm not certain that the benefit is worth the cost. I think we should hold off until we get some feedback from developers. |
@CarolEidt @mikedn Thank you so much for the excellent comments. The answer to the above question looks like "YES". But, of course, we need more workloads and data first. |
Updated the title and description. @AndyAyersMS @CarolEidt @mikedn Does this approach look okay to you? |
Why not in lowering?
It seems that this should work fine because:
Good luck with that, |
@mikedn I just looked |
@mikedn @CarolEidt I am trying to expand the fallback calls to intrinsic nodes in
How can I take the actual arguments ( |
Hrm, yes, because vectors are still treated as structs for ABI purposes, intrinsics calls are unfortunately rather complicated. Vector args get spilled to temporaries and that forces args into the late arg list. You should find the constant in It would be better if intrinsic calls would follow vector calling conventions to avoid this mess but that's probably not going to happen too soon, if ever. |
Ah, great, thanks!
Yes, another problem is that in |
Yes, you should be able to replace the call with the intrinsic but some of the consequences of call morphing might be more difficult to remove. In particular, since vectors are treated as structs, you'll probably end up with copies. You really don't have many options. It's either lowering or global morph. And as explained earlier, doing this in global morph will miss various cases. |
Hmm, let me try both options. |
If you're referring to the morph option then it won't work with your example. |
I am going to move this to Future. |
Un-assigning myself |
Currently, certain hardware intrinsics that accept an `imm8` argument are replaced by a function call (the function body is usually a big jump table) if the `imm8` argument is not a JIT-time constant.

This feature provides more stable runtime behavior than throwing exceptions, but it may cause a significant performance regression, so we should avoid the fallback replacement where possible.
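To illustrate why the fallback body tends to be a large jump table, here is a minimal C++ sketch. It is not RyuJIT or framework code: `Vec128`, `extract_imm`, and `extract_fallback` are hypothetical names, a template parameter stands in for the `imm8` operand that has to be encoded in the instruction itself, and masking the index to four bits is an assumption of this sketch.

```cpp
#include <array>
#include <cstdint>

// Hypothetical stand-in for a 128-bit vector of 16 bytes.
using Vec128 = std::array<std::uint8_t, 16>;

// Stand-in for an instruction whose lane index is an immediate:
// the index must be known at compile time, so it is a template parameter.
template <std::uint8_t Imm>
std::uint8_t extract_imm(const Vec128& v) {
    static_assert(Imm < 16, "immediate out of range");
    return v[Imm];
}

// Fallback for an index only known at run time: one case per possible
// immediate, each dispatching to the matching compile-time instance.
std::uint8_t extract_fallback(const Vec128& v, std::uint8_t index) {
    switch (index & 0x0F) {  // assumption: only the low four bits select a lane
        case 0:  return extract_imm<0>(v);
        case 1:  return extract_imm<1>(v);
        case 2:  return extract_imm<2>(v);
        case 3:  return extract_imm<3>(v);
        case 4:  return extract_imm<4>(v);
        case 5:  return extract_imm<5>(v);
        case 6:  return extract_imm<6>(v);
        case 7:  return extract_imm<7>(v);
        case 8:  return extract_imm<8>(v);
        case 9:  return extract_imm<9>(v);
        case 10: return extract_imm<10>(v);
        case 11: return extract_imm<11>(v);
        case 12: return extract_imm<12>(v);
        case 13: return extract_imm<13>(v);
        case 14: return extract_imm<14>(v);
        case 15: return extract_imm<15>(v);
    }
    return 0;  // not reached: the masked index is always in 0..15
}
```

When the index is a compile-time constant, a caller can use `extract_imm<5>(v)` directly and skip the switch entirely, which is the situation this issue would like the JIT to recognize more often.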
For example, the code below is not allowed in C++ but legal in C#.
In the first `return` statement, `Sse41.Extract` gets an expression `(byte)(index - 16)` that is not a static constant locally. However, once the function is called with a literal argument for `index` and inlined at the call site, `(byte)(index - 16)` could be a JIT-time constant.

The current problem is that we check whether the `imm8` argument is constant in the importer, which is too early for some situations (e.g., a cast argument).

In this example, `(byte)(index - 16)` is not a constant in the importer, but the expression could eventually become a constant in the backend of RyuJIT. If we expand the fallback again after the mid-end optimizations (e.g., CSE, conditional constant propagation, integer-promotion elimination, etc.), the CQ of imm-intrinsics would be much better.

cc @CarolEidt @AndyAyersMS @mikedn @tannergooding
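The timing argument above can be sketched with a toy expression tree in C++. This is a hypothetical miniature IR, not RyuJIT's `GenTree`, and `Node`, `ImmExpr`, and `tryFold` are names invented for illustration. Passing `nullptr` for the parameter value models the importer's view, where `index` is still an unknown parameter; passing a literal models the state after inlining at a call site with a constant argument.

```cpp
#include <cstdint>

// Hypothetical miniature IR node (not RyuJIT's GenTree).
struct Node {
    enum Kind { Const, Param, Sub, CastToByte };
    Kind kind;
    int value;        // payload for Const nodes
    const Node* lhs;  // first operand, or nullptr
    const Node* rhs;  // second operand, or nullptr
};

// Builds the tree for (byte)(index - 16).
// Do not copy an ImmExpr: the nodes point at each other.
struct ImmExpr {
    Node index{Node::Param, 0, nullptr, nullptr};
    Node sixteen{Node::Const, 16, nullptr, nullptr};
    Node sub{Node::Sub, 0, &index, &sixteen};
    Node cast{Node::CastToByte, 0, &sub, nullptr};
};

// Tries to evaluate the tree to a constant.
// paramValue == nullptr means the parameter is not known yet.
bool tryFold(const Node& n, const int* paramValue, int& out) {
    int a = 0;
    int b = 0;
    switch (n.kind) {
        case Node::Const:
            out = n.value;
            return true;
        case Node::Param:
            if (paramValue == nullptr) return false;  // still a runtime value
            out = *paramValue;
            return true;
        case Node::Sub:
            if (!tryFold(*n.lhs, paramValue, a)) return false;
            if (!tryFold(*n.rhs, paramValue, b)) return false;
            out = a - b;
            return true;
        case Node::CastToByte:
            if (!tryFold(*n.lhs, paramValue, a)) return false;
            out = static_cast<std::uint8_t>(a);  // truncate to 8 bits
            return true;
    }
    return false;
}
```

With `index` unknown, `tryFold` fails and a check at that point would pick the fallback; with `index` bound to the literal 20, the same tree folds to 4, so a check performed after inlining and constant folding could keep the intrinsic.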
category:cq
theme:hardware-intrinsics
skill-level:expert
cost:medium