-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Test failure JIT\\superpmi\\superpmicollect\\superpmicollect.cmd #90711
Comments
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch Issue DetailsFailed in: runtime-coreclr jitstress 20230815.1 Failed tests:
Error message:
Stack trace:
|
Latest jitstress run has none of these failures: (However, filtering on 'superpmicollect' only shows that it ran on linux, which is worrisome. Maybe just a bug in reporting/AzDO, and it actually did run?) |
Latest outerloop run also doesn't have these failures: |
Looking at a win-x86 dump, it looks like a problem with the SuperPMI macro-ized assertion helper. The JIT is calling runtime/src/coreclr/tools/superpmi/superpmi-shared/methodcontext.cpp Lines 2367 to 2369 in 64243bb
This indicates that the VM has returned two different values for the same helper number. In particular, the VM changes from indirect to direct address. It looks like this recently changed with #90412. @EgorBo PTAL -- there is probably a superpmi issue with your change, if the value returned by this function changes over time. However, there is a superpmi issue here because the formatting string gets macro string-ized before the "PRIX64" are replaced, so superpmi needs to handle this so we get an actual assert, and not a failure in the CRT of "Incorrect format specifier". |
cc @dotnet/jit-contrib -- there's a new problem where SuperPMI collection is failing because the JIT is calling |
Thanks for finding the cause! it seems that it's indeed possible. Since it happens in lower ( Should we special-case this in SPMI or better to introduce a jit-level cache for addresses to make sure it uses the same address per compilation (potentially might decrease perf for a case where previosly we could emit a direct call)? |
So the JIT could possibly generate one call as indirect and a subsequent call as direct simply based on timing of when the two calls from the JIT are made? That would make any codegen issues really hard to debug as we couldn't guarantee repeatable codegen. I would argue the JIT should pick one and use it everywhere (even if that makes the code slightly less optimal). That presumably means the JIT should only call |
This problem is not limited to For system as a whole, we have been going further and further away from guaranteeing repeatable codegen with tiering and dynamic PGO. |
Yes, I do not see a problem with it. You can get into same situation with constructor triggers or with direct/indirect method calls. |
It seems generally worrisome for the JIT to make decisions based on the results of calling a JIT-EE API and then call that API again and have the results change, leading to different decisions. It seems dangerous if those decisions need to be coherent. In fact, why does the JIT need to call the API multiple times? Why isn't the result incorporated into the IR (e.g., ABI information)? SuperPMI only stores one value for a given set of input arguments. This happens to be a case where someone added an assert that the value doesn't change. If we allow the values to change, you can just remove the assert. SuperPMI replay will then not generate the same codegen as the recording since it will only use the latest value. This makes SuperPMI less useful as a repro tool for failures, etc. (which is currently only an aspirational goal, not actual behavior). There are various cases where it seems like the JIT should cache the results of JIT/EE interface calls to avoid re-calling on the same arguments over and over again, especially for things that might be expensive (e.g., possibly SysV ABI info). |
The repro case is two different calls with the same helperId, it's not that JIT calls that API for the same Let me quickly prototypr a jit-level cache for addresses to see how it'd look like |
Why does SPMI have that assert in runtime/src/coreclr/tools/superpmi/superpmi-shared/methodcontext.cpp Lines 2363 to 2370 in 6229f5f
getClassAttribs: runtime/src/coreclr/tools/superpmi/superpmi-shared/methodcontext.cpp Lines 731 to 736 in 6229f5f
|
I don't know. You can try looking through history, but it's likely to have been this way for a while. My guess would be that it was considered a problem at some point. The more I think about it, the more it worries me that JIT-EE calls can change what they return during the time of a single function compilation. I understand that the VM state might change due to threading, etc., but that makes me think the JIT should be caching data that might change between calls. In general, it seems like the JIT shouldn't be asking for the same data over and over again anyway; that seems potentially like the JIT is just being lazy and NOT caching (e.g., storing in result in IR). It looks like SPMI actually ignores any attempt to overwrite a value that's already been added: runtime/src/coreclr/tools/superpmi/superpmi-shared/lightweightmap.h Lines 371 to 376 in 6229f5f
However, it uses |
Maybe we should audit the list of JIT-EE functions and mark the ones which can change the value as NonInvariant so SPMI could be more strict for the ones which are invariant. I think we can't make a general cache for JIT-EE values, e.g. we're jitting void Foo()
{
Bar();
Bar();
} We try to inline |
That would be a good start. What does SPMI do in your example case today? Presumably it stores the first result during collection, so during playback the JIT gets the first result for both calls, and resulting codegen is the same as original codegen -- the NOINLINE bit is simply an optimization? (I.e., if the JIT decided it wasn't profitable/legal the first time, it presumably will make the same determination the second time) |
Also, consider refactoring the JIT/EE interface to split the variant and invariant info. For example, most |
Failed in: runtime-coreclr jitstress 20230815.1
Failed tests:
Error message:
Stack trace:
More failures
The text was updated successfully, but these errors were encountered: