[release/6.0] JIT: Abandon loop search if we reach the end of the bbNext chain #69525

github-actions · 2022-05-18T22:47:17Z

Backport of #69503 to release/6.0

Analysis of a customer dump shows the JIT is causing an AV within System.Threading.PortableThreadPool+WorkerThread.WorkerThreadStart while being rejitted at Tier1 with PGO data. This is with 6.0.4 running on x64 ubuntu.

See #69323.

The fix is that during the "make compact" stage of loop recognition, when the JIT is scanning the list of blocks looking for the next in loop block, it will abandon the search if it reaches the end of the list. When the search is abandoned like this, the JIT will report that this sequence of blocks is not a proper loop. There is existing logic to handle this sort of reporting already, as loop recognition can fail for other reasons.

Normally the make compact algorithm expects to find such a block before reaching the end, but in this particular case the block list has been reordered. Without this fix the JIT will unexpectedly reach the end of the list and dereference nullptr, causing the crash.

Customer Impact

Customer updated to .NET 6 recently and enabled Full Dynamic PGO. Application was stable running on .NET 5.

After the update, a key application started to crash unexpectedly. Customer reports this crash has a significant impact on business as this disrupts data processing needed for the next day's work.

Customer has (so far) been able to work around the issue by not setting DOTNET_TC_QuickJitForLoops=1 which makes this particular method ineligible for PGO.

Testing

The cause of the AV in the JIT is self-evident and there is a simple mitigation to avoid crashing and still produce correct code. Details in the dump suggest a plausible but rare sequence of events that could lead to this crash.

I was able to reproduce this issue in local testing by applying randomly generated PGO data to the method (see note), and verified the fix.

Risk

Low, we avoid crashing the jit (and hence the app) and forgo recognizing a loop.

…t chain In #69323 the 6.0.4 jit caused an AV because it walked off the end of the bbNext chain during `optFindNaturalLoops`. Analysis of a customer-provided dump showed that `MakeCompactAndFindExits` might fail to find an expected loop block and so walk the entire bbNext chain and then fall off the end. Details from the dump suggested that this happened because a prior call to `MakeCompactAndFindExits` had moved most but not all of a loop's blocks later in bbNext order, leaving that loop's bottom block earlier in the bbNext chain then it's top. This ordering was unexpected. I cannot repro this failure. The customer was using PGO and it's likely that earlier PGO-driven block reordering contributed to this problem by interleaving the blocks from two loops. We can recover the root method PGO schema from the dump, but applying this is insufficient to cause the problem. This method does quite a bit of inlining so it's likely that some inlinee PGO data must also be a contributing factor. At any rate, we can guard against this case easily enough, and simply abandon recognition of any loop where we fail to find an expected loop block during the bbNext chain walk.

ghost · 2022-05-18T22:47:24Z

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

Issue Details

Backport of #69503 to release/6.0

/cc @AndyAyersMS

Customer Impact

Testing

Risk

IMPORTANT: If this change touches code that ships in a NuGet package, please make certain that you have added any necessary package authoring and gotten it explicitly reviewed.

Author:	github-actions[bot]
Assignees:	-
Labels:	`area-CodeGen-coreclr`
Milestone:	-

AndyAyersMS · 2022-05-18T23:02:05Z

cc @dotnet/jit-contrib

AndyAyersMS · 2022-05-20T01:06:50Z

@jeffschwMSFT fyi

JulieLeeMSFT · 2022-05-20T03:57:34Z

@AndyAyersMS, could you please put a short description of the fix right above the customer impact section?

jeffschwMSFT

Approved. We will take for consideration in 6.0.x

JulieLeeMSFT · 2022-05-31T20:51:43Z

cc @carlossanlop for the servicing approved PR.

carlossanlop · 2022-06-09T23:15:43Z

Servicing-approved label applied. CI passed. Area owners signed-off.

dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label May 18, 2022

jakobbotsch approved these changes May 19, 2022

View reviewed changes

JulieLeeMSFT added this to the 6.0.x milestone May 20, 2022

JulieLeeMSFT assigned AndyAyersMS May 20, 2022

jeffschwMSFT approved these changes May 20, 2022

View reviewed changes

JulieLeeMSFT added the Servicing-consider Issue for next servicing release review label May 24, 2022

rbhanda added Servicing-approved Approved for servicing release and removed Servicing-consider Issue for next servicing release review labels May 31, 2022

rbhanda modified the milestones: 6.0.x, 6.0.7 May 31, 2022

carlossanlop merged commit 4f8654f into release/6.0 Jun 9, 2022

carlossanlop deleted the backport/pr-69503-to-release/6.0 branch June 9, 2022 23:15

ghost locked as resolved and limited conversation to collaborators Jul 10, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[release/6.0] JIT: Abandon loop search if we reach the end of the bbNext chain #69525

[release/6.0] JIT: Abandon loop search if we reach the end of the bbNext chain #69525

github-actions bot commented May 18, 2022 •

edited by AndyAyersMS

Loading

ghost commented May 18, 2022

Customer Impact

Testing

Risk

AndyAyersMS commented May 18, 2022

AndyAyersMS commented May 20, 2022

JulieLeeMSFT commented May 20, 2022

jeffschwMSFT left a comment

JulieLeeMSFT commented May 31, 2022

carlossanlop commented Jun 9, 2022

[release/6.0] JIT: Abandon loop search if we reach the end of the bbNext chain #69525

[release/6.0] JIT: Abandon loop search if we reach the end of the bbNext chain #69525

Conversation

github-actions bot commented May 18, 2022 • edited by AndyAyersMS Loading

Customer Impact

Testing

Risk

ghost commented May 18, 2022

Customer Impact

Testing

Risk

AndyAyersMS commented May 18, 2022

AndyAyersMS commented May 20, 2022

JulieLeeMSFT commented May 20, 2022

jeffschwMSFT left a comment

Choose a reason for hiding this comment

JulieLeeMSFT commented May 31, 2022

carlossanlop commented Jun 9, 2022

github-actions bot commented May 18, 2022 •

edited by AndyAyersMS

Loading