Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[release/9.0-preview7] JIT: Fix placement of GT_START_NOGC for tailcalls in face of bulk copy with write barrier calls #105572

Merged

Conversation

github-actions[bot]
Copy link
Contributor

@github-actions github-actions bot commented Jul 26, 2024

Backport of #105551 to release/9.0-preview7

/cc @hoyosjs @jakobbotsch

Customer Impact

  • Customer reported
  • Found internally

Customer code is subject to crashes, memory corruption, and process control reaching unexplainable points.

For instance, Roslyn has been observed crashing in several unexplainable ways. Some of the observed cases were tracked down to a GC hole caused by a safe point contained in a no-gc region of parameter setup in fast tail calls. The GC would kick when the method was calling BulkMoveWithWriteBarrier to copy a large struct into the frame and this would result in a bad object pointer getting copied. This fix removed the safe point from the region.

#102370
#104123
#105441
dotnet/roslyn-analyzers#7349
dotnet/dnceng#3305

Testing

Regression test added and manual verification of partner scenarios.

…opy with write barrier calls

When the JIT generates code for a tailcall it must generate code to
write the arguments into the incoming parameter area. Since the GC ness
of the arguments of the tailcall may not match the GC ness of the
parameters, we have to disable GC before we start writing these. This is
done by finding the earliest `GT_PUTARG_STK` node and placing the start
of the NOGC region right before it.

In addition, there is logic to take care of potential overlap between
the arguments and parameters. For example, if the call has an operand
that uses one of the parameters, then we must take care that we do not
override that parameter with the tailcall argument before the use of it.
To do so, we sometimes may need to introduce copies from the parameter
locals to locals on the stack frame.

This used to work fine, however, with #101761 we started transforming
block copies into managed calls in certain scenarios. It was possible
for the JIT to decide to introduce a copy to a local and for this
transformation to then kick in. This would cause us to end up with the
managed helper call after starting the nogc region. In checked builds
this would hit an assert during GC scan; in release builds, it would end
up with corrupted data.

The fix here is to make sure we insert the `GT_START_NOGC` after all the
potential temporary copies we may introduce as part of the tailcat stll
logic.

There was an additional assumption that the first `PUTARG_STK` operand
was the earliest one in execution order. That is not guaranteed, so this
change stops relying on that as well by introducing a new
`LIR::FirstNode` and using that to determine the earliest `PUTARG_STK`
node.

Fix #102370
Fix #104123
Fix #105441
@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jul 26, 2024
Copy link
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@ericstj ericstj added the Servicing-consider Issue for next servicing release review label Jul 26, 2024
@JulieLeeMSFT
Copy link
Member

CC @jeffschwMSFT for approval.

@JulieLeeMSFT
Copy link
Member

cc @carlossanlop.

@carlossanlop
Copy link
Member

Have we received Tactics approval?

@hoyosjs hoyosjs added Servicing-approved Approved for servicing release and removed Servicing-consider Issue for next servicing release review labels Jul 26, 2024
Copy link
Member

@jeffschwMSFT jeffschwMSFT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI Servicing-approved Approved for servicing release
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants