JIT into scratch buffer #53173

Merged
merged 9 commits into from
May 26, 2021

janvorli
Member

@janvorli janvorli commented May 24, 2021

This change generates JITted code, including the code header, into a scratch buffer and copies it to its final location once JITting is complete. This is another preparation step for the W^X work.

Contributes to #50391
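
As a rough illustration of the approach described above (not the runtime's actual code), the sketch below builds a header plus code bytes in a heap-allocated scratch buffer and then copies the finished image to its final location in one step. `FakeCodeHeader`, `EmitIntoScratch`, and `CommitToFinal` are hypothetical names, and an ordinary byte array stands in for executable memory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for the real code header; the layout is an assumption.
struct FakeCodeHeader {
    std::uint32_t codeSize;
};

// Build header + code in ordinary read-write memory (the scratch buffer).
std::vector<std::uint8_t> EmitIntoScratch(const std::vector<std::uint8_t>& code) {
    std::vector<std::uint8_t> scratch(sizeof(FakeCodeHeader) + code.size());
    FakeCodeHeader hdr{static_cast<std::uint32_t>(code.size())};
    std::memcpy(scratch.data(), &hdr, sizeof(hdr));
    if (!code.empty())
        std::memcpy(scratch.data() + sizeof(hdr), code.data(), code.size());
    return scratch;
}

// Copy the finished image to its final location in a single step.
void CommitToFinal(std::uint8_t* finalLocation, std::size_t capacity,
                   const std::vector<std::uint8_t>& scratch) {
    assert(scratch.size() <= capacity);
    std::memcpy(finalLocation, scratch.data(), scratch.size());
}
```

The point of the split is that all intermediate writes go to the scratch buffer, so the final location only needs to be written once, at the end.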

@janvorli janvorli added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label May 24, 2021
@janvorli janvorli added this to the 6.0.0 milestone May 24, 2021
@janvorli janvorli requested a review from jkotas May 24, 2021 14:22
@janvorli janvorli self-assigned this May 24, 2021
@janvorli janvorli force-pushed the jit-into-scratch-buffer branch from 0b6350b to afa6c2f Compare May 24, 2021 14:57
src/coreclr/vm/codeman.cpp (outdated)
pCodeHdr = ((CodeHeader *)pCode) - 1;

*pAllocatedSize = sizeof(CodeHeader) + totalSize + reserveForJumpStubs;
pCodeHdrRW = (CodeHeader *)new BYTE[*pAllocatedSize];
Member

This allocation has a potentially real cost. It may be nice to omit it from this change (together with the copy at the end) and only do it when W^X is turned on, so that we can include it in the perf measurements.

Member Author

I have measured the perf of this change using the ASP.NET benchmarks in our lab on Windows and haven't seen any perf difference. I ran 10 rounds of the PlainText benchmark, which I have found to be quite sensitive to JIT performance, with R2R disabled; the runs with and without my changes show the same average results.
I have intentionally kept this change separate from the W^X work to see its effect in isolation from the other W^X changes and to verify that it doesn't have a negative impact on other tests.
Here are the time-to-first-request results:

State             Time to first request (10 rounds)
With this change  174 166 165 171 167 174 171 173 167 166
Before            166 171 165 174 165 166 168 165 169 165
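
For readers who want to compare the two rows above numerically, a small sketch like the following computes the mean and sample standard deviation of each row. The numbers are copied from the rows above; `Mean`, `SampleStdDev`, `kWithChange`, and `kBefore` are hypothetical names introduced only for this illustration.

```cpp
#include <cassert>
#include <cmath>
#include <numeric>
#include <vector>

// Arithmetic mean of a non-empty sample.
double Mean(const std::vector<double>& xs) {
    assert(!xs.empty());
    return std::accumulate(xs.begin(), xs.end(), 0.0) / xs.size();
}

// Sample standard deviation (n - 1 in the denominator).
double SampleStdDev(const std::vector<double>& xs) {
    assert(xs.size() > 1);
    const double m = Mean(xs);
    double sumSq = 0.0;
    for (double x : xs)
        sumSq += (x - m) * (x - m);
    return std::sqrt(sumSq / (xs.size() - 1));
}

// Values copied from the two rows of results.
const std::vector<double> kWithChange = {174, 166, 165, 171, 167, 174, 171, 173, 167, 166};
const std::vector<double> kBefore     = {166, 171, 165, 174, 165, 166, 168, 165, 169, 165};
```

With these inputs the means come out to 169.4 and 167.4, a gap smaller than the standard deviation of either row.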

Member

That depends on how much JITting is going on in parallel and how big the methods are. PlainText does not do much parallel JITting, and its methods are not big either.

We have seen the JIT be sensitive to the performance of unmanaged memory allocations, which is why we have a cache of memory blocks for the JIT to use in JitHost (slabAllocator).
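
To make the idea of a block cache concrete, here is a minimal sketch of one way to reuse freed blocks instead of going back to the general-purpose allocator each time. It is not the JitHost slab allocator; `BlockCache` and its methods are hypothetical, and the fixed block size and cache limit are assumptions chosen for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical fixed-size block cache; NOT the runtime's JitHost allocator.
class BlockCache {
public:
    BlockCache(std::size_t blockSize, std::size_t maxCached)
        : m_blockSize(blockSize), m_maxCached(maxCached) {}

    // Reuse a cached block when one is available; otherwise allocate a new one.
    std::unique_ptr<unsigned char[]> Acquire() {
        {
            std::lock_guard<std::mutex> lock(m_lock);
            if (!m_free.empty()) {
                auto block = std::move(m_free.back());
                m_free.pop_back();
                return block;
            }
        }
        return std::make_unique<unsigned char[]>(m_blockSize);
    }

    // Return a block to the cache; if the cache is full, the block is freed.
    void Release(std::unique_ptr<unsigned char[]> block) {
        std::lock_guard<std::mutex> lock(m_lock);
        if (m_free.size() < m_maxCached)
            m_free.push_back(std::move(block));
    }

    std::size_t CachedCount() {
        std::lock_guard<std::mutex> lock(m_lock);
        return m_free.size();
    }

private:
    std::size_t m_blockSize;
    std::size_t m_maxCached;
    std::mutex m_lock;
    std::vector<std::unique_ptr<unsigned char[]>> m_free;
};
```

A caller would `Acquire()` a block before JITting and `Release()` it afterwards, so that concurrent compilations mostly recycle blocks rather than allocate fresh ones.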

#if defined(TARGET_AMD64)
// Publish the new unwind information in a way that the ETW stack crawler can find
if (m_usedUnwindInfos == m_totalUnwindInfos)
UnwindInfoTable::PublishUnwindInfoForMethod(baseAddress, m_CodeHeader->GetUnwindInfo(0), m_totalUnwindInfos);
Member

For good performance, PublishUnwindInfoForMethod expects to be called in the order in which the allocations happen. If we move the call later, we increase the probability that it will be called out of order and trigger an expensive reallocation. Just pointing it out - I am not sure whether anything should be done about it, or what.

Member Author

It didn't feel safe to publish it before the data are written there.
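
The ordering concern in this thread can be illustrated with a generic sorted table, where in-order publishing is a cheap append and out-of-order publishing has to shift existing entries. This is only a sketch of the general cost pattern, not how `UnwindInfoTable` is implemented; `SortedTable` and its members are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical table of start addresses kept in sorted order.
class SortedTable {
public:
    // Returns true when the cheap append path was taken.
    bool Publish(std::uintptr_t start) {
        if (m_entries.empty() || m_entries.back() < start) {
            m_entries.push_back(start);  // in order: append at the end
            return true;
        }
        // Out of order: find the slot and shift every later entry.
        auto pos = std::upper_bound(m_entries.begin(), m_entries.end(), start);
        m_entries.insert(pos, start);
        ++m_slowInserts;
        return false;
    }

    bool IsSorted() const {
        return std::is_sorted(m_entries.begin(), m_entries.end());
    }

    std::size_t SlowInserts() const { return m_slowInserts; }

private:
    std::vector<std::uintptr_t> m_entries;
    std::size_t m_slowInserts = 0;
};
```

Publishing later, after the code has been copied, makes it more likely that another thread has already published a higher address, which is what pushes a call onto the slow path in a structure like this.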

@janvorli
Member Author

There are some assert failures on ARM/ARM64; I am looking into them:
Assert failure(PID 93 [0x0000005d], Thread: 113 [0x0071]): ( RUNTIME_FUNCTION__BeginAddress(pOtherFunction) >= RUNTIME_FUNCTION__EndAddress(pRuntimeFunction, baseAddress) || RUNTIME_FUNCTION__EndAddress(pOtherFunction, baseAddress) <= RUNTIME_FUNCTION__BeginAddress(pRuntimeFunction)) File: /__w/1/s/src/coreclr/vm/jitinterface.cpp Line: 11563
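
The condition in the failing assert is a non-overlap check between two function ranges. A minimal sketch of the same predicate follows, using a hypothetical `FakeRuntimeFunction` with explicit begin and end offsets; the real `RUNTIME_FUNCTION` structure and its accessor macros are not reproduced here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical half-open range [begin, end); an assumption for illustration.
struct FakeRuntimeFunction {
    std::uint32_t begin;
    std::uint32_t end;
};

// True when the two ranges do not overlap: `other` starts at or after the end
// of `fn`, or `other` ends at or before the start of `fn`.
bool RangesAreDisjoint(const FakeRuntimeFunction& fn, const FakeRuntimeFunction& other) {
    assert(fn.begin <= fn.end && other.begin <= other.end);
    return other.begin >= fn.end || other.end <= fn.begin;
}
```

Adjacent ranges such as [0, 16) and [16, 32) pass the check, while any shared byte makes it fail.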

@janvorli
Member Author

@jkotas I have put the allocation / deletion under FEATURE_WXORX until the real W^X stuff is added.
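
A guard of this kind might look roughly like the sketch below: when the macro is defined, writes go to a separately allocated scratch buffer that is copied and freed at the end; otherwise they go straight to the final location with no extra allocation. Only the macro name `FEATURE_WXORX` comes from this thread; `WritableView`, `BeginWrite`, and `EndWrite` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical handle describing where the caller should write.
struct WritableView {
    std::uint8_t* rw;
    bool isScratch;
};

WritableView BeginWrite(std::uint8_t* finalLocation, std::size_t size) {
#ifdef FEATURE_WXORX
    (void)finalLocation;
    return { new std::uint8_t[size], true };  // write into a scratch buffer
#else
    (void)size;
    return { finalLocation, false };          // write in place, no allocation
#endif
}

void EndWrite(std::uint8_t* finalLocation, std::size_t size, WritableView view) {
    if (view.isScratch) {
        std::memcpy(finalLocation, view.rw, size);  // copy to the final location
        delete[] view.rw;
    }
}
```

Either way the caller writes through `view.rw` and the bytes end up at the final location, so the allocation and copy cost is only paid when the feature is compiled in.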

Member

@jkotas jkotas left a comment

LGTM

@janvorli janvorli merged commit dc5a1c8 into dotnet:main May 26, 2021
@janvorli janvorli deleted the jit-into-scratch-buffer branch May 26, 2021 15:20
@ghost ghost locked as resolved and limited conversation to collaborators Jun 25, 2021