JIT into scratch buffer #53173

janvorli · 2021-05-24T14:22:17Z

This change generates JITted code including the code header into a scratch buffer and copies it to its final location after the JITting is complete. This is another preparation step for the W^X work.

Contributes to #50391

Copy it to the final location after the JITing is done.
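Here is a minimal sketch of the idea in the description. It assumes a simplified `CodeHeader` that only holds a pointer to the real code header. The header and code are built contiguously in a private scratch buffer, and once JITing is done a single copy moves them to the final location. `ScratchMethod`, `EmitIntoScratch` and `CopyToFinal` are hypothetical names, not the runtime's API. The sketch also ignores relocations and other address-dependent fixups, which a real implementation would have to compute against the final address rather than the scratch one.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <new>

// Hypothetical stand-in for the runtime's CodeHeader, which sits
// immediately before the method's code.
struct CodeHeader {
    void* pRealCodeHeader;
};

// Header + code laid out contiguously in a private, writable buffer.
struct ScratchMethod {
    std::unique_ptr<unsigned char[]> buffer;
    std::size_t size = 0;

    unsigned char* code() const { return buffer.get() + sizeof(CodeHeader); }
};

// Step 1: "JIT" into the scratch buffer. The copied bytes stand in for the
// instructions the JIT would emit.
ScratchMethod EmitIntoScratch(const unsigned char* instrs, std::size_t codeSize) {
    ScratchMethod m;
    m.size = sizeof(CodeHeader) + codeSize;
    m.buffer.reset(new unsigned char[m.size]);
    // Construct the header at the start of the buffer; the runtime would
    // fill in the real code header pointer later.
    new (m.buffer.get()) CodeHeader{nullptr};
    std::memcpy(m.code(), instrs, codeSize);
    return m;
}

// Step 2: after JITing completes, copy header + code to the final location
// in one go. Only this step touches the final memory.
unsigned char* CopyToFinal(const ScratchMethod& m, unsigned char* finalLocation) {
    assert(finalLocation != nullptr);
    std::memcpy(finalLocation, m.buffer.get(), m.size);
    return finalLocation + sizeof(CodeHeader);  // code follows the header
}
```

The point of the split is that the final memory is written exactly once, after everything is known, which is what W^X needs.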

src/coreclr/vm/jitinterface.h

src/coreclr/vm/codeman.cpp

src/coreclr/vm/jitinterface.cpp

jkotas · 2021-05-25T03:54:44Z

src/coreclr/vm/codeman.cpp

        pCodeHdr = ((CodeHeader *)pCode) - 1;

+        *pAllocatedSize = sizeof(CodeHeader) + totalSize + reserveForJumpStubs;
+        pCodeHdrRW = (CodeHeader *)new BYTE[*pAllocatedSize];


This allocation has a potentially real cost. It may be better to omit it from this change (together with the copy at the end) and only do it when W^X is turned on, so that its cost is included in the W^X perf measurements.

I have measured the perf with this change using the ASP.NET benchmarks in our lab on Windows and haven't seen any perf difference. Ten rounds of the PlainText benchmark, which I have found to be quite sensitive to JIT performance, run with R2R disabled both with and without my changes, show the same average results.
I intentionally made this change separate from the rest of the W^X work so that its effect can be seen in isolation, and to verify that it has no negative impact on other tests.
Here are the time-to-first-request results:

State            | Time to first request, runs 1-10
With this change | 174 166 165 171 167 174 171 173 167 166
Before           | 166 171 165 174 165 166 168 165 169 165
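One quick way to judge whether the 2-point gap between the averages (169.4 vs 167.4) exceeds run-to-run noise is to compare the means against their spread. Below is a minimal C++ sketch that uses Welch's t statistic as the rough check. The choice of test and the helper names `Summarize` and `WelchT` are assumptions, not part of the original measurement.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Summary statistics for one series of benchmark runs.
struct Summary {
    double mean;
    double stddev;  // sample standard deviation (n - 1 in the denominator)
};

Summary Summarize(const std::vector<double>& xs) {
    assert(xs.size() >= 2);
    double sum = 0;
    for (double x : xs) sum += x;
    const double mean = sum / xs.size();
    double sq = 0;
    for (double x : xs) sq += (x - mean) * (x - mean);
    return {mean, std::sqrt(sq / (xs.size() - 1))};
}

// Welch's t statistic for the difference of two means (unequal variances).
double WelchT(const std::vector<double>& a, const std::vector<double>& b) {
    const Summary sa = Summarize(a);
    const Summary sb = Summarize(b);
    return (sa.mean - sb.mean) /
           std::sqrt(sa.stddev * sa.stddev / a.size() +
                     sb.stddev * sb.stddev / b.size());
}
```

For the numbers above, this gives sample standard deviations of about 3.6 and 3.1 and t ≈ 1.34. That is below the roughly 2.1 a two-sided 5% test would need at these sample sizes, so by this rough check the gap cannot be distinguished from noise.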

That depends on how much parallel JITing is going on and how big the methods are. PlainText does not do much parallel JITing, and its methods are not big either.

We have seen the JIT be sensitive to the performance of unmanaged memory allocations; that is why JitHost keeps a cache of memory blocks (slabAllocator) for the JIT to use.
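The idea behind such a cache can be sketched as a free list of fixed-size blocks that are handed back out instead of being returned to the heap. `BlockCache` is hypothetical and simplified, not the actual JitHost/slabAllocator code. The block size is an arbitrary assumption, and the cache is unbounded for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical block cache: requests of exactly kBlockSize bytes are served
// from a free list of previously released blocks, so steady-state use avoids
// the general-purpose heap. Other sizes go straight to the heap.
class BlockCache {
public:
    static constexpr std::size_t kBlockSize = 64 * 1024;  // assumed size

    ~BlockCache() {
        for (void* p : free_) ::operator delete(p);
    }

    void* Allocate(std::size_t size) {
        if (size == kBlockSize) {
            std::lock_guard<std::mutex> lock(mutex_);
            if (!free_.empty()) {
                void* p = free_.back();  // reuse a cached block
                free_.pop_back();
                ++hits_;
                return p;
            }
        }
        return ::operator new(size);  // cache miss or non-standard size
    }

    void Free(void* p, std::size_t size) {
        assert(p != nullptr);
        if (size == kBlockSize) {
            std::lock_guard<std::mutex> lock(mutex_);
            free_.push_back(p);  // keep the block for the next request
            return;
        }
        ::operator delete(p);
    }

    std::size_t hits() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return hits_;
    }

private:
    mutable std::mutex mutex_;
    std::vector<void*> free_;
    std::size_t hits_ = 0;
};
```

With a cache like this, a per-method scratch buffer would cost a heap allocation only on a cache miss. This is one way an extra allocation like the one in the diff could be kept cheap.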

jkotas · 2021-05-25T06:44:53Z

src/coreclr/vm/jitinterface.cpp

-#if defined(TARGET_AMD64)
-    // Publish the new unwind information in a way that the ETW stack crawler can find
-    if (m_usedUnwindInfos == m_totalUnwindInfos)
-        UnwindInfoTable::PublishUnwindInfoForMethod(baseAddress, m_CodeHeader->GetUnwindInfo(0), m_totalUnwindInfos);


For good performance, PublishUnwindInfoForMethod expects to be called in the same order in which the allocations happen. Moving the call later increases the probability that it will be called out of order and do an expensive reallocation. Just pointing it out; I am not sure whether or what to do about it.
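The cost of out-of-order publication can be modeled with a table that must stay sorted by start address. An entry that arrives in allocation order is a cheap append, while one that arrives out of order forces the table to be rebuilt. `SortedTable` is a hypothetical model of that cost, not the runtime's `UnwindInfoTable` implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical code range; entries are kept sorted by begin address so
// lookups can binary-search them.
struct Entry {
    std::uintptr_t begin;
    std::uintptr_t end;
};

class SortedTable {
public:
    void Publish(const Entry& e) {
        assert(e.begin < e.end);
        if (entries_.empty() || entries_.back().begin < e.begin) {
            entries_.push_back(e);  // fast path: arrives in allocation order
            return;
        }
        // Slow path: the entry belongs in the middle. Model the "expensive
        // reallocation" as building a fresh copy of the whole table.
        std::vector<Entry> rebuilt;
        rebuilt.reserve(entries_.size() + 1);
        auto pos = std::lower_bound(
            entries_.begin(), entries_.end(), e.begin,
            [](const Entry& x, std::uintptr_t b) { return x.begin < b; });
        rebuilt.insert(rebuilt.end(), entries_.begin(), pos);
        rebuilt.push_back(e);
        rebuilt.insert(rebuilt.end(), pos, entries_.end());
        entries_.swap(rebuilt);
        ++rebuilds_;
    }

    std::size_t rebuilds() const { return rebuilds_; }

    bool IsSorted() const {
        return std::is_sorted(entries_.begin(), entries_.end(),
            [](const Entry& a, const Entry& b) { return a.begin < b.begin; });
    }

private:
    std::vector<Entry> entries_;
    std::size_t rebuilds_ = 0;
};
```

Delaying publication until after the copy does not change the table's contents. It only widens the window in which another method's entry can be published first, which makes the slow path more likely.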

It didn't feel safe to publish the unwind info before the data had been written to its final location.
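The concern follows the usual publish-after-write rule: a reader that can find the data must never see it half-written. The sketch below is a generic C++ illustration using a release store and an acquire load. The names are hypothetical, and the runtime does not publish unwind info through a single atomic pointer like this; the sketch only shows why publication has to come after the final write.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical unwind record that a concurrent reader (e.g. a stack walker)
// may look up once it is published.
struct UnwindRecord {
    std::uint32_t beginOffset;
    std::uint32_t endOffset;
};

std::atomic<const UnwindRecord*> g_published{nullptr};

// Writer: fill in the record completely, then publish the pointer with
// release semantics so the writes above are visible to any reader that
// observes the pointer.
void WriteThenPublish(UnwindRecord* finalLocation, std::uint32_t begin, std::uint32_t end) {
    assert(begin < end);
    finalLocation->beginOffset = begin;
    finalLocation->endOffset = end;
    g_published.store(finalLocation, std::memory_order_release);
}

// Reader: acquire-load the pointer; if it is non-null, the record it points
// to is fully written.
bool TryRead(UnwindRecord* out) {
    const UnwindRecord* p = g_published.load(std::memory_order_acquire);
    if (p == nullptr) return false;
    *out = *p;
    return true;
}
```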

janvorli · 2021-05-25T12:15:37Z

There are some assert failures on ARM/ARM64; I am looking into them:
Assert failure(PID 93 [0x0000005d], Thread: 113 [0x0071]): ( RUNTIME_FUNCTION__BeginAddress(pOtherFunction) >= RUNTIME_FUNCTION__EndAddress(pRuntimeFunction, baseAddress) || RUNTIME_FUNCTION__EndAddress(pOtherFunction, baseAddress) <= RUNTIME_FUNCTION__BeginAddress(pRuntimeFunction)) File: /__w/1/s/src/coreclr/vm/jitinterface.cpp Line: 11563
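The failing assert checks that the code ranges of two functions do not overlap. Either the other function begins at or after this one ends, or it ends at or before this one begins. If each range is treated as half-open `[begin, end)`, the condition reduces to the predicate below. This is a hypothetical standalone version; the real check reads the addresses through `RUNTIME_FUNCTION__BeginAddress`/`RUNTIME_FUNCTION__EndAddress` relative to `baseAddress`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical half-open address range [begin, end) covered by one function.
struct Range {
    std::uintptr_t begin;
    std::uintptr_t end;
};

// Same shape as the assert's condition: the ranges are disjoint if one
// starts at or after the other ends, in either order.
bool DoNotOverlap(Range other, Range self) {
    assert(other.begin <= other.end && self.begin <= self.end);
    return other.begin >= self.end || other.end <= self.begin;
}
```

Adjacent ranges such as `[0x100, 0x200)` and `[0x200, 0x300)` pass. The assert fires only when two functions claim at least one common byte of code.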

janvorli · 2021-05-25T20:56:11Z

@jkotas I have put the allocation / deletion under FEATURE_WXORX until the real W^X stuff is added.
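Gating the extra allocation and copy on `FEATURE_WXORX` could look roughly like this. With the macro defined, the JIT writes into a scratch buffer that is copied and freed at the end. Without it, the JIT writes straight into the final location and nothing extra happens. Only the `FEATURE_WXORX` macro comes from the PR; `WriteTarget`, `BeginWrite` and `EndWrite` are hypothetical names for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Where the JIT writes vs. where the code will finally live.
struct WriteTarget {
    unsigned char* writable;
    unsigned char* finalLoc;
    bool ownsScratch;
};

WriteTarget BeginWrite(unsigned char* finalLocation, std::size_t size) {
#ifdef FEATURE_WXORX
    // W^X build: emit into a private scratch buffer.
    return {new unsigned char[size], finalLocation, true};
#else
    // Default build: no extra allocation, write in place.
    (void)size;
    return {finalLocation, finalLocation, false};
#endif
}

void EndWrite(WriteTarget& t, std::size_t size) {
    assert(t.writable != nullptr && t.finalLoc != nullptr);
    if (t.ownsScratch) {
        std::memcpy(t.finalLoc, t.writable, size);  // single copy at the end
        delete[] t.writable;
        t.writable = t.finalLoc;
        t.ownsScratch = false;
    }
}
```

Either way, callers see the same sequence: write through `writable`, then call `EndWrite`. The cost of the scratch buffer only appears in builds that define the macro.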

src/coreclr/vm/jitinterface.cpp

src/coreclr/vm/jitinterface.h

jkotas

LGTM

janvorli added 2 commits May 22, 2021 00:53

Generate JITted code into a scratch buffer

0aaa3e0

Copy it to the final location after the JITing is done.

Also put the code header and real code header into the buffer

951b9f1

janvorli added the area-CodeGen-coreclr label (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) May 24, 2021

janvorli added this to the 6.0.0 milestone May 24, 2021

janvorli requested a review from jkotas May 24, 2021 14:22

janvorli self-assigned this May 24, 2021

Publish code via nibble map after it is copied to its final location

afa6c2f

janvorli force-pushed the jit-into-scratch-buffer branch from 0b6350b to afa6c2f May 24, 2021 14:57

janvorli added 2 commits May 24, 2021 17:45

Remove m_writeableOffset

43698c3

Fix ARM64 relocation writing

263e072