JIT into scratch buffer #53173
Copy it to the final location after the JITing is done.
pCodeHdr = ((CodeHeader *)pCode) - 1;

*pAllocatedSize = sizeof(CodeHeader) + totalSize + reserveForJumpStubs;
pCodeHdrRW = (CodeHeader *)new BYTE[*pAllocatedSize];
This allocation has potential real cost. It may be nice to omit it in this change (together with the copy at the end) and only do it when W^X is turned on so that we can include it in the perf measurements.
I have measured the perf with this change using the ASP.NET benchmarks in our lab on Windows and haven't seen any perf difference. Ten rounds of the PlainText benchmark, which I have found to be quite sensitive to JIT performance, run with R2R disabled, show the same average results with and without my changes.
I intentionally made this change separate from the W^X work so that its effect can be seen in isolation from the other W^X changes and to verify that it doesn't have a negative impact on other tests.
Here are the results of time to first request:
| State | | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|
| With this change | 174 | 166 | 165 | 171 | 167 | 174 | 171 | 173 | 167 | 166 |
| Before | 166 | 171 | 165 | 174 | 165 | 166 | 168 | 165 | 169 | 165 |
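For reference, the per-row means of the ten values above can be computed with a small sketch like the one below. The helper name `Mean` and the constant names are illustrative, and the table does not state units, so none are assumed here.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Arithmetic mean of a non-empty list of samples.
inline double Mean(const std::vector<int>& samples)
{
    assert(!samples.empty());
    return std::accumulate(samples.begin(), samples.end(), 0.0) /
           static_cast<double>(samples.size());
}

// Values copied from the table above.
static const std::vector<int> kWithChange = {174, 166, 165, 171, 167,
                                             174, 171, 173, 167, 166};
static const std::vector<int> kBefore     = {166, 171, 165, 174, 165,
                                             166, 168, 165, 169, 165};

// Mean(kWithChange) is 169.4 and Mean(kBefore) is 167.4.
```

Both rows span the same range (165 to 174), so the gap of 2.0 between the means is smaller than the spread within either row.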
That depends on how much JITing is going on in parallel and how big the methods are. PlainText does not have much parallel JITing, and its methods are not big either.
We have seen the JIT be sensitive to the performance of unmanaged memory allocations, which is why we have a cache of memory blocks for the JIT to use in JitHost (slabAllocator).
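The idea of caching memory blocks to avoid repeated unmanaged allocations can be sketched as follows. This is a minimal illustration only, not the JitHost code: the class name `BlockCache`, its methods, and the fixed block size are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <mutex>
#include <vector>

// Hypothetical fixed-size block cache: released blocks are kept on a free
// list (up to a cap) so later requests can skip the system allocator.
class BlockCache {
public:
    BlockCache(std::size_t blockSize, std::size_t maxCached)
        : m_blockSize(blockSize), m_maxCached(maxCached) {}

    ~BlockCache() {
        for (void* p : m_free) std::free(p);
    }

    BlockCache(const BlockCache&) = delete;
    BlockCache& operator=(const BlockCache&) = delete;

    // Reuse a cached block when one is available, otherwise allocate.
    void* Acquire() {
        {
            std::lock_guard<std::mutex> lock(m_lock);
            if (!m_free.empty()) {
                void* p = m_free.back();
                m_free.pop_back();
                return p;
            }
        }
        return std::malloc(m_blockSize);
    }

    // Keep the block for reuse unless the cache is already full.
    void Release(void* p) {
        assert(p != nullptr);
        {
            std::lock_guard<std::mutex> lock(m_lock);
            if (m_free.size() < m_maxCached) {
                m_free.push_back(p);
                return;
            }
        }
        std::free(p);
    }

    std::size_t CachedCount() {
        std::lock_guard<std::mutex> lock(m_lock);
        return m_free.size();
    }

private:
    std::size_t m_blockSize;
    std::size_t m_maxCached;
    std::mutex m_lock;
    std::vector<void*> m_free;
};
```

The lock is held only while touching the free list, so concurrent callers do not serialize on the system allocator itself.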
#if defined(TARGET_AMD64)
// Publish the new unwind information in a way that the ETW stack crawler can find
if (m_usedUnwindInfos == m_totalUnwindInfos)
    UnwindInfoTable::PublishUnwindInfoForMethod(baseAddress, m_CodeHeader->GetUnwindInfo(0), m_totalUnwindInfos);
For good performance, PublishUnwindInfoForMethod expects to be called in the order that the allocations happen. If we move it to be done later, we are increasing the probability that it will be called out of order and do an expensive reallocation. Just pointing it out - I am not sure what or whether to do something about it.
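Why call order matters for an address-sorted table can be illustrated with the sketch below. It is not the runtime's `UnwindInfoTable`: the type names `UnwindEntry` and `SortedUnwindTable` are hypothetical, and the slow path is modeled here as a mid-array insert, which may differ from what the real table does.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical entry describing one method's address range.
struct UnwindEntry {
    std::uintptr_t beginAddress;
    std::uintptr_t endAddress;
};

// Hypothetical table kept sorted by beginAddress.
class SortedUnwindTable {
public:
    // Returns true if the cheap in-order append was taken, false if the
    // entry arrived out of order and existing entries had to be shifted.
    bool Publish(const UnwindEntry& e) {
        assert(e.beginAddress < e.endAddress);
        if (m_entries.empty() ||
            m_entries.back().beginAddress < e.beginAddress) {
            m_entries.push_back(e);  // fast path: append at the end
            return true;
        }
        auto pos = std::upper_bound(
            m_entries.begin(), m_entries.end(), e,
            [](const UnwindEntry& a, const UnwindEntry& b) {
                return a.beginAddress < b.beginAddress;
            });
        m_entries.insert(pos, e);    // slow path: shift later entries
        return false;
    }

    bool IsSorted() const {
        return std::is_sorted(
            m_entries.begin(), m_entries.end(),
            [](const UnwindEntry& a, const UnwindEntry& b) {
                return a.beginAddress < b.beginAddress;
            });
    }

    std::size_t Size() const { return m_entries.size(); }

private:
    std::vector<UnwindEntry> m_entries;
};
```

Publishing entries for addresses 0x1000 and then 0x3000 takes the fast path both times; a later entry for 0x2000 takes the slow path.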
It didn't feel safe to publish it before the data are written there.
There are some asserts on ARM/ARM64; I am looking into it.
@jkotas I have put the allocation / deletion under FEATURE_WXORX until the real W^X stuff is added.
LGTM
This change generates JITted code including the code header into a scratch buffer and copies it to its final location after the JITting is complete. This is another preparation step for the W^X work.
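The general pattern described above (write the header and code into a temporary buffer, then copy the finished image to its final location) can be sketched as follows. This is a simplified illustration under stated assumptions, not the code from this PR: the `CodeHeader` struct here is a stand-in with a single made-up field, and `EmitViaScratchBuffer` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>
#include <vector>

// Hypothetical stand-in for the runtime's CodeHeader; the real layout differs.
struct CodeHeader {
    std::uint32_t methodToken;
};

// Builds header + code in a scratch (read-write) buffer and copies the
// completed image to `finalLocation` in a single step.
// `finalLocation` must have room for sizeof(CodeHeader) + code.size() bytes.
inline std::size_t EmitViaScratchBuffer(std::uint8_t* finalLocation,
                                        std::uint32_t methodToken,
                                        const std::vector<std::uint8_t>& code)
{
    assert(finalLocation != nullptr);
    const std::size_t allocatedSize = sizeof(CodeHeader) + code.size();

    // All writes during "JITing" go to the scratch buffer only.
    std::unique_ptr<std::uint8_t[]> scratch(new std::uint8_t[allocatedSize]);

    CodeHeader hdr{methodToken};
    std::memcpy(scratch.get(), &hdr, sizeof(hdr));
    if (!code.empty()) {
        std::memcpy(scratch.get() + sizeof(CodeHeader), code.data(), code.size());
    }

    // One copy to the final location once everything has been written.
    std::memcpy(finalLocation, scratch.get(), allocatedSize);
    return allocatedSize;
}
```

With this shape, the final location is written in exactly one place, which is the property the W^X preparation relies on.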
Contributes to #50391