-
Notifications
You must be signed in to change notification settings - Fork 2.7k
Fill freed loader heap chunk with non-zero value #12731
Conversation
Requiring a private build of the runtime won't qualify as "easy" for most users. We used to have MDA (managed debugging assistants) in full .NET Framework for similar debugging aids. This particular one is CallbackOnCollectedDelegate . It is more robust than what you are proposing, gives better diagnostic message and does not require a custom build of the runtime. The code for it still there, but it is disable. We may want to rather look into adding a switch to enable it for CoreCLR. In full framework, it is configured via pretty complex xml config file. It should be more lightweight in CoreCLR. cc @yizhang82 |
Agree with @jkotas with the delegate scenario. However, this feature sounds like goodness in general in debug builds. We should probably enable it in debug build by default. I don't think the feature switch is required at all. |
What kind of debug builds do you have in mind? |
Our internal dbg/chk builds, to catch issues in our dbg/chk test runs. Not the retail ones. |
@jkotas MDA will be very useful, but this feature is intended for a bit different use cases. This feature aims to provide some hints when only coredump is available. I'm not sure, but it seems that MDA is not intended for such cases (Please correct me if I'm wrong). |
With your proposed change, you need a custom build of the runtime. With MDAs, you need to set an environment variable. Either way, it is not on by default. I think it is good idea to make the basic diagnostic - that crashes more predictably for recycled thunks - on by default. You change is trying to achieve this by overwriting the recycled delegate thunks with a different pattern, not with zeros. The problem is that it negative perf consequences because of it makes us memset all loader heap memory to 0 as side-effect. It is unnecessary because of this memory is zero filled by the OS. Could you please look into coding this such that the memset kicks in for the recycled thunks only? memseting the recycled thunks has neglible cost - it can be on by default. You actually just need to set thunk entrypoint to a breakpoint instruction, no need to memset the whole thing. |
@dotnet-bot test Windows_NT x64 Formatting please |
@jkotas Sorry for late response (I was out of office). I updated PR to invoke memset only for reclaimed regions. Reclaimed regions could be distinguished from fresh regions by reading the first byte as fresh regions will be filled with zero (by OS) and reclaimed regions will be filled with non-zero (by CLR), and thus simple check before memset will be enough in order to prevent memset for fresh regions. Unfortunately, the whole reclaimed regions should be filled with non-zero value under this scheme. |
I do not think we want to make this a yet another build option. It is not that hard to fix this up to make it enabled by default without negative perf consequence. |
clrdefinitions.cmake
Outdated
@@ -136,6 +136,7 @@ if(FEATURE_INTERPRETER) | |||
add_definitions(-DFEATURE_INTERPRETER) | |||
endif(FEATURE_INTERPRETER) | |||
add_definitions(-DFEATURE_ISYM_READER) | |||
add_definitions(-DFEATURE_LOADER_HEAP_GUARD) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Delete this
src/utilcode/loaderheap.cpp
Outdated
@@ -723,6 +729,9 @@ struct LoaderHeapFreeBlock | |||
} | |||
#endif | |||
|
|||
#if LOADER_HEAP_REDZONE_VALUE != 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of this:
- Add flag to UnlockedLoaderHeap that suppresses all filling of the memory by LoaderAllocator (you can steal some bits from
m_flProtect
), and set this flag for flag forSystemDomain::GetGlobalLoaderAllocator()
. - Change
UMEntryThunk::Terminate
to overwrite the entrypoint with jump to a method that reports the collected delegate problem and makes it easy to diagnose it.
While you are on this, you can delete FEATURE_WINDOWSPHONE
from UMEntryThunk::CreateUMEntryThunk/Terminate
. FEATURE_WINDOWSPHONE
has been always defined for CoreCLR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It means that zero-filling is not necessary for GlobalLoaderHeap. Am I right? I though that it is a part of LoaderHeap semantics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The zero filling should not be required for GetGlobalLoaderAllocator()->GetExecutableHeap()
. This method is used in a very few places, and none of them need it.
src/inc/loaderheap.h
Outdated
@@ -288,7 +288,8 @@ class UnlockedLoaderHeap | |||
SIZE_T dwReservedRegionSize, | |||
size_t *pPrivatePerfCounter_LoaderBytes = NULL, | |||
RangeList *pRangeList = NULL, | |||
BOOL fMakeExecutable = FALSE); | |||
BOOL fMakeExecutable = FALSE, | |||
BOOL fMakeRelaxed = FALSE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would call this flag fZeroInit = TRUE to make it clear what it does.
src/vm/dllimportcallback.cpp
Outdated
#endif | ||
|
||
_ASSERTE(SystemDomain::GetGlobalLoaderAllocator()->GetExecutableHeap()->IsRelaxed()); | ||
FillMemory(&m_code, sizeof(m_code), 0xcc); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other thread can allocate the block at this point. This needs to be done before the call to BackoutMem.
src/vm/loaderallocator.cpp
Outdated
@@ -893,7 +893,7 @@ void LoaderAllocator::ActivateManagedTracking() | |||
#define COLLECTIBLE_CODEHEAP_SIZE (7 * GetOsPageSize()) | |||
#define COLLECTIBLE_VIRTUALSTUBDISPATCH_HEAP_SPACE (5 * GetOsPageSize()) | |||
|
|||
void LoaderAllocator::Init(BaseDomain *pDomain, BYTE *pExecutableHeapMemory) | |||
void LoaderAllocator::Init(BaseDomain *pDomain, BYTE *pExecutableHeapMemory, BOOL fRelaxed) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method should not take the flag.
src/vm/loaderallocator.cpp
Outdated
@@ -1005,7 +1005,8 @@ void LoaderAllocator::Init(BaseDomain *pDomain, BYTE *pExecutableHeapMemory) | |||
dwExecutableHeapReserveSize, | |||
LOADERHEAP_PROFILE_COUNTER, | |||
NULL, | |||
TRUE /* Make heap executable */); | |||
TRUE /* Make heap executable */, | |||
fRelaxed); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be false unconditionally.
src/vm/dllimportcallback.cpp
Outdated
#endif | ||
|
||
_ASSERTE(SystemDomain::GetGlobalLoaderAllocator()->GetExecutableHeap()->IsRelaxed()); | ||
FillMemory(&m_code, sizeof(m_code), 0xcc); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
0xCC works well for x86 because of it is code for the breakpoint instruction. What does it do on arm?
Use fZeroInit (instead of fMakeRelazed)
src/vm/dllimportcallback.cpp
Outdated
|
||
RETURN p; | ||
} | ||
|
||
#if defined(_TARGET_X86_) || defined(_TARGET_AMD64_) | ||
#define REDZONE_VALUE 0xcc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Red zone is area under stack pointer: https://en.m.wikipedia.org/wiki/Red_zone_(computing)
Could you please find better name for it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may be best to move this into a platform specific method on UMEntryThunkCode . memset may not be always appropriate to neutralize the entrypoint.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about poison pill?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may introduce Spoil
method as you suggested, which allows us to insert a breakpoint to the exact entry point (instead of filling the whole region).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds like a plan. Poison
or Neutralize
may be a better name than Spoil
.
@jkotas I updated PR as suggested, and confirmed that it is working for x86/x64/arm. Unfortunately, I have no access to arm64 environment. Could you let me know if there is any way to build and test arm64 binaries from CI? |
src/vm/loaderallocator.cpp
Outdated
@@ -1005,7 +1005,8 @@ void LoaderAllocator::Init(BaseDomain *pDomain, BYTE *pExecutableHeapMemory) | |||
dwExecutableHeapReserveSize, | |||
LOADERHEAP_PROFILE_COUNTER, | |||
NULL, | |||
TRUE /* Make heap executable */); | |||
TRUE /* Make heap executable */, | |||
FALSE /* Do NOT zero-initialize */); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe useful to describe why - that it is needed by the UMEntryThunkCode::Poison
.
src/vm/arm/stubs.cpp
Outdated
void UMEntryThunkCode::Poison() | ||
{ | ||
m_code[0] = 0xbebe; | ||
FlushInstructionCache(GetCurrentProcess(),&m_code,sizeof(m_code)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not think FlushInstructionCache
is needed here - it is not free. This aid is not 100% reliable because of it depends on the GC timings, making it a bit less reliable by skipping the FlushInstructionCache
here should be ok.
@sdmaclea Could you please help with filling in/testing the arm64 implementation of this? |
@dotnet-bot test Windows_NT arm64 Checked |
@parjong I assume Windows testing will be sufficient. Let me know if you need it tested on Ubuntu, I make a local build. if needed. |
@sdmaclea Windows testing is sufficient for me. Thank you for your help! |
@jkotas Could you please take a look? Given the result, this PR does not introduce any regression on ARM64. |
Could you please open an issue to get the implementation filled on ARM64 and add link to the issue from the code? |
@@ -2522,6 +2522,11 @@ void UMEntryThunkCode::Encode(BYTE* pTargetCode, void* pvSecretParam) | |||
FlushInstructionCache(GetCurrentProcess(),&m_code,sizeof(m_code)); | |||
} | |||
|
|||
void UMEntryThunkCode::Poison() | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please add a comment with instruction that this translates so
src/vm/amd64/cgenamd64.cpp
Outdated
} | ||
CONTRACTL_END; | ||
|
||
m_movR10[0] = 0xCC; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: You can use X86_INSTR_INT3
here - it is defined for x64 as well.
LGTM modulo comments. Thanks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks
This commit implements a new feature that fills freed global loader heap chunk with some non-zero value (currently 0xCC) in order to make it easy to diagnose dangling Reverse P/Invoke callback issues.