Skip to content
This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

Fill freed loader heap chunk with non-zero value #12731

Merged
merged 12 commits into from
Jul 31, 2017

Conversation

parjong
Copy link

@parjong parjong commented Jul 11, 2017

This commit implements a new feature that fills freed global loader heap chunk with some non-zero value (currently 0xCC) in order to make it easy to diagnose dangling Reverse P/Invoke callback issues.

@parjong parjong changed the title FEATURE_LOADER_HEAP_GUARD: Fill Freed Loader Heap with non-zero value FEATURE_LOADER_HEAP_GUARD: Fill freed loader heap chunk with non-zero value Jul 11, 2017
@jkotas
Copy link
Member

jkotas commented Jul 12, 2017

This feature make it easy to detect issues caused by dangling Reverse P/Invoke callbacks.

Requiring a private build of the runtime won't qualify as "easy" for most users. We used to have MDA (managed debugging assistants) in full .NET Framework for similar debugging aids.

This particular one is CallbackOnCollectedDelegate . It is more robust than what you are proposing, gives better diagnostic message and does not require a custom build of the runtime. The code for it still there, but it is disable. We may want to rather look into adding a switch to enable it for CoreCLR. In full framework, it is configured via pretty complex xml config file. It should be more lightweight in CoreCLR.

cc @yizhang82

@yizhang82
Copy link

Agree with @jkotas with the delegate scenario. However, this feature sounds like goodness in general in debug builds. We should probably enable it in debug build by default. I don't think the feature switch is required at all.

@jkotas
Copy link
Member

jkotas commented Jul 12, 2017

We should probably enable it in debug build by default

What kind of debug builds do you have in mind?

@yizhang82
Copy link

What kind of debug builds do you have in mind?

Our internal dbg/chk builds, to catch issues in our dbg/chk test runs. Not the retail ones.

@parjong
Copy link
Author

parjong commented Jul 13, 2017

@jkotas MDA will be very useful, but this feature is intended for a bit different use cases.

This feature aims to provide some hints when only coredump is available. I'm not sure, but it seems that MDA is not intended for such cases (Please correct me if I'm wrong).

@jkotas
Copy link
Member

jkotas commented Jul 13, 2017

With your proposed change, you need a custom build of the runtime. With MDAs, you need to set an environment variable. Either way, it is not on by default.

I think it is good idea to make the basic diagnostic - that crashes more predictably for recycled thunks - on by default. You change is trying to achieve this by overwriting the recycled delegate thunks with a different pattern, not with zeros.

The problem is that it negative perf consequences because of it makes us memset all loader heap memory to 0 as side-effect. It is unnecessary because of this memory is zero filled by the OS.

Could you please look into coding this such that the memset kicks in for the recycled thunks only? memseting the recycled thunks has neglible cost - it can be on by default. You actually just need to set thunk entrypoint to a breakpoint instruction, no need to memset the whole thing.

@parjong
Copy link
Author

parjong commented Jul 17, 2017

@dotnet-bot test Windows_NT x64 Formatting please

@parjong
Copy link
Author

parjong commented Jul 18, 2017

@jkotas Sorry for late response (I was out of office). I updated PR to invoke memset only for reclaimed regions.

Reclaimed regions could be distinguished from fresh regions by reading the first byte as fresh regions will be filled with zero (by OS) and reclaimed regions will be filled with non-zero (by CLR), and thus simple check before memset will be enough in order to prevent memset for fresh regions.

Unfortunately, the whole reclaimed regions should be filled with non-zero value under this scheme.

@jkotas
Copy link
Member

jkotas commented Jul 18, 2017

I do not think we want to make this a yet another build option. It is not that hard to fix this up to make it enabled by default without negative perf consequence.

@@ -136,6 +136,7 @@ if(FEATURE_INTERPRETER)
add_definitions(-DFEATURE_INTERPRETER)
endif(FEATURE_INTERPRETER)
add_definitions(-DFEATURE_ISYM_READER)
add_definitions(-DFEATURE_LOADER_HEAP_GUARD)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Delete this

@@ -723,6 +729,9 @@ struct LoaderHeapFreeBlock
}
#endif

#if LOADER_HEAP_REDZONE_VALUE != 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of this:

  • Add flag to UnlockedLoaderHeap that suppresses all filling of the memory by LoaderAllocator (you can steal some bits from m_flProtect), and set this flag for flag for SystemDomain::GetGlobalLoaderAllocator().
  • Change UMEntryThunk::Terminate to overwrite the entrypoint with jump to a method that reports the collected delegate problem and makes it easy to diagnose it.

While you are on this, you can delete FEATURE_WINDOWSPHONE from UMEntryThunk::CreateUMEntryThunk/Terminate. FEATURE_WINDOWSPHONE has been always defined for CoreCLR

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It means that zero-filling is not necessary for GlobalLoaderHeap. Am I right? I though that it is a part of LoaderHeap semantics.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The zero filling should not be required for GetGlobalLoaderAllocator()->GetExecutableHeap(). This method is used in a very few places, and none of them need it.

@parjong parjong changed the title FEATURE_LOADER_HEAP_GUARD: Fill freed loader heap chunk with non-zero value Fill freed loader heap chunk with non-zero value Jul 19, 2017
@@ -288,7 +288,8 @@ class UnlockedLoaderHeap
SIZE_T dwReservedRegionSize,
size_t *pPrivatePerfCounter_LoaderBytes = NULL,
RangeList *pRangeList = NULL,
BOOL fMakeExecutable = FALSE);
BOOL fMakeExecutable = FALSE,
BOOL fMakeRelaxed = FALSE);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would call this flag fZeroInit = TRUE to make it clear what it does.

#endif

_ASSERTE(SystemDomain::GetGlobalLoaderAllocator()->GetExecutableHeap()->IsRelaxed());
FillMemory(&m_code, sizeof(m_code), 0xcc);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other thread can allocate the block at this point. This needs to be done before the call to BackoutMem.

@@ -893,7 +893,7 @@ void LoaderAllocator::ActivateManagedTracking()
#define COLLECTIBLE_CODEHEAP_SIZE (7 * GetOsPageSize())
#define COLLECTIBLE_VIRTUALSTUBDISPATCH_HEAP_SPACE (5 * GetOsPageSize())

void LoaderAllocator::Init(BaseDomain *pDomain, BYTE *pExecutableHeapMemory)
void LoaderAllocator::Init(BaseDomain *pDomain, BYTE *pExecutableHeapMemory, BOOL fRelaxed)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method should not take the flag.

@@ -1005,7 +1005,8 @@ void LoaderAllocator::Init(BaseDomain *pDomain, BYTE *pExecutableHeapMemory)
dwExecutableHeapReserveSize,
LOADERHEAP_PROFILE_COUNTER,
NULL,
TRUE /* Make heap executable */);
TRUE /* Make heap executable */,
fRelaxed);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be false unconditionally.

#endif

_ASSERTE(SystemDomain::GetGlobalLoaderAllocator()->GetExecutableHeap()->IsRelaxed());
FillMemory(&m_code, sizeof(m_code), 0xcc);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0xCC works well for x86 because of it is code for the breakpoint instruction. What does it do on arm?


RETURN p;
}

#if defined(_TARGET_X86_) || defined(_TARGET_AMD64_)
#define REDZONE_VALUE 0xcc
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Red zone is area under stack pointer: https://en.m.wikipedia.org/wiki/Red_zone_(computing)

Could you please find better name for it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be best to move this into a platform specific method on UMEntryThunkCode . memset may not be always appropriate to neutralize the entrypoint.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about poison pill?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may introduce Spoil method as you suggested, which allows us to insert a breakpoint to the exact entry point (instead of filling the whole region).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds like a plan. Poison or Neutralize may be a better name than Spoil.

@parjong
Copy link
Author

parjong commented Jul 19, 2017

@jkotas I updated PR as suggested, and confirmed that it is working for x86/x64/arm. Unfortunately, I have no access to arm64 environment. Could you let me know if there is any way to build and test arm64 binaries from CI?

@@ -1005,7 +1005,8 @@ void LoaderAllocator::Init(BaseDomain *pDomain, BYTE *pExecutableHeapMemory)
dwExecutableHeapReserveSize,
LOADERHEAP_PROFILE_COUNTER,
NULL,
TRUE /* Make heap executable */);
TRUE /* Make heap executable */,
FALSE /* Do NOT zero-initialize */);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe useful to describe why - that it is needed by the UMEntryThunkCode::Poison.

void UMEntryThunkCode::Poison()
{
m_code[0] = 0xbebe;
FlushInstructionCache(GetCurrentProcess(),&m_code,sizeof(m_code));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think FlushInstructionCache is needed here - it is not free. This aid is not 100% reliable because of it depends on the GC timings, making it a bit less reliable by skipping the FlushInstructionCache here should be ok.

@jkotas
Copy link
Member

jkotas commented Jul 20, 2017

Unfortunately, I have no access to arm64 environment. Could you let me know if there is any way to build and test arm64 binaries from CI?

@sdmaclea Could you please help with filling in/testing the arm64 implementation of this?

@sdmaclea
Copy link

@dotnet-bot test Windows_NT arm64 Checked

@sdmaclea
Copy link

@parjong I assume Windows testing will be sufficient. Let me know if you need it tested on Ubuntu, I make a local build. if needed.

@parjong
Copy link
Author

parjong commented Jul 20, 2017

@sdmaclea Windows testing is sufficient for me. Thank you for your help!

@parjong
Copy link
Author

parjong commented Jul 30, 2017

@jkotas Could you please take a look? Given the result, this PR does not introduce any regression on ARM64.

@jkotas
Copy link
Member

jkotas commented Jul 31, 2017

Could you please open an issue to get the implementation filled on ARM64 and add link to the issue from the code?

@@ -2522,6 +2522,11 @@ void UMEntryThunkCode::Encode(BYTE* pTargetCode, void* pvSecretParam)
FlushInstructionCache(GetCurrentProcess(),&m_code,sizeof(m_code));
}

void UMEntryThunkCode::Poison()
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please add a comment with instruction that this translates so

}
CONTRACTL_END;

m_movR10[0] = 0xCC;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: You can use X86_INSTR_INT3 here - it is defined for x64 as well.

@jkotas
Copy link
Member

jkotas commented Jul 31, 2017

LGTM modulo comments. Thanks

Copy link
Member

@jkotas jkotas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks

@jkotas jkotas merged commit 1c540c5 into dotnet:master Jul 31, 2017
@parjong parjong deleted the fix/FEATURE_LOADER_HEAP_GUARD branch July 31, 2017 07:53
@karelz karelz modified the milestone: 2.1.0 Aug 28, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants