Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[NativeAOT] RhGetCodeTarget fails if executable code cannot be read #73906

Open
TheSpydog opened this issue Aug 13, 2022 · 14 comments
Open

[NativeAOT] RhGetCodeTarget fails if executable code cannot be read #73906

TheSpydog opened this issue Aug 13, 2022 · 14 comments

Comments

@TheSpydog
Copy link
Contributor

While working on porting NativeAOT to a certain game console, I'm running into a problem where RhGetCodeTarget is causing a memory access violation. This console forces the memory protection of its executable code to exclusive EXEC (no READ) and does not allow you to change the protection value. Since RhGetCodeTarget reads assembly data directly from the executable section, this causes the system to raise a memory violation error at runtime.

Given that the runtime already tracks the general section of memory where unboxing stubs are located (RuntimeInstance::m_pUnboxingStubsRegion), maybe it would be possible for the runtime to track each stub's destination in a lookup table rather than calculating it from the assembly? I have no idea if that's feasible though.

I know this is a very niche issue (even the other two console platforms work just fine as-is) so I understand this may be more trouble than it's worth to fix in this repository. But I would still appreciate any guidance on how to work around this issue in my NativeAOT fork, even if the changes can't be upstreamed.

@dotnet-issue-labeler
Copy link

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

@ghost ghost added the untriaged New issue has not been triaged by the area owner label Aug 13, 2022
@jkotas
Copy link
Member

jkotas commented Aug 13, 2022

Option 1: Add a pointer to the code target to the unwind data. We do that for more complex instantiating stubs already. Look for "MethodAssociatedData" or "GetTargetOfUnboxingAndInstantiatingStub". The simple unboxing stubs can be on the same plan. This option works only if you are able to produce the unwind data extension.

Option 2: Add a new table (https://github.com/dotnet/runtime/blob/main/src/coreclr/tools/Common/Internal/Runtime/MetadataBlob.cs) that maps the entrypoints to the code targets, and replace calls of RhGetCodeTarget with lookups in this table.

Note that we have added one more case of code reading for stack unwinding recently:

int UnixNativeCodeManager::TrailingEpilogueInstructionsCount(PTR_VOID pvAddress)
. This can be again fixed by storing the right information in unwind info. cc @VSadov

upstreamed

I think we would be happy to accept code for any this upstream - it can be only enabled under a option that is off by default.

@jkotas jkotas added this to the Future milestone Aug 13, 2022
@ghost ghost removed the untriaged New issue has not been triaged by the area owner label Aug 13, 2022
@VSadov
Copy link
Member

VSadov commented Aug 13, 2022

Yes, we analyze code to detect epilogs on Unix. Since we cannot figure the stack location of the return address, return address hijacking cannot work in epilogs.
Return address hijacking will be another scenario where code is read.

@TheSpydog
Copy link
Contributor Author

Option 1: Add a pointer to the code target to the unwind data. We do that for more complex instantiating stubs already. Look for "MethodAssociatedData" or "GetTargetOfUnboxingAndInstantiatingStub". The simple unboxing stubs can be on the same plan. This option works only if you are able to produce the unwind data extension.

This is great, thanks! After looking around the code a bit, I have a few questions:

First, RhGetCodeTarget's doc comment says that it checks "if this points to an import stub or unboxing stub". What is the difference between an import stub and an unboxing stub? I found the UnboxingStubNode class in ILCompiler but I can't find an ImportStubNode or anything similar.

Second, RhGetCodeTarget has a check to exclude any code that is not within the bounds of the __unbox region. If I'm reading the ILCompiler source correctly, the only code that gets placed within that region is from UnboxingStubNodes. If this is the case, why does RhGetCodeTarget need to disambiguate between import and unboxing stubs? Wouldn't they always be unboxing stubs if they're within the region?

Third,

I think we would be happy to accept code for any this upstream - it can be only enabled under a option that is off by default.

Is this a change that could potentially be accepted/backported into the .NET 7.0 tag? I know the deadline for new features is tomorrow, but I don't think I'll be able to write up a sufficient patch in time. (Also this isn't really a new feature, I suppose, just a lateral change in functionality.)

And lastly, would this change need to be hidden behind an option, if it works equivalently to the code reading? Or could we replace the code reading logic in RhGetCodeTarget outright with a function call, similar to RhGetTargetOfUnboxingAndInstantiatingStub?

Yes, we analyze code to detect epilogs on Unix. Since we cannot figure the stack location of the return address, return address hijacking cannot work in epilogs.
Return address hijacking will be another scenario where code is read.

Thanks for pointing out these scenarios. Unfortunately this platform does not support any of the APIs necessary for return address hijacking, so PalHijack will need to remain an empty stub in this version of the PAL. I guess the upside is that for my purposes it should be sufficient to only replace RhGetCodeTarget.

@jkotas
Copy link
Member

jkotas commented Aug 15, 2022

What is the difference between an import stub and an unboxing stub?

Import stubs are used for compilation mode that produces multiple native .dlls (e.g. one native .dll that is shared between multiple apps and second native .dll that is the app). This compilation mode was used in .NET Native for UWP. We do not support this compilation mode in dotnet/runtime currently. We are keeping the code required for this compilation mode around in case it is needed in future.

You can ignore all mentions of import stubs for the purpose of this issue.

Is this a change that could potentially be accepted/backported into the .NET 7.0 tag?

I am sorry we would not be able to accept backport into .NET 7.0.

And lastly, would this change need to be hidden behind an option, if it works equivalently to the code reading?

This change would introduce some binary size regression. We would need some data about the impact of this change on the binary sizes to decide whether it is ok to take this regression for simplicity.

@TheSpydog
Copy link
Contributor Author

You can ignore all mentions of import stubs for the purpose of this issue.

Thanks for the clarification.

I've started working on implementing the first option (storing code targets as associated method data, copying off how "special" unboxing thunks work), but I'm not sure how to verify that my attempted changes are working. What does the runtime actually do with the pointer that RhGetCodeTarget returns?

I've read through the source but it's kind of difficult to follow where these code targets are used, since the logic is spread through so many intermediate layers. (As a quick and dirty test to find who reads the pointers, I tried returning nullptr in the RhGetCodeTarget function, and unfortunately nothing caught fire. 😛)

@jkotas
Copy link
Member

jkotas commented Aug 17, 2022

What does the runtime actually do with the pointer that RhGetCodeTarget returns?

It is used for Delegate.Method property. This property performs reverse lookup that goes from function pointer back to reflection MethodInfo.

Several native aot smoke tests https://github.com/dotnet/runtime/blob/main/docs/workflow/building/coreclr/nativeaot.md#running-tests exercise this path.

If I change RhGetCodeTarget to return null, I see MakeGenMethod.Test.TestReverseLookups and
GenericVirtualMethods.TestLdFtnToInstanceGenericMethod smoke tests failing:

public static void TestReverseLookups()

public static void TestLdFtnToInstanceGenericMethod()

@TheSpydog
Copy link
Contributor Author

I spent some time this week trying to get the smoke tests passing, but unfortunately I haven't made much progress. However, after spending enough hours staring at test failures, I realized there is a workaround that might be suitable for my purposes.

Since unboxing stubs are emitted to a dedicated ELF section (__unbox), I can use the objcopy tool to copy that section out into a binary blob, then append it to the end of the NativeAOT-emitted object file as a new section (__unboxro). By setting the flags of the new section to alloc,readonly,data, I now have an identical copy of the unboxing stub code that the OS is allowed to read (since it's not flagged as executable).

Then, in a platform-specific implementation of RhGetCodeTarget, I can get the starting address of both unboxing sections and use the pointer offset from the start of the __unbox section to calculate the equivalent memory address in __unboxro. The code reading logic then proceeds like normal.

Obviously, this strategy comes at the cost of a few KB of extra binary bloat since there's code duplication, but I think that's an acceptable tradeoff.

Since this is a very niche scenario and there's at least one known workaround, I'll close this issue.

@kg
Copy link
Member

kg commented Aug 20, 2022

I can imagine other platforms adding this same code security constraint, so it might be worth having the RhGetCodeTarget change upstream, if not the copied section trick as well.

@TheSpydog TheSpydog reopened this Aug 21, 2022
@VSadov
Copy link
Member

VSadov commented Aug 21, 2022

@TheSpydog

Unfortunately this platform does not support any of the APIs necessary for return address hijacking, so PalHijack will need to remain an empty stub in this version of the PAL.

I am curious - how GC suspension works on this platform? Are you relying entirely on polling or use conservative mode? Does the platform support stack unwinding at all?

@TheSpydog
Copy link
Contributor Author

@VSadov

I am curious - how GC suspension works on this platform? Are you relying entirely on polling or use conservative mode? Does the platform support stack unwinding at all?

Both Unix-based console platforms use conservative GC (via FEATURE_CONSERVATIVE_GC) and support stack unwinding.

@VSadov
Copy link
Member

VSadov commented Aug 29, 2022

and support stack unwinding.

For hijacking to work we need two things:

  • stack unwinding. (also needed for exception handling, so typically present)
  • mechanism for asynchronous thread interrupting and accessing the context - i.e. Posix signals or something like SuspendThread.

I assume the thread interruption APIs are unavailable or somehow restricted.

Without hijacking there is a risk of long suspension pauses or hangs when some thread runs managed code, but does not allocate or make calls to helpers that poll for GC.
I am mostly wondering how common this situation is and whether we can do something for platforms that cannot hijack.

@TheSpydog
Copy link
Contributor Author

I assume the thread interruption APIs are unavailable or somehow restricted.

Correct, neither Unix-like console includes support for Posix signals or has any APIs similar to SuspendThread. I'm wondering if maybe some sort of event polling could be used as a substitute in the event that a given platform does not support signals.

@VSadov
Copy link
Member

VSadov commented Aug 31, 2022

There used to be a JIT mode where it would insert GC polling checks at back branches. That was to handle platforms that cannot hijack. The approach is still in use in rare cases when a loop does not contain a point where GC could be stopped (like a call, for example) and containing method is not fully interruptible.
All the officially supported platforms can hijack, though, so while there is a way to insert GC polls in loops, there is no "instrument every loop" scenario.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: No status
Development

No branches or pull requests

4 participants