Deadlock with RtlLookupFunctionEntry-based stack unwinding on 64-bit Windows 10 #12
Comments
Michael again a massive thank you for this work. This deadlock is a killer and it would be great to swat it and get win10 compatibility closer to where it should be. Your ideas are all great - just like the way I check for loader lock but there it's so much easier as it's in the peb! But I think there is a solution to be had here. I found this from 2012 but the idea is spot on: http://workblog.pilin.name/2012/10/how-to-get-x64-dynamic-function-table.html So I've just tried adding a simple hook for AcquireSRWLockExclusive as for Win10 we needn't worry about the critical section mentioned in that post. However I haven't yet seen it fire. Please do give it a try - once we have the hook working we can set a flag to enable pointer capture, then call the bogus RtlAddFunctionTable to get the pointer, then disable capture, delete the entry then add a check to enter_hook to immediately return 0 if SRWlock held. That's the plan anyway! Let me know if you want a compiled monitor to test with.
|
Hi Kevin, I've made some progress although no breakthrough yet. I've added a call to But unfortunately it's the wrong SRW, i.e. Here's a screenshot of WinDbg of that state with output from the analyser: The code is at michaelweiser@36fb25b. So we'd likely need to defer the SRW discovery to a routine that's not part of DLL initialisation itself. Any idea of a good candidate? Other stuff I've learned:
I've also tried instead calling Trying a totally different approach, I've also attempted to learn the address of I have a lingering suspicion that some of the above stuff didn't work for me because I ran afoul of the complexity of the hook call chains involved and was just staring at bogus output or not reading it right. So any critical feedback on and maybe testing of the code is highly welcome. I'll be away starting tomorrow all next week so don't be alarmed if I fall silent for some time. @Jack28 is in the loop what we're trying to achieve and can maybe do some additional tests. |
Looking at this again today, I read the disassembly of
Searching the disassembly of |
Hi Michael, I too have been exploring a couple of possibilities; the most promising is using yara to locate RtlInsertInvertedFunctionTable and hence LdrpInvertedFunctionTableSRWLock, since we already have it compiled into the monitor. This would be nice and unintrusive, find via yara then add check to stackwalk function... I'll let you know how I get on... |
After some more digging, calling Before spending more time on this I'd love to hear if you found a way to discover those functions or the SRW at runtime. Because they don't seem to be identified by name or any other means within the DLL (because they're not exported), I could only think of using the debug symbols (which differ and would need to be downloaded for every build of ntdll) or matching some distinctive code sequence, e.g. I'm now wondering if it might actually be simpler to try and use the canary thread approach suggested by the dotnet guys, i.e. have another thread call |
Hi Michael, I've managed to avoid the deadlock by using yara to locate RtlInsertInvertedFunctionTable and hence LdrpInvertedFunctionTableSRWLock. Here's a test build of capemon for you to try - let me know if it works! |
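For anyone following along, the "match a distinctive code sequence" idea can be sketched in plain C as a masked byte search. This is only an illustration and not capemon's implementation, which uses a yara rule; `find_pattern`, the buffer and the pattern bytes below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return the offset of the first match of `pat` in `buf`, or (size_t)-1
 * if there is none. A zero in mask[i] makes pat[i] a wildcard, which is
 * how bytes that vary between builds (e.g. a 32-bit displacement) can be
 * skipped. All names here are hypothetical. */
static size_t find_pattern(const uint8_t *buf, size_t buf_len,
                           const uint8_t *pat, const uint8_t *mask,
                           size_t pat_len)
{
    assert(buf != NULL && pat != NULL && mask != NULL);
    if (pat_len == 0 || pat_len > buf_len)
        return (size_t)-1;

    for (size_t i = 0; i + pat_len <= buf_len; i++) {
        size_t j = 0;
        /* Advance while the byte is a wildcard or matches exactly. */
        while (j < pat_len && (mask[j] == 0 || buf[i + j] == pat[j]))
            j++;
        if (j == pat_len)
            return i;
    }
    return (size_t)-1;
}
```

In a real monitor the buffer would be the mapped code section of ntdll and the pattern would have to be validated against every supported build, since non-exported functions may be compiled differently from one build to the next.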
Hi Kevin, I've tried it. Unfortunately, it still hangs but at a different point. See below screenshot for the call stack. Unfortunately, your build doesn't include symbols. So I can not do much more digging into it to tell the reason for the hang. What I can tell is that Can you provide a debug build or, even better, the source code of your modifications? BTW: I'll be unavailable next week but once again @Jack28 is in the loop and happy to try out what you throw at him. |
Aha I've just realised yara is disabled with office settings - can you try launching without? |
Happy to share source of course - bit of a mess currently as it's been hacked together but I'll try and tidy it up so I can share |
LdrpInvertedFunctionTableSRWLock.zip |
@Jack28 and now I have been looking into this again. In addition to setting the
The address correctly comes back as LdrpInvertedFunctionTableSRWLock = (PVOID)((PBYTE)RtlInsertInvertedFunctionTable + *(DWORD*)((PBYTE)RtlInsertInvertedFunctionTable + 3) + 7); but I can accept with my limitations since it works. ;) Winword still doesn't fully start though. Looking at the source code it seems the logic is not fully active yet, ending up in disabling hook cycle detection. Should we try and dig into that or are you on it already? Or is the code if (*(PVOID*)LdrpInvertedFunctionTableSRWLock)
return TRUE; meant to be a lightweight version of //if (TryAcquireSRWLockExclusive((PSRWLOCK)&LdrpInvertedFunctionTableSRWLock)) {
// ReleaseSRWLockExclusive((PSRWLOCK)&LdrpInvertedFunctionTableSRWLock);
// return FALSE;
//} ? For the latter I wonder whether its potential failure to function may be due to |
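To make the "lightweight" check discussed above concrete, here is a minimal portable sketch. It relies on an SRW lock being a single pointer-sized value that is zero when the lock is free. The function names are hypothetical, and the result is only a snapshot: another thread may take or release the lock right after the read.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if the pointer-sized lock word is non-zero, i.e. the lock looks
 * held (or contended) at the moment of the read. Hypothetical helper. */
static bool lock_word_in_use(const volatile void *lock)
{
    assert(lock != NULL);
    return *(const volatile uintptr_t *)lock != 0;
}

/* Hypothetical gate for a stack walker: only unwind when the lock that
 * the unwinder would need does not appear to be taken. */
static bool safe_to_unwind(const volatile void *table_lock)
{
    return !lock_word_in_use(table_lock);
}
```

Unlike a try-acquire, this never writes to the lock, so it cannot itself disturb the lock state. The price is that it cannot tell whether the current thread or some other thread is the holder.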
I found that this issue was affecting a range of 64-bit programs, particularly GUI apps. As I still don't have a 64-bit Office to test with, I decided to test on another, easier app that was similarly affected: x64dbg. With this as a test case, the code committed in aa1fd55 fixes the issue. The app no longer hangs on this deadlock and loads on 64-bit Win10: So I am led to think that this specific issue is solved, and that a separate, distinct issue is now preventing Word from opening. Certainly the logic committed is tested and working; the commented-out test code with the TryAcquireSRWLockExclusive API was just that: I tested it and it didn't work. When I tried testing the 'raw' lock value, that worked, so I went with that. As for the code to locate the lock, if you look at the disassembly of RtlInsertInvertedFunctionTable, the logic relies entirely on this line: The last 4 bytes of this instruction are the displacement in little-endian byte order, i.e. 0x13954F. This is relative to the end of the instruction, which is the VA 0x7FFC71CE2F8A plus the size 7, so 0x7FFC71CE2F91. Adding the two gives 0x7FFC71E1C4E0, which is the address of the lock: Please let me know if you agree that this issue is fixed, then we can perhaps create a new issue for whatever now prevents Word from launching. |
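The address arithmetic described above can be written as a small portable C helper. This is a sketch under assumptions: the disassembly line itself is not preserved in the thread, so the `48 8D 0D` opcode bytes used in the example are a guess at a typical 7-byte RIP-relative `lea`. Only the displacement offset (3) and the instruction length (7) come from the code quoted earlier. `resolve_rip_relative` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Resolve the target of a 7-byte RIP-relative instruction whose 32-bit
 * displacement starts at byte 3.
 *   insn    - the instruction bytes (may be a local copy)
 *   insn_va - virtual address at which the instruction is located
 * Assumes a little-endian host, as on x64. */
static uint64_t resolve_rip_relative(const uint8_t *insn, uint64_t insn_va)
{
    int32_t disp;

    assert(insn != NULL);
    memcpy(&disp, insn + 3, sizeof disp);  /* signed displacement */

    /* The displacement is relative to the address of the next
     * instruction, i.e. insn_va + 7. Sign-extend before adding so that
     * negative displacements work as well. */
    return insn_va + 7 + (uint64_t)(int64_t)disp;
}
```

With the values from the comment above, the bytes `48 8D 0D 4F 95 13 00` at VA 0x7FFC71CE2F8A resolve to 0x7FFC71E1C4E0, matching the lock address given there.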
Any feedback on this? I am still of the mind that this specific issue is fixed... |
Thanks for your update and explanations! Doing some tests with Winword 2013 and 2016, the new logic reliably prevents the deadlock on startup. Both crash soon thereafter but those seem to be unrelated problems. I'll try and dig into those and report any separate issues I may be able to identify. I think this issue here can be closed. Thanks again! |
Well worth trying options like minhook=1 or other hook exclusions to try to rule out a hooking issue. |
No change in behaviour with minhook=1 in a quick try. Will dig some more and report in a separate issue if I come up with anything. |
Hello @kevoreilly !
wbemuuid.lib needs to be added in Linker/Input in VS project properties. Compiling and running the binary under loader leads to a hang: Again, as above, deactivating the I've tried various things to solve the issue.
As a side note, looking for the system processes and installed application using WMI, exposes the sandbox-related processes and applications to the malware process. At the moment, these are protected only against listing the processes with I will attempt a pull request w.r.t the above observation. Ultimately, thank you for this great sandbox! Edit: typos. |
Thank you @RaduEmanuel92 for reporting this issue. It is the same as issue #49 reported today, now fixed in 65f4e2f |
Starting a 64-bit Winword on current Windows 10 under observation by capemon gets stuck very early on. Attaching to the stuck winword.exe shows the following call stack:
Apparently, ntdll!RtlInsertInvertedFunctionTable is called and (according to disassembly of the function) exclusively acquires the Slim Reader/Writer (SRW) lock ntdll!LdrpInvertedFunctionTableSRWLock. This would make sense as the function is likely to modify import tables.
While holding the lock, it calls ntdll!LdrProtectMrData which, according to my debugging, eventually calls ntdll!NtProtectVirtualMemory, likely to protect access to those tables (again). Since ntdll!NtProtectVirtualMemory is hooked by capemon, this triggers capemon_x64!enter_hook in order to decide whether to enter the capemon hook for that function or not. This decision is, amongst other things, based on whether the hook is itself called from a hook. To determine this, the stack is unwound by calling capemon_x64!our_stackwalk, which on x64 is implemented using ntdll!RtlLookupFunctionEntry.
Unfortunately, this function does not appear to be safe for this kind of thing because it also acquires the already exclusively held ntdll!LdrpInvertedFunctionTableSRWLock. This leads to the observed deadlock.
Disabling hooking of ntdll!NtProtectVirtualMemory without involvement of stack unwinding mitigates the issue:
This, however, leaves a massive blindspot regarding all calls to that function. This also only works because ntdll!NtProtectVirtualMemory appears to be the only hooked function called while ntdll!LdrpInvertedFunctionTableSRWLock is held.
Trying to hook ntdll!RtlInsertInvertedFunctionTable to temporarily disable hooking for all other APIs called from it (again without involving stack-based decisions) has not been successful because the hook does not seem to be called:
Another idea I had (and found elsewhere: https://microsoft.public.win32.programmer.kernel.narkive.com/qxCAoEXI/using-rtllookupfunctionentry-for-profiling) was to try to acquire the lock from our_stackwalk to see whether it was held or free, and thus whether RtlLookupFunctionEntry would block or not. Unfortunately, the symbol is not exported from ntdll, so I cannot get at its address.
Other projects have run into this problem as well and proposed a number of solutions, e.g.: dotnet/runtime#32286, DynamoRIO/drmemory#1222
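To illustrate the "probe the lock before unwinding" idea, here is a toy model in portable C11. It is not Windows code: `excl_lock` is a hypothetical stand-in for a non-recursive exclusive lock, and `guarded_stackwalk` only marks where a real unwind would happen. It also leaves open the window between releasing the probe and doing the actual lookup.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy non-recursive exclusive lock (hypothetical stand-in for an SRW
 * lock): a second acquire fails even on the thread that holds it. */
typedef struct {
    atomic_bool held;
} excl_lock;

static void excl_init(excl_lock *l)
{
    assert(l != NULL);
    atomic_init(&l->held, false);
}

/* Non-blocking acquire: returns true only if the lock was free. */
static bool excl_try_acquire(excl_lock *l)
{
    bool expected = false;
    return atomic_compare_exchange_strong(&l->held, &expected, true);
}

static void excl_release(excl_lock *l)
{
    atomic_store(&l->held, false);
}

/* Hypothetical guarded stack walk: returns 0 (skip) if the lock is
 * busy, because a blocking lookup would then hang; returns 1 where the
 * real unwind would run. */
static int guarded_stackwalk(excl_lock *table_lock)
{
    if (!excl_try_acquire(table_lock))
        return 0;                 /* lock busy: do not unwind */
    excl_release(table_lock);     /* probe only, give it back */
    return 1;                     /* placeholder for the unwind */
}
```

The model reproduces the situation described above: once the lock is taken, a nested call into the stack walker on the same thread sees it as busy and backs off instead of blocking.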
The issue seems to be somewhat Windows-10-specific, because the same capemon_x64.dll is able to start up and monitor the same version of 64-bit Winword on a 64-bit Windows 7 without above workarounds. My guess is that import table mechanics, at least regarding memory protections on them, have changed between Windows 7 and Windows 10. I have not analysed the differences in detail though.
Is my understanding of the mechanics at play correct?
Could my attempts at hooking RtlInsertInvertedFunctionTable or inspecting the state of LdrpInvertedFunctionTableSRWLock from our_stackwalk be made to work somehow?
Any ideas what else could be done about this issue?
Thanks!