Windows: Use GetFinalPathNameByHandle for Process.open_files #2190
base: master
I'll try to reproduce the CI failures on other Windows versions. |
Interesting. FYI, the precarious situation with
Just to make things clear, a proper solution would have to:
|
I have been looking into this as well. I normally do pure python but does the deadlock refer to the computer hanging for about half a second every time open_files() is used? It seems to me that NtQuerySystemInformation is what causes the hang even with this fixed code. |
I looked into this, including the upstream bug report, and it was closed by MS as "by design", the description "for certain type of handles", and one of the comments says to use
I'm also curious about this. Is it possible that
I will add back the thread wrapper. I removed it optimistically, but I'm sure there would be situations where even a "fixed" API could hang (slow/broken SMB connections, etc.). |
This is where the call never completes, and is unrecoverable. That's the issue described in #1967 and what is affecting me. If you're trying my fix, I prematurely removed the threading code, so that may re-introduce #340. I'm going to add the thread back but around the newer API and test that on older platforms. |
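The thread-wrapper idea discussed above can be illustrated with a small portable sketch. It is written in Python rather than the C used in psutil, and `query_with_timeout` and `slow_query` are hypothetical names made up for the illustration. Unlike native code, Python cannot terminate the worker, so a hung call is simply abandoned on a daemon thread.

```python
import threading
import time


def query_with_timeout(query, arg, timeout=0.1):
    """Run query(arg) on a worker thread; return (completed, result)."""
    box = {}

    def worker():
        box["result"] = query(arg)

    t = threading.Thread(target=worker, daemon=True)
    t.start()
    t.join(timeout)
    if t.is_alive():
        # The call did not finish in time; treat the handle as unqueryable.
        return False, None
    return True, box.get("result")


def slow_query(handle):
    # Hypothetical stand-in for a name query that hangs on some handles.
    if handle == "pipe":
        time.sleep(5)
    return "C:\\file-%s.txt" % handle
```

The caller only ever waits for `timeout` seconds per handle, which bounds the cost of a slow or broken query but does not recover a call that never completes.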
Yes, it is possible that psutil's use of GetFileType is the problem (or a problem). ProcessHacker doesn't use it (which is already a red flag). If I'm reading the code right, it uses NtQueryObject with ObjectTypesInformation instead, so I definitely think this is well worth a try. Also, there's an interesting comment in the PH source code:
// There are three ways of enumerating handles:
// * On Windows 8 and later, NtQueryInformationProcess with ProcessHandleInformation is the most efficient method.
// * On Windows XP and later, NtQuerySystemInformation with SystemExtendedHandleInformation.
// * Otherwise, NtQuerySystemInformation with SystemHandleInformation can be used.
We use NtQuerySystemInformation with SystemExtendedHandleInformation (second method), whereas ProcessHacker uses NtQueryInformationProcess with ProcessHandleInformation on Windows 8+ (first method). This will probably only make a difference in terms of performance, but it may also be worth a try. |
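As a rough illustration of the three enumeration methods in the quoted comment, the choice could be keyed on the NT version number. This is only a sketch: `pick_handle_enum_method` is a hypothetical helper, not code from psutil or Process Hacker, and it returns descriptive strings rather than calling any API.

```python
def pick_handle_enum_method(major, minor):
    """Map an NT version to the enumeration method named in the comment."""
    if (major, minor) >= (6, 2):  # Windows 8 is NT 6.2
        return "NtQueryInformationProcess(ProcessHandleInformation)"
    if (major, minor) >= (5, 1):  # Windows XP is NT 5.1
        return "NtQuerySystemInformation(SystemExtendedHandleInformation)"
    return "NtQuerySystemInformation(SystemHandleInformation)"
```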
Process Hacker uses the Native API instead of the Win32 API. GetFinalPathNameByHandle/Ex are both Win32 APIs exposing a limited number of information classes from the NtQueryInformationFile and NtQueryDirectoryFile Native APIs. GetFileInformationByHandleEx is used by Process Hacker but you need to search for NtQueryInformationFile 😉
GetFileType is another Win32 API and it's internally calling the NtQueryVolumeInformationFile Native API with the FileFsDeviceInformation information class. Process Hacker does use GetFileType but you need to search for NtQueryVolumeInformationFile and FileFsDeviceInformation 😉 We've never run into issues with GetFileType/NtQueryVolumeInformationFile(FileFsDeviceInformation) and it's always successfully returned the DeviceType for the handle without deadlocking unlike NtQueryObject and NtQueryInformationFile: https://github.com/winsiderss/systeminformer/blob/f17d0dc7c770fb2453bb604c39a43508bd692e07/SystemInformer/hndlprp.c#L1337-L1374
NtQueryInformationFile (GetFinalPathNameByHandle/Ex) deadlocks just like NtQueryObject does, especially on Windows 10/11, for named pipes (FILE_DEVICE_NAMED_PIPE), console handles (FILE_DEVICE_CONSOLE) and some other device types. You can see how we've had to use the same workaround for NtQueryObject with NtQueryInformationFile here:
Note: These days we're mostly avoiding issues with NtQueryObject by using our kernel driver to query the name, and all this code is only used as a fallback when the driver isn't available.
I tried debugging the psutil code and I think it's very likely the calls to If it's not those functions then another likely reason is because the For example: The LoadLibrary function uses worker threads when loading DLLs on Windows 8/10/11 to improve the performance of import table snaps. A single worker thread terminating because it couldn't access the process heap would leak the LdrpLoaderLock and explain the exited thread (worker factory) referencing the LdrpLoaderLock in that other github issue (and obfuscate the real issue which would be using heap functions while terminating the There are also some other issues with the psutil/psutil/arch/windows/process_handles.c Lines 84 to 147 in 2da9950
Compare Note how our function doesn't allocate memory for the string and doesn't do anything except call NtQueryObject and pass values directly into a structure? It has to be done this way to safely terminate the thread. The
Here's a breakdown of the changes required with examples:
This is what the
Now the
// Optionally check the thread exit code. It will only return 0 after NtQueryObject completes otherwise STILL_ACTIVE.
If we reach this point we've either made 8 attempts at querying the handle and still failed or we've succeeded and have a valid string.
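The bounded-retry shape described here (up to 8 attempts, then either a valid string or failure) might look roughly like the following. The original C snippets are not shown in this thread, so this is a Python sketch under assumptions: `try_query` is a hypothetical callable that returns a name on success or `None` when one attempt did not complete.

```python
MAX_ATTEMPTS = 8  # the attempt count mentioned in the comment above


def query_name_with_retries(try_query, handle, attempts=MAX_ATTEMPTS):
    """Call try_query(handle) until it yields a name or attempts run out.

    Returns (name, attempts_used); name is None if every attempt failed.
    """
    for attempt in range(1, attempts + 1):
        name = try_query(handle)
        if name is not None:
            return name, attempt
    return None, attempts
```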
We've never run into deadlocks with GetFileType/NtQueryVolumeInformationFile and you won't deadlock querying files or directories only when you query handles for devices like NPFS (FILE_DEVICE_NAMED_PIPE), ConDrv (FILE_DEVICE_CONSOLE) or VolMgrControl. Process Hacker is designed to show handles so we don't have the luxury of limiting the query. If psutil
Feel free to @ me if you get stuck fixing psutil_get_filename or need more info about what Process Hacker is doing (Btw Process Hacker was renamed System Informer so I'll be referring to SI instead of PH going forward) 😜
There are some other methods that I haven't looked into to avoid deadlocks like process reflection. These processes can't execute and we recently started using reflection to avoid some other types of deadlocks with running processes and querying handles via reflection might also work.
Let me know how you go with all the above wall of text spam and if you're still having issues then let me know and I'll see if our snapshotting/reflection stuff also avoids the deadlocks with NtQueryObject. -dmex |
Thanks for the amazing write up @dmex. It's an honor to meet you here. Needless to say, PH / SI inspired a lot of the psutil development on Windows, so I want to take this opportunity to finally say hi and thank you! I believe @wj32 even contributed some code to psutil many (>10) years ago.
I appreciate it. It's a lot of info to digest, but the overall suggestions appear clear.
That surprised me. PH is so etched into my brain that it will take me a while to get used to the new name. :) |
Yes, we only return files (paths). No handles. |
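Since only file paths are returned, one option suggested by the discussion is to skip handles whose device type is known to be risky before attempting any name query. The sketch below is an assumption about how such a filter could look, not psutil's code: `queryable_handles` is a hypothetical helper, the input is a list of `(handle, device_type)` pairs obtained elsewhere, and VolMgrControl is left out because its device type is not given in this thread. The constants are the standard `FILE_DEVICE_*` values from the Windows headers.

```python
FILE_DEVICE_DISK = 0x00000007
FILE_DEVICE_NAMED_PIPE = 0x00000011
FILE_DEVICE_CONSOLE = 0x00000050

# Device types reported above as prone to deadlocking on name queries.
RISKY_DEVICE_TYPES = {FILE_DEVICE_NAMED_PIPE, FILE_DEVICE_CONSOLE}


def queryable_handles(handles):
    """Return the handle values whose device type is not known to be risky."""
    return [h for h, dev in handles if dev not in RISKY_DEVICE_TYPES]
```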
d68583a to b060339
I've implemented the ideas from @dmex, but I still see deadlock. Although less often, it's still fairly easy to reproduce using the steps in #1967 (using 32-bit Python 3.7 on Windows Vista). I can't reproduce on 64-bit Python 3.10 Windows 11, but I couldn't using
|
I was able to resolve the deadlock issues with NtQueryObject by using the ReOpenFile function to clone/recreate the internal file_object referenced by handle. Try using this function to query the handles with psutil (It's not 100% complete but should be straightforward enough for testing):
|
Signed-off-by: Jesse Schwartzentruber <[email protected]>
Signed-off-by: Jesse Schwartzentruber <[email protected]>
b060339 to d37d72f
@dmex thanks again! I don't see deadlock either with your change. |
On further testing, this doesn't filter for pipes. I'm not seeing it hang/deadlock, but I do see On Vista I see |
Above is visible by running |
@jschwartzentruber That would be the expected behavior. Named pipes are created using NtCreateNamedPipeFile (npfs.sys) instead of NtCreateFile and are 'connected' by chaining each
This is why I asked if
Remember that DuplicateHandle continues referencing the existing FILE_OBJECT and NtQueryObject deadlocks because the If you need to query FILE_DEVICE_NAMED_PIPE or FILE_DEVICE_CONSOLE then ReOpenFile won't work because those file objects are 'extensions', meaning you'll still have to fall back to the original handle and a thread in those cases.
Can you enable loader snaps and find the DLL responsible for the lock?
The debugger will show the loader output. DebugView from Sysinternals can help if Python doesn't:
Looking at your changes here: 3eba2d3
The psutil_get_filename thread is doing absolutely nothing except calling NtQueryObject, just like we're doing... NtQueryObject simply does a syscall instruction and exits, so there's nothing there that can load a DLL onto that thread, and terminating it shouldn't be an issue. There has to be something else going on here, unrelated to psutil_get_filename, that is causing the problem? |
(I don't use psutil, I found this thread in Google) We had a similar problem in Perl, we needed a way to tell if a |
There's also a third option using the built-in Windows rundown support? Does not require a kernel driver, opening handles to processes or handle duplication so it's able to support protected processes that would otherwise require a driver. Rundown is included with all versions of Windows and can be used to enumerate handles, threads, images (dlls), processes, heap etc... Also supports stack traces (even stack traces from protected processes/objects/threads): |
Summary
Process.open_files() unsafe API usage can lead to deadlock #1967
Description
This uses GetFinalPathNameByHandle, which was added in Vista/Server 2008. This fixes #1967 based on the steps to reproduce there (only tested on Windows 11).
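For readers unfamiliar with the API, here is a minimal ctypes sketch of calling GetFinalPathNameByHandleW on a handle owned by the current process. It is not the C code from this PR: `final_path_from_handle` and `strip_extended_prefix` are hypothetical helper names, passing `0` for the flags is an assumption, and the Windows-only calls are kept inside the function so the module still imports elsewhere.

```python
def strip_extended_prefix(path):
    """Drop the \\\\?\\ prefix that the API puts on returned paths."""
    if path.startswith("\\\\?\\UNC\\"):
        return "\\\\" + path[8:]
    if path.startswith("\\\\?\\"):
        return path[4:]
    return path


def final_path_from_handle(handle):
    """Windows only: resolve a file handle to its path."""
    import ctypes
    from ctypes import wintypes

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    fn = kernel32.GetFinalPathNameByHandleW
    fn.argtypes = [wintypes.HANDLE, wintypes.LPWSTR,
                   wintypes.DWORD, wintypes.DWORD]
    fn.restype = wintypes.DWORD

    size = 260
    while True:
        buf = ctypes.create_unicode_buffer(size)
        n = fn(handle, buf, size, 0)
        if n == 0:
            raise ctypes.WinError(ctypes.get_last_error())
        if n < size:
            # Success: n is the length without the terminating null.
            return strip_extended_prefix(buf.value)
        # Buffer too small: n is the required size, so grow and retry.
        size = n + 1
```

On Windows, a handle for experimenting can be obtained from an open file object with `msvcrt.get_osfhandle(f.fileno())`. As the thread notes, this call can still block on some device types, so the sketch is only suitable for ordinary disk files.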