How to debug complete subsystem hang? #2859

Closed
mqudsi opened this issue Jan 20, 2018 · 15 comments

Comments

@mqudsi

mqudsi commented Jan 20, 2018

I've experienced this a number of times, but pretty sporadically, and I'm not sure how to go about getting information that would help debug this.

Sometimes after performing an action in the WSL environment, I end up with a completely deadlocked lxss where all existing (launched) WSL processes immediately become unresponsive and any new commands (even just bash) hang indefinitely.

I just experienced this on 16299.

@sunilmut
Member

@mqudsi - You will need WinDbg/kd for that. Once the system hangs, you will have to break into the debugger and see if there is a deadlock between lxcore processes (try the !stacks debugger extension).

@poizan42

poizan42 commented Jan 23, 2018

@sunilmut What are the "lxcore processes"? Right now on build 17074 I can see the following: The launcher process (wsl.exe), optionally distribution launcher (e.g. ubuntu.exe), the wsl host wslhost.exe and the LxssManager service running inside one of the svchost.exe instances. And then of course the wsl processes themselves, but you have blocked those from being opened with anything but PROCESS_QUERY_LIMITED_INFORMATION :( (what's the point in that anyways?)

@benhillis
Member

@poizan42 - He means ELF processes (/bin/bash, etc).

@poizan42

@benhillis, but you just get access denied if you try to open those in WinDbg. I actually tried using Process Hacker to launch WinDbg with System integrity and all 31 privileges activated, and it is still blocked, so seemingly they can't be opened from user mode at all for debugging.

@benhillis
Member

@poizan42 - If there's a deadlock it's going to be in kernel mode, not user mode so attaching a user mode debugger isn't going to be useful. The easiest way for us to debug subsystem hangs is by looking at a memory dump. It's likely this is #2849 for which we have a fix inbound.
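
For readers who end up with a memory dump and want a first look themselves, here is a minimal sketch of how a one-shot `kd` session could be assembled to print kernel stacks with `!stacks`. The dump path, the `lxcore` filter string, and the helper name `build_kd_stacks_command` are assumptions for illustration, not something the maintainers prescribed; the script only builds and prints the command line, it does not run the debugger.

```python
import subprocess


def build_kd_stacks_command(dump_path, filter_text="lxcore", kd_exe="kd.exe"):
    """Build an argv list for a one-shot kd session against a memory dump.

    -z opens the dump file and -c runs the given debugger commands at startup.
    "!stacks 2" asks for full kernel stacks; the trailing filter string
    (assumed here to be "lxcore") limits output to threads whose stack text
    contains it. "q" quits the debugger afterwards.
    """
    debugger_commands = f"!stacks 2 {filter_text}; q"
    return [kd_exe, "-z", dump_path, "-c", debugger_commands]


if __name__ == "__main__":
    # Hypothetical dump location; adjust to wherever your dump was written.
    argv = build_kd_stacks_command(r"C:\Windows\MEMORY.DMP")
    # Print a copy-pasteable command line instead of launching kd.
    print(subprocess.list2cmdline(argv))
```

Symbols need to be configured for the output to be readable, and the resulting stacks are only a starting point for spotting threads blocked on each other.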

@mqudsi
Author

mqudsi commented Jan 23, 2018

FWIW, it's unlikely this was the same issue: in this particular case it was the shutdown sequence for neovim that triggered the hang, and that shouldn't have referenced anything outside the WSL environment.

@benhillis
Member

@mqudsi - In that case if you could collect a memory dump and forward it along to [email protected] it would be greatly appreciated.

@poizan42

@benhillis That makes sense, but since you need to have kernel mode debugging enabled already, it won't help much if you randomly encounter a hang, unless you can reproduce it or happen to be running with kernel mode debugging enabled, which didn't sound like it was the situation for @mqudsi.

Actually Process Hacker can use its kernel driver to show the kernel mode stack which might be the best thing available in this case.

@mkarpoff

@mqudsi Did you find a fix for your freezing problem? I've started getting it today on build 17074.

@sunilmut
Member

@poizan42 - Yes, that's mostly correct. If you are encountering a hang when launching bash and it feels like a deadlock, then there are two options:

  1. Generate a full memory dump manually by following the steps here and send the dump over to [email protected]
    Make sure that the dump is set to full memory dump. A minidump will not be of much use here.
  2. If you are feeling adventurous and luckily have a Windows kernel debugger hooked up to the system, then you can break into the debugger and go from there.
    If there are any tools out there that give you the kernel mode stack for all the processes running on the system, then, yes, you can use those as well.
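
To double-check the "full memory dump" requirement from option 1 before a hang happens, a small sketch like the following could read the configured dump type. It assumes the setting lives in the `CrashDumpEnabled` value under `HKLM\SYSTEM\CurrentControlSet\Control\CrashControl`; the helper names are made up for this example, and the registry read only works on Windows.

```python
import sys

# Meaning of the CrashDumpEnabled registry value (assumed mapping).
DUMP_TYPES = {
    0: "none",
    1: "complete (full) memory dump",
    2: "kernel memory dump",
    3: "small memory dump (minidump)",
    7: "automatic memory dump",
}


def describe_dump_setting(value):
    """Return a human-readable name for a CrashDumpEnabled value."""
    return DUMP_TYPES.get(value, f"unknown ({value})")


def is_full_dump(value):
    """True only when the system is set to write a complete memory dump."""
    return value == 1


def read_crash_dump_enabled():
    """Read CrashDumpEnabled from the registry (Windows only)."""
    import winreg  # imported lazily so the helpers above work anywhere

    key_path = r"SYSTEM\CurrentControlSet\Control\CrashControl"
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, key_path) as key:
        value, _value_type = winreg.QueryValueEx(key, "CrashDumpEnabled")
    return value


if __name__ == "__main__":
    if sys.platform == "win32":
        current = read_crash_dump_enabled()
        print("Configured dump type:", describe_dump_setting(current))
        if not is_full_dump(current):
            print("Not a full memory dump; change it before reproducing the hang.")
    else:
        print("Registry check skipped: not running on Windows.")
```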

@fpqc

fpqc commented Jan 27, 2018

@sunilmut Do you guys at MS run with live kernel debuggers, or do you generally generate and then debug crashes?

If you do run with live kernel debugging machines attached, I'm wondering if you guys literally frankenstein together two PCs or if you have special hardware.

@mqudsi
Author

mqudsi commented Jan 27, 2018

@fpqc It used to be so hard, but these days a bidirectional USB 3.0 A-to-A cable is all it takes.

@fpqc

fpqc commented Jan 27, 2018

@mqudsi Neat! Is local kdb suitable for debugging something like WSL? If not, what about offline debugging with the LiveKD tool? It looks like these features were added recently: https://docs.microsoft.com/en-us/windows-hardware/drivers/debugger/performing-local-kernel-debugging

@mqudsi
Author

mqudsi commented Jan 27, 2018

They're actually old features, but the article was recently updated. So long as your PC isn't crashed (BSoD/GSoD) or totally hung, local kdb is fine.
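
As a rough illustration of what local kernel debugging setup involves, the sketch below lists the commands in order, assuming the usual route of `bcdedit /debug on`, a reboot, and then `kd -kl` from an elevated prompt. The function name is hypothetical and nothing is executed; the commands are only printed so they can be reviewed first. Since debugging must be enabled at boot, this has to be done before the hang occurs.

```python
import subprocess


def local_kernel_debug_steps(debugger="kd.exe"):
    """Return the command sequence for a local kernel debugging session.

    Each entry is an argv list. All of them are meant to be run from an
    elevated prompt; none are executed by this script.
    """
    return [
        # Turn on kernel debugging in the boot configuration.
        ["bcdedit", "/debug", "on"],
        # The setting takes effect after a restart.
        ["shutdown", "/r", "/t", "0"],
        # After the reboot, start the debugger against the local kernel.
        [debugger, "-kl"],
    ]


if __name__ == "__main__":
    for step in local_kernel_debug_steps():
        print(subprocess.list2cmdline(step))
```

Passing `debugger="windbg.exe"` gives the equivalent WinDbg invocation.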

@benhillis
Member

A typical setup is a dev box and a test machine or test VM with a kernel debugger attached. Personally I use one physical machine and a couple VMs with different memory and virtual processor counts.
