WSL is unkillable without rebooting (permissions/security problems for admins) #1086

Closed
fpqc opened this issue Sep 15, 2016 · 69 comments

@fpqc

fpqc commented Sep 15, 2016

In my previous issue, I gave a way to turn WSL into a zombie process that is unkillable.

I continued testing ways to kill it, and I think it deserves its own issue.

Once the service hangs, the only way to kill it is to reboot. I tried launching a command prompt as NT Authority\System with PsExec from the Sysinternals tools, and this happened:

[Screenshot: access denied]

Not sure how access is being denied to NT Authority\System, which I thought had no restrictions in Ring 0. Maybe this means it's being locked down by the Windows 10 hypervisor? (I have Hyper-V installed, so maybe it's a hypervisor-level lockout using VSM? But the Process Explorer screenshot below says it's not using virtualization, so what is securing it?!)

It's running with such a high level of security that even Process Explorer running as admin cannot remove its protections:

[Screenshot: Process Explorer, access denied]

@stehufntdev
Collaborator

Thanks for reporting the issue. The service is a "locked down" process since it runs as a protected process (light) so access is limited from other non-protected processes - https://msdn.microsoft.com/en-us/library/windows/desktop/dn313124(v=vs.85).aspx. The service hang reported in the other issue is a bug we should fix.

@fpqc
Author

fpqc commented Sep 15, 2016

@stehufntdev So is there a way for an admin to manually kill this kind of protected process without rebooting (like a protected-mode task killer? I have no idea) or is that kind of thing only available running in a less secure mode used for debugging (like SELinux permissive mode)?

Also, even once you fix the underlying bug in #1085, it seems reasonable to want to kill the Linux instance from the Windows side if it is malfunctioning. Could you make stopping the service kill the Linux instance more aggressively (for example, by setting a timeout after which it force-kills the instance)?

Also, I understand why you wouldn't want to allow random code injection into the service process, but it's not an antimalware program, so it seems like protecting it from being force-killed is a bit of overkill.

@therealkenc
Collaborator

therealkenc commented Sep 16, 2016

I just about asked this same question last week when I managed to hose WSL a bunch of times. Unfortunately I don't have a useful repro[1], which is why I didn't bring it up. In my case the process couldn't be killed, and if I closed the shell and tried to open another, it would hang with no prompt. Reboot was the only way out. It would be nice to have a lxrun.exe /nukeinstance.

[1] Run gdb on a huge out-of-scope project with lots of threads and mutexes that call lots of unimplemented surface under memory pressure while reading and writing to files, pipes, and sockets.

@fpqc
Author

fpqc commented Sep 16, 2016

@therealkenc Well luckily I spent an hour and found a pretty minimal configuration on Trusty to get a repro. Hopefully they will fix both the underlying bug as well as provide the nukeinstance interface (or add a nukeinstance timer to the session manager service on manual stop).

I also had a problem trying to do an scp of a large file inside of tmux, but I did it with the messed up zsh/tmux, so who knows what caused the bug.

@dmex

dmex commented Sep 24, 2016

@stehufntdev

Can you please share some information about why lxss even needs to run as a protected process? This would be very useful information since I may need to change the way our kernel-mode 'anti-rootkit' feature (included in Process Hacker) is presently able to terminate the LxssManager service and WSL processes.

@fpqc

You can do one of two things to terminate the LxssManager service:

  1. Open regedit and navigate to the following key:
    HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\LxssManager
  2. Delete the LaunchProtected value.
  3. Reboot.

That will enable full access to the LxssManager service (including termination rights) but you're likely going to open up a security hole by doing so.
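A minimal Python sketch of the registry steps above, assuming the value holds one of the `SERVICE_LAUNCH_PROTECTED_*` constants documented in the Windows SDK. The helper names are hypothetical; the functions only build `reg.exe` command lines, so nothing changes until you run them from an elevated prompt, and the reboot in step 3 is still required.

```python
"""Hypothetical helpers for inspecting/removing LxssManager's LaunchProtected value."""

SERVICE_KEY = r"HKLM\SYSTEM\CurrentControlSet\Services\LxssManager"

# SERVICE_LAUNCH_PROTECTED_* values from the Windows SDK (winsvc.h).
LAUNCH_PROTECTED_NAMES = {
    0: "SERVICE_LAUNCH_PROTECTED_NONE",
    1: "SERVICE_LAUNCH_PROTECTED_WINDOWS",
    2: "SERVICE_LAUNCH_PROTECTED_WINDOWS_LIGHT",
    3: "SERVICE_LAUNCH_PROTECTED_ANTIMALWARE_LIGHT",
}


def describe_launch_protected(value: int) -> str:
    """Map a raw LaunchProtected DWORD to its constant name."""
    return LAUNCH_PROTECTED_NAMES.get(value, f"UNKNOWN({value})")


def reg_query_command() -> list:
    """reg.exe argv that prints the current LaunchProtected value."""
    return ["reg", "query", SERVICE_KEY, "/v", "LaunchProtected"]


def reg_delete_command() -> list:
    """reg.exe argv for step 2; /f suppresses the confirmation prompt."""
    return ["reg", "delete", SERVICE_KEY, "/v", "LaunchProtected", "/f"]
```

On Windows you could pass these to `subprocess.run(..., check=True)`; the same security caveat applies as for doing it by hand in regedit.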

Alternatively, Process Hacker is able to terminate 'protected' processes such as WSL:
https://github.com/processhacker2/processhacker2/releases
https://wj32.org/processhacker/

  1. Elevate Process Hacker with administrative rights.
  2. Right-click the process or service and click Terminate.

@fpqc
Author

fpqc commented Sep 24, 2016

@dmex How is process hacker doing that? An exploit? If process hacker can do this, mustn't it be using a local privilege escalation? What would stop malware doing the same?

Edit: Oh, I guess only if you are running with the kernel mode driver?

Edit2: I guess your kernelmode driver is signed, but what is stopping third party malware from using your signed driver to circumvent security features like this one?

@dmex

dmex commented Sep 24, 2016

@fpqc

Process Hacker simply calls ZwTerminateProcess from kernel mode using a kernel driver. That function allows 'protected' processes to be terminated (when called from kernel mode) and is officially documented and supported by Microsoft. Feel free to review our source code; you'll find calls to that function are very well protected :)

The major problem is that anti-virus/malware software is unable to defend against zero-day attacks (or the majority of malicious executables compiled within the last month or so) and vendors usually take up to several weeks (or even months) to create signatures and remove malicious software from your machine - instead Process Hacker allows you to terminate and remove rootkits and malware yourself.

Rootkits and malware can obviously use the same function if they create a kernel driver and obtain an EV certificate to sign it, but EV certificates are expensive, require a lot of legal documentation to be granted, and are easily revoked.

The serious question is why LXSS needs to run as a protected process... If something executing inside Bash manages to exploit WSL, it would be unkillable by everything except Process Hacker (for at least several weeks, until most antivirus software had created signatures and started blocking the malicious code).

Hopefully @stehufntdev or someone can share some details about why this protection level is needed, so I can decide whether I need to change Process Hacker to prevent LxssManager from being terminated (if it's just to protect calls to/from lxcore.sys, then I can suggest a much better alternative to protection levels!).

@fpqc
Author

fpqc commented Sep 24, 2016

@dmex Yeah, I thought about it a bit more, and it doesn't seem to matter that much, because to use Process Hacker's kill functionality you already need Ring 3 admin, in which case you can probably find some other way to obtain persistence. The point, I guess, is that unless the driver itself is vulnerable, it probably won't allow arbitrary code execution in supervisor mode even from Ring 3 adminland. And I assume that even from Ring 3 adminland you can't disable Secure Boot without exploiting the firmware, in which case certificates will be enforced and unsigned or self-signed rootkits can't be installed with only Ring 3 admin (unless you find a bug that lets you jump from admin into the kernel in the first place).

Makes sense, and I guess I trust process hacker to install drivers into kernel mode as much as I trust NVidia, lol.

@stehufntdev
Collaborator

stehufntdev commented Sep 26, 2016

For now, we only want Windows signed code accessing the lxss device driver. This is accomplished today by marking the lxss device driver ACL as local system Windows PPL, and running the lxssmanager service as local system Windows PPL service.
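To make "Windows PPL" concrete, here is a toy Python model of protection-level dominance. It assumes "local system Windows PPL" means protection type ProtectedLight with the Windows signer, and it assumes both axes compare as plain ordered numbers; the real kernel check is table-driven, and the model ignores kernel-mode callers such as the driver dmex describes above. Names like `dominates` are hypothetical.

```python
from enum import IntEnum
from typing import NamedTuple


class ProtType(IntEnum):
    """PS_PROTECTED_TYPE values."""
    NONE = 0
    PROTECTED_LIGHT = 1  # PPL
    PROTECTED = 2


class Signer(IntEnum):
    """Subset of PS_PROTECTED_SIGNER values, lowest to highest."""
    NONE = 0
    AUTHENTICODE = 1
    CODEGEN = 2
    ANTIMALWARE = 3
    LSA = 4
    WINDOWS = 5
    WINTCB = 6


class Protection(NamedTuple):
    kind: ProtType
    signer: Signer


def dominates(caller: Protection, target: Protection) -> bool:
    """Toy rule: full access only if the caller is at least as protected
    on both axes (the real kernel check is table-driven)."""
    return caller.kind >= target.kind and caller.signer >= target.signer


UNPROTECTED_ADMIN = Protection(ProtType.NONE, Signer.NONE)
LXSS_MANAGER = Protection(ProtType.PROTECTED_LIGHT, Signer.WINDOWS)  # "Windows PPL"
```

In this model `dominates(UNPROTECTED_ADMIN, LXSS_MANAGER)` is False, which matches what the thread observes: an elevated but unprotected tool such as Process Explorer cannot obtain full (terminate) access, regardless of running as admin or SYSTEM.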

@fpqc
Author

fpqc commented Sep 26, 2016

@stehufntdev Is that also enforced at the COM interface on the service? Is that why I'm getting an immediate crash with @ionescu007's lxlaunch (because bash.exe is a signed Windows component, it's allowed to connect to the service, but other binaries may not be)?

Or is it just something weird happening because of something I screwed up while compiling it?

@benhillis
Member

benhillis commented Sep 26, 2016

@fpqc - Non-Windows-signed binaries should be able to connect to LxssManager (just not the driver). We are still refining the COM interface, which is a large part of the reason we have not yet documented it.

@fpqc
Author

fpqc commented Sep 26, 2016

@benhillis He just rewrote the relevant part of the program 3 days ago, I'm pretty sure it's up-to-date. It actually does a check to see if the build is >=15000, bizarrely enough (maybe someone over at MS is giving him nightly builds? No idea).

@ionescu007

The build number check is simply a bug on my end. It was meant to be 14500. The code does not crash for me, so I will have to debug what's wrong with it. The COM interface changed to allow the creation of an unnamed IPC channel with the launcher.

Also, your questions on Protected Processes & etc, and why MS locked down the driver this way (I asked them to) are explained in the BlackHat presentation.

@ionescu007

@benhillis You're shipping an IDL file for Lxss? ;-)

@benhillis
Member

benhillis commented Sep 26, 2016

@ionescu007 we plan on documenting the COM interfaces at some point, but we want to make sure we don't do that while they are still in flux.

@ionescu007

That's actually really cool. And yes, the proper thing is to update the IID Version when you add a parameter ;-) Understandable not to have done it for the 'beta' I guess.

No source/privates used in my research, sadly (except ole32/combase which are on the symbol server) -- too much of an NDA risk. Public symbols + the intense debug output is enough :)

@benhillis
Member

Turns out kernel developers aren't the best at following COM best practices :)

@dmex

dmex commented Sep 29, 2016

@stehufntdev

It might be an idea to add a /kill parameter to lxrun or somewhere (as @therealkenc suggested) that sends an ioctl to lxcore and terminates all running pico processes? It should be possible to terminate/recover your session without having to reboot (or use 3rd party software).

@ionescu007

I know you asked for PPL ;) but I was hoping ntdev might consider some changes that allow non-critical PPL processes (such as LxssManager) to be terminated using Task Manager and other software. If Lxss were somehow compromised or became a runaway process, wouldn't the ability to terminate it improve both reliability and security?

@ionescu007

The COM interface does have a Stop/Terminate command that sends the IOCTL.
I could add it to my tools :)


@stehufntdev
Collaborator

Stopping the service through the SCM ends up doing the same thing as the COM interface's stop/terminate command.

In the hang caused by zsh, a thread was deadlocked in the kernel (in a non-alertable wait), so it wouldn't have been possible to kill it without a restart. All WSL bugs aside, stopping the service is the right way to terminate the running instance.
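Since a deadlocked driver means the stop request may never complete, it can help to stop the service with a deadline instead of waiting indefinitely. A minimal Python sketch, assuming `sc stop` / `sc query` output in the usual `STATE : <code>  <NAME>` form; the helper names and the 30-second default are hypothetical, and the `run` parameter exists only so the logic can be exercised without Windows.

```python
import re
import subprocess
import time
from typing import Callable, Optional

# "STATE              : 3  STOP_PENDING" -> captures the code and the name.
STATE_RE = re.compile(r"STATE\s*:\s*(\d+)\s+(\w+)")


def parse_state(sc_query_output: str) -> Optional[str]:
    """Return the state name (e.g. 'STOPPED') from `sc query` output."""
    match = STATE_RE.search(sc_query_output)
    return match.group(2) if match else None


def run_sc(*args: str) -> str:
    """Run sc.exe (Windows only) and return its stdout."""
    return subprocess.run(["sc", *args], capture_output=True, text=True).stdout


def stop_with_deadline(service: str, timeout_s: float = 30.0,
                       poll_s: float = 1.0,
                       run: Callable[..., str] = run_sc) -> bool:
    """Ask the SCM to stop `service`; True if STOPPED appears before the deadline.

    If the driver is deadlocked, the service sits in STOP_PENDING (and further
    `sc stop` calls fail with error 1061); this only avoids waiting forever.
    """
    run("stop", service)
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if parse_state(run("query", service)) == "STOPPED":
            return True
        time.sleep(poll_s)
    return False
```

A False result from `stop_with_deadline("LxssManager")` is a signal that the instance is wedged in the kernel, which per the explanation above means a restart.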

@dmex

dmex commented Sep 29, 2016

@ionescu007

Thanks! Please add that into the next version ❤️

@gdh1995

gdh1995 commented Oct 31, 2016

I could not stop the LxssManager service: it also hung when I tried to restart it from Task Manager's Services tab, and then sc stop LxssManager reported Error 1061: The service cannot accept control messages at this time.

Env: Windows 10 ver 1607 (build 14955.1000)

@flrngel

flrngel commented Dec 7, 2016

@gdh1995 In my case, I booted into Safe Mode and deleted the most recent files from C:\Users\<My Account>\AppData\Local\lxss, which seemed to be causing the deadlock.

@emberquill

I somehow managed to freeze the entire WSL by running a mv command in bash. Just renaming a folder which itself contains a single non-critical file. Could not kill the mv process by any means, nor could I kill LxssManager even with the Process Explorer method mentioned above.

@fpqc
Author

fpqc commented Jul 8, 2017

@emberquill let me guess you're on Windows 10 1607 right?

@fpqc
Author

fpqc commented Aug 24, 2017

quietly closing this up @benhillis . Should be marked "bydesign" by merit of it being a kernel driver.

@fpqc fpqc closed this as completed Aug 24, 2017
@jefferai

FWIW, I realize the discussion went elsewhere, but going back to the OP on this issue, I just encountered this. I ran a normal session, then hit Ctrl-D to log out. I got "logout" in the terminal, and it hung. Ctrl-C does nothing, attempts to kill it do nothing, and of course it doesn't show up in Task Manager.

I tried stopping LxssManager, which timed out, and now its status is stuck at "Stopping".

@fpqc
Author

fpqc commented Sep 13, 2017

@jefferai This is gonna happen with any deadlock in the driver. Some of the documentation has instructions on enabling a system memory dump and manually crashing Windows to generate it if you get into a deadlocked state like this. Memory will be written out to disk, and you can host the dump on OneDrive and submit it to [email protected] attn: Ben Hillis or Sunil Muthuswamy

Or, if you have a reliable repro, you can submit it as a new bug report.

The problem is that if the driver deadlocks, restarting the service sends an ioctl to the kernel to stop the driver, but the driver is deadlocked so it just sorta hangs.
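For reference, Microsoft documents a "crash from the keyboard" feature that fits the procedure described above. Below is a Python sketch that only builds the `reg.exe` commands. The value names (`CrashDumpEnabled`, `CrashOnCtrlScroll`) and keys come from that documentation; the helper names, and choosing a kernel dump by default, are assumptions rather than guidance from the WSL team. After a reboot, holding the right Ctrl key and pressing Scroll Lock twice bugchecks the machine and writes the dump (by default to %SystemRoot%\MEMORY.DMP).

```python
CRASH_CONTROL = r"HKLM\SYSTEM\CurrentControlSet\Control\CrashControl"

# CrashOnCtrlScroll lives under the keyboard driver's Parameters key.
KEYBOARD_PARAMS = {
    "usb": r"HKLM\SYSTEM\CurrentControlSet\Services\kbdhid\Parameters",
    "ps2": r"HKLM\SYSTEM\CurrentControlSet\Services\i8042prt\Parameters",
}

# CrashDumpEnabled values.
DUMP_TYPES = {"none": 0, "complete": 1, "kernel": 2, "small": 3, "automatic": 7}


def reg_add_dword(key: str, name: str, value: int) -> list:
    """reg.exe argv that creates or overwrites a REG_DWORD value."""
    return ["reg", "add", key, "/v", name, "/t", "REG_DWORD", "/d", str(value), "/f"]


def crash_dump_setup(keyboard: str = "usb", dump: str = "kernel") -> list:
    """Commands to enable a memory dump and the Ctrl+Scroll Lock crash."""
    return [
        reg_add_dword(CRASH_CONTROL, "CrashDumpEnabled", DUMP_TYPES[dump]),
        reg_add_dword(KEYBOARD_PARAMS[keyboard], "CrashOnCtrlScroll", 1),
    ]
```

Each returned argv would be run from an elevated prompt before the reboot; pick `"ps2"` for a PS/2 keyboard.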

@jefferai

@fpqc any chance you can point me to those docs? Happy to submit the dump if so.

@mewbow1

mewbow1 commented Nov 4, 2019

I have found a solution, but it requires third-party software.
Download KProcessHacker3 from https://processhacker.sourceforge.io/
Note: Antivirus software may flag it as a virus because it can be used to modify or kill system processes. Please disable your antivirus software before proceeding.

Step 1: Right-click ProcessHacker.exe and select Run as administrator.

Step 2: Go to the Services tab, search for LxssManager, right-click it and select Go to process.

It will automatically highlight the svchost.exe process associated with LxssManager.

Step 3: Since it is already selected, just press the Del key on your keyboard to terminate the process, and click Terminate to confirm.

Step 4: Go to the Services tab again, search for LxssManager, right-click it and click Start to start the LxssManager service.

Step 5: Since KProcessHacker3 creates a SYSTEM-level service to manage and kill system services, open an Administrator Command Prompt and run:
sc stop kprocesshacker3
Before re-enabling your antivirus, delete the KProcessHacker folder from your Downloads folder.

Now enjoy WSL without requiring a system reboot.

@edemen

edemen commented Dec 10, 2019

Looks like even KProcessHacker won't help if you have already initiated a service stop. It just brings up an error about trying to access a process that is already shutting down.

@illtellyoulater

Can't believe this hasn't been fixed after 4 years...

@LoganDark

@mewbow1 Nope.


Running Process Hacker as administrator.

@dmex

dmex commented Mar 15, 2020

> Running Process Hacker as administrator.

Try using the nightly builds of Process Hacker - the v2.39 release doesn't support Windows 10 @LoganDark

@LoganDark

LoganDark commented Mar 15, 2020

@dmex When running as administrator, it says it can't load the kernel module because access is denied.


When not running as administrator it didn't. So being the idiot that I am, I tried to start up bash and then tested if PH could terminate it.

It couldn't.

Now I have to reboot again, and I don't say that lightly.


@dmex

dmex commented Mar 16, 2020

> it can't load the kernel module because access is denied.

The kernel driver is required for WSL process termination. That access_denied error can only occur when Antivirus software and/or malicious software has blocked the driver from loading. You will need to add exceptions to your security software or remove the malicious software to be able to use the driver.

@LoganDark

> The kernel driver is required for WSL process termination.

Which is why I included it in my post.

> That access_denied error can only occur when Antivirus software and/or malicious software has blocked the driver from loading. You will need to add exceptions to your security software or remove the malicious software to be able to use the driver.

It was my antivirus, thanks. I have it on silent to stop ad popups (shitty I know), but when I turned off silent mode, it made a popup about the kernel module. Should be able to add an exception. I am not going to test this again till later.

@iamdevlinph

Just encountered this recently. When I boot up my PC, WSL doesn't launch properly, and when I try to stop it through services.msc, it gets stuck on STOPPING.

And so far the only solution that I found was to reboot.

Windows version: Version 1909 (OS Build 18363.720)

@Kagami

Kagami commented May 9, 2020

Happens to me every day. It's probably related somehow to the EACCES issue in WSL1, because the process that can't be killed is node and seems to be a file watcher. Those unkillable processes probably appear when I remove folders while VS Code Remote is running.

@mewbow1 thank you so much, this actually helped. (I used the current 2.39 release of Process Hacker.)

Killing svchost didn't stop LxssManager, though, so I clicked Restart; Process Hacker became unresponsive, so I forcibly closed it and reopened it, and this time LxssManager showed as stopped. Then I started the service and the Ubuntu terminal, and the unkillable processes were gone. Yay, no more reboots.

> Looks like even KProcessHacker won't help if you already have initiated the service stopping

I didn't try to stop the service via Task Manager this time, maybe that's why it worked.

@Kagami

Kagami commented May 9, 2020

Unfortunately this led to another problem: I couldn't rename a folder that was probably held open by that process. Searching for a handle didn't show anything. At the same time, I could restart the LxssManager service perfectly fine. So it has to be some issue with WSL/NTFS interop...

@CherryDT

CherryDT commented Nov 7, 2020

I have the same issue recently. I'm also using VS Code Remote with WSL and I also had mysterious issues with renaming or removing folders while it's open (or running npm install).

Furthermore, a reproducible way for me to cause a hung WSL is running Jekyll with the watch option for a while and changing some files. It won't take more than a few minutes to get everything stuck.

I'm on 18363.1139. Is this fixed in newer versions? The problem is just that for many months my update screen says that the 2004 version is "being prepared" and I will get notified when it's available for my machine... (still waiting to get that notification...)

@Ronak-59

Still not fixed after 5 years. All commands hang and LxssManager is permanently in the "Stop Pending" state even after Reboot. Spent 6+ hours trying different suggestions mentioned in this thread and none working. This needs to be fixed ASAP though.

@CherryDT

CherryDT commented Jan 17, 2021

> LxssManager is permanently in the "Stop Pending" state even after Reboot.

That doesn't sound possible. Are you sure you rebooted and didn't just turn it "off" and on again (possibly not performing an actual restart but, for example, hibernating)?

@Ronak-59

@CherryDT I'm sure I "rebooted". I tried all possible ways I found on internet including Registry Edit. It's just not working and seems constantly in "Stop Pending" state. No command works except sc query LxssManager

Elevated CMD and elevated PowerShell don't help either.

@SalehAce1

Will this ever actually be fixed or are we supposed to reboot our system every time LxssManager fails forever?

@iamdevlinph

I'm at Version 20H2 (OS Build 19042.985) and haven't encountered this issue anymore.

But yeah, as far as I know, the only solution is to reboot.

@Pyker

Pyker commented May 27, 2021

I can confirm this still happens in 20H2 (19042.985). I'm not sure if it's related to hibernation or just WSL being unused for a while.

@cristianuibar

Just happened to me on 21H1, so it's still not fixed.

I believe Bitdefender AV caused this glitch because of a PHP script virus it detected during a Git clone operation.

@MysteryMS

This just happened to me on the insider builds of Windows 11.
Thanks to this thread I could find a way to fix this - ending the process related to the service (used Process Hacker tho but I'll try with normal task manager next time it happens)

> I can confirm this still happens in 20H2 (19042.985). I'm not sure if it's related to hibernation or just WSL being unused for a while.

I can relate: I'm having this issue when hibernating my PC, too.

@abdennour

It also happens on my corporate laptop.

PS C:\> Restart-Service -Name "LxssManager"
WARNING: Waiting for service 'LxssManager (LxssManager)' to stop...
WARNING: Waiting for service 'LxssManager (LxssManager)' to stop...
WARNING: Waiting for service 'LxssManager (LxssManager)' to stop...
WARNING: Waiting for service 'LxssManager (LxssManager)' to stop...

@abdennour

I used the taskkill command (https://www.youtube.com/watch?v=e7IsO51eTYw).
Then I ran into another issue and ended up at this point (https://stackoverflow.com/a/19341022/747579).
In the end, the machine rebooted because I killed the process with PID 80;
after the reboot, things worked.

@mcxiv

mcxiv commented Apr 3, 2023

Did someone find a solution after all these years?

@hanzlahabib

Can't believe this hasn't been fixed after 7 years...

@FinalFortune

FinalFortune commented Oct 2, 2023

Install linux as main OS, easiest fix. No need to use windows anymore :).

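Here's a nice run-once PowerShell script to restart it:

# Find the PID of the svchost.exe instance hosting LxssManager (second column of tasklist /svc)
$svcPid = (tasklist /svc /fi "imagename eq svchost.exe" | findstr LxssManager | ForEach-Object { $_ -split '\s+' })[1]
Stop-Process -Id $svcPid -Force
Start-Service LxssManager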
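The one-liner above takes field [1] of whichever line mentions LxssManager, which would break if the service name landed on a wrapped continuation line of `tasklist /svc` output. Below is a Python sketch of a more defensive lookup; the function name and the sample layout it assumes are illustrative. As CherryDT points out, killing the process does not help once the driver itself is deadlocked.

```python
import re
from typing import Optional

# A data row: image name (may contain spaces), PID, then the services column.
ROW_RE = re.compile(r"^(.+?)\s+(\d+)\s+(.*)$")


def find_service_pid(tasklist_svc_output: str, service: str) -> Optional[int]:
    """Return the PID whose services column lists `service`, else None.

    `tasklist /svc` wraps long service lists onto indented continuation
    lines with no image name or PID, so the last PID seen is carried forward.
    """
    current_pid: Optional[int] = None
    wanted = service.lower()
    for line in tasklist_svc_output.splitlines():
        if line[:1].isspace():
            if current_pid is None:
                continue
            services = line  # continuation of the previous row
        else:
            row = ROW_RE.match(line)
            if row is None:
                continue  # header, separator or blank line
            current_pid = int(row.group(2))
            services = row.group(3)
        names = (name.strip().lower() for name in services.split(","))
        if wanted in names:
            return current_pid
    return None
```

On Windows you could feed it `subprocess.run(["tasklist", "/svc", "/fi", "imagename eq svchost.exe"], capture_output=True, text=True).stdout` and pass the resulting PID to `taskkill /F /PID <pid>`.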

@CherryDT

CherryDT commented Oct 2, 2023

@FinalFortune This doesn't work, as mentioned it is unkillable until rebooting. Doing what you wrote will get the process stuck in a "terminating" state. It stays and when you try a second time you get the error that the process is already being terminated so you can't terminate it again. From a service perspective it will stay stuck "stopping" so start-service will also fail because the service is in an invalid state for starting it.

@FinalFortune

> @FinalFortune This doesn't work, as mentioned it is unkillable until rebooting. Doing what you wrote will get the process stuck in a "terminating" state. It stays and when you try a second time you get the error that the process is already being terminated so you can't terminate it again. From a service perspective it will stay stuck "stopping" so start-service will also fail because the service is in an invalid state for starting it.

Well, I'm fairly certain my WSL had broken after hibernation. It wasn't terminating, so I ran the above command, which was derived from the Stack Overflow solution, and it worked. However, I'm not sure it was the exact same state mentioned earlier in this issue; I will confirm next time with the appropriate commands.

This issue has been a major factor in making Docker a colossal pain in the ass to use on Windows. I come from the C# side, so using Docker from WSL was a natural progression, and plain Hyper-V Docker was far slower. However, gorging myself on a bed of blades may have been a more enjoyable experience.

@pabbasi

pabbasi commented Aug 8, 2024

> @fpqc quietly closing this up @benhillis . Should be marked "bydesign" by merit of it being a kernel driver.

Since this is 'by design', has an official workaround to this issue also been designed? 😆

wsl --shutdown hangs indefinitely every time, and restarting windows gets old rather quickly.

@mtrin

mtrin commented Aug 20, 2024

I am having a much better time after creating a PowerShell script that runs wsl --shutdown, waits, and then hibernates the computer. Just make sure to close VS Code before doing it.
Since then, I haven't had an unkillable WSL.
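The script itself isn't posted, so here is a rough Python sketch of the same idea, assuming `wsl --shutdown` followed by `shutdown /h` (the standard hibernate command). The timeout, the settle delay, and the injectable `run` parameter are hypothetical additions: the timeout keeps a hung `wsl --shutdown` from blocking forever, and `run` lets the logic be tested without Windows.

```python
import subprocess
import time
from typing import Callable, Sequence

Runner = Callable[[Sequence[str], float], int]


def default_runner(argv: Sequence[str], timeout_s: float) -> int:
    """Run a command and return its exit code.

    On timeout, subprocess.run kills the child and re-raises TimeoutExpired.
    """
    return subprocess.run(list(argv), timeout=timeout_s).returncode


def shutdown_wsl_then_hibernate(timeout_s: float = 60.0, settle_s: float = 5.0,
                                run: Runner = default_runner) -> str:
    """Stop all WSL instances, wait briefly, then hibernate."""
    try:
        code = run(["wsl", "--shutdown"], timeout_s)
    except subprocess.TimeoutExpired:
        # Only wsl.exe is killed here; a wedged LxssManager stays wedged.
        return "wsl --shutdown timed out; not hibernating"
    if code != 0:
        return f"wsl --shutdown failed with exit code {code}"
    time.sleep(settle_s)  # let WSL finish winding down before hibernating
    run(["shutdown", "/h"], timeout_s)
    return "hibernating"
```

Refusing to hibernate when the shutdown times out is a deliberate choice: if `wsl --shutdown` is already stuck, the thread suggests a full reboot is the only way out anyway.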
