[Security] SSHD locks and fills hard drive when scanned by nmap (DOS Attack) #787

Closed
PillowFish opened this issue Jun 26, 2017 · 27 comments
PillowFish commented Jun 26, 2017

"OpenSSH for Windows" version
v0.0.16.0
OpenSSH_7.5, LibreSSL 2.5.3

Server OperatingSystem
Windows 7 Professional

Client OperatingSystem
Ubuntu 14.04 (running nmap 7.50)

What is failing
The SSHD process gets stuck in an infinite loop filling the sshd.log.

Expected output
The expectation is that the service will recover from an nmap scan and continue normal operations.

Actual output
SSHD becomes unresponsive and the sshd.log file fills at an alarming rate (gigabytes within the hour). The log contains the same line over and over:
"33552 13:10:16:181 error: accept: Connection reset"

At this point nmap has completed its scan and no new traffic is flowing to the sshd process (confirmed by wireshark).

Turning debugging up, I get an extra line in the sshd.log:
33552 13:10:16:181 debug3: accept - ERROR: async io completed with error: 10054, io:00000000004551B0
33552 13:10:16:181 error: accept: Connection reset

I can reproduce this defect with the following nmap command:
nmap TARGET -p 22 -sV

Nmap output:
Starting Nmap 7.50 ( https://nmap.org ) at 2017-06-26 13:28 EDT
Nmap scan report for TARGET (TARGET IP)
Host is up (0.00029s latency).
PORT STATE SERVICE VERSION
22/tcp open tcpwrapped
MAC Address: MAC

Service detection performed. Please report any incorrect results at https://nmap.org/submit/ .
Nmap done: 1 IP address (1 host up) scanned in 0.78 seconds

Stopping and starting the sshd process restores sshd functionality.
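
For anyone trying to reproduce this without nmap, here is a minimal Python sketch of the traffic pattern that seems to matter: connect to the SSH port and immediately tear the connection down with a reset. It is only an assumption that an abortive close (SO_LINGER with a zero timeout) approximates what the nmap service scan does; the function name `reset_probe` and its defaults are made up for illustration.

```python
import socket
import struct
import sys


def reset_probe(host: str, port: int = 22, attempts: int = 2, timeout: float = 2.0) -> int:
    """Connect and abort `attempts` times; return how many connects succeeded."""
    # struct linger holds two u_short fields on Windows and two ints elsewhere.
    linger_fmt = "hh" if sys.platform == "win32" else "ii"
    connected = 0
    for _ in range(attempts):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.settimeout(timeout)
        try:
            sock.connect((host, port))
            connected += 1
            # l_onoff=1, l_linger=0: close() sends RST instead of FIN.
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                            struct.pack(linger_fmt, 1, 0))
        except OSError:
            pass  # Port closed or host unreachable; the count stays unchanged.
        finally:
            sock.close()
    return connected
```

Calling `reset_probe("TARGET")` makes two quick connect-and-reset attempts against port 22. Only run it against hosts you are allowed to test.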

@manojampalam manojampalam changed the title SSHD locks and fills hard drive when scanned by nmap (DOS Attack) [Security] SSHD locks and fills hard drive when scanned by nmap (DOS Attack) Jun 28, 2017
@manojampalam manojampalam added this to the Beta milestone Jun 28, 2017
@manojampalam
Contributor

Thanks for reporting this. We'll follow up.

@manojampalam manojampalam modified the milestones: July-Mid, Beta Jul 10, 2017
@bagajjal
Collaborator

@PillowFish - I couldn't reproduce this on my end. I didn't see the error message ("error: accept: Connection reset") in the sshd.log file, and sshd.log is not growing.
Am I missing something here?

I ran "nmap -sV -p 22 127.0.0.1" and it completed successfully.

Starting Nmap 7.50 ( https://nmap.org ) at 2017-07-10 15:24 Pacific Daylight Time
Nmap scan report for localhost (127.0.0.1)
Host is up (0.0010s latency).
PORT STATE SERVICE VERSION
22/tcp open ssh OpenSSH 7.5 (protocol 2.0)
Service detection performed. Please report any incorrect results at https://nmap.org/submit/ .
Nmap done: 1 IP address (1 host up) scanned in 3.33 seconds

SSHD.log

29872 15:24:55:643 debug3: socket:424, io:0000025F58D72F90, fd:5
29872 15:24:55:643 debug3: fd 5 is not O_NONBLOCK
29872 15:24:55:643 debug3: pipe - r-h:428,io:0000025F58D73080,fd:6 w-h:432,io:0000025F58D75A20,fd:7
29872 15:24:55:643 debug3: spawning E:\test\sshd.exe
29872 15:24:55:644 debug3: Register child 00000000000001B4 pid 48216, 0 zombies of 1
29872 15:24:55:644 debug3: close - io:0000025F58D72F90, type:1, fd:5, table_index:5
29872 15:24:55:644 debug1: Forked child 48216.
29872 15:24:55:644 debug3: close - io:0000025F58D75A20, type:2, fd:7, table_index:7
48216 15:24:55:674 debug1: sshd version OpenSSH_7.5, LibreSSL 2.5.3
48216 15:24:55:675 debug3: open - handle:0000000000000130, io:0000026BBC5F2630, fd:3
48216 15:24:55:680 debug3: close - io:0000026BBC5F2630, type:2, fd:3, table_index:3
48216 15:24:55:681 debug1: private host key #0: ssh-rsa SHA256:u2PBohr0znaydJyaBorQWkx5Vm1Hw10xctNFX8FNAFA
48216 15:24:55:682 debug3: open - handle:0000000000000130, io:0000026BBC5F2630, fd:3
48216 15:24:55:683 debug3: close - io:0000026BBC5F2630, type:2, fd:3, table_index:3
48216 15:24:55:683 debug1: private host key #1: ssh-dss SHA256:6/1wXar+admMPGH5dRsiurr81Mj3hT/m/jLr/SQ9qhs
48216 15:24:55:683 debug3: open - handle:0000000000000130, io:0000026BBC5F2630, fd:3
48216 15:24:55:684 debug3: close - io:0000026BBC5F2630, type:2, fd:3, table_index:3
48216 15:24:55:685 debug1: private host key #2: ecdsa-sha2-nistp256 SHA256:1eBFYcTyf5ssmjVuy/2wzLr07Ny4x/ARqwLU3IYs1Yg
48216 15:24:55:686 debug3: open - handle:00000000000000A8, io:0000026BBC642EC0, fd:3
48216 15:24:55:686 debug3: close - io:0000026BBC642EC0, type:2, fd:3, table_index:3
48216 15:24:55:686 debug1: private host key #3: ssh-ed25519 SHA256:pTVUSAOShWkyn00DYEfqSM8/W1owVDWntlL2DVHmd8w
48216 15:24:55:691 debug1: child socket: 424
48216 15:24:55:691 debug1: child startup_pipe: 432
48216 15:24:55:691 Connection from 127.0.0.1 port 30502 on 127.0.0.1 port 22
48216 15:24:55:692 Did not receive identification string from 127.0.0.1 port 30502
29872 15:24:55:699 debug3: close - io:0000025F58D73080, type:2, fd:6, table_index:6
29872 15:24:55:700 debug3: zombie'ing child at index 1, 0 zombies of 2
29872 15:24:55:700 debug3: Unregister child at index 1, 1 zombies of 2

@bagajjal bagajjal self-assigned this Jul 18, 2017
@bagajjal
Collaborator

@PillowFish - Any update on this?

@megamorf

@bagajjal is running the nmap scan against localhost really a reproduction of the author's issue? That is, do you know the Windows 7 TCP/IP stack well enough to say it should behave the same as when the scan is run from a remote system?

@bingbing8 bingbing8 modified the milestones: July-End, July-Mid Jul 19, 2017
@PillowFish
Author

Sorry for the delay.
I don't have an update other than that I was not scanning localhost. I'll see if I can recreate this tomorrow when I get back in the office.

@bagajjal
Collaborator

@megamorf @PillowFish -
I tried a similar setup (SSH client running on Ubuntu 16, SSH server running on Windows 7) but I don't see the logs flooded.

FYI, I ran the nmap command only once.

@PillowFish
Author

Wacky. I'll rerun this on my end and see if it happens again and whether there is some environmental factor I'm missing...

I too got into this odd state after running nmap once, so that should have worked.

@manojampalam manojampalam modified the milestones: Aug-Mid, July-End Jul 31, 2017
@claptrap251

claptrap251 commented Aug 1, 2017

Facing the same issue. The file size went haywire last night,
with the sshd log getting spammed by this error:

1680 20:38:23:517 error: accept: Connection reset

The file size reached 50 MB in less than a minute. I was also able to recreate it using nmap.
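
To check whether a growing sshd.log matches this signature, a rough Python sketch like the one below can count how many lines are the reset error. The regular expression is modelled on the log lines quoted in this thread ("PID HH:MM:SS:mmm error: accept: Connection reset"); the function name `reset_error_share` is hypothetical, and other builds may format the line differently.

```python
import re
from typing import Iterable, Tuple

# Modelled on lines such as "1680 20:38:23:517 error: accept: Connection reset".
RESET_LINE = re.compile(r"\d+ \d{2}:\d{2}:\d{2}:\d{3} error: accept: Connection reset")


def reset_error_share(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (reset_error_lines, non_empty_lines) for the given log lines."""
    hits = total = 0
    for line in lines:
        # Tolerate surrounding whitespace and quotes from copy-pasted samples.
        text = line.strip().strip('"')
        if not text:
            continue
        total += 1
        if RESET_LINE.fullmatch(text):
            hits += 1
    return hits, total
```

Passing an open file object for sshd.log (opened with `errors="replace"`) works, since the function only iterates over lines. A share close to 100% (or close to 50% at DEBUG3, where each error is paired with a debug3 line) would suggest the loop described in this issue.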

@bagajjal
Collaborator

bagajjal commented Aug 1, 2017

It would be great if you could provide the reproduction steps. FYI, we couldn't reproduce this on our end.

@claptrap251

claptrap251 commented Aug 1, 2017

OS in use: Windows Server 2016 64-Bit (VM)
command used : nmap -Pn 127.0.0.1

I also tried with a different VM (Windows Server 2008 R2) but wasn't able to replicate it.

This defect doesn't show up when I see "Did not receive identification string from 127.0.0.1" in the logs

@bagajjal
Collaborator

bagajjal commented Aug 1, 2017

@Aayush251 - Were you able to reproduce consistently on the windows server 2016 machine?

@claptrap251

Yes

@manojampalam manojampalam modified the milestones: Aug-End, Aug-Mid Aug 17, 2017
@claptrap251

Any updates regarding this issue?

@bagajjal
Collaborator

@Aayush251 - I couldn't reproduce this on Windows Server 2016.

@logmein345

I am getting this also:
v0.0.19.0
Server 2016

My issue does not seem to need a scan or anything else to trigger it; it just seems to happen after a user has been connected for a while. In a matter of hours it can rip through 20+ GB with "error: accept: Connection reset".

@manojampalam manojampalam modified the milestones: Sep-2017-Mid, Aug-End Sep 5, 2017
@bagajjal
Collaborator

bagajjal commented Sep 6, 2017

@PillowFish, @Aayush251, @logmein345 - I couldn't reproduce this issue. It would be great if you could share the sshd.log (the first 1-2 minutes would be enough) so that I can dig into this further.

FYI, it works for me... please see this: [image: d-1]

@bingbing8 bingbing8 modified the milestones: Sep-2017-End, Oct-Mid Oct 2, 2017
@bagajjal
Collaborator

bagajjal commented Oct 9, 2017

@PillowFish, @Aayush251 , @logmein345 - is there an update on this?

@bingbing8 bingbing8 modified the milestones: Oct-Mid, Oct-End Oct 17, 2017
@bagajjal bagajjal modified the milestones: Oct-End, Integration to OpenSSH Portable Oct 23, 2017
@jvalladaresBest

Is there any update on this? We are facing the same issue

@bagajjal
Collaborator

bagajjal commented Nov 3, 2017

@jvalladaresBest - We are unable to reproduce this on our end. It would be great if you could share the sshd.log (the first 1-2 minutes would be enough) so that I can dig into this further.

@jvalladaresBest

jvalladaresBest commented Nov 13, 2017 via email

@ST159357

ST159357 commented Nov 17, 2017

Hello, log dump can be found here: https://pastebin.com/KJ7qicib

The scanning software is Qualys. The "connection reset" lines go on indefinitely until the drive fills up. That's all I know at the moment.

@bagajjal
Collaborator

@ST159357 - Please set the log level to DEBUG3, provide the sshd.log.

@itnic itnic mentioned this issue Dec 2, 2017
@itnic

itnic commented Dec 6, 2017

Hi, I think my patch resolves this issue. Can someone please confirm it?

itnic added a commit to itnic/openssh-portable that referenced this issue Dec 14, 2017
This is related to the win32-OpenSSH issue #787
PowerShell/Win32-OpenSSH#787

As requested, the PR has been moved here.

There is a bug in the Microsoft POSIX compatibility layer used to
translate POSIX socket API to Windows socket API. The bug concerns the
emulated function accept() (socketio_accept() in the file socketio.c),
which may return an invalid socket and may lock the program in an
infinite loop.

A race exists between the Windows kernel signaling a socket error and
the POSIX compatibility layer handling the connection. If a connection
to the ssh service is dropped before being fully handled by the POSIX
compatibility layer, that layer may reach a state where the Windows
kernel is aware of the dropped connection and has updated all the
socket states of the Windows socket API. However, the POSIX
compatibility layer is in the middle of the accept() function task and
fails without updating the local state of the emulated POSIX socket.

When the emulated accept() exits, the current socket state is not
updated, so the emulated select() call keeps detecting the activity it
has already reported on the socket. This triggers the exact same error
in the accept() call, and the program runs in an infinite loop.

This may not happen all the time: if the Windows kernel signals the
error before the POSIX compatibility layer has set up its accept()
state and returned successfully, other calls such as send() and recv()
will update the state of the socket and will handle the issue.

However, on a loaded machine where the synchronization between the
Windows socket API and the POSIX compatibility layer may be slower,
the emulated accept() call may finish before being notified by the
Windows kernel of a client disconnection. This may be triggered with
nmap, which quickly makes two TCP connections in a row: one to detect
an open port, and a second to retrieve the ssh banner.

The fix proposed is to update the emulated accept() function to modify
the internal state of the compatibility POSIX layer as a regular POSIX
kernel would do. I personally had this issue and this patch fixed it.

Please note that we have identified other situations where it could
potentially happen (in particular the socketio_setsockopt() call). But
we haven't investigated this issue more deeply. Other issues such as
PowerShell#414 or #606 may be related.
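
The failure mode described in the commit message can be illustrated with a small toy model. This is a Python sketch with invented names (`EmulatedListener`, `run_accept_loop`), not the actual socketio.c code: it only assumes, as the message describes, that a failed emulated accept() leaves the "readable" state set so the emulated select() keeps reporting it, and that the fix clears that state on error.

```python
WSAECONNRESET = 10054  # Winsock error code seen in the debug3 log line.


class EmulatedListener:
    """Toy stand-in for the emulated listening socket (not the real code)."""

    def __init__(self, clear_on_error: bool):
        self.clear_on_error = clear_on_error
        self.pending_error = WSAECONNRESET  # Client reset before accept() finished.
        self.readable = True  # What the emulated select() reports.

    def select(self) -> bool:
        return self.readable

    def accept(self) -> int:
        """Return 0 on success or the Winsock error code on failure."""
        error = self.pending_error
        if error and not self.clear_on_error:
            return error  # Bug: stale state is kept, so select() fires again.
        # Fixed path: consume the completed I/O so select() stops reporting it.
        self.pending_error = 0
        self.readable = False
        return error


def run_accept_loop(listener: EmulatedListener, max_iterations: int = 1000) -> int:
    """Count how many 'accept: Connection reset' errors the loop would log."""
    errors = 0
    for _ in range(max_iterations):  # The cap stands in for "until the disk is full".
        if not listener.select():
            break
        if listener.accept() != 0:
            errors += 1
    return errors
```

With `clear_on_error=False` the loop logs an error on every iteration until the cap is hit; with `clear_on_error=True` it logs a single error and then goes idle, which is the behavior the patch aims for.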
@cxhercules

FYI, I changed the log level to DEBUG3 and did not get any new results. I have been able to mitigate this by adding a scheduled PowerShell task that runs every five minutes, checks whether sshd.log is bigger than 25 MB, and restarts sshd. Make sure the user the task runs as is allowed to log on as a batch job, since the task has to run whether or not that user is logged on. You will also need to be able to store the password locally.

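# Rotate sshd.log once it exceeds 25 MB (26214400 bytes).
if ((Get-ChildItem 'C:\Program Files\OpenSSH\logs\sshd.log').Length -gt 26214400)
{
    # Keep one previous copy, then truncate the live log in place.
    Copy-Item 'C:\Program Files\OpenSSH\logs\sshd.log' 'C:\Program Files\OpenSSH\logs\sshd.log.prev'
    $null | Set-Content 'C:\Program Files\OpenSSH\logs\sshd.log'
}

# Restart on every run, regardless of the log size.
Restart-Service sshd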

I put the restart outside the if block because I ran into an instance where the disk was full but the log file's size did not show it. Restarting sshd freed the space, so as a precaution I restart unconditionally. So far this has worked as a workaround until the issue is solved.

@itnic

itnic commented Jan 11, 2018

I have proposed a patch which resolves this issue (I tested it and it works). I was asked to submit the patch to powershell/openssh-portable instead of here, see:

PowerShell/openssh-portable#252

That was more than a month ago...

Could you please apply this patch to prevent such a trivial denial of service? This is becoming critical: my ssh servers are scanned from the internet, resulting in a massive DoS from nothing more than a simple nmap scan. Should I open a CVE for that?

@bagajjal
Collaborator

@itnic - It will be part of our next release (Jan-Mid release).

@bagajjal
Collaborator

Fixed as part of the latest release, 1.0.0.0.
