Fix for channel session semaphore from thread blocking #1071
- The following article describes some of the issues with the double check lock that we have seen issues with: https://www.sudhanshutheone.com/posts/double-check-lock-csharp
Just to expand on the report above: I can attach the test program that showed the issue if required. In our reproduction, I attached WinDbg to a .NET process running a test program that used the library to execute "ps aux" on a Linux machine from roughly ten worker threads in a loop. After 30 minutes or so, WinDbg reported an access violation which, if you continue, is translated into a NullReferenceException (NRE). The null came from the _session field in the Channel.cs class. This had been disposed and the field set to null, even though the field was later going to be accessed by the thread above. The NRE wasn't surfaced outside the library (as far as I could see), which meant that the Semaphore was not released. This manifested as eventual starvation, with the SSH client grinding to a halt. The fix was to avoid setting the field to null, as shown above.
@WojciechNagorski great to see you maintaining this project and actively looking at these PRs. I can see the […] It is also worth noting that we now have this fix in production (running our fork), and our service has also been running continuously in a test environment to check long-term stability, without issue. I've fixed the AppVeyor issue, so this is ready to be merged should you be happy to do that 😄
@patrick-yates-redgate Can you describe to me what the problem is with setting the session to null?
Hi @WojciechNagorski, thanks for getting back to me. @clivetong is the best person to discuss this with as he found the issue. His comment here explains the problem and why setting _session to null was causing it.
There is some more info at #400. The sequence of events appears to be something like: […]
The key points being that […] The […] That's what I can figure out anyway.
I think the one problem with this fix is that the semaphore will be released twice instead of once, which could eventually result in […]
Side note: I don't really like the default policy of setting fields to null in […]
Actually, it seems like the roll-your-own […]
Also (sorry for the triple comment), I must be missing something above, because while it explains the NRE it does not explain the thread starvation.
I think you may be wrong about the double release.
When I applied the fix, I left it running for 7+ days (the problem usually occurred after 30 minutes). IIRC I left the debugger attached and saw no sign of other exceptions.
In our blocking case we saw that the semaphore was never incremented, so threads started blocking when they tried to get the semaphore (i.e. its count was zero but no one owned the semaphore). We could see this in the crash dumps we took.
More explicitly, the NRE stopped the release and the Semaphore lost count.
We can probably provide a reproduction if you need it. The repro would hang after about 30 minutes using two small boxes running in Azure.
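The starvation described here can be modelled outside the library. The following is a minimal Python sketch, not SSH.NET code: `POOL_SIZE`, `slots` and `run_command` are hypothetical names, and the `RuntimeError` stands in for the swallowed NRE. It shows that once every permit has been lost to a skipped release, the next caller can no longer obtain one.

```python
import threading

POOL_SIZE = 3                          # hypothetical number of channel slots
slots = threading.Semaphore(POOL_SIZE)

def run_command(fail_in_cleanup):
    """Take a slot, do the work, and hand the slot back unless cleanup throws."""
    if not slots.acquire(timeout=0.1):
        return "blocked"               # no slot available: a real caller would hang here
    try:
        if fail_in_cleanup:
            # Stand-in for the null reference hit before the release call.
            raise RuntimeError("simulated null reference during cleanup")
        slots.release()                # normal path returns the slot
        return "ok"
    except RuntimeError:
        return "leaked"                # exception swallowed, release() was skipped

# Three failing cleanups drain the pool; the fourth caller finds no slot.
results = [run_command(fail_in_cleanup=True) for _ in range(POOL_SIZE)]
results.append(run_command(fail_in_cleanup=False))
print(results)
```

In this model nobody holds a slot when the last call runs, yet the count is zero, which matches the "count was zero but no one owned the semaphore" observation from the crash dumps.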
Thanks. I understand the problem, just not quite the steps which occur where the semaphore is not released at all. I will try and come up with a reproduction, and if not I will get back to you. I would be happy for this change to go in either way.
@Rob-Hague it would be great.
Ok, I think I have worked it out. Firstly, my table above is a bit misleading: the sequence is always initiated by the listener thread receiving a channel close message. The main thread is waiting in […]
Secondly, there cannot be more than one release of the semaphore, thanks to the interlocked logic in SSH.NET/src/Renci.SshNet/Channels/ChannelSession.cs, lines 442 to 448 in dd2e552.
But there could be no releases: the listener thread enters […]
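The "at most one release, possibly none" behaviour can be illustrated with a small model. This is a Python sketch under stated assumptions, not the code at the referenced lines: `OnceFlag` emulates an interlocked compare-and-exchange with a lock, and `Releaser` and `fail_before_release` are hypothetical. The failing variant assumes that something throws after the flag has been flipped but before the release call, which is one way a guard of this shape could end up with zero releases.

```python
import threading

class OnceFlag:
    """Emulates an interlocked compare-and-exchange on a 0/1 flag using a lock."""
    def __init__(self):
        self._lock = threading.Lock()
        self._value = 0

    def try_set(self):
        # Return True only for the single caller that flips 0 -> 1.
        with self._lock:
            if self._value == 0:
                self._value = 1
                return True
            return False

class Releaser:
    """Hypothetical wrapper that releases a semaphore at most once."""
    def __init__(self, semaphore, fail_before_release=False):
        self._semaphore = semaphore
        self._claimed = OnceFlag()
        self._fail = fail_before_release
        self.releases = 0

    def release_once(self):
        if not self._claimed.try_set():
            return                     # another caller already claimed the release
        if self._fail:
            # Assumed failure point: the flag is already flipped, so nobody retries.
            raise RuntimeError("simulated failure after the flag was flipped")
        self._semaphore.release()
        self.releases += 1

# Eight concurrent callers: the guard lets exactly one of them release.
guarded = Releaser(threading.Semaphore(0))
threads = [threading.Thread(target=guarded.release_once) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
once_count = guarded.releases

# If the claiming caller throws, later callers see the flag set and do nothing.
failing = Releaser(threading.Semaphore(0), fail_before_release=True)
for _ in range(3):
    try:
        failing.release_once()
    except RuntimeError:
        pass
zero_count = failing.releases
print(once_count, zero_count)
```

The guard rules out a double release, but it also means a failed first attempt is never retried, so the permit stays lost.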
Awesome, thank you both @Rob-Hague and @WojciechNagorski for analysing the issue and getting this merged 🤩
It's indeed amazing! Thank you so much! Well, don't take this the wrong way, but... ;) are there any plans for an official release anytime soon?
@Greg-Smulko I need to wait a few days, so we can say it will be soon :)
* Assets/logos (#782) * Added logo assets * Added PNG 1260x640 with white border Co-authored-by: 103filgualan <[email protected]> * OPENSSH KeyReader for more keys (#614) * OPENSSH KeyReader for more keys Add support to parse OpenSSH Keys with ECDSA 256/384/521 and RSA. https://github.com/openssh/openssh-portable/blob/master/PROTOCOL.key Change-Id: Iaa9cce0f2522e5fee377a82cb252f81f0b7cc563 * Fix ED25519Key KeyLength * Fix ED25519 PubKey-auth LeadingZeros of BigInteger-Conversion have to be removed before sending the Key. * Add interface to SftpFile #120 (#812) * Create ISftpFile interface. SftpFile sealed. Return ISftpFile from SftpClient instead of SftpFile. Make ISftpClient interface disposable. Co-authored-by: Wojciech Swieboda <[email protected]> * Start MessageListener with ThreadAbstraction.ExecuteThreadLongRunning (#902) * Fix Thread pool exhaustion due to MessageListener running on ThreadPool * Mark long running thread as background * Add async support to SftpClient and SftpFileStream (#819) * Add FEATURE_TAP and net472 target * Add TAP async support to SftpClient and SftpFileStream * Add async support to DnsAbstraction and SocketAbstraction * Add async support to *Connector and refactor the hierarchy * Add ConnectAsync to BaseClient * Add CODEOWNERS file. * Fix virus false-positive by Defender on Renci.SSHNet.Tests.dll (#867) Co-authored-by: Pedro Fonseca <[email protected]> * Add unit tests for task-based asynchronous API (#906) * Fix runtime and culture dependant tests. 
* Set C# 7.3 in Tests.csproj to limit intellisense's suggestions under different targets * Add SftpClientTest.*Async * Add SftpFileStreamTest_OpenAsync_* * Add SftpFileStreamTest_WriteAsync_* * Add SftpFileStreamTest_ReadAsync_* * Align AppVeyor script with Test project target frameworks * correct 'Documenation' to 'Documentation' (#838) in the documentation's window title * Agent auth and Keygen (#794) * Allow to set PrivateKeyFile Key directly So you can add your own Key-Classes to SSH.NET * Add ED25519 ctor for just pub key part. * Make ECDSA Key Bits accessible You cant export imported CngKeys. To be able to export them to agent or Key-Files make the private bits also accessible. * Better NETFRAMEWORK vs NETSTANDARD handling * Add Comment Property to Key * Add IPrivateKeySource So Extension can add own PrivateKeyFiles, e.g. PuttyKeyFile. * Use cryptographically secure random number generator. Fixes CVE-2022-29245. * Remove unused import. * Add IBaseClient for BaseClient and ISftpClient to inherit from (#975) Add IBaseClient for BaseClient and ISftpClient to inherit from * fix typo (#999) * Fix Seek Operations in SftpFileStream (#910) * Fix offset operations in SftpFileStream.Seek * Fix seek exception message and add default case for invalid seek origin * Use named params when throwing ArgumentException * Add tests for seeking from end of file * Add back copyright to license. (#1060) Fixes #1059. * Removing old target frameworks (#1109) Remove support for legacy / deprecated target frameworks while adding support for .NET 6.0 (and higher). The supported target frameworks are now: * .NETFramework 4.6.2 (and higher) * .NET Standard 2.0 * .NET 6.0 (and higher) * Remove old features [Part 1] (#1117) Remove obsolete feature switches (now that we've remove support for legacy target frameworks) and remove corresponding conditional code. 
* Remove FEATURE_DIRECTORYINFO_ENUMERATEFILES (#1119) * Remove FEATURE_DIRECTORYINFO_ENUMERATEFILES * Add exception documentation * Fix some (lots of) issues reported by analyzers. (#1125) Fix some (lots of) issues reported by analyzers. * Round 2 of analyzer fixes and general cleanup. (#1132) * Analyzer fixes round 3. (#1135) * Replace Array<T>.Empty with Array.Empty<T>() (#1137) * Replace IsNullOrWhiteSpace extension (#1142) * Use License Expression for NuGet Package licenseUrl is deprecated, see NuGet/Announcements#32 * Integration tests * Remove todos * Update CODEOWNERS * Use correct SSH.NET * ListDirectoryAsync return IAsyncEnumerable (#1126) * ListDirectoryAsync return IAsyncEnumerable * Fix documentation * Update README.md * Fix * Add Sftp ListDirectoryAsync test * Revert * Integration tests for ListDirectoryAsync with IAsyncEnumerable * Fix the assembly resolution build warning (#1165) * Delete performance/longrunning tests (#1143) Co-authored-by: Wojciech Nagórski <[email protected]> * Move Integration tests (#1173) * Renci.SshNet.IntegrationTests * Renci.SshNet.TestTools.OpenSSH * Move integration tests to main repo * Move old tests to new integration tests * Move old integration tests to new integration tests * Move more tests * Move authentication tests * Move SshClientTests * Fix some tests * Remove duplicated test * Poc of ProcessDisruptor * Rename * Some fixes * Remove performance tests * Small improvements * Add a benchmarks project (#1151) * Add a benchmarks project * Small improvements --------- Co-authored-by: Wojciech Nagórski <[email protected]> * Use ExceptionDispatchInfo to retain call stack in Session.WaitOnHandle() (#936) * Use ExceptionDispatchInfo to retain call stack in Session.WaitOnHandle() * merge * Update src/Renci.SshNet/Session.cs Co-authored-by: Rob Hague <[email protected]> --------- Co-authored-by: Wojciech Nagórski <[email protected]> Co-authored-by: Rob Hague <[email protected]> * Support SHA256 fingerprints for host key 
validation (#1098) * Add tests for HostKeyEventArgs * Add SHA256 fingerprint support * Add support for RSA SHA-2 public key algorithms (#1177) * Abstract out the hash algorithm from RsaDigitalSignature * Add integration tests * Add DigitalSignature property to KeyHostAlgorithm * Add IHostAlgorithmsProvider interface * Verify the host signature * Fix HostKeyEventArgsTest after merge * Remove PubkeyAcceptedAlgorithms ssh-rsa * Add test coverage for RSA keys in PrivateKeyFile * Obsolete IPrivateKeySource --------- Co-authored-by: Wojciech Nagórski <[email protected]> * Improvements after #1177 (#1180) * Use ExceptionDispatchInfo in more places (#1182) Co-authored-by: Wojciech Nagórski <[email protected]> * Try to "fix" the flaky test (#1185) * Enable DSA tests (#1181) Co-authored-by: Wojciech Nagórski <[email protected]> * FingerPrints (#1186) * Use OS-agnostic socket error codes to allow tests run on different OSes (#1179) SocketErrorCode is OS agnostic, ErrorCode is OS specific. On Windows ErrorCode = (int) SocketErrorCode, but on Mac and Unix it is not. For example ExitCode for HostNotFound (11001) on Windows is 11001, on Mac & Unix is -131073. So testing for ExitCode == 11001 fails on Mac & Unix. 
Co-authored-by: Wojciech Nagórski <[email protected]> * Fix for channel session semaphore from thread blocking (#1071) * Merging fix from @clivetong into our own SSH.NET fork - The following article describes some of the issues with the double check lock that we have seen issues with: https://www.sudhanshutheone.com/posts/double-check-lock-csharp * Merging fix from @clivetong into our own SSH.NET fork - The following article describes some of the issues with the double check lock that we have seen issues with: https://www.sudhanshutheone.com/posts/double-check-lock-csharp * Update Channel to fix AppVeyor failure (field should be readonly) * Update ISftpClient for #120 (#1193) * Implement set last write and access time (#1194) * Add/migrate hmac+cipher integration tests (#1189) * Add/migrate hmac+cipher integration tests * fix integration tests --------- Co-authored-by: Wojciech Nagórski <[email protected]> * Update tests for SetLastAccessTime(Utc) to also verify the time component and the Kind of the DateTime value returned by GetLastAccessTime(Utc). 
(#1198) --------- Co-authored-by: Filippo Gualandi <[email protected]> Co-authored-by: 103filgualan <[email protected]> Co-authored-by: Stefan Rinkes <[email protected]> Co-authored-by: wxtsxt <[email protected]> Co-authored-by: Wojciech Swieboda <[email protected]> Co-authored-by: Igor Milavec <[email protected]> Co-authored-by: drieseng <[email protected]> Co-authored-by: Pedro Fonseca <[email protected]> Co-authored-by: Pedro Fonseca <[email protected]> Co-authored-by: Maximiliano Jabase <[email protected]> Co-authored-by: Owen Krueger <[email protected]> Co-authored-by: Masuri <[email protected]> Co-authored-by: LemonPi314 <[email protected]> Co-authored-by: Gert Driesen <[email protected]> Co-authored-by: Rob Hague <[email protected]> Co-authored-by: Rob Hague <[email protected]> Co-authored-by: Marius Thesing <[email protected]> Co-authored-by: Dāvis Mošenkovs <[email protected]> Co-authored-by: Dmitry Tsarevich <[email protected]> Co-authored-by: Patrick Yates <[email protected]>
Version 2023.0.0 has been published: https://www.nuget.org/packages/SSH.NET/2023.0.0
Thanks so much for publishing the new release and notifying us on this ticket @WojciechNagorski. Much appreciated 🎉
Reason
When running continuously over a long period of time, we see our SshConnections lock up after 3 or 4 days and no longer allow us to run SSH commands. We debugged this and found that it was caused by threads blocking.
Diagnosis
After analysing the memory dumps, a colleague found that the session semaphores were causing this. We have run several tests over reasonably long periods of time, which confirm to us that the fix in this PR resolves our issue.
Fix
The fix is to remove the line where we set _session to null. ISession itself does not seem to be disposed of properly, so you may want to make additional changes; however, simply not setting the field to null appears to allow the GC to clean this up anyway. This article does not describe exactly the same issue, but it is related: https://www.sudhanshutheone.com/posts/double-check-lock-csharp
If you have any questions about the fix or the issues we found I am happy to discuss further/put you in touch with my colleague.
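To make the effect of the change concrete, here is a minimal Python model of the before and after behaviour. It is a sketch only: `Session`, `Channel`, `on_close` and the `null_on_dispose` switch are hypothetical stand-ins, and Python's `AttributeError` plays the role of the NRE. It assumes that a close handler can still run after the channel has been disposed.

```python
import threading

class Session:
    """Hypothetical session that owns the shared semaphore."""
    def __init__(self):
        self.semaphore = threading.Semaphore(1)

class Channel:
    """Hypothetical channel; `null_on_dispose` toggles the old behaviour."""
    def __init__(self, session, null_on_dispose):
        self._session = session
        self._null_on_dispose = null_on_dispose
        self._session.semaphore.acquire()     # take the permit when opening

    def dispose(self):
        if self._null_on_dispose:
            self._session = None              # old behaviour: drop the reference

    def on_close(self):
        # May run after dispose() when a close message arrives late.
        self._session.semaphore.release()

def permit_returned(null_on_dispose):
    """Dispose first, then handle a late close, and report whether the permit came back."""
    session = Session()
    channel = Channel(session, null_on_dispose)
    channel.dispose()
    try:
        channel.on_close()
    except AttributeError:
        pass                                  # swallowed, like the NRE in the report
    return session.semaphore.acquire(blocking=False)

before_fix = permit_returned(null_on_dispose=True)
after_fix = permit_returned(null_on_dispose=False)
print(before_fix, after_fix)
```

With the reference kept, the late close path can still reach the semaphore and return the permit; with the reference cleared, the permit is lost.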