Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SshOperationTimeout issue while using renci library #840

Closed
vivek4434 opened this issue Jun 28, 2021 · 11 comments
Closed

SshOperationTimeout issue while using renci library #840

vivek4434 opened this issue Jun 28, 2021 · 11 comments
Milestone

Comments

@vivek4434
Copy link

Hi everyone,

We are using renci library for SSHing into routers. In past, ssh timeout issue was around but the count was low. But now we are seeing the timeout errors more frequently. Below are error logs in failure calls:

FYI : Text in bold are custom logs.

ReceiveMessage from server: 'KeyExchangeInitMessage': 'SSH_MSG_KEXINIT'.
Inside : OnKeyExchangeInitReceived
ReceiveMessage from server: 'KeyExchangeDhGroupExchangeGroup': 'SSH_MSG_KEX_DH_GEX_GROUP'.
ReceiveMessage from server: 'KeyExchangeDhGroupExchangeReply': 'SSH_MSG_KEX_DH_GEX_REPLY'.
ReceiveMessage from server: 'NewKeysMessage': 'SSH_MSG_NEWKEYS'.
ReceiveMessage from server: 'ServiceAcceptMessage': 'SSH_MSG_SERVICE_ACCEPT'.
ReceiveMessage from server: 'FailureMessage': 'SSH_MSG_USERAUTH_FAILURE'.
ReceiveMessage from server: 'InformationRequestMessage': 'SSH_MSG_USERAUTH_INFO_REQUEST'.
ReceiveMessage from server: 'FailureMessage': 'SSH_MSG_USERAUTH_FAILURE'.
ReceiveMessage from server: 'FailureMessage': 'SSH_MSG_USERAUTH_FAILURE'.

Renci.SshNet.Common.SshOperationTimeoutException: Session operation has timed out
at Renci.SshNet.Session.WaitOnHandle(WaitHandle waitHandle)
at Renci.SshNet.Session.Connect()
….

Sometime we do see below logs:
ReceiveMessage from server: 'KeyExchangeInitMessage': 'SSH_MSG_KEXINIT'.
Inside : OnKeyExchangeInitReceived

System.Net.Sockets.SocketException (0x80004005): An established connection was aborted by the software in your host machine
at Renci.SshNet.Session.WaitOnHandle(WaitHandle waitHandle)
at Renci.SshNet.Session.Connect()
….

Sometime we see below logs:
ReceiveMessage from server: 'KeyExchangeInitMessage': 'SSH_MSG_KEXINIT'.
Inside : OnKeyExchangeInitReceived
ReceiveMessage from server: 'KeyExchangeDhGroupExchangeGroup': 'SSH_MSG_KEX_DH_GEX_GROUP'.
ReceiveMessage from server: 'KeyExchangeDhGroupExchangeReply': 'SSH_MSG_KEX_DH_GEX_REPLY'.

Renci.SshNet.Common.SshOperationTimeoutException: Session operation has timed out
at Renci.SshNet.Session.WaitOnHandle(WaitHandle waitHandle)
at Renci.SshNet.Session.Connect()
….

We tried adding more logs on the client side to analyze what causing the sshtimeout, we observed that, failure is coming at different stages before Channel open success stage as :

  • It is happening after/at-time of Key exchange init
  • It is happening after/at-time of user authorisation
  • It is happening after/at-time of service acceptance

Connection timeout at each wait handle (KeyExchange, ServiceAccept, UserAuth) were increased from default 30 to 180 with little improvements.

What we experimented in renci code?
While trying to make parallel calls, we observed that calls were waiting for semaphore to acquire. After looking into renci code, it was noticed that the semaphore value was hardcoded to 30 (old renci value), which was stopping 31st call to go-in.
And this causes thread to accumulate and end result is – our application was end-up running with high thread counts and hence it requires timely restart otherwise application becomes unresponsive. Below code reference:

SSH.NET/Session.cs at develop · sshnet/SSH.NET (github.com)
Private static readonly SemaphoreLight AuthenticationConnection = new semaphoreLight(3) – Semaphore is static and is shared across sessions

To overcomes this problem we thought of increasing semaphore value to 3000 and observe the application performance but since then we are seeing increase in SshOperationTimeout issue.

Logically speaking, increasing semaphore value is just making more calls to go-in instead of waiting for semaphore. So, as per our understanding it should not create any problem but we would like to double check on it.

Is there any limit in Renci wrt number of concurrent ssh sessions ?

Can anyone from community help to resolve the problem or have insights around the same?

@vivek4434
Copy link
Author

Adding @drieseng
If you have any thoughts around the same.

@drieseng
Copy link
Member

@vivek4434 I wasn't involved when the initial version of SSH.NET was implemented. I suppose the concurrent authentication threshold was put in place because some servers reject all too many concurrent login attempts, but I can't be sure of that.

Do you establish multiple connections to the same server (with the same credentials)?

Can you access (SSH) logs on these routers?

@giantjunkbox
Copy link

I was having the exact same problem awhile ago with Cisco gear. The problem would seem to be somewhat random; however I could keep re-trying and it would eventually work.

I could SSH into the gear without issue using Putty.

I downgraded to 2016.1.0 and all of the problems went away.

I thought it might have to do with executing a command via the SSH library, not writing the commands to an output stream. That may, or may not, be true.

Perhaps try downgrading and see if the problem goes away?

What kind of gear are you using? Can you share the make, model, and version of code they are running?

@vivek4434
Copy link
Author

@drieseng , yes, I am trying to connect multiple routers and that too with same and different credentials i.e depends on device.
Let me try finding logs on the router.

@giantjunkbox , I am trying different devices ... like Arista, Juniper, Cisco etc. When you downgraded to 2016.1.0 do you see any eye catching difference in code? Let me try to share more info on version currently in use.

@vivek4434
Copy link
Author

vivek4434 commented Jun 30, 2021

@giantjunkbox could you please help in checking whether the version you are using contains below code changes:

protected void PopulateClientExchangeValue()
        {
            if (_group.IsZero)
                throw new ArgumentNullException("_group");

            if (_prime.IsZero)
                throw new ArgumentNullException("_prime");

            // generate private exponent that is twice the hash size (RFC 4419) with a minimum
            // of 1024 bits (whatever is less)
            var privateExponentSize = Math.Max(HashSize * 2, 1024);

            do
            {
                // create private component
                _privateExponent = BigInteger.Random(privateExponentSize);
                // generate public component
                _clientExchangeValue = BigInteger.ModPow(_group, _privateExponent, _prime);
            } while (_clientExchangeValue < 1 || _clientExchangeValue > (_prime - 1));
        }

in class : KeyExchangeDiffieHellman.cs

@giantjunkbox
Copy link

giantjunkbox commented Jul 1, 2021

@vivek4434 I'm using the 2020.0.1 build from nuget. The method in that version appears to be different then the above. I included it below.

Is there newer code checked in that has not been released yet?

Thank you

/// <summary>Populates the client exchange value.</summary>
protected void PopulateClientExchangeValue()
{
  if (this._group.IsZero)
    throw new ArgumentNullException("_group");
  if (this._prime.IsZero)
    throw new ArgumentNullException("_prime");
  int bitLength = Math.Max(this.HashSize * 2, 1024);
  BigInteger bigInteger;
  do
  {
    this._privateExponent = BigInteger.Random(bitLength);
    bigInteger = BigInteger.ModPow(this._group, this._privateExponent, this._prime);
  }
  while (bigInteger < 1L || bigInteger > this._prime - (BigInteger) 1);
  this._clientExchangeValue = bigInteger.ToByteArray().Reverse<byte>();
}

@giantjunkbox
Copy link

@vivek4434 2016.1.0 was a drop in replacement for what I was using it for. Presumably there are some new features or what not that caused the version number to bump. Easy way to tell, just downgrade the package and see if your code still builds ;).

@vivek4434
Copy link
Author

Hi @giantjunkbox , I tried making few changes in the code, now I am seeing issue :
ErrorCode: 10053, System.Net.Sockets.SocketException (0x80004005): An established connection was aborted by the software in your host machine
at Renci.SshNet.Session.WaitOnHandle(WaitHandle waitHandle)
at Renci.SshNet.Session.Connect()
at Renci.SshNet.BaseClient.Connect()
...

Wondering if you ever faced this issue.
Tagging - @drieseng if you know about such issue already.

@giantjunkbox
Copy link

@vivek4434 I haven't. The only issue I have encountered is the timeout issue.

Can you confirm that the older version works for you?

If you aren't doing it already, perhaps working backwards from what changed would be good. It is also interesting, at least to me, that it is somewhat random. In my mind that means either a race condition or a deliberately random part of the connection setup is making poor choices.

Thanks

@WojciechNagorski
Copy link
Collaborator

It's done in #1274

@WojciechNagorski WojciechNagorski added this to the 2023.0.1 milestone Dec 21, 2023
@WojciechNagorski
Copy link
Collaborator

The 2023.0.1 version has been released to Nuget: https://www.nuget.org/packages/SSH.NET/2023.0.1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants