SSH session may be broken or closed during the operation #730

AstroProfundis · 2020-08-29T18:00:24Z

Bug Report

Please answer these questions before submitting your issue. Thanks!

What did you do?

We store SSH sessions for hosts in context and reuse them for every SSH operation, but a session can be closed or otherwise broken before we finish all commands.

In one case, when ControlMaster and ControlPath are set in the SSH config of the console server, random timeout errors may be observed during operations such as start for the cluster, while filtering to only some roles / nodes avoids the error.

One possible explanation: once the number of instances exceeds some threshold, a session is first used long after it was created, by which time it has already been closed or has stalled.
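To illustrate the failure mode, here is a minimal Go sketch of a session cache that probes a cached session before handing it out and redials if the probe fails. It is not TiUP's code: `Session`, `Dialer`, `SessionCache`, and `fakeSession` are hypothetical names, and `Ping` stands in for whatever cheap round trip a real SSH session could offer.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Session is a hypothetical stand-in for a cached SSH session (not TiUP's type).
type Session interface {
	// Ping does a cheap round trip and returns an error if the session is unusable.
	Ping() error
	Close() error
}

// Dialer opens a fresh session to host.
type Dialer func(host string) (Session, error)

// SessionCache stores one session per host and verifies it before reuse.
type SessionCache struct {
	mu       sync.Mutex
	dial     Dialer
	sessions map[string]Session
}

func NewSessionCache(dial Dialer) *SessionCache {
	return &SessionCache{dial: dial, sessions: make(map[string]Session)}
}

// Get returns a live session for host. A cached session that fails its probe
// is closed, dropped, and replaced by a freshly dialed one.
// Simplification: the lock is held while dialing, so dials are serialized.
func (c *SessionCache) Get(host string) (Session, error) {
	c.mu.Lock()
	defer c.mu.Unlock()

	if s, ok := c.sessions[host]; ok {
		if err := s.Ping(); err == nil {
			return s, nil
		}
		_ = s.Close()
		delete(c.sessions, host)
	}
	s, err := c.dial(host)
	if err != nil {
		return nil, fmt.Errorf("dial %s: %w", host, err)
	}
	c.sessions[host] = s
	return s, nil
}

// fakeSession simulates a session that can go stale while idle.
type fakeSession struct {
	id    int
	stale bool
}

func (f *fakeSession) Ping() error {
	if f.stale {
		return errors.New("session closed")
	}
	return nil
}

func (f *fakeSession) Close() error { return nil }

func main() {
	dials := 0
	cache := NewSessionCache(func(host string) (Session, error) {
		dials++
		return &fakeSession{id: dials}, nil
	})

	s1, _ := cache.Get("10.0.0.1")
	s1.(*fakeSession).stale = true // simulate the session dying while idle
	s2, _ := cache.Get("10.0.0.1")

	fmt.Println("dials:", dials, "session id:", s2.(*fakeSession).id)
}
```

The probe adds one round trip per reuse, which may be acceptable if the alternative is a timeout in the middle of an operation.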

What did you expect to see?

We should find a better way of handling SSH sessions, perhaps by sending keepalive packets.

What did you see instead?

Some operations hit random timeout errors on large clusters.

What version of TiUP are you using (tiup --version)?

v1.0.8; this should also apply to v1.1.0.


AstroProfundis added the type/bug and category/stability labels Aug 29, 2020

AstroProfundis mentioned this issue Aug 29, 2020

Add an option to skip the forced cluster start operation during scaling out #729

Closed

july2993 added the status/WIP label Aug 31, 2020

lucklove added the status/TODO label and removed the status/WIP label Aug 31, 2020
