SSH session may be broken or closed during the operation #730
Labels
category/stability
Categorizes issue or PR as a stability enhancement.
status/TODO
Categorizes issue as we will do it.
type/bug
Categorizes issue as related to a bug.
Bug Report
We store SSH sessions for hosts in the context and reuse them for every SSH operation, but a session can be closed or otherwise broken before we finish running all commands.
In one case, when ControlMaster and ControlPath are set in the SSH config of the console server, random timeout errors may be observed during operations such as start on the whole cluster, while filtering to only some roles / nodes avoids the error. It's possible that when there are more instances than some threshold, an SSH session is first used too long after it was created, and by that time it has already been closed or has stalled.
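One way to guard against the stale-session case described above is to probe a cached session before reusing it and redial if the probe fails. The following is only a minimal sketch of that idea, not TiUP's actual code: the conn interface, its Ping method, the pool type, and stubConn are all hypothetical names introduced for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// conn is a hypothetical minimal view of a cached SSH connection.
type conn interface {
	Ping() error  // cheap liveness probe, e.g. a keepalive request
	Close() error // releases the underlying connection
}

// pool caches one connection per host and validates it before reuse.
type pool struct {
	mu    sync.Mutex
	conns map[string]conn
	dial  func(host string) (conn, error) // assumed to open a fresh connection
}

// get returns a live connection for host, redialing if the cached one
// no longer answers the probe. The lock is held during the probe and
// the dial, which keeps the sketch simple but serializes callers.
func (p *pool) get(host string) (conn, error) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if c, ok := p.conns[host]; ok {
		if c.Ping() == nil {
			return c, nil
		}
		c.Close() // best effort; the connection is already considered dead
		delete(p.conns, host)
	}
	c, err := p.dial(host)
	if err != nil {
		return nil, fmt.Errorf("dial %s: %w", host, err)
	}
	p.conns[host] = c
	return c, nil
}

// stubConn stands in for a real SSH connection in this demo.
type stubConn struct {
	dead   bool
	closed bool
}

func (s *stubConn) Ping() error {
	if s.dead {
		return errors.New("connection closed")
	}
	return nil
}

func (s *stubConn) Close() error { s.closed = true; return nil }

func main() {
	dials := 0
	p := &pool{
		conns: map[string]conn{},
		dial: func(string) (conn, error) {
			dials++
			return &stubConn{}, nil
		},
	}
	c, _ := p.get("host-1")
	c.(*stubConn).dead = true // simulate the session breaking while idle
	_, err := p.get("host-1") // probe fails, so the pool redials
	fmt.Println("dials:", dials, "err:", err)
}
```

A probe on every reuse costs one extra round trip per operation, so a real implementation might only probe sessions that have been idle for longer than some interval.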
What did you expect to see?
We should find a better way to handle SSH sessions, perhaps by sending keepalive packets.
What did you see instead?
Some operations fail with random timeout errors on large clusters.
What version of TiUP are you using (tiup --version)?
v1.0.8; the issue should apply to v1.1.0 as well.