Prevent blocking forever when transport channel fails to open #11875
I could be wrong, but I would just backport to any versions where customers are currently hitting this. If it's an uncommon issue, just v9 should be fine.
Sounds good to me. For some context, I found this while working on some agent changes that would reject this channel in certain cases. Other than that, this should be very rare and only result in a small memory leak.
Looks good, thanks for the tests 👍
api/utils/sshutils/conn_test.go
Outdated
go func() {
	defer conn.Close()
	sconn, _, _, err := ssh.NewServerConn(conn, s.config)
	require.NoError(s.t, err)
Using require within a goroutine is a bad idea. If the assertion fails, it calls t.FailNow, which according to the docs shouldn't be called from spawned goroutines:
FailNow marks the function as having failed and stops its execution by calling runtime.Goexit (which then runs all deferred calls in the current goroutine). Execution will continue at the next test or benchmark. FailNow must be called from the goroutine running the test or benchmark function, not from other goroutines created during the test. Calling FailNow does not stop those other goroutines.
https://pkg.go.dev/testing#T.FailNow
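To make the quoted behavior concrete, here is a minimal sketch of what runtime.Goexit does when it is called from a spawned goroutine. It does not use the testing package; the function name goexitInGoroutine is made up for this illustration. It only shows that Goexit ends the goroutine that calls it (after running its deferred calls) and leaves the caller running, which is why FailNow in a spawned goroutine cannot stop the test function.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// goexitInGoroutine (hypothetical helper) spawns a goroutine that calls
// runtime.Goexit, the call the testing docs say FailNow relies on. It reports
// whether the goroutine's deferred call ran and whether the statement after
// Goexit was reached.
func goexitInGoroutine() (deferredRan, reachedAfter bool) {
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		defer func() { deferredRan = true }()
		runtime.Goexit()    // ends only this goroutine, after running its defers
		reachedAfter = true // never executed
	}()
	wg.Wait()
	return deferredRan, reachedAfter
}

func main() {
	deferredRan, reachedAfter := goexitInGoroutine()
	// The caller is still running here: Goexit did not stop it.
	fmt.Println("deferred ran:", deferredRan, "reached code after Goexit:", reachedAfter)
}
```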
Whoops, didn't realize this. Updated in a59074c
api/utils/sshutils/conn_test.go
Outdated
})

go server.Run()
defer server.Stop()
Suggested change:
- defer server.Stop()
+ t.Cleanup(func() { require.NoError(t, server.Stop()) })
api/utils/sshutils/conn_test.go
Outdated
defer server.Stop()

sconn, nc, _ := server.GetClient()
defer sconn.Close()
Suggested change:
- defer sconn.Close()
+ t.Cleanup(func() { require.NoError(t, sconn.Close()) })
api/utils/sshutils/conn_test.go
Outdated
require.Error(t, err)

sconn, nc, _ = server.GetClient()
defer sconn.Close()
Suggested change:
- defer sconn.Close()
+ t.Cleanup(func() { require.NoError(t, sconn.Close()) })
func generateSigner(t *testing.T) ssh.Signer {
	private, err := rsa.GenerateKey(rand.Reader, 2048)
	require.NoError(t, err)

	block := &pem.Block{
		Type:  "RSA PRIVATE KEY",
		Bytes: x509.MarshalPKCS1PrivateKey(private),
	}

	privatePEM := pem.EncodeToMemory(block)
	signer, err := ssh.ParsePrivateKey(privatePEM)
	require.NoError(t, err)

	return signer
}
We have a helper for that already
teleport/lib/auth/native/native.go
Line 152 in bb121d7
func GenerateKeyPair(passphrase string) ([]byte, []byte, error) {
I don't think we import anything from /lib into /api, so I'm leaving this as is.
api/utils/sshutils/conn_test.go
Outdated
s.mu.RLock()
if s.closed {
	s.mu.RUnlock()
	return
}
s.mu.RUnlock()
Do we really need the closed flag here? After you close the listener, you will get an EOF or connection closed error, which you can use to know that the listener has been closed. Example:
teleport/lib/srv/db/postgres/test.go
Line 116 in 06fef2a
if utils.IsOKNetworkError(err) {
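As a rough illustration of that idea, the sketch below runs an accept loop with no closed flag and no mutex, and stops when Accept reports that the listener was closed. It uses the standard library's net.ErrClosed with errors.Is rather than Teleport's utils.IsOKNetworkError, and the names acceptLoop and runAndStop are invented for this example, so treat it as one possible shape rather than the code in this PR.

```go
package main

import (
	"errors"
	"fmt"
	"net"
)

// acceptLoop (hypothetical) accepts connections until the listener is closed.
// It returns nil when Accept fails because the listener was closed, and the
// unexpected error otherwise. No separate closed flag is needed.
func acceptLoop(l net.Listener) error {
	for {
		conn, err := l.Accept()
		if err != nil {
			if errors.Is(err, net.ErrClosed) {
				return nil // listener closed: normal shutdown
			}
			return err
		}
		conn.Close() // a real server would hand conn to a handler here
	}
}

// runAndStop (hypothetical) starts acceptLoop on a loopback listener, closes
// the listener, and returns whatever the loop returned.
func runAndStop() error {
	l, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return err
	}
	done := make(chan error, 1)
	go func() { done <- acceptLoop(l) }()
	if err := l.Close(); err != nil {
		return err
	}
	return <-done
}

func main() {
	fmt.Println("accept loop result:", runAndStop())
}
```

The loop's result travels back over a buffered channel, so the caller can check it on its own goroutine.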
api/utils/sshutils/conn_test.go
Outdated
config  *ssh.ServerConfig
handler func(*ssh.ServerConn)
t       *testing.T
mu      sync.RWMutex
Currently it's hard to say what this mutex is protecting. From what I see, it's used for the closed flag. Can you rename it to myClosed, for example, to indicate that?
api/utils/sshutils/conn_test.go
Outdated
listener net.Listener
config   *ssh.ServerConfig
handler  func(*ssh.ServerConn)
t        *testing.T
nit: maybe this is my personal preference, but I'd remove t from here and just pass it as an argument to each function that needs it. I think the code is easier to understand if you know which function needs t, but if you want to keep it here I don't mind either.
api/utils/sshutils/conn_test.go
Outdated
go func() {
	defer conn.Close()
	sconn, _, _, err := ssh.NewServerConn(conn, s.config)
	assert.NoError(s.t, err)
assert or require are not going to work. You can only make assertions from the main goroutine. I think the best you can do there is to create a chan error and propagate the error from the goroutine. Example:
teleport/lib/srv/db/access_test.go
Lines 1196 to 1215 in 663e3d0
asyncErrors := make(chan error, concurrentConnections)
defer close(asyncErrors)
for i := 0; i < concurrentConnections; i++ {
	wg.Add(1)
	go func() {
		defer wg.Done()
		if err := increment("counter"); err != nil {
			asyncErrors <- err
		}
	}()
}
wg.Wait()
select {
case err := <-asyncErrors:
	require.FailNow(t, "failed to increment counter", err)
default:
}
Ah, didn't realize this, thanks. Updated here: a59074c
I changed this to use a channel, and now I understand not to use require since it calls FailNow under the hood. But assert calls Fail under the hood, which only marks the test as failed and lets execution continue, so it sounds like that should be safe to call from a goroutine as long as we wait for the goroutines to finish before the test ends.
I don't plan on changing this back to using assert; I'm just trying to understand if it would work.
#16506) Co-authored-by: David Boslee <[email protected]>
#16510) Co-authored-by: David Boslee <[email protected]>
This fixes an issue caused by calling ssh.DiscardRequests when an error occurs while opening a transport channel. This call would block forever. This scenario can occur when the channel is rejected or the connection closes while trying to open the channel. I've added tests to ensure these scenarios no longer block.
This was added in #5625, which was first introduced in Teleport v7 and backported to v6.
Let me know what versions I should backport this fix to.