Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent blocking forever when transport channel fails to open #11875

Merged
merged 15 commits into from
Apr 20, 2022

Conversation

dboslee
Copy link
Contributor

@dboslee dboslee commented Apr 11, 2022

This was an issue caused by calling ssh.DiscardRequests when an error occurs while opening a transport channel. This call would block forever. This scenario can occur when the channel is rejected or the connection closes while trying to open the channel.

I've added tests to ensure these scenarios are no longer blocking.

This was added in #5625 Which was first introduced in teleport v7 and backported to v6.

Let me know what versions I should backport this fix to.

@github-actions github-actions bot requested review from jakule and timothyb89 April 11, 2022 16:11
@dboslee dboslee requested a review from Joerger April 11, 2022 16:12
@dboslee dboslee self-assigned this Apr 11, 2022
@Joerger
Copy link
Contributor

Joerger commented Apr 11, 2022

I could be wrong, but I would just backport to any versions where customers are currently hitting this. If it's an uncommon issue, just v9 should be fine.

@dboslee
Copy link
Contributor Author

dboslee commented Apr 11, 2022

I could be wrong, but I would just backport to any versions where customers are currently hitting this. If it's an uncommon issue, just v9 should be fine.

Sounds good to me. For some context I found this while working on some agent changes that would reject this channel in certain cases. Other than that this should be very rare and only result in a small memory leak.

@pierrebeaucamp pierrebeaucamp added bug blocked is blocked by another item - please include the blocker labels Apr 14, 2022
Copy link
Contributor

@hatched hatched left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks for the tests 👍

go func() {
defer conn.Close()
sconn, _, _, err := ssh.NewServerConn(conn, s.config)
require.NoError(s.t, err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using require within a goroutine is a bad idea. If the assertion fails it calls t.FailNow and according to the docs shouldn't be called from spawned goroutines:

FailNow marks the function as having failed and stops its execution by calling runtime.Goexit (which then runs all deferred calls in the current goroutine). Execution will continue at the next test or benchmark. FailNow must be called from the goroutine running the test or benchmark function, not from other goroutines created during the test. Calling FailNow does not stop those other goroutines.
https://pkg.go.dev/testing#T.FailNow

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops didn't realize this. Updated a59074c

})

go server.Run()
defer server.Stop()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
defer server.Stop()
t.Cleanup(func() {require.NoError(t, server.Stop()})

defer server.Stop()

sconn, nc, _ := server.GetClient()
defer sconn.Close()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
defer sconn.Close()
t.Cleanup(func() {require.NoError(t, sconn.Close()})

require.Error(t, err)

sconn, nc, _ = server.GetClient()
defer sconn.Close()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
defer sconn.Close()
t.Cleanup(func() { require.NoError(t, sconn.Close()})

Comment on lines +75 to +89
func generateSigner(t *testing.T) ssh.Signer {
private, err := rsa.GenerateKey(rand.Reader, 2048)
require.NoError(t, err)

block := &pem.Block{
Type: "RSA PRIVATE KEY",
Bytes: x509.MarshalPKCS1PrivateKey(private),
}

privatePEM := pem.EncodeToMemory(block)
signer, err := ssh.ParsePrivateKey(privatePEM)
require.NoError(t, err)

return signer
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a helper for that already

func GenerateKeyPair(passphrase string) ([]byte, []byte, error) {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we import any thing from /lib into /api so leaving this as is.

Comment on lines 50 to 55
s.mu.RLock()
if s.closed {
s.mu.RUnlock()
return
}
s.mu.RUnlock()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need the closed flag here? After you close the listener you will get EOF or connection closed error which you can use to know that the listener has been closed. Example:

if utils.IsOKNetworkError(err) {

config *ssh.ServerConfig
handler func(*ssh.ServerConn)
t *testing.T
mu sync.RWMutex
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently it's hard to say what this mutex is protecting. From what I see it's used for closed flag. Can you rename it to myClosed for example to indicate that?

listener net.Listener
config *ssh.ServerConfig
handler func(*ssh.ServerConn)
t *testing.T
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: maybe this is my personal preference but I'd remove t from here and just pass it as an argument to each function that needs it. I think it's easier to understand the code if you know which function needs t, but if you want to keep it here I don't mind either.

go func() {
defer conn.Close()
sconn, _, _, err := ssh.NewServerConn(conn, s.config)
assert.NoError(s.t, err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assert or require are not going to work. You can only make assertion from the main goroutine. I think that the best that you can do there is to create an chan error and propagate the error from the goroutine. Example

asyncErrors := make(chan error, concurrentConnections)
defer close(asyncErrors)
for i := 0; i < concurrentConnections; i++ {
wg.Add(1)
go func() {
defer wg.Done()
if err := increment("counter"); err != nil {
asyncErrors <- err
}
}()
}
wg.Wait()
select {
case err := <-asyncErrors:
require.FailNow(t, "failed to increment counter", err)
default:
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah didn't realize this thanks. Updated here a59074c

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed this to use a channel and now I understand not to use require since it calls FailNow under the hood. But using assert calls Fail under the hood which waits for the test to return so it sounds like that should be safe to call from a goroutine as long as we wait for goroutines to finish before the test ends.

I don't plan on changing this back to using assert just trying to understand if it would work.

@dboslee dboslee requested review from jakule and rosstimothy April 14, 2022 23:16
@dboslee dboslee enabled auto-merge (squash) April 18, 2022 21:13
@dboslee dboslee disabled auto-merge April 19, 2022 20:09
@dboslee dboslee merged commit 9d6b093 into master Apr 20, 2022
@dboslee dboslee deleted the david/fix-ssh-reject branch April 20, 2022 15:26
@pierrebeaucamp pierrebeaucamp added blocked is blocked by another item - please include the blocker and removed blocked is blocked by another item - please include the blocker labels Apr 20, 2022
@pierrebeaucamp pierrebeaucamp removed the blocked is blocked by another item - please include the blocker label Apr 28, 2022
@webvictim webvictim mentioned this pull request Jun 8, 2022
espadolini added a commit that referenced this pull request Sep 19, 2022
espadolini added a commit that referenced this pull request Sep 19, 2022
espadolini added a commit that referenced this pull request Sep 19, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants