client: synchronously verify server preface in newClientTransport #5731
// OnceFunc returns a function wrapping f which ensures f is only executed
// once even if the returned function is executed multiple times.
func OnceFunc(f func()) func() {
I understand this is super trivial, but given that it is exported and can be used from non-test code, adding a test would be good.
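For illustration, a minimal sketch of what such a helper and a concurrency check for it could look like. The `sync.Once`-based body and the `countCalls` helper are assumptions made for this example, not the code from the PR.

```go
package main

import (
	"fmt"
	"sync"
)

// OnceFunc returns a function wrapping f which ensures f is only executed
// once even if the returned function is executed multiple times.
// Assumption: a sync.Once-based body; the PR's actual body is not shown here.
func OnceFunc(f func()) func() {
	var once sync.Once
	return func() { once.Do(f) }
}

// countCalls (hypothetical helper) invokes the wrapped function from n
// goroutines and reports how many times the underlying function ran.
func countCalls(n int) int {
	v := 0
	wrapped := OnceFunc(func() { v++ })
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			wrapped()
		}()
	}
	wg.Wait() // All callers have returned, so reading v is safe.
	return v
}

func main() {
	fmt.Println("f ran", countCalls(10), "time(s)")
}
```

A test in the package would assert that the count is exactly 1 after all goroutines finish.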
internal/transport/http2_client.go
Outdated
		return
	}
	t.onPrefaceReceipt()
	close(errCh) // received settings frame
Is closing this channel here any different from pushing a `nil` error? I believe not. Should we push a `nil` error here instead of closing?
🤷
They're equivalent. I'm okay either way but this seems fine and I see no compelling arguments in favor of pushing a `nil` explicitly. Is there a better primitive for this purpose than a channel, perhaps?
WDYT about this (see diffs) refactor into another function?
internal/transport/http2_client.go
Outdated
// Monitor context; close connection if expired or canceled before returning.
ctxMonitorDone := grpcsync.NewEvent()
newClientCtx, newClientDone := context.WithCancel(connectCtx)
defer func() {
	newClientDone()
	// Wait for the goroutine to exit. If we do not wait before returning,
	// the caller could cancel the connectCtx after we return, but we might
	// see this and close the connection.
	<-ctxMonitorDone.Done()
}()
go func(conn net.Conn) {
	defer ctxMonitorDone.Fire() // Signal this goroutine has exited.
	<-newClientCtx.Done()
	if connectCtx.Err() == nil {
		// Only newClientCtx was canceled; success.
		return
	}
	// connectCtx expired. Hard close the connection.
	conn.Close()
}(conn)
This block of code was not easy to read and understand. All your comments make absolute sense now that I understand what the code is doing, but they did not help me initially to understand what is happening in here and why.
In fact, my other comment (and concern) about the deadline not being set on the underlying conn is cleared now that I understand what is happening in here. But I think if I come back to this piece of code after 6 months, I will again spend a lot of time trying to understand it.
Could you please be a little more verbose in the comments here? Thanks.
Done?
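As a reading aid, the monitor pattern in this thread can be sketched in isolation. This is a standalone approximation under stated assumptions: `withCtxMonitor`, `expiredDuringSetup`, and `canceledAfterSetup` are hypothetical names, a plain channel stands in for `grpcsync.Event`, and `net.Pipe` stands in for a real connection.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net"
	"os"
	"time"
)

// withCtxMonitor (hypothetical) runs setup while a goroutine watches
// connectCtx. If connectCtx expires during setup, conn is closed, which
// unblocks any pending I/O. Once this function returns, the monitor has
// exited, so a later cancel of connectCtx can no longer close conn.
func withCtxMonitor(connectCtx context.Context, conn net.Conn, setup func() error) error {
	monitorDone := make(chan struct{}) // Stand-in for grpcsync.Event.
	setupCtx, setupDone := context.WithCancel(connectCtx)
	defer func() {
		setupDone()   // Tell the monitor that setup is over.
		<-monitorDone // Wait for the monitor goroutine to exit.
	}()
	go func() {
		defer close(monitorDone)
		<-setupCtx.Done()
		if connectCtx.Err() == nil {
			return // Only setupCtx was canceled; setup finished.
		}
		conn.Close() // connectCtx expired or was canceled during setup.
	}()
	return setup()
}

// expiredDuringSetup blocks in a read until the monitor closes the conn.
func expiredDuringSetup() error {
	client, server := net.Pipe()
	defer server.Close()
	defer client.Close()
	ctx, cancel := context.WithTimeout(context.Background(), 50*time.Millisecond)
	defer cancel()
	return withCtxMonitor(ctx, client, func() error {
		_, err := client.Read(make([]byte, 1)) // No writer: blocks until closed.
		return err
	})
}

// canceledAfterSetup cancels only after returning and then checks that the
// conn is still open: a read should time out rather than report a closed pipe.
func canceledAfterSetup() error {
	client, server := net.Pipe()
	defer server.Close()
	defer client.Close()
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	if err := withCtxMonitor(ctx, client, func() error { return nil }); err != nil {
		return err
	}
	cancel() // Late cancel: must not close the connection.
	client.SetReadDeadline(time.Now().Add(20 * time.Millisecond))
	_, err := client.Read(make([]byte, 1))
	if errors.Is(err, os.ErrDeadlineExceeded) {
		return nil // Timed out, so the conn was still open.
	}
	return err
}

func main() {
	fmt.Println("expired during setup:", expiredDuringSetup())
	fmt.Println("canceled after setup:", canceledAfterSetup())
}
```

Waiting for the monitor in the deferred function is what rules out the race described in the code comment: without it, the goroutine could still be running when the caller cancels `connectCtx` after a successful return.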
Thanks for the comments. Responses inline.
internal/grpcsync/oncefunc_test.go
Outdated
block.Fire() // Unblock them.
wg.Wait()    // Wait for them to complete.
if v != 1 {
	t.Fatalf("v = %v; want 1", v)
Nit:
t.Fatalf("OnceFunc() called %v times, want 1", v)
Done
internal/transport/transport_test.go
Outdated
_, err := sfr.ReadFrame()
if err != nil {
Nit: combine the two lines.
Done
internal/transport/http2_client.go
Outdated
func (t *http2Client) reader(errCh chan<- error) {
	defer close(t.readerDone)

	t.readServerPreface(errCh)
Should we instead make `readServerPreface()` return an error and handle it here (push it on to the channel and return)? Currently, `readServerPreface()` might have run into an error and pushed something on to the channel. But since we don't have a return value from it, we continue in this function instead of returning early.
I mostly liked it this way because it could defer the channel closure. But we probably should be returning early from this function even though it's okay not to (it will exit when a connection error occurs). Maybe it's not that useful to be a separate function, but I'll leave it. Reworked.
internal/transport/transport_test.go
Outdated
copts := ConnectOptions{ChannelzParentID: channelz.NewIdentifierForTesting(channelz.RefSubChannel, time.Now().Unix(), nil)}
_, err = NewClientTransport(connectCtx, context.Background(), resolver.Address{Addr: lis.Addr().String()}, copts, func(GoAwayReason) {}, func() {})
t.Logf("NewClientTransport() = _, %v", err)
if time.Now().Sub(timeBefore) > 2*time.Second {
Where do the `2s` and `1.5s` come from? The `connectCtx` is canceled in `100ms`.
Fixed to say "1.9". I tried different values here to balance the test running quickly with not having false failures/successes.
I still don't understand why we want to be so generous with the time it takes to return from `NewClientTransport`. We cancel the context in `100ms`. Do we really have to give it `2s`?
The tighter you make this check, the more likely you are to have a spurious failure caused by it taking too long to exit, which could be due to a hiccup on GitHub Actions, etc. Ultimately, if we require that it is less than the deadline (3s) then we should be able to distinguish the reason it exited. And actually, I'll extend the deadline and loosen this slightly since we're not concerned about how long it takes to work if it fails.
And, since there is no secondary cancelation mechanism in the second case, I just removed the time check entirely; the only failure is if no error is returned or if it never returns and times out because it's not honoring the context.
internal/transport/transport_test.go
Outdated
_, err = NewClientTransport(connectCtx, context.Background(), resolver.Address{Addr: lis.Addr().String()}, copts, func(GoAwayReason) {}, func() {})
t.Logf("NewClientTransport() = _, %v", err)
if time.Now().Sub(timeBefore) > 2*time.Second {
	t.Fatalf("NewClientTransport returned > 1.5s after context deadline")
Same comments for this block.
Fixed this, too. Couldn't find this comment in the diffs view for some reason.
internal/transport/transport_test.go
Outdated
// Test context cancelation.
timeBefore := time.Now()
connectCtx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
Could we please use `defaultTestTimeout`, which is defined for this package, instead of a hard-coded value? We usually end up with different values in different tests over time. Thanks.
Done. If it ever changes to a value less than 3s then this test becomes broken, though, so it is more sensitive to the value than most other tests.
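Fixes #5385

Adds `grpcsync.OnceFunc`, which is a pattern I also needed in the ORCA PR (orca: create ORCA producer for LB policies to use to receive OOB load reports #5669).

Makes it easier to have `ClientConn.Close()` block until all resources are released: previously, if a connection was in flight it would take 20+s for it to stop, which would/should delay that return. With these changes it will stop very quickly.

RELEASE NOTES: none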
block until all resources are released easier -- previously, if a connection was in flight it would take 20+s for it to stop, which would/should delay that return. With these changes it will stop very quickly.RELEASE NOTES: none