-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
server: close net.Conn unconditionally after accepting #106072
Conversation
It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR? 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 3 files at r1, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ecwall)
pkg/sql/pgwire/pre_serve.go
line 220 at r1 (raw file):
_ = s.sendErr(ctx, s.st, conn, err) _ = conn.Close()
If you remove this one you need to edit the code in pkg/server/server_controller_sql.go
to close the conn after sendSQLRoutingError
is called (on the error path).
pkg/sql/pgwire/pre_serve.go
line 332 at r1 (raw file):
} else { return conn, st, errors.New("cannot read cancel key") }
I'm noticing there is a bug here from when I refactored this:
when readCancelKey
returns without error, the code here should still return
inconditionally. The code under the switch
becomes invalid.
When the conn.Close
was still there (before your change) the bug was invisible, it would just result in some bogus error in the first read below the switch
. But now you're not closing the conn any more the bug becomes apparent.
pkg/sql/pgwire/pre_serve.go
line 379 at r1 (raw file):
State: PreServeCancel, CancelKey: key, }, nil
ditto cancel
pkg/util/netutil/net.go
line 187 at r1 (raw file):
err := s.stopper.RunAsyncTask(ctx, "tcp-serve", func(ctx context.Context) { defer func() { _ = rw.Close()
Are you sure about this? Why does this need to change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @knz)
pkg/sql/pgwire/pre_serve.go
line 220 at r1 (raw file):
Previously, knz (Raphael 'kena' Poss) wrote…
If you remove this one you need to edit the code in
pkg/server/server_controller_sql.go
to close the conn aftersendSQLRoutingError
is called (on the error path).
AFAICT the conn net.Conn
param of serverController.sqlMux
always comes from TCPServer.ServeWith
so it will be closed after the serveConn func(context.Context, net.Conn)
callback completes.
pkg/sql/pgwire/pre_serve.go
line 332 at r1 (raw file):
Previously, knz (Raphael 'kena' Poss) wrote…
I'm noticing there is a bug here from when I refactored this:
whenreadCancelKey
returns without error, the code here should stillreturn
inconditionally. The code under theswitch
becomes invalid.When the
conn.Close
was still there (before your change) the bug was invisible, it would just result in some bogus error in the first read below theswitch
. But now you're not closing the conn any more the bug becomes apparent.
Maybe I am misunderstanding, but the code under the switch will never run because either the if
returns or the else
returns. I can remove the else
if you like, but that is just a code style change.
pkg/util/netutil/net.go
line 187 at r1 (raw file):
Previously, knz (Raphael 'kena' Poss) wrote…
Are you sure about this? Why does this need to change?
I was having trouble following where the connection was being closed because it happens in many different layers in different branches (it is also hard to tell if every single branch ends up closing it somewhere).
I think we can always do it here instead and the various layers inside serveConn(ctx, rw)
just need to return in order for the connection to end up closing instead of being coupled to closing it directly.
There are other resources we create in the server that I think could follow the same pattern to reduce the chance of it not being closed properly (either a branch is missing the resource.close()
call or panic happens - the defer will be called regardless) and to reduce coupling between the various layers of useResource(resource)
and the resource's lifecycle.
resource := createResource()
defer resource.close()
useResource(resource)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did a double take the first time around but thanks to Evan's explanation all became clear.
LGTM - but @rafiss pls have a look too.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ecwall)
pkg/sql/pgwire/pre_serve.go
line 332 at r1 (raw file):
Previously, ecwall (Evan Wall) wrote…
Maybe I am misunderstanding, but the code under the switch will never run because either the
if
returns or theelse
returns. I can remove theelse
if you like, but that is just a code style change.
lol you're right. thanks.
pkg/util/netutil/net.go
line 187 at r1 (raw file):
Previously, ecwall (Evan Wall) wrote…
I was having trouble following where the connection was being closed because it happens in many different layers in different branches (it is also hard to tell if every single branch ends up closing it somewhere).
I think we can always do it here instead and the various layers inside
serveConn(ctx, rw)
just need to return in order for the connection to end up closing instead of being coupled to closing it directly.There are other resources we create in the server that I think could follow the same pattern to reduce the chance of it not being closed properly (either a branch is missing the
resource.close()
call or panic happens - the defer will be called regardless) and to reduce coupling between the various layers ofuseResource(resource)
and the resource's lifecycle.resource := createResource() defer resource.close() useResource(resource)
thanks for explaining.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ecwall and @knz)
pkg/sql/pgwire/server.go
line 1263 at r1 (raw file):
telemetry.Inc(sqltelemetry.CancelRequestCounter) backendKeyDataBits, err := buf.GetUint64() // The connection that issued the cancel is not a SQL session -- it's an
i think this one might be an undesired change. the idea here was that for cancel requests, we immediately close the connection before processing the request. the reason is that the pgwire cancel protocol is unauthenticated, so we don't want to reveal whether the request is being processed or not, which after this change can be inferred based on how long it takes for the server to close the connection
Informs #105448 Previously after the server accepted a connection, it was closed in multiple paths and in multiple layers after it was done being used. Now it is always closed in the same layer that accepted it after the serveConn callback returns. Release note: None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @knz and @rafiss)
pkg/sql/pgwire/server.go
line 1263 at r1 (raw file):
Previously, rafiss (Rafi Shamim) wrote…
i think this one might be an undesired change. the idea here was that for cancel requests, we immediately close the connection before processing the request. the reason is that the pgwire cancel protocol is unauthenticated, so we don't want to reveal whether the request is being processed or not, which after this change can be inferred based on how long it takes for the server to close the connection
That's good to know. It looks like serverController.sqlMux()
was already handling this by wrapping the cancelations in RunAsyncTask
. I changed SQLServerWrapper.serveConn()
to match this behavior and added comments to each based on what you wrote.
IMO this is better than calling conn.close()
directly because it achieves the same thing without being tied to the connection lifecycle.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ty!
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ecwall and @knz)
bors r=rafiss,knz |
Build succeeded: |
Informs #105448
Previously after the server accepted a connection, it was closed in multiple paths and in multiple layers after it was done being used. Now it is always closed in the same layer that accepted it after the serveConn callback returns.
Release note: None