Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

server: close net.Conn unconditionally after accepting #106072

Merged
merged 1 commit into from
Jul 7, 2023
Merged

server: close net.Conn unconditionally after accepting #106072

merged 1 commit into from
Jul 7, 2023

Conversation

ecwall
Copy link
Contributor

@ecwall ecwall commented Jul 3, 2023

Informs #105448

Previously after the server accepted a connection, it was closed in multiple paths and in multiple layers after it was done being used. Now it is always closed in the same layer that accepted it after the serveConn callback returns.

Release note: None

@blathers-crl
Copy link

blathers-crl bot commented Jul 3, 2023

It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR?

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@ecwall ecwall marked this pull request as ready for review July 5, 2023 19:45
@ecwall ecwall requested review from a team as code owners July 5, 2023 19:45
Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 3 files at r1, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ecwall)


pkg/sql/pgwire/pre_serve.go line 220 at r1 (raw file):

	_ = s.sendErr(ctx, s.st, conn, err)
	_ = conn.Close()

If you remove this one you need to edit the code in pkg/server/server_controller_sql.go to close the conn after sendSQLRoutingError is called (on the error path).


pkg/sql/pgwire/pre_serve.go line 332 at r1 (raw file):

		} else {
			return conn, st, errors.New("cannot read cancel key")
		}

I'm noticing there is a bug here from when I refactored this:
when readCancelKey returns without error, the code here should still return inconditionally. The code under the switch becomes invalid.

When the conn.Close was still there (before your change) the bug was invisible, it would just result in some bogus error in the first read below the switch. But now you're not closing the conn any more the bug becomes apparent.


pkg/sql/pgwire/pre_serve.go line 379 at r1 (raw file):

				State:     PreServeCancel,
				CancelKey: key,
			}, nil

ditto cancel


pkg/util/netutil/net.go line 187 at r1 (raw file):

		err := s.stopper.RunAsyncTask(ctx, "tcp-serve", func(ctx context.Context) {
			defer func() {
				_ = rw.Close()

Are you sure about this? Why does this need to change?

Copy link
Contributor Author

@ecwall ecwall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @knz)


pkg/sql/pgwire/pre_serve.go line 220 at r1 (raw file):

Previously, knz (Raphael 'kena' Poss) wrote…

If you remove this one you need to edit the code in pkg/server/server_controller_sql.go to close the conn after sendSQLRoutingError is called (on the error path).

AFAICT the conn net.Conn param of serverController.sqlMux always comes from TCPServer.ServeWith so it will be closed after the serveConn func(context.Context, net.Conn) callback completes.


pkg/sql/pgwire/pre_serve.go line 332 at r1 (raw file):

Previously, knz (Raphael 'kena' Poss) wrote…

I'm noticing there is a bug here from when I refactored this:
when readCancelKey returns without error, the code here should still return inconditionally. The code under the switch becomes invalid.

When the conn.Close was still there (before your change) the bug was invisible, it would just result in some bogus error in the first read below the switch. But now you're not closing the conn any more the bug becomes apparent.

Maybe I am misunderstanding, but the code under the switch will never run because either the if returns or the else returns. I can remove the else if you like, but that is just a code style change.


pkg/util/netutil/net.go line 187 at r1 (raw file):

Previously, knz (Raphael 'kena' Poss) wrote…

Are you sure about this? Why does this need to change?

I was having trouble following where the connection was being closed because it happens in many different layers in different branches (it is also hard to tell if every single branch ends up closing it somewhere).

I think we can always do it here instead and the various layers inside serveConn(ctx, rw) just need to return in order for the connection to end up closing instead of being coupled to closing it directly.

There are other resources we create in the server that I think could follow the same pattern to reduce the chance of it not being closed properly (either a branch is missing the resource.close() call or panic happens - the defer will be called regardless) and to reduce coupling between the various layers of useResource(resource) and the resource's lifecycle.

resource := createResource()
defer resource.close()
useResource(resource)

@ecwall ecwall requested a review from knz July 7, 2023 12:36
Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a double take the first time around but thanks to Evan's explanation all became clear.
LGTM - but @rafiss pls have a look too.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ecwall)


pkg/sql/pgwire/pre_serve.go line 332 at r1 (raw file):

Previously, ecwall (Evan Wall) wrote…

Maybe I am misunderstanding, but the code under the switch will never run because either the if returns or the else returns. I can remove the else if you like, but that is just a code style change.

lol you're right. thanks.


pkg/util/netutil/net.go line 187 at r1 (raw file):

Previously, ecwall (Evan Wall) wrote…

I was having trouble following where the connection was being closed because it happens in many different layers in different branches (it is also hard to tell if every single branch ends up closing it somewhere).

I think we can always do it here instead and the various layers inside serveConn(ctx, rw) just need to return in order for the connection to end up closing instead of being coupled to closing it directly.

There are other resources we create in the server that I think could follow the same pattern to reduce the chance of it not being closed properly (either a branch is missing the resource.close() call or panic happens - the defer will be called regardless) and to reduce coupling between the various layers of useResource(resource) and the resource's lifecycle.

resource := createResource()
defer resource.close()
useResource(resource)

thanks for explaining.

@ecwall ecwall requested a review from rafiss July 7, 2023 14:15
Copy link
Collaborator

@rafiss rafiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ecwall and @knz)


pkg/sql/pgwire/server.go line 1263 at r1 (raw file):

	telemetry.Inc(sqltelemetry.CancelRequestCounter)
	backendKeyDataBits, err := buf.GetUint64()
	// The connection that issued the cancel is not a SQL session -- it's an

i think this one might be an undesired change. the idea here was that for cancel requests, we immediately close the connection before processing the request. the reason is that the pgwire cancel protocol is unauthenticated, so we don't want to reveal whether the request is being processed or not, which after this change can be inferred based on how long it takes for the server to close the connection

Informs #105448

Previously after the server accepted a connection, it was closed in multiple
paths and in multiple layers after it was done being used. Now it is always
closed in the same layer that accepted it after the serveConn callback returns.

Release note: None
@ecwall ecwall requested a review from a team July 7, 2023 18:08
@ecwall ecwall requested a review from a team as a code owner July 7, 2023 18:08
Copy link
Contributor Author

@ecwall ecwall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @knz and @rafiss)


pkg/sql/pgwire/server.go line 1263 at r1 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

i think this one might be an undesired change. the idea here was that for cancel requests, we immediately close the connection before processing the request. the reason is that the pgwire cancel protocol is unauthenticated, so we don't want to reveal whether the request is being processed or not, which after this change can be inferred based on how long it takes for the server to close the connection

That's good to know. It looks like serverController.sqlMux() was already handling this by wrapping the cancelations in RunAsyncTask. I changed SQLServerWrapper.serveConn() to match this behavior and added comments to each based on what you wrote.

IMO this is better than calling conn.close() directly because it achieves the same thing without being tied to the connection lifecycle.

@ecwall ecwall requested a review from rafiss July 7, 2023 18:40
Copy link
Collaborator

@rafiss rafiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ty!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ecwall and @knz)

@ecwall
Copy link
Contributor Author

ecwall commented Jul 7, 2023

bors r=rafiss,knz

@craig craig bot merged commit 38cd5c4 into cockroachdb:master Jul 7, 2023
@craig
Copy link
Contributor

craig bot commented Jul 7, 2023

Build succeeded:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants