Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rpc,*: sanitize the configuration of rpc.Context #107868

Closed
wants to merge 0 commits into from

Conversation

knz
Copy link
Contributor

@knz knz commented Jul 30, 2023

The main goal of this change is to offer a .RPCClientConn() method
on test servers.

To achieve this, it was necessary to lift the code previously in
cli.getClientGRPCConn into a more reusable version of it,
now hosted in rpc.NewClientContext().

I also took the opportunity to remove the dependency of rpc.Context
on base.Config, by spelling out precisely which fields are necessary
to RPC connections.

Numerous tests could be simplified as a result.

Needed for #107866.
Epic: CRDB-18499

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@knz knz force-pushed the 20230730-rpc-config branch from b64eb95 to 652d0c9 Compare July 30, 2023 21:59
@knz knz requested review from aliher1911 and yuzefovich July 30, 2023 21:59
@knz knz marked this pull request as ready for review July 30, 2023 21:59
@knz knz requested a review from a team July 30, 2023 21:59
@knz knz requested review from a team as code owners July 30, 2023 21:59
@knz knz requested a review from a team July 30, 2023 21:59
@knz knz requested review from a team as code owners July 30, 2023 21:59
@knz knz requested review from nkodali, herkolategan and smg260 and removed request for a team July 30, 2023 21:59
@knz knz force-pushed the 20230730-rpc-config branch 3 times, most recently from 357ce25 to 8b4747d Compare July 31, 2023 02:09
@knz knz added the db-cy-23 label Jul 31, 2023
@knz knz requested a review from stevendanna July 31, 2023 11:47
Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 3 files at r1, 7 of 7 files at r5, 58 of 58 files at r6, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aliher1911, @herkolategan, @knz, @nkodali, @smg260, and @stevendanna)


pkg/base/config.go line 350 at r6 (raw file):

	// It is set after the server instance has been created, when the
	// network listeners are being set up.
	AdvertiseAddrH

For my own edification: what does H stand for? "Host"?


pkg/cli/rpc_client.go line 82 at r4 (raw file):

}

func addrWithDefaultHost(addr string) (string, error) {

nit: the removal of addrWithDefaultHost should be in the first commit, not the second.


pkg/cli/rpc_client.go line 28 at r5 (raw file):

)

// getClientGRPCConn returns a ClientConn, a Clock and a method that blocks

nit: the comment on getClientGRPCConn needs an update in the first commit.

Update: oh, it's removed in the third commit, up to you.


pkg/cli/democluster/demo_cluster.go line 1077 at r6 (raw file):

	// Find a server to use to send the Decommission request.
	for _, s := range c.servers {
		if s.adminClient != nil && s.nodeID != roachpb.NodeID(nodeID) {

It seems like previously we would get an admin client to the first node which might be the same as nodeID, right? So in a sense we're fixing a bug here - should it be extracted in a separate commit then?


pkg/cli/democluster/demo_cluster.go line 1183 at r6 (raw file):

	conn, err := serv.RPCClientConnE(username.RootUserName())
	if err != nil {
		serv.Stopper().Stop(context.Background())

nit: we do have ctx in scope.


pkg/rpc/client.go line 159 at r6 (raw file):

	addr, err := addr.AddrWithDefaultLocalhost(cfg.ServerAddr)
	if err != nil {
		stopper.Stop(ctx)

nit: perhaps extract Stop into the unconditional deferred function that Stops the stopper in all error cases.


pkg/rpc/context.go line 319 at r6 (raw file):

	// The following fields come from base.Config and only needed by
	// SecurityContextOptions. They are not really used by the 'rpc'
	// code. They really belong elsehere - see comments in

nit: s/elsehere/elsewhere/g.


pkg/server/testserver.go line 2345 at r6 (raw file):

) *rpc.Context {
	ctx = logtags.AddTag(ctx, "testclient", nil)
	ctx = logtags.AddTag(ctx, "user", user)

The original code also added nodeID as the tag - should we add SQL instance ID here?


pkg/server/testserver.go line 2362 at r6 (raw file):

	ccfg := rpc.MakeClientConnConfigFromBaseConfig(*cfg,
		user,
		nil, /* tracer */

Why do we not pass the tracer from s? This probably deserves a comment.

Copy link
Contributor Author

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aliher1911, @herkolategan, @nkodali, @smg260, @stevendanna, and @yuzefovich)


pkg/base/config.go line 350 at r6 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

For my own edification: what does H stand for? "Host"?

Handle.
I could also call it "container". Unsure what's best.


pkg/cli/rpc_client.go line 82 at r4 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: the removal of addrWithDefaultHost should be in the first commit, not the second.

Done.


pkg/cli/rpc_client.go line 28 at r5 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: the comment on getClientGRPCConn needs an update in the first commit.

Update: oh, it's removed in the third commit, up to you.

Done.


pkg/cli/democluster/demo_cluster.go line 1077 at r6 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

It seems like previously we would get an admin client to the first node which might be the same as nodeID, right? So in a sense we're fixing a bug here - should it be extracted in a separate commit then?

The firstServer is still special-cased elsewhere so i think the bug remains hidden. But point well taken. I've extracted it in a separate commit.


pkg/cli/democluster/demo_cluster.go line 1183 at r6 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: we do have ctx in scope.

Fixed.


pkg/rpc/client.go line 159 at r6 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: perhaps extract Stop into the unconditional deferred function that Stops the stopper in all error cases.

Good idea. Done.


pkg/rpc/context.go line 319 at r6 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: s/elsehere/elsewhere/g.

Fixed.


pkg/server/testserver.go line 2345 at r6 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

The original code also added nodeID as the tag - should we add SQL instance ID here?

Done.


pkg/server/testserver.go line 2362 at r6 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

Why do we not pass the tracer from s? This probably deserves a comment.

Done.

@knz knz force-pushed the 20230730-rpc-config branch 2 times, most recently from 5d2d5c9 to f04f92f Compare July 31, 2023 20:47
@knz knz requested a review from yuzefovich July 31, 2023 21:14
Copy link
Collaborator

@stevendanna stevendanna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall I like this cleanup. Don't have much to add over Yahor's existing review.

Comment on lines 348 to 350
// It is set after the server instance has been created, when the
// network listeners are being set up.
AdvertiseAddrH
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that this is a non-pointer, I'm assuming from the code that this is the value of the handle that everyone else takes a pointer of. I suppose that makes any function that takes base.Config rather than *base.Config a bit suspicious.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a mixed bag really. Most places take the pointer indeed, but the few that don't seem to be safe.

Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aliher1911, @herkolategan, @knz, @nkodali, @smg260, and @stevendanna)


pkg/base/config.go line 350 at r6 (raw file):

Previously, knz (Raphael 'kena' Poss) wrote…

Handle.
I could also call it "container". Unsure what's best.

No opinion from me.


pkg/cli/democluster/demo_cluster.go line 1028 at r9 (raw file):

}

// findOtherServer returns a an admin RPC client to another

nit: "a an admin".


pkg/rpc/client.go line 163 at r10 (raw file):

		// even if the context used to create the client conn is
		// canceled.
		stopper.Stop(context.Background())

Switching to context.Background() seems like a non-trivial change to me (granted, I'm not too familiar with this code). I did some archeology to see where usage of ctx and the comment about "tying the lifetime" was introduced and landed on these two (1, 2) commits. Can you provide some more context for why we're making ctx -> context.Background() switch? Perhaps someone from replication team (perhaps @erikgrinaker) could take a quick look at it too?

Copy link
Contributor Author

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aliher1911, @erikgrinaker, @herkolategan, @nkodali, @smg260, @stevendanna, and @yuzefovich)


pkg/cli/democluster/demo_cluster.go line 1028 at r9 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: "a an admin".

Fixed.


pkg/rpc/client.go line 163 at r10 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

Switching to context.Background() seems like a non-trivial change to me (granted, I'm not too familiar with this code). I did some archeology to see where usage of ctx and the comment about "tying the lifetime" was introduced and landed on these two (1, 2) commits. Can you provide some more context for why we're making ctx -> context.Background() switch? Perhaps someone from replication team (perhaps @erikgrinaker) could take a quick look at it too?

This latter commit da5dca5 from Tobias introduced a bug. Consider:

@@ -939,31 +941,31 @@ func getClientGRPCConn(ctx context.Context) (*grpc.ClientConn, *hlc.Clock, error
        stopper.AddCloser(stop.CloserFn(func() {
                _ = conn.Close()
        }))

        // Tie the lifetime of the stopper to that of the context.
-       go func() {
-               <-ctx.Done()
-               // Don't use the `ctx` as it's already cancelled.
-               stopper.Stop(context.Background())
-       }()
-       return conn, clock, nil
+       closer := func() {
+               stopper.Stop(ctx)
+       }
+       return conn, clock, closer, nil
 }

Observe: the original code (introduced in 7a3cdaa) was doing two separate things:

  1. waiting for the context to be cancelled before calling .Stop"
  2. using a separate context for the call to .Stop to ensure the already-cancelled context does not get passed to the closers.

Point (1) was considered to be a problem - instead we want the caller to call finish() explicitly.
While that is true (and achieved by Tobi's patch), it introduced a new problem: the ctx is passed directly to .Stop now. This defeats the protection of point (2).

My PR here is restoring that protection.

@knz knz force-pushed the 20230730-rpc-config branch from f04f92f to f083f4f Compare August 1, 2023 20:10
Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @aliher1911, @erikgrinaker, @herkolategan, @nkodali, @smg260, and @stevendanna)


pkg/rpc/client.go line 163 at r10 (raw file):

Previously, knz (Raphael 'kena' Poss) wrote…

This latter commit da5dca5 from Tobias introduced a bug. Consider:

@@ -939,31 +941,31 @@ func getClientGRPCConn(ctx context.Context) (*grpc.ClientConn, *hlc.Clock, error
        stopper.AddCloser(stop.CloserFn(func() {
                _ = conn.Close()
        }))

        // Tie the lifetime of the stopper to that of the context.
-       go func() {
-               <-ctx.Done()
-               // Don't use the `ctx` as it's already cancelled.
-               stopper.Stop(context.Background())
-       }()
-       return conn, clock, nil
+       closer := func() {
+               stopper.Stop(ctx)
+       }
+       return conn, clock, closer, nil
 }

Observe: the original code (introduced in 7a3cdaa) was doing two separate things:

  1. waiting for the context to be cancelled before calling .Stop"
  2. using a separate context for the call to .Stop to ensure the already-cancelled context does not get passed to the closers.

Point (1) was considered to be a problem - instead we want the caller to call finish() explicitly.
While that is true (and achieved by Tobi's patch), it introduced a new problem: the ctx is passed directly to .Stop now. This defeats the protection of point (2).

My PR here is restoring that protection.

Thanks, makes sense, I can sign off on this now. Do you think it's worth extracting this into a separate commit that is also backported? Perhaps we'll fix some extremely rare flakes? Perhaps it's worth extracting it into a separate commit regardless of the backport in order to provide the explanation from your message? Anyway, up to you.

Copy link
Contributor Author

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @aliher1911, @erikgrinaker, @herkolategan, @nkodali, @smg260, @stevendanna, and @yuzefovich)


pkg/rpc/client.go line 163 at r10 (raw file):

it's worth extracting this into a separate commit

ok, sure. Done.

that is also backported?

I'm not so sure about that. Until 23.2 (not released yet) all the contexts used in cli code were not cancellable, so this didn't hit.

@knz knz force-pushed the 20230730-rpc-config branch from f083f4f to 810bf7a Compare August 1, 2023 22:13
@knz knz requested a review from a team August 1, 2023 22:13
@knz knz force-pushed the 20230730-rpc-config branch from 810bf7a to 607c697 Compare August 1, 2023 22:59
@knz
Copy link
Contributor Author

knz commented Aug 1, 2023

TFYRs!

bors r=yuzefovich,stevendanna

@knz
Copy link
Contributor Author

knz commented Aug 1, 2023

bors r-

@craig
Copy link
Contributor

craig bot commented Aug 1, 2023

Canceled.

@knz
Copy link
Contributor Author

knz commented Aug 1, 2023

bors single on
bors r=yuzefovich,stevendanna

@knz
Copy link
Contributor Author

knz commented Aug 2, 2023

bors r-

@craig
Copy link
Contributor

craig bot commented Aug 2, 2023

Canceled.

@knz

This comment was marked as outdated.

1 similar comment
@knz
Copy link
Contributor Author

knz commented Aug 2, 2023

bors r=yuzefovich,stevendanna

craig bot pushed a commit that referenced this pull request Aug 2, 2023
107868: rpc,*: sanitize the configuration of rpc.Context r=yuzefovich,stevendanna a=knz

The main goal of this change is to offer a `.RPCClientConn()` method
on test servers.

To achieve this, it was necessary to lift the code previously in
`cli.getClientGRPCConn` into a more reusable version of it,
now hosted in `rpc.NewClientContext()`.

I also took the opportunity to remove the dependency of `rpc.Context`
on `base.Config`, by spelling out precisely which fields are necessary
to RPC connections.

Numerous tests could be simplified as a result.

Needed for  #107866.
Epic: CRDB-18499

Co-authored-by: Raphael 'kena' Poss <[email protected]>
@craig
Copy link
Contributor

craig bot commented Aug 2, 2023

Build succeeded:

@knz knz closed this Aug 2, 2023
@knz knz force-pushed the 20230730-rpc-config branch from 961ec98 to 607c697 Compare August 2, 2023 11:52
@knz knz deleted the 20230730-rpc-config branch August 2, 2023 11:52
craig bot pushed a commit that referenced this pull request Aug 2, 2023
107866: testserver: remove direct casts to `*server.TestServer` r=yuzefovich,stevendanna a=knz

First 3 commits from #107868.

In a later stage we would like the result of `StartServer` to return different concrete types depending on parameters. Before we can do that however, we need to remove direct casts of that return value to `*server.TestServer`.


Incidentally, this patch improves the APIs as follows:

- the health probe for secondary tenant servers now properly
  includes the draining state of the RPC interface (previously,
  only the SQL draining state was included).

- `ApplicationLayerInterface` now includes `Readiness()` and
  `DefaultZoneConfig()`.

- `StorageLayerInterface` now includes
  `ScratchRangeWithExpirationLease()`, `GetRangeLease()`, `TsDB()`,
  `Locality()` and `DefaultSystemZoneConfig()`.

Epic: CRDB-18499

Co-authored-by: Raphael 'kena' Poss <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants