Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ccl/sqlproxyccl: directory proto and test server #64171

Merged
merged 1 commit into from
Apr 29, 2021

Conversation

darinpp
Copy link
Contributor

@darinpp darinpp commented Apr 24, 2021

  • Defines a new interface between a tenant directory client and server
  • Moves the tenant directory from the CC repo over
  • Tenant directory modified to use the new interface
  • Tenant directory modified to use stop.Stopper
  • Modified pod watcher to use streaming grpc call
  • Renamed package from directory to tenant, tenantID to ID etc
  • Renamed references to k8s, pods to server, endpoints etc
  • Prevents test tenant servers to start for non-existing/inactive tenants
  • Adds ability to shut down individual tenant servers within a test cluster
  • Adds a test directory server that can start/stop tenants
  • Adds tests of the directory running agianst the test server
  • Allow insecure connections from test tenant to KV server
  • Fixed a race in kvtenantccl

Release note: None

@darinpp darinpp requested a review from andy-kimball April 24, 2021 00:33
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Contributor

@andy-kimball andy-kimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @darinpp)


pkg/ccl/kvccl/kvtenantccl/connector.go, line 372 at r1 (raw file):

			return nil, err
		}
		c.mu.RLock()

It'd be good to add a comment here explaining why the read lock is in fact needed.


pkg/ccl/sqlproxyccl/tenant/directory.go, line 66 at r1 (raw file):

type Directory struct {
	// ctl is the directory client instance used to make directory server calls.
	ctl DirectoryClient

NIT: I think this should be renamed to client or dir rather than using ctl like in the operator.


pkg/ccl/sqlproxyccl/tenant/directory.go, line 88 at r1 (raw file):

// NewDirectory constructs a new Directory instance that tracks SQL tenant processes
// managed by a given Directory server. The given context

NIT: spacing in comment.


pkg/ccl/sqlproxyccl/tenant/directory.go, line 257 at r1 (raw file):

// a notification and update the directory to reflect that change.
func (d *Directory) watchEndpoints(ctx context.Context, stopper *stop.Stopper) error {

NIT: extra blank line - you're triggering my OCD : )


pkg/ccl/sqlproxyccl/tenant/directory.go, line 260 at r1 (raw file):

	req := WatchEndpointsRequest{}

	var waitInit sync.WaitGroup

Maybe add a comment here explaining why this WaitGroup is needed.


pkg/ccl/sqlproxyccl/tenant/directory.go, line 281 at r1 (raw file):

					}
					log.Errorf(ctx, "err creating new watch endpoint client: %s", err)
					time.Sleep(time.Second)

Use the sleepContext helper method for this and any other sleeps, so that we cancel more quickly (and also consistently use it).


pkg/ccl/sqlproxyccl/tenant/directory.go, line 307 at r1 (raw file):

			}

			// Ensure that a directory entry exists for this tenant.

I think we decided to not create a new entry for pods that don't have a corresponding tenant in the cache yet. That would mean passing allowCreate=false below.


pkg/ccl/sqlproxyccl/tenant/directory.proto, line 15 at r1 (raw file):

import "gogoproto/gogo.proto";

message WatchEndpointsRequest {}

This file needs comments on enums/messages/fields/etc.


pkg/ccl/sqlproxyccl/tenant/directory_test.go, line 30 at r1 (raw file):

)

func CreateTenant(tc serverutils.TestClusterInterface, id ID) error {

Is there some reason that these functions need to be exported? Also, I think it's nicer when helper code like this is at the bottom of the file, so that the actual tests can be placed at the top of the file.


pkg/ccl/sqlproxyccl/tenant/entry.go, line 33 at r1 (raw file):

	TenantID ID

	// Full name of the tenant's cluster i.e. dim-dog

NIT: missing period at end of comment.


pkg/ccl/sqlproxyccl/tenant/entry.go, line 68 at r1 (raw file):

// that in the entry. After this is called once, all future calls return the
// same result (and do nothing).
func (e *tenantEntry) Initialize(ctx context.Context, ctl DirectoryClient) error {

NIT: similar to above, rename ctl to client or dir.


pkg/ccl/sqlproxyccl/tenant/entry.go, line 192 at r1 (raw file):

// endpoints available rather than blocking.
func (e *tenantEntry) ensureTenantEndpoint(
	ctx context.Context, ctl DirectoryClient, errorIfNoEndpoints bool,

NIT: rename ctl here as well (and any other places I missed).


pkg/ccl/sqlproxyccl/tenant/entry.go, line 232 at r1 (raw file):

		// Get endpoint information for the newly resumed tenant. Except in rare race
		// conditions, this is expected to immediately find an IP address, since
		// the above call started a tenant process that  already has an IP address.

NIT: double space in comment.


pkg/ccl/sqlproxyccl/tenant/entry.go, line 259 at r1 (raw file):

	ctx context.Context, ctl DirectoryClient,
) (ips []string, err error) {
	// List endpoints with a "tenant-id" label that matches this tenant's ID.

NIT: This comment is out-of-date.


pkg/ccl/sqlproxyccl/tenant/tenant_id.go, line 26 at r1 (raw file):

// For more information, see:
//   https://github.com/cockroachdb/cockroach/blob/release-20.2/pkg/roachpb/tenant.go#L19-L24
type ID uint64

We should use roachpb.TenantID instead of defining a new variant in CRDB.


pkg/ccl/sqlproxyccl/tenant/test_directory_svr.go, line 73 at r1 (raw file):

	// When both mutexes need to be held, the locking should always be
	// proc first and listen second

NIT: missing period at end of comment.


pkg/ccl/sqlproxyccl/tenant/test_directory_svr.go, line 273 at r1 (raw file):

		return nil, err
	}
	process := &Process{SQL: sql.Addr()}

What happens if the test is stopped using ctrl-C? Will the processes still be cleaned up?


pkg/testutils/serverutils/test_server_shim.go, line 225 at r1 (raw file):

	// StartTenant spawns off tenant process connecting to this TestServer with
	// context
	StartTenantWithCtx(ctx context.Context, params base.TestTenantArgs) (TestTenantInterface, error)

Why not just update StartTenant and then change all callers to pass a context? I don't see much value in having both variants.

@darinpp darinpp requested review from a team and dt and removed request for a team April 27, 2021 20:42
@dt dt removed their request for review April 27, 2021 20:54
Copy link
Contributor Author

@darinpp darinpp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andy-kimball)


pkg/ccl/kvccl/kvtenantccl/connector.go, line 372 at r1 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

It'd be good to add a comment here explaining why the read lock is in fact needed.

I refactored it a bit as there was another issue with this code. Now all paths lock when accessing the client. We generally don't put comments justifying locking if there is access from separate threads to the same field. This is the case here so I guess a comment would be appropriate only of the locks are not being obtained (as it was before).


pkg/ccl/sqlproxyccl/tenant/directory.go, line 66 at r1 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

NIT: I think this should be renamed to client or dir rather than using ctl like in the operator.

renamed to client


pkg/ccl/sqlproxyccl/tenant/directory.go, line 88 at r1 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

NIT: spacing in comment.

fixed


pkg/ccl/sqlproxyccl/tenant/directory.go, line 257 at r1 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

NIT: extra blank line - you're triggering my OCD : )

fixed


pkg/ccl/sqlproxyccl/tenant/directory.go, line 260 at r1 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

Maybe add a comment here explaining why this WaitGroup is needed.

added


pkg/ccl/sqlproxyccl/tenant/directory.go, line 281 at r1 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

Use the sleepContext helper method for this and any other sleeps, so that we cancel more quickly (and also consistently use it).

changed


pkg/ccl/sqlproxyccl/tenant/directory.go, line 307 at r1 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

I think we decided to not create a new entry for pods that don't have a corresponding tenant in the cache yet. That would mean passing allowCreate=false below.

OK. I fixed to not create entry. This also required a change in one of the tests.


pkg/ccl/sqlproxyccl/tenant/directory.proto, line 15 at r1 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

This file needs comments on enums/messages/fields/etc.

Added.


pkg/ccl/sqlproxyccl/tenant/directory_test.go, line 30 at r1 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

Is there some reason that these functions need to be exported? Also, I think it's nicer when helper code like this is at the bottom of the file, so that the actual tests can be placed at the top of the file.

No good reason. I changed them so they are not exported anymore.


pkg/ccl/sqlproxyccl/tenant/entry.go, line 33 at r1 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

NIT: missing period at end of comment.

fixed


pkg/ccl/sqlproxyccl/tenant/entry.go, line 68 at r1 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

NIT: similar to above, rename ctl to client or dir.

done


pkg/ccl/sqlproxyccl/tenant/entry.go, line 192 at r1 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

NIT: rename ctl here as well (and any other places I missed).

all changed


pkg/ccl/sqlproxyccl/tenant/entry.go, line 232 at r1 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

NIT: double space in comment.

fixed


pkg/ccl/sqlproxyccl/tenant/entry.go, line 259 at r1 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

NIT: This comment is out-of-date.

fixed


pkg/ccl/sqlproxyccl/tenant/tenant_id.go, line 26 at r1 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

We should use roachpb.TenantID instead of defining a new variant in CRDB.

done


pkg/ccl/sqlproxyccl/tenant/test_directory_svr.go, line 73 at r1 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

NIT: missing period at end of comment.

fixed


pkg/ccl/sqlproxyccl/tenant/test_directory_svr.go, line 273 at r1 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

What happens if the test is stopped using ctrl-C? Will the processes still be cleaned up?

The child processes won't be killed when the main process is killed. I'll add something in another PR that will be OS specific most likely to take care of this.


pkg/testutils/serverutils/test_server_shim.go, line 225 at r1 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

Why not just update StartTenant and then change all callers to pass a context? I don't see much value in having both variants.

Agree. Fixed.

@@ -365,12 +365,17 @@ func (c *Connector) getClient(ctx context.Context) (roachpb.InternalClient, erro
dialCtx := c.AnnotateCtx(context.Background())
dialCtx, cancel := c.rpcContext.Stopper.WithCancelOnQuiesce(dialCtx)
defer cancel()
err := c.rpcContext.Stopper.RunTaskWithErr(dialCtx, "kvtenant.Connector: dial", c.dialAddrs)
var client roachpb.InternalClient
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nvanbenschoten take a look and let me know if this will take care of the locking problem.

@darinpp darinpp requested a review from nvanbenschoten April 27, 2021 21:01
Copy link
Contributor

@andy-kimball andy-kimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

All the directory stuff looks good to me. It'd be good to get another reviewer for some of the scaffolding code like the connector, stopping, testing, etc., which I'm not too familiar with.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @andy-kimball, @darinpp, and @nvanbenschoten)


pkg/ccl/sqlproxyccl/tenant/directory.proto, line 40 at r2 (raw file):

// ListEndpointsRequest is used to query the server for the list of current endpoints of a given tenant.
message ListEndpointsRequest {
  // tenant id identifies the tenant for which the client is requesting a list of the endpoints.

NIT: All of the field comments should be capitalized in Go-style (e.g. TenantID) because these will be inserted into the generated Go code after being converted from snake case.

@darinpp darinpp requested a review from andy-kimball April 27, 2021 21:34
Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andy-kimball and @darinpp)


pkg/ccl/kvccl/kvtenantccl/connector.go, line 368 at r2 (raw file):

Previously, darinpp wrote…

@nvanbenschoten take a look and let me know if this will take care of the locking problem.

This is close, but I think we still want to set c.mu.client in this singleflight function, instead of having each goroutine do so after the result has been broadcasted back out. We just don't want to lock, set it, unlock, and then read it again before returning it, lest it be reset by someone else during the period where we weren't holding the lock.

So I think we'll want to replace the old:

// NB: read lock not needed.
return c.mu.client, nil

With:

c.mu.Lock()
defer c.mu.Unlock()
c.mu.client = client
return client, nil

And then remove the change below in the case res := <-ch: block.

Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @andy-kimball and @darinpp)


pkg/ccl/kvccl/kvtenantccl/connector.go, line 368 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

This is close, but I think we still want to set c.mu.client in this singleflight function, instead of having each goroutine do so after the result has been broadcasted back out. We just don't want to lock, set it, unlock, and then read it again before returning it, lest it be reset by someone else during the period where we weren't holding the lock.

So I think we'll want to replace the old:

// NB: read lock not needed.
return c.mu.client, nil

With:

c.mu.Lock()
defer c.mu.Unlock()
c.mu.client = client
return client, nil

And then remove the change below in the case res := <-ch: block.

This part :lgtm: now. Thanks for cleaning it up.

* Defines a new interface between a tenant directory client and server
* Moves the tenant directory from the CC repo over
* Tenant directory modified to use the new interface
* Tenant directory modified to use stop.Stopper
* Modified pod watcher to use streaming grpc call
* Renamed package from directory to tenant, tenantID to ID etc
* Renamed references to k8s, pods to server, endpoints etc
* Prevents test tenant servers to start for non-existing/inactive tenants
* Adds ability to shut down individual tenant servers within a test cluster
* Adds a test directory server that can start/stop tenants
* Adds tests of the directory running agianst the test server
* Allow insecure connections from test tenant to KV server
* Fixed a race in kvtenantccl

Release note: None
@darinpp
Copy link
Contributor Author

darinpp commented Apr 29, 2021

bors r+

@craig
Copy link
Contributor

craig bot commented Apr 29, 2021

Build succeeded:

@craig craig bot merged commit 974b144 into cockroachdb:master Apr 29, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants