-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ccl/sqlproxyccl: directory proto and test server #64171
Conversation
2b26f5f
to
d5c0167
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @darinpp)
pkg/ccl/kvccl/kvtenantccl/connector.go, line 372 at r1 (raw file):
return nil, err } c.mu.RLock()
It'd be good to add a comment here explaining why the read lock is in fact needed.
pkg/ccl/sqlproxyccl/tenant/directory.go, line 66 at r1 (raw file):
type Directory struct { // ctl is the directory client instance used to make directory server calls. ctl DirectoryClient
NIT: I think this should be renamed to client
or dir
rather than using ctl
like in the operator.
pkg/ccl/sqlproxyccl/tenant/directory.go, line 88 at r1 (raw file):
// NewDirectory constructs a new Directory instance that tracks SQL tenant processes // managed by a given Directory server. The given context
NIT: spacing in comment.
pkg/ccl/sqlproxyccl/tenant/directory.go, line 257 at r1 (raw file):
// a notification and update the directory to reflect that change. func (d *Directory) watchEndpoints(ctx context.Context, stopper *stop.Stopper) error {
NIT: extra blank line - you're triggering my OCD : )
pkg/ccl/sqlproxyccl/tenant/directory.go, line 260 at r1 (raw file):
req := WatchEndpointsRequest{} var waitInit sync.WaitGroup
Maybe add a comment here explaining why this WaitGroup is needed.
pkg/ccl/sqlproxyccl/tenant/directory.go, line 281 at r1 (raw file):
} log.Errorf(ctx, "err creating new watch endpoint client: %s", err) time.Sleep(time.Second)
Use the sleepContext
helper method for this and any other sleeps, so that we cancel more quickly (and also consistently use it).
pkg/ccl/sqlproxyccl/tenant/directory.go, line 307 at r1 (raw file):
} // Ensure that a directory entry exists for this tenant.
I think we decided to not create a new entry for pods that don't have a corresponding tenant in the cache yet. That would mean passing allowCreate=false below.
pkg/ccl/sqlproxyccl/tenant/directory.proto, line 15 at r1 (raw file):
import "gogoproto/gogo.proto"; message WatchEndpointsRequest {}
This file needs comments on enums/messages/fields/etc.
pkg/ccl/sqlproxyccl/tenant/directory_test.go, line 30 at r1 (raw file):
) func CreateTenant(tc serverutils.TestClusterInterface, id ID) error {
Is there some reason that these functions need to be exported? Also, I think it's nicer when helper code like this is at the bottom of the file, so that the actual tests can be placed at the top of the file.
pkg/ccl/sqlproxyccl/tenant/entry.go, line 33 at r1 (raw file):
TenantID ID // Full name of the tenant's cluster i.e. dim-dog
NIT: missing period at end of comment.
pkg/ccl/sqlproxyccl/tenant/entry.go, line 68 at r1 (raw file):
// that in the entry. After this is called once, all future calls return the // same result (and do nothing). func (e *tenantEntry) Initialize(ctx context.Context, ctl DirectoryClient) error {
NIT: similar to above, rename ctl
to client
or dir
.
pkg/ccl/sqlproxyccl/tenant/entry.go, line 192 at r1 (raw file):
// endpoints available rather than blocking. func (e *tenantEntry) ensureTenantEndpoint( ctx context.Context, ctl DirectoryClient, errorIfNoEndpoints bool,
NIT: rename ctl here as well (and any other places I missed).
pkg/ccl/sqlproxyccl/tenant/entry.go, line 232 at r1 (raw file):
// Get endpoint information for the newly resumed tenant. Except in rare race // conditions, this is expected to immediately find an IP address, since // the above call started a tenant process that already has an IP address.
NIT: double space in comment.
pkg/ccl/sqlproxyccl/tenant/entry.go, line 259 at r1 (raw file):
ctx context.Context, ctl DirectoryClient, ) (ips []string, err error) { // List endpoints with a "tenant-id" label that matches this tenant's ID.
NIT: This comment is out-of-date.
pkg/ccl/sqlproxyccl/tenant/tenant_id.go, line 26 at r1 (raw file):
// For more information, see: // https://github.com/cockroachdb/cockroach/blob/release-20.2/pkg/roachpb/tenant.go#L19-L24 type ID uint64
We should use roachpb.TenantID
instead of defining a new variant in CRDB.
pkg/ccl/sqlproxyccl/tenant/test_directory_svr.go, line 73 at r1 (raw file):
// When both mutexes need to be held, the locking should always be // proc first and listen second
NIT: missing period at end of comment.
pkg/ccl/sqlproxyccl/tenant/test_directory_svr.go, line 273 at r1 (raw file):
return nil, err } process := &Process{SQL: sql.Addr()}
What happens if the test is stopped using ctrl-C? Will the processes still be cleaned up?
pkg/testutils/serverutils/test_server_shim.go, line 225 at r1 (raw file):
// StartTenant spawns off tenant process connecting to this TestServer with // context StartTenantWithCtx(ctx context.Context, params base.TestTenantArgs) (TestTenantInterface, error)
Why not just update StartTenant
and then change all callers to pass a context? I don't see much value in having both variants.
d5c0167
to
c90caeb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andy-kimball)
pkg/ccl/kvccl/kvtenantccl/connector.go, line 372 at r1 (raw file):
Previously, andy-kimball (Andy Kimball) wrote…
It'd be good to add a comment here explaining why the read lock is in fact needed.
I refactored it a bit as there was another issue with this code. Now all paths lock when accessing the client. We generally don't put comments justifying locking if there is access from separate threads to the same field. This is the case here so I guess a comment would be appropriate only of the locks are not being obtained (as it was before).
pkg/ccl/sqlproxyccl/tenant/directory.go, line 66 at r1 (raw file):
Previously, andy-kimball (Andy Kimball) wrote…
NIT: I think this should be renamed to
client
ordir
rather than usingctl
like in the operator.
renamed to client
pkg/ccl/sqlproxyccl/tenant/directory.go, line 88 at r1 (raw file):
Previously, andy-kimball (Andy Kimball) wrote…
NIT: spacing in comment.
fixed
pkg/ccl/sqlproxyccl/tenant/directory.go, line 257 at r1 (raw file):
Previously, andy-kimball (Andy Kimball) wrote…
NIT: extra blank line - you're triggering my OCD : )
fixed
pkg/ccl/sqlproxyccl/tenant/directory.go, line 260 at r1 (raw file):
Previously, andy-kimball (Andy Kimball) wrote…
Maybe add a comment here explaining why this WaitGroup is needed.
added
pkg/ccl/sqlproxyccl/tenant/directory.go, line 281 at r1 (raw file):
Previously, andy-kimball (Andy Kimball) wrote…
Use the
sleepContext
helper method for this and any other sleeps, so that we cancel more quickly (and also consistently use it).
changed
pkg/ccl/sqlproxyccl/tenant/directory.go, line 307 at r1 (raw file):
Previously, andy-kimball (Andy Kimball) wrote…
I think we decided to not create a new entry for pods that don't have a corresponding tenant in the cache yet. That would mean passing allowCreate=false below.
OK. I fixed to not create entry. This also required a change in one of the tests.
pkg/ccl/sqlproxyccl/tenant/directory.proto, line 15 at r1 (raw file):
Previously, andy-kimball (Andy Kimball) wrote…
This file needs comments on enums/messages/fields/etc.
Added.
pkg/ccl/sqlproxyccl/tenant/directory_test.go, line 30 at r1 (raw file):
Previously, andy-kimball (Andy Kimball) wrote…
Is there some reason that these functions need to be exported? Also, I think it's nicer when helper code like this is at the bottom of the file, so that the actual tests can be placed at the top of the file.
No good reason. I changed them so they are not exported anymore.
pkg/ccl/sqlproxyccl/tenant/entry.go, line 33 at r1 (raw file):
Previously, andy-kimball (Andy Kimball) wrote…
NIT: missing period at end of comment.
fixed
pkg/ccl/sqlproxyccl/tenant/entry.go, line 68 at r1 (raw file):
Previously, andy-kimball (Andy Kimball) wrote…
NIT: similar to above, rename
ctl
toclient
ordir
.
done
pkg/ccl/sqlproxyccl/tenant/entry.go, line 192 at r1 (raw file):
Previously, andy-kimball (Andy Kimball) wrote…
NIT: rename ctl here as well (and any other places I missed).
all changed
pkg/ccl/sqlproxyccl/tenant/entry.go, line 232 at r1 (raw file):
Previously, andy-kimball (Andy Kimball) wrote…
NIT: double space in comment.
fixed
pkg/ccl/sqlproxyccl/tenant/entry.go, line 259 at r1 (raw file):
Previously, andy-kimball (Andy Kimball) wrote…
NIT: This comment is out-of-date.
fixed
pkg/ccl/sqlproxyccl/tenant/tenant_id.go, line 26 at r1 (raw file):
Previously, andy-kimball (Andy Kimball) wrote…
We should use
roachpb.TenantID
instead of defining a new variant in CRDB.
done
pkg/ccl/sqlproxyccl/tenant/test_directory_svr.go, line 73 at r1 (raw file):
Previously, andy-kimball (Andy Kimball) wrote…
NIT: missing period at end of comment.
fixed
pkg/ccl/sqlproxyccl/tenant/test_directory_svr.go, line 273 at r1 (raw file):
Previously, andy-kimball (Andy Kimball) wrote…
What happens if the test is stopped using ctrl-C? Will the processes still be cleaned up?
The child processes won't be killed when the main process is killed. I'll add something in another PR that will be OS specific most likely to take care of this.
pkg/testutils/serverutils/test_server_shim.go, line 225 at r1 (raw file):
Previously, andy-kimball (Andy Kimball) wrote…
Why not just update
StartTenant
and then change all callers to pass a context? I don't see much value in having both variants.
Agree. Fixed.
@@ -365,12 +365,17 @@ func (c *Connector) getClient(ctx context.Context) (roachpb.InternalClient, erro | |||
dialCtx := c.AnnotateCtx(context.Background()) | |||
dialCtx, cancel := c.rpcContext.Stopper.WithCancelOnQuiesce(dialCtx) | |||
defer cancel() | |||
err := c.rpcContext.Stopper.RunTaskWithErr(dialCtx, "kvtenant.Connector: dial", c.dialAddrs) | |||
var client roachpb.InternalClient |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nvanbenschoten take a look and let me know if this will take care of the locking problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All the directory stuff looks good to me. It'd be good to get another reviewer for some of the scaffolding code like the connector, stopping, testing, etc., which I'm not too familiar with.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @andy-kimball, @darinpp, and @nvanbenschoten)
pkg/ccl/sqlproxyccl/tenant/directory.proto, line 40 at r2 (raw file):
// ListEndpointsRequest is used to query the server for the list of current endpoints of a given tenant. message ListEndpointsRequest { // tenant id identifies the tenant for which the client is requesting a list of the endpoints.
NIT: All of the field comments should be capitalized in Go-style (e.g. TenantID
) because these will be inserted into the generated Go code after being converted from snake case.
c90caeb
to
4e2f96c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andy-kimball and @darinpp)
pkg/ccl/kvccl/kvtenantccl/connector.go, line 368 at r2 (raw file):
Previously, darinpp wrote…
@nvanbenschoten take a look and let me know if this will take care of the locking problem.
This is close, but I think we still want to set c.mu.client
in this singleflight function, instead of having each goroutine do so after the result has been broadcasted back out. We just don't want to lock, set it, unlock, and then read it again before returning it, lest it be reset by someone else during the period where we weren't holding the lock.
So I think we'll want to replace the old:
// NB: read lock not needed.
return c.mu.client, nil
With:
c.mu.Lock()
defer c.mu.Unlock()
c.mu.client = client
return client, nil
And then remove the change below in the case res := <-ch:
block.
4e2f96c
to
50da532
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @andy-kimball and @darinpp)
pkg/ccl/kvccl/kvtenantccl/connector.go, line 368 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
This is close, but I think we still want to set
c.mu.client
in this singleflight function, instead of having each goroutine do so after the result has been broadcasted back out. We just don't want to lock, set it, unlock, and then read it again before returning it, lest it be reset by someone else during the period where we weren't holding the lock.So I think we'll want to replace the old:
// NB: read lock not needed. return c.mu.client, nilWith:
c.mu.Lock() defer c.mu.Unlock() c.mu.client = client return client, nilAnd then remove the change below in the
case res := <-ch:
block.
* Defines a new interface between a tenant directory client and server * Moves the tenant directory from the CC repo over * Tenant directory modified to use the new interface * Tenant directory modified to use stop.Stopper * Modified pod watcher to use streaming grpc call * Renamed package from directory to tenant, tenantID to ID etc * Renamed references to k8s, pods to server, endpoints etc * Prevents test tenant servers to start for non-existing/inactive tenants * Adds ability to shut down individual tenant servers within a test cluster * Adds a test directory server that can start/stop tenants * Adds tests of the directory running agianst the test server * Allow insecure connections from test tenant to KV server * Fixed a race in kvtenantccl Release note: None
50da532
to
65c30a3
Compare
bors r+ |
Build succeeded: |
Release note: None