-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
server: Reject SQL Pods with an invalid tenant id. #65683
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @knz and @rimadeodhar)
pkg/server/server_sql.go, line 763 at r1 (raw file):
// Tenant has been bootstrapped successfully return nil }
How about we put this in server/tenant.go? Also, can you explain in a sentence that the heuristic: Check if the system database entry in the namespace table exists inside the tenant and that is used because it is part of the bootstrap data for a tenant keyspace.
pkg/server/server_sql.go, line 788 at r1 (raw file):
Quoted 7 lines of code…
// Confirm tenant exists prior to initialization. This is a sanity // check for the dev environment to ensure that a tenant has been // successfully created before attempting to initialize a SQL // server for it. if err := checkTenantExists(ctx, s.execCfg.Codec, s.execCfg.DB); err != nil { return err }
how do you feel about putting this inside the above tenant block?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @knz)
pkg/server/server_sql.go, line 763 at r1 (raw file):
Previously, ajwerner wrote…
How about we put this in server/tenant.go? Also, can you explain in a sentence that the heuristic: Check if the system database entry in the namespace table exists inside the tenant and that is used because it is part of the bootstrap data for a tenant keyspace.
I can move this function to server/tenant.go and add a comment. I put it in here since for now, its mainly getting used within server_sql code. But don't feel very strongly about this.
pkg/server/server_sql.go, line 788 at r1 (raw file):
Previously, ajwerner wrote…
// Confirm tenant exists prior to initialization. This is a sanity // check for the dev environment to ensure that a tenant has been // successfully created before attempting to initialize a SQL // server for it. if err := checkTenantExists(ctx, s.execCfg.Codec, s.execCfg.DB); err != nil { return err }
how do you feel about putting this inside the above tenant block?
Do you mean putting this function call within StartTenant in tenant.go? If so, my initial thought was to put the call there. However, during testing, I realized that I need the s.tenantConnect.Start call to have completed so I would have to put the tenant Exists check after the call to preStart. That seemed inefficient as there seemed to be lot of functionality within preStart thats's unnecessary if the tenant hasn't been created and poisons the tenant id for future use. So I made the call to the checkTenantExists function within preStart,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @knz and @rimadeodhar)
pkg/server/server_sql.go, line 752 at r1 (raw file):
// has been bootstrapped for the tenant. This is not a complete check and is only sufficient // to be used in the dev environment. func checkTenantExists(ctx context.Context, codec keys.SQLCodec, db *kv.DB) error {
Can you rename this to maybeCheckTenantExists
and then return early if codec.ForSystemTenant()
?
pkg/server/server_sql.go, line 761 at r1 (raw file):
return errors.New("System DB uninitialized, check if tenant is non existent") } // Tenant has been bootstrapped successfully
nit: period on sentences
pkg/server/server_sql.go, line 763 at r1 (raw file):
Previously, rimadeodhar (Rima Deodhar) wrote…
I can move this function to server/tenant.go and add a comment. I put it in here since for now, its mainly getting used within server_sql code. But don't feel very strongly about this.
I don't care much about where you put the function, but the commentary would be appreciated.
pkg/server/server_sql.go, line 788 at r1 (raw file):
Previously, rimadeodhar (Rima Deodhar) wrote…
Do you mean putting this function call within StartTenant in tenant.go? If so, my initial thought was to put the call there. However, during testing, I realized that I need the s.tenantConnect.Start call to have completed so I would have to put the tenant Exists check after the call to preStart. That seemed inefficient as there seemed to be lot of functionality within preStart thats's unnecessary if the tenant hasn't been created and poisons the tenant id for future use. So I made the call to the checkTenantExists function within preStart,
I was thinking just up one line to be under the tenantConnect conditional. I'm realizing that doesn't work for the use case we're considering because for testing we don't use a tenantConnect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @knz, and @rimadeodhar)
pkg/server/server_sql.go, line 752 at r1 (raw file):
Previously, ajwerner wrote…
Can you rename this to
maybeCheckTenantExists
and then return early ifcodec.ForSystemTenant()
?
Sure, will do.
pkg/server/server_sql.go, line 763 at r1 (raw file):
Previously, ajwerner wrote…
I don't care much about where you put the function, but the commentary would be appreciated.
Okay, I'll add the additional comments.
eadc7f7
to
0ad0476
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, last nice thing would be a test to show this works. It's probably as simple as adding a flag in TestTenantArgs to skip the check whether it exists and then also setting Existing
and calling StartTenant
on the TestServer
.
cockroach/pkg/server/testserver.go
Lines 719 to 729 in 83a204d
rowCount, err := ts.InternalExecutor().(*sql.InternalExecutor).Exec( | |
ctx, "testserver-check-tenant-active", nil, | |
"SELECT 1 FROM system.tenants WHERE id=$1 AND active=true", | |
params.TenantID.ToUint64(), | |
) | |
if err != nil { | |
return nil, err | |
} | |
if rowCount == 0 { | |
return nil, errors.New("not found") | |
} |
Reviewed 1 of 2 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @knz and @rimadeodhar)
pkg/server/server_sql.go, line 754 at r2 (raw file):
func maybeCheckTenantExists(ctx context.Context, codec keys.SQLCodec, db *kv.DB) error { if codec.ForSystemTenant() { // Skip check for system tenant and return early
nit: period
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, in addition to local run test, I did try previously to add a unit test for this method. I used the same logic that you mentioned here and used a non existent tenant id. Oddly enough, the maybeCheckTenantExists as implemented above passes. While debugging it, I wasn't able to figure out whether the test environment mocks the db state somehow and so the system db shows up as existing. I'm trying to figure out why the check passes in the unit test environment.
The code definitely works as expected while running locally and fails for non existent tenants with the following error:
❯ cockroach mt start-sql --insecure --tenant-id 678 --http-addr 127.0.0.1:8081 --kv-addrs 127.0.0.1:26257 --sql-addr 127.0.0.1:36257 --log-dir=logs/mt-678 --logtostderr=INFO
Flag --log-dir has been deprecated, use --log instead to specify 'file-defaults: {dir: ...}'
Flag --logtostderr has been deprecated, use --log instead to specify 'sinks: {stderr: {filter: ...}}'.
W210525 18:20:37.680600 1 1@cli/start.go:1010 [-] 1 ALL SECURITY CONTROLS HAVE BEEN DISABLED!
W210525 18:20:37.680600 1 1@cli/start.go:1010 [-] 1 +
W210525 18:20:37.680600 1 1@cli/start.go:1010 [-] 1 +This mode is intended for non-production testing only.
W210525 18:20:37.680600 1 1@cli/start.go:1010 [-] 1 +
W210525 18:20:37.680600 1 1@cli/start.go:1010 [-] 1 +In this mode:
W210525 18:20:37.680600 1 1@cli/start.go:1010 [-] 1 +- Your cluster is open to any client that can access any of your IP addresses.
W210525 18:20:37.680600 1 1@cli/start.go:1010 [-] 1 +- Intruders with access to your machine or network can observe client-server traffic.
W210525 18:20:37.680600 1 1@cli/start.go:1010 [-] 1 +- Intruders can log in without password and read or write any data in the cluster.
W210525 18:20:37.680600 1 1@cli/start.go:1010 [-] 1 +- Intruders can consume all your server's resources and cause unavailability.
I210525 18:20:37.681510 1 1@cli/start.go:1020 [-] 2 To start a secure server without mandating TLS for clients,
I210525 18:20:37.681510 1 1@cli/start.go:1020 [-] 2 +consider --accept-sql-without-tls instead. For other options, see:
I210525 18:20:37.681510 1 1@cli/start.go:1020 [-] 2 +
I210525 18:20:37.681510 1 1@cli/start.go:1020 [-] 2 +- https://go.crdb.dev/issue-v/53404/v21.2
I210525 18:20:37.681510 1 1@cli/start.go:1020 [-] 2 +- https://www.cockroachlabs.com/docs/v21.2/secure-a-cluster.html
W210525 18:20:37.681592 1 1@cli/start.go:976 [-] 3 Using the default setting for --cache (128 MiB).
W210525 18:20:37.681592 1 1@cli/start.go:976 [-] 3 + A significantly larger value is usually needed for good performance.
W210525 18:20:37.681592 1 1@cli/start.go:976 [-] 3 + If you have a dedicated server a reasonable setting is --cache=.25 (8.0 GiB).
I210525 18:20:37.681640 1 1@cli/start.go:1049 [-] 4 CockroachDB CCL v21.2.0-alpha.00000000-754-g5980f9c353-dirty (x86_64-apple-darwin20.4.0, built 2021/05/25 18:19:32, go1.15.11)
I210525 18:20:37.683532 1 1@cli/start.go:1129 [-] 5 GEOS loaded from directory lib
I210525 18:20:37.692857 1 3@vendor/github.com/cockroachdb/pebble/version_set.go:156 [pebble] 6 [JOB 1] MANIFEST created 000001
I210525 18:20:37.693265 1 3@vendor/github.com/cockroachdb/pebble/open.go:305 [pebble] 7 [JOB 1] WAL created 000002
I210525 18:20:37.693671 40 3@vendor/github.com/cockroachdb/pebble/table_stats.go:118 [pebble] 8 [JOB 2] all initial table stats loaded
I210525 18:20:37.705098 1 server/goroutinedumper/goroutinedumper.go:120 [-] 9 writing goroutine dumps to /Users/rdeodhar/go/src/github.com/cockroachdb/cockroach/logs/mt-678/goroutine_dump
I210525 18:20:37.705316 1 server/heapprofiler/heapprofiler.go:49 [-] 10 writing go heap profiles to /Users/rdeodhar/go/src/github.com/cockroachdb/cockroach/logs/mt-678/heap_profiler at leas
t every 1h0m0s
I210525 18:20:37.705347 1 server/heapprofiler/cgoprofiler.go:53 [-] 11 to enable jmalloc profiling: "export MALLOC_CONF=prof:true" or "ln -s prof:true /etc/malloc.conf"
I210525 18:20:37.705369 1 server/heapprofiler/statsprofiler.go:54 [-] 12 writing memory stats to /Users/rdeodhar/go/src/github.com/cockroachdb/cockroach/logs/mt-678/heap_profiler at last ev
ery 1h0m0s
W210525 18:20:37.710999 127 ccl/kvccl/kvtenantccl/connector.go:159 [sql,tenant-connector] 13 error consuming GossipSubscription RPC: rpc error: code = Canceled desc = context canceled
ERROR: System DB uninitialized, check if tenant is non existent
Failed running "mt start-sql"
So there is definitely something in the test framework which leads this tenant existence check to pass even for a non existent tenant.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @knz and @rimadeodhar)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you push the test so we can get to the bottom of it?
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @knz and @rimadeodhar)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh nvm, I think I figured out whats going on. I'll publish a new iteration with the unit test :).
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @knz and @rimadeodhar)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just for completeness, let me explain the reason. The issue I was facing was because I was using a tenant id without a security certificate. So it would block at the tenantConnect.Start and fail. Using the serverutil predefined tenant ids:
func TestTenantID() roachpb.TenantID { |
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @knz and @rimadeodhar)
0ad0476
to
bc43a30
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 2 files at r2, 4 of 4 files at r3.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @knz and @rimadeodhar)
pkg/server/server_sql.go, line 763 at r3 (raw file):
} if result.Value == nil || result.ValueInt() != keys.SystemDatabaseID { return errors.New("System DB uninitialized, check if tenant is non existent")
nit: please do not use capital letters to start errors (see https://cockroachlabs.atlassian.net/wiki/spaces/CRDB/pages/1676804870/Error+concepts+and+handling+in+CockroachDB#Error-messages%2C-hints-and-codes)
pkg/server/server_sql.go, line 766 at r3 (raw file):
} // Tenant has been confirmed to be bootstrapped successfully // as system database, which is a part of the bootstrap data for
nit: "the system database"
bc43a30
to
ad13981
Compare
This commit fixes an issue where attempting to start a SQL Pod through cockroach mt start-sql command returns an error if the tenant id has not been created before. Previously, attempting to start a SQL Pod with a non existent tenant id would crash. Even worse, it would poison the tenant id causing future crdb_internal.create_sql_tenant calls to fail. With this fix, the cockroach mt start-sql command returns an error for non existent tenant ids and future calls to crdb_internal.create_sql_tenant are successful. Resolves: cockroachdb#64963 Release note (bug fix): The cockroach mt start-sql command with a non existent tenant id returns an error. Previously, it would crash and poison the tenant id for future usage.
ad13981
to
57cad09
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 5 of 5 files at r4.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @adityamaru and @knz)
TFTR bors r=ajwerner |
Build succeeded: |
This commit fixes an issue where attempting to start a SQL Pod
through cockroach mt start-sql command returns an error if the
tenant id has not been created before. Previously, attempting to
start a SQL Pod with a non existent tenant id would crash. Even
worse, it would poison the tenant id causing future
crdb_internal.create_sql_tenant calls to fail.
With this fix, the cockroach mt start-sql command returns an error
for non existent tenant ids and future calls to
crdb_internal.create_sql_tenant are successful.
Resolves: #64963
Release note (bug fix): The cockroach mt start-sql command with a
non existent tenant id returns an error. Previously, it would
crash and poison the tenant id for future usage.