Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

roachtest: multitenant/distsql tests access diagnostics.active_query_dumps.enabled setting and use Span after Finish #113170

Closed
DarrylWong opened this issue Oct 26, 2023 · 2 comments · Fixed by #113553
Assignees
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-multitenant Issues owned by the multi-tenant virtual team

Comments

@DarrylWong
Copy link
Contributor

DarrylWong commented Oct 26, 2023

When runtime assertions are enabled, multitenant/distsql roachtests fails with the assertion error invalid access to SystemOnly setting diagnostics.active_query_dumps.enabled from a virtual cluster. See #110676 for why this is an issue.

The test also fails with use of Span after Finish. Span: server start. Finish previously called at: <stack not captured. Set debugUseAfterFinish>. These failures should be investigated and fixed.

To reproduce this, runtime assertions must be enabled. This can be done by replacing t.StandardCockroach() with t.RuntimeAssertionsCockroach() in multitenant_distsql.go.

https://github.com/cockroachdb/cockroach/blob/1d6e2db725167afd308dc96c2d57e4bd27246064/pkg/cmd/roachtest/tests/multitenant_distsql.go#L65

Then run any of the multitenant/distsql roachtests. The test should always fail with the following errors:

* ERROR: a panic has occurred!
* programming error: invalid access to SystemOnly setting diagnostics.active_query_dumps.enabled from a virtual cluster!
* (1) attached stack trace
*   -- stack trace:
*   | runtime.gopanic
*   | 	GOROOT/src/runtime/panic.go:884
*   | [...repeated from below...]
* Wraps: (2) assertion failure
* Wraps: (3) attached stack trace
*   -- stack trace:
*   | github.com/cockroachdb/cockroach/pkg/settings.(*valuesContainer).checkForbidden
*   | 	github.com/cockroachdb/cockroach/pkg/settings/pkg/settings/values.go:130
*   | github.com/cockroachdb/cockroach/pkg/settings.(*valuesContainer).getInt64
*   | 	github.com/cockroachdb/cockroach/pkg/settings/pkg/settings/values.go:108
*   | github.com/cockroachdb/cockroach/pkg/settings.(*Values).getInt64
*   | 	github.com/cockroachdb/cockroach/pkg/settings/pkg/settings/values.go:242
*   | github.com/cockroachdb/cockroach/pkg/settings.(*BoolSetting).Get
*   | 	github.com/cockroachdb/cockroach/pkg/settings/pkg/settings/bool.go:30
*   | github.com/cockroachdb/cockroach/pkg/server/profiler.(*ActiveQueryProfiler).shouldDump
*   | 	github.com/cockroachdb/cockroach/pkg/server/profiler/activequeryprofiler.go:126
*   | github.com/cockroachdb/cockroach/pkg/server/profiler.(*ActiveQueryProfiler).MaybeDumpQueries
*   | 	github.com/cockroachdb/cockroach/pkg/server/profiler/activequeryprofiler.go:108
*   | github.com/cockroachdb/cockroach/pkg/server.startSampleEnvironment.func1
*   | 	github.com/cockroachdb/cockroach/pkg/server/env_sampler.go:197
*   | github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTaskEx.func2
*   | 	github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:484
*   | runtime.goexit
*   | 	src/runtime/asm_amd64.s:1598
* Wraps: (4) programming error: invalid access to SystemOnly setting diagnostics.active_query_dumps.enabled from a virtual cluster!
*   |
*   | TIP: use class ApplicationLevel for settings that configure just 1
*   | virtual cluster; SystemOnly for settings that affect only the shared
*   | storage layer; and SystemVisible for settings that affect the storage
*   | layer and also must be visible to all virtual clusters.
* Error types: (1) *withstack.withStack (2) *assert.withAssertionFailure (3) 
* ERROR: a panic has occurred!
* use of Span after Finish. Span: server start. Finish previously called at: <stack not captured. Set debugUseAfterFinish>
* (1) attached stack trace
*   -- stack trace:
*   | runtime.gopanic
*   | 	GOROOT/src/runtime/panic.go:884
*   | [...repeated from below...]
* Wraps: (2) assertion failure
* Wraps: (3) attached stack trace
*   -- stack trace:
*   | github.com/cockroachdb/cockroach/pkg/util/tracing.(*Span).detectUseAfterFinish
*   | 	github.com/cockroachdb/cockroach/pkg/util/tracing/span.go:182
*   | github.com/cockroachdb/cockroach/pkg/util/tracing.(*Span).RecordingType
*   | 	github.com/cockroachdb/cockroach/pkg/util/tracing/span.go:442
*   | github.com/cockroachdb/cockroach/pkg/util/tracing.(*Span).IsVerbose
*   | 	github.com/cockroachdb/cockroach/pkg/util/tracing/span.go:450
*   | github.com/cockroachdb/cockroach/pkg/util/log.getSpanOrEventLog
*   | 	github.com/cockroachdb/cockroach/pkg/util/log/trace.go:92
*   | github.com/cockroachdb/cockroach/pkg/util/log.vEventf
*   | 	github.com/cockroachdb/cockroach/pkg/util/log/trace.go:212
*   | github.com/cockroachdb/cockroach/pkg/util/log.VEventf
*   | 	github.com/cockroachdb/cockroach/pkg/util/log/trace.go:238
*   | github.com/cockroachdb/cockroach/pkg/rpc.(*peer).launch
*   | 	github.com/cockroachdb/cockroach/pkg/rpc/pkg/rpc/peer.go:227
*   | github.com/cockroachdb/cockroach/pkg/rpc.(*Context).newPeer.func2.1
*   | 	github.com/cockroachdb/cockroach/pkg/rpc/pkg/rpc/peer.go:185
*   | runtime/pprof.Do
*   | 	GOROOT/src/runtime/pprof/runtime.go:44
*   | github.com/cockroachdb/cockroach/pkg/rpc.(*Context).newPeer.func2
*   | 	github.com/cockroachdb/cockroach/pkg/rpc/pkg/rpc/peer.go:184
*   | github.com/cockroachdb/cockroach/pkg/util/circuit.(*Breaker).maybeTriggerProbe
*   | 	github.com/cockroachdb/cockroach/pkg/util/circuit/circuitbreaker.go:260
*   | github.com/cockroachdb/cockroach/pkg/util/circuit.(*Breaker).Probe
*   | 	github.com/cockroachdb/cockroach/pkg/util/circuit/circuitbreaker.go:96
*   | github.com/cockroachdb/cockroach/pkg/rpc.(*Context).grpcDialNodeInternal
*   | 	github.com/cockroachdb/cockroach/pkg/rpc/pkg/rpc/context.go:2177
*   | github.com/cockroachdb/cockroach/pkg/rpc.(*Context).GRPCDialNode
*   | 	github.com/cockroachdb/cockroach/pkg/rpc/pkg/rpc/context.go:2128
*   | github.com/cockroachdb/cockroach/pkg/rpc/nodedialer.(*Dialer).dial
*   | 	github.com/cockroachdb/cockroach/pkg/rpc/nodedialer/nodedialer.go:170
*   | github.com/cockroachdb/cockroach/pkg/rpc/nodedialer.(*Dialer).Dial
*   | 	github.com/cockroachdb/cockroach/pkg/rpc/nodedialer/nodedialer.go:103
*   | github.com/cockroachdb/cockroach/pkg/sql.runnerRequest.run
*   | 	github.com/cockroachdb/cockroach/pkg/sql/distsql_running.go:120
*   | github.com/cockroachdb/cockroach/pkg/sql.(*runnerCoordinator).init.func1
*   | 	github.com/cockroachdb/cockroach/pkg/sql/distsql_running.go:163
*   | github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTaskEx.func2
*   | 	github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:484
*   | runtime.goexit
*   | 	src/runtime/asm_amd64.s:1598
* Wraps: (4) use of Span after Finish. Span: server start. Finish previously called at: <stack not captured. Set debugUseAfterFinish>
* Error types: (1) *withstack.withStack (2) *assert.withAssertionFailure (3) *withstack.withStack (4) *errutil.leafError
*

In my testing, disabling either runtime assertion still caused the other to appear, so I believe they are unrelated bugs, but my understanding here is limited so I wouldn't rule it out.

Jira issue: CRDB-32786

@DarrylWong DarrylWong added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-multitenant Issues owned by the multi-tenant virtual team labels Oct 26, 2023
@DarrylWong
Copy link
Contributor Author

The logic to run roachtests with runtime assertions is added in #111949. Just a heads up in case anyone tries to investigate this before it is merged to either wait or rebase off of that pr.

@yuzefovich
Copy link
Member

I tracked down the misuse of the tracing spans to makeTenantSQLServerArgs in which startupCtx might be held in two places (rpc.NewContext and newGRPCServer) throughout the whole life cycle of the tenant server, outliving the time that the tracing span isn't finished (which is done in the defer in createAndStartServerAsync). This diff

diff --git a/pkg/server/tenant.go b/pkg/server/tenant.go
index 40faf044d65..c9eea83f506 100644
--- a/pkg/server/tenant.go
+++ b/pkg/server/tenant.go
@@ -1085,7 +1085,7 @@ func makeTenantSQLServerArgs(
        // RPCs; so it should refuse to serve SQL-to-KV RPCs completely.
        rpcCtxOpts.TenantRPCAuthorizer = tenantcapabilitiesauthorizer.NewAllowNothingAuthorizer()
 
-       rpcContext := rpc.NewContext(startupCtx, rpcCtxOpts)
+       rpcContext := rpc.NewContext(context.Background(), rpcCtxOpts)
 
        if !baseCfg.Insecure {
                // This check mirrors that done in NewServer().
@@ -1252,7 +1252,7 @@ func makeTenantSQLServerArgs(
        externalStorage := esb.makeExternalStorage
        externalStorageFromURI := esb.makeExternalStorageFromURI
 
-       grpcServer, err := newGRPCServer(startupCtx, rpcContext)
+       grpcServer, err := newGRPCServer(context.Background(), rpcContext)
        if err != nil {
                return sqlServerArgs{}, err
        }

fixes that issue.

However, this is probably an undesirable fix. Instead, we can just create a new tracing span in runStartSQL in mt_start_sql.go which is never finished.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-multitenant Issues owned by the multi-tenant virtual team
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants