Skip to content

Commit

Permalink
Browse files Browse the repository at this point in the history
70466: sql: fix EXPLAIN (VEC) in some cases r=yuzefovich a=yuzefovich

This commit fixes the incorrect order of `Close` and `Release`
operations which are executed as part of `EXPLAIN (VEC)` implementation.
The reversed incorrect order (releasing before closing) has been present
since the feature was introduced, but until 21.2 it wasn't causing
issues. In 21.2 time frame we introduce `OpWithMetaInfo` struct which
upon its `Release` method nils out all `Closer`s, so when closing the
input tree to the `explainVecNode` we now can hit a nil pointer crash.

The issue is fixed by delaying the release until after all things have
been closed.

Fixes: #70438.

Release note (bug fix): Previously, `EXPLAIN (VEC)` on some queries
could lead to a crash. The bug has been present only in 21.2 testing
releases.

70498: cli: add system.settings to debug zip r=ajwerner a=ajwerner

This is handy to figure out whether a setting was explicitly set.
It may not be as elegant as annotating crdb_internal.cluster_settings,
but it is simple, backwards compatible, and something.

Probably resolves #59673.

Release note (ops change): cockroach debug zip will now include the raw
system.settings table. This table makes it possible to determine whether
a cluster settings has been explicitly set.

70501: backupccl: unskip TestRestoreOldVersions r=adityamaru a=jbowens

This test was skipped because of #70154. Although we saw two recent
failures, we do expect this to be very infrequent and a long-existing
bug that may affect any unit test with an engine. There's no need to
keep this test skipped until we resolve it.

Release note: None

Co-authored-by: Yahor Yuzefovich <[email protected]>
Co-authored-by: Andrew Werner <[email protected]>
Co-authored-by: Jackson Owens <[email protected]>
  • Loading branch information
4 people committed Sep 21, 2021
4 parents 16d935d + 8b90b55 + 5cf185e + be36f1f commit 1f345ec
Show file tree
Hide file tree
Showing 13 changed files with 60 additions and 11 deletions.
1 change: 0 additions & 1 deletion pkg/ccl/backupccl/restore_old_versions_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -60,7 +60,6 @@ import (
//
func TestRestoreOldVersions(t *testing.T) {
defer leaktest.AfterTest(t)()
skip.WithIssue(t, 70154, "flaky test")
defer log.Scope(t).Close(t)
const (
testdataBase = "testdata/restore_old_versions"
Expand Down
1 change: 1 addition & 0 deletions pkg/cli/testdata/zip/partial1
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,7 @@ debug zip --concurrency=1 --cpu-profile-duration=0s /dev/null
[cluster] retrieving SQL data for system.descriptor... writing output: debug/system.descriptor.txt... done
[cluster] retrieving SQL data for system.namespace... writing output: debug/system.namespace.txt... done
[cluster] retrieving SQL data for system.scheduled_jobs... writing output: debug/system.scheduled_jobs.txt... done
[cluster] retrieving SQL data for system.settings... writing output: debug/system.settings.txt... done
[cluster] retrieving SQL data for "".crdb_internal.create_statements... writing output: debug/crdb_internal.create_statements.txt... done
[cluster] retrieving SQL data for "".crdb_internal.create_type_statements... writing output: debug/crdb_internal.create_type_statements.txt... done
[cluster] retrieving SQL data for crdb_internal.kv_node_liveness... writing output: debug/crdb_internal.kv_node_liveness.txt... done
Expand Down
1 change: 1 addition & 0 deletions pkg/cli/testdata/zip/partial1_excluded
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,7 @@ debug zip /dev/null --concurrency=1 --exclude-nodes=2 --cpu-profile-duration=0
[cluster] retrieving SQL data for system.descriptor... writing output: debug/system.descriptor.txt... done
[cluster] retrieving SQL data for system.namespace... writing output: debug/system.namespace.txt... done
[cluster] retrieving SQL data for system.scheduled_jobs... writing output: debug/system.scheduled_jobs.txt... done
[cluster] retrieving SQL data for system.settings... writing output: debug/system.settings.txt... done
[cluster] retrieving SQL data for "".crdb_internal.create_statements... writing output: debug/crdb_internal.create_statements.txt... done
[cluster] retrieving SQL data for "".crdb_internal.create_type_statements... writing output: debug/crdb_internal.create_type_statements.txt... done
[cluster] retrieving SQL data for crdb_internal.kv_node_liveness... writing output: debug/crdb_internal.kv_node_liveness.txt... done
Expand Down
1 change: 1 addition & 0 deletions pkg/cli/testdata/zip/partial2
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,7 @@ debug zip --concurrency=1 --cpu-profile-duration=0 /dev/null
[cluster] retrieving SQL data for system.descriptor... writing output: debug/system.descriptor.txt... done
[cluster] retrieving SQL data for system.namespace... writing output: debug/system.namespace.txt... done
[cluster] retrieving SQL data for system.scheduled_jobs... writing output: debug/system.scheduled_jobs.txt... done
[cluster] retrieving SQL data for system.settings... writing output: debug/system.settings.txt... done
[cluster] retrieving SQL data for "".crdb_internal.create_statements... writing output: debug/crdb_internal.create_statements.txt... done
[cluster] retrieving SQL data for "".crdb_internal.create_type_statements... writing output: debug/crdb_internal.create_type_statements.txt... done
[cluster] retrieving SQL data for crdb_internal.kv_node_liveness... writing output: debug/crdb_internal.kv_node_liveness.txt... done
Expand Down
1 change: 1 addition & 0 deletions pkg/cli/testdata/zip/testzip
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,7 @@ debug zip --concurrency=1 --cpu-profile-duration=1s /dev/null
[cluster] retrieving SQL data for system.descriptor... writing output: debug/system.descriptor.txt... done
[cluster] retrieving SQL data for system.namespace... writing output: debug/system.namespace.txt... done
[cluster] retrieving SQL data for system.scheduled_jobs... writing output: debug/system.scheduled_jobs.txt... done
[cluster] retrieving SQL data for system.settings... writing output: debug/system.settings.txt... done
[cluster] retrieving SQL data for "".crdb_internal.create_statements... writing output: debug/crdb_internal.create_statements.txt... done
[cluster] retrieving SQL data for "".crdb_internal.create_type_statements... writing output: debug/crdb_internal.create_type_statements.txt... done
[cluster] retrieving SQL data for crdb_internal.kv_node_liveness... writing output: debug/crdb_internal.kv_node_liveness.txt... done
Expand Down
3 changes: 3 additions & 0 deletions pkg/cli/testdata/zip/testzip_concurrent
Original file line number Diff line number Diff line change
Expand Up @@ -115,6 +115,9 @@ zip
[cluster] retrieving SQL data for system.scheduled_jobs...
[cluster] retrieving SQL data for system.scheduled_jobs: done
[cluster] retrieving SQL data for system.scheduled_jobs: writing output: debug/system.scheduled_jobs.txt...
[cluster] retrieving SQL data for system.settings...
[cluster] retrieving SQL data for system.settings: done
[cluster] retrieving SQL data for system.settings: writing output: debug/system.settings.txt...
[cluster] retrieving the node status to get the SQL address...
[cluster] retrieving the node status to get the SQL address: ...
[cluster] using SQL address: ...
Expand Down
1 change: 1 addition & 0 deletions pkg/cli/zip.go
Original file line number Diff line number Diff line change
Expand Up @@ -50,6 +50,7 @@ var customQuery = map[string]string{
") SELECT * FROM spans, LATERAL crdb_internal.payloads_for_span(span_id)",
"system.jobs": "SELECT *, to_hex(payload) AS hex_payload, to_hex(progress) AS hex_progress FROM system.jobs",
"system.descriptor": "SELECT *, to_hex(descriptor) AS hex_descriptor FROM system.descriptor",
"system.settings": "SELECT *, to_hex(value::bytes) as hex_value FROM system.settings",
}

type debugZipContext struct {
Expand Down
1 change: 1 addition & 0 deletions pkg/cli/zip_cluster_wide.go
Original file line number Diff line number Diff line change
Expand Up @@ -86,6 +86,7 @@ var debugZipTablesPerCluster = []string{
"system.descriptor", // descriptors also contain job-like mutation state.
"system.namespace",
"system.scheduled_jobs",
"system.settings", // get the raw settings to determine what's explicitly set.

// The synthetic SQL CREATE statements for all tables.
// Note the "". to collect across all databases.
Expand Down
1 change: 1 addition & 0 deletions pkg/cli/zip_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -104,6 +104,7 @@ ORDER BY name ASC`)
"system.descriptor",
"system.namespace",
"system.scheduled_jobs",
"system.settings",
)
sort.Strings(tables)

Expand Down
33 changes: 25 additions & 8 deletions pkg/sql/colflow/explain_vec.go
Original file line number Diff line number Diff line change
Expand Up @@ -101,9 +101,15 @@ type flowWithNode struct {

// ExplainVec converts the flows (that are assumed to be vectorizable) into the
// corresponding string representation.
//
// It also supports printing of already constructed operator chains which takes
// priority if non-nil (flows are ignored). All operators in opChains are
// assumed to be planned on the gateway.
//
// As the second return parameter it returns a non-nil cleanup function which
// can be called only **after** closing the planNode tree containing the
// explainVecNode (if ExplainVec is used by another caller, then it can be
// called at any time).
func ExplainVec(
ctx context.Context,
flowCtx *execinfra.FlowCtx,
Expand All @@ -113,14 +119,25 @@ func ExplainVec(
gatewayNodeID roachpb.NodeID,
verbose bool,
distributed bool,
) ([]string, error) {
) (_ []string, cleanup func(), _ error) {
tp := treeprinter.NewWithStyle(treeprinter.CompactStyle)
root := tp.Child("│")
var conversionErr error
var (
cleanups []func()
err error
conversionErr error
)
defer func() {
cleanup = func() {
for _, c := range cleanups {
c()
}
}
}()
// It is possible that when iterating over execinfra.OpNodes we will hit a
// panic (an input that doesn't implement OpNode interface), so we're
// catching such errors.
if err := colexecerror.CatchVectorizedRuntimeError(func() {
if err = colexecerror.CatchVectorizedRuntimeError(func() {
if opChains != nil {
formatChains(root, gatewayNodeID, opChains, verbose)
} else {
Expand All @@ -132,8 +149,8 @@ func ExplainVec(
// last.
sort.Slice(sortedFlows, func(i, j int) bool { return sortedFlows[i].nodeID < sortedFlows[j].nodeID })
for _, flow := range sortedFlows {
opChains, cleanup, err := convertToVecTree(ctx, flowCtx, flow.flow, localProcessors, !distributed)
defer cleanup()
opChains, cleanup, err = convertToVecTree(ctx, flowCtx, flow.flow, localProcessors, !distributed)
cleanups = append(cleanups, cleanup)
if err != nil {
conversionErr = err
return
Expand All @@ -142,12 +159,12 @@ func ExplainVec(
}
}
}); err != nil {
return nil, err
return nil, nil, err
}
if conversionErr != nil {
return nil, conversionErr
return nil, nil, conversionErr
}
return tp.FormattedRows(), nil
return tp.FormattedRows(), nil, nil
}

func formatChains(
Expand Down
3 changes: 2 additions & 1 deletion pkg/sql/distsql_physical_planner.go
Original file line number Diff line number Diff line change
Expand Up @@ -740,11 +740,12 @@ func (p *PlanningCtx) getDefaultSaveFlowsFunc(
if planner.instrumentation.collectBundle && planner.curPlan.flags.IsSet(planFlagVectorized) {
flowCtx := newFlowCtxForExplainPurposes(p, planner)
getExplain := func(verbose bool) []string {
explain, err := colflow.ExplainVec(
explain, cleanup, err := colflow.ExplainVec(
ctx, flowCtx, flows, p.infra.LocalProcessors, opChains,
planner.extendedEvalCtx.DistSQLPlanner.gatewayNodeID,
verbose, planner.curPlan.flags.IsDistributed(),
)
cleanup()
if err != nil {
// In some edge cases (like when subqueries are present or
// when certain component doesn't implement execinfra.OpNode
Expand Down
7 changes: 6 additions & 1 deletion pkg/sql/explain_vec.go
Original file line number Diff line number Diff line change
Expand Up @@ -35,6 +35,8 @@ type explainVecNode struct {
lines []string
// The current row returned by the node.
values tree.Datums
// cleanup will be called after closing the input tree.
cleanup func()
}
}

Expand Down Expand Up @@ -74,7 +76,7 @@ func (n *explainVecNode) startExec(params runParams) error {
return errors.New("vectorize is set to 'off'")
}
verbose := n.options.Flags[tree.ExplainFlagVerbose]
n.run.lines, err = colflow.ExplainVec(
n.run.lines, n.run.cleanup, err = colflow.ExplainVec(
params.ctx, flowCtx, flows, physPlan.LocalProcessors, nil, /* opChains */
distSQLPlanner.gatewayNodeID, verbose, willDistribute,
)
Expand Down Expand Up @@ -131,4 +133,7 @@ func (n *explainVecNode) Next(runParams) (bool, error) {
func (n *explainVecNode) Values() tree.Datums { return n.run.values }
func (n *explainVecNode) Close(ctx context.Context) {
n.plan.close(ctx)
if n.run.cleanup != nil {
n.run.cleanup()
}
}
17 changes: 17 additions & 0 deletions pkg/sql/logictest/testdata/logic_test/vectorize_local
Original file line number Diff line number Diff line change
Expand Up @@ -447,3 +447,20 @@ EXPLAIN (VEC)
└ *sql.planNodeToRowSource
└ *colfetcher.ColIndexJoin
└ *colfetcher.ColBatchScan

# Regression test for releasing operators before closing them with EXPLAIN (VEC)
# (#70438).
statement ok
CREATE TABLE t70438 (k INT PRIMARY KEY, v INT, UNIQUE INDEX foo (v));
INSERT INTO t70438 VALUES (1, 2), (3, 4), (5, 6), (7, 8);

query T
EXPLAIN (VEC) DELETE FROM t70438 WHERE k=3 OR v=6
----
└ Node 1
└ *sql.planNodeToRowSource
└ *colexec.unorderedDistinct
└ *colexec.SerialUnorderedSynchronizer
├ *colfetcher.ColBatchScan
└ *colfetcher.ColBatchScan

0 comments on commit 1f345ec

Please sign in to comment.