opt: remove race-causing lazy calculation of properties #36148

Closed
maddyblue opened this issue Mar 26, 2019 · 17 comments
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. O-sqlsmith

@maddyblue
Contributor

WARNING: DATA RACE
Read at 0x000005ee2ca8 by goroutine 340:
  github.com/cockroachdb/cockroach/pkg/sql/opt/memo.(*FiltersItem).ScalarProps()
      /home/mjibson/src/github.com/cockroachdb/cockroach/pkg/sql/opt/memo/expr.og.go:6310 +0x47
  github.com/cockroachdb/cockroach/pkg/sql/opt/norm.(*CustomFuncs).IsContradiction()
      /home/mjibson/src/github.com/cockroachdb/cockroach/pkg/sql/opt/norm/custom_funcs.go:429 +0x5e
  github.com/cockroachdb/cockroach/pkg/sql/opt/norm.(*Factory).ConstructSelect()
      /home/mjibson/src/github.com/cockroachdb/cockroach/pkg/sql/opt/norm/factory.og.go:273 +0x1291
  github.com/cockroachdb/cockroach/pkg/sql/opt/norm.(*Factory).ConstructSelect()
      /home/mjibson/src/github.com/cockroachdb/cockroach/pkg/sql/opt/norm/factory.og.go:159 +0x671
  github.com/cockroachdb/cockroach/pkg/sql/opt/optbuilder.(*Builder).buildWhere()
      /home/mjibson/src/github.com/cockroachdb/cockroach/pkg/sql/opt/optbuilder/select.go:728 +0x42b
  github.com/cockroachdb/cockroach/pkg/sql/opt/optbuilder.(*Builder).buildSelectClause()
      /home/mjibson/src/github.com/cockroachdb/cockroach/pkg/sql/opt/optbuilder/select.go:636 +0xd0
  github.com/cockroachdb/cockroach/pkg/sql/opt/optbuilder.(*Builder).buildSelect()
      /home/mjibson/src/github.com/cockroachdb/cockroach/pkg/sql/opt/optbuilder/select.go:589 +0x47e
  github.com/cockroachdb/cockroach/pkg/sql/opt/optbuilder.(*Builder).buildStmt()
      /home/mjibson/src/github.com/cockroachdb/cockroach/pkg/sql/opt/optbuilder/builder.go:215 +0x3c4
  github.com/cockroachdb/cockroach/pkg/sql/opt/optbuilder.(*scope).replaceSubquery()
      /home/mjibson/src/github.com/cockroachdb/cockroach/pkg/sql/opt/optbuilder/scope.go:1102 +0x28c
  github.com/cockroachdb/cockroach/pkg/sql/opt/optbuilder.(*scope).VisitPre()
      /home/mjibson/src/github.com/cockroachdb/cockroach/pkg/sql/opt/optbuilder/scope.go:878 +0x2fc
  github.com/cockroachdb/cockroach/pkg/sql/sem/tree.WalkExpr()
      /home/mjibson/src/github.com/cockroachdb/cockroach/pkg/sql/sem/tree/walk.go:680 +0x83
  github.com/cockroachdb/cockroach/pkg/sql/opt/optbuilder.(*scope).walkExprTree()
      /home/mjibson/src/github.com/cockroachdb/cockroach/pkg/sql/opt/optbuilder/scope.go:266 +0x7e
  github.com/cockroachdb/cockroach/pkg/sql/opt/optbuilder.(*scope).resolveType()
      /home/mjibson/src/github.com/cockroachdb/cockroach/pkg/sql/opt/optbuilder/scope.go:302 +0x59
  github.com/cockroachdb/cockroach/pkg/sql/opt/optbuilder.(*Builder).analyzeSelectList()
      /home/mjibson/src/github.com/cockroachdb/cockroach/pkg/sql/opt/optbuilder/project.go:146 +0x65e
  github.com/cockroachdb/cockroach/pkg/sql/opt/optbuilder.(*Builder).analyzeReturningList()
      /home/mjibson/src/github.com/cockroachdb/cockroach/pkg/sql/opt/optbuilder/project.go:98 +0x28a
  github.com/cockroachdb/cockroach/pkg/sql/opt/optbuilder.(*mutationBuilder).buildReturning()
      /home/mjibson/src/github.com/cockroachdb/cockroach/pkg/sql/opt/optbuilder/mutation_builder.go:677 +0x9ad
  github.com/cockroachdb/cockroach/pkg/sql/opt/optbuilder.(*mutationBuilder).buildDelete()
      /home/mjibson/src/github.com/cockroachdb/cockroach/pkg/sql/opt/optbuilder/delete.go:92 +0x199
  github.com/cockroachdb/cockroach/pkg/sql/opt/optbuilder.(*Builder).buildDelete()
      /home/mjibson/src/github.com/cockroachdb/cockroach/pkg/sql/opt/optbuilder/delete.go:78 +0x4d4
  github.com/cockroachdb/cockroach/pkg/sql/opt/optbuilder.(*Builder).buildStmt()
      /home/mjibson/src/github.com/cockroachdb/cockroach/pkg/sql/opt/optbuilder/builder.go:206 +0x28f
  github.com/cockroachdb/cockroach/pkg/sql/opt/optbuilder.(*Builder).buildDataSource()
      /home/mjibson/src/github.com/cockroachdb/cockroach/pkg/sql/opt/optbuilder/select.go:121 +0x105e
  github.com/cockroachdb/cockroach/pkg/sql/opt/optbuilder.(*Builder).buildDataSource()
      /home/mjibson/src/github.com/cockroachdb/cockroach/pkg/sql/opt/optbuilder/select.go:57 +0x3da
  github.com/cockroachdb/cockroach/pkg/sql/opt/optbuilder.(*Builder).buildFromTables()
      /home/mjibson/src/github.com/cockroachdb/cockroach/pkg/sql/opt/optbuilder/select.go:748 +0xa1
  github.com/cockroachdb/cockroach/pkg/sql/opt/optbuilder.(*Builder).buildFromTables()
      /home/mjibson/src/github.com/cockroachdb/cockroach/pkg/sql/opt/optbuilder/select.go:755 +0x115
  github.com/cockroachdb/cockroach/pkg/sql/opt/optbuilder.(*Builder).buildFrom()
      /home/mjibson/src/github.com/cockroachdb/cockroach/pkg/sql/opt/optbuilder/select.go:694 +0xdd
  github.com/cockroachdb/cockroach/pkg/sql/opt/optbuilder.(*Builder).buildSelectClause()
      /home/mjibson/src/github.com/cockroachdb/cockroach/pkg/sql/opt/optbuilder/select.go:635 +0x85
  github.com/cockroachdb/cockroach/pkg/sql/opt/optbuilder.(*Builder).buildSelect()
      /home/mjibson/src/github.com/cockroachdb/cockroach/pkg/sql/opt/optbuilder/select.go:589 +0x47e
  github.com/cockroachdb/cockroach/pkg/sql/opt/optbuilder.(*Builder).buildStmt()
      /home/mjibson/src/github.com/cockroachdb/cockroach/pkg/sql/opt/optbuilder/builder.go:215 +0x3c4
  github.com/cockroachdb/cockroach/pkg/sql/opt/optbuilder.(*scope).replaceSubquery()
      /home/mjibson/src/github.com/cockroachdb/cockroach/pkg/sql/opt/optbuilder/scope.go:1102 +0x28c
  github.com/cockroachdb/cockroach/pkg/sql/opt/optbuilder.(*scope).VisitPre()
      /home/mjibson/src/github.com/cockroachdb/cockroach/pkg/sql/opt/optbuilder/scope.go:878 +0x2fc
  github.com/cockroachdb/cockroach/pkg/sql/sem/tree.WalkExpr()
      /home/mjibson/src/github.com/cockroachdb/cockroach/pkg/sql/sem/tree/walk.go:680 +0x83
  github.com/cockroachdb/cockroach/pkg/sql/sem/tree.(*CastExpr).Walk()
      /home/mjibson/src/github.com/cockroachdb/cockroach/pkg/sql/sem/tree/walk.go:131 +0x6c
  github.com/cockroachdb/cockroach/pkg/sql/sem/tree.WalkExpr()
      /home/mjibson/src/github.com/cockroachdb/cockroach/pkg/sql/sem/tree/walk.go:683 +0x45b
  github.com/cockroachdb/cockroach/pkg/sql/sem/tree.(*BinaryExpr).Walk()
      /home/mjibson/src/github.com/cockroachdb/cockroach/pkg/sql/sem/tree/walk.go:73 +0xe3
  github.com/cockroachdb/cockroach/pkg/sql/sem/tree.WalkExpr()
      /home/mjibson/src/github.com/cockroachdb/cockroach/pkg/sql/sem/tree/walk.go:683 +0x45b
  github.com/cockroachdb/cockroach/pkg/sql/sem/tree.(*ParenExpr).Walk()
      /home/mjibson/src/github.com/cockroachdb/cockroach/pkg/sql/sem/tree/walk.go:466 +0x7e
  github.com/cockroachdb/cockroach/pkg/sql/sem/tree.WalkExpr()
      /home/mjibson/src/github.com/cockroachdb/cockroach/pkg/sql/sem/tree/walk.go:683 +0x45b
  github.com/cockroachdb/cockroach/pkg/sql/sem/tree.(*CastExpr).Walk()
      /home/mjibson/src/github.com/cockroachdb/cockroach/pkg/sql/sem/tree/walk.go:131 +0x6c
  github.com/cockroachdb/cockroach/pkg/sql/sem/tree.WalkExpr()
      /home/mjibson/src/github.com/cockroachdb/cockroach/pkg/sql/sem/tree/walk.go:683 +0x45b
  github.com/cockroachdb/cockroach/pkg/sql/sem/tree.walkExprSlice()
      /home/mjibson/src/github.com/cockroachdb/cockroach/pkg/sql/sem/tree/walk.go:515 +0xe4
  github.com/cockroachdb/cockroach/pkg/sql/sem/tree.(*FuncExpr).Walk()
      /home/mjibson/src/github.com/cockroachdb/cockroach/pkg/sql/sem/tree/walk.go:303 +0x97
  github.com/cockroachdb/cockroach/pkg/sql/sem/tree.WalkExpr()
      /home/mjibson/src/github.com/cockroachdb/cockroach/pkg/sql/sem/tree/walk.go:683 +0x45b
  github.com/cockroachdb/cockroach/pkg/sql/sem/tree.(*CastExpr).Walk()
      /home/mjibson/src/github.com/cockroachdb/cockroach/pkg/sql/sem/tree/walk.go:131 +0x6c
  github.com/cockroachdb/cockroach/pkg/sql/sem/tree.WalkExpr()
      /home/mjibson/src/github.com/cockroachdb/cockroach/pkg/sql/sem/tree/walk.go:683 +0x45b
  github.com/cockroachdb/cockroach/pkg/sql/opt/optbuilder.(*scope).walkExprTree()
      /home/mjibson/src/github.com/cockroachdb/cockroach/pkg/sql/opt/optbuilder/scope.go:266 +0x7e
  github.com/cockroachdb/cockroach/pkg/sql/opt/optbuilder.(*scope).resolveType()
      /home/mjibson/src/github.com/cockroachdb/cockroach/pkg/sql/opt/optbuilder/scope.go:302 +0x59
  github.com/cockroachdb/cockroach/pkg/sql/opt/optbuilder.(*Builder).analyzeSelectList()
      /home/mjibson/src/github.com/cockroachdb/cockroach/pkg/sql/opt/optbuilder/project.go:146 +0x65e
  github.com/cockroachdb/cockroach/pkg/sql/opt/optbuilder.(*Builder).analyzeProjectionList()
      /home/mjibson/src/github.com/cockroachdb/cockroach/pkg/sql/opt/optbuilder/project.go:80 +0x302
  github.com/cockroachdb/cockroach/pkg/sql/opt/optbuilder.(*Builder).buildSelectClause()
      /home/mjibson/src/github.com/cockroachdb/cockroach/pkg/sql/opt/optbuilder/select.go:644 +0x31e
  github.com/cockroachdb/cockroach/pkg/sql/opt/optbuilder.(*Builder).buildSelect()
      /home/mjibson/src/github.com/cockroachdb/cockroach/pkg/sql/opt/optbuilder/select.go:589 +0x47e
  github.com/cockroachdb/cockroach/pkg/sql/opt/optbuilder.(*mutationBuilder).buildInputForInsert()
      /home/mjibson/src/github.com/cockroachdb/cockroach/pkg/sql/opt/optbuilder/insert.go:538 +0x325
  github.com/cockroachdb/cockroach/pkg/sql/opt/optbuilder.(*Builder).buildInsert()
      /home/mjibson/src/github.com/cockroachdb/cockroach/pkg/sql/opt/optbuilder/insert.go:221 +0x446
  github.com/cockroachdb/cockroach/pkg/sql/opt/optbuilder.(*Builder).buildStmt()
      /home/mjibson/src/github.com/cockroachdb/cockroach/pkg/sql/opt/optbuilder/builder.go:212 +0x4c0
  github.com/cockroachdb/cockroach/pkg/sql/opt/optbuilder.(*Builder).Build()
      /home/mjibson/src/github.com/cockroachdb/cockroach/pkg/sql/opt/optbuilder/builder.go:159 +0x23e
  github.com/cockroachdb/cockroach/pkg/sql.(*optPlanningCtx).buildExecMemo()
      /home/mjibson/src/github.com/cockroachdb/cockroach/pkg/sql/plan_opt.go:425 +0x3b3
  github.com/cockroachdb/cockroach/pkg/sql.(*planner).makeOptimizerPlan()
      /home/mjibson/src/github.com/cockroachdb/cockroach/pkg/sql/plan_opt.go:154 +0xea
  github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).makeExecPlan()
      /home/mjibson/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:987 +0x1ee
  github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).dispatchToExecutionEngine()
      /home/mjibson/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:874 +0x1f5
  github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).execStmtInOpenState()
      /home/mjibson/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:459 +0xf86
  github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).execStmt()
      /home/mjibson/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:101 +0x7d9
  github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).run()
      /home/mjibson/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:1183 +0x37d3
  github.com/cockroachdb/cockroach/pkg/sql.(*Server).ServeConn()
      /home/mjibson/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:433 +0xef
  github.com/cockroachdb/cockroach/pkg/sql/pgwire.(*conn).serveImpl.func4()
      /home/mjibson/src/github.com/cockroachdb/cockroach/pkg/sql/pgwire/conn.go:338 +0xfc

Previous write at 0x000005ee2ca8 by goroutine 260:
  github.com/cockroachdb/cockroach/pkg/sql/opt/memo.(*logicalPropsBuilder).buildFiltersItemProps()
      /home/mjibson/src/github.com/cockroachdb/cockroach/pkg/sql/opt/memo/logical_props_builder.go:1101 +0x275
  github.com/cockroachdb/cockroach/pkg/sql/opt/memo.(*FiltersItem).ScalarProps()
      /home/mjibson/src/github.com/cockroachdb/cockroach/pkg/sql/opt/memo/expr.og.go:6311 +0x97
  github.com/cockroachdb/cockroach/pkg/sql/opt/norm.(*CustomFuncs).IsContradiction()
      /home/mjibson/src/github.com/cockroachdb/cockroach/pkg/sql/opt/norm/custom_funcs.go:429 +0x5e
  github.com/cockroachdb/cockroach/pkg/sql/opt/norm.(*Factory).ConstructSelect()
      /home/mjibson/src/github.com/cockroachdb/cockroach/pkg/sql/opt/norm/factory.og.go:273 +0x1291
  github.com/cockroachdb/cockroach/pkg/sql/opt/norm.(*Factory).ConstructSelect()
      /home/mjibson/src/github.com/cockroachdb/cockroach/pkg/sql/opt/norm/factory.og.go:159 +0x671
  github.com/cockroachdb/cockroach/pkg/sql/opt/optbuilder.(*Builder).buildWhere()
      /home/mjibson/src/github.com/cockroachdb/cockroach/pkg/sql/opt/optbuilder/select.go:728 +0x42b
  github.com/cockroachdb/cockroach/pkg/sql/opt/optbuilder.(*mutationBuilder).buildInputForUpdateOrDelete()
      /home/mjibson/src/github.com/cockroachdb/cockroach/pkg/sql/opt/optbuilder/mutation_builder.go:198 +0x1aa
  github.com/cockroachdb/cockroach/pkg/sql/opt/optbuilder.(*Builder).buildUpdate()
      /home/mjibson/src/github.com/cockroachdb/cockroach/pkg/sql/opt/optbuilder/update.go:108 +0x414
  github.com/cockroachdb/cockroach/pkg/sql/opt/optbuilder.(*Builder).buildStmt()
      /home/mjibson/src/github.com/cockroachdb/cockroach/pkg/sql/opt/optbuilder/builder.go:224 +0x23c
  github.com/cockroachdb/cockroach/pkg/sql/opt/optbuilder.(*Builder).Build()
      /home/mjibson/src/github.com/cockroachdb/cockroach/pkg/sql/opt/optbuilder/builder.go:159 +0x23e
  github.com/cockroachdb/cockroach/pkg/sql.(*optPlanningCtx).buildExecMemo()
      /home/mjibson/src/github.com/cockroachdb/cockroach/pkg/sql/plan_opt.go:425 +0x3b3
  github.com/cockroachdb/cockroach/pkg/sql.(*planner).makeOptimizerPlan()
      /home/mjibson/src/github.com/cockroachdb/cockroach/pkg/sql/plan_opt.go:154 +0xea
  github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).makeExecPlan()
      /home/mjibson/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:987 +0x1ee
  github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).dispatchToExecutionEngine()
      /home/mjibson/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:874 +0x1f5
  github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).execStmtInOpenState()
      /home/mjibson/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:459 +0xf86
  github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).execStmt()
      /home/mjibson/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:101 +0x7d9
  github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).run()
      /home/mjibson/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:1183 +0x37d3
  github.com/cockroachdb/cockroach/pkg/sql.(*Server).ServeConn()
      /home/mjibson/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:433 +0xef
  github.com/cockroachdb/cockroach/pkg/sql/pgwire.(*conn).serveImpl.func4()
      /home/mjibson/src/github.com/cockroachdb/cockroach/pkg/sql/pgwire/conn.go:338 +0xfc

Goroutine 340 (running) created at:
  github.com/cockroachdb/cockroach/pkg/sql/pgwire.(*conn).serveImpl()
      /home/mjibson/src/github.com/cockroachdb/cockroach/pkg/sql/pgwire/conn.go:321 +0x1580
  github.com/cockroachdb/cockroach/pkg/sql/pgwire.serveConn()
      /home/mjibson/src/github.com/cockroachdb/cockroach/pkg/sql/pgwire/conn.go:170 +0x2e8
  github.com/cockroachdb/cockroach/pkg/sql/pgwire.(*Server).ServeConn()
      /home/mjibson/src/github.com/cockroachdb/cockroach/pkg/sql/pgwire/server.go:518 +0xc4e
  github.com/cockroachdb/cockroach/pkg/server.(*Server).Start.func20.1()
      /home/mjibson/src/github.com/cockroachdb/cockroach/pkg/server/server.go:1713 +0x1b7
  github.com/cockroachdb/cockroach/pkg/util/netutil.(*Server).ServeWith.func1()
      /home/mjibson/src/github.com/cockroachdb/cockroach/pkg/util/netutil/net.go:139 +0xdf

Goroutine 260 (running) created at:
  github.com/cockroachdb/cockroach/pkg/sql/pgwire.(*conn).serveImpl()
      /home/mjibson/src/github.com/cockroachdb/cockroach/pkg/sql/pgwire/conn.go:321 +0x1580
  github.com/cockroachdb/cockroach/pkg/sql/pgwire.serveConn()
      /home/mjibson/src/github.com/cockroachdb/cockroach/pkg/sql/pgwire/conn.go:170 +0x2e8
  github.com/cockroachdb/cockroach/pkg/sql/pgwire.(*Server).ServeConn()
      /home/mjibson/src/github.com/cockroachdb/cockroach/pkg/sql/pgwire/server.go:518 +0xc4e
  github.com/cockroachdb/cockroach/pkg/server.(*Server).Start.func20.1()
      /home/mjibson/src/github.com/cockroachdb/cockroach/pkg/server/server.go:1713 +0x1b7
  github.com/cockroachdb/cockroach/pkg/util/netutil.(*Server).ServeWith.func1()
      /home/mjibson/src/github.com/cockroachdb/cockroach/pkg/util/netutil/net.go:139 +0xdf

Found while running multiple sqlsmiths:

make testrace 'PKG=./pkg/sql/tests' 'TESTS=SQLSmith' 'TESTFLAGS=-rsg 10m -rsg-routines 5 -v'
@maddyblue added the C-bug label Mar 26, 2019
@justinj
Contributor

justinj commented Mar 26, 2019

I haven't investigated this at all, but is there some way we can rig this up to capture more info? Like having each goroutine running queries keep the last n queries it ran, to have something as a starting point?

@jordanlewis
Member

I don't think such a rig will help much, as it's probably pretty random when something like this triggers. It looks like a case of memory reuse gone wrong.

@maddyblue
Contributor Author

Is there even a way to get a callback into our own Go code after the data race detector triggers? It doesn't panic the program; it keeps on running. That is, even if we kept this log, how would we know when to print it? Alternatively, we could always write queries and timestamps to files, then work backward from the time the data race occurred to see what was running.

@RaduBerinde
Member

@mjibson do you know what sha this was on?

@RaduBerinde
Member

We are lazily building the scalar props for FiltersItem, which seems problematic when sharing memos. However, in practice I believe the props should always have been calculated during the initial memo creation.

In any case, the code path is through this case (in plan_opt.go:425) so we aren't using a cached memo:

	// We are executing a statement for which there is no reusable memo
	// available.
	f := opc.optimizer.Factory()
	bld := optbuilder.New(ctx, &p.semaCtx, p.EvalContext(), &opc.catalog, f, opc.p.stmt.AST)
	if err := bld.Build(); err != nil {
		return nil, bld.IsCorrelated, err
	}

@maddyblue
Contributor Author

I ran this last night on master.

@RaduBerinde
Member

My guess is that the race is caused by two queries getting planned on the same planner. Both goroutines were created by this code in pgwire/conn.serveImpl:

	var writerErr error
	if sqlServer != nil {
		wg.Add(1)
		go func() {
			defer func() {
				if sqlServer.GetExecutorConfig().TestingKnobs.CatchPanics {
					if r := recover(); r != nil {
						// Catch the panic and return it to the client as an error.
						err := pgerror.NewAssertionErrorf("caught fatal error: %v", r)
						_ = writeErr(ctx, &sqlServer.GetExecutorConfig().Settings.SV,
							err, &c.msgBuilder, &c.writerState.buf)
						_ /* n */, _ /* err */ = c.writerState.buf.WriteTo(c.conn)
						c.stmtBuf.Close()
						// Send a ready for query to make sure the client can react.
						c.bufferReadyForQuery('I')
					}
				}
				wg.Done()
				cancelConn()
			}()
			writerErr = sqlServer.ServeConn(ctx, connHandler, reserved, cancelConn)
			// TODO(andrei): Should we sometimes transmit the writerErr's to the
			// client?
		}()
	}

This code was reworked in #35776. I'm not sure if the problem remains.

@maddyblue
Contributor Author

I'll run the test again with that PR and see what happens.

@RaduBerinde
Member

After talking to Andrei and thinking about it some more, I don't think that PR would make a difference. This is where the executor goroutine is expected to be spawned from. The question is why these two instances would have anything in common.

@maddyblue
Contributor Author

Indeed. I confirmed the race is still present after that PR.

@jordanlewis
Member

If we had a normal goroutine dump from this time, we could compare the planner pointers. As it is, it's hard to see what's being shared exactly.

@RaduBerinde
Member

I've been trying to reproduce this but have had no luck. I wanted to try reproducing with the diff below to see whether we are indeed planning with the same memo:

diff --git a/pkg/sql/opt/memo/memo.go b/pkg/sql/opt/memo/memo.go
index 80b5015..b9df367 100644
--- a/pkg/sql/opt/memo/memo.go
+++ b/pkg/sql/opt/memo/memo.go
@@ -16,6 +16,8 @@ package memo
 
 import (
 	"context"
+	"fmt"
+	"sync/atomic"
 
 	"github.com/cockroachdb/cockroach/pkg/sql/opt"
 	"github.com/cockroachdb/cockroach/pkg/sql/opt/cat"
@@ -144,6 +146,21 @@ type Memo struct {
 	curID opt.ScalarID
 
 	// WARNING: if you add more members, add initialization code in Init.
+	useCount int32
+}
+
+func (m *Memo) Lock() {
+	val := atomic.AddInt32(&m.useCount, 1)
+	if val != 1 {
+		panic(fmt.Sprintf("already in use!! val: %d", val))
+	}
+}
+
+func (m *Memo) Unlock() {
+	val := atomic.AddInt32(&m.useCount, -1)
+	if val != 0 {
+		panic(fmt.Sprintf("unexpected value %d after unlock", val))
+	}
 }
 
diff --git a/pkg/sql/plan_opt.go b/pkg/sql/plan_opt.go
index e943177..ae73627 100644
--- a/pkg/sql/plan_opt.go
+++ b/pkg/sql/plan_opt.go
@@ -150,6 +150,9 @@ func (p *planner) makeOptimizerPlan(ctx context.Context) (_ *planTop, isCorrelat
 
 	opc := &p.optPlanningCtx
 	opc.reset()
+	m := opc.optimizer.Factory().Memo()
+	m.Lock()
+	defer m.Unlock()
 
 	execMemo, isCorrelated, err := opc.buildExecMemo(ctx)
 	if err != nil {

@RaduBerinde
Member

Still can't repro. Ran for 1hr on gceworker.

@maddyblue
Contributor Author

This only reproduced two times for me. One time was immediately on startup. Maybe run the test with a 1m timeout in a loop? Sometimes sqlsmith generates very long-running queries and all goroutines get stuck waiting for those, so running it in a loop might help. Unclear.

@RaduBerinde
Member

Good call, I was able to hit it once. This was with some additional checks so I ruled out that we're sharing planners.

@RaduBerinde
Member

RaduBerinde commented Mar 28, 2019

I hit it again and we are definitely not using the same memo. I think this race is due to the singletons like FalseFilter. It is possible that two threads will try to create the scalar properties for one of these at the same time. This would explain why we only hit the race in the beginning - once a query builds these properties, we won't ever hit it.

The race is innocuous so this shouldn't be a problem in production.
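To illustrate the hazard described above, here is a minimal sketch of a process-wide singleton whose properties are populated on first use. The `item` and `props` types are hypothetical stand-ins, not the optimizer's real types. The sketch guards the lazy build with `sync.Once`, which is one generic way to make such initialization safe; it is not the fix adopted for this issue.

```go
package main

import (
	"fmt"
	"sync"
)

// props stands in for the scalar properties of a filters item.
type props struct {
	populated bool
}

// item is a hypothetical stand-in for a shared singleton such as FalseFilter.
type item struct {
	once  sync.Once
	props props
}

// scalarProps builds the properties at most once, even when called from
// several goroutines at the same time. An unguarded
// "if !it.props.populated { build }" check in its place would let two
// goroutines read and write it.props concurrently, which is the pattern the
// race detector reports.
func (it *item) scalarProps() *props {
	it.once.Do(func() {
		it.props = props{populated: true}
	})
	return &it.props
}

// sharedFalseFilter is shared by every goroutine in the process.
var sharedFalseFilter item

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = sharedFalseFilter.scalarProps()
		}()
	}
	wg.Wait()
	fmt.Println(sharedFalseFilter.scalarProps().populated) // prints true
}
```

Once the first caller has populated the singleton, later callers only read it, which matches the observation that the race shows up only early in a run.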

@RaduBerinde added this to the 19.2 milestone Apr 24, 2019
@RaduBerinde changed the title from "opt: data race: opt/memo.props.Scalar" to "opt: remove race-causing lazy calculation of properties" Apr 29, 2019
rytaft added a commit to rytaft/cockroach that referenced this issue Jun 2, 2019
This commit fixes a race condition where two threads could be
simultaneously trying to build the logical properties of a filters
item and stepping on each other's toes. In particular, one thread
could set scalar.Constraints to nil, causing a panic when another
thread tries to check whether scalar.Constraints.IsUnconstrained().
This commit fixes the issue by using a local variable to check whether
the constraint set is unconstrained.

Fixes cockroachdb#37951
Informs cockroachdb#37073
Informs cockroachdb#36148

Release note (bug fix): Fixed a race condition that could cause a
panic during query planning.
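The "local variable" approach in this commit message can be sketched as follows. All names here (`constraintSet`, `scalarProps`, `buildPropsShared`, `buildPropsLocal`) are hypothetical and simplified, not the actual optimizer code. Note that the local-variable form only avoids re-reading a shared field that another goroutine may have set to nil; concurrent writers would still race on the final assignment.

```go
package main

import "fmt"

// constraintSet is a hypothetical stand-in for the optimizer's constraint set.
type constraintSet struct{ spans int }

// isUnconstrained dereferences the receiver, so calling it on a nil pointer
// panics.
func (c *constraintSet) isUnconstrained() bool { return c.spans == 0 }

type scalarProps struct {
	constraints *constraintSet
}

func buildConstraints(spans int) *constraintSet { return &constraintSet{spans: spans} }

// buildPropsShared shows the fragile shape: it stores into the shared field
// and then reads the field back. If another goroutine sets p.constraints to
// nil between the two statements, the method call panics.
func buildPropsShared(p *scalarProps, spans int) {
	p.constraints = buildConstraints(spans)
	if p.constraints.isUnconstrained() {
		p.constraints = nil
	}
}

// buildPropsLocal does all checks on a local variable and publishes the
// result with a single assignment, so it never dereferences a value that
// another goroutine could have cleared.
func buildPropsLocal(p *scalarProps, spans int) {
	cs := buildConstraints(spans)
	if cs.isUnconstrained() {
		cs = nil
	}
	p.constraints = cs
}

func main() {
	var p scalarProps
	buildPropsLocal(&p, 0)
	fmt.Println(p.constraints == nil) // prints true
}
```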
craig bot pushed a commit that referenced this issue Jun 3, 2019
37972: opt: fix data race when building filters item props r=rytaft a=rytaft

This commit fixes a race condition where two threads could be
simultaneously trying to build the logical properties of a filters
item and stepping on each other's toes. In particular, one thread
could set `scalar.Constraints` to nil, causing a panic when another
thread tries to check whether `scalar.Constraints.IsUnconstrained()`.
This commit fixes the issue by using a local variable to check whether
the constraint set is unconstrained.

Fixes #37951
Informs #37073
Informs #36148

Release note (bug fix): Fixed a race condition that could cause a
panic during query planning.

Co-authored-by: Rebecca Taft
rytaft added a commit to rytaft/cockroach that referenced this issue Jun 3, 2019
@RaduBerinde
Member

See some discussion in #37974 (comment)

andy-kimball added a commit to andy-kimball/cockroach that referenced this issue Dec 20, 2019
Previously, ListItem scalar operators had a method like this:

  ScalarProps(mem *memo.Memo)

It lazily constructed scalar properties when called. The problem was
that this required the Memo to be passed to many contexts simply for
the purpose of calling ScalarProps, which was inconvenient. In
addition, there will be thread-safety issues if we ever call this
method after the expression tree becomes immutable (e.g. after it's
been added to the query cache).

To fix this, this patch constructs the ScalarProps greedily rather
than lazily. All locations that directly constructed ListItem structs
now call into new Factory methods that both construct the structs and
populate them with scalar properties. This way, the scalar properties
are immutable and always available.

Fixes cockroachdb#36148

Release note: none
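A rough sketch of the eager approach described in this commit message, assuming a factory method that both constructs the item and fills in its properties. The `factory`, `filtersItem`, and `scalarProps` names and the `isConstant` property are hypothetical simplifications, not the actual CockroachDB API.

```go
package main

import "fmt"

// scalarProps is a hypothetical, much-simplified property set.
type scalarProps struct {
	isConstant bool
}

// filtersItem carries its properties from the moment it is constructed.
type filtersItem struct {
	condition string
	props     scalarProps // written once by the factory, read-only afterwards
}

// ScalarProps is a plain accessor: it does no lazy work and needs no memo
// argument, so concurrent readers never trigger a write.
func (f *filtersItem) ScalarProps() *scalarProps { return &f.props }

// factory is a hypothetical stand-in for the normalization factory.
type factory struct{}

// constructFiltersItem builds the struct and populates its properties in one
// step, so no caller can observe an item without properties.
func (factory) constructFiltersItem(condition string) filtersItem {
	return filtersItem{
		condition: condition,
		props: scalarProps{
			isConstant: condition == "true" || condition == "false",
		},
	}
}

func main() {
	var f factory
	item := f.constructFiltersItem("false")
	fmt.Println(item.ScalarProps().isConstant) // prints true
}
```

The trade-off is that properties are computed even for items that are never inspected, in exchange for items that are safe to share once built.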
@craig craig bot closed this as completed in 2e04e21 Dec 21, 2019

5 participants