sql: v20.1.6: unexpected leftover bytes on flow shutdown of changefeeds #55408

Closed
cockroach-teamcity opened this issue Oct 10, 2020 · 6 comments
Labels: C-bug (Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.), no-issue-activity, O-sentry (Originated from an in-the-wild panic report.), T-sql-queries (SQL Queries Team), X-stale

Comments

@cockroach-teamcity
Member

cockroach-teamcity commented Oct 10, 2020

This issue was autofiled by Sentry. It represents a crash or reported error on a live cluster with telemetry enabled.

Sentry link: https://sentry.io/organizations/cockroach-labs/issues/1948355510/?referrer=webhooks_plugin

Panic message:

bytes_usage.go:397: flow: unexpected 286720 leftover bytes | string

Stacktrace:

if check && mm.mu.curAllocated != 0 {
log.ReportOrPanic(
ctx, &mm.settings.SV,
in pkg/util/mon.(*BytesMonitor).doStop
func (mm *BytesMonitor) Stop(ctx context.Context) {
mm.doStop(ctx, true)
}
in pkg/util/mon.(*BytesMonitor).Stop
func (ctx *EvalContext) Stop(c context.Context) {
ctx.Mon.Stop(c)
}
in pkg/sql/sem/tree.(*EvalContext).Stop
// This closes the monitor opened in ServerImpl.setupFlow.
f.EvalCtx.Stop(ctx)
for _, p := range f.processors {
in pkg/sql/flowinfra.(*FlowBase).Cleanup
func (f *rowBasedFlow) Cleanup(ctx context.Context) {
f.FlowBase.Cleanup(ctx)
f.Release()
in pkg/sql/rowflow.(*rowBasedFlow).Cleanup
curPlan.close(ctx)
flow.Cleanup(ctx)
}
in pkg/sql.(*DistSQLPlanner).Run.func8
if recv.commErr != nil || res.Err() != nil {
return recv.bytesRead, recv.rowsRead, recv.commErr
}
in pkg/sql.(*connExecutor).execWithDistSQLEngine
ex.sessionTracing.TraceExecStart(ctx, "distributed")
bytesRead, rowsRead, err := ex.execWithDistSQLEngine(ctx, planner, stmt.AST.StatementType(), res, distributePlan, progAtomic)
ex.sessionTracing.TraceExecEnd(ctx, res.Err(), res.RowsAffected())
in pkg/sql.(*connExecutor).dispatchToExecutionEngine
p.autoCommit = os.ImplicitTxn.Get() && !ex.server.cfg.TestingKnobs.DisableAutoCommit
if err := ex.dispatchToExecutionEngine(ctx, p, res); err != nil {
return nil, nil, err
in pkg/sql.(*connExecutor).execStmtInOpenState
} else {
ev, payload, err = ex.execStmtInOpenState(ctx, stmt, res, pinfo)
}
in pkg/sql.(*connExecutor).execStmt
if !portal.exhausted {
ev, payload, err = ex.execStmt(stmtCtx, curStmt, stmtRes, pinfo)
// Portal suspension is supported via a "side" state machine
in pkg/sql.(*connExecutor).execPortal
res = stmtRes
ev, payload, err = ex.execPortal(ctx, portal, portalName, stmtRes, pinfo)
if err != nil {
in pkg/sql.(*connExecutor).execCmd
var err error
if err = ex.execCmd(ex.Ctx()); err != nil {
if err == io.EOF || err == errDrainingComplete {
in pkg/sql.(*connExecutor).run
}()
return h.ex.run(ctx, s.pool, reserved, cancel)
}
in pkg/sql.(*Server).ServeConn
reservedOwned = false // We're about to pass ownership away.
retErr = sqlServer.ServeConn(ctx, connHandler, reserved, cancelConn)
}()
in pkg/sql/pgwire.(*conn).processCommandsAsync.func1

pkg/util/mon/bytes_usage.go in pkg/util/mon.(*BytesMonitor).doStop at line 397
pkg/util/mon/bytes_usage.go in pkg/util/mon.(*BytesMonitor).Stop at line 384
pkg/sql/sem/tree/eval.go in pkg/sql/sem/tree.(*EvalContext).Stop at line 2932
pkg/sql/flowinfra/flow.go in pkg/sql/flowinfra.(*FlowBase).Cleanup at line 423
pkg/sql/rowflow/row_based_flow.go in pkg/sql/rowflow.(*rowBasedFlow).Cleanup at line 414
pkg/sql/distsql_running.go in pkg/sql.(*DistSQLPlanner).Run.func8 at line 422
pkg/sql/conn_executor_exec.go in pkg/sql.(*connExecutor).execWithDistSQLEngine at line 946
pkg/sql/conn_executor_exec.go in pkg/sql.(*connExecutor).dispatchToExecutionEngine at line 832
pkg/sql/conn_executor_exec.go in pkg/sql.(*connExecutor).execStmtInOpenState at line 537
pkg/sql/conn_executor_exec.go in pkg/sql.(*connExecutor).execStmt at line 96
pkg/sql/conn_executor_exec.go in pkg/sql.(*connExecutor).execPortal at line 152
pkg/sql/conn_executor.go in pkg/sql.(*connExecutor).execCmd at line 1461
pkg/sql/conn_executor.go in pkg/sql.(*connExecutor).run at line 1335
pkg/sql/conn_executor.go in pkg/sql.(*Server).ServeConn at line 479
pkg/sql/pgwire/conn.go in pkg/sql/pgwire.(*conn).processCommandsAsync.func1 at line 582
| Tag | Value |
|---|---|
| Cockroach Release | v20.1.6 |
| Cockroach SHA | be8c0a7 |
| Platform | linux amd64 |
| Distribution | CCL |
| Environment | v20.1.6 |
| Command | server |
| Go Version | go1.13.9 |
| # of CPUs | 2 |
| # of Goroutines | 522 |

Jira issue: CRDB-3670
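The report comes from the leak check at the top of the stack trace: when a monitor is stopped with `check` set and `curAllocated` is still nonzero, it calls `log.ReportOrPanic` with the leftover byte count. A minimal sketch of that bookkeeping follows, assuming a simplified model in which an account reserves bytes from its monitor and returns them when it is closed. `bytesMonitor`, `account`, `grow`, `close`, and `stop` are hypothetical names, not the `pkg/util/mon` API, and the sketch returns an error instead of reporting or panicking.

```go
package main

import (
	"fmt"
	"sync"
)

// bytesMonitor is a hypothetical, simplified stand-in for a memory monitor.
// It is NOT the pkg/util/mon API; it only mirrors the idea of checking
// curAllocated when the monitor is stopped.
type bytesMonitor struct {
	name string
	mu   struct {
		sync.Mutex
		curAllocated int64
	}
}

// account is a hypothetical memory account bound to a monitor.
type account struct {
	mon  *bytesMonitor
	used int64
}

// grow reserves n bytes from the monitor on behalf of the account.
func (a *account) grow(n int64) {
	a.mon.mu.Lock()
	a.mon.mu.curAllocated += n
	a.mon.mu.Unlock()
	a.used += n
}

// close returns everything the account reserved to the monitor.
func (a *account) close() {
	a.mon.mu.Lock()
	a.mon.mu.curAllocated -= a.used
	a.mon.mu.Unlock()
	a.used = 0
}

// stop reports an error if any bytes are still reserved, analogous to the
// "unexpected N leftover bytes" report in the panic message.
func (m *bytesMonitor) stop() error {
	m.mu.Lock()
	defer m.mu.Unlock()
	if m.mu.curAllocated != 0 {
		return fmt.Errorf("%s: unexpected %d leftover bytes", m.name, m.mu.curAllocated)
	}
	return nil
}

func main() {
	mon := &bytesMonitor{name: "flow"}
	acc := &account{mon: mon}
	acc.grow(286720)

	// Stopping the monitor before the account is closed trips the check.
	fmt.Println(mon.stop()) // prints: flow: unexpected 286720 leftover bytes

	// Closing the account first lets the monitor stop cleanly.
	acc.close()
	fmt.Println(mon.stop()) // prints: <nil>
}
```

In this model the check itself is not the bug; it only surfaces an ordering problem in which some account bound to the monitor is still open when the monitor is stopped.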

@cockroach-teamcity cockroach-teamcity added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. O-sentry Originated from an in-the-wild panic report. labels Oct 10, 2020
@yuzefovich
Member

EXPERIMENTAL CHANGEFEED FOR TABLE _ WITH _, _, _ = _

@yuzefovich yuzefovich changed the title sentry: bytes_usage.go:397: flow: unexpected 286720 leftover bytes | string sql: v20.1.6: unexpected leftover bytes on flow shutdown of changefeeds Oct 10, 2020
@yuzefovich
Member

I think I have a reliable repro of this issue here with `make test PKG=./pkg/ccl/changefeedccl/ TESTS=TestChangefeedBasics/sinkless TESTFLAGS='-v --show-logs'`. It is likely related to the vectorized engine, so I'm taking this back to the SQL Execution board.

@yuzefovich
Member

I think I understand what's going on, and it's not related to the vectorized engine.

The issue is that we might be stopping the kvFeed memory monitor before the memory accounts bound to it are closed. This can happen because the two are shut down in different goroutines: the monitor is stopped in the main goroutine of the changeAggregator processor when the processor is shut down, while the accounts are closed only when kvFeed.runUntilTableEvent returns. It seems to me that some synchronization is missing, so I'm kicking this back to the CDC project. cc @HonoreDB @ajwerner

@yuzefovich
Member

yuzefovich commented Oct 19, 2020

Actually, I spent a bit more time looking into this. If my understanding were correct, we would be seeing this behavior on master as well, but it only happens on my branch when we try to use some parts of the vectorized engine in combination with changefeeds. Now I think the problem is that the vectorized engine relies on context cancellation to shut down all of the components of the flow, and such a shutdown seems to trip up the concurrent goroutines of the kv feed. I'm not sure exactly where the issue lies, but it looks like yet another incompatibility between changefeeds and the vectorized engine. I think I'll take it back into our project, and we'll probably put it on the backlog for now given #55616.

@github-actions

github-actions bot commented Sep 6, 2023

We have marked this issue as stale because it has been inactive for
18 months. If this issue is still relevant, removing the stale label
or adding a comment will keep it active. Otherwise, we'll close it in
10 days to keep the issue queue tidy. Thank you for your contribution
to CockroachDB!

@github-project-automation github-project-automation bot moved this to Triage in SQL Queries May 2, 2024

github-actions bot commented Sep 9, 2024

We have marked this issue as stale because it has been inactive for
12 months. If this issue is still relevant, removing the stale label
or adding a comment will keep it active. Otherwise, we'll close it in
10 days to keep the issue queue tidy.

@github-actions github-actions bot closed this as not planned Sep 23, 2024
@github-project-automation github-project-automation bot moved this from Backlog to Done in SQL Queries Sep 23, 2024
Projects
Archived in project
Development

No branches or pull requests

3 participants