roachtest: sqlsmith: nil ctx when draining the vectorized flow #62514

Closed
cockroach-teamcity opened this issue Mar 24, 2021 · 3 comments · Fixed by #63108
Labels
C-test-failure Broken test (automatically or manually discovered). GA-blocker O-roachtest O-robot Originated from a bot.

Comments

@cockroach-teamcity

(roachtest).sqlsmith/setup=rand-tables/setting=no-ddl failed on release-21.1@23e7cb53bf5baede071832b59bd92ea8164531a6:

The test failed on branch=release-21.1, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/sqlsmith/setup=rand-tables/setting=no-ddl/run_1
	cluster.go:1667,context.go:140,cluster.go:1656,test_runner.go:848: dead node detection: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod monitor teamcity-2808064-1616565608-14-n4cpu4 --oneshot --ignore-empty-nodes: exit status 1
		4: 4961
		2: 5834
		3: dead
		1: 7159
		Error: UNCLASSIFIED_PROBLEM: 3: dead
		(1) UNCLASSIFIED_PROBLEM
		Wraps: (2) attached stack trace
		  -- stack trace:
		  | main.glob..func14
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1147
		  | main.wrap.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:271
		  | github.com/spf13/cobra.(*Command).execute
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:830
		  | github.com/spf13/cobra.(*Command).ExecuteC
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:914
		  | github.com/spf13/cobra.(*Command).Execute
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:864
		  | main.main
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1852
		  | runtime.main
		  | 	/usr/local/go/src/runtime/proc.go:204
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1374
		Wraps: (3) 3: dead
		Error types: (1) errors.Unclassified (2) *withstack.withStack (3) *errutil.leafError


Artifacts: /sqlsmith/setup=rand-tables/setting=no-ddl

See this test on roachdash
powered by pkg/cmd/internal/issues

@cockroach-teamcity cockroach-teamcity added branch-release-21.1 C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. labels Mar 24, 2021
@yuzefovich

yuzefovich commented Mar 24, 2021

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x30 pc=0xe5e702]

goroutine 14186 [running]:
panic(0x4494ce0, 0x7b6e120)
	/usr/local/go/src/runtime/panic.go:1064 +0x545 fp=0xc002627068 sp=0xc002626fa0 pc=0x48b285
runtime.panicmem(...)
	/usr/local/go/src/runtime/panic.go:212
runtime.sigpanic()
	/usr/local/go/src/runtime/signal_unix.go:742 +0x413 fp=0xc002627098 sp=0xc002627068 pc=0x4a1fd3
github.com/cockroachdb/cockroach/pkg/util/tracing.SpanFromContext(0x0, 0x0, 0x4)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/tracing/context.go:25 +0x22 fp=0xc0026270d0 sp=0xc002627098 pc=0xe5e702
github.com/cockroachdb/cockroach/pkg/util/log.getSpanOrEventLog(0x0, 0x0, 0x3, 0xc002627100, 0x73f725)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/trace.go:94 +0x39 fp=0xc002627108 sp=0xc0026270d0 pc=0xf79b99
github.com/cockroachdb/cockroach/pkg/util/log.vEventf(0x0, 0x0, 0x0, 0x1, 0x2, 0x4c98368, 0x57, 0x0, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/trace.go:242 +0x12a fp=0xc002627238 sp=0xc002627108 pc=0xf7af8a
github.com/cockroachdb/cockroach/pkg/util/log.VEvent(...)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/trace.go:263
github.com/cockroachdb/cockroach/pkg/sql/execinfra.MisplannedRanges(0x0, 0x0, 0xc000ef9da0, 0x1, 0x1, 0xc000000003, 0xc0012dea80, 0x60, 0x60, 0x7f63a0e4fe98)
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/execinfra/readerbase.go:66 +0x8c fp=0xc002627510 sp=0xc002627238 pc=0x23655ec
github.com/cockroachdb/cockroach/pkg/sql/rowexec.(*tableReader).generateMeta(0xc002d0c000, 0x0, 0x0, 0x763fcf, 0x203000, 0xc000b883f0)
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/rowexec/tablereader.go:298 +0x4b1 fp=0xc0026276e8 sp=0xc002627510 pc=0x2607b71
github.com/cockroachdb/cockroach/pkg/sql/rowexec.(*tableReader).generateTrailingMeta(0xc002d0c000, 0x0, 0x20, 0x3)
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/rowexec/tablereader.go:174 +0x47 fp=0xc002627740 sp=0xc0026276e8 pc=0x2606a67
github.com/cockroachdb/cockroach/pkg/sql/rowexec.(*tableReader).generateTrailingMeta-fm(0xc000b88460, 0x70, 0x70)
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/rowexec/tablereader.go:171 +0x2a fp=0xc002627770 sp=0xc002627740 pc=0x261d92a
github.com/cockroachdb/cockroach/pkg/sql/execinfra.(*ProcessorBase).moveToTrailingMeta(0xc002d0c000)
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/execinfra/processorsbase.go:727 +0x24e fp=0xc0026278b0 sp=0xc002627770 pc=0x2363d4e
github.com/cockroachdb/cockroach/pkg/sql/execinfra.(*ProcessorBase).MoveToDraining(0xc002d0c000, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/execinfra/processorsbase.go:611 +0x1d8 fp=0xc0026279a0 sp=0xc0026278b0 pc=0x2363658
github.com/cockroachdb/cockroach/pkg/sql/execinfra.(*ProcessorBase).ConsumerDone(0xc002d0c000)
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/execinfra/processorsbase.go:942 +0x33 fp=0xc0026279c8 sp=0xc0026279a0 pc=0x2365253
github.com/cockroachdb/cockroach/pkg/sql/execinfra.(*ProcessorBase).MoveToDraining(0xc002d0c900, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/execinfra/processorsbase.go:608 +0x1ad fp=0xc002627ab8 sp=0xc0026279c8 pc=0x236362d
github.com/cockroachdb/cockroach/pkg/sql/colexec.(*Columnarizer).DrainMeta(0xc002d0c900, 0x59797c0, 0xc002474150, 0x0, 0x0, 0xc001f29f20)
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/colexec/columnarizer.go:252 +0x57 fp=0xc002627b70 sp=0xc002627ab8 pc=0x2c2a9b7
github.com/cockroachdb/cockroach/pkg/sql/colflow/colrpc.(*Outbox).sendMetadata(0xc002af30e0, 0x59797c0, 0xc002474150, 0x7f6372addf80, 0xc001f29dd0, 0x58d9300, 0xc001f29e30, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/colflow/colrpc/outbox.go:311 +0x142 fp=0xc002627db0 sp=0xc002627b70 pc=0x2cb4a62
github.com/cockroachdb/cockroach/pkg/sql/colflow/colrpc.(*Outbox).runWithStream(0xc002af30e0, 0x59797c0, 0xc002474150, 0x7f6372addf80, 0xc001f29dd0, 0xc001f29d50)
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/colflow/colrpc/outbox.go:351 +0x170 fp=0xc002627e28 sp=0xc002627db0 pc=0x2cb51f0
github.com/cockroachdb/cockroach/pkg/sql/colflow/colrpc.(*Outbox).Run(0xc002af30e0, 0x5979700, 0xc0032e1c80, 0x58d7c60, 0xc000113f40, 0xaa5a064400000001, 0xa1af8a8dd04a5e65, 0xc072b0180c, 0x1, 0xc001f29d50, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/colflow/colrpc/outbox.go:194 +0x3c9 fp=0xc002627ef8 sp=0xc002627e28 pc=0x2cb44a9
github.com/cockroachdb/cockroach/pkg/sql/colflow.(*vectorizedFlowCreator).setupRemoteOutputStream.func1(0x5979700, 0xc002b73f00, 0xc002c86900)
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/colflow/vectorized_flow.go:657 +0x125 fp=0xc002627f90 sp=0xc002627ef8 pc=0x3287585
github.com/cockroachdb/cockroach/pkg/sql/colflow.(*vectorizedFlowCreatorHelper).accumulateAsyncComponent.func1.1(0xc002dee1e0, 0x5979700, 0xc002b73f00, 0xc002c86900, 0xc002555400)
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/colflow/vectorized_flow.go:1319 +0x44 fp=0xc002627fb8 sp=0xc002627f90 pc=0x328a2e4
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1374 +0x1 fp=0xc002627fc0 sp=0xc002627fb8 pc=0x4c46c1
created by github.com/cockroachdb/cockroach/pkg/sql/colflow.(*vectorizedFlowCreatorHelper).accumulateAsyncComponent.func1
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/colflow/vectorized_flow.go:1318 +0x6f

The SHA contains all of the known fixes 😭

@yuzefovich yuzefovich added GA-blocker and removed release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. labels Mar 24, 2021
@yuzefovich yuzefovich changed the title roachtest: sqlsmith/setup=rand-tables/setting=no-ddl failed roachtest: sqlsmith: nil ctx when draining the vectorized flow Mar 25, 2021
@yuzefovich yuzefovich self-assigned this Mar 30, 2021
@yuzefovich

My current guess is the following: in the sqlsmith test we inject panics into the vectorized engine, in both Init and Next. When an Outbox attempts to initialize its inputs (here a chain of Columnarizer -> rowexec.tableReader), a panic is emitted, so the table reader is never started and its ProcessorBase.Ctx remains nil. The outbox then proceeds to emit the panic error as metadata and also drains the metadata sources, which is when we encounter the crash.
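A minimal sketch of this suspected failure mode, assuming heavily simplified stand-ins: `processor`, `initInput`, and `drainCrashes` are hypothetical and are not CockroachDB APIs. The processor's ctx is only assigned in `Start`; if the injected Init panic is turned into an error and draining proceeds anyway, the drain calls a method on a nil `context.Context`, which panics with a nil pointer dereference like the one in the trace above.

```go
package main

import (
	"context"
	"fmt"
)

// spanKey is a hypothetical context key standing in for the tracing span lookup.
type spanKey struct{}

// processor is a hypothetical stand-in for a row-by-row processor such as
// rowexec.tableReader: its ctx is only assigned once Start runs.
type processor struct {
	ctx context.Context
}

func (p *processor) Start(ctx context.Context) { p.ctx = ctx }

// trailingMeta stands in for generating trailing metadata, which logs via the
// processor's ctx. If Start never ran, p.ctx is a nil interface and calling
// Value on it panics with a nil pointer dereference.
func (p *processor) trailingMeta() string {
	_ = p.ctx.Value(spanKey{})
	return "trailing meta"
}

// initInput mimics an Init that may panic (as sqlsmith's injected panics do)
// and converts that panic into an error so the caller keeps going.
func initInput(p *processor, injectPanic bool) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("%v", r)
		}
	}()
	if injectPanic {
		panic("panic injected in Init")
	}
	p.Start(context.Background())
	return nil
}

// drainCrashes reports whether draining p panics. It recovers only so the
// demo can observe the panic.
func drainCrashes(p *processor) (crashed bool) {
	defer func() {
		if recover() != nil {
			crashed = true
		}
	}()
	p.trailingMeta()
	return false
}

func main() {
	for _, inject := range []bool{false, true} {
		p := &processor{}
		initErr := initInput(p, inject)
		fmt.Printf("inject=%v initErr=%v drainCrashed=%v\n", inject, initErr, drainCrashes(p))
	}
}
```

The demo recovers the drain panic only to observe it; in the crash above nothing recovered it in the outbox goroutine, so the whole node went down.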

I'll try to confirm it, but I think the action items for this issue are:

  • wrap DrainMeta calls with panic-catchers (I thought about doing that a few weeks ago but forgot to)
  • possibly track whether Init succeeded and, if it didn't, swallow any errors from the catcher (this is to reduce the toil of the sqlsmith test; e.g. the outbox will still emit the "panic injected in Init" error).
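The two action items could look roughly like this. It is a sketch under stated assumptions: `metadataSource`, `catchPanic`, and `drainAll` are hypothetical simplifications, not the actual CockroachDB panic-catcher or `DrainMeta` signature. Each source is drained under the catcher, and errors caught while draining are dropped when Init has already failed.

```go
package main

import "fmt"

// metadataSource is a hypothetical, simplified metadata source whose DrainMeta
// returns trailing metadata as plain strings.
type metadataSource interface {
	DrainMeta() []string
}

// okSource drains normally.
type okSource struct{}

func (okSource) DrainMeta() []string { return []string{"ok meta"} }

// uninitializedSource dereferences a field that Init would have set, so
// draining it panics, much like the nil ctx in this issue.
type uninitializedSource struct{ ctx *string }

func (s uninitializedSource) DrainMeta() []string { return []string{*s.ctx} }

// catchPanic runs f and converts any panic into an error.
func catchPanic(f func()) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("caught panic: %v", r)
		}
	}()
	f()
	return nil
}

// drainAll drains every source under the panic-catcher. When Init did not
// succeed, errors caught while draining are swallowed because the Init error
// has already been reported.
func drainAll(sources []metadataSource, initSucceeded bool) (meta []string, errs []error) {
	for _, src := range sources {
		err := catchPanic(func() { meta = append(meta, src.DrainMeta()...) })
		if err != nil && initSucceeded {
			errs = append(errs, err)
		}
	}
	return meta, errs
}

func main() {
	sources := []metadataSource{okSource{}, uninitializedSource{}}
	meta, errs := drainAll(sources, true)
	fmt.Println(meta, len(errs))
	meta, errs = drainAll(sources, false)
	fmt.Println(meta, len(errs))
}
```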

@yuzefovich

Yeah, I think this is it: when I increase the probability of a panic in Init, almost every run fails, but when I disable panic injection, things are fine.

craig bot pushed a commit that referenced this issue Apr 7, 2021
63108: colexec: wrap DrainMeta with panic-catcher and protect columnarizer r=yuzefovich a=yuzefovich

**colexec: wrap DrainMeta with panic-catcher and protect columnarizer**

Previously, in some edge cases (like when a panic is encountered during
`Operator.Init`) the metadata sources could have been uninitialized, so
when we tried to drain them, we'd encounter a crash. In order to avoid
that in the future, now all root components will wrap the draining with
the panic-catcher. Additionally, we now protect the columnarizer in this
case explicitly - if it wasn't initialized, it won't drain the wrapped
processor in `DrainMeta`.

Fixes: #62514.

Release note: None
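A sketch of the columnarizer protection described in the first commit, assuming a hypothetical `columnarizer` that wraps a single `wrappedProcessor` (illustrative stand-ins, not the real `colexec.Columnarizer` or `rowexec.tableReader`). The columnarizer records whether Init completed and, if it did not, returns no metadata instead of draining the wrapped processor.

```go
package main

import "fmt"

// wrappedProcessor is a hypothetical stand-in for the row-by-row processor a
// columnarizer wraps. Draining it before Start dereferences a nil field,
// mirroring the nil-ctx crash in this issue.
type wrappedProcessor struct {
	ctx *string // set by Start; nil until then
}

func (p *wrappedProcessor) Start() {
	s := "started"
	p.ctx = &s
}

func (p *wrappedProcessor) drain() []string {
	return []string{"trailing meta (" + *p.ctx + ")"}
}

// columnarizer records whether Init completed and refuses to drain the
// wrapped processor otherwise.
type columnarizer struct {
	input       *wrappedProcessor
	initialized bool
}

// Init may panic (e.g. an injected panic) before the input is started, in
// which case initialized stays false.
func (c *columnarizer) Init(injectPanic bool) {
	if injectPanic {
		panic("panic injected in Init")
	}
	c.input.Start()
	c.initialized = true
}

func (c *columnarizer) DrainMeta() []string {
	if !c.initialized {
		// The wrapped processor was never started, so it is unsafe to drain.
		return nil
	}
	return c.input.drain()
}

func main() {
	c := &columnarizer{input: &wrappedProcessor{}}
	func() {
		defer func() { _ = recover() }() // the caller reports the Init panic elsewhere
		c.Init(true)
	}()
	fmt.Println(len(c.DrainMeta()))
}
```

Setting the flag only after the input has started means any panic during Init leaves the columnarizer marked as uninitialized.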

**rowexec: remove redundant implementations of MetadataSource interface**

Previously, some row-by-row processors implemented the
`execinfra.MetadataSource` interface. The idea behind that originally
was to allow for wrapped processors to return their metadata in the
vectorized flow, but nothing explicit is actually needed because every
wrapped processor has a columnarizer after it which will drain the
processor according to row-by-row model (by moving into draining state
and exhausting the trailing meta). This commit removes those redundant
implementations.

This allows us to move the interface into the `colexecop` package, where it
belongs.

Release note: None
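The row-by-row draining that the second commit relies on can be sketched as follows, assuming a hypothetical `rowSource` interface, `producerMetadata` type, and `tableReaderSketch` processor (simplified stand-ins for the real `execinfra` types). The consumer signals `ConsumerDone` and then exhausts `Next`, keeping only metadata, so the processor needs no separate `MetadataSource` implementation.

```go
package main

import "fmt"

// producerMetadata is a hypothetical simplification of producer metadata.
type producerMetadata struct{ msg string }

// rowSource is a hypothetical, simplified row-by-row interface: Next returns a
// row or metadata, and (nil, nil) once the source is exhausted.
type rowSource interface {
	Next() ([]int, *producerMetadata)
	ConsumerDone()
}

// tableReaderSketch emits rows until told to drain, then only trailing metadata.
type tableReaderSketch struct {
	rows     [][]int
	trailing []producerMetadata
	draining bool
}

func (t *tableReaderSketch) ConsumerDone() { t.draining = true }

func (t *tableReaderSketch) Next() ([]int, *producerMetadata) {
	if !t.draining && len(t.rows) > 0 {
		r := t.rows[0]
		t.rows = t.rows[1:]
		return r, nil
	}
	if len(t.trailing) > 0 {
		m := t.trailing[0]
		t.trailing = t.trailing[1:]
		return nil, &m
	}
	return nil, nil
}

// drainRowSource drains via the row-by-row model: tell the source the
// consumer is done, then exhaust Next, keeping only the metadata.
func drainRowSource(src rowSource) []producerMetadata {
	src.ConsumerDone()
	var meta []producerMetadata
	for {
		row, m := src.Next()
		if row == nil && m == nil {
			return meta
		}
		if m != nil {
			meta = append(meta, *m)
		}
	}
}

func main() {
	tr := &tableReaderSketch{
		rows:     [][]int{{1}, {2}},
		trailing: []producerMetadata{{msg: "misplanned ranges"}},
	}
	for _, m := range drainRowSource(tr) {
		fmt.Println(m.msg)
	}
}
```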

Co-authored-by: Yahor Yuzefovich <[email protected]>
@craig craig bot closed this as completed in fe1327d Apr 7, 2021
@mgartner mgartner moved this to Done in SQL Queries Jul 24, 2023