Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kvserver: storage engine is used after being closed in TestCheckConsistencyInconsistent #110293

Closed
yuzefovich opened this issue Sep 8, 2023 · 2 comments · Fixed by #106195
Closed
Assignees
Labels
A-testing Testing tools and infrastructure branch-master Failures and bugs on the master branch. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. C-test-failure Broken test (automatically or manually discovered).

Comments

@yuzefovich
Copy link
Member

yuzefovich commented Sep 8, 2023

Pick up #106195, and then we get this (found when working on #106459)

=== RUN   TestCheckConsistencyInconsistent
    test_log_scope.go:167: test logs captured to: /artifacts/tmp/_tmp/59d2a1c5950246cd61480297d44235f9/logTestCheckConsistencyInconsistent2374593241
    test_log_scope.go:81: use -show-logs to present logs inline
    consistency_queue_test.go:382: found a checkpoint at auxiliary/checkpoints/r5_at_30
    consistency_queue_test.go:382: found a checkpoint at auxiliary/checkpoints/r5_at_30
    consistency_queue_test.go:382: found a checkpoint at auxiliary/checkpoints/r5_at_30
*
* ERROR: a panic has occurred!
* engine is closed; the Engine-provided filesystem cannot be used after Close
* (1) attached stack trace
*   -- stack trace:
*   | runtime.gopanic
*   | 	GOROOT/src/runtime/panic.go:884
*   | github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).Stop
*   | 	github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:521
*   | runtime.gopanic
*   | 	GOROOT/src/runtime/panic.go:890
*   | [...repeated from below...]
* Wraps: (2) assertion failure
* Wraps: (3) attached stack trace
*   -- stack trace:
*   | github.com/cockroachdb/cockroach/pkg/storage.(*noUseAfterClose).ensureStillOpen
*   | 	github.com/cockroachdb/cockroach/pkg/storage/pebble.go:2787
*   | github.com/cockroachdb/cockroach/pkg/storage.(*noUseAfterClose).Open
*   | 	github.com/cockroachdb/cockroach/pkg/storage/pebble.go:2856
*   | github.com/cockroachdb/cockroach/pkg/kv/kvserver_test.TestCheckConsistencyInconsistent
*   | 	github.com/cockroachdb/cockroach/pkg/kv/kvserver_test/pkg/kv/kvserver/consistency_queue_test.go:416
*   | testing.tRunner
*   | 	GOROOT/src/testing/testing.go:1446
*   | runtime.goexit
*   | 	GOROOT/src/runtime/asm_amd64.s:1594
* Wraps: (4) engine is closed; the Engine-provided filesystem cannot be used after Close
* Error types: (1) *withstack.withStack (2) *assert.withAssertionFailure (3) *withstack.withStack (4) *errutil.leafError
*
    stopper.go:230: -- test log scope end --

ERROR: a panic has occurred!
Details cannot be printed yet because we are still unwinding.
Hopefully the test harness prints the panic below, otherwise check the test logs.

test logs left over in: /artifacts/tmp/_tmp/59d2a1c5950246cd61480297d44235f9/logTestCheckConsistencyInconsistent2374593241
    stopper.go:230: panic: engine is closed; the Engine-provided filesystem cannot be used after Close
--- FAIL: TestCheckConsistencyInconsistent (11.31s)
panic: engine is closed; the Engine-provided filesystem cannot be used after Close [recovered]
	panic: engine is closed; the Engine-provided filesystem cannot be used after Close [recovered]
	panic: engine is closed; the Engine-provided filesystem cannot be used after Close [recovered]
	panic: engine is closed; the Engine-provided filesystem cannot be used after Close [recovered]
	panic: engine is closed; the Engine-provided filesystem cannot be used after Close

goroutine 19134 [running]:
testing.tRunner.func1.2({0x60c3280, 0xc004e7e9e0})
	GOROOT/src/testing/testing.go:1396 +0x24e
testing.tRunner.func1()
	GOROOT/src/testing/testing.go:1399 +0x39f
panic({0x60c3280, 0xc004e7e9e0})
	GOROOT/src/runtime/panic.go:884 +0x212
github.com/cockroachdb/cockroach/pkg/util/leaktest.AfterTest.func2()
	github.com/cockroachdb/cockroach/pkg/util/leaktest/leaktest.go:133 +0x2f5
panic({0x60c3280, 0xc004e7e9e0})
	GOROOT/src/runtime/panic.go:890 +0x262
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).recover(0x0?, {0x7f9a090, 0xc0000c0070})
	github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:230 +0x6a
panic({0x60c3280, 0xc004e7e9e0})
	GOROOT/src/runtime/panic.go:884 +0x212
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).Stop(0xc00abcb0e0, {0x7f9a090, 0xc0000c0070})
	github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:521 +0x19b
panic({0x60c3280, 0xc004e7e9e0})
	GOROOT/src/runtime/panic.go:890 +0x262
github.com/cockroachdb/cockroach/pkg/storage.(*noUseAfterClose).ensureStillOpen(...)
	github.com/cockroachdb/cockroach/pkg/storage/pebble.go:2787
github.com/cockroachdb/cockroach/pkg/storage.(*noUseAfterClose).Open(0x56200a?, {0xc008fa4ba0?, 0x1d?}, {0x0?, 0x7eefbfc?, 0xc0069c2c01?})
	github.com/cockroachdb/cockroach/pkg/storage/pebble.go:2856 +0x116
github.com/cockroachdb/cockroach/pkg/kv/kvserver_test.TestCheckConsistencyInconsistent(0xc0053ea1a0)
	github.com/cockroachdb/cockroach/pkg/kv/kvserver_test/pkg/kv/kvserver/consistency_queue_test.go:416 +0x165f
testing.tRunner(0xc0053ea1a0, 0x6b2a7c0)
	GOROOT/src/testing/testing.go:1446 +0x10b
created by testing.(*T).Run
	GOROOT/src/testing/testing.go:1493 +0x35f

Jira issue: CRDB-31358

@yuzefovich yuzefovich added the C-test-failure Broken test (automatically or manually discovered). label Sep 8, 2023
@blathers-crl blathers-crl bot added the T-kv KV Team label Sep 8, 2023
@erikgrinaker erikgrinaker added T-kv-replication and removed T-kv KV Team labels Sep 11, 2023
@blathers-crl
Copy link

blathers-crl bot commented Sep 11, 2023

cc @cockroachdb/replication

@blathers-crl
Copy link

blathers-crl bot commented Sep 13, 2023

Hi @erikgrinaker, please add branch-* labels to identify which branch(es) this release-blocker affects.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@erikgrinaker erikgrinaker added the branch-master Failures and bugs on the master branch. label Sep 13, 2023
craig bot pushed a commit that referenced this issue Sep 14, 2023
106195: storage: ensure no filesystem use after close in test builds r=jbowens a=jbowens

In test builds, panic if any of the VFS filesystem facilities are used after the Engine is closed.

Epic: none
Informs #106119.
Resolves #110293.
Release note: None

110576: kvserver: add kill switch for scheduled rangefeed processor r=erikgrinaker a=aliher1911

This commit adds env var `COCKROACH_RANGEFEED_DISABLE_SCHEDULER` which is preventing creation of scheduler based rangefeed processors forcing replicas to always create legacy processors.
This feature allows nodes to be recovered if scheduler based processor failes to process requests.

Epic: none
Fixes: #110559

Release note: None

110625: sql: fix cleanup of not exhausted pausable portals in some cases r=yuzefovich a=yuzefovich

**execinfra: relax RowSource.ConsumerClosed contract**

This commit adjusts RowSource.ConsumerClosed contract so that it's
possible for this method to be called multiple times. Previously, vast
majority of implementations already supported that (and some actually
assumed that), but there were a couple that would result in a panic or
an error if called multiple times - those have been relaxed.

This is needed for the follow-up commit which will introduce an
unconditional call of this method on the "head" processor of the flow
(needed for pausable portals execution model).

Release note: None

**sql: fix cleanup of not exhausted pausable portals in some cases**

Previously, if we executed a flow (either vectorized or row-based) that
contains a row-by-row processor in "pausable portal" model and didn't
fully exhaust that flow (meaning that it could still produce more rows),
this could result in an error when closing the portal because that
processor might not be closed properly. Vectorized operators aren't
affected by this, but row-by-row processors effectively require
`ConsumerClosed` method to be called on them, and this might not happen.
In particular, it was assumed that this method would get called by
`execinfra.Run` loop, but for pausable portal model we needed to adjust
it so that the loop could be exited (and then re-entered) when switching
to another portal; thus, we lost the guarantee that `ConsumerClosed` is
always called.

This commit fixes this problem by teaching both vectorized and row-based
flows to always call `ConsumerClosed` on their "head" processor on
`Cleanup` (which is called when closing the portal). Now, `Cleanup`
calls `ConsumerClosed` unconditionally (when the flow _might_ be running
in the pausable portal mode), and this is safe given the interface
adjustment in the previous commit.

Fixes: #110486

Release note (bug fix): Previously, when evaluating some queries (most
likely with a lookup join) via "multiple active portals" execution
mode (preview feature that can be enabled via
`multiple_active_portals_enabled` session variable) CockroachDB could
encounter an internal error like `unexpected 40960 leftover bytes` if
that portal wasn't fully consumed. This is now fixed.

Co-authored-by: Jackson Owens <[email protected]>
Co-authored-by: Oleg Afanasyev <[email protected]>
Co-authored-by: Yahor Yuzefovich <[email protected]>
@craig craig bot closed this as completed in ef71b2f Sep 14, 2023
@erikgrinaker erikgrinaker added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. A-testing Testing tools and infrastructure and removed GA-blocker labels Sep 19, 2023
@github-project-automation github-project-automation bot moved this to Closed in KV Aug 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-testing Testing tools and infrastructure branch-master Failures and bugs on the master branch. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. C-test-failure Broken test (automatically or manually discovered).
Projects
No open projects
Archived in project
Development

Successfully merging a pull request may close this issue.

3 participants