panic crash while on the database pages #83935

maryliag · 2022-07-06T20:16:48Z

Seeing a panic crash (other people mentioned seeing it on other occasions, but these are the steps with which I was able to reproduce it):

1. Created a build with `make build`.
2. Started CRDB with `./cockroach demo --insecure --multitenant=false`.
3. Started the console with `make ui-watch TARGET=http://localhost:8080/`.
4. Opened the DB Console on the table list of one of the databases, e.g. http://localhost:3000/#/database/system

Wait for a while (sometimes it took me a few minutes, sometimes 30 minutes, and sometimes it doesn't happen at all) and it will crash, with the trace shown in the terminal.

Trace:

# Server version: CockroachDB CCL v22.2.0-alpha.00000000-1090-g4dc922688e (x86_64-apple-darwin21.5.0, built 2022/07/05 21:13:24, go1.17.2) (same version as client)
# Cluster ID: 02f64d33-30bc-468c-967a-8884de0ff2ba
# Organization: Cockroach Demo
#
# Enter \? for a brief introduction.
#
[email protected]:26257/movr> panic: kvfetcher-0-unlimited-1: no bytes in account to release, current 0, free 82 [recovered]
	panic: kvfetcher-0-unlimited-1: no bytes in account to release, current 0, free 82 [recovered]
	panic: kvfetcher-0-unlimited-1: no bytes in account to release, current 0, free 82

goroutine 101639 [running]:
github.com/cockroachdb/cockroach/pkg/sql/colexecerror.CatchVectorizedRuntimeError.func1()
	/Users/maryliag/go/src/github.com/cockroachdb/cockroach/pkg/sql/colexecerror/error.go:58 +0x3bb
panic({0x86dc4e0, 0xc00cb04f00})
	/usr/local/opt/go/libexec/src/runtime/panic.go:1038 +0x215
github.com/cockroachdb/cockroach/pkg/sql/colexecerror.CatchVectorizedRuntimeError.func1()
	/Users/maryliag/go/src/github.com/cockroachdb/cockroach/pkg/sql/colexecerror/error.go:58 +0x3bb
panic({0x86dc4e0, 0xc00cb04f00})
	/usr/local/opt/go/libexec/src/runtime/panic.go:1038 +0x215
github.com/cockroachdb/cockroach/pkg/util/log/logcrash.ReportOrPanic({0xb5560e8, 0xc00e45c3c0}, 0xc001108a80, {0x8bbc4a0, 0xc00cb07f00}, {0xc00cd1d, 0x5, 0x5})
	/Users/maryliag/go/src/github.com/cockroachdb/cockroach/pkg/util/log/logcrash/crash_reporting.go:378 +0x1c5
github.com/cockroachdb/cockroach/pkg/util/mon.(*BoundAccount).Shrink(0xc00e45c270, {0xb5560e8, 0xc00e45c3c0}, 0x52)
	/Users/maryliag/go/src/github.com/cockroachdb/cockroach/pkg/util/mon/bytes_usage.go:715 +0x1d5
github.com/cockroachdb/cockroach/pkg/sql/row.(*txnKVFetcher).reset(0xc00cb7c300, {0xb5560e8, 0xc00e45c3c0})
	/Users/maryliag/go/src/github.com/cockroachdb/cockroach/pkg/sql/row/kv_batch_fetcher.go:648 +0x125
github.com/cockroachdb/cockroach/pkg/sql/row.(*txnKVFetcher).close(0x400e6bd, {0xb5560e8, 0xc00e45c3c0})
	/Users/maryliag/go/src/github.com/cockroachdb/cockroach/pkg/sql/row/kv_batch_fetcher.go:654 +0x25
github.com/cockroachdb/cockroach/pkg/sql/row.(*KVFetcher).Close(...)
	/Users/maryliag/go/src/github.com/cockroachdb/cockroach/pkg/sql/row/kv_fetcher.go:300
github.com/cockroachdb/cockroach/pkg/sql/colfetcher.(*cFetcher).Close(0xc00ba2c000, {0xb5560e8, 0xc00e45c3c0})
	/Users/maryliag/go/src/github.com/cockroachdb/cockroach/pkg/sql/colfetcher/cfetcher.go:1330 +0x6d
github.com/cockroachdb/cockroach/pkg/sql/colfetcher.(*ColBatchScan).Close(0xc00b330d20, {0x8a82b1a, 0x8a8183c})
	/Users/maryliag/go/src/github.com/cockroachdb/cockroach/pkg/sql/colfetcher/colbatch_scan.go:298 +0x4b
github.com/cockroachdb/cockroach/pkg/sql/colexecop.Closers.CloseAndLogOnErr.func1()
	/Users/maryliag/go/src/github.com/cockroachdb/cockroach/pkg/sql/colexecop/operator.go:177 +0xae
github.com/cockroachdb/cockroach/pkg/sql/colexecerror.CatchVectorizedRuntimeError(0xe62b7f8)
	/Users/maryliag/go/src/github.com/cockroachdb/cockroach/pkg/sql/colexecerror/error.go:91 +0x62
github.com/cockroachdb/cockroach/pkg/sql/colexecop.Closers.CloseAndLogOnErr({0xc00cb07e90, 0xb4b50e0, 0xc00f5e42e0}, {0xb556040, 0xc00cb49800}, {0x8a9fa10, 0xc})
	/Users/maryliag/go/src/github.com/cockroachdb/cockroach/pkg/sql/colexecop/operator.go:175 +0xcd
github.com/cockroachdb/cockroach/pkg/sql/colexec.(*Materializer).close(0xc00cba4960)
	/Users/maryliag/go/src/github.com/cockroachdb/cockroach/pkg/sql/colexec/materializer.go:324 +0x98
github.com/cockroachdb/cockroach/pkg/sql/colexec.newMaterializerInternal.func1()
	/Users/maryliag/go/src/github.com/cockroachdb/cockroach/pkg/sql/colexec/materializer.go:210 +0x1d
github.com/cockroachdb/cockroach/pkg/sql/execinfra.(*ProcessorBaseNoHelper).moveToTrailingMeta(0xc00cba4960)
	/Users/maryliag/go/src/github.com/cockroachdb/cockroach/pkg/sql/execinfra/processorsbase.go:682 +0x364
github.com/cockroachdb/cockroach/pkg/sql/execinfra.(*ProcessorBaseNoHelper).DrainHelper(0xc00cba4960)
	/Users/maryliag/go/src/github.com/cockroachdb/cockroach/pkg/sql/execinfra/processorsbase.go:561 +0x189
github.com/cockroachdb/cockroach/pkg/sql/colexec.(*Materializer).Next(0xc00cba4960)
	/Users/maryliag/go/src/github.com/cockroachdb/coc:311 +0xa5
github.com/cockroachdb/cockroach/pkg/sql/execinfra.(*ProcessorBaseNoHelper).DrainHelper(0xc003de2240)
	/Users/maryliag/go/src/github.com/cockroachdb/cockroach/pkg/sql/execinfra/processorsbase.go:557 +0x11c
github.com/cockroachdb/cockroach/pkg/sql/colflow.(*FlowCoordinator).next(0x0)
	/Users/maryliag/go/src/github.com/cockroachdb/cockroach/pkg/sql/colflow/flow_coordinator.go:143 +0x92
github.com/cockroachdb/cockroach/pkg/sql/colflow.(*FlowCoordinator).nextAdapter(...)
	/Users/maryliag/go/src/github.com/cockroachdb/cockroach/pkg/sql/colflow/flow_coordinator.go:147
github.com/cockroachdb/cockroach/pkg/sql/colexecerror.CatchVectorizedRuntimeError(0x0)
	/Users/maryliag/go/src/github.com/cockroachdb/cockroach/pkg/sql/colexecerror/error.go:91 +0x62
github.com/cockroachdb/cockroach/pkg/sql/colflow.(*FlowCoordinator).Next(0xc003de2240)
	/Users/maryliag/go/src/github.com/cockroachdb/cockroach/pkg/sql/colflow/flow_coordinator.go:152 +0x3e
github.com/cockroachdb/cockroach/pkg/sql/execinfra.DrainAndForwardMetadata({0xb556040, 0xc00cb49800}, {0xb57f4d8, 0xc003de2240}, {0xb4cfc50, 0xc004348700})
	/Users/maryliag/go/src/github.com/cockroachdb/cockroach/pkg/sql/execinfra/base.go:220 +0x75
github.com/cockroachdb/cockroach/pkg/sql/execinfra.Run({0xb556040, 0xc00cb49800}, {0xb57f4d8, 0xc003de2240}, {0xb4cfc50, 0xc004348700})
	/Users/maryliag/go/src/github.com/cockroachdb/cockroach/pkg/sql/execinfra/base.go:193 +0xe5
github.com/cockroachdb/cockroach/pkg/sql/execinfra.(*ProcessorBaseNoHelper).Run(0xc003de2240, {0xb556040, 0xc00cb49800})
	/Users/maryliag/go/src/github.com/cockroachdb/cockroach/pkg/sql/execinfra/processorsbase.go:722 +0x5b
github.com/cockroachdb/cockroach/pkg/sql/flowinfra.(*FlowBase).Run(0xc00cba4780, {0xb556040, 0xc00cb49800}, 0xc003de2240)
	/Users/maryliag/go/src/github.com/cockroachdb/cockroach/pkg/sql/flowinfra/flow.go:475 +0x258
github.com/cockroachdb/cockroach/pkg/sql/colflow.(*vectorizedFlow).Run(0xc00c7182d0, {0xb556040, 0xc00cb49800}, 0xc00cc888c0)
	/Users/maryliag/go/src/github.com/cockroachdb/cockroach/pkg/sql/colflow/vectorized_flow.go:249 +0x205
github.com/cockroachdb/cockroach/pkg/sql.(*DistSQLPlanner).Run(0xc001ea3a40, {0xb5560e8, 0xc00e45c0c0}, 0xc00c76b420, 0xc00cc888c0, 0xc00cd02300, 0xc004348700, 0xc00c7182d0, 0x0)
	/Users/maryliag/go/src/github.com/cockroachdb/cockroach/pkg/sql/distsql_running.go:607 +0xb04
github.com/cockroachdb/cockroach/pkg/sql.(*DistSQLPlanner).PlanAndRun(0xb5560e8, {0xb5560e8, 0xc00e45c0c0}, 0xc00c718010, 0xc00c76b420, 0xc00e81bf20, {{0xb557c08, 0xc00cd02280}, 0x0}, 0xc004348700)
	/Users/maryliag/go/src/github.com/cockroachdb/cockroach/pkg/sql/distsql_running.go:1461 +0x25c
github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).execWithDistSQLEngine(0xc00c717900, {0xb5560e8, 0xc00e45c0c0}, 0xc00c718010, 0xc00e45c0c0, {0xb5fdf18, 0xc00e81bf20}, 0x50, 0xc0091a3618)
	/Users/maryliag/go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:1485 +0x614
github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).dispatchToExecutionEngine(0xc00c717900, {0xb5560e8, 0xc00e81bfb0}, 0xc00c718010, {0xb5fdf18, 0xc00e81bf20})
	/Users/maryliag/go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:1159 +0xb87
github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).execStmtInOpenState(0xc00c717900, {0xb556040, 0xc00cb495c0}, {{0xb586798, 0xc00c72be50}, {0xc002e1f380, 0x51}, 0x0, 0x1}, 0x0, ...)
	/Users/maryliag/go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:690 +0x2091
github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).execStmt(0xc00c717900, {0xb556040, 0xc00cb495c0}, {{0xb586798, 0xc00c72be50}, {0xc002e1f380, 0x51}, 0x0, 0x1}, 0x0, ...)
	/Users/maryliag/go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:145 +0x59e
github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).execCmd.func1({{{0xb586798, 0xc00c72be50}, {0xc002e1f380, 0x51}, 0x0, 0x1}, {0xc0a94c0b66814cc0, 0x366afd67ace, 0x0}, {0xc0a94c0b66814cc0, ...}, ...}, ...)
	/Users/maryliag/go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:1892 +0x2f6
github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).execCmd(0xc00c717900)
	/Users/maryliag/go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:1896 +0xb48
github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).run(0xc00c717900, {0xb5560e8, 0xc00e81bbc0}, 0xc00e81b9e0, {0x0, 0x0, 0x0, 0x0, 0x0}, 0x0)
	/Users/maryliag/go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:1818 +0x26c
github.com/cockroachdb/cockroach/pkg/sql.(*InternalExecutor).initConnEx.func1()
	/Users/maryliag/go/src/github.com/cockroachdb/cockroach/pkg/sql/internal.go:206 +0xa5
created by github.com/cockroachdb/cockroach/pkg/sql.(*InternalExecutor).initConnEx
	/Users/maryliag/go/src/github.com/cockroachdb/cockroach/pkg/sql/internal.go:205 +0x5f1

Jira issue: CRDB-17362


maryliag · 2022-07-07T00:21:04Z

I was testing again, this time without using --insecure, and it alternates between the panic above and another one:

[email protected]:26257/movr> panic: session: unexpected 10240 leftover bytes

goroutine 19233 [running]:
github.com/cockroachdb/cockroach/pkg/util/log/logcrash.ReportOrPanic({0xb58f848, 0xc0025fc2a0}, 0xc00110ca80, {0x8b338fa, 0x0}, {0xc0023f1d10, 0x6f4d750, 0x0})
	/Users/maryliag/go/src/github.com/cockroachdb/cockroach/pkg/util/log/logcrash/crash_reporting.go:378 +0x1c5
github.com/cockroachdb/cockroach/pkg/util/mon.(*BytesMonitor).doStop(0xc003ff8aa0, {0xb58f848, 0xc0025fc2a0}, 0x1)
	/Users/maryliag/go/src/github.com/cockroachdb/cockroach/pkg/util/mon/bytes_usage.go:435 +0x233
github.com/cockroachdb/cockroach/pkg/util/mon.(*BytesMonitor).Stop(...)
	/Users/maryliag/go/src/github.com/cockroachdb/cockroach/pkg/util/mon/bytes_usage.go:415
github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).close(0xc0042a5900, {0xb58f848, 0xc0025fc2a0}, 0x2)
	/Users/maryliag/go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:1170 +0x82b
github.com/cockroachdb/cockroach/pkg/sql.(*InternalExecutor).initConnEx.func1()
	/Users/maryliag/go/src/github.com/cockroachdb/cockroach/pkg/sql/internal.go:215 +0x137
created by github.com/cockroachdb/cockroach/pkg/sql.(*InternalExecutor).initConnEx
	/Users/maryliag/go/src/github.com/cockroachdb/cockroach/pkg/sql/internal.go:205 +0x5f1

Video of reproducing the panic: https://www.loom.com/share/9a6f143ae950448b851ffd860ddb05d7

yuzefovich · 2022-07-07T00:43:48Z

Hm, I just tried doing the same as in the video at SHA 41228d1, and it doesn't seem to crash.

Where does the SHA 4dc922688e that the binary was built on come from? I can't seem to find it. Based on the stack trace, I can see it was after #83010 was merged.

maryliag · 2022-07-07T00:48:59Z

The branch I'm using is from PR #83677 (if it helps). I rebased and rebuilt, and then got those panics.

yuzefovich · 2022-07-07T04:26:48Z

Thanks, I can repro on your branch.

yuzefovich · 2022-07-07T20:53:51Z

I'm somewhat confident that #83615 is to blame: it somehow exposed an issue with releasing prepared statements in some cases. Still looking.

yuzefovich · 2022-07-08T01:20:03Z

Alright, I have figured out the cause of the first stack trace and made some progress on the second. I believe it occurs when the --max-sql-memory budget (128MiB by default for demo) is exceeded, and on the database page we run into the memory accounting leak that is fixed by #83678.

I still don't fully understand how we get those "leftover bytes" errors - I'm pretty sure it has to do with the memory account used in PreparedStatement structs, and I found some minor issues, but couldn't get to the bottom of them. Since in the release builds we won't crash and will just release these "leftover bytes", it seems ok to leave it at that.
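To illustrate the "leftover bytes" check described above, here is a minimal sketch in Go. The `monitor` type, its fields, and the `strict` flag are hypothetical simplifications and not the real `mon.BytesMonitor` API; in particular, the sketch returns an error where the real code goes through `logcrash.ReportOrPanic`. It only shows the idea that a non-release build treats unreleased bytes as a fatal accounting bug, while a release build just releases them.

```go
package main

import "fmt"

// monitor is a hypothetical, simplified stand-in for a memory monitor.
// It only tracks how many bytes are still registered with it.
type monitor struct {
	name      string
	allocated int64
}

// stop checks for bytes that were never released. With strict set (modeling
// a non-release build) it reports them as an error; otherwise the leftover
// bytes are released silently. In both cases the monitor ends up empty.
func (m *monitor) stop(strict bool) (err error) {
	if m.allocated != 0 {
		if strict {
			err = fmt.Errorf("%s: unexpected %d leftover bytes", m.name, m.allocated)
		}
		m.allocated = 0
	}
	return err
}

func main() {
	dev := &monitor{name: "session", allocated: 10240}
	fmt.Println("strict build: ", dev.stop(true))

	rel := &monitor{name: "session", allocated: 10240}
	fmt.Println("release build:", rel.stop(false))
}
```

In this sketch the strict call reports the leak using the same wording as the second panic, and the non-strict call returns no error, which matches the observation that release builds would not crash.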

In short, a couple of small PRs (which I'm about to open) plus #83678 should address this.

yuzefovich · 2022-07-08T05:37:14Z

Alright, I finally figured it out: #84049 will solve this stack trace, even in the face of a memory leak.

83597: Colocate auth logging with auth metric for consistency r=rafiss a=ecwall

refs #83224

Release note (bug fix): Move connection OK log and metric to same location after auth completes for consistency. This resolves an inconsistency (see linked issue) in the DB console where the log and metric did not match.

83731: kvserver: acquire replica lease on queue check r=nvanbenschoten a=kvoli

This patch adds a check within the replication for when a replica is the raft leader and does not have a valid lease. The necessary conditions are that it is currently the raft leader and that the lease status is expired. This ensures that following a node restart, a replica with a valid lease will be installed within the replica scanner interval.

**single nodes 10k ranges with change**

![image](https://user-images.githubusercontent.com/39606633/176971656-317c38d3-7103-47a0-a18a-d9f29c49baa5.png)

**5 node, 3k ranges**

*without change*

![image](https://user-images.githubusercontent.com/39606633/177620933-56cfe528-c45c-429f-a4d9-9d3ba90fe8e1.png)

*with change*

![image](https://user-images.githubusercontent.com/39606633/177621186-ee467043-47d5-4279-bb69-5478e7ad445a.png)

resolves #83444

Release note: None

84044: ui: option to search exact statement on SQL Activity r=maryliag a=maryliag

Previously, when doing a search on SQL Activity page, it was returning all statements that contained all terms from the search, but not necessarily in the same order. This commit adds an option: when you wrap the search in quotes, it will only return results with the exact match in order.

https://www.loom.com/share/442c6eaee84b4c71a1acdef0b63b74bf

Release note (ui change): Ability to search for the exact terms in order when wrapping the search in quotes.

84047: sql: remove unused error return value in a method of connExecutor r=yuzefovich a=yuzefovich

Found while looking into #83935.

Release note: None

84082: roachtest: skip multitenant/fairness r=cucaroach a=cucaroach

Informs: #83994

Release note: None

84085: roachtest: fix zipping of artifacts to include other zips r=srosenberg a=renatolabs

When artifacts are zipped in preparation for being published to TeamCity, other zip files are skipped. The idea is that we won't try to recursively zip artifacts.zip itself, or debug.zip, which is published separately.

However, some tests (notably, `tpchvec`) download their own zip files in the `logs` directory so that they'll be available for analysis when a test fails. While there was an intention to skip only top-level zip files (as indicated by existing comments), the code itself would skip any zip files found in the artifacts directory.

This commit updates the zipping logic to skip only top-level zip files, allowing tests to write their own zip files to the `logs` directory and have them available for inspection later.

Release note: None.

Co-authored-by: Evan Wall <[email protected]>
Co-authored-by: Austen McClernon <[email protected]>
Co-authored-by: Marylia Gutierrez <[email protected]>
Co-authored-by: Yahor Yuzefovich <[email protected]>
Co-authored-by: Tommy Reilly <[email protected]>
Co-authored-by: Renato Costa <[email protected]>

84048: row: only store the accounted for memory if the reservation is approved r=yuzefovich a=yuzefovich

Previously, we would update the counter about the reserved memory before doing the reservation. If that reservation is denied, then later on, in `txnKVFetcher.close` we could try to release more memory than we registered. This is now fixed.

Addresses: #83935.

Release note: None

Co-authored-by: Yahor Yuzefovich <[email protected]>
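The ordering problem described in 84048 can be sketched as follows. This is a simplified illustration, not the actual `txnKVFetcher` or `mon.BoundAccount` code: the `account` and `fetcher` types, the `reserveBuggy` and `reserveFixed` methods, and the 64-byte limit are all hypothetical names and values chosen for the example. The sketch assumes the fetcher keeps its own counter of reserved bytes and releases that amount on close.

```go
package main

import (
	"errors"
	"fmt"
)

// account is a hypothetical stand-in for a bound memory account with a
// fixed budget; it is not the real mon.BoundAccount API.
type account struct {
	used, limit int64
}

var errBudget = errors.New("memory budget exceeded")

// grow reserves n bytes, or fails and leaves the account unchanged.
func (a *account) grow(n int64) error {
	if a.used+n > a.limit {
		return errBudget
	}
	a.used += n
	return nil
}

// shrink releases n bytes and reports an error if more bytes are released
// than were reserved.
func (a *account) shrink(n int64) error {
	if n > a.used {
		return fmt.Errorf("no bytes in account to release, current %d, free %d", a.used, n)
	}
	a.used -= n
	return nil
}

// fetcher remembers how many bytes it believes it has reserved.
type fetcher struct {
	acc       *account
	accounted int64
}

// reserveBuggy records the size before the reservation is approved, so a
// denied reservation leaves accounted out of sync with the account.
func (f *fetcher) reserveBuggy(n int64) error {
	f.accounted = n
	return f.acc.grow(n)
}

// reserveFixed records the size only after the reservation is approved.
func (f *fetcher) reserveFixed(n int64) error {
	if err := f.acc.grow(n); err != nil {
		return err
	}
	f.accounted = n
	return nil
}

// close releases whatever the fetcher believes it has reserved.
func (f *fetcher) close() error {
	err := f.acc.shrink(f.accounted)
	f.accounted = 0
	return err
}

func main() {
	buggy := &fetcher{acc: &account{limit: 64}}
	_ = buggy.reserveBuggy(82) // denied: 82 exceeds the 64-byte budget
	fmt.Println("buggy close:", buggy.close())

	fixed := &fetcher{acc: &account{limit: 64}}
	_ = fixed.reserveFixed(82) // denied, and nothing is recorded
	fmt.Println("fixed close:", fixed.close())
}
```

With the buggy ordering, closing after a denied reservation tries to release 82 bytes from an empty account, which mirrors the "no bytes in account to release, current 0, free 82" message in the first stack trace. With the fixed ordering, close has nothing to release and succeeds.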

maryliag added the C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. label Jul 6, 2022

maryliag assigned yuzefovich Jul 6, 2022

maryliag added the T-sql-queries SQL Queries Team label Jul 6, 2022

This was referenced Jul 8, 2022

sql: remove unused error return value in a method of connExecutor #84047

Merged

row: only store the accounted for memory if the reservation is approved #84048

Merged

sql: fix memory accounting of prepared statements and portals in error cases #84049

Merged

craig bot closed this as completed in 10853d2 Jul 11, 2022

craig bot closed this as completed in #84049 Jul 11, 2022

mgartner added this to SQL Queries Jul 24, 2023

mgartner moved this to Done in SQL Queries Jul 24, 2023
