roachtest: schemachange/during/tpcc failed [possible disk slowness] #112303

Closed
cockroach-teamcity opened this issue Oct 13, 2023 · 4 comments
Labels
A-storage: Relating to our storage engine (Pebble) on-disk storage.
branch-release-23.1: Used to mark GA and release blockers, technical advisories, and bugs for 23.1
C-test-failure: Broken test (automatically or manually discovered).
O-roachtest
O-robot: Originated from a bot.
T-storage: Storage Team
X-infra-flake: the automatically generated issue was closed due to an infrastructure problem not a product issue
Comments

@cockroach-teamcity
Member

cockroach-teamcity commented Oct 13, 2023

roachtest.schemachange/during/tpcc failed with artifacts on release-23.1 @ 1d5f07d043035d14735c328a59384cd96cb47b09:

(monitor.go:153).Wait: monitor failure: unexpected node event: n4: cockroach process died (exit code 7)
test artifacts and logs in: /artifacts/schemachange/during/tpcc/run_1

Parameters: ROACHTEST_arch=amd64 , ROACHTEST_cloud=gce , ROACHTEST_cpu=16 , ROACHTEST_encrypted=false , ROACHTEST_ssd=0

Help

See: roachtest README

See: How To Investigate (internal)

See: Grafana

/cc @cockroachdb/sql-foundations

This test on roachdash | Improve this report!

Jira issue: CRDB-32345

@cockroach-teamcity cockroach-teamcity added branch-release-23.1 Used to mark GA and release blockers, technical advisories, and bugs for 23.1 C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) labels Oct 13, 2023
@cockroach-teamcity cockroach-teamcity added this to the 23.1 milestone Oct 13, 2023
@rafiss
Collaborator

rafiss commented Oct 13, 2023

Seems like disk issues; will move to Storage for them to confirm.

W231013 13:25:32.708118 424 kv/kvserver/liveness/liveness.go:908 ⋮ [T1,n4,liveness-hb] 5721  slow heartbeat took 2.919299815s; err=disk write failed while updating node liveness: interrupted during singleflight ‹engine sync:0›: context canceled

E231013 13:25:32.718537 81916916 kv/kvserver/replica_consistency.go:722 ⋮ [T1,n4,s4,r232/4:‹/Table/113/1/{34/806…-42/730…}›] 5846  checksum computation failed: context canceled
F231013 13:25:32.716388 82038343 storage/pebble.go:1268 ⋮ [T1,n4] 5847  file write stall detected: disk slowness detected: syncto on file 074677.sst (10518528 bytes) has been ongoing for 21.4s

@rafiss rafiss added T-storage Storage Team and removed T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) labels Oct 13, 2023
@blathers-crl blathers-crl bot added the A-storage Relating to our storage engine (Pebble) on-disk storage. label Oct 13, 2023
@rafiss rafiss changed the title roachtest: schemachange/during/tpcc failed roachtest: schemachange/during/tpcc failed [possible disk slowness] Oct 13, 2023
@cockroach-teamcity
Member Author

roachtest.schemachange/during/tpcc failed with artifacts on release-23.1 @ 366fe013b4b63bef98c953d03b302dc5e0b13ee7:

(monitor.go:153).Wait: monitor failure: unexpected node event: n4: cockroach process died (exit code 7)
test artifacts and logs in: /artifacts/schemachange/during/tpcc/run_1

Parameters: ROACHTEST_arch=amd64 , ROACHTEST_cloud=gce , ROACHTEST_cpu=16 , ROACHTEST_encrypted=false , ROACHTEST_ssd=0


@jbowens
Collaborator

jbowens commented Oct 16, 2023

goroutine 82042397 [syscall]:
syscall.Syscall6(0x1000?, 0x1000?, 0x0?, 0xbd30000?, 0x474?, 0x56790000?, 0xc02a236318?)
	GOROOT/src/syscall/syscall_linux.go:90 +0x36
golang.org/x/sys/unix.SyncFileRange(0x4f5cd4?, 0x8000000000000000?, 0x658ac690fde?, 0x0?)
	golang.org/x/sys/unix/external/org_golang_x_sys/unix/zsyscall_linux_amd64.go:374 +0x3f
github.com/cockroachdb/pebble/vfs.(*linuxFile).SyncTo(0xb4e2380?, 0xc1426f2dca131286?)
	github.com/cockroachdb/pebble/vfs/external/com_github_cockroachdb_pebble/vfs/default_linux_noarm.go:101 +0x65
github.com/cockroachdb/pebble/vfs.(*diskHealthCheckingFile).SyncTo.func1()
	github.com/cockroachdb/pebble/vfs/external/com_github_cockroachdb_pebble/vfs/disk_health.go:265 +0x3b
github.com/cockroachdb/pebble/vfs.(*diskHealthCheckingFile).timeDiskOp(0xc0f786df90, 0x4, 0x180000, 0xc00cb20398)
	github.com/cockroachdb/pebble/vfs/external/com_github_cockroachdb_pebble/vfs/disk_health.go:298 +0x12f
github.com/cockroachdb/pebble/vfs.(*diskHealthCheckingFile).SyncTo(0x1000?, 0xc00cb20398?)
	github.com/cockroachdb/pebble/vfs/external/com_github_cockroachdb_pebble/vfs/disk_health.go:264 +0x71
github.com/cockroachdb/pebble/vfs.(*enospcFile).SyncTo(0x11def82?, 0x126b000052192f9e?)
	github.com/cockroachdb/pebble/vfs/external/com_github_cockroachdb_pebble/vfs/disk_full.go:417 +0x22
github.com/cockroachdb/pebble/vfs.(*syncingFile).maybeSync(0xc10305e410)
	github.com/cockroachdb/pebble/vfs/external/com_github_cockroachdb_pebble/vfs/syncing_file.go:150 +0x134
github.com/cockroachdb/pebble/vfs.(*syncingFile).Write(0xc10305e410, {0xc0d52b5000, 0x1000, 0x1000})
	github.com/cockroachdb/pebble/vfs/external/com_github_cockroachdb_pebble/vfs/syncing_file.go:72 +0xff

Yeah, the first one is a disk stall. The relevant SyncFileRange syscall (reached through SyncTo) is there in the goroutine dump from the panic.

Same with the second one: the relevant SyncData call (an fdatasync syscall) is there:

goroutine 120550167 [syscall]:
syscall.Syscall(0x5d000021?, 0xc01b619de8?, 0xc01b619e88?, 0x11dfbaa?)
	GOROOT/src/syscall/syscall_linux.go:68 +0x27
golang.org/x/sys/unix.Fdatasync(0x88e46f4f687?)
	golang.org/x/sys/unix/external/org_golang_x_sys/unix/zsyscall_linux.go:733 +0x30
github.com/cockroachdb/pebble/vfs.(*linuxFile).SyncData(0x88e46f4f687?)
	github.com/cockroachdb/pebble/vfs/external/com_github_cockroachdb_pebble/vfs/default_linux_noarm.go:65 +0x1d
github.com/cockroachdb/pebble/vfs.(*diskHealthCheckingFile).SyncData.func1()
	github.com/cockroachdb/pebble/vfs/external/com_github_cockroachdb_pebble/vfs/disk_health.go:257 +0x2e
github.com/cockroachdb/pebble/vfs.(*diskHealthCheckingFile).timeDiskOp(0xc071a9d6d0, 0x3, 0x0, 0xc01b619f28)
	github.com/cockroachdb/pebble/vfs/external/com_github_cockroachdb_pebble/vfs/disk_health.go:298 +0x12f
github.com/cockroachdb/pebble/vfs.(*diskHealthCheckingFile).SyncData(0x1000?)
	github.com/cockroachdb/pebble/vfs/external/com_github_cockroachdb_pebble/vfs/disk_health.go:256 +0x55
github.com/cockroachdb/pebble/vfs.(*enospcFile).SyncData(0xc05069b368?)
	github.com/cockroachdb/pebble/vfs/external/com_github_cockroachdb_pebble/vfs/disk_full.go:413 +0x22
github.com/cockroachdb/pebble/vfs.(*syncingFile).Sync(0xc0fff0a5c0?)
	github.com/cockroachdb/pebble/vfs/external/com_github_cockroachdb_pebble/vfs/syncing_file.go:113 +0x4b
github.com/cockroachdb/pebble/objstorage.(*fileBufferedWritable).Finish(0xc05cce6a20)
	github.com/cockroachdb/pebble/objstorage/external/com_github_cockroachdb_pebble/objstorage/vfs_writable.go:44 +0x42
github.com/cockroachdb/pebble/sstable.(*Writer).Close(0xc05069aa80)
	github.com/cockroachdb/pebble/sstable/external/com_github_cockroachdb_pebble/sstable/writer.go:2003 +0x1664
github.com/cockroachdb/pebble.(*DB).runCompaction.func6({0xc06b105680, 0x14, 0x18?})
	github.com/cockroachdb/pebble/external/com_github_cockroachdb_pebble/compaction.go:2931 +0x3b9
github.com/cockroachdb/pebble.(*DB).runCompaction(0xc000652500, 0xe695, 0xc02fb9e000)
	github.com/cockroachdb/pebble/external/com_github_cockroachdb_pebble/compaction.go:3214 +0x2348
github.com/cockroachdb/pebble.(*DB).compact1(0xc000652500, 0xc02fb9e000, 0x0)
	github.com/cockroachdb/pebble/external/com_github_cockroachdb_pebble/compaction.go:2536 +0x1a5
github.com/cockroachdb/pebble.(*DB).compact.func1({0x7361630, 0xc13e7ed290})
	github.com/cockroachdb/pebble/external/com_github_cockroachdb_pebble/compaction.go:2507 +0xad

@jbowens jbowens closed this as not planned Oct 16, 2023
@jbowens jbowens added X-infra-flake the automatically generated issue was closed due to an infrastructure problem not a product issue and removed release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. labels Oct 16, 2023
@exalate-issue-sync exalate-issue-sync bot added release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. and removed X-infra-flake the automatically generated issue was closed due to an infrastructure problem not a product issue labels Oct 16, 2023
@jbowens jbowens added X-infra-flake the automatically generated issue was closed due to an infrastructure problem not a product issue and removed release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. labels Oct 16, 2023
@jbowens
Collaborator

jbowens commented Oct 20, 2023

Linking to #97968.


3 participants