vfs: include start time in disk health checker timing stack traces #3009

jbowens · 2023-10-23T19:28:07Z

When a disk stall results in the termination of a process and GOTRACEBACK is set appropriately, the panic includes a dump of stack traces. These stack traces can be used to confirm the presence of a goroutine stuck in the described syscall. However, there's no way to verify that the stalled syscall and the one shown in the stack trace are actually the same invocation: the previous syscall could have completed and a new one begun. We could add the unix timestamp in nanoseconds of the operation's start as a parameter to the function performing the timing. This would allow us to inspect the stack trace and verify that the start time matches the alleged start of the disk stall.
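A minimal sketch of the idea, not the actual Pebble code: `opTimer`, `timeOp`, and `run` are hypothetical names, and the real timing functions have their own signatures. The point is only that the start time travels as an integer argument of a non-inlined function, so it has a chance of showing up in that frame's argument list when stacks are dumped.

```go
package main

import "time"

// opTimer is a hypothetical stand-in for the disk health checker's
// timing helper; it is not Pebble's real type.
type opTimer struct {
	// lastStartNanos records the most recent start time. The real
	// checker would use this to detect operations that run too long.
	lastStartNanos int64
}

// timeOp runs op. The start time is taken as an explicit argument,
// in nanoseconds since the unix epoch, so that it is part of this
// frame's arguments. The noinline directive keeps the frame from
// being inlined, since inlined frames do not print their arguments.
//
//go:noinline
func (t *opTimer) timeOp(startNanos int64, op func()) {
	t.lastStartNanos = startNanos
	op()
}

// run captures the start time once and hands it to timeOp.
func (t *opTimer) run(op func()) {
	t.timeOp(time.Now().UnixNano(), op)
}
```

A caller would wrap the syscall, for example `tm.run(func() { _ = f.Sync() })` for some file `f`. Whether the argument prints accurately in a given traceback depends on the Go version and compiler, so treat this as an illustration of the approach rather than a guarantee.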

Pass the start time in the form of nanoseconds since the unix epoch as a parameter to timeDiskOp and timeFilesystemOp. This aids post-mortem debugging when the disk-health checker fatals the process and GOTRACEBACK is set to dump the stacks, including arguments. The start time argument will be printed in hex form, allowing us to decode the start time of the operation. This can be used to confirm that the timed operation was still inflight at the time stacks were collected. Close cockroachdb#3009.
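Decoding the hex argument by hand is error-prone, so a small helper can do it. This is a sketch under assumptions: `decodeStartTime` is a hypothetical name, and it assumes the argument was copied from the traceback as a hex integer, optionally with a `0x` prefix and optionally followed by the `?` that Go appends to argument values that may be inaccurate.

```go
package main

import (
	"strconv"
	"strings"
	"time"
)

// decodeStartTime converts a hex argument copied from a Go stack
// trace into a UTC time, interpreting the value as nanoseconds
// since the unix epoch. A leading "0x" and a trailing "?" are
// tolerated.
func decodeStartTime(hexArg string) (time.Time, error) {
	s := strings.TrimSpace(hexArg)
	s = strings.TrimSuffix(s, "?")
	s = strings.TrimPrefix(s, "0x")
	nanos, err := strconv.ParseUint(s, 16, 64)
	if err != nil {
		return time.Time{}, err
	}
	return time.Unix(0, int64(nanos)).UTC(), nil
}
```

For instance, `decodeStartTime("0x3b9aca00")` yields one second after the epoch, since 0x3b9aca00 is 1,000,000,000 nanoseconds. If the decoded time matches the start of the reported stall, the goroutine in the trace was still inside the same timed operation when the stacks were collected.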

jbowens added the C-enhancement (New feature or request), T-storage, and A-storage labels Oct 23, 2023

kvoli mentioned this issue Oct 23, 2023

roachtest: allocbench/nodes=7/cpu=8/kv/r=0/ops=skew failed cockroachdb/cockroach#112818

Closed

jbowens mentioned this issue Oct 27, 2023

vfs: pass start time in nanos to disk-health timing functions #3020

Merged

jbowens self-assigned this Oct 31, 2023

jbowens closed this as completed in #3020 Oct 31, 2023

jbowens added this to [Deprecated] Storage Jun 4, 2024

jbowens moved this to Done in [Deprecated] Storage Jun 4, 2024
