Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage: rely on pebble for log message in case of write stall #95436

Merged

Conversation

joshimhoff
Copy link
Collaborator

@joshimhoff joshimhoff commented Jan 18, 2023

Part of #67856. Haven't added additional context re: op type & size of write to pebble.DiskSlowInfo but will do that shortly. With this change, if we make changes to how much context is included in pebble.DiskSlowInfo, we will not need to make a CRDB change to have the context in the CRDB logs. Note also that I will adjust the pebble String method to be less accusatory to disks ("disk slow" -> "write slow").

See below for log message with this commit in.

$ ./dev test pkg/storage -v --filter='TestPebbleMetricEventListener' --show-logs
$ bazel test pkg/storage:all --test_env=GOTRACEBACK=all --test_filter=TestPebbleMetricEventListener --test_arg -test.v --test_arg -show-logs --test_sharding_strategy=disabled --test_output all
Starting local Bazel server and connecting to it...
INFO: Invocation ID: a4365f4b-9190-4ef3-8a54-de3b4c0c46b7
INFO: Analyzed 3 targets (626 packages loaded, 16556 targets configured).
INFO: Found 2 targets and 1 test target...
INFO: From Testing //pkg/storage:storage_test:
==================== Test output for //pkg/storage:storage_test:
initialized metamorphic constant "span-reuse-rate" with value 58
initialized metamorphic constant "mvcc-value-disable-simple-encoding" with value true
initialized metamorphic constant "mvcc-max-iters-before-seek" with value 2
initialized metamorphic constant "span-reuse-rate" with value 39
initialized metamorphic constant "mvcc-value-disable-simple-encoding" with value true
initialized metamorphic constant "mvcc-max-iters-before-seek" with value 0
=== RUN   TestPebbleMetricEventListener
    test_log_scope.go:76: test logs captured to: /private/var/tmp/_bazel_joshimhoff/69f147d3e5e8d9a9bc7bc343361a1a94/sandbox/darwin-sandbox/6/execroot/com_github_cockroachdb_cockroach/_tmp/88d640957a51906868fc787a6e8a38b4/logTestPebbleMetricEventListener440509452
E230118 21:01:15.941881 28 storage/pebble.go:1001  [-] 1  write stall detected: disk slowness detected: write to file  has been ongoing for 70.0s
    pebble_test.go:356: -- test log scope end --
--- PASS: TestPebbleMetricEventListener (0.05s)
PASS
================================================================================
INFO: Elapsed time: 29.194s, Critical Path: 8.66s
INFO: 7 processes: 1 internal, 6 darwin-sandbox.
INFO: Build completed successfully, 7 total actions
//pkg/storage:storage_test                                               PASSED in 0.6s

Executed 1 out of 1 test: 1 test passes.
INFO: Build completed successfully, 7 total actions

storage: rely on pebble for log message in case of write stall

In case of a write stall, CRDB logs out details re: the write stall. By
default, CRDB logs at fatal level & then crashes. Before this commit, tho the
pebble event had a String method, it was not used in the CRDB log message.
This means adding additional context to the pebble event will not add the
context to the CRDB logs, unless a CRDB change is made in addition to a pebble
change. With this commit, the log message calls the pebble event's String
method.

Release note (ops change): CRDB log message in case of write stall has been
adjusted slightly.

@joshimhoff joshimhoff requested a review from a team as a code owner January 18, 2023 16:07
@joshimhoff joshimhoff requested a review from jbowens January 18, 2023 16:07
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@joshimhoff joshimhoff force-pushed the improve_logs_in_case_of_write_stall branch 2 times, most recently from e19c756 to 47c4fce Compare January 18, 2023 21:00
Copy link
Collaborator

@jbowens jbowens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @joshimhoff)


pkg/storage/pebble.go line 1001 at r1 (raw file):

					log.Fatalf(ctx, "write stall detected: %s", info)
				} else {
					log.Errorf(ctx, "write stall detected: %s", info)

nit: can we use language like "file write stall"? there's a separate stall that's possible which we typically call a "WriteStall" (see up above where we install a WriteStallBegin handler). that other form of "write stall" happens when we've reached the memtable limit are blocked on the flush completing before we'll allow new writes to commit.

@joshimhoff joshimhoff force-pushed the improve_logs_in_case_of_write_stall branch from 47c4fce to 32bf820 Compare January 24, 2023 18:10
@blathers-crl
Copy link

blathers-crl bot commented Jan 24, 2023

It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR?

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@joshimhoff
Copy link
Collaborator Author

Good idea! I've updated the log message as you suggested. Reviewable is being slow for me, so responding here. TTFR!

@joshimhoff
Copy link
Collaborator Author

bors r+

@joshimhoff
Copy link
Collaborator Author

bors r+

@craig
Copy link
Contributor

craig bot commented Jan 24, 2023

Already running a review

@joshimhoff joshimhoff force-pushed the improve_logs_in_case_of_write_stall branch from 32bf820 to ca3adf5 Compare January 24, 2023 18:37
@craig
Copy link
Contributor

craig bot commented Jan 24, 2023

Canceled.

@joshimhoff joshimhoff force-pushed the improve_logs_in_case_of_write_stall branch from ca3adf5 to 7b5e7e3 Compare January 24, 2023 18:37
In case of a write stall, CRDB logs out details re: the write stall. By
default, CRDB logs at fatal level & then crashes. Before this commit, tho the
pebble event had a String method, it was not used in the CRDB log message.
This means adding additional context to the pebble event will not add the
context to the CRDB logs, unless a CRDB change is made in addition to a pebble
change. With this commit, the log message calls the pebble event's String
method.

Release note (ops change): CRDB log message in case of write stall has been
adjusted slightly.
@joshimhoff joshimhoff force-pushed the improve_logs_in_case_of_write_stall branch from 7b5e7e3 to c48e3c7 Compare January 24, 2023 19:35
@joshimhoff
Copy link
Collaborator Author

bors r+

@craig craig bot merged commit 64b182c into cockroachdb:master Jan 24, 2023
@craig
Copy link
Contributor

craig bot commented Jan 24, 2023

Build succeeded:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants