kvserver: improve node liveness failure log, point to docs #51870
This commit improves the message logged on node liveness failures, which is logged roughly every 4.5 seconds during liveness unavailability. The improved message describes some of the common causes of liveness unavailability (resource starvation and network connectivity problems) and then links to our troubleshooting docs about the topic. This was an action item coming out of a recent support incident postmortem.
Thanks for taking care of this!
@jseldess would you or someone from your team care to give this a quick review? What do you think of this strategy to write more verbose log messages (really, on-line documentation) during common troubleshooting scenarios?
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @jordanlewis)
Also, while here, could you confirm that we should be linking to |
This very much LGTM! I love the idea of providing more guidance and links to docs in log messages as well as in error messages. Thanks for writing this, @nvanbenschoten, and for looping me in, @jordanlewis. @nvanbenschoten, I think the link you're using is best and matches what I see done in other parts of the code base. The Beyond the scope of this PR: I think texts like these should ultimately be centralized in a file or files somewhere, or even housed outside of the cockroach repo and fetched as a part of the build process. That would make it much easier for writers to know what and when to review log and error texts, and perhaps there'd even be a way to auto-assign docs reviewers in cases where those are touched. In an ideal world, this would apply to:
TFTR!
Got it, that's very helpful to know. And that's even true of section headers like
Centralizing string assets like these seems like a natural progression. I think that's actually a common approach, especially when applications start thinking about internationalization. This would also make it much easier to auto-assign docs reviewers because we could match on the package of the files being changed. bors r+ |
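To make the reviewer-matching idea concrete, here is a small Go sketch, assuming user-facing strings were moved into a single package. The directory name `pkg/usermessages/` and the function `needsDocsReview` are hypothetical and do not exist in the repo; a real setup might just as well rely on a code-owners rule.

```go
package main

import (
	"fmt"
	"strings"
)

// messagesPkg is a hypothetical directory that would hold all user-facing
// log and error strings; no such package is confirmed to exist.
const messagesPkg = "pkg/usermessages/"

// needsDocsReview reports whether any changed file lives under messagesPkg,
// which is the signal for requesting a docs reviewer.
func needsDocsReview(changedFiles []string) bool {
	for _, f := range changedFiles {
		if strings.HasPrefix(f, messagesPkg) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(needsDocsReview([]string{"pkg/usermessages/liveness.go"}))
	fmt.Println(needsDocsReview([]string{"pkg/kv/kvserver/store.go"}))
}
```

Matching on a path prefix only works once the strings are centralized; while messages stay scattered across packages, there is no single path to match on.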
Yep! I'm not sure how thoughtful we've been with header-level redirects, but I'll bring it up in the docs meeting. |
I'm currently working on extracting our admin UI HTTP endpoints into auto-generated docs. If we want to put this stuff somewhere else, it could follow a similar path. In general it's not too much work. Up to you. If users or operators of cockroach would find it useful, ask the dev interfaces team. |
Build failed: |
Drive-by kicking bors again, since it failed. bors r+ |
Build succeeded: |