Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rpc: log RPC heartbeat errors at error level #93388

Merged
merged 1 commit into from
Dec 13, 2022

Conversation

erikgrinaker
Copy link
Contributor

@erikgrinaker erikgrinaker commented Dec 10, 2022

Currently heartbeat failures are logged at level INFO. This makes it hard to identify them as anomalies when scanning the log files. This commit changes the level to ERROR to make them stand out during troubleshooting.

Epic: none
Release note: None

@erikgrinaker erikgrinaker requested review from tbg and a team December 10, 2022 20:24
@erikgrinaker erikgrinaker self-assigned this Dec 10, 2022
@erikgrinaker erikgrinaker requested a review from a team as a code owner December 10, 2022 20:24
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@knz
Copy link
Contributor

knz commented Dec 11, 2022

Otherwise, these are not visible in the main cockroach.log file.

Wait what? the HEALTH events are saved at severity level INFO by default. This should not be needed.

@erikgrinaker
Copy link
Contributor Author

RPC heartbeat failures due to e.g. network latencies can cause clusters to completely fall apart. Currently, there is no clear indication of this happening at all in the default cockroach.log. These shouldn't be hidden away in the health log at info level.

@knz
Copy link
Contributor

knz commented Dec 11, 2022

I am not objecting to your change, but the PR description as-is is factually incorrect. Try this:

"Currently heartbeat failures are logged at level INFO. This makes it hard to identify them as anomalies when scanning the log files. This commit changes the level to ERROR to make them stand out during troubleshooting."

@erikgrinaker
Copy link
Contributor Author

erikgrinaker commented Dec 11, 2022

I see, I'll reword the message. Isn't it factually correct though, in that health INFO events aren't logged to cockroach.log, only cockroach-health.log?

Currently heartbeat failures are logged at level INFO. This makes it
hard to identify them as anomalies when scanning the log files. This
commit changes the level to ERROR to make them stand out during
troubleshooting.

Epic: none
Release note: None
@knz
Copy link
Contributor

knz commented Dec 12, 2022

You can move your commit message under "Release note (cli change):" since we document changes to non-DEV logging output.

@erikgrinaker
Copy link
Contributor Author

erikgrinaker commented Dec 12, 2022

The original change to INFO was done recently in #88625 (which didn't have a release note for this), so this PR essentially reverts to the 22.2 behavior.

@erikgrinaker
Copy link
Contributor Author

bors r+

@craig
Copy link
Contributor

craig bot commented Dec 12, 2022

Build failed:

@erikgrinaker
Copy link
Contributor Author

bors r+

@craig
Copy link
Contributor

craig bot commented Dec 12, 2022

Build failed (retrying...):

@craig
Copy link
Contributor

craig bot commented Dec 12, 2022

Build failed (retrying...):

@craig
Copy link
Contributor

craig bot commented Dec 13, 2022

Build failed (retrying...):

@craig
Copy link
Contributor

craig bot commented Dec 13, 2022

Build failed:

@erikgrinaker
Copy link
Contributor Author

bors retry

@craig
Copy link
Contributor

craig bot commented Dec 13, 2022

Build failed (retrying...):

@craig
Copy link
Contributor

craig bot commented Dec 13, 2022

Build succeeded:

@craig craig bot merged commit da0dace into cockroachdb:master Dec 13, 2022
@erikgrinaker erikgrinaker deleted the rpc-log-error branch December 28, 2022 11:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants