Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pkg/util/log: flush buffered network sinks on panic #109186

Merged
merged 2 commits into from
Aug 28, 2023

Conversation

abarganier
Copy link
Contributor

@abarganier abarganier commented Aug 21, 2023

Previously, our crash reporter system would flush file log sinks
as part of the process to handle a panic.

This was an incomplete process, since buffered network sinks were
not included in part of this flush process. This means that many
times, panic logs would not make it to the network target, leading
to a loss in observability.

This patch introduces log.FlushAllSync(), which flushes both file
and buffered network log sinks. It then updates the crash reporter
to call into this, instead of just flushing file log sinks.

FlushAllSync() contains timeout logic to prevent the process from
completing if one of the underlying child sinks that a bufferedSink
wraps becomes unavailable/hangs on its output() call.

We originally attempted to fix this in #101562, but a bug in the
bufferedSink code led us to roll back those changes. The bug in the
bufferedSink code has since been fixed (#108928), so we can safely
introduce this logic again.

Release note: none

Fixes: #106345

@abarganier abarganier requested review from knz, dhartunian and a team August 21, 2023 21:29
@abarganier abarganier requested review from a team as code owners August 21, 2023 21:29
@abarganier abarganier requested a review from a team August 21, 2023 21:29
@abarganier abarganier requested review from a team as code owners August 21, 2023 21:29
@abarganier abarganier requested review from rhu713, DrewKimball and jayshrivastava and removed request for a team August 21, 2023 21:29
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@abarganier abarganier removed request for a team August 21, 2023 21:30
@abarganier
Copy link
Contributor Author

abarganier commented Aug 21, 2023

NB: Please ignore the 1st & 2nd commits, which are being reviewed separately here: #108928

Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commit r1 LGTM

r2 LGTM but the two calls in cli/start.go should remain "flush all" not just "flush files", otherwise we'd lose the last entries stored in the buffered sink and not yet flushed upon hard server shutdowns.

Reviewed 4 of 4 files at r1, 39 of 39 files at r2, 4 of 4 files at r3, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @abarganier and @dhartunian)


pkg/cli/start.go line 732 at r2 (raw file):

	go func() {
		// Ensure that the log files see the startup messages immediately.
		defer log.FlushFiles()

This one and the one below should become "Flush all".


pkg/util/log/buffered_sink.go line 227 at r1 (raw file):

		err := bs.mu.buf.appendMsg(msg)
		if err != nil {
			return err

Consider calling putBuffer here on the error return (it was missing in the original code).

@@ -2126,7 +2126,7 @@ func TestChangefeedSchemaChangeBackfillCheckpoint(t *testing.T) {

cdcTestWithSystem(t, testFn, feedTestEnterpriseSinks)

log.Flush()
log.FlushFiles()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: it looks like we do this a lot in tests.

log.FlushFiles()
log.FetchEntriesFromFiles(...)

It would be nice if you didn't have to worry about flushing in tests. Maybe this could be one function call to a test helper FlushAndFetchEntriesFromFiles. I don't feel too strongly about this though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we could just roll it into the existing FetchEntriesFromFiles without renaming it?

Presumably, if you want to fetch an entry from a file, you want to flush the thing first 😅

@abarganier abarganier force-pushed the flush-network-sinks branch from cc24177 to a38ba67 Compare August 22, 2023 17:12
Copy link
Contributor Author

@abarganier abarganier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feedback, RFAL!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @dhartunian, @jayshrivastava, and @knz)


pkg/cli/start.go line 732 at r2 (raw file):

Previously, knz (Raphael 'kena' Poss) wrote…

This one and the one below should become "Flush all".

Thank you for pointing this out! Done.


pkg/util/log/buffered_sink.go line 227 at r1 (raw file):

Previously, knz (Raphael 'kena' Poss) wrote…

Consider calling putBuffer here on the error return (it was missing in the original code).

Sorry for the confusion, please review this 1st & 2nd commits here 🙂 #108928 - I wanted to keep discussions separate since the bufferedSink changes are especially sensitive.

@abarganier abarganier requested a review from knz August 22, 2023 17:14
@abarganier abarganier force-pushed the flush-network-sinks branch 3 times, most recently from eff3a52 to 1f93941 Compare August 22, 2023 18:41
Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything after the 1st commit LGTM

Reviewed 1 of 1 files at r6, 35 of 39 files at r7, 5 of 5 files at r8, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @dhartunian and @jayshrivastava)

Currently, the `log.Flush` function's name is misleading. Since
it was originally introduced, we've added additional types of
logging sinks (e.g. fluent-server and http-server network sinks).
Therefore, `log.Flush` didn't actually perform a full flush of
all sinks active in the logger.

We're going to introduce a way to do so, but first, we need to
rename this function to be more descriptive. This patch does so.

Release note: none
@abarganier abarganier force-pushed the flush-network-sinks branch from 1f93941 to a7d48a9 Compare August 25, 2023 14:37
Previously, our crash reporter system would flush file log sinks
as part of the process to handle a panic.

This was an incomplete process, since buffered network sinks were
not included in part of this flush process. This means that many
times, panic logs would not make it to the network target, leading
to a loss in observability.

This patch introduces `log.FlushAllSync()`, which flushes both file
and buffered network log sinks. It then updates the crash reporter
to call into this, instead of just flushing file log sinks.

`FlushAllSync()` contains timeout logic to prevent the process from
completing if one of the underlying child sinks that a bufferedSink
wraps becomes unavailable/hangs on its `output()` call.

Release note: none
@abarganier abarganier force-pushed the flush-network-sinks branch from a7d48a9 to f113fb4 Compare August 25, 2023 16:42
@abarganier
Copy link
Contributor Author

TFTR!

bors r=knz

@craig
Copy link
Contributor

craig bot commented Aug 28, 2023

This PR was included in a batch that timed out, it will be automatically retried

@abarganier
Copy link
Contributor Author

bors retry

@craig
Copy link
Contributor

craig bot commented Aug 28, 2023

Already running a review

@craig
Copy link
Contributor

craig bot commented Aug 28, 2023

Build succeeded:

@craig craig bot merged commit 6a31120 into cockroachdb:master Aug 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

pkg/util/log: flush buffered log sinks on shutdown/panic
4 participants