-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
pkg/util/log: flush buffered network sinks on panic #109186
Conversation
NB: Please ignore the 1st & 2nd commits, which are being reviewed separately here: #108928 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Commit r1 LGTM
r2 LGTM but the two calls in cli/start.go should remain "flush all" not just "flush files", otherwise we'd lose the last entries stored in the buffered sink and not yet flushed upon hard server shutdowns.
Reviewed 4 of 4 files at r1, 39 of 39 files at r2, 4 of 4 files at r3, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @abarganier and @dhartunian)
pkg/cli/start.go
line 732 at r2 (raw file):
go func() { // Ensure that the log files see the startup messages immediately. defer log.FlushFiles()
This one and the one below should become "Flush all".
pkg/util/log/buffered_sink.go
line 227 at r1 (raw file):
err := bs.mu.buf.appendMsg(msg) if err != nil { return err
Consider calling putBuffer
here on the error return (it was missing in the original code).
@@ -2126,7 +2126,7 @@ func TestChangefeedSchemaChangeBackfillCheckpoint(t *testing.T) { | |||
|
|||
cdcTestWithSystem(t, testFn, feedTestEnterpriseSinks) | |||
|
|||
log.Flush() | |||
log.FlushFiles() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: it looks like we do this a lot in tests.
log.FlushFiles()
log.FetchEntriesFromFiles(...)
It would be nice if you didn't have to worry about flushing in tests. Maybe this could be one function call to a test helper FlushAndFetchEntriesFromFiles
. I don't feel too strongly about this though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we could just roll it into the existing FetchEntriesFromFiles
without renaming it?
Presumably, if you want to fetch an entry from a file, you want to flush the thing first 😅
cc24177
to
a38ba67
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the feedback, RFAL!
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @dhartunian, @jayshrivastava, and @knz)
pkg/cli/start.go
line 732 at r2 (raw file):
Previously, knz (Raphael 'kena' Poss) wrote…
This one and the one below should become "Flush all".
Thank you for pointing this out! Done.
pkg/util/log/buffered_sink.go
line 227 at r1 (raw file):
Previously, knz (Raphael 'kena' Poss) wrote…
Consider calling
putBuffer
here on the error return (it was missing in the original code).
Sorry for the confusion, please review this 1st & 2nd commits here 🙂 #108928 - I wanted to keep discussions separate since the bufferedSink changes are especially sensitive.
eff3a52
to
1f93941
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Everything after the 1st commit LGTM
Reviewed 1 of 1 files at r6, 35 of 39 files at r7, 5 of 5 files at r8, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @dhartunian and @jayshrivastava)
Currently, the `log.Flush` function's name is misleading. Since it was originally introduced, we've added additional types of logging sinks (e.g. fluent-server and http-server network sinks). Therefore, `log.Flush` didn't actually perform a full flush of all sinks active in the logger. We're going to introduce a way to do so, but first, we need to rename this function to be more descriptive. This patch does so. Release note: none
1f93941
to
a7d48a9
Compare
Previously, our crash reporter system would flush file log sinks as part of the process to handle a panic. This was an incomplete process, since buffered network sinks were not included in part of this flush process. This means that many times, panic logs would not make it to the network target, leading to a loss in observability. This patch introduces `log.FlushAllSync()`, which flushes both file and buffered network log sinks. It then updates the crash reporter to call into this, instead of just flushing file log sinks. `FlushAllSync()` contains timeout logic to prevent the process from completing if one of the underlying child sinks that a bufferedSink wraps becomes unavailable/hangs on its `output()` call. Release note: none
a7d48a9
to
f113fb4
Compare
TFTR! bors r=knz |
This PR was included in a batch that timed out, it will be automatically retried |
bors retry |
Already running a review |
Build succeeded: |
Previously, our crash reporter system would flush file log sinks
as part of the process to handle a panic.
This was an incomplete process, since buffered network sinks were
not included in part of this flush process. This means that many
times, panic logs would not make it to the network target, leading
to a loss in observability.
This patch introduces
log.FlushAllSync()
, which flushes both fileand buffered network log sinks. It then updates the crash reporter
to call into this, instead of just flushing file log sinks.
FlushAllSync()
contains timeout logic to prevent the process fromcompleting if one of the underlying child sinks that a bufferedSink
wraps becomes unavailable/hangs on its
output()
call.We originally attempted to fix this in #101562, but a bug in the
bufferedSink code led us to roll back those changes. The bug in the
bufferedSink code has since been fixed (#108928), so we can safely
introduce this logic again.
Release note: none
Fixes: #106345