log: indicate which sink is the cause of an error when a logging error is reported #72461
Labels
A-error-handling: Error messages, error propagations/annotations.
A-logging: In and around the logging infrastructure.
C-enhancement: Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception).
Comments
knz added the C-enhancement, A-logging, and A-error-handling labels on Nov 4, 2021.
knz added a commit to rauchenstein/cockroach that referenced this issue on Nov 4, 2021:
Release note: None
knz added a commit to rauchenstein/cockroach that referenced this issue on Nov 4, 2021:
Release note: None
andreimatei added a commit to andreimatei/cockroach that referenced this issue on Jun 3, 2022:

This is a rewrite of bufferSink (now called bufferedSink). Compared to before, the code is smaller, more efficient, and, most importantly, simpler. I couldn't understand the previous code well, and I think it made new additions (in particular, adding a shutdown sequence) hard to analyze.

The semantics of this buffer are now better and simpler. Before, a log message traveled through a sequence of two buffers before eventually being delivered to the wrapped sink:

1. The message was first placed into an accumulator buffer.
2. When that buffer grew too old or too large, it was placed into a queue to be flushed later, and the accumulator was reset.
3. The queue of batches to be flushed had a size limit; when the queue was full, new messages were dropped.

Two goroutines managed these two buffers, which is one too many. The behavior of queuing multiple, distinct flushes was peculiar at best: there is generally no reason not to coalesce all the flushes into one, thus speeding them up and reducing the risk of message drops. Moreover, the bufferSink's memory limit was not explicit and was difficult to infer: it was set indirectly by the combination of limits on the two buffers, expressed in different units.

Now the sequence of two buffers (the accumulator and the flush queue) is gone. A single buffer accumulates new messages, and there is at most one in-flight flush, which captures the buffer; as soon as a flush is triggered, the buffer is reset. The buffer is configured with a memory limit and, when that limit is reached, the _oldest_ messages in the buffer are dropped instead of the newest. Combined with the blocking semantics of the forceSync option, this means that a forceSync message is never dropped (unless its size is, by itself, larger than the buffer limit). The number of goroutines running for the bufferedSink drops from two to one.

Fixes cockroachdb#72461 by including the type name of the failed sink in error messages. I think that should be good enough.

Release note: None
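As a companion to the commit message above, here is a minimal Go sketch of the single-buffer design it describes: one accumulator buffer, at most one in-flight flush, a drop-oldest memory limit, and a single flusher goroutine. This is not the CockroachDB implementation; the `sink` interface, the `bufferedSink` fields, and the `stderrSink` demo are all illustrative assumptions.

```go
// Illustrative sketch only; all names here are hypothetical.
package main

import (
	"fmt"
	"strings"
	"sync"
	"time"
)

// sink stands in for the wrapped sink that flushes actually write to.
type sink interface {
	output(b []byte) error
}

// stderrSink is a toy sink used by the demo in main.
type stderrSink struct{}

func (stderrSink) output(b []byte) error {
	_, err := fmt.Println(string(b))
	return err
}

// bufferedSink accumulates messages in a single buffer and coalesces
// all pending work into one flush, run by a single goroutine.
type bufferedSink struct {
	child      sink
	maxBufSize int           // memory limit on buffered bytes
	flushC     chan struct{} // capacity 1: at most one pending flush
	mu         struct {
		sync.Mutex
		buf  []string // accumulated messages
		size int      // total bytes accumulated
	}
}

func newBufferedSink(child sink, maxBufSize int) *bufferedSink {
	s := &bufferedSink{child: child, maxBufSize: maxBufSize, flushC: make(chan struct{}, 1)}
	go s.runFlusher() // the single flusher goroutine
	return s
}

// output buffers msg, dropping the *oldest* messages if the memory limit
// is exceeded, then schedules a flush if none is pending.
func (s *bufferedSink) output(msg string) {
	s.mu.Lock()
	s.mu.buf = append(s.mu.buf, msg)
	s.mu.size += len(msg)
	// Keep at least one message: a message larger than the whole
	// limit is the only one that could still be lost.
	for s.mu.size > s.maxBufSize && len(s.mu.buf) > 1 {
		s.mu.size -= len(s.mu.buf[0]) // drop oldest, keep newest
		s.mu.buf = s.mu.buf[1:]
	}
	s.mu.Unlock()

	select {
	case s.flushC <- struct{}{}:
	default: // a flush is already pending; it will pick this message up too
	}
}

// runFlusher drains flush notifications, capturing and resetting the
// buffer so each flush coalesces everything accumulated so far.
func (s *bufferedSink) runFlusher() {
	for range s.flushC {
		s.mu.Lock()
		batch := s.mu.buf
		s.mu.buf, s.mu.size = nil, 0
		s.mu.Unlock()
		if len(batch) == 0 {
			continue
		}
		if err := s.child.output([]byte(strings.Join(batch, "\n"))); err != nil {
			// In the spirit of the fix for #72461: name the failing sink.
			fmt.Printf("logging error from %T: %v\n", s.child, err)
		}
	}
}

func main() {
	s := newBufferedSink(stderrSink{}, 1<<10)
	for i := 0; i < 3; i++ {
		s.output(fmt.Sprintf("message %d", i))
	}
	time.Sleep(100 * time.Millisecond) // crude: let the flusher run before exit
}
```

Because the notification channel has capacity one and the flusher captures the whole buffer on each pass, multiple rapid `output` calls naturally coalesce into a single write to the wrapped sink, mirroring the coalescing rationale in the commit message.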
andreimatei added a commit to andreimatei/cockroach that referenced this issue on Jun 7, 2022, with the same commit message as above.
abarganier pushed two commits to abarganier/cockroach that referenced this issue on Jun 22, 2022, both with the same commit message as above.
This is follow-up work to #70330.
Currently, the `logging error` message produced in log/flags.go does not include any detail about which sink caused the error, which makes troubleshooting difficult. We want to annotate the error report with information about the failing sink, as sketched below.
Jira issue: CRDB-11169
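To illustrate the kind of annotation this issue asks for, here is a hedged Go sketch that wraps a sink error with the concrete sink's type name, so the eventual report identifies the culprit. The `sink` interface, the toy `fileSink`/`netSink` types, and `outputToSink` are hypothetical stand-ins, not CockroachDB's actual API.

```go
// Illustrative sketch only; these types are not CockroachDB's.
package main

import (
	"errors"
	"fmt"
)

type sink interface {
	output(b []byte) error
}

// fileSink and netSink are toy sinks for the demo: one succeeds, one fails.
type fileSink struct{}

func (fileSink) output([]byte) error { return nil }

type netSink struct{}

func (netSink) output([]byte) error { return errors.New("connection refused") }

// outputToSink annotates any error with the concrete sink type, so the
// logging-error report says *which* sink failed.
func outputToSink(s sink, b []byte) error {
	if err := s.output(b); err != nil {
		return fmt.Errorf("output to sink of type %T: %w", s, err)
	}
	return nil
}

func main() {
	for _, s := range []sink{fileSink{}, netSink{}} {
		if err := outputToSink(s, []byte("hello")); err != nil {
			// Prints e.g.:
			// logging error: output to sink of type main.netSink: connection refused
			fmt.Println("logging error:", err)
		}
	}
}
```

Wrapping with `%w` keeps the original error available to `errors.Is`/`errors.As` while the `%T` annotation surfaces the sink's type name, which matches the shape of the eventual fix ("including the type name of the failed sink in error messages").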