
log: indicate which sink is the cause of an error when a logging error is reported #72461

Closed
knz opened this issue Nov 4, 2021 · 0 comments
Labels
A-error-handling Error messages, error propagations/annotations A-logging In and around the logging infrastructure. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)

Comments


knz commented Nov 4, 2021

This is follow-up work to #70330.

Currently in log/flags.go we see the following:

  errCallback := func(err error) {
    // TODO(knz): explain which sink is encountering the error in the
    // error message.
    Ops.Errorf(context.Background(), "logging error: %v", err)
  }
  if s.criticality {
    // TODO(knz): explain which sink is encountering the error in the
    // error message.
    errCallback = func(err error) {
      Ops.Errorf(context.Background(), "logging error: %v", err)

      ...

The logging error message does not include detail about which sink is causing the error, which makes troubleshooting difficult.

We want to annotate some information about the sink in the error report.

Jira issue: CRDB-11169

@knz knz added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-logging In and around the logging infrastructure. A-error-handling Error messages, error propagations/annotations labels Nov 4, 2021
knz added a commit to rauchenstein/cockroach that referenced this issue Nov 4, 2021
knz added a commit to rauchenstein/cockroach that referenced this issue Nov 4, 2021
andreimatei added a commit to andreimatei/cockroach that referenced this issue Jun 3, 2022
This is a rewrite of bufferSink (now called bufferedSink). As opposed to
before, the code is smaller, more efficient, and, most importantly,
simpler. I couldn't understand the previous code well, and I think it
made new additions to it (in particular, adding a shutdown sequence)
hard to analyze. The semantics of this buffer are now better and simpler.

Before, a log message was traveling through a sequence of two buffers
before being eventually delivered to the wrapped sink:
1. The message was first placed into an accumulator buffer.
2. Then, when that buffer grew too old or too large, it was placed
   into a queue to be flushed later, and the accumulator was reset.
3. There was a limit on the size of this queue of batches to
   be flushed. When the queue was full, new messages were dropped.
There were two goroutines managing these two buffers, which is one too
many.

This behavior of queuing multiple, distinct flushes was peculiar at
best. There's generally no reason not to coalesce all the flushes into
one, thus speeding them up and reducing the risk of message drops. The
bufferSink's memory limit was not explicit, and difficult enough to
infer: it was set indirectly from the combination of limits on the two
buffers, expressed in different units.

Now, the sequence of two buffers (the accumulator and the flush queue)
is gone. There's a single buffer accumulating new messages, plus at most
one in-flight flush which captures its buffer. As soon as a flush is
triggered, the buffer is reset. The buffer is now configured with a
memory limit and, when that limit is reached, the _oldest_ messages in
the buffer are dropped, instead of the newest. This, combined with the
blocking semantics of the forceSync option, means that a forceSync
message is never dropped (unless its size is, by itself, larger than the
buffer limit). The number of goroutines running for the bufferSink drops
from two to one.

Fixes cockroachdb#72461 by including the type name of the failed sink in error
messages. I think that should be good enough.

Release note: None
andreimatei added a commit to andreimatei/cockroach that referenced this issue Jun 7, 2022
@craig craig bot closed this as completed in 32d749b Jun 7, 2022
abarganier pushed a commit to abarganier/cockroach that referenced this issue Jun 22, 2022
abarganier pushed a commit to abarganier/cockroach that referenced this issue Jun 22, 2022
1 participant