Skip to content

Commit

Permalink
Browse files Browse the repository at this point in the history
52201: util/log: prevent ReportPanic() from swallowing panics in some cases r=andreimatei a=knz

Accompanies #52200 (but is an independent change)

Prior to this patch, if `ReportPanic()` was called and the panic
object ultimately discarded (ie. caught) during tests that were not otherwise using
`TestLogScope`, the panic object would be lost. I believe this was directly noticed by @andreimatei prior.

I also believe this was causing panic objects to disappear quite a lot in fact,
because the test runner *also* catches panics and fails to report
them adequately in some cases (e.g. when the panic occurs during a
`stress` run).

To alleviate the situation, this patch removes some logic previously
present in the code that was attempting (and failing) to remove
duplicate panic prints. That logic was misdesigned to start with anyway
(by me), because it was working under the assumption that
`ReportPanic()` was *only* called for uncaught panics, and another
function `RecoverAndReportNonfatalPanic()` just before that was
blatantly violating that assumption.

Removing the logic causes reportable panics to always be reported in
logs and on stderr, regardless of whether it's caught or not. This
provides more guarantees that the panic object will be seen, at the
expense of having a duplicate print in some edge cases.

Release note: None

52250: sql: use a string builder in decodeCopy r=mjibson a=mjibson

For returning strings, a string builder is better than a bytes buffer.

```
name                  old time/op    new time/op    delta
DecodeCopySimple-12     10.8ns ± 1%    10.6ns ± 1%   -2.03%  (p=0.024 n=5+5)
DecodeCopyEscaped-12     249ns ± 1%     178ns ± 2%  -28.71%  (p=0.008 n=5+5)

name                  old alloc/op   new alloc/op   delta
DecodeCopySimple-12      0.00B          0.00B          ~     (all equal)
DecodeCopyEscaped-12     96.0B ± 0%     40.0B ± 0%  -58.33%  (p=0.008 n=5+5)

name                  old allocs/op  new allocs/op  delta
DecodeCopySimple-12       0.00           0.00          ~     (all equal)
DecodeCopyEscaped-12      2.00 ± 0%      2.00 ± 0%     ~     (all equal)
```

Release note: None

52254: roachtest: bump PredecessorVersion(20.2) to 20.1.4 r=RaduBerinde a=RaduBerinde

Release note: None

Co-authored-by: Raphael 'kena' Poss <[email protected]>
Co-authored-by: Matt Jibson <[email protected]>
Co-authored-by: Radu Berinde <[email protected]>
  • Loading branch information
4 people committed Aug 3, 2020
4 parents b9bed22 + e9a159b + b4e6ea6 + 80e7156 commit e36a617
Show file tree
Hide file tree
Showing 5 changed files with 33 additions and 93 deletions.
2 changes: 1 addition & 1 deletion pkg/cli/interactive_tests/test_missing_log_output.tcl
Original file line number Diff line number Diff line change
Expand Up @@ -90,7 +90,7 @@ system "($argv sql --insecure -e \"select crdb_internal.force_panic('helloworld'
# Check the panic is reported on the server's stderr
eexpect "a SQL panic has occurred"
eexpect "panic: helloworld"
eexpect "goroutine"
eexpect "stack trace"
eexpect ":/# "
# Check the panic is reported on the server log file
send "cat logs/db/logs/cockroach.log\r"
Expand Down
2 changes: 1 addition & 1 deletion pkg/cmd/roachtest/test_runner.go
Original file line number Diff line number Diff line change
Expand Up @@ -1155,7 +1155,7 @@ func PredecessorVersion(buildVersion version.Version) (string, error) {
// (see runVersionUpgrade). The same is true for adding a new key to this
// map.
verMap := map[string]string{
"20.2": "20.1.3",
"20.2": "20.1.4",
"20.1": "19.2.9",
"19.2": "19.1.11",
"19.1": "2.1.9",
Expand Down
3 changes: 2 additions & 1 deletion pkg/sql/copy.go
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,7 @@ import (
"context"
"io"
"strconv"
"strings"
"time"
"unsafe"

Expand Down Expand Up @@ -418,7 +419,7 @@ func (c *copyMachine) addRow(ctx context.Context, line []byte) error {
//
// See: https://www.postgresql.org/docs/9.5/static/sql-copy.html#AEN74432
func decodeCopy(in string) string {
var buf bytes.Buffer
var buf strings.Builder
start := 0
for i, n := 0, len(in); i < n; i++ {
if in[i] != '\\' {
Expand Down
61 changes: 0 additions & 61 deletions pkg/util/log/clog.go
Original file line number Diff line number Diff line change
Expand Up @@ -368,54 +368,6 @@ func DumpStacks(ctx context.Context) {
Infof(ctx, "stack traces:\n%s", allStacks)
}

// printPanicToFile is used by ReportPanic() to copy the panic details to the log file. This is
// used when we understand that the Go runtime will only automatically
// print the panic details to the external stderr stream (e.g.
// when we're not redirecting that to a file).
//
// This function is a lightweight version of outputLogEntry() which
// does not exit the process in case of error.
func (l *loggerT) printPanicToFile(ctx context.Context, depth int, r interface{}) {
if !l.logDir.IsSet() {
// There's no log file. Can't do anything.
return
}

l.mu.Lock()
defer l.mu.Unlock()

// Mark logging as active, so that further configuration changes
// are disabled. We need to do this ourselves here because we are
// not using outputLogEntry() which does it for us and
// this function also creates files in the logging directory.
// See IsActive() and its callers for details.
setActive()

if err := l.ensureFile(); err != nil {
// We're already exiting; no need to pile an error upon an
// error. Simply report the logging error and continue.
l.reportErrorEverywhereLocked(ctx, err)
return
}

// Create a fully structured log entry. This ensures there a
// timestamp in front of the panic object.
entry := l.makeEntryForPanicObject(ctx, depth+1, r)
buf := logging.processForFile(entry, debug.Stack())
defer putBuffer(buf)

// Actually write the panic object to a file.
if err := l.writeToFile(buf.Bytes()); err != nil {
// Ditto; report the error but continue. We're terminating anyway.
l.reportErrorEverywhereLocked(ctx, err)
}
}

func (l *loggerT) makeEntryForPanicObject(ctx context.Context, depth int, r interface{}) Entry {
return MakeEntry(
ctx, Severity_ERROR, &l.logCounter, depth+1, l.redactableLogs.Get(), "panic: %v", r)
}

func setActive() {
logging.mu.Lock()
defer logging.mu.Unlock()
Expand All @@ -442,19 +394,6 @@ func resetActive() (restore func()) {
}
}

// printPanicToExternalStderr is used by ReportPanic() in case we
// understand that the Go runtime will not print the panic object to
// the external stderr itself (e.g. because we've redirected it to a
// file).
func (l *loggerT) printPanicToExternalStderr(ctx context.Context, depth int, r interface{}) {
entry := l.makeEntryForPanicObject(ctx, depth+1, r)
if err := l.outputToStderr(entry, debug.Stack()); err != nil {
l.mu.Lock()
defer l.mu.Unlock()
l.reportErrorEverywhereLocked(ctx, err)
}
}

// outputToStderr writes the provided entry and potential stack
// trace(s) to the process' external stderr stream.
func (l *loggerT) outputToStderr(entry Entry, stacks []byte) error {
Expand Down
58 changes: 29 additions & 29 deletions pkg/util/log/crash_reporting.go
Original file line number Diff line number Diff line change
Expand Up @@ -101,44 +101,44 @@ func RecoverAndReportNonfatalPanic(ctx context.Context, sv *settings.Values) {
}

// ReportPanic reports a panic has occurred on the real stderr.
//
// Note that ReportPanic() does not assume that the panic object
// will be left uncaught to terminate the process. For example,
// at the time of this writing, ReportPanic() is called from
// RecoverAndReportNonfatalPanic() above.
func ReportPanic(ctx context.Context, sv *settings.Values, r interface{}, depth int) {
// Announce the panic has occurred to all places. The purpose
// of this call is double:
// of this call is threefold:
// - it ensures there's a notice on the terminal, in case
// logging would only go to file otherwise;
// - it ensures there's a notice on the output file, in
// case the panic is uncaught and internal stderr
// writes by the Go runtime have not been set up to
// redirect to a separate log file.
// - it places a timestamp in front of the panic object,
// in case the various configuration options make
// the Go runtime solely responsible for printing
// out the panic object to the log file.
// (The go runtime doesn't time stamp its output.)
Shout(ctx, Severity_ERROR, "a panic has occurred!")

if stderrLog.redirectInternalStderrWrites {
// If we decided that the internal stderr writes performed by the
// Go runtime are going to our file, that's also where the panic
// details will go automatically when the runtime processes the
// uncaught panic.
// This means the panic details won't get copied to
// the process' external stderr stream automatically
// any more.
// Now, if the user asked that fatal errors be copied
// to the external stderr as well, we'll take that
// as an indication they also want to see panic details
// there. Do it here.
if LoggingToStderr(Severity_FATAL) {
stderrLog.printPanicToExternalStderr(ctx, depth+1, r)
}
} else {
// If we are not redirecting internal stderr writes, then the
// details printed by the Go runtime won't automatically reach our
// log file and instead go to the process' external stderr.
//
// However, we actually want to persist these details. So print
// them in the log file ourselves.
stderrLog.printPanicToFile(ctx, depth+1, r)
}

sendCrashReport(ctx, sv, PanicAsError(depth+1, r), ReportTypePanic)
//
// Note that this code will cause the panic object to be printed
// twice in some cases (specifically, when the panic is uncaught).
// A previous version of this code was trying to prevent the
// double print and was failing to do so effectively, causing
// instead panics to be lost in the case where they were
// recovered (eg via RecoverAndReportNonfatalPanic()).
//
// To properly prevent double prints, the API could be changed to
// indicate whether the Go runtime will *eventually* print the panic
// on its own. Unfortunately, this is a bit hard to do, as the
// caller of ReportPanic() may not be in a position to know for
// sure, whether some other caller further in the call stack is
// catching the panic object in the end or not.
panicErr := PanicAsError(depth+1, r)
Shoutf(ctx, Severity_ERROR, "a panic has occurred!\n%+v", panicErr)

// In addition to informing the user, also report the details to telemetry.
sendCrashReport(ctx, sv, panicErr, ReportTypePanic)

// Ensure that the logs are flushed before letting a panic
// terminate the server.
Expand Down

0 comments on commit e36a617

Please sign in to comment.