Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

stability: crash on trace #4786

Closed
mberhault opened this issue Mar 2, 2016 · 3 comments
Closed

stability: crash on trace #4786

mberhault opened this issue Mar 2, 2016 · 3 comments
Assignees
Milestone

Comments

@mberhault
Copy link
Contributor

On a running cluster:
oddly, I only have one stack trace.

@tschottdorf: you're probably the most familiar with tracing code.
The relevant line is:

        // Discard the middle events.
        di := int((cap(tr.events) - 1) / 2)
        if d, ok := tr.events[di].What.(*discarded); ok {
W160302 20:57:40.694712 storage/replica.go:1745  [node=3,store=3,range=1] failed to resolve intents: range 1: replica node_id:3 store_id:3 replica_id:14  not leader; leader is node_id:2 store_id:2 replica_id:13
I160302 20:57:40.701867 storage/replica_command.go:1272  range 1: new leader lease replica {3 3 14} 2016-03-02 20:57:40 +0000 UTC +1.003s
W160302 20:57:40.702697 storage/replica.go:1745  [node=3,store=3,range=1] failed to resolve intents: context deadline exceeded
panic: runtime error: index out of range

goroutine 85 [running]:
golang.org/x/net/trace.(*trace).addEvent(0xc820a46600, 0x15ff280, 0xc827a97f20, 0xc837c30000)
        /go/src/golang.org/x/net/trace/trace.go:757 +0x92b
golang.org/x/net/trace.(*trace).LazyPrintf(0xc820a46600, 0x17ef700, 0xe, 0x0, 0x0, 0x0)
        /go/src/golang.org/x/net/trace/trace.go:786 +0xa2
github.com/cockroachdb/cockroach/util/tracing.(*netTraceWrapSpan).LogEvent(0xc836804180, 0x17ef700, 0xe)
        /go/src/github.com/cockroachdb/cockroach/util/tracing/nettrace.go:44 +0x67
github.com/cockroachdb/cockroach/storage.(*Replica).processRaftCommand(0xc8363061a0, 0xc84f9e5d60, 0x8, 0xc2ec, 0x1, 0x300000003, 0xe, 0x1438225cb06309ed, 0x1, 0x0, ...)
        /go/src/github.com/cockroachdb/cockroach/storage/replica.go:1265 +0x2a8
github.com/cockroachdb/cockroach/storage.(*Replica).handleRaftReady(0xc8363061a0, 0x0, 0x0)
        /go/src/github.com/cockroachdb/cockroach/storage/replica.go:1130 +0xa53
github.com/cockroachdb/cockroach/storage.(*Store).processRaft.func1()
        /go/src/github.com/cockroachdb/cockroach/storage/store.go:1762 +0x318
github.com/cockroachdb/cockroach/util/stop.(*Stopper).RunWorker.func1(0xc8202b5810, 0xc820153480)
        /go/src/github.com/cockroachdb/cockroach/util/stop/stopper.go:98 +0x52
created by github.com/cockroachdb/cockroach/util/stop.(*Stopper).RunWorker
        /go/src/github.com/cockroachdb/cockroach/util/stop/stopper.go:99 +0x62
@tamird
Copy link
Contributor

tamird commented Mar 2, 2016

Only one stack trace because of changes in 1.6. See
https://golang.org/pkg/runtime/ (search GOTRACEBACK).

On Wed, Mar 2, 2016 at 4:09 PM, marc [email protected] wrote:

On a running cluster:
oddly, I only have one stack trace.

@tschottdorf https://github.com/tschottdorf: you're probably the most
familiar with tracing code.
The relevant line is:

    // Discard the middle events.
    di := int((cap(tr.events) - 1) / 2)
    if d, ok := tr.events[di].What.(*discarded); ok {

W160302 20:57:40.694712 storage/replica.go:1745 [node=3,store=3,range=1] failed to resolve intents: range 1: replica node_id:3 store_id:3 replica_id:14 not leader; leader is node_id:2 store_id:2 replica_id:13
I160302 20:57:40.701867 storage/replica_command.go:1272 range 1: new leader lease replica {3 3 14} 2016-03-02 20:57:40 +0000 UTC +1.003s
W160302 20:57:40.702697 storage/replica.go:1745 [node=3,store=3,range=1] failed to resolve intents: context deadline exceeded
panic: runtime error: index out of range

goroutine 85 [running]:golang.org/x/net/trace.(_trace).addEvent(0xc820a46600, 0x15ff280, 0xc827a97f20, 0xc837c30000)
/go/src/golang.org/x/net/trace/trace.go:757 +0x92bgolang.org/x/net/trace.(_trace).LazyPrintf(0xc820a46600, 0x17ef700, 0xe, 0x0, 0x0, 0x0)
/go/src/golang.org/x/net/trace/trace.go:786 +0xa2github.com/cockroachdb/cockroach/util/tracing.(_netTraceWrapSpan).LogEvent(0xc836804180, 0x17ef700, 0xe)
/go/src/github.com/cockroachdb/cockroach/util/tracing/nettrace.go:44 +0x67github.com/cockroachdb/cockroach/storage.(_Replica).processRaftCommand(0xc8363061a0, 0xc84f9e5d60, 0x8, 0xc2ec, 0x1, 0x300000003, 0xe, 0x1438225cb06309ed, 0x1, 0x0, ...)
/go/src/github.com/cockroachdb/cockroach/storage/replica.go:1265 +0x2a8github.com/cockroachdb/cockroach/storage.(_Replica).handleRaftReady(0xc8363061a0, 0x0, 0x0)
/go/src/github.com/cockroachdb/cockroach/storage/replica.go:1130 +0xa53github.com/cockroachdb/cockroach/storage.(_Store).processRaft.func1()
/go/src/github.com/cockroachdb/cockroach/storage/store.go:1762 +0x318github.com/cockroachdb/cockroach/util/stop.(_Stopper).RunWorker.func1(0xc8202b5810, 0xc820153480)
/go/src/github.com/cockroachdb/cockroach/util/stop/stopper.go:98 +0x52
created by github.com/cockroachdb/cockroach/util/stop.(_Stopper).RunWorker
/go/src/github.com/cockroachdb/cockroach/util/stop/stopper.go:99 +0x62


Reply to this email directly or view it on GitHub
#4786.

@tbg
Copy link
Member

tbg commented Mar 2, 2016

Without having looked in detail, this is likely trace getting finished after a context timeout, but then still coming out of Raft, so use-after-finish. Had trouble with this before - annoying stuff.

@tbg tbg self-assigned this Mar 2, 2016
@petermattis petermattis added this to the Beta milestone Mar 4, 2016
@mberhault
Copy link
Contributor Author

still happening, somewhat regularly (eg: a few times a day on a 4 node cluster with load):

W160304 20:30:35.183289 storage/replica.go:1795  could not GC completed transaction: context deadline exceeded
W160304 20:30:35.184098 storage/replica.go:1751  [node=2,store=2,range=1] failed to resolve intents: context deadline exceeded
W160304 20:30:35.184225 storage/replica.go:1751  [node=2,store=2,range=1] failed to resolve intents: context deadline exceeded
I160304 20:30:35.184720 storage/replica_command.go:1272  range 33: new leader lease replica {2 2 8} 2016-03-04 20:30:29 +0000 UTC +6.922s
W160304 20:30:35.198180 storage/replica.go:1751  [node=2,store=2,range=1] failed to resolve intents: context deadline exceeded
panic: runtime error: index out of range

goroutine 64 [running]:
golang.org/x/net/trace.(*trace).addEvent(0xc821fb3680, 0x16074a0, 0xc8233f7da0, 0x1440000)
    /go/src/golang.org/x/net/trace/trace.go:757 +0x92b
golang.org/x/net/trace.(*trace).LazyPrintf(0xc821fb3680, 0x17dafb0, 0x2, 0xc824e8fbe0, 0x1, 0x1)
    /go/src/golang.org/x/net/trace/trace.go:786 +0xa2
github.com/opentracing/basictracer-go.glob.func1.1(0x1696e40, 0xc821e8dc00)
    /go/src/github.com/opentracing/basictracer-go/event_nettrace.go:19 +0x343
github.com/opentracing/basictracer-go.(*spanImpl).onLog(0xc84d9d5e60, 0x0, 0x0, 0x0, 0x17f9a70, 0xe, 0x0, 0x0)
    /go/src/github.com/opentracing/basictracer-go/event.go:40 +0xc4
github.com/opentracing/basictracer-go.(*spanImpl).Log(0xc84d9d5e60, 0xece6be76b, 0xc070c2c, 0x25b2b60, 0x17f9a70, 0xe, 0x0, 0x0)
    /go/src/github.com/opentracing/basictracer-go/span.go:97 +0x1e6
github.com/opentracing/basictracer-go.(*spanImpl).LogEvent(0xc84d9d5e60, 0x17f9a70, 0xe)
    /go/src/github.com/opentracing/basictracer-go/span.go:74 +0x7d
github.com/cockroachdb/cockroach/storage.(*Replica).processRaftCommand(0xc8252aef70, 0xc824e8fbb0, 0x8, 0x9bc4, 0x1, 0x200000002, 0x7, 0x1438be0b4c0ab6a1, 0x0, 0x0, ...)
    /go/src/github.com/cockroachdb/cockroach/storage/replica.go:1268 +0x2a8
github.com/cockroachdb/cockroach/storage.(*Replica).handleRaftReady(0xc8252aef70, 0x0, 0x0)
    /go/src/github.com/cockroachdb/cockroach/storage/replica.go:1133 +0xa53
github.com/cockroachdb/cockroach/storage.(*Store).processRaft.func1()
    /go/src/github.com/cockroachdb/cockroach/storage/store.go:1762 +0x318
github.com/cockroachdb/cockroach/util/stop.(*Stopper).RunWorker.func1(0xc8202f6a80, 0xc8202f32d0)
    /go/src/github.com/cockroachdb/cockroach/util/stop/stopper.go:98 +0x52
created by github.com/cockroachdb/cockroach/util/stop.(*Stopper).RunWorker
    /go/src/github.com/cockroachdb/cockroach/util/stop/stopper.go:99 +0x62

tbg added a commit to tbg/cockroach that referenced this issue Mar 5, 2016
@mberhault mberhault changed the title misc: crash on trace stability: crash on trace Mar 5, 2016
@tbg tbg closed this as completed in #4876 Mar 5, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants