
panic: 1a state.commit 75451 is out of range [78103, 78161] [recovered] #10834

Closed
nictuku opened this issue Nov 20, 2016 · 1 comment

nictuku commented Nov 20, 2016

This node won't ever start up:

build: beta-20161006 @ 2016/10/06 16:30:12 (go1.7.1)
admin: http://scw-96cd6e:8080
sql: postgresql://root@scw-96cd6e:26257?sslmode=disable
logs: cockroach-data/logs
store[0]: path=cockroach-data
join[0]: scw-d0bfca,scw-324c38
panic: 1a state.commit 75451 is out of range [78103, 78161] [recovered]
panic: 1a state.commit 75451 is out of range [78103, 78161]

goroutine 66 [running]:
panic(0x158f820, 0xc420839710)
/usr/local/go/src/runtime/panic.go:500 +0x1a1
github.com/cockroachdb/cockroach/util/stop.(*Stopper).Recover(0xc4203d2000)
/go/src/github.com/cockroachdb/cockroach/util/stop/stopper.go:177 +0x6e
panic(0x158f820, 0xc420839710)
/usr/local/go/src/runtime/panic.go:458 +0x243
github.com/cockroachdb/cockroach/storage.(*raftLogger).Panicf(0xc42082bc00, 0x175e96a, 0x2b, 0xc42076e940, 0x4, 0x4)
/go/src/github.com/cockroachdb/cockroach/storage/raft.go:111 +0x107
github.com/coreos/etcd/raft.(*raft).loadState(0xc4207e1950, 0x201ba, 0x19, 0x126bb, 0x0, 0x0, 0x0)
/go/src/github.com/coreos/etcd/raft/raft.go:1091 +0x1db
github.com/coreos/etcd/raft.newRaft(0xc420a39320, 0xc420a39140)
/go/src/github.com/coreos/etcd/raft/raft.go:289 +0xc9b
github.com/coreos/etcd/raft.NewRawNode(0xc420a39320, 0x0, 0x0, 0x0, 0x6503e3, 0x5830f262, 0x3af79c09)
/go/src/github.com/coreos/etcd/raft/rawnode.go:79 +0x71
github.com/cockroachdb/cockroach/storage.(*Replica).withRaftGroupLocked(0xc420843180, 0x18a5f00, 0xc420a39820, 0x88ed9b, 0xc420843220)
/go/src/github.com/cockroachdb/cockroach/storage/replica.go:411 +0x2a4
github.com/cockroachdb/cockroach/storage.(*Replica).withRaftGroup(0xc420843180, 0xc420a39820, 0x0, 0x0)
/go/src/github.com/cockroachdb/cockroach/storage/replica.go:470 +0x92
github.com/cockroachdb/cockroach/storage.(*Store).processRaftRequest(0xc42015e840, 0x7f21e0ccc778, 0xc4206fa9c0, 0xc420c22000, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
/go/src/github.com/cockroachdb/cockroach/storage/store.go:2563 +0xbb8
github.com/cockroachdb/cockroach/storage.(*Store).processRequestQueue(0xc42015e840, 0x15f)
/go/src/github.com/cockroachdb/cockroach/storage/store.go:2751 +0x180
github.com/cockroachdb/cockroach/storage.(*raftScheduler).worker(0xc42040df80, 0xc4203d2000)
/go/src/github.com/cockroachdb/cockroach/storage/scheduler.go:204 +0x308
github.com/cockroachdb/cockroach/storage.(*raftScheduler).Start.func1()
/go/src/github.com/cockroachdb/cockroach/storage/scheduler.go:160 +0x33
github.com/cockroachdb/cockroach/util/stop.(*Stopper).RunWorker.func1(0xc4203d2000, 0xc4207d89c0)
/go/src/github.com/cockroachdb/cockroach/util/stop/stopper.go:188 +0x7d
created by github.com/cockroachdb/cockroach/util/stop.(*Stopper).RunWorker
/go/src/github.com/cockroachdb/cockroach/util/stop/stopper.go:189 +0x66
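
For context, the panic is etcd/raft's HardState validation firing while the replica's raft group is recreated at startup: newRaft calls loadState, which requires the persisted commit index to lie within [raftLog.committed, raftLog.lastIndex()]. Since raftLog.committed is initialized to firstIndex-1, the lower bound of 78103 implies the on-disk log has been truncated up through index 78103 while the persisted HardState still says commit 75451. A standalone sketch of the invariant with the values from the trace (my paraphrase, not the etcd code itself):

package main

import "fmt"

// checkHardStateCommit mirrors the range check that etcd/raft's loadState
// performs on the persisted HardState (paraphrased, not verbatim).
func checkHardStateCommit(commit, logCommitted, lastIndex uint64) error {
	if commit < logCommitted || commit > lastIndex {
		return fmt.Errorf("state.commit %d is out of range [%d, %d]",
			commit, logCommitted, lastIndex)
	}
	return nil
}

func main() {
	// The values from the panic above: commit 75451 against [78103, 78161].
	if err := checkHardStateCommit(75451, 78103, 78161); err != nil {
		fmt.Println("would panic:", err)
	}
}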

Logs from a 2nd node that I recently upgraded:

initiating graceful shutdown of server
server drained and shutdown completed
build:      beta-20161103 @ 2016/11/03 15:03:54 (go1.7.3)
admin:      http://scw-d0bfca:8080
sql:        postgresql://root@scw-d0bfca:26257?sslmode=disable
logs:       cockroach-data/logs
store[0]:   path=cockroach-data
status:     restarted pre-existing node
clusterID:  {99a27c6f-20c7-47f0-b5ef-ec82e2ed7c40}
nodeID:     3

I believe this shutdown is normal.

The 3rd node is also running the 20161006 build (same as the corrupt node) but has not shown any data corruption:

sql:       postgresql://root@scw-324c38:26257?sslmode=disable
logs:      cockroach-data/logs
store[0]:  path=cockroach-data
join[0]:   scw-d0bfca,scw-96cd6e
proto: no encoder for Error string [GetProperties]
proto: no encoder for Code int [GetProperties]
initiating graceful shutdown of server
build:     beta-20161006 @ 2016/10/06 16:30:12 (go1.7.1)
admin:     http://scw-324c38:8080
sql:       postgresql://root@scw-324c38:26257?sslmode=disable
logs:      cockroach-data/logs
store[0]:  path=cockroach-data
join[0]:   scw-d0bfca,scw-96cd6e
proto: no encoder for Error string [GetProperties]
proto: no encoder for Code int [GetProperties]

(This all looks normal to me.)

I don't know yet whether the corruption happened only after I upgraded the other node. My guess is that it predates the upgrade and may have been triggered by an OOM event; my nodes exit and restart quite often because I intentionally give them very little RAM.

Sorry if this is a terse bug report, but I'm on holiday with the family right now. I can provide more logs later, including INFO logs (the server is crash-looping, so it's producing zillions of them).

Right now I just wanted to get this tracked somewhere.

bdarnell (Contributor) commented

This sounds like #9037, especially if you're seeing a lot of OOM restarts. We have a fix for that coming soon in #10690.
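
For anyone hitting the same crash, the general failure class (independent of the specifics of the linked fix, which I haven't verified): this invariant can only regress if a HardState write was buffered but never made durable before a crash, leaving the persisted commit index behind the rest of the on-disk Raft state. A minimal sketch of the durability pattern that avoids it; saveHardState and its file handling are hypothetical, not CockroachDB APIs:

package main

import "os"

// saveHardState illustrates the pattern: write the encoded HardState, then
// fsync before acting as if the commit index advanced. Skipping the Sync
// lets a crash leave the on-disk commit index behind the log, which is the
// exact condition loadState panics on above.
func saveHardState(f *os.File, encoded []byte) error {
	if _, err := f.Write(encoded); err != nil {
		return err
	}
	return f.Sync() // without this, a crash can lose the buffered write
}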
